Mozilla's effort to crowdsource datasets for voice recognition applications such as digital assistants has expanded to include three more languages, with many others to follow soon:
Mozilla launched the first fruits of its Common Voice datasets in English back in November, a collection that contained some 500 hours of speech, comprising 400,000 recordings from 20,000 individuals. Today, Mozilla officially kick-starts the process of collecting voice data for three more languages: French, German, and, a little randomly, Welsh. Another 40 tongues are currently being prepped for the data collection process, with the likes of Brazilian Portuguese, Chinese (Taiwan), Indonesian, Polish, and Dutch already halfway toward being ready to start crowdsourcing voice data.
[...] "We believe these interfaces shouldn't be controlled by a few companies as gatekeepers to voice-enabled services, and we want users to be understood consistently, in their own languages and accents," said Mozilla's chief innovation officer, Katharina Borchert, in a blog post.
The Common Voice project serves a purpose similar to that of other open-license projects that have emerged to counter privately owned platforms. OpenStreetMap is a good example of a similarly crowdsourced project that gives developers open and freely usable maps of the world, without the costs or restrictions of rival services such as Google Maps.
(Score: 2) by Phoenix666 on Friday June 08 2018, @04:56AM (1 child)
Germans preen about their long word, "Donaudampfschifffahrtselektrizitätenhauptbetriebswerkbauunterbeamtengesellschaft," too, but it's not a big deal when you know how to chunk it: "Donau dampfschifffahrts elektrizitäten hauptbetriebswerkbau unterbeamten gesellschaft."
Washington DC delenda est.
(Score: 2) by kazzie on Friday June 08 2018, @11:31AM
Let me help you chunk it then:
Llanfair-pwll-gwyn-gyll-go-ger-y-chwyrn-drobwll-llantysilio-gogo-goch.
Or if you want it split per syllable:
Llan-fair-pwll-gwyn-gyll-go-ger-y-chwyrn-dro-bwll-llan-ty-silio-go-go-goch.
(You may already know that it is a contrived lengthening of the original name Llanfairpwllgwyngyll to attract tourists on the new-fangled railway. Other non-contrived placenames include Llanfairmathafarneithaf, Abergynolwyn, and Llanfairtalhaearn.)