SoylentNews is people

posted by martyb on Friday June 08 2018, @01:08AM
from the dialects++ dept.

Mozilla's effort to crowdsource datasets for voice-recognition applications such as digital assistants has expanded to three more languages, with many others to follow soon:

Mozilla launched the first fruits of its Common Voice datasets in English back in November: a collection of some 500 hours of speech, comprising 400,000 recordings from 20,000 individuals. Today, Mozilla officially kickstarts the process of collecting voice data for three more languages: French, German, and (a little randomly) Welsh. Another 40 tongues are currently being prepped for the data collection process, with the likes of Brazilian Portuguese, Chinese (Taiwan), Indonesian, Polish, and Dutch already halfway toward being ready to start crowdsourcing voice data.

[...] "We believe these interfaces shouldn't be controlled by a few companies as gatekeepers to voice-enabled services, and we want users to be understood consistently, in their own languages and accents," said Mozilla's chief innovation officer, Katharina Borchert, in a blog post.

The Common Voice project serves a purpose similar to that of other open-license projects that have emerged to counter privately owned platforms. OpenStreetMap is a good example of a similarly crowdsourced project that gives developers open and freely usable maps of the world, without the costs or restrictions of rival services such as Google Maps.



  • (Score: 2) by Phoenix666 on Friday June 08 2018, @04:56AM

    Germans preen about their long word, "Donaudampfschifffahrtselektrizitätenhauptbetriebswerkbauunterbeamtengesellschaft," too, but it's not a big deal when you know how to chunk it: "Donau dampfschifffahrts elektrizitäten hauptbetriebswerkbau unterbeamten gesellschaft."

    --
    Washington DC delenda est.
  • (Score: 2) by kazzie on Friday June 08 2018, @11:31AM

    Let me help you chunk it then:

    Llanfair-pwll-gwyn-gyll-go-ger-y-chwyrn-drobwll-llantysilio-gogo-goch.

    Or if you want it split per syllable:

    Llan-fair-pwll-gwyn-gyll-go-ger-y-chwyrn-dro-bwll-llan-ty-silio-go-go-goch.

    (You may already know that it is a contrived lengthening of the original name, Llanfairpwllgwyngyll, to attract tourists on the newfangled railway. Other, non-contrived place names include Llanfairmathafarneithaf, Abergynolwyn, and Llanfairtalhaearn.)