SoylentNews is people

posted by martyb on Friday June 08 2018, @01:08AM
from the dialects++ dept.

Mozilla's effort to crowdsource datasets for voice recognition applications such as digital assistants has expanded to include 3 more languages, and soon many others:

Mozilla launched the first fruits of its Common Voice datasets in English back in November, a collection that contained some 500 hours of speech and constituted 400,000 recordings from 20,000 individuals. Today, Mozilla officially kick starts the process of collecting voice data for three more languages — French, German, and — a little randomly — Welsh. Another 40 tongues are currently being prepped for the data collection process, with the likes of Brazilian Portuguese, Chinese (Taiwan), Indonesian, Polish, and Dutch already halfway toward being ready to start crowdsourcing voice data.

[...] "We believe these interfaces shouldn't be controlled by a few companies as gatekeepers to voice-enabled services, and we want users to be understood consistently, in their own languages and accents," said Mozilla's chief innovation officer, Katharina Borchert, in a blog post.

The Common Voice project serves a purpose similar to that of other open-license projects that have emerged to counter privately owned platforms. OpenStreetMap is a good example of a similarly crowdsourced project that gives developers open and freely usable maps of the world, without the costs or restrictions of rival services such as Google Maps.


  • (Score: 0) by Anonymous Coward on Friday June 08 2018, @01:57AM (11 children)

    by Anonymous Coward on Friday June 08 2018, @01:57AM (#690156)

    Just what have you got against Welsh, you insensitive clod.

    • (Score: 2) by takyon on Friday June 08 2018, @02:02AM (4 children)

      by takyon (881) <{takyon} {at} {soylentnews.org}> on Friday June 08 2018, @02:02AM (#690157) Journal

      Mae'n iaith farw. ("It's a dead language.")

      --
      [SIG] 10/28/2017: Soylent Upgrade v14 [soylentnews.org]
      • (Score: 1, Informative) by Anonymous Coward on Friday June 08 2018, @02:22AM (2 children)

        by Anonymous Coward on Friday June 08 2018, @02:22AM (#690164)

        Rumors of its death are premature:
            https://en.wikipedia.org/wiki/Welsh_language [wikipedia.org]

        The Welsh Language (Wales) Measure 2011 gave the Welsh language official status in Wales, making it the only language that is de jure official in any part of the United Kingdom, with English being de facto official. The Welsh language, along with English, is also a de jure official language of the National Assembly for Wales.

        • (Score: 2) by Phoenix666 on Friday June 08 2018, @04:48AM

          by Phoenix666 (552) on Friday June 08 2018, @04:48AM (#690203) Journal

          Welsh is pretty healthy relative to world languages. Manx and Cornish are closer to being extinct, though I think the Manx and the Cornish have been trying to revive those tongues.

          The BBC has world service in Welsh as well as a whole series of lessons in Welsh. Its pronunciation is pretty straightforward for English speakers, with the exception of the double-L. To my ear it's quite beautiful, almost like elvish, e.g. mynyddoedd ("mountains").

          As a bonus there is a robust body of folk songs in Welsh that aid the learner (a lot of people learn languages through music). "Ar Hyd Y Nos" and "Suo-gan" are two of the better known, with the latter having been performed in many TV shows and films like "Empire of the Sun" (a young Christian Bale sings it while watching Japanese fighters take off).

          --
          Washington DC delenda est.
        • (Score: 2) by kazzie on Friday June 08 2018, @11:26AM

          by kazzie (5309) on Friday June 08 2018, @11:26AM (#690269)

          ... making [Welsh] the only language that is de jure official in any part of the United Kingdom, with English being de facto official.

          Incidentally, every law of the UK Parliament is still given Royal Assent in Norman French [wikipedia.org].

      • (Score: 2) by kazzie on Friday June 08 2018, @11:21AM

        by kazzie (5309) on Friday June 08 2018, @11:21AM (#690268)

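        Ond rwyt ti'n ei defnyddio hi, felly sut allai fod yn farw? ("But you're using it, so how could it be dead?")

        Neu efallai dy fod ti yn farw, ac yn siarad yr iaith. Help, Sombîs! ("Or maybe you are dead, and speaking the language. Help, zombies!")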

    • (Score: 0) by Anonymous Coward on Friday June 08 2018, @02:50AM (1 child)

      by Anonymous Coward on Friday June 08 2018, @02:50AM (#690175)

      The experts are still debating whether Welsh is really a language, you overly sensitive clod!

      • (Score: 0) by Anonymous Coward on Saturday June 09 2018, @01:26PM

        by Anonymous Coward on Saturday June 09 2018, @01:26PM (#690781)

        The consensus is in. It's not really a language. It's just the sound of rocks being bounced around in a barrel.

    • (Score: 4, Funny) by RamiK on Friday June 08 2018, @03:12AM (2 children)

      by RamiK (1813) on Friday June 08 2018, @03:12AM (#690185)

      Just what have you got against Welsh

      He bit his tongue trying to pronounce Llanfairpwllgwyngyllgogerychwyrndrobwllllantysiliogogogoch [wikipedia.org].

      --
      compiling...
      • (Score: 2) by Phoenix666 on Friday June 08 2018, @04:56AM (1 child)

        by Phoenix666 (552) on Friday June 08 2018, @04:56AM (#690206) Journal

        Germans preen about their long word, "Donaudampfschifffahrtselektrizitätenhauptbetriebswerkbauunterbeamtengesellschaft," too, but it's not a big deal when you know how to chunk it: "Donau dampfschifffahrts elektrizitäten hauptbetriebswerkbau unterbeamten gesellschaft."

        --
        Washington DC delenda est.
        • (Score: 2) by kazzie on Friday June 08 2018, @11:31AM

          by kazzie (5309) on Friday June 08 2018, @11:31AM (#690272)

          Let me help you chunk it then:

          Llanfair-pwll-gwyn-gyll-go-ger-y-chwyrn-drobwll-llantysilio-gogo-goch.

          Or if you want it split per syllable:

          Llan-fair-pwll-gwyn-gyll-go-ger-y-chwyrn-dro-bwll-llan-ty-silio-go-go-goch.

          (You may already know that it is a contrived lengthening of the original name Llanfairpwllgwyngyll to attract tourists on the new-fangled railway. Other non-contrived placenames include Llanfairmathafarneithaf, Abergynolwyn, and Llanfairtalhaearn)

    • (Score: 3, Interesting) by kazzie on Friday June 08 2018, @11:19AM

      by kazzie (5309) on Friday June 08 2018, @11:19AM (#690267)

      Including Welsh isn't necessarily such a random thing to do. There is well-established research into interfacing technology with Welsh speech; see here [techiaith.cymru] or here [bbc.co.uk] for examples of recent work.

      There's also the fact that, as a Celtic language, it is relatively unrelated to the Germanic, Romance, and other language groups. Including Welsh in your training material for a multi-lingual AI solution should give better results for a wide array of languages in the future.

  • (Score: 0) by Anonymous Coward on Friday June 08 2018, @08:21AM

    by Anonymous Coward on Friday June 08 2018, @08:21AM (#690238)

    How Mozilla thinks:

    - Firefox users are leaving, and we don't have money to fix the problems.
    - Let's spend money on building an OS.
    - Now we have even less money to fix the browser, and even more users are leaving.
    - Let's spend money on a voice assistant.

  • (Score: 0) by Anonymous Coward on Friday June 08 2018, @02:02PM

    by Anonymous Coward on Friday June 08 2018, @02:02PM (#690315)

    I was wondering where the 500M / year in funding has been going. Mozilla is a jobs program for a bunch of people out in the Valley. Browsers are not interesting or being funded by VC anymore; all the funding is going to AI and Voice Assistants, so the people at Mozilla are padding their resumes for their next startup job. Expect an announcement in the next 6 months about the Firefox AI.

    I am calling it right now: Fire-I. I should probably trademark that or something.

  • (Score: 0) by Anonymous Coward on Friday June 08 2018, @05:50PM (1 child)

    by Anonymous Coward on Friday June 08 2018, @05:50PM (#690420)

    So now I have to worry about my web browser listening to my personal conversations? It's not like I can talk faster than I can type, so what's the point? I'm going to have to cut all the speaker and mic wires whenever I buy a new computer.

    • (Score: 0) by Anonymous Coward on Saturday June 09 2018, @01:30PM

      by Anonymous Coward on Saturday June 09 2018, @01:30PM (#690784)

      It already can.
      Chrome already does.
      Join us. Drink the koolaid.
      Google knows and loves you.
      Mozilla wants to know and love you.
      Join us.
      Peace.
      Out.
