
SoylentNews is people

posted by martyb on Thursday July 20 2017, @08:02AM
from the speak-up! dept.

Mozilla wants to crowdsource thousands of hours of voice recordings for an open source voice recognition engine:

The Mozilla Foundation launched "Common Voice," which is a crowdsourced initiative to build an open source data set for voice recognition applications.

Many technology companies believe that voice control will be embedded into most devices in the future. This is why Apple, Google, Amazon, Microsoft, Baidu, and others are all trying to put their own voice-controlled artificial intelligence assistants into as many devices as they can and as fast as they can, in order to gain market share before the competition.

The problem with this, according to Mozilla, is that voice controlled technologies could end up being dominated by proprietary technology and data sets, which aren't made available to startups and academics. As some large companies already benefit from billion-dollar revenues, it could later become too difficult for startups to catch up with the big players. Though[sic] Common Voice, Mozilla aims to democratize voice recognition technology.

You could use this to build (the easy part of) a personal assistant that either does not use the cloud, or does so on your terms.


Original Submission

Related Stories

Mozilla's Common Voice Collecting French, German, and Welsh Samples, Prepping 40 More Languages 16 comments

Mozilla's effort to crowdsource datasets for voice recognition applications such as digital assistants has expanded to include 3 more languages, and soon many others:

Mozilla launched the first fruits of its Common Voice datasets in English back in November, a collection that contained some 500 hours of speech and constituted 400,000 recordings from 20,000 individuals. Today, Mozilla officially kick starts the process of collecting voice data for three more languages — French, German, and — a little randomly — Welsh. Another 40 tongues are currently being prepped for the data collection process, with the likes of Brazilian Portuguese, Chinese (Taiwan), Indonesian, Polish, and Dutch already halfway toward being ready to start crowdsourcing voice data.

[...] "We believe these interfaces shouldn't be controlled by a few companies as gatekeepers to voice-enabled services, and we want users to be understood consistently, in their own languages and accents," said Mozilla's chief innovation officer, Katharina Borchert, in a blog post.

The Common Voice project serves a purpose similar to that of other open-license projects that have emerged to counter privately owned platforms. OpenStreetMap is a good example of a similarly crowdsourced project that gives developers open and freely usable maps of the world, without the costs or restrictions of rival services such as Google Maps.


Original Submission

Mozilla Reportedly Working on a Voice-Controlled Web Browser 34 comments

Mozilla may be working on a voice-controlled browser

Mozilla may be working on a voice-controlled platform of its own. A listing for an all-hands internal meeting appeared about what seems like a new project: Scout. "With the Scout app, we start to explore browsing and consuming content with voice," it read. It's very unclear what the platform may or may not end up doing, as the meeting is focused on technical requirements for a "voice browser" that would, as a stated example, be able to read users an article about polar bears.

[...] CNET interpreted Scout to be a new voice-controlled web browser. With Google, Apple, Amazon and Microsoft falling over themselves refining their voice assistant technology (with Facebook not far behind), it's unsurprising that Mozilla would join the fray. Given the company's decades of web platform experience, a browser is surely simpler to implement than a new proprietary speaker. Plus, vocal navigation through a browser setup is probably easier for the average person to grasp.

So that's why they needed Common Voice.

Related: Mozilla's Common Voice Collecting French, German, and Welsh Samples, Prepping 40 More Languages


Original Submission

Mozilla Expands Common Voice Database to 18 Languages, With More on the Way 7 comments

Mozilla updates Common Voice dataset with 1,400 hours of speech across 18 languages

Mozilla wants to make it easier for startups, researchers, and hobbyists to build voice-enabled apps, services, and devices. Toward that end, it's today releasing the latest version of Common Voice, its open source collection of transcribed voice data that now comprises over 1,400 hours of voice samples from 42,000 contributors across 18 languages, including English, French, German, Dutch, Hakha-Chin, Esperanto, Farsi, Basque, Spanish, Mandarin Chinese, Welsh, and Kabyle.

It's one of the largest multi-language datasets of its kind, Mozilla claims — substantially larger than the Common Voice corpus it made publicly available eight months ago, which contained 500 hours (400,000 recordings) from 20,000 volunteers in English — and the corpus will soon grow larger still. The organization says that data collection efforts in 70 languages are actively underway via the Common Voice website and mobile apps.

Common Voice home page. Also at Engadget.

Previously: Mozilla's "Common Voice": Voice Recognition Without Google, Amazon, Baidu, Apple, Microsoft, etc.
Mozilla's Common Voice Collecting French, German, and Welsh Samples, Prepping 40 More Languages


Original Submission

The Fine Print: The following comments are owned by whoever posted them. We are not responsible for them in any way.
  • (Score: 4, Insightful) by Absolutely.Geek on Thursday July 20 2017, @08:26AM (9 children)

    by Absolutely.Geek (5328) on Thursday July 20 2017, @08:26AM (#541851)

    Open source options for important tech is always good.

    --
    Don't trust the police or the government - Shihad: My mind's sedate.
    • (Score: 3, Insightful) by Kunasou on Thursday July 20 2017, @08:34AM (1 child)

      by Kunasou (4148) on Thursday July 20 2017, @08:34AM (#541852)

      This sounds good :)!
      But I hope this doesn't end like Firefox OS and other failed Mozilla projects :|...

      • (Score: 0) by Anonymous Coward on Thursday July 20 2017, @08:43AM

        by Anonymous Coward on Thursday July 20 2017, @08:43AM (#541853)

        If they do it right, it will be created and won't need active development except for finding new voices for your vocaloid waifu slave.

    • (Score: 5, Interesting) by ledow on Thursday July 20 2017, @09:03AM (2 children)

      by ledow (5567) on Thursday July 20 2017, @09:03AM (#541857) Homepage

      There are already a ton of options, lots of them very mature and very old.

      CMU Sphinx springs to mind.

      Problem is that it's a hard thing to do, and generally you do a shitty job no matter how much "data" you throw at it, precisely because it's such a hard job. I wouldn't be surprised if that software was actually already part of something like Alexa, etc.

      Speech recognition is something I don't even bother with. Literally, the second something wants to recognise my speech, I just jab buttons to turn it off or try to get through to a human. I have hundreds of iPads in my workplace, and Siri is literally useless for anything. I once had a guy come in to talk to our staff who told them they could write their reports using Dragon apps on iPads. That was a scene of hilarity. The demonstration worked (because he used very particular phrases that he'd obviously tried a lot), but when it comes to anything approaching natural speech you might as well spend 20 minutes finding a computer and typing it in; it'll be quicker in the long run, no matter what your typing speed is.

      My car has voice recognition. I have - without exaggeration - never got it to do what it's supposed to. Even "Play USB", or "Phone Home"... it just can't pick it up reliably, even in a silent, non-moving car. And it's actually more distracting TRYING to make it work than it would be just to pull over and press the button on the screen. It's not just me - almost everyone I know has the same problem. And some of them have very selective memory, when they come into my office and try to convince me of buying an Apple Watch or whatever, and it takes them three goes and a lot of "Oh, it never normally does this" just to get it to read out the damn time or whatever. I don't have a strange accent, I can be understood by everyone, I can adjust my voice to any speed, clarity, pronunciation, etc. required. It still doesn't work.

      Personally, I'd rather we didn't waste time on that when decades-old projects still can't get close to making it useful, and invested that talent on something else.

      • (Score: 3, Touché) by TheRaven on Thursday July 20 2017, @10:15AM

        by TheRaven (270) on Thursday July 20 2017, @10:15AM (#541868) Journal

        CMU Sphinx springs to mind

        Sphinx used to be pretty good, but the last time I tried building it I gave up fixing the dependencies on non-standard GCC behaviour that prevented it from building with anything other than an old GCC (I think 4.4 or earlier, might have been 4.5, and it also didn't build with one much older than the ones that it did work with). The contact listed on their web site never replied and I gave up trying to package it. The underlying technology is great, but it really needs someone to clean up the code. Maybe the Mozilla folk will do this...

        My car has voice recognition. I have - without exaggeration - never got it to do what it's supposed to.

        Command recognition generally works a lot better than dictation. Even the stuff that's been built into OS X for a decade or so is pretty reliable, if it has a fairly small set of commands to recognise.
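One reason a small command set helps: whatever the recognizer hears can be snapped to the nearest of a few known phrases, and anything too far from all of them can be rejected instead of guessed. The sketch below shows that idea using Python's standard `difflib` string matching as a stand-in; many command recognizers instead restrict the decoder's grammar directly, so treat this only as an illustration. The command list borrows the phrases quoted above, and `match_command` is a hypothetical helper.

```python
import difflib
from typing import List, Optional

# Small, fixed command set (phrases borrowed from the comment above).
COMMANDS: List[str] = ["play usb", "phone home", "read the time"]


def match_command(transcript: str,
                  commands: List[str] = COMMANDS,
                  cutoff: float = 0.6) -> Optional[str]:
    """Snap a noisy transcript to the closest known command, or return None.

    difflib scores each candidate with a similarity ratio in [0, 1];
    candidates scoring below `cutoff` are rejected instead of guessed.
    """
    cleaned = transcript.lower().strip()
    matches = difflib.get_close_matches(cleaned, commands, n=1, cutoff=cutoff)
    return matches[0] if matches else None


print(match_command("Play you sb"))  # a plausible mis-hearing -> play usb
print(match_command("xyzzy"))        # nothing close enough   -> None
```

With dictation there is no such short list to fall back on, which is one way to see why the two tasks differ so much in reliability.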

        --
        sudo mod me up
      • (Score: 2) by coolgopher on Thursday July 20 2017, @10:39AM

        by coolgopher (1157) on Thursday July 20 2017, @10:39AM (#541872)

        I admit to using Cortana in the car to read out & reply to text messages. I'd give "her" 8/10 already to be honest. I'm surprised at how well it works despite the road noise.

    • (Score: 3, Insightful) by driverless on Thursday July 20 2017, @12:03PM (3 children)

      by driverless (4770) on Thursday July 20 2017, @12:03PM (#541885)

      Sure, but in this case it's another pointless adventure by an organisation that has completely lost its way. It's just another thing to burn up resources on their way to irrelevance as they ignore their core products, or at least ignore what made their core products popular in the first place.

      • (Score: 2, Funny) by Anonymous Coward on Thursday July 20 2017, @01:41PM (2 children)

        by Anonymous Coward on Thursday July 20 2017, @01:41PM (#541902)

        Yep. Mozilla fired Brendan Eich to appease a bunch of cucked beta gay transsexual feminist lesbian Muslim black Mexican liberal commies with green mohawks. (*checks to see if anything can be added to that list*)

        Everything they do is bad now! They rooned it! They roon everything!

        • (Score: 0) by Anonymous Coward on Thursday July 20 2017, @02:44PM (1 child)

          by Anonymous Coward on Thursday July 20 2017, @02:44PM (#541916)

          Go back to voat, bro.

          • (Score: 0) by Anonymous Coward on Thursday July 20 2017, @05:33PM

            by Anonymous Coward on Thursday July 20 2017, @05:33PM (#541985)

            Get a sarcasm meter brah

  • (Score: 2) by jasassin on Thursday July 20 2017, @08:47AM (3 children)

    by jasassin (3566) <jasassin@gmail.com> on Thursday July 20 2017, @08:47AM (#541854) Homepage Journal

    It wants its Dragon Naturally Speaking back.

    --
    jasassin@gmail.com GPG Key ID: 0xE6462C68A9A3DB5A
    • (Score: 4, Funny) by driverless on Thursday July 20 2017, @12:25PM

      by driverless (4770) on Thursday July 20 2017, @12:25PM (#541886)

      It wants its Dragon Naturally Speaking back.

      If it wants its Dragon Naturally Speaking back then it puts the lotion on its skin. Or else it gets the Firefox Australis again.

    • (Score: 3, Informative) by rob_on_earth on Thursday July 20 2017, @01:47PM (1 child)

      by rob_on_earth (5485) on Thursday July 20 2017, @01:47PM (#541904) Homepage

      I have discussed this with people before and they all say "but you have to train it".
      I do not care if I have to train it; I just want the basic functionality some people (sadly not me) had using Dragon, even on Linux using a complex shim.
      Preferably I want it to work on a Raspberry Pi without any network connectivity.
      I also do not need 100% reliability.

      Too much to ask?

      • (Score: 0) by Anonymous Coward on Thursday July 20 2017, @10:18PM

        by Anonymous Coward on Thursday July 20 2017, @10:18PM (#542083)

        Does the Pi have the computational power to handle this? I have never looked at language processing so no idea about the overhead.

  • (Score: -1, Offtopic) by lx on Thursday July 20 2017, @08:56AM (7 children)

    by lx (1915) on Thursday July 20 2017, @08:56AM (#541856)

    For instance by no longer using Google analytics on the about:addons page [github.com].

    • (Score: 2) by jasassin on Thursday July 20 2017, @09:23AM (6 children)

      by jasassin (3566) <jasassin@gmail.com> on Thursday July 20 2017, @09:23AM (#541863) Homepage Journal

      For instance by no longer using Google analytics on the about:addons page [github.com].

      If you read the thread to the bottom, they already fixed it. They released a hotfix, now you just have to enable DoNotTrack.
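The thread only says the hotfix makes the page honor Do Not Track. As a rough illustration of what honoring DNT means, here is a hypothetical server-side check; it is not Mozilla's code (the page itself may well check the browser's `navigator.doNotTrack` property instead), and `analytics_allowed` is an invented name.

```python
from typing import Mapping


def analytics_allowed(headers: Mapping[str, str]) -> bool:
    """Return False if the request carries a Do Not Track opt-out ("DNT: 1").

    HTTP header names are case-insensitive, so the name is compared in lower
    case rather than relying on how the caller's mapping stores it.
    """
    for name, value in headers.items():
        if name.lower() == "dnt" and value.strip() == "1":
            return False
    return True


print(analytics_allowed({"DNT": "1"}))         # False: user opted out
print(analytics_allowed({"User-Agent": "x"}))  # True: no DNT preference sent
```

Note that this is opt-out by design: users who never set DNT are still tracked, which is exactly the objection raised in the replies.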

      --
      jasassin@gmail.com GPG Key ID: 0xE6462C68A9A3DB5A
      • (Score: 1, Troll) by Anonymous Coward on Thursday July 20 2017, @10:24AM (5 children)

        by Anonymous Coward on Thursday July 20 2017, @10:24AM (#541870)

        opt-out is hardly a fix, more like a cop-out

        • (Score: 2, Interesting) by moondrake on Thursday July 20 2017, @01:38PM (4 children)

          by moondrake (2658) on Thursday July 20 2017, @01:38PM (#541901)

          Why? They use GA to analyze user data on that page (which is useful for running a site) and worked out a deal with google that the data is only for their own (i.e. not 3rd party use). Of course, maybe google as a service provider is breaking that agreement.

          If you do not like GA because it does track you (for mozilla's own use), you can turn it off by using DNT, which you have done already because if the only page you browse to in your life that is using GA is the firefox addon page, you have been enjoying a very miserable online experience anyway.

          If you are really that paranoid about things like GA and tracking, you should not be using Firefox anyway.

          This thread could be just OT, but I think you are all just trolling.

          • (Score: 0) by Anonymous Coward on Friday July 21 2017, @07:03AM

            by Anonymous Coward on Friday July 21 2017, @07:03AM (#542252)

            Because DNT increases your fingerprintability.

          • (Score: 0) by Anonymous Coward on Friday July 21 2017, @08:20AM (2 children)

            by Anonymous Coward on Friday July 21 2017, @08:20AM (#542270)

            Why? Because you shouldn't rely on Google for anything, given its reputation for being vehemently anti-privacy. If Firefox needs user data, it should build its own solution and have an option to disable it without relying on DNT.

            • (Score: 0) by Anonymous Coward on Saturday July 22 2017, @07:47AM (1 child)

              by Anonymous Coward on Saturday July 22 2017, @07:47AM (#542800)

              I am sure you are going to volunteer for that.

              • (Score: 0) by Anonymous Coward on Saturday July 22 2017, @08:39AM

                by Anonymous Coward on Saturday July 22 2017, @08:39AM (#542815)

                We already have plenty of people who have volunteered. https://en.wikipedia.org/wiki/Category:Free_web_analytics_software [wikipedia.org]

                Mozilla is taking a user hostile position here. They apparently value their convenience over user privacy.

  • (Score: 0) by Anonymous Coward on Thursday July 20 2017, @09:10AM

    by Anonymous Coward on Thursday July 20 2017, @09:10AM (#541859)

    This is awesome news. Another major benefit here is offline work. The problem with the current obsession with on-demand distributed server software (sorry, "cloud computing") is that a single issue with the servers or with online access leaves your entire system dead. Even Amazon Web Services has gone down before, and that brought down countless sites along with it. Microsoft's servers have gone down countless times. And then we get to the user. Even in developed nations, internet access suffers outages. In the United States, bandwidth caps, often unstated, can result in throttling or excess charges to users. In less developed countries, internet disruption or simply poorly implemented infrastructure is standard. And undeveloped doesn't mean primitive nations: Thailand certainly applies as described here, even though it's one of the more common destinations for digital entrepreneur types. And then there are hacks/security issues on top of all this.

    I think a sort of synergy between the third party redundancy/updates/consistency offered by remote servers paired with the performance/reliability/privacy/security of local software is the best of both worlds. So for instance in this case the remote server could offer integrity checks and database updates while all actual work and interfacing with the database could be done offline. In my opinion the obsession with servers is mostly just rent (and data) seeking behavior. From the perspective of consumer benefit, the pros/cons do not strongly favor server based access and likely will not for the foreseeable future. However, it provides a simple mechanism of profit for companies.
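A minimal sketch of the "remote server for integrity checks, local for the real work" split described above: the server publishes a SHA-256 digest, and the client verifies its local copy of a model or database without shipping the data anywhere. The function name and the idea of a published digest are assumptions for illustration; a real service would define its own update format.

```python
import hashlib
import hmac


def verify_local_copy(data: bytes, expected_sha256_hex: str) -> bool:
    """Compare a locally cached file against a digest fetched from the server.

    Only the short hex digest has to come over the network; the file itself
    is stored and used locally, so recognition keeps working offline.
    """
    actual = hashlib.sha256(data).hexdigest()
    # Constant-time comparison; not strictly needed for a public digest,
    # but a reasonable habit when comparing hashes.
    return hmac.compare_digest(actual, expected_sha256_hex.strip().lower())


published = hashlib.sha256(b"model v2").hexdigest()  # stand-in for the server's value
print(verify_local_copy(b"model v2", published))               # True
print(verify_local_copy(b"model v2 (corrupted)", published))   # False
```

If the server is unreachable, the client simply keeps using the copy it last verified.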

  • (Score: 4, Interesting) by KritonK on Thursday July 20 2017, @09:46AM (6 children)

    by KritonK (465) on Thursday July 20 2017, @09:46AM (#541865)

    An interesting part of the project is the validation, where you get to listen to various samples and verify that the speakers actually said what they were asked to say. You get to listen to some wonderful (and some not so wonderful) accents from all over the English-speaking world.

    This is actually a necessary step. Apart from the inevitable "my hovercraft is full of eels", that some wise guy is bound to say instead of the requested phrase, a short sampling showed that some samples contain mispronounced, misread, and doubly read words. I chose to consider all of those as mistakes. There are also extraneous noises, which voice recognition software should be able to filter out, so I considered these samples correct, as long as the speaker said what they were asked to say.

    Then there's the matter of who validates the validators. I hope that Mozilla is adopting an "n out of n identical validations or back to the pool" approach, to prevent incorrect validations made either by malice or mistake.
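The commenter's "n out of n identical validations or back to the pool" rule is easy to state precisely. Below is a minimal sketch of it; the vote representation, the default `n = 2`, and the names `decide` and `Outcome` are assumptions for illustration, not how Mozilla actually implements validation.

```python
from enum import Enum
from typing import List


class Outcome(Enum):
    PENDING = "needs more votes"
    ACCEPT = "accepted"
    REJECT = "rejected"
    BACK_TO_POOL = "disagreement: discard votes and re-queue"


def decide(votes: List[bool], n: int = 2) -> Outcome:
    """Hypothetical n-out-of-n rule for one recorded clip.

    Each vote is True ("the speaker said the prompt") or False ("they did not").
    A clip is settled only when n validators have voted and all of them agree;
    any disagreement sends it back to the pool for a fresh set of listeners.
    """
    if any(votes) and not all(votes):
        return Outcome.BACK_TO_POOL  # already split: no point waiting for more votes
    if len(votes) < n:
        return Outcome.PENDING
    return Outcome.ACCEPT if all(votes) else Outcome.REJECT
```

Raising `n` makes a bad validation slipping through less likely, at the cost of more listening per clip.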

    I was really impressed by the amount of expression that many readers put in their voice when reading the samples. On the other hand, there was this guy, who managed to drone on, and on, and on, even though he was reading a single sentence!

    The above should eventually take care of building a large corpus of spoken English samples, leaving only the remaining n-1 languages and dialects spoken throughout the world and beyond (e.g., Klingon).

    • (Score: 1, Interesting) by Anonymous Coward on Thursday July 20 2017, @09:52AM

      by Anonymous Coward on Thursday July 20 2017, @09:52AM (#541866)

      If a high enough percentage of the samples are correct, maybe the system can validate by itself.

    • (Score: 2) by c0lo on Thursday July 20 2017, @12:38PM

      by c0lo (156) Subscriber Badge on Thursday July 20 2017, @12:38PM (#541891) Journal

      Then there's the matter of who validates the validators.

      Example: suppose you need a text typed with at most 1 in 1,000,000 characters wrong, and a good typist gets 1 in 1,000 wrong. Hiring a typist plus a proof-reader costs the same as hiring two typists who enter the same text independently; in the latter case, the chance that both make a mistake on the same character is 1/1,000,000.
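The arithmetic behind that example, as a small sketch. It assumes, as the comment does, that the two typists' errors are independent; the function name is made up for illustration.

```python
def undetected_error_rate(p: float) -> float:
    """Per-character chance that two independent typists BOTH err on the
    same character, so comparing their transcripts would not flag it.

    This is an upper bound under the independence assumption: to slip
    through unnoticed, both would also have to make the *identical* mistake.
    """
    return p * p


single = 1 / 1000                     # one good typist
both = undetected_error_rate(single)  # roughly 1e-6, i.e. 1 in 1,000,000
```

The same reasoning applies to two independent validators listening to the same clip.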

      --
      https://www.youtube.com/watch?v=aoFiw2jMy-0 https://soylentnews.org/~MichaelDavidCrawford
    • (Score: 3, Funny) by DannyB on Thursday July 20 2017, @02:24PM (3 children)

      by DannyB (5839) Subscriber Badge on Thursday July 20 2017, @02:24PM (#541912) Journal

      who validates the validators

      Mozilla could use a failed meta-mod system like the green site does. That worked out very whale.

      --
      The lower I set my standards the more accomplishments I have.
      • (Score: 2) by kazzie on Thursday July 20 2017, @05:40PM (2 children)

        by kazzie (5309) Subscriber Badge on Thursday July 20 2017, @05:40PM (#541990)

        Would it work whale enough to understand this speech [hrwiki.org]?

        • (Score: 0) by Anonymous Coward on Thursday July 20 2017, @05:46PM

          by Anonymous Coward on Thursday July 20 2017, @05:46PM (#541993)

          Sever your leg please, it's the greatest day!

        • (Score: 2) by DannyB on Thursday July 20 2017, @06:02PM

          by DannyB (5839) Subscriber Badge on Thursday July 20 2017, @06:02PM (#542006) Journal

          I don't know. But try this one:

          I saw a snike. Eat head beeg blow oys and a huge tile.

          Or try this:

          Dem gubmit foaks shore is ignert.

          Maw brothuh from jaw-jah eaze too.

          Hay sez duh peekup trook brokes down.

          I sez, ov curse eat brokes down. Yaw deedn't puts no all in duh in-jun.

          --
          The lower I set my standards the more accomplishments I have.
  • (Score: 2) by Snospar on Thursday July 20 2017, @10:23AM (1 child)

    by Snospar (5366) Subscriber Badge on Thursday July 20 2017, @10:23AM (#541869)

    I just tried recording some samples using my phone and didn't even bother submitting the resulting recordings as they sounded awful. Audio was stuttering and choppy. Did a comparison recording locally on the phone and it didn't have any audio issues.

    Are they doing heavy compression or processing on their servers?

    The recordings to review are of wildly differing quality.

    --
    Huge thanks to all the Soylent volunteers without whom this community (and this post) would not be possible.
    • (Score: 0) by Anonymous Coward on Thursday July 20 2017, @10:20PM

      by Anonymous Coward on Thursday July 20 2017, @10:20PM (#542084)

      Yes, they have to minimize bandwidth and "erroneous data" processing since it is an online service with millions of hits. Fuck the "cloud", those MBA shitheads and VC "lock in" entrepreneurs need to go back to school and re-learn human values.

  • (Score: 4, Interesting) by Lemming on Thursday July 20 2017, @10:35AM

    by Lemming (1053) on Thursday July 20 2017, @10:35AM (#541871)

    I got my Mycroft Mark 1 from their Kickstarter campaign this week, I've yet to unpack it but I will certainly experiment with it this weekend. The Mark 1 is a hardware device like Google Home or Amazon Echo, but you also can run the software on a Linux pc or a Raspberry Pi.

    For the speech-to-text, it uses pocketsphinx, a lightweight speech recognition engine (https://github.com/cmusphinx/pocketsphinx [github.com]), to detect the wake word ("Hey Mycroft" by default), and then sends the recorded audio to an external STT engine. By default it uses their own Mycroft Home server, but as I understand it you can also pair it with Google STT. I suppose this could also be used with Mozilla's Common Voice in the future.
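The flow described above (a small local wake-word detector gating when audio leaves the machine) can be sketched independently of any particular library. Everything below is hypothetical: audio frames are plain `bytes`, and the three callables stand in for pocketsphinx's wake-word check, a silence detector, and whichever external STT service is configured. This is not Mycroft's actual code or API.

```python
from typing import Callable, Iterable, List, Optional

Frame = bytes  # one chunk of raw audio (hypothetical representation)


def listen(frames: Iterable[Frame],
           is_wake_word: Callable[[Frame], bool],
           is_silence: Callable[[Frame], bool],
           transcribe: Callable[[List[Frame]], str]) -> Optional[str]:
    """Wait locally for the wake word, then send one utterance to STT.

    Frames before the wake word are examined locally and dropped; only the
    frames between the wake word and the next silence reach `transcribe`.
    """
    stream = iter(frames)
    for frame in stream:
        if is_wake_word(frame):
            break
    else:
        return None  # stream ended without hearing the wake word

    utterance: List[Frame] = []
    for frame in stream:
        if is_silence(frame):
            break
        utterance.append(frame)
    return transcribe(utterance) if utterance else None


# Toy run with fake "audio" so the control flow is visible.
frames = [b"noise", b"hey mycroft", b"what", b"time", b"", b"ignored"]
text = listen(frames,
              is_wake_word=lambda f: f == b"hey mycroft",
              is_silence=lambda f: f == b"",
              transcribe=lambda fs: " ".join(f.decode() for f in fs))
print(text)  # what time
```

Swapping the `transcribe` callable is all it would take to point the same loop at a local engine instead of a cloud service.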

    Website: https://mycroft.ai/ [mycroft.ai]
    GitHub repos: https://github.com/MycroftAI [github.com]
    Documentation: https://docs.mycroft.ai/ [mycroft.ai]

  • (Score: 1) by Tara Li on Thursday July 20 2017, @04:23PM

    by Tara Li (6248) on Thursday July 20 2017, @04:23PM (#541965)

    For starters, they can suck down the LibriVox corpus and begin with that.

  • (Score: 3, Interesting) by krishnoid on Thursday July 20 2017, @04:51PM (5 children)

    by krishnoid (1156) on Thursday July 20 2017, @04:51PM (#541976)

    Why do these voice recognition toolkits start with English, which doesn't use a phonetic alphabet? Shouldn't it be easier and/or more reliable to perform voice recognition on a phonetic-alphabet language, or wouldn't that make a difference?

    • (Score: 2) by DannyB on Thursday July 20 2017, @06:10PM

      by DannyB (5839) Subscriber Badge on Thursday July 20 2017, @06:10PM (#542010) Journal

      For developers and testers fluent in other languages, there is nothing stopping them from building a similar learning / training / verification system. My best wishes, but I only speak English. Well, I mean, the language spoken in North America.

      Q. What do you call someone who speaks two languages?
      A. Bilingual

      Q. What do you call someone who speaks three languages?
      A. Trilingual

      Q. What do you call someone who speaks only one language?
      A. American

      --
      The lower I set my standards the more accomplishments I have.
    • (Score: 0) by Anonymous Coward on Friday July 21 2017, @01:05AM (1 child)

      by Anonymous Coward on Friday July 21 2017, @01:05AM (#542126)

      The sooner we get everybody speaking the same language, the better.

      Supporting junk languages would delay progress.

      • (Score: 0) by Anonymous Coward on Saturday July 22 2017, @08:42AM

        by Anonymous Coward on Saturday July 22 2017, @08:42AM (#542816)

        Another aggressive mandarin spotted.

    • (Score: 0) by Anonymous Coward on Friday July 21 2017, @01:59AM (1 child)

      by Anonymous Coward on Friday July 21 2017, @01:59AM (#542141)

      Because English has about 12k unique syllables, compared with a language like Mandarin that's only got about 1,600. If it can handle English, then chances are it can handle other languages with some adjustment. English is also an incredibly popular language with many speakers that have the time and money necessary to fund the project.

      • (Score: 1, Informative) by Anonymous Coward on Friday July 21 2017, @04:17AM

        by Anonymous Coward on Friday July 21 2017, @04:17AM (#542177)

        Your examples call to mind the fact that Chinese is a tonal language [lexington.ro] whilst English is not. The difference is a stumbling block for English speakers when learning Chinese.

        Last year, we had a story [soylentnews.org] about Baidu's effort at an engine that would recognize both English and Chinese.
