Microsoft on Tuesday said that its researchers have "made a major breakthrough in speech recognition."
In a paper [PDF] published a day earlier, Microsoft machine learning researchers describe how they developed an automated system that can recognize recorded speech as well as a professional transcriptionist.
Using the NIST 2000 dataset of recorded calls, Microsoft's software achieved an error rate slightly (0.4 per cent) better than the 5.9 per cent the company attributes to professional transcriptionists on the Switchboard portion of the data, in which strangers discuss an assigned topic.
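The "error rate" being compared here is presumably word error rate (WER), the standard metric for speech recognition: the word-level edit distance between the reference transcript and the system's output, divided by the number of reference words. A minimal sketch of the computation (an illustration, not Microsoft's code):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level Levenshtein distance / reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    # Dynamic-programming table for edit distance over words.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
    return d[len(ref)][len(hyp)] / len(ref)

# One substituted word in a six-word reference -> WER of 1/6.
print(wer("the cat sat on the mat", "the cat sat on a mat"))
```

By this measure, a 5.9 per cent error rate means roughly one word in seventeen is inserted, deleted, or substituted relative to the reference transcript.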
There goes your bright future as a court recorder...
(Score: 0) by Anonymous Coward on Thursday October 20 2016, @08:51PM
One too many promises about speech recognition in the world already. And I don't give a damn that their program beat typist(s) at accuracy in a contest. I only care whether, under real-world conditions, their program does better than a blooded transcriptionist in everyday work, especially at getting Chinese-, Indian-, and Italian-accented speech (multiple dialects of each) correct in medical or legal transcription. Then I'll buy Microsoft's alleged breakthrough. Maybe.
(Score: 2) by goodie on Friday October 21 2016, @02:53PM
Was thinking the same thing... The day I can hand over interview audio and have it transcribed quickly and properly, it will become interesting. If anybody has done interview transcription, it's long, boring, and mind-numbing. Yes, it makes you know your data by heart, but it takes a very long time (about 5-6 hours per hour of interview). There can be issues with audio quality, people eating or chewing gum, interruptions, accents, etc. that make the work even harder. In the past, some of the software (e.g., Dragon something, I forgot) could technically work but required a lot of training for each voice to be transcribed. Needless to say, unless you're constantly talking to yourself, that's not very useful... The other thing a professional transcriber can pick up on much better than an algorithm (for now) is tone, hesitation, etc. These can be very important in some contexts (in fact, more relevant than what people say), so losing them can be detrimental to the analysis. Perhaps that's why a good transcription costs an arm and a leg.
(Score: 0) by Anonymous Coward on Friday October 21 2016, @07:40PM
Is the quoted accuracy/error rate the best Microsoft's system can do, and is it better than a pro human who was only given one pass? Or were the humans allowed multiple passes at deciphering the audio?