Submitted via IRC for qkontinuum.
Microsoft is making the tools that its own researchers use to speed up advances in artificial intelligence available to a broader group of developers by releasing its Computational Network Toolkit on GitHub.
The researchers developed the open-source toolkit, dubbed CNTK, out of necessity. Xuedong Huang, Microsoft's chief speech scientist, said he and his team were anxious to make faster improvements to how well computers can understand speech, and the tools they had to work with were slowing them down.
So, a group of volunteers set out to solve this problem on their own, using a homegrown solution that stressed performance over all else.
The effort paid off.
In internal tests, Huang said, CNTK has proved more efficient than four other popular computational toolkits that developers use to create deep learning models for things like speech and image recognition, because it has better communication capabilities.
"The CNTK toolkit is just insanely more efficient than anything we have ever seen," Huang said.
(Score: 1, Interesting) by Anonymous Coward on Thursday January 28 2016, @01:02AM
https://status.github.com/messages
(Score: 0) by Anonymous Coward on Thursday January 28 2016, @01:55AM
Nope. M$ Fanbois. They appear to be more intelligent on average than those iFanbois but still are lemmings walking off a cliff.
(Score: 2) by MichaelDavidCrawford on Thursday January 28 2016, @03:15AM
there is some written material hosted at GitHub that does not promote a harmonious society.
Yes I Have No Bananas. [gofundme.com]
(Score: 3, Interesting) by hendrikboom on Thursday January 28 2016, @02:01AM
Didn't Google just release their own deep learning tools [blogspot.ca], and make a course on it available as well?
(Score: 2, Funny) by Anonymous Coward on Thursday January 28 2016, @02:32AM
Tell us how you really feel.
(Score: 0) by Anonymous Coward on Thursday January 28 2016, @06:12AM
Tell us how you really feel.
"I feel pretty! Oh so pretty! So pretty, and happy, and gay!"
Microsoft Opensource. So gay. (Not that there is anything wrong with that, but a spokesman named Huang? C'mon!)
(Score: 2) by MichaelDavidCrawford on Thursday January 28 2016, @02:48AM
I sing baritone, I can do bass OK when I try. I also have a very slow, rather plodding way of talking.
I have never once been able to get a voice response system to understand me. Sometimes it works to press 0 repeatedly until I get a live human, but not always.
If you'd like to hear for yourself I was interviewed on CNN [youtu.be] a while back. A good test vector for your AI would be to determine whether your speech understander could make an intelligible text transcript of the YouTube's audio track.
Yes I Have No Bananas. [gofundme.com]
(Score: 2) by hendrikboom on Thursday January 28 2016, @04:23AM
I sing double bass. Well, actually almost an octave below bass. I found out when I joined a choir and had trouble getting appreciable volume with a few of the low bass notes and was told that I was singing them an octave too low. Of course when I corrected that and started singing in the proper register I *really* couldn't sing the high bass notes, and there were a lot more of them.
But voice-response systems seem to understand me.
-- hendrik
(Score: 2) by MichaelDavidCrawford on Thursday January 28 2016, @03:17AM
If I had to write a speech recognition program I wouldn't have the first clue. It's not like one could just do:
if ( word == "OK" )
return 1;
There must be some established ways this is done, but I'm too lazy to Google it. Can you help a Brother out?
Yes I Have No Bananas. [gofundme.com]
(Score: 4, Informative) by Post-Nihilist on Thursday January 28 2016, @04:30AM
The naive technique is to manually record a bunch of phonemes, chop them into small, slightly overlapping chunks (10 ms to 20 ms), do an FFT (signal to frequency distribution), and train a hidden Markov model on your samples. To get fancier, you can replace the HMM with some number of layered support vector machines. The next step is to get more language-specific by replacing the FFT with a tuned wavelet transform, and the phonemes with word roots, prefixes, and suffixes.
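The front end of that pipeline can be sketched in a few lines: chop the audio into short overlapping frames and take the FFT magnitude of each frame, giving the feature vectors an HMM (or SVM) would then be trained on. The frame length, overlap, and sample rate below are illustrative assumptions, not anything from the article.

```python
import numpy as np

def frame_features(signal, sample_rate=16000, frame_ms=20, overlap_ms=10):
    """Split a 1-D signal into overlapping frames and return FFT magnitudes."""
    frame_len = int(sample_rate * frame_ms / 1000)            # samples per frame
    step = frame_len - int(sample_rate * overlap_ms / 1000)   # hop between frames
    frames = [signal[i:i + frame_len]
              for i in range(0, len(signal) - frame_len + 1, step)]
    # rfft gives the one-sided spectrum of each real-valued frame
    return np.array([np.abs(np.fft.rfft(f)) for f in frames])

# Example: one second of a fake "phoneme" (a 440 Hz tone)
t = np.linspace(0, 1, 16000, endpoint=False)
feats = frame_features(np.sin(2 * np.pi * 440 * t))
print(feats.shape)  # (number_of_frames, frequency_bins)
```

Each row of `feats` is one frame's spectrum; a real system would usually compress these further (e.g. into mel-scale or cepstral coefficients) before the HMM sees them.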
Or you forget all of the above, use the whole sample as the input to a huge neural network, stack layers upon layers on that humongous network, train it, remove the useless neurons, and call it Cortana.
Be like us, be different, be a nihilist!!!
(Score: 1) by zugedneb on Friday January 29 2016, @04:50AM
Generally, all things done with neural networks can be done with "other" statistical methods, but with more work and less magic.
The post above me mentions FFT and using the spectrum of the signal to match it against some pattern, but there are other, more suitable methods...
When you have time, you could read up on principal component analysis; it is used for recognition and compression of various signals...
old saying: "a troll is a window into the soul of humanity" + also: https://en.wikipedia.org/wiki/Operation_Ajax
(Score: 1) by zugedneb on Friday January 29 2016, @05:03AM
Say you have a signal, such as a piece of EKG or some speech, and call it a "rule"...
This rule can be expressed as the sum of other, simpler rules, like the cosine functions in the Fourier transform.
But if you have a signal, you can use statistical methods to ask it "what are the rule components you are made of?", and that is principal component analysis.
The signal from which the components have been extracted can be rebuilt as a weighted sum of these components, just like with the Fourier transform, but the components can also be used to determine how well other signals match the first one...
The PCA output can itself be used as material to train neural networks; this is called preprocessing, since using just the raw signal to train the neural network would need an immense network...
This is a deep topic; the maths is a bit on the harder side =)
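The idea above can be sketched directly with an eigendecomposition of the covariance matrix: extract the principal components from a set of example signals, then rebuild one signal as a weighted sum of the top components. The signals here are made-up noisy mixes of two patterns, purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# 100 example "signals" of length 64, all noisy weighted mixes of two patterns
t = np.linspace(0, 1, 64)
base = np.stack([np.sin(2 * np.pi * 3 * t), np.cos(2 * np.pi * 5 * t)])
weights = rng.normal(size=(100, 2))
signals = weights @ base + 0.05 * rng.normal(size=(100, 64))

# PCA: eigendecomposition of the covariance of the mean-centred data
mean = signals.mean(axis=0)
centred = signals - mean
cov = centred.T @ centred / (len(signals) - 1)
eigvals, eigvecs = np.linalg.eigh(cov)   # eigenvalues in ascending order
components = eigvecs[:, ::-1][:, :2].T   # top-2 components, the learned "rules"

# Rebuild the first signal as the mean plus a weighted sum of the components
coeffs = components @ centred[0]
rebuilt = mean + coeffs @ components
residual = np.linalg.norm(rebuilt - signals[0])
print(rebuilt.shape, round(residual, 3))  # residual is roughly the injected noise
```

The vector `coeffs` is the compressed representation: 2 numbers instead of 64 samples, and comparing such coefficient vectors is one way to match new signals against known ones.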
old saying: "a troll is a window into the soul of humanity" + also: https://en.wikipedia.org/wiki/Operation_Ajax