SoylentNews is people

posted by hubie on Sunday September 15, @02:53AM
from the hopefully-useful-AND-correct dept.

In groups people screen out chatter around them - and now technology can do the same:

It's the perennial "cocktail party problem" - standing in a room full of people, drink in hand, trying to hear what your fellow guest is saying.

In fact, human beings are remarkably adept at holding a conversation with one person while filtering out competing voices.

However, perhaps surprisingly, it's a skill that technology has until recently been unable to replicate.

And that matters when it comes to using audio evidence in court cases. Voices in the background can make it hard to be certain who's speaking and what's being said, potentially making recordings useless.

Electrical engineer Keith McElveen, founder and chief technology officer of Wave Sciences, became interested in the problem when he was working for the US government on a war crimes case.

"What we were trying to figure out was who ordered the massacre of civilians. Some of the evidence included recordings with a bunch of voices all talking at once - and that's when I learned what the 'cocktail party problem' was," he says.

"I had been successful in removing noise like automobile sounds or air conditioners or fans from speech, but when I started trying to remove speech from speech, it turned out not only to be a very difficult problem, it was one of the classic hard problems in acoustics.

"Sounds are bouncing round a room, and it is mathematically horrible to solve."

The answer, he says, was to use AI to try to pinpoint and screen out all competing sounds based on where they originally came from in a room.

This doesn't just mean other people who may be speaking - there's also a significant amount of interference from the way sounds are reflected around a room, with the target speaker's voice being heard both directly and indirectly.

In a perfect anechoic chamber - one totally free from echoes - one microphone per speaker would be enough to pick up what everyone was saying; but in a real room, the problem requires a microphone for every reflected sound too.

[...] And, he adds: "We knew there had to be a solution, because you can do it with just two ears."

[...] What they had come up with was an AI that can analyse how sound bounces around a room before reaching the microphone or ear.

"We catch the sound as it arrives at each microphone, backtrack to figure out where it came from, and then, in essence, we suppress any sound that couldn't have come from where the person is sitting," says Mr McElveen.
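The "backtrack and suppress" idea in that quote can be illustrated with a classic delay-and-sum beamformer. This is only a minimal sketch of a textbook technique, not Wave Sciences' algorithm, which the article does not describe in detail. Every name and number below (the two microphones, the integer sample delays, the sine-wave "voices") is a made-up assumption chosen so that the interfering source cancels exactly; real speech is broadband and real rooms reverberate, so plain delay-and-sum would only attenuate the competing voice rather than remove it.

```python
import math

def delay_and_sum(channels, steering_delays):
    """Advance each channel by its steering delay (in samples), then average.

    Sound from the steered position lines up across channels and adds
    coherently; sound from elsewhere stays misaligned and partly cancels.
    """
    n = len(channels[0])
    out = [0.0] * n
    for signal, delay in zip(channels, steering_delays):
        for i in range(n):
            j = i + delay  # delays are non-negative, so only the upper bound matters
            if j < n:
                out[i] += signal[j]
    return [v / len(channels) for v in out]

def rms(values):
    """Root-mean-square level of a sequence."""
    return math.sqrt(sum(v * v for v in values) / len(values))

# Hypothetical sources: two pure tones standing in for two talkers.
def target(i):
    return math.sin(2 * math.pi * i / 20)   # period: 20 samples

def interferer(i):
    return math.sin(2 * math.pi * i / 8)    # period: 8 samples

N = 160
# Assumed travel times (in samples) from each source to microphones 0 and 1.
TARGET_DELAYS = [0, 3]
INTERFERER_DELAYS = [5, 4]

# Each microphone hears both sources, each shifted by its own travel time.
mics = [
    [target(i - dt) + interferer(i - di) for i in range(N)]
    for dt, di in zip(TARGET_DELAYS, INTERFERER_DELAYS)
]

# "Backtrack" toward the target by undoing the target's travel times.
focused = delay_and_sum(mics, TARGET_DELAYS)

# Compare against the clean target, skipping the tail that lacks aligned samples.
valid = N - max(TARGET_DELAYS)
clean = [target(i) for i in range(valid)]
error_single_mic = rms([mics[0][i] - clean[i] for i in range(valid)])
error_focused = rms([focused[i] - clean[i] for i in range(valid)])

print("RMS error, microphone 0 alone:", error_single_mic)
print("RMS error, steered at target: ", error_focused)
```

After steering, the two copies of the interferer end up four samples apart, which is half of its eight-sample period, so they cancel in this contrived setup. The cancellation depends entirely on that choice of frequency and geometry, which is one reason the general problem is so much harder than this toy case.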

The effect is comparable in certain respects to when a camera focusses on one subject and blurs out the foreground and background.

"The results don't sound crystal clear when you can only use a very noisy recording to learn from, but they're still stunning."

The technology had its first real-world forensic use in a US murder case, where the evidence it was able to provide proved central to the convictions.

[...] Since then, other government laboratories, including in the UK, have put it through a battery of tests. The company is now marketing the technology to the US military, which has used it to analyse sonar signals.

[...] Eventually it aims to introduce tailored versions of its product for use in audio recording kit, voice interfaces for cars, smart speakers, augmented and virtual reality, sonar and hearing aid devices.

So, for example, if you speak to your car or smart speaker it wouldn't matter if there was a lot of noise going on around you, the device would still be able to make out what you were saying.

[...] "The math in all our tests shows remarkable similarities with human hearing. There's little oddities about what our algorithm can do, and how accurately it can do it, that are astonishingly similar to some of the oddities that exist in human hearing," says McElveen.

"We suspect that the human brain may be using the same math - that in solving the cocktail party problem, we may have stumbled upon what's really happening in the brain."


This discussion was created by hubie (1068) for logged-in users only, but now has been archived. No new comments can be posted.
  • (Score: 5, Touché) by mhajicek on Sunday September 15, @08:04AM (4 children)

    by mhajicek (51) on Sunday September 15, @08:04AM (#1372735)

    ...were born with that power. If two people are talking at once, I just hear a wall of noise.

    --
    The spacelike surfaces of time foliations can have a cusp at the surface of discontinuity. - P. Hajicek
    • (Score: 4, Informative) by Gaaark on Sunday September 15, @10:36AM

      by Gaaark (41) on Sunday September 15, @10:36AM (#1372740) Journal

      Yup: yay autism!

      Like the hearing doctor told me: I hear the forest while trying to hear a single tree.

      --
      --- Please remind me if I haven't been civil to you: I'm channeling MDC. I have always been here. ---Gaaark 2.0 --
    • (Score: 4, Interesting) by Rosco P. Coltrane on Sunday September 15, @12:05PM (2 children)

      by Rosco P. Coltrane (4757) on Sunday September 15, @12:05PM (#1372742)

      I'm not sure how old you are, but losing the ability to discriminate sound sources is a sign of early age-related hearing loss. Or in other words, older folks often start having trouble following conversations in noisy places before they even show measurable signs of hearing loss when they go see a doctor.

      It happened to both my parents. My Dad especially had excellent discrimination abilities and lost it all in a matter of months some time in his 60's. I inherited his discrimination abilities: one of my favorite hobbies is to listen to a piece of music, identify all the instruments and where they sound from, and work out the melody and rhythm for each particular instrument. I dread the day what happened to my parents happens to me too because it will be quite a loss for me.

      • (Score: 2) by mhajicek on Sunday September 15, @05:32PM

        by mhajicek (51) on Sunday September 15, @05:32PM (#1372763)

        It's been this way for as long as I can remember.

        --
        The spacelike surfaces of time foliations can have a cusp at the surface of discontinuity. - P. Hajicek
      • (Score: 2) by gnuman on Sunday September 15, @09:36PM

        by gnuman (5013) on Sunday September 15, @09:36PM (#1372776)

        I'm not sure how old you are, but losing the ability to discriminate sound sources is a sign of early age-related hearing loss. Or in other words, older folks often start having trouble following conversations in noisy places before they even show measurable signs of hearing loss when they go see a doctor.

        I've also *always* had a problem with this. It's by far easier to pick up musical instruments in a band, than people yammering away with their BS over each other. And ability to pick up emotional context is also a disaster in these situations. For empathic people, quiet rooms, so we can concentrate on all the parts of the conversation (verbal and not), is by far the best. Loud bars are just walls of noise and useless for communication purposes. I have no idea how anyone meets anyone there.

        My wife, who may actually have some hearing loss, has no issues in these situations.

        So I wonder if this ability is not just about hearing but audio processing in the brain.

  • (Score: 5, Insightful) by Rosco P. Coltrane on Sunday September 15, @08:43AM (2 children)

    by Rosco P. Coltrane (4757) on Sunday September 15, @08:43AM (#1372736)

    And that matters when it comes to using audio evidence in court cases

    Mmm-yeees... I'm sure this technology was developed to get better court transcripts, and not at all to make TVs, cars and fridges better at spying on people.

    • (Score: 5, Insightful) by HiThere on Sunday September 15, @01:30PM

      by HiThere (866) on Sunday September 15, @01:30PM (#1372747) Journal

      It may well have been. What a technology is developed for is often quite different from the most common use. E.g. logarithms were developed to solve artillery ranging problems.

      --
      Javascript is what you use to allow unknown third parties to run software you have no idea about on your computer.
    • (Score: 2) by Mykl on Monday September 16, @01:38AM

      by Mykl (1112) on Monday September 16, @01:38AM (#1372792)

      If it even works at all. If we are relying on an AI to say "It was Bob who said to rob the old lady" in an otherwise indistinguishable recording then I would need more than just the AI's word for it to convict.

      Do the authorities have the ability to 'lean on the scale' to ensure that they get the result they need from the AI by tweaking the parameters?

  • (Score: 1, Interesting) by Anonymous Coward on Sunday September 15, @03:00PM

    by Anonymous Coward on Sunday September 15, @03:00PM (#1372752)

    The math is not so relevant here, IMHO. Our brains track the pitch and the tone of the voice. It's like following a green light in a mess of other colors - not a physics problem, just focus on the right property. The AI doesn't give a shit about math and reflections and modeling - it gloms onto whatever discriminates signal A from signal B.

  • (Score: 0) by Anonymous Coward on Sunday September 15, @10:23PM

    by Anonymous Coward on Sunday September 15, @10:23PM (#1372779)

    "We catch the sound as it arrives at each microphone, backtrack to figure out where it came from, and then, in essence, we suppress any sound that couldn't have come from where the person is sitting," says Mr McElveen.

    Cool...now take into account that one or more people at the cocktail party (one of whom may be the speaker) is moving through the crowd while talking.
