posted by chromas on Wednesday July 10 2019, @10:24AM   Printer-friendly
from the Madame-dum-die-dum-dum-Defarge dept.

ETH Zurich:

To store the data, the two doctoral students and their colleague, Master's student Gabriel Voirol, make minimal changes to the music. In contrast to other scientists' attempts in recent years, the researchers state that their new approach allows higher data transfer rates with no audible effect on the music. "Our goal was to ensure that there was no impact on listening pleasure," Eichelberger says.

Tests the researchers have conducted show that in ideal conditions, their technique can transfer up to 400 bits per second without the average listener noticing the difference between the source music and the modified version (see also the audio sample). Given that under realistic conditions a degree of redundancy is necessary to guarantee transmission quality, the transfer rate will more likely be some 200 bits -- or around 25 letters -- per second. "In theory, it would be possible to transmit data much faster. But the higher the transfer rate, the sooner the data becomes perceptible as interfering sound, or data quality suffers," Tanner adds.
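
The quoted rates can be checked with a little arithmetic. The sketch below assumes 8 bits per letter (which is what makes 200 bits come out to about 25 letters) and uses a made-up 28-character password; `transfer_seconds` is a hypothetical helper, not something from the paper.

```python
# Hypothetical helper: seconds needed to send a text payload at a given bit rate.
# Assumes 8 bits per character, matching the article's "200 bits, or around 25 letters".
def transfer_seconds(text, bits_per_second, bits_per_char=8):
    return len(text) * bits_per_char / bits_per_second

password = "correct-horse-battery-staple"  # made-up 28-character Wi-Fi password

chars_per_second = 200 / 8                  # letters per second at the realistic rate
ideal = transfer_seconds(password, 400)     # ideal conditions, 400 bit/s
realistic = transfer_seconds(password, 200) # with redundancy, about 200 bit/s

print(f"{chars_per_second:.0f} letters/s, ideal {ideal:.2f} s, realistic {realistic:.2f} s")
```

Under these assumptions a short password fits into roughly one second of music, which is consistent with the hotel Wi-Fi use case the researchers mention.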

The researchers from ETH Zurich's Computer Engineering and Networks Laboratory use the dominant notes in a piece of music, overlaying each of them with two marginally deeper and two marginally higher notes that are quieter than the dominant note. They also make use of the harmonics (one or more octaves higher) of the strongest note, inserting slightly deeper and higher notes here, too. It is all these additional notes that carry the data. While a smartphone can receive and analyse this data via its built-in microphone, the human ear doesn't perceive these additional notes.
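
As a rough illustration of the idea in the paragraph above, the following sketch adds quiet side tones around a single dominant note and then recovers them with a single-bin DFT. Everything specific here is an assumption made for the example: the 8 kHz sample rate, the 20 Hz and 40 Hz offsets, the on/off keying of one bit per side tone, and the function names. It is not the researchers' actual modulation scheme, and it ignores harmonics, real music, and room acoustics.

```python
import cmath
import math

SAMPLE_RATE = 8000             # Hz (assumed for illustration)
N = 2000                       # samples per 0.25 s frame, giving 4 Hz bin spacing
OFFSETS = (-40, -20, 20, 40)   # Hz: two side tones below, two above (assumed spacing)
SIDE_AMPLITUDE = 0.05          # much quieter than the dominant note (amplitude 1.0)

def tone(freq, amplitude):
    """One frame of a pure sine tone."""
    return [amplitude * math.sin(2 * math.pi * freq * n / SAMPLE_RATE) for n in range(N)]

def embed(dominant_hz, bits):
    """Dominant note plus one quiet side tone for every bit that is 1."""
    frame = tone(dominant_hz, 1.0)
    for offset, bit in zip(OFFSETS, bits):
        if bit:
            side = tone(dominant_hz + offset, SIDE_AMPLITUDE)
            frame = [a + b for a, b in zip(frame, side)]
    return frame

def amplitude_at(frame, freq):
    """Estimate the amplitude of one frequency by correlating with a complex exponential."""
    acc = sum(x * cmath.exp(-2j * math.pi * freq * n / SAMPLE_RATE)
              for n, x in enumerate(frame))
    return 2 * abs(acc) / len(frame)

def extract(frame, dominant_hz):
    """A bit is 1 if its side tone is louder than half the nominal side amplitude."""
    return [1 if amplitude_at(frame, dominant_hz + o) > SIDE_AMPLITUDE / 2 else 0
            for o in OFFSETS]

bits = [1, 0, 1, 1]
recovered = extract(embed(440, bits), 440)
print(recovered)
```

With these frame parameters every tone falls exactly on a 4 Hz DFT bin, so the loud dominant note does not leak into the side-tone measurements. A real decoder working from a microphone would need windowing, synchronization, and error correction on top of this.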

[...] To tell the decoder algorithm in the smartphone where it needs to look for data, the scientists use very high notes that the human ear can barely register: they replace the music in the frequency range 9.8-10 kHz with an acoustic data stream that carries the information on when and where across the rest of the music's frequency spectrum to find the data being transmitted.

Eichelberger M, Tanner S, Voirol G, Wattenhofer R: Imperceptible Audio Communication [pdf]. 44th IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Brighton, 12-17 May 2019



This discussion has been archived. No new comments can be posted.
The Fine Print: The following comments are owned by whoever posted them. We are not responsible for them in any way.
  • (Score: 1) by shrewdsheep on Wednesday July 10 2019, @10:58AM (5 children)

    by shrewdsheep (5215) on Wednesday July 10 2019, @10:58AM (#865336)

    ...sounds like a missed opportunity to make the data squeak: do you hear that sound of compression?

    • (Score: 3, Interesting) by JoeMerchant on Wednesday July 10 2019, @11:45AM (2 children)

      by JoeMerchant (3937) on Wednesday July 10 2019, @11:45AM (#865342)

      What's a "normal" .mp3 data rate, like 128Kbps? Side channel and you're done: 128.4Kbps. Unless we're trying to be clever and chirp the data out in the actual audio (which is what this "sounds like.") In that case this is sort of the opposite of MP3 research which tried to store only those things important to human perception of audio.

      Point of curiosity, if MP3 and this really achieve their respective goals, then MP3 encoding should pretty much destroy this data stream, since it's unimportant to human perception of the audio.

      I did something very similar with steganography in still images - very high data rates, imperceptible differences in the image, but I needed to use .png or other lossless encoding schemes - and the resulting compressed files did grow as you stuffed more information in them, even though you couldn't see the differences. Encoding with .jpg would completely wipe out my encoding scheme.
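
A minimal sketch of the general technique described here, least-significant-bit embedding, may make the lossless-versus-lossy point concrete. This is not the commenter's actual code: the function names, the 64-byte stand-in "image", and the coarse quantization used as a crude substitute for JPEG-style lossy compression are all assumptions for illustration.

```python
def embed_lsb(pixels, message):
    """Hide message bits in the least significant bit of successive pixel bytes."""
    bits = [(byte >> shift) & 1 for byte in message for shift in range(7, -1, -1)]
    if len(bits) > len(pixels):
        raise ValueError("cover image too small for message")
    out = bytearray(pixels)
    for i, bit in enumerate(bits):
        out[i] = (out[i] & 0xFE) | bit   # clear the lowest bit, then set it to the data bit
    return bytes(out)

def extract_lsb(pixels, length):
    """Read `length` bytes back out of the least significant bits."""
    result = bytearray()
    for i in range(length):
        value = 0
        for bit_index in range(8):
            value = (value << 1) | (pixels[i * 8 + bit_index] & 1)
        result.append(value)
    return bytes(result)

def lossy_roundtrip(pixels, step=8):
    """Crude stand-in for lossy compression: snap every value to a multiple of `step`."""
    return bytes((p // step) * step for p in pixels)

cover = bytes((i * 37) % 256 for i in range(64))  # made-up 64-byte grayscale "image"
secret = b"hi"
stego = embed_lsb(cover, secret)

max_change = max(abs(a - b) for a, b in zip(cover, stego))
survives_lossless = extract_lsb(stego, len(secret)) == secret
survives_lossy = extract_lsb(lossy_roundtrip(stego), len(secret)) == secret
print(max_change, survives_lossless, survives_lossy)
```

No pixel value moves by more than one step, which is why the change is invisible, and any process that discards low-order detail erases the payload. That is the same tension the comment raises between perceptual codecs and hidden data.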

      --
      🌻🌻 [google.com]
      • (Score: 2) by Pino P on Wednesday July 10 2019, @03:25PM (1 child)

        by Pino P (4721) on Wednesday July 10 2019, @03:25PM (#865402) Journal

        Point of curiosity, if MP3 and this really achieve their respective goals, then MP3 encoding should pretty much destroy this data stream, since it's unimportant to human perception of the audio.

        It turns out that existing ATRAC, MP3, AC-3, AAC, Vorbis, and Opus encoders do not perfectly "achieve their respective goals." This is how Cinavia and other existing watermarking schemes manage to slip by the slightly imperfect psychoacoustic models in these encoders.

        • (Score: 2) by JoeMerchant on Wednesday July 10 2019, @03:55PM

          by JoeMerchant (3937) on Wednesday July 10 2019, @03:55PM (#865412)

          Yeah, how long ago was all that brouhaha about watermarking on audio files? Seems like almost 20 years now.

          --
          🌻🌻 [google.com]
    • (Score: 3, Insightful) by takyon on Wednesday July 10 2019, @06:20PM (1 child)

      by takyon (881) <takyonNO@SPAMsoylentnews.org> on Wednesday July 10 2019, @06:20PM (#865452) Journal

      They should disguise the stenography data as vinyl hiss.

      --
      [SIG] 10/28/2017: Soylent Upgrade v14 [soylentnews.org]
      • (Score: 2) by Bot on Thursday July 11 2019, @04:31PM

        by Bot (3902) on Thursday July 11 2019, @04:31PM (#865848) Journal

        Maybe you meant "steganography".

        --
        Account abandoned.
  • (Score: 4, Insightful) by Fnord666 on Wednesday July 10 2019, @12:58PM (3 children)

    by Fnord666 (652) on Wednesday July 10 2019, @12:58PM (#865358) Homepage

    Manuel Eichelberger and Simon Tanner, two ETH doctoral students, store data in music. This means, for example, that background music can contain the access data for the local Wi-Fi network, and a mobile phone’s built-in microphone can receive this data. “That would be handy in a hotel room,” Tanner says, “since guests would get access to the hotel Wi-Fi without having to enter a password on their device.”

    I'm sure this is how it'll be used. No one would think to use this to send advertising to you, deliver malware, use it as a C&C channel, track viewing/listening habits or track your location in an environment. Nope, definitely will be used for the benefit of the user.

    • (Score: 0) by Anonymous Coward on Wednesday July 10 2019, @01:24PM (1 child)

      by Anonymous Coward on Wednesday July 10 2019, @01:24PM (#865365)

      Comrade Hellary uses it to send spy messages to Putin. Every third note is morse code.

      • (Score: 0) by Anonymous Coward on Wednesday July 10 2019, @07:01PM

        by Anonymous Coward on Wednesday July 10 2019, @07:01PM (#865465)

        Comrade Trumplestilskin uses it to send spy messages to Putin. Every third note is morse code.

        There. FTFY.

    • (Score: 2) by hemocyanin on Wednesday July 10 2019, @03:30PM

      by hemocyanin (186) on Wednesday July 10 2019, @03:30PM (#865405) Journal
  • (Score: 2) by Alfred on Wednesday July 10 2019, @01:37PM

    by Alfred (4006) on Wednesday July 10 2019, @01:37PM (#865367) Journal
    There are some kinds of *music* that would improve with increasing number of audio artifacts.
  • (Score: 5, Interesting) by AthanasiusKircher on Wednesday July 10 2019, @01:45PM (5 children)

    by AthanasiusKircher (5291) on Wednesday July 10 2019, @01:45PM (#865371) Journal

    Sorry, but I call major BS on that claim of "no audible effect on the music." Did anyone actually listen to the sample recordings in TFA? They have a comparison of the original (listen to that first) and the modified version.

    Okay, the claim of "no effect" is softened later in TFA. As it says in the second paragraph in the summary:

    without the average listener noticing the difference between the source music and the modified version (see also the audio sample)

    First, I'll qualify this by saying I'm NOT "the average listener." (I'll explain more in a note below.**) But I think even to the average listener, something will sound "different" about the recording containing data, even if it's not noticeably "different music." To my ear, the recording with data is the same performance, but it sounds like it was recorded in a bathroom (or similar incredibly live musical space with a lot of reverb) and processed by a very different (and incompetent) audio engineer. Basically, the original version sounds "crisp" and clear (little reverb), with very good balance among instruments. The manipulated version inappropriately foregrounds a few instruments while significantly backgrounding others. Meanwhile, the reverb is distracting and definitely sounds like it contains a lot of "junk" -- the background instruments sound a bit muddy, and even the foregrounded instruments have a noticeable "ring" that's different.

    Now, would I recognize the doctored recording as having "something wrong with it"? Not necessarily, depending on circumstances. As I said, it sounds like it has a lot of reverb, which is unusual for Big Band recordings (the style of the track). If I were listening in a quiet environment (as I am now), I would question the recording engineer's choices. But I'm not saying it stands out as obviously "doctored." However, in comparison to the original, it's clearly a very different audio track, and I'm pretty sure even an "average listener" could pick out the difference.

    Thus, this is NOT something that could easily just hide information in a well-known recording, at least not based on the sample they provide. Listeners will perceive a difference compared to a track they know well. Note also that they provide an instrumental demonstration -- I imagine the kind of manipulation they are doing will generate significant artifacts in vocal music that will be more recognizable as "off," which automatically makes this less useful except for Muzak tracks in elevators and lobbies or maybe commercials and other situations where non-vocal music is the norm. But maybe a hotel lobby is their primary application, in which case a recording with reverb might be less recognized when broadcast in a space that also might have a lot of reverb.

    Also, the vague descriptors they provide about their method are pretty obvious. There are lots of well-known auditory "illusions" (I discuss more in the note below) where you can make changes to an auditory signal which will be unnoticeable to most listeners. These include manipulating harmonics (which only might have a subtle shift on the timbre of a sound) as they describe, as well as using "masking" to distract listeners from changes. (Masking effects: by strategically placing "noise" with more random frequencies around, you can distract from deliberate changes to a signal. I assume this is the reason for the reverb and "junk" -- i.e., noise -- I'm hearing in the manipulated recording.)

    All of this is pretty obvious to anyone who knows about audio perception. The challenge, I imagine, is to introduce these changes to an existing audio file. If you had access to individual tracks of instruments that make up a recording, it would be trivial to encode a lot of data with little chance of it being heard at all. Doing it with a composed track means you need to isolate specific instrument sounds and manipulate them, which is a lot more challenging without introducing artifacts. I assume that's why they say in TFA that their technique works best with a track that contains notable prominent notes (probably a foregrounded melodic instrument) which would allow them to isolate it better and also hide data in the background among the reverb and the reduced volume of the rest of the instruments.

    Not very sophisticated conceptually, but still probably hard to pull off in practice, even with this level of noticeable manipulation.

    ----
    **NOTE: I have significant musical training from a young age, which is shown to cause serious changes in auditory perception. I first discovered this in college when taking a class on the Psychology of Music and the professor put on a CD of "auditory illusions" (kind of like visual illusions, where dots that aren't moving appear to move and so forth). There are tons of known examples where things are staying the same in an audio sample but appear to be changing, or they are changing, but appear to stay the same.

    However, these work very differently for those trained in music for a very long time. I was taking the class with a friend who also had a lot of musical training. And we were looking around the room like, "Huh? You guys don't hear THAT??" The professor then explained that this is also well-known in music cognition: musically trained individuals have such different brain processing in the auditory cortex that they are generally excluded from most studies of sound perception, so as not to skew the data. (Unless it's a study specifically on musically trained people...) Piano tuners and the like are sometimes even more sensitive to these things.

    Another place this tends to show up is in perception of Autotune. If you look at online discussion of Autotune, it's clear that the vast majority of listeners can't hear the artifacts produced even in pretty serious Autotune. But to me, it's quite noticeable even when intended to be subtle. Although I accept it can be used deliberately as an effect, it really perturbs me when it's used to cover up pop singers who really can't sing. Or, even worse, when it detracts from a performance by people who obviously CAN sing and have been irrationally and unnecessarily doctored to sound more "robotic." It ruined the movie The Greatest Showman for me (which wasn't a great movie anyway...), as I've heard a live recording of the cast in rehearsal, and they were amazing. The movie audio sounds "processed" and awful at random points though -- why did they have to do that to people who CAN sing incredibly well? It removes their expressiveness and distinctiveness....

    Anyhow, back on topic, because I know how these auditory "illusions" work, I could come up with a dozen different ways to "hide" data off the top of my head, some of which it sounds like TFA is exploiting.

    • (Score: 2) by JoeMerchant on Wednesday July 10 2019, @02:05PM (1 child)

      by JoeMerchant (3937) on Wednesday July 10 2019, @02:05PM (#865378)

      This is a rehash of the "digital watermarking" kerfuffle back in early mp3.com days...

      Some people will notice, most won't. If it bothers you, don't pay for it. Plenty of options out there today.

      --
      🌻🌻 [google.com]
      • (Score: 3, Interesting) by AthanasiusKircher on Wednesday July 10 2019, @04:47PM

        by AthanasiusKircher (5291) on Wednesday July 10 2019, @04:47PM (#865424) Journal

        This is a rehash of the "digital watermarking" kerfuffle back in early mp3.com days...

        Actually, I think this is quite different. The difference is the intention, which influences the required implementation.

        Digital watermarking was mostly used for copyright ID purposes and such. Basically, a very weak audio signal (essentially inaudible) was introduced into the file. While too weak to be audible to listeners, it uniquely identified the file. And it was very difficult to remove without screwing up the actual audio, because it was effectively "noise" in the background at a very low level. The only way to mask it would be to introduce much stronger noise and then remove that noise, in the process wrecking the actual audio signal you want.

        The difference in this method is that they want to introduce a signal that can be detectable when the audio is played. The digital watermarking method is easy to spot when analyzing a file on a computer, but a microphone listening to that recording broadcast in a hotel lobby likely wouldn't be able to differentiate between the watermarked file and a non-watermarked version, because the "noise" for the watermarking is so low.

        TFA's method wants to make a signal in the audio that, say, your phone could pick up just by hearing the audio broadcast in a hotel lobby, yet is not perceptible as different to human ears. As I said, I think that would be easy to do with access to the individual instrument tracks before a recording is mixed. The sample recording in TFA, however, is very noticeably different.

        Good watermarking technique provides very little distortion, and I know tests show very few people can notice a difference in the audio sound when you listen back-to-back. However, if you haven't, please listen to the two samples in TFA. I bet the majority of listeners could probably notice the two tracks sound "different" (unlike with watermarking). They may not be able to identify precisely what's "different," and they may not be bothered by the distortion, but I bet they can hear it, say, in a triangle test. It's not subtle at all.

    • (Score: 2) by Rupert Pupnick on Wednesday July 10 2019, @02:58PM

      by Rupert Pupnick (7277) on Wednesday July 10 2019, @02:58PM (#865390) Journal

      Agree with your comments, and would add that the average listener today is probably using earbuds, and as such, I would not put a lot of credence in their assessment.

    • (Score: 2) by KritonK on Wednesday July 10 2019, @05:33PM

      by KritonK (465) on Wednesday July 10 2019, @05:33PM (#865435)

      I'm not musically trained like you, but I do have a somewhat eclectic taste in music and prefer CDs to MP3s.

      Putting on headphones and with a desktop computer's fans humming close to my head, I listened to the two samples, thinking incorrectly that the first one was the original sample. My reaction was:

      Sample a: Uh, ok.
      Sample b: Wow!

      Sample b was so much brighter than sample a, which sounded pretty awful, compared to sample b.

    • (Score: 2) by RamiK on Wednesday July 10 2019, @08:28PM

      by RamiK (1813) on Wednesday July 10 2019, @08:28PM (#865484)

      Doing it with a composed track means you need to isolate specific instrument sounds and manipulate them

      It's noticeably chirping a (2nd harmonic series?) overtone like a scratched record on the very first notes, which are played by solo trumpet.

      Some instruments might be fine with this. But small bodied wind and string instruments with short sustain don't have loud overtones to begin with so you're going to notice those for sure.

      --
      compiling...
  • (Score: 3, Insightful) by Rupert Pupnick on Wednesday July 10 2019, @03:06PM

    by Rupert Pupnick (7277) on Wednesday July 10 2019, @03:06PM (#865398) Journal

    There’s this thing called Time Division Multiplexing that’s been used to carry large numbers of phone calls over a single digital trunk for about half a century. Whatever is the point of trying to mess with the overtones of recorded program material to send a low-speed data stream when you can just multiplex it in? Aside from which, if you do alter a recording in this way, the receiver is going to need the original recording as a baseline to decode and extract the encoded information. It may not be BS, but it is completely useless, except maybe in some spook application.
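
For readers unfamiliar with the term, time division multiplexing in its simplest form just alternates fixed time slots between streams. The sketch below is a toy illustration with hypothetical function names and a two-slot frame chosen for the example; it is not a model of any real telephone trunk format.

```python
def tdm_mux(audio, data, idle=0):
    """Alternate slots: audio sample, data value, audio sample, data value, ...

    If there is less data than audio, the spare data slots carry `idle`.
    Data beyond the number of audio samples is not sent in this toy version.
    """
    padded = list(data) + [idle] * (len(audio) - len(data))
    stream = []
    for sample, value in zip(audio, padded):
        stream.extend((sample, value))
    return stream

def tdm_demux(stream):
    """Split the interleaved stream back into its audio and data slots."""
    return stream[0::2], stream[1::2]

audio_samples = [10, 20, 30]   # made-up audio sample values
side_data = [1, 2]             # made-up low-rate data

stream = tdm_mux(audio_samples, side_data)
audio_out, data_out = tdm_demux(stream)
print(stream, audio_out, data_out)
```

This only works when sender and receiver share the digital stream itself; it does not by itself cover the loudspeaker-to-microphone path that the article's hotel example relies on.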

  • (Score: 3, Informative) by datapharmer on Wednesday July 10 2019, @03:13PM (1 child)

    by datapharmer (2702) on Wednesday July 10 2019, @03:13PM (#865399)

    Not sure what is supposed to be novel here... steganography has been around for decades. Heck even mp3 stego goes back 20+ years. Here's an example: https://www.petitcolas.net/steganography/mp3stego/ [petitcolas.net]

    • (Score: 2) by Alfred on Wednesday July 10 2019, @05:37PM

      by Alfred (4006) on Wednesday July 10 2019, @05:37PM (#865437) Journal
      I just put all my secret messages in the notes/comments field of the ID3 tags and boom no audible difference