Date: 2024-10-23
Any sound can now be perfectly replicated by a combination of whistles, clicks, and hisses, with implications for sound processing across the media landscape:
Researchers have been looking for ways to decompose sound into its basic ingredients for over 200 years. In the 1820s, French scientist Joseph Fourier proposed that any signal, including sounds, can be built using sufficiently many sine waves. These waves sound like whistles; each has its own frequency, level, and start time, and together they are the basic building blocks of sound.
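Fourier's idea can be tried out in a few lines. The following is only a minimal sketch, assuming a square wave as the target sound and ignoring start times; the function names are made up for this illustration. It sums more and more sine waves and prints how the mismatch with the target shrinks.

```python
import math

def square_partials(f0, count):
    """(frequency, amplitude) of the first `count` odd harmonics of a square wave."""
    return [((2 * k + 1) * f0, 4 / (math.pi * (2 * k + 1))) for k in range(count)]

def sine_sum(t, partials):
    """Value at time t of a sum of sine waves given as (frequency_hz, amplitude)."""
    return sum(a * math.sin(2 * math.pi * f * t) for f, a in partials)

def rms_error(f0, count, samples=1000):
    """RMS difference between an ideal square wave and its sine approximation."""
    partials = square_partials(f0, count)
    total = 0.0
    for i in range(samples):
        t = (i + 0.5) / (samples * f0)            # one period, skipping the jump points
        ideal = 1.0 if t * f0 < 0.5 else -1.0
        total += (ideal - sine_sum(t, partials)) ** 2
    return math.sqrt(total / samples)

# More sines give a closer match to the target waveform.
for count in (1, 10, 100):
    print(count, "sines -> RMS error", round(rms_error(100.0, count), 4))
```

The error keeps falling but never quite reaches zero with a finite number of sines, which is the practical problem the next paragraph describes for noisy sounds.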
However, some sounds, such as the flute and a breathy human voice, may require hundreds or even thousands of sines to exactly imitate the original waveform. This comes from the fact that such sounds contain a less harmonic, noisier structure in which all frequencies occur at once. One solution is to divide sound into two types of components, sines and noise: a smaller number of whistling sine waves is combined with variable noises, or hisses, to complete the imitation.
Even this 'complete' two-component sound model has a problem: it smooths out the beginnings of sound events, such as consonants in voice or drum sounds in music. A third component, named transient, was introduced around the year 2000 to help model the sharpness of such sounds. Transients alone sound like clicks. From then on, sound has often been divided into three components: sines, noise, and transients.
The three-component model of sines, noise and transients has now been refined by researchers at Aalto University Acoustics Lab, using ideas from auditory perception, fuzzy logic, and perfect reconstruction.
Doctoral researcher Leonardo Fierro and professor Vesa Välimäki realized that the way people hear the different components and separate whistles, clicks, and hisses is important. If a click gets spread in time, it starts to ring and sound noisier; by contrast, focusing on very brief sounds might cause some loss of tonality.
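To illustrate how fuzzy logic and perfect reconstruction can fit together, here is a toy sketch. It is not the published method: the per-bin `tonalness` score and the quadratic weights are assumptions invented for this example. The only point is that when the three soft weights always sum to one, adding the three components back together returns the original signal.

```python
def fuzzy_masks(tonalness):
    """Split one bin into (sines, transients, noise) weights that sum to 1.

    `tonalness` in [0, 1] is a hypothetical per-bin score: 1 = steady tone,
    0 = sharp click, 0.5 = ambiguous, which this sketch treats as noise.
    """
    r = min(1.0, max(0.0, tonalness))
    sines = r * r
    transients = (1.0 - r) ** 2
    noise = 1.0 - sines - transients      # equals 2*r*(1-r), largest at r = 0.5
    return sines, transients, noise

def decompose(bins, scores):
    """Apply the soft weights to each bin value, giving three component lists."""
    sines, transients, noise = [], [], []
    for value, score in zip(bins, scores):
        ws, wt, wn = fuzzy_masks(score)
        sines.append(ws * value)
        transients.append(wt * value)
        noise.append(wn * value)
    return sines, transients, noise

bins = [1.0 + 0.5j, -0.2 + 0.1j, 0.7 - 0.3j, 0.05 + 0.0j]   # toy spectrogram bins
scores = [0.95, 0.5, 0.1, 0.6]                               # made-up tonalness scores
s, t, n = decompose(bins, scores)
rebuilt = [a + b + c for a, b, c in zip(s, t, n)]
print("largest reconstruction error:", max(abs(r - x) for r, x in zip(rebuilt, bins)))
```

A hard yes/no assignment of each bin to one component would also sum to one, but soft weights avoid abrupt switches between components from one bin to the next.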
[...] 'The new sound decomposition method opens many exciting possibilities in sound processing,' says professor Välimäki. 'The slowing down of sound is currently our main interest. It is striking that for example in sports news, the slow-motion videos are always silent. The reason is probably that the sound quality in current slow-down audio tools is not good enough. We have already started developing better time-scale modification methods, which use a deep neural network to help stretch some components.'
The high-quality sound decomposition also enables novel types of music remixing techniques. One of them leads to distortion-free dynamic range compression. Namely, the transient component often contains the loudest peaks in the sound waveform, so simply reducing the level of the transient component and mixing it back with the others can limit the peak-to-peak value of audio.
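The remixing trick described above can be sketched as follows, assuming the three components are already available as equal-length sample lists. The toy signals and the `transient_gain` parameter are made up for illustration.

```python
import math

def remix(sines, transients, noise, transient_gain=0.5):
    """Mix the components back together with the transient level scaled."""
    return [s + transient_gain * t + n for s, t, n in zip(sines, transients, noise)]

def peak_to_peak(samples):
    """Distance between the highest and lowest sample."""
    return max(samples) - min(samples)

length = 32
sines = [0.3 * math.sin(2 * math.pi * i / 16) for i in range(length)]   # steady tone
noise = [0.01 * ((i * 7) % 5 - 2) for i in range(length)]               # small stand-in for hiss
transients = [0.0] * length
transients[4], transients[5] = 0.9, -0.7                                # one sharp click

original = remix(sines, transients, noise, transient_gain=1.0)
softened = remix(sines, transients, noise, transient_gain=0.5)
print("peak-to-peak before:", round(peak_to_peak(original), 3))
print("peak-to-peak after: ", round(peak_to_peak(softened), 3))
```

Because only a level changes and no nonlinear curve is applied to the waveform, the tone and the hiss pass through untouched, which is where the distortion-free claim comes from.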
Journal Reference:
Fierro, L. & Välimäki, V. (2023). Enhanced Fuzzy Decomposition of Sound Into Sines, Transients, and Noise. Journal of the Audio Engineering Society. doi: 10.17743/jaes.2022.0077
(Score: 5, Insightful) by bzipitidoo on Wednesday October 25, @05:33PM
Guess we're going to see some big improvements on Opus. I noticed that its predecessor, Vorbis, struggled with staccato noises. What pushed Vorbis the hardest was a Wagner piece (Tannhauser Venusberg music?) with clacking wooden blocks. Might've been just that one orchestration that added the wooden blocks. AoTuV Vorbis compensated by increasing the bit rate much more than I saw it do for any other music I threw at it-- 50% above what I'd specified.
(Score: 3, Interesting) by JoeMerchant on Wednesday October 25, @05:34PM (15 children)
I wonder about the relative "quality" of a 128kbps compression of audio using this decomposition vs .mp3.
Sounds promising, and also very hard to actually evaluate psycho-acoustically.
Ukraine is still not part of Russia. Glory to Ukraine🌻 https://www.pravda.com.ua/eng/news/2023/06/24/7408365/
(Score: 3, Interesting) by RS3 on Wednesday October 25, @07:17PM (14 children)
For years I've scoffed at things like "low-oxygen copper" wire, and the far too many gimmicky selling points for audio stuff. I grew up with music, got interested in home "hifi", especially speakers. Didn't have budget to mess with the high-end stuff, but built my own, tested, learned.
Always hated the harsh sound of "PA" systems. More recently, maybe 20 years ago, I fell into running sound systems. They've come a very long way, and it turns out I'm pretty good. All of this to say- I have a somewhat discerning ear.
Well, I've worked with / under some top people, especially in the recording world. One guy has several nominations, and at least 1 Grammy in recording engineering, and other awards in other award worlds (like gospel). Another guy- not sure if he has awards, but you see his eyes light up when he turns a knob on a recreated vintage compressor, but you barely hear a difference. IE, there really is such a thing as "golden ears". I wonder if I had gotten into music / audio production at a young age I might have developed "golden ears". I'm definitely an EQ freak.
Point is, some people quickly hear the various distortions and artifacts of audio compression, and blind tests have proven they do hear it. I might be used to higher frequency distortion and aliasing in general, so a 128kbps mp3 might not sound excellent, but tolerable. Maybe it's all about what you're used to? And we all have our sensitivities and "triggers".
When I export to .mp3 I usually use 192kbps for music (or better if the person wants it) and 128 for pure speech / misc. background music.
(Score: 5, Funny) by DannyB on Wednesday October 25, @07:41PM (7 children)
I discriminate by age of the music.
Anything newer than, oh, about 1990'ish is safe to export to high quality 64kbps mp3 without losing anything.
In the interest of preventing childhood obesity, I'm going to eat all the Halloween candy by myself.
(Score: 2) by RS3 on Wednesday October 25, @08:26PM
HUH?
(Score: 3, Interesting) by RS3 on Wednesday October 25, @08:41PM (5 children)
To be sirius for a moment, I'd say it really depends on the quality of recording, production, copying, playback equipment. All kinds of compromises and tricks were done to get the most out of magnetic tape recording. Recording engineers knew how tape is quite non-linear, and does its own form of compression, so the main trick was to know how "hard" to hit the recording heads and tape depending on what you were recording and how. That contributed to much of the "warmth" many people like about older analog recordings, esp. analog masters. I think the market grew with the recording techniques. I'm talking about the really good stuff. There were plenty of stinko recordings back in the day.
Now with digital, it's a whole new world and recording / mixing / mastering engineers are trying to get "warmth" without sacrificing anything. Personally I think the unpleasantness of newer DDD recording is in the compression. Too many compressors (that I use every week) are quite linear, and I think a logarithmic compression curve, more like magnetic tape saturation, maybe would sound better. It would allow things to sort of pop out more where they should. One trick is to use softer attack on the compression, but even that is a time thing rather than a sound level thing.
But all of that, and the very common practice of super compressing the whole mix makes it even less acoustically pleasing. For sure some of it was trying to make your CD as LOUD as possible. But some of it goes to quantization error if you use less than all 16 bits. Gotta get your money's worth of them bits, right? Going to 24 bits, as in DVD audio, makes an amazing difference, as you may know.
I dunno- not enough hours in the day to try everything, and I'm probably going to semi-retire from it soon. Too many other things I want to do.
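For anyone who wants to compare the two kinds of curve discussed above, here is a rough sketch. It assumes a textbook hard-knee compressor law working on level in dB and uses `tanh` as a stand-in for tape-style saturation; neither is a model of any particular unit, and attack and release are left out entirely.

```python
import math

def hard_knee(x, threshold_db=-12.0, ratio=4.0):
    """Static hard-knee compression: above the threshold, level rises 1/ratio as fast."""
    if x == 0.0:
        return 0.0
    level_db = 20.0 * math.log10(abs(x))
    if level_db <= threshold_db:
        return x                                     # untouched below the threshold
    out_db = threshold_db + (level_db - threshold_db) / ratio
    return math.copysign(10.0 ** (out_db / 20.0), x)

def soft_saturate(x, drive=2.0):
    """Smooth tanh curve (an assumed stand-in for tape saturation), scaled so 1.0 maps to 1.0."""
    return math.tanh(drive * x) / math.tanh(drive)

# Print both input/output curves side by side.
for step in range(0, 11, 2):
    x = step / 10
    print(f"in {x:.1f}  hard-knee {hard_knee(x):.3f}  soft {soft_saturate(x):.3f}")
```

The hard-knee curve has a corner at the threshold, while the `tanh` curve bends gradually over the whole range, which is the "sound level thing" rather than the "time thing".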
(Score: 3, Interesting) by JoeMerchant on Thursday October 26, @02:51AM (4 children)
>a logarithmic compression curve, more like magnetic tape saturation, maybe would sound better.
That isn't a software option in every DAW by now? Seems like it easily could be.
>the very common practice of super compressing the whole mix
I like Heart. Jupiter's Darling just about made me throw up on the first listen. That compression profile seems to me only suitable for listening through underpowered PA speakers whilst riding in the cargo compartment jumpseats of a C-130, with active air defense flak outside.
>not enough hours in the day to try everything, and I'm probably going to semi-retire from it soon.
I did a MIDI controlled monophonic digital synthesizer for my EE senior project, and expanded it to a parallel processing synthesis system for my Masters' thesis. But such things didn't pay money in the real world in the early 1990s. Coincidentally, I (my whole digital synthesis class, actually) got a tour of the BeeGees Middle Ear studio and demo of a new sampling synth on Miami Beach in late 1989, and ended up working for a medical device company right next door (shared the parking lot) from 1990 until the med device company moved across town in 2001. In all those years I don't think I ever saw anything notable happening around the studio - maybe things were happening inside and I just never was in the parking lot at the times to see people go in or out. Well, I take that back - reflecting on the 8000 or so times I was in that parking lot (morning, lunch, after work and there was a while I'd come in early and ride my bike around the beach before work), I think I did see a flamboyantly dressed character or two around the entrance of Middle Ear once, maybe twice.
(Score: 2) by Rich on Thursday October 26, @10:43AM (3 children)
That wasn't Dade, by chance?
(Score: 2) by JoeMerchant on Thursday October 26, @11:54AM (2 children)
Dade County, yes. Don't know a Dade company.
(Score: 2) by Rich on Thursday October 26, @12:18PM (1 child)
That would have been Dade Reagents Company / Dade International, makers of the Dimension clinical analyzers. They merged with Behring Diagnostics to form Dade Behring (wiki: "With 2006 revenue of more than $1.7 billion, Dade Behring was the world's largest company solely dedicated to clinical diagnostics"), which was eventually acquired by Siemens and is now part of the "Siemens Healthineers" subsidiary. At some point in time however, Dade seems to have relocated from Miami to Deerfield, IL, and there were press articles from 1997 saying that Dade was about to close two factories in Miami.
(Score: 2) by JoeMerchant on Thursday October 26, @12:31PM
Ah, we were small-time med devices. I tried to interview with Cordis pacemakers after graduation, but they had recently been padlocked / shut down by the FDA (for good reason). Also interviewed with Coulter blood analyzers (automatic cell counters, gas analysis etc.). Their parking lot alone (overflowing, reserved for names on dozens of spaces) put me off working there, and me not speaking fluent Spanish pretty much kept me unhired by them anyway.
(Score: 3, Interesting) by JoeMerchant on Wednesday October 25, @07:46PM (5 children)
Yeah, 128 was the "standard" when Napster was a thing and it was a crappy standard, I could easily tell when a restaurant was playing 128 compressed music, without even trying to listen for it.
Sort of like ultra high performance cars, most people are too old to really enjoy audiophile equipment by the time they can afford it.
Worse than fast cars, when your ears age with your own personal profile of deaf tones and tinnitus triggers, you lack the sensitivity to tell the difference between quality reproduction gear and emperor's-new-clothes low-oxygen copper, etc. However, you probably will strongly prefer some gear to others based on your personal hearing profile.
In my left ear I prefer the Bang & Olufsen, but my right ear prefers Bose... ;-)
(Score: 2) by RS3 on Wednesday October 25, @08:43PM (3 children)
Yeah, and like cars, if younger people get their hands on good stuff they tend to push it and destroy it. (but not me, no never)
Okay, that's very interesting...
(Score: 3, Funny) by JoeMerchant on Wednesday October 25, @09:26PM (2 children)
First: I have the same sports car I bought when I was 24, in 1991... it started at 100hp, but turboed to 200hp in 1997 and did a 240hp V6 swap in 2021, haven't destroyed it yet...
And, of course the ear-brand preference thing was a joke ;-) but, I had an ear infection once which gave me very specific tinnitus pain which was only set off by certain people, really one in particular... very strange having to try not to wince every time she spoke.
(Score: 3, Funny) by Mykl on Wednesday October 25, @10:18PM (1 child)
And how is your wife these days?
(Score: 2) by JoeMerchant on Thursday October 26, @01:58AM
Yeah, thankfully not her - this was a consultant visiting the office for a couple of days and her voice sounded normal, not unusually high or low or otherwise distinguished, except for that exquisite pain in the Bose ear every time she spoke.
(Score: 0) by Anonymous Coward on Thursday October 26, @04:01AM
> In my left ear I prefer the Bang & Olufsen, but my right ear prefers Bose... ;-)
Brings to mind a line I heard in the sound reinforcement business c.1980 (a friend owned a sound company back then)--
No highs, no lows, must be Bose.
(Score: 4, Informative) by ledow on Wednesday October 25, @09:05PM (4 children)
Fourier was very clever.
A 1D Fourier transform (actually used as part of a "discrete cosine transform" (DCT)) is basically how we get compressed MP3 from an audio signal.
A 2D Fourier transform is how we get a compressed JPEG from a 2D image.
A 3D Fourier transform is how we get a compressed MPEG/etc. from a 2D image changing over time.
All of it just looking for simple, regular "waves" that make up a complex signal across a certain dimension (e.g. through time in a movie, etc.).
It takes less data to store the parameters of the waves (especially if you sacrifice the less significant waves that humans can't see/hear or which won't make much overall difference to the signal) than it does to store the data they were obtained from, and it's quite easy to reconstruct a good approximation of the original data from the wave parameters.
One simple mathematical trick has made our modern world consume literally billions of petabytes less data than it would take in a raw form.
I highly recommend this explanation (and live demonstrations) of how this works:
https://www.jezzamon.com/fourier/ [jezzamon.com]
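A bare-bones sketch of the store-fewer-wave-parameters idea, assuming a plain orthonormal DCT on one block of 64 samples. Real codecs such as MP3 add a psychoacoustic model, windowing and entropy coding on top of this, so treat it as an illustration only; the function names are made up.

```python
import math

def dct(x):
    """Orthonormal DCT-II of a list of samples."""
    n = len(x)
    out = []
    for k in range(n):
        scale = math.sqrt((1.0 if k == 0 else 2.0) / n)
        out.append(scale * sum(x[i] * math.cos(math.pi * (i + 0.5) * k / n) for i in range(n)))
    return out

def idct(coeffs):
    """Inverse of dct() above (orthonormal DCT-III)."""
    n = len(coeffs)
    out = []
    for i in range(n):
        total = 0.0
        for k, c in enumerate(coeffs):
            scale = math.sqrt((1.0 if k == 0 else 2.0) / n)
            total += scale * c * math.cos(math.pi * (i + 0.5) * k / n)
        out.append(total)
    return out

def keep_largest(coeffs, count):
    """Zero every coefficient except the `count` largest in magnitude."""
    order = sorted(range(len(coeffs)), key=lambda k: abs(coeffs[k]), reverse=True)
    keep = set(order[:count])
    return [c if k in keep else 0.0 for k, c in enumerate(coeffs)]

def rms_diff(a, b):
    """RMS difference between two equal-length sample lists."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)) / len(a))

signal = [math.sin(2 * math.pi * 3 * i / 64) + 0.5 * math.sin(2 * math.pi * 7 * i / 64)
          for i in range(64)]
coeffs = dct(signal)
for count in (2, 8, 64):
    approx = idct(keep_largest(coeffs, count))
    print(count, "of 64 coefficients kept -> RMS error", rms_diff(signal, approx))
```

Keeping all 64 coefficients gives the block back up to rounding; keeping only a handful stores far fewer numbers at the cost of a small error.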
(Score: 2) by krishnoid on Wednesday October 25, @09:57PM
That *is* a whole yotta bytes saved.
(Score: 2) by JoeMerchant on Thursday October 26, @02:54AM (2 children)
Meanwhile the Laplace transform is used primarily to torture EE students and never heard from again after graduation for most of them.
(Score: 2) by Rich on Thursday October 26, @12:36PM (1 child)
The Laplace transform (not that I could do it on a sheet of paper now...) is relevant for modeling impulse response. I haven't seen sources, but I assume that the quite popular Kemper modeling guitar amplifiers use it at least in some way to reconstruct the behaviour of classic tube amps. Actually quite on topic for TFA.
The TFA's quoted main interest of slowing down audio is done quite well (for music anyway) by "PaulStretch", which I think is mostly Fourier-resynthesis based. (See for example https://www.youtube.com/watch?v=FsJdplLB1Bs [youtube.com] for how it turns even the infamous Windows 95 startup sound into a chillout track, or https://www.youtube.com/watch?v=XiKWfcy-Z70 [youtube.com] for how a Radiohead song gets turned into an ambient concept album.)
Finally, wrt the thread start, at least the classic codecs don't use a 3D transform for the time dimension, but rather increasingly nifty motion estimation.
(Score: 2) by JoeMerchant on Thursday October 26, @06:57PM
To be fair, I would be willing to bet there is Laplace transform going on in MRI imaging because... all commercial MRI imagers (that I have worked with) return voxel values in the complex plane, either as real and imaginary components or as a vector magnitude and direction.
The MRI images that radiologists read are all based on the magnitude values, however... it so happens that the phase angle of the voxel values vary linearly with (small) changes in temperature, which usually isn't too interesting unless you're intentionally heating (or, I suppose, cooling) body tissue and you want to do it in a controlled manner.
To anyone, or anyone you know and care for, who may have a brain tumor - especially a brain tumor labeled as "inoperable" - for a long time now there has been a device called "Gamma Knife" which uses ionizing radiation (Gamma rays) in a Ghostbusters "cross the streams" fashion where lethal dose is achieved at the intersection of the beams, hopefully on that tumor (to be fair, they are pretty good at targeting in the brain). Unfortunately: A) this frequently leads to something called "necrotic fringe" which is just as bad as it sounds, particularly in your brain, because it spreads... B) Gamma Knife devices were widely sold, and EXPENSIVE, so many owners of these devices are still recommending them for first-line treatment even today, C) at least a couple of devices based on MRI monitored heating of tumors exist and have been in use for 10+ years, but they are still relatively rare. The distinction is important: properly thermally ablated tumor tissue (such as you get when you monitor the heat application in real-time in an MRI) does not develop necrotic fringe. These thermal ablation devices are often employed to "clean up" necrotic fringe, which is a shame because they could better be used to kill the tumor in the first place with high rates of successful treatment and low rates of side effects.
10+ years ago I worked for a company developing "Image-guided thermal ablation with MR-based thermometry" using a fiber optic probe (very thin) to deliver laser-heat to tumors inside the brain. I understand there are also devices now which deliver the heat with ultrasound: not even a small hole and probe required. As with everything, each device has its own strengths and weaknesses, but if you or a friend are referred for Gamma Knife therapy I would strongly suggest researching the alternatives and getting serious about finding a 2nd opinion that is at least open to the use of "Image-guided thermal ablation with MR-based thermometry".
The phase angle changes from those Laplace transforms are how they know when a tumor has been cooked "just enough" to die, with minimal impact to surrounding tissues - also a nice thing in the brain.
(Score: 2) by inertnet on Wednesday October 25, @10:28PM
That sounds like it could be used for voice recognition, identifying people by voice.