Benn Jordan’s latest video proposes a way to fight back when generative AI music services rip off music for their data sets. It’s not ready for prime time yet, but it does offer a window into the wild, wonderful world of adversarial noise poisoning attacks.
Now, if you run in circles like mine, you’ve already gotten a “Hey, have you seen this new Benn Jordan video?” message, and I suspect you’ve gotten as far as watching it, but here you go:
Benn’s approaches should have some real legs. There are two reasons to be optimistic. One, this family of techniques works on audio itself, so it covers the so-called “analog loophole”: it functions anywhere sound is heard. Two, there’s the potential to use different methods, making the perturbations harder to detect and strip out. The results can also be validated, meaning these techniques could be updated if services react.
It’s funny; when I spoke to Roland’s Paul McCabe about that company’s AI initiatives, I suggested a speculative design where you could press a button and block a performance from being trained. Benn actually went directly to the data science researchers to find out how that could be done – even in a live instrumental performance. So yes, this is possible. (Of course, you count as a CDM reader if your favorite music in the entire video is the targeted pressure wave attack at 22:00. A real attack on an AI would be inaudible to humans, unlike the audible demo here – but that’s my kind of music.)
Most important of all, though: these methods reveal how training sets and generative audio relate. Like Benn, I have an interest in generative sound, algorithmic music, and machine learning. It’s not about being pro- or anti-AI, as if this were a sport. We’re talking about the critical examination of a technology that is sucking up a huge amount of resources and reshaping the world around us. What these techniques do – even if the generative models find ways to circumvent them – is reveal something about how the technology works. It busts some of the mythmaking at a time when the world needs transparency, not another Wizard of Oz trick.
The big gotcha – spoiler alert – is that this requires high-end GPUs and a massive amount of electricity and time to pull off. Computation doesn’t magically consume less power on its own, either – least of all with semiconductor trade wars looming. But now that the idea is out there, the challenge would be devising a more efficient method; this at least works as a proof of concept.
In short, I’m for it. And I do expect that fear of AI training will stop some people from sending music to streaming services. It’s not hard to envision, as Benn does, a world where distributors license this technology to give producers added peace of mind. Remember in the early 2000s when we worried about protecting music from human fans? Now we’re protecting it from generative AI.
It’s worth watching the video, though, because the whole world of adversarial noise gets fascinating – and it offers a way to imagine hacking back against increasingly pervasive AI and surveillance. So this is all about more than just the Poisonify concept (though that’s already essential).
Into the data science
Here’s more on the HarmonyCloak tools developed at the University of Tennessee, Knoxville:
You can’t hear it, but University of Tennessee tool ‘cloaks’ songs to protect music from AI [Knoxville News]
The site/paper also has a survey:
HarmonyCloak: Making Music Unlearnable for Generative AI
The instrument classification attack, as far as I know, is novel.
Even if it didn’t find its market in digital distribution and DSPs, as Benn notes in the video, his research on AI detection algorithms remains compelling:
Benn Jordan has made an algorithm that can detect if music has been made by AI or not
You’ll find a lot on adversarial noise, in different contexts – because that can be a method of training neural network classifiers and a way of attacking those systems. (There’s “friendly” and “unfriendly,” basically – even though I know that conflicts with what the word “adversarial” normally means. Think of it as “I’m challenging you to a game of chess to teach you something” versus “I’m challenging you to a game of chess to mind control you.” Sort of.)
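To make that concrete, here’s roughly what the “unfriendly” version looks like in code – a minimal FGSM-style sketch in PyTorch. The classifier and the epsilon value are hypothetical stand-ins; this is the textbook technique, not anything from Benn’s or HarmonyCloak’s actual methods:

```python
import torch
import torch.nn.functional as F

def fgsm_perturb(classifier, waveform, true_label, epsilon=1e-3):
    """Nudge a waveform in the direction that increases the model's loss."""
    x = waveform.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(classifier(x), true_label)
    loss.backward()
    # The sign of the gradient points where each sample hurts the model most;
    # epsilon keeps the added noise small - ideally below audibility.
    return (x + epsilon * x.grad.sign()).detach()
```

A tiny, targeted nudge per audio sample is all it takes – and the “friendly” defenses below reuse exactly the same machinery.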
And this stuff is moving fast. Here’s Los Alamos National Laboratory, the folks who have never been associated with anything other than friendly uses of science and technology:
New AI defense method shields models from adversarial attacks
Or, from the 2022 proceedings of the Conference on Neural Information Processing Systems (NeurIPS):
Friendly noise against adversarial noise: a powerful defense against data poisoning attacks
And from 2020, IBM Watson: [PDF]
Noise is Inside Me! Generating Adversarial Perturbations with Noise Derived from Natural Filters
Targeted pressure wave attacks as discussed in the video are being deployed against machines, but they’re also known for their use in sonic weapons against humans. Pulsed microwave energy is the likely cause of the serious neurological symptoms reported in Cuba, the so-called “Havana Syndrome” – sound as a neuro-weapon.
But as for using sound against machines, here you go:
Neural Adversarial Attacks with Random Noises
And the same broader category of mechanisms that can be used to attack can also be used in training – see the sketch after these links:
Modeling Adversarial Noise for Adversarial Training
Targeted Adversarial Attacks on Deep Reinforcement Learning Policies via Model Checking
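As a hedged illustration of that flip side – and not how either paper above actually does it – here’s the earlier perturbation sketch folded into a training step, so the model learns from its own worst-case noise:

```python
import torch.nn.functional as F

def adversarial_training_step(model, optimizer, waveforms, labels, epsilon=1e-3):
    """One step that trains on clean and attacked audio together."""
    adv = fgsm_perturb(model, waveforms, labels, epsilon)  # from the sketch above
    optimizer.zero_grad()
    # Penalize mistakes on both the original and the perturbed audio,
    # so the model learns to shrug off the attack.
    loss = F.cross_entropy(model(waveforms), labels) \
         + F.cross_entropy(model(adv), labels)
    loss.backward()
    optimizer.step()
    return loss.item()
```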
An actual data scientist might have some better ideas; I just play one on TV.
Anyway, yes, my idea of fun is making music, and I really don’t find genAI music to be fun – but I do enjoy imagining generating unheard musics by rewiring machine listening to categorize things incorrectly.
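If you’re wondering what “categorizing things incorrectly” looks like mechanically, here’s a toy targeted variant of the earlier sketch – again with a hypothetical classifier, and no claim that this matches anyone’s production attack:

```python
import torch.nn.functional as F

def targeted_perturb(classifier, waveform, wrong_label, epsilon=1e-4, steps=20):
    """Iteratively steer audio toward a label it doesn't deserve."""
    x = waveform.clone().detach()
    for _ in range(steps):
        x.requires_grad_(True)
        loss = F.cross_entropy(classifier(x), wrong_label)
        classifier.zero_grad()
        loss.backward()
        # Descend the loss for the *wrong* class: the model grows
        # confident that a violin is a kick drum, or whatever you like.
        x = (x - epsilon * x.grad.sign()).detach()
    return x
```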
And, I mean, obviously we need to do some kind of music compilation of (audible) adversarial noise attacks, though I guess we should be careful how we distribute it. I think I’m only interested in a malfunctioning Alexa, so this might convince me to try to buy someone’s older model just to mess with it. I … also liked the screwed-up musical results better.
If that meandering discourse put you to sleep, here, let me wake you up again with rage.
Here’s the clip that inspired the image at top; it’s a quote from Suno.ai founder Mikey Shulman:
Oh, sorry, that is the text as spoken with an AI-generated version of Anakin Skywalker’s voice. (I was just inspired by this conversation; the sentence construction seemed similar.) Here is the original clip in context, in which he also says people don’t like running (again, maybe suggesting a Darth Vader sort of solution):
“It’s not really enjoyable to make music now… it takes a lot of time, it takes a lot of practice, you have to get really good at an instrument or really good at a piece of production software. I think the majority of people don’t enjoy the majority of time they spend making… pic.twitter.com/zkv73Bhmi9
— Mike Patti (@mpatti) January 11, 2025