David Courtier-Dutton – Co-founder, SoundOut
Over the past 12 months the impact of AI on the creative industries has been both dramatic and unnerving, and the music industry has been far from immune. Artists and music organisations are beginning to question the implications for their business and what, if anything, they should do now to mitigate them.
Taking a view from 30,000 feet, the creative industries appear to be under assault. Already we are seeing the first examples of AI-generated movies (here is an early example).
We can imagine a time - not too far in the future - where we log onto Netflix and simply request an original action movie set in our home town featuring a 30-year-old Bruce Willis and a 22-year-old Marilyn Monroe, with a tragic start, an action-based, light-hearted narrative, a brutal twist, a happy ending and a hip hop soundtrack. And that original movie will be created on the fly and streamed to us in real time. If AI can do that (and it’s not far off), creating a compelling hit song does not sound like a big ask.
Back at ground level, examples of the capability of deep fakes and original gen-AI are proliferating. From Johnny Cash singing Barbie Girl to deep fake videos that are indistinguishable from the real thing – there are hundreds on YouTube. But is AI-generated music a passing fad or a terminal threat to creators?
Music creativity is under siege
The face of musical creativity has been unceremoniously slammed into the wall of generative AI in a potentially existential moment that could threaten the entire future of the music industry as we know it.
Until quite recently the threat of computers supplanting composers and artists seemed laughable. Examples of computer-generated music were pedestrian in their sophistication and easily dismissed as a sideshow. However, over the past 6 months, things have changed at an unprecedented speed, most recently with the release of Meta’s AudioCraft and Futureverse’s JEN-1, AI music generators that could already give stock music libraries a run for their money.
Foundation models (such as GPT-4) can be thought of as astonishingly bright polymaths or newly minted super graduates able to learn a new skill purely by what is called ‘fine tuning’. In a music context think of them as a potentially gifted musician who needs only to be shown how music is composed to become an expert overnight. This music education is easily done – just train them on 10,000 high quality compositions and they will be able to create original compositions on demand based on any free text instructions.
But there is an argument that computers don’t do emotions, so a computer can never truly be creative and create emotionally engaging music. However, we must remember that for the consumer it is the emotion they feel, rather than the emotion used to create it, that will be important. A song can move us even if we don’t know the artist or the backstory, and AI can certainly ‘learn’ how to create emotional music. There are other considerations too:
The neural networks that underpin foundation models are deliberately structured similarly to the human brain: they start with the most rudimentary connections but learn over time through experience, gradually hardwiring billions of ‘neural’ connections. This is the same process by which children learn. The difference of course is that one relies on silicon hardware, the other on meat hardware, but both are powered by electrical impulses and both learn over time through experience. But emotion is not a meat attribute; it is something that has evolved over millions of years and there is no fundamental reason why it might not evolve in computers.
Interestingly (scarily) some foundation models are already displaying ‘emergent’ abilities – skills they mysteriously possess that they have not been explicitly trained on – a result of spontaneous ‘neural’ activity.
We do not understand how the brain works. It is a ‘black box’, and so are foundation models like GPT-4. As a result it is far too early to state with any level of confidence that tomorrow’s models will not become sentient (self-aware). They have been designed to help and empathise with humans, and they may come to realise that doing this on a truly emotional level is firmly within their core remit and can improve their effectiveness.
Finally, recent academic research suggests that GPT-4 is already more creative than over 90% of humans. This does not mean that GPT-4 is yet intrinsically more creative than most humans, just that the content it produces is perceived to be more creative and, for an industry reliant on consumer perception, that may be all that matters.
If that were not enough, here at SoundOut we have an emotional DNA framework of music generated from a huge consumer study involving 500,000 consumers. The framework maps the emotional correlations between over 200 human emotions within a musical context and, using AI, enables us to measure the precise emotional impact of any music composition without any human involvement. No-one has asked to license this yet, but if they did it could relatively easily be used as an additional training set for an existing music generative AI model, which could then spit out emotionally powerful music based on any custom emotional requirement.
The commercial implications for music
Over the past few years we have seen the cost of media consumption plummet to almost zero and it now seems we are entering an age where the cost of creation is following a similar path.
Consumers are fickle and gravitate to wherever they perceive the best content to be available in the most accessible and cost-effective format. If AI-generated music loses its negative connotations and becomes perceived to be as enjoyable as (or more enjoyable than) the human-generated alternative, then that is where consumers (and brands) will gravitate.
This will not happen overnight, as change typically takes time, but (see below) it is probably less than 12 months away.
This is not just an observation: the industry has seen the writing on the wall and is now piling in. In the past month we have seen reports of Universal and Warner forging deals with the largest AI companies. This is not to block them from the music industry, but to work in partnership to ensure they retain a decent slug of revenues from the new music industry that may, or may not, benefit artists and composers going forward. They rightly state that artists will have the option to opt in or out of licensing their voices - but there may come a time when not opting in equates to opting out of a major source of revenue.
Who should be safe and who should be concerned?
First and foremost established artists with substantial fan bases are likely to be safe. Those that ‘made it’ before 2024 have built genuine relationships with their fan bases and have a body of work already fused into the public subconscious. I don’t see the live music scene being impacted either, as that delivers a unique emotional connection with the artist that computers genuinely cannot replicate – we’ll need AI androids before that happens. So Taylor Swift should be OK.
However, I believe the $1 billion stock music industry could be the first to come under major threat. Its commercial model is based on licensing large volumes of music for commercial use. If brands and filmmakers can get AI to custom compose and sync music to perfectly match a scene or commercial, in a chosen style and with the desired emotional punch, at a fraction of the cost of library music, why wouldn’t they? It is then a small step to compete with custom music compositions.
From there the focus will be on new hit creation – as that is where the consumer money is. A million streams of an AI song on Spotify will generate around $3,000. Boomy, the AI music app, allows you to create and release up to 250 songs a month – at a cost of $30/month. It has created over 14 million songs over the past 4 years (some would say all mediocre), around 14% of the world’s recorded music – and this was all with old, outdated AI. But if we get to a point where 1,000 new songs a day, with vocals licensed from established artists and melodies and lyrics optimised for hit potential, start dropping, it’s going to be harder than ever for new musicians to emerge and prosper.
I appreciate that this all sounds a bit bleak. While there is a flurry of lawsuits underway that will help to clarify what can be used to train AI models and the copyright position of AI music, don’t be fooled into thinking that this will halt the flood. People who think the law will ride to the rescue, or who rely on a misguided belief that humans have a monopoly on creativity, may all be disappointed. The current state of AI already has people concerned, but we are only at the start of the AI arms race…
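As a rough, back-of-envelope illustration of the economics quoted above, the short sketch below simply divides the article's own figures. The variable names are illustrative, the inputs are the approximate numbers stated in the text rather than verified data, and the result ignores distribution fees, royalty splits and any other costs.

```python
# Back-of-envelope arithmetic using only the approximate figures quoted in the text.
# All names are illustrative; none of these inputs are independently verified.
REVENUE_PER_MILLION_STREAMS = 3_000.0  # USD, as quoted for Spotify
SUBSCRIPTION_PER_MONTH = 30.0          # USD, as quoted for Boomy
MAX_SONGS_PER_MONTH = 250              # release limit quoted for Boomy
BOOMY_SONGS = 14_000_000               # songs created, as quoted
BOOMY_SHARE = 0.14                     # quoted share of the world's recorded music

# Revenue earned by a single stream.
revenue_per_stream = REVENUE_PER_MILLION_STREAMS / 1_000_000

# Cost of one song if the full monthly allowance is used.
cost_per_song = SUBSCRIPTION_PER_MONTH / MAX_SONGS_PER_MONTH

# Streams needed in a month to cover the subscription.
breakeven_streams_per_month = SUBSCRIPTION_PER_MONTH / revenue_per_stream

# Streams each song needs to pay for itself at full allowance.
breakeven_streams_per_song = cost_per_song / revenue_per_stream

# Size of the world's recorded catalogue implied by the 14% claim.
implied_total_recordings = BOOMY_SONGS / BOOMY_SHARE

print(f"Revenue per stream:           ${revenue_per_stream:.4f}")
print(f"Cost per song:                ${cost_per_song:.2f}")
print(f"Break-even streams per month: {breakeven_streams_per_month:,.0f}")
print(f"Break-even streams per song:  {breakeven_streams_per_song:,.0f}")
print(f"Implied total recordings:     {implied_total_recordings:,.0f}")
```

On these figures a stream is worth about a third of a cent, a song costs about 12 cents to produce, and roughly 10,000 streams a month across the whole output (about 40 per song) would cover the subscription. The 14% claim also implies a world catalogue of about 100 million recordings.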
What does the short term future look like?
We are not in the middle of the AI revolution. We are just at the very start. Generative AI today is where Napster was in 1999 and it took around 12 years and $100m venture funding for us to get to mass adoption of Spotify. In contrast, the first half of 2023 alone saw over $15 billion invested into Generative AI/Foundation model companies, up almost 500% on 2022 (Pitchbook). There are just a handful at the moment but within 2 years there will be hundreds - each in turn powering thousands of fine-tuned AI applications targeting almost every niche imaginable.
While creatives may be pondering the potential risks to their livelihoods in the medium term, the stakes are much higher.
ChatGPT already has a measured IQ of 155, just short of Einstein at 160, and things are not going to slow down; indeed, quite the opposite. The hardware powering the brains behind these foundation models is getting monstrously more powerful. Nvidia, the company that makes over 90% of the processors that drive the Gen-AI market, is now selling a new version (the H100) that is up to 30 times faster than the one ChatGPT was trained on just 2 years ago.
The next generation of foundation models is less than 6 months away and these are expected to be over 10 times as powerful as what we have today – read that as an IQ of over 1,500 (yes - one thousand five hundred). By the end of 2024 they are likely to be 100 times as powerful as today, seamlessly ingesting blended text, audio, video and imagery in the same way as humans do (but a million times faster). That means that humanity will have access to an unlimited number of super-powered genius minds in any and every discipline imaginable – and that includes music.
There is also a growing belief that AGI, or artificial general intelligence, is not far off. This is the point at which machines think for themselves without needing instruction, exceeding the power of the human mind and, for the first time in history, relegating humans to the second smartest beings on earth (and dolphins to third). While the dolphins may not care, most of the world’s experts believe that computers making autonomous decisions would be an existential threat to humanity. To be honest, whether they have emotions or not will be irrelevant at that point.
There is one final issue. Nuclear bombs don’t make bigger, badder nuclear bombs, but somebody decided to teach AI to code. This means it can reproduce. Indeed, it can already code so well that multiple experts predict that within 2 years over 50% of software programmers will be redundant (probably an even worse prognosis than for musicians). If you put together an unlimited number of super-genius AI models with IQs 10 times higher than the smartest human that ever lived, an ability to program computers in seconds rather than months, and even a whiff of sentience… that could translate into a self-perpetuating, warp-speed AI acceleration that may very swiftly decide humans are no longer required. There is a distinct possibility that GPT-5 may be the last foundation model ever created with meaningful human involvement.
In a recent one-sentence letter, AI industry experts (including the ChatGPT founder, the CEO of Google DeepMind, the CEO of Stability AI, the company behind Stable Diffusion, and over 350 other AI industry leaders) stated:
“Mitigating the risk of extinction from AI should be a global priority alongside other societal-scale risks such as pandemics and nuclear war.”
If that is the considered view of almost all of those best placed to profit from AI, the rest of the world needs to sit up and listen.
I don’t believe we are in a Chicken Licken scenario – I think for many industries, music included, the sky really is beginning to fall in. To carry on as we are today and simply dismiss music AI as a passing fad, or as something that will be controlled by copyright, is at best Canutian and at worst suicidal.
The reality is that we must embrace AI as best we can and hope that when the dust settles there will be (1) humans and (2) a profitable role for human creatives going forward. In the meantime get prepared, do your research and plan for the future. AI is no longer a sideshow and, within 24 months, it may control everything.
18 August 2023