In simple terms, generative AI systems are trained by exposure to vast data sets. They build and refine models based on this data. They learn to identify patterns – and to iterate and alter those patterns to create new content.
However, potential intellectual property issues arise at both the input and output stages: with the data an AI is trained on, and with the content it produces.
‘I’d split it into two parts’, says Osborne Clarke IP partner Robert Guthrie. ‘One is the extent to which IP rights might be infringed in developing and training these AI systems. The other is what comes out of the AI systems – if there’s some sort of content being generated, whether that’s a piece of text or a picture, that might infringe on one or more of the pieces of content that have been ingested.’
Each of these intertwined issues has enormous commercial implications for AI developers and users, and for content creators and rights holders. And industry players are keen to see them resolved.
‘There’s pressure on governments from developers to allow broad text and data mining exceptions’, says Guthrie. ‘But there’s also pressure from content owners, who say that broad exceptions would weaken their IP protections.’
And some companies have moved beyond pressuring governments. In January, Getty Images announced that it had filed a suit in the English courts against Stable Diffusion developer Stability AI. In its statement, Getty argued that Stability had ‘unlawfully copied and processed millions of images protected by copyright’, without licence, for commercial purposes, and ‘to the detriment of the content creators’. It followed up with a suit in the US federal courts in February.
‘The Getty case is interesting’, says Priya Nagpal, a partner and barrister in Simmons & Simmons’ IP group. ‘A lot of copyright holders suspect that AI models are being trained on their data sets. But the issue is, how do you prove that your data is being used in this way? In its lawsuit, Getty Images has alleged that Stable Diffusion has been producing images that have a modified version of Getty’s watermark on them, but this type of evidence may not always be available.’
For Getty, represented in the High Court by a team from Fieldfisher led by Nick Rose, the watermark is crucial. Pinsent Masons IP and IT partner Cerys Wyn Davies highlights this point. ‘The Getty case is perhaps unusual, in that the AI tool generated images that had Getty’s watermark on them. Getty is therefore able to demonstrate use of its content.’
If the courts accept this argument, Getty and other rights holders hope that the resulting ruling will address the broader issue of what rights AI developers have to use data to train their systems. In Davies’ words: ‘It’s good that it’s hit the courts, as it is hoped that it will provide greater certainty for both AI developers and content developers/owners.’
But the courts may not be able to provide the certainty that companies like Getty seek. To some extent, the issues at play in the case may be covered by existing UK law. For Fladgate partner Eddie Powell: ‘What you have with Getty, at its heart, is a straightforward infringement case. You can argue all you want about the output, but if there’s infringement in the input, the output doesn’t matter.’
However, novel questions arise as technological development pushes AI to the frontiers of existing legal regimes. ‘Most of what’s happening in the AI space has an analogue in the normal human experience’, says Andrew Moir, head of cyber and data security at Herbert Smith Freehills. ‘If I go into a bookshop and buy a book, no one is going to say that I’ve infringed copyright if I read it. But, if I start copying things out verbatim, at that point it would be an infringement issue. What’s happening with AI training, in terms of this analogy, is reading the book. But when it comes to the AI engine creating something from what it’s read, that’s where the copyright issue comes in. The question, then, as a matter of policy, is the extent to which AI is allowed to “read the book”.’
AI output also raises issues of authorship. Moir notes that the UK is somewhat unusual in that its existing copyright regime addresses this question. ‘The Copyright, Designs and Patents Act says that if there’s no human author, then the author of a work is the person who undertook the “arrangements” necessary to create it. So if a computer is creating works, the first owner of the copyright is the person that gave the computer the instructions that led it to produce the work.
‘The issue we have now, for established AI engines, or where someone is using a third-party AI engine, is that the incremental effort involved is marginal – it could be as simple as asking the AI engine a question. Is it undermining copyright law to award first copyright to someone who, in practice, has done very little?’
Again, industry players are eager for clarity. Says Guthrie: ‘You might start to get judgments that confirm that content produced by AI systems isn’t protected and that will have significant commercial implications. So, Government will have to step in and act.’
Progress on this front has been slow. The Government launched a consultation on AI and intellectual property in 2021 and published its response in June 2022, proposing a broad text and data mining exception. But the proposal proved controversial.
‘But several months ago the UK government announced that it wouldn’t be moving ahead with the proposal. Then in March 2023 it published a response to a policy paper authored by Sir Patrick Vallance, which suggested that text and data mining should be allowed if the aim is to promote AI development and innovation in the UK. In its response, the government agreed that there was a need for greater clarity and indicated that the Intellectual Property Office will publish guidance and a code of practice this summer on what sort of text and data mining will be permitted in the UK.’
The Government is keen to promote AI development. It published a white paper entitled ‘A pro-innovation approach to AI regulation’ on 29 March. The associated press release claimed that AI is ‘thriving’ in the UK, with more than 50,000 people employed and more than £3.7bn contributed to the economy in 2022 alone. Its stated aim is to ‘unleash the benefits of AI’, to ‘turbocharge growth’.
Policymakers are especially concerned about overregulation. ‘The approach the UK government has taken is to rely on existing regulators and existing legislative frameworks, rather than to create new law that will likely soon be hopelessly out of date,’ says Guthrie.
April’s Digital Markets, Competition and Consumers Bill, for instance, proposes to give new powers to the Competition and Markets Authority (CMA), rather than to establish a new body to regulate AI development. The CMA launched its own consultation in May, with a report due in September.
For Guthrie, the Government’s approach to IP issues is similar. ‘They’re not trying to create hard and fast rules. They’re trying to create common industry standards, to foster a common understanding between AI developers and content rights holders, to get them to agree to appropriate licensing regimes. The rationale is that those industry players are best placed to work out what best fits their mutual interests.’
Lawyers in the sector await the release of guidance with bated breath, not least because a failure to reach agreement could prompt further action. In Guthrie’s view: ‘There is a bit of a threat from the Government that, if the current approach doesn’t work, it will look again at legislation.’
For Nagpal, ‘the fact that the government has changed its position on this demonstrates the importance of lobbying on both sides, from both creatives and AI developers.’
Many lawyers also point to potential issues with international alignment. ‘A lot of these issues are multinational, which raises the issue that a lot of the law here isn’t aligned at the moment,’ says Moir. ‘It’s a bit of a minefield.’
Powell concurs. ‘There would be significant problems if the UK suddenly changed the law in a way that put it out of step with other countries. If you allow text and data mining for commercial purposes in the UK, you get to the question of whether the AI’s output is an infringement. The developers’ starting point in such a case would be to say, we did the training lawfully in the UK. But then, is it an infringement if they sell the output in another jurisdiction where the training would not have been lawful?’
Alignment faces significant obstacles. For one, different countries have different priorities. ‘Europe comes at things from a consumer’s or an artist’s point of view’, argues Powell. ‘Whereas the UK at the moment comes at everything from the perspective of, “how can we make things easier for business?” So, the political framework doesn’t seem to be there for meaningful alignment.’
Moir takes a similar view. ‘There is a history of bringing copyright law into line internationally through treaties and conventions. So, alignment isn’t beyond the realm of possibility. But these things take a lot of time, and it’s unclear if governments are going to prioritise this, especially given everything else that’s going on in the world. You also have a situation where there are some quite big differences among national copyright laws. Different countries deal very differently with fair use, for example.’
Transparency would be key to any solution. As the Getty case shows, a crucial question in determining infringement is whether rights holders can prove that the AI was trained on the materials they allege it infringed.
Here, says Moir, a core difference between AI and humans provides some scope for certainty. Unlike humans, AI systems can, at least in theory, produce a record of all the data that they were exposed to.
‘It may be that AI systems have to give an audit trail setting out where every ingredient of a work comes from’, he says. ‘It wouldn’t be impossible from a technical perspective. But I suspect that the reason AI systems don’t do it is because it would create a gargantuan amount of data to keep track of the historical state of the AI engine over time.’
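The audit trail Moir describes can be pictured with a minimal sketch (all names and URLs here are hypothetical, and a real training pipeline would need far more sophisticated provenance tracking): a log that records the source and a content hash of every work ingested during training, so that a rights holder’s exhibit can later be checked against the ingestion record.

```python
import hashlib
import json

class IngestionLog:
    """Records provenance for every work ingested during AI training."""

    def __init__(self):
        self.entries = []

    def record(self, source_url: str, content: bytes, licence: str = "unknown"):
        # Store a content hash rather than the work itself, keeping the log compact.
        digest = hashlib.sha256(content).hexdigest()
        self.entries.append({"source": source_url, "sha256": digest, "licence": licence})

    def was_ingested(self, content: bytes) -> bool:
        # A disputed work can be checked against the log by hash.
        digest = hashlib.sha256(content).hexdigest()
        return any(e["sha256"] == digest for e in self.entries)

    def export(self) -> str:
        # An auditable record that could be disclosed in litigation.
        return json.dumps(self.entries, indent=2)

# Hypothetical example: log two works, then test whether a disputed one was ingested.
log = IngestionLog()
log.record("https://example.com/photo1.jpg", b"image-bytes-1", licence="editorial")
log.record("https://example.com/photo2.jpg", b"image-bytes-2")

print(log.was_ingested(b"image-bytes-1"))  # True: this work appears in the log
print(log.was_ingested(b"other-bytes"))    # False: no record of ingestion
```

Even this toy version illustrates the limits Moir identifies: exact hashing only catches verbatim copies, not derived works, and logging every ingredient of every work over a model’s lifetime would generate an enormous volume of provenance data.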
For all the concerns raised about regulation hampering innovation, there are no signs that progress on AI is slowing. ‘Huge strides in development have taken place’, notes Guthrie. ‘So current law is not hampering it that much.’
Moir goes further. ‘It’s not as if people are saying, “let’s pull up the stumps on research, on commercialising this technology”. At the moment, companies are taking a calculated risk. They’re trying to address these issues in their terms and conditions as best they can. But, to the extent that they can’t, they’re still moving ahead.’
For Powell, this only increases the importance of clear rules and guidance. ‘Rights holders need to be clever about controlling and exercising their IP rights. It’s similar to the threat posed by Napster and Pirate Bay. And what they taught us is that you have to provide a legal revenue-generating route. Because if you don’t, rights holders won’t get anything.’