Applications of machine learning are no longer limited to technological advancement and development. Artificial Intelligence has started establishing its presence in the creative and intellectual areas of art and inventorship as well. With AI programmes like Midjourney, Stable Diffusion and DALL·E 2 using prompts to generate images, and giant tech corporations investing in the creation of generative AI, questions about IP ownership and the ethical sourcing of data sets for training the algorithms loom on the horizon.
After the AI boom of the 1980s and its quick collapse later in that decade, one would expect companies to be hesitant to invest in this field again. However, AI as an investment avenue has proven to be extremely appealing to the public. Experts suggest that this “frenzy” among companies to be the pioneers of the new-age AI revolution, ignoring the constraints of research and development, the lack of legislation and the absence of an ‘ethical compass’, could lead to an economic parallel to the bursting of the “dot com bubble”.
Artists across the globe are now approaching legal institutions to obtain relief for the unauthorized use of copyrighted material and, at the other end of the spectrum, to copyright work generated through the use of AI. An established legal doctrine on these issues still does not exist. However, the need for guidelines, if not strict legislation, is desperately felt by those in creative industries. Since the intersection of copyright and AI is a vast topic, we shall narrow our discussion to two main issues of concern. Firstly, does the use of copyrighted material in the training data sets of image-generating AI fall under the fair use doctrine? Secondly, to whom does the copyright in the generated image belong (if to anyone at all): the programmer, the user or the AI itself?
The fair use conundrum
Artificially intelligent systems rely on data sets: vast resources consisting of thousands, sometimes millions, of images or literary works stored in digital format. A data set is then used to train the machine, which becomes proficient in recognizing recurring patterns and producing outputs based on them. However, because this data is often scraped from the web by algorithms in such large quantities, consent for the use of copyrighted material in the data set may be absent. Surprisingly, this practice of scouring the web to build data sets is generally considered legal (in the USA). Nevertheless, we must examine whether the copyrighted material is infringed in the context of its use, and how that use hampers the rights of the original author.
The lack of consent from the owners of copyrighted material has raised the questions of whether it is ethical to train AI using these images and whether doing so constitutes copyright infringement. We must also consider that, while training the machine on a data set containing copyrighted material, we reproduce the work to teach the machine via trial and error, effectively undermining the original author’s exclusive right to reproduce the work.
The companies and programmers behind these AI systems argue that the use of copyrighted material in training is protected by the fair use doctrine. Whether a use qualifies as fair use depends on four factors: the purpose and character of the use, the nature of the original work, the amount of the original work used in the subsequent work, and the effect of the use on the potential market for, or the economic value of, the original work. The economic loss to the author and the transformative nature of the use are usually the most important determinants in such cases, as observed in Perfect 10 v. Amazon and The Authors Guild, Inc. v. Google, Inc.
Google has also argued, when discussing its Gmail smart reply feature, that copyrighted material makes up only a very small part of the data used to train its machines. Companies like OpenAI have stated that their use of the original material is ‘transformative’ and is therefore covered by fair use. Either way, AI companies are fighting tooth and nail to establish arguments that minimize their liability, secure fair use protection and safeguard the further commercial use and development of AI trained on protected IP. However, given the increasing commercialization of such AI programs, it is unclear how long corporations can continue to dodge liability and exploit legal loopholes.
Artists are continuing to file class action lawsuits against the companies behind AI systems like Stable Diffusion, arguing that allowing companies to train AI on copyrighted material causes irreversible damage to artists and to human creativity. However, it must be clarified that AI systems do not operate by combining items from the data set into an output, like a collage. Rather, they analyze the common patterns in the data and generate outputs based on those mathematical patterns.
Who has a copyright claim over the output?
Intellectual property offices across the world have been very hesitant to grant IP protection to works not generated by humans. In Naruto v. Slater, the court held that an animal (a macaque that had taken a selfie) could not sue for infringement under the US Copyright Act, establishing that non-human entities cannot claim copyright protection. A similar theme can be observed in other cases where non-human entities were denied IP protection in their own name. This stance can be attributed to the fact that artificially intelligent machines are not recognized as legal persons with the capacity to enter into legal relationships. This lack of recognition also prevents them from holding IP rights.
According to experts, users can technically claim copyright over the generated work by arguing that their ‘prompts’ and intellectual input, which includes editing and fine-tuning the images, give them a right over the AI’s output. This argument, however, depends on the circumstances and cannot be a hard and fast rule. Claimants would be required to prove the degree of human involvement to show that intellectual contribution and skill went into the creation of the work. In fact, protection can be revoked by a copyright office on the ground that the degree of human involvement was not satisfactory, as in the case of Zarya of the Dawn, where, despite using an AI art generator (Midjourney), the user listed only herself as the author of the work, with no mention of AI in the claim. The US Copyright Office clarified that the AI was not used as a tool but as an unpredictable generator, so that the user was less like a photographer using a camera and more like a client who hires an artist.
Concluding remarks: what’s next?
While we do not yet have an established legal doctrine that can protect copyright holders and also promote innovation and the involvement of AI technology in creative fields, parties on either side of the issue are preparing their cases. In response to the alleged violation of copyright law, Getty Images has filed a lawsuit against Stability AI, the company behind Stable Diffusion, and has banned all AI-generated content from its website to protect users. It has even questioned the eligibility of AI-generated works for IP protection, stating that the output may be substantially similar to copyrighted material.
On the other hand, big corporations are preparing to increase funding and investment in a highly unregulated field with few guiding legal principles. Experts have predicted that it is only a matter of time before governments bring about legal changes in response to these challenges. India has not yet established a precedent or legislation clarifying the authorship of AI-generated works and the ethics of using copyrighted data sets. However, the copyright office did face a dilemma over granting copyright protection to an Indian art-AI program (RAGHAV). It is only a matter of time before India will need to take action instead of remaining a patient observer as the world changes, since AI integration is an undeniable truth of the future.
This article is authored by Ishita Singh and Tushar Gadia, penultimate year students at Institute of Law Nirma University