Is ChatGPT just a copycat?

Large language models like ChatGPT are trained on data that include public domain material as well as pirated content. Since its launch in November 2022, ChatGPT has generated excitement as well as concern over the use of pirated content to train its underlying model, sparking public debate over copyright infringement and the fair use of copyrighted materials to train artificial intelligence (AI) models.

Professor Simon Chesterman, who is the NUS Vice Provost (Educational Innovation), Dean of NUS College, and Senior Director of AI Governance at AI Singapore, highlights the need to strike a balance between the rights of creators and the interests of the wider public in distributing and using their works. He adds that approaches such as allowing content creators to “opt out” of having their data used, and refining AI models with only public domain and licensed data, can be considered when training AI models, with a view to building fair compensation mechanisms for the use of creators' materials.
