In recent months, AI technologies, especially Generative AI, have become a hot topic. It seems every company wants to offer a service built on it, but consumers are worried about the ethical and legal standing of the output. In particular, people are worried about prosecutable plagiarism coming from these systems. In order to help alleviate some of those fears, Microsoft has created the Copilot Copyright Commitment, a promise to take responsibility for any issues from its AI Copilot technologies.
How Generative AI works
These Generative AI systems work in three steps: ingest, interpret, and output. In the ingestion phase, the system finds, collects, or is fed a large collection of content. If the system is about text, then tons of textual content are fed into the system. If the system is about images, then tons of images are fed into the system. This is the content that everything will be based upon going forward.
The next phase is interpretation. This is the actual training process that we hear about all the time. Using the content that has been fed into it, the system then finds patterns and trends. That series of patterns is what allows the system to create knowledge graphs and virtual neural connections between concepts and data. That data is then used to understand the world, or at least the data it is aware of.
The final phase is to create output. That output is based on the data that has been fed into the system and the interpretation of that data. When asked to create output, the system goes back to its training and the data that it is based upon. Using those neural connections, it finds things similar to what it needs to create and builds a conglomerate of that data, which appears to be a new creation. However, it is really just a mash-up of other content that was fed into it, as these systems are incapable of actually creating something new.
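To make the three phases described above concrete, here is a deliberately tiny sketch in Python. It is a toy word-chain model, not how Copilot or any production system actually works; real systems use large neural networks. The function names `ingest`, `interpret`, and `generate` and the sample sentence are assumptions made up for illustration.

```python
import random
from collections import defaultdict


def ingest(documents):
    """Ingest: turn each document into a list of lowercase words."""
    return [doc.lower().split() for doc in documents]


def interpret(token_lists):
    """Interpret ("train"): record which words were seen following each word."""
    followers = defaultdict(list)
    for tokens in token_lists:
        for current, following in zip(tokens, tokens[1:]):
            followers[current].append(following)
    return followers


def generate(followers, start, max_words=12, seed=0):
    """Output: walk the learned word-to-word links, starting from `start`."""
    rng = random.Random(seed)
    words = [start]
    # Stop at the length limit or when the last word has no known follower.
    while len(words) < max_words and followers.get(words[-1]):
        words.append(rng.choice(followers[words[-1]]))
    return " ".join(words)


# Hypothetical one-sentence "training set" about a brand-new product.
corpus = ["this new processor doubles cache size and trims power draw"]
model = interpret(ingest(corpus))
print(generate(model, "this"))
```

Because this toy model has only one sentence to learn from, every word has exactly one possible follower, so the "new" output is the original sentence word for word. With more varied input the output would mix pieces from several sources instead.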
The concerns over Generative AI
Because the output is really just a rehashing of the content fed into it, consumers are understandably worried that the output could ethically or legally violate the works of the original content creators. In particular, training these systems on data without the permission of those creators could be an ethical violation.
Avram has proven several times over the past few months that content generated by these AI systems has taken whole sentences and passages from the original works and presented them as new content. In a Piltch Point episode, he presented some of his research showing Google SGE stealing content wholesale with no attribution, links, or acknowledgements.
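As a rough illustration of what "whole sentences and passages" means in practice, the following Python sketch looks for runs of consecutive words that appear in both a generated text and an original. This is not the method used in the research mentioned above; the function names, the five-word window, and the sample sentences are all assumptions chosen for the example.

```python
def word_ngrams(text, n):
    """Return the set of all runs of n consecutive words in the text."""
    words = text.lower().split()
    return {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}


def shared_passages(generated, original, n=5):
    """Return the n-word runs that appear in both texts, sorted alphabetically."""
    return sorted(word_ngrams(generated, n) & word_ngrams(original, n))


# Hypothetical texts, invented for this example.
original = "the chip runs cool under load and sips power at idle"
generated = "reviewers say the chip runs cool under load and costs less"

for passage in shared_passages(generated, original):
    print(passage)
```

A longer window flags only longer copied passages, while a shorter one will also catch common phrases that any two writers might share, so the window size is a judgment call.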
Microsoft's Copilot solution
So, if Google is reproducing copyrighted materials and calling them new, the other companies are likely in the same boat. Because of this, Microsoft is trying to get out ahead of the problem. Rather than saying that its content is always 100% new and unique, the company is saying that if there is a problem, it is on us, not on you. The announcement says, in part,
Specifically, if a third party sues a commercial customer for copyright infringement for using Microsoft's Copilots or the output they generate, we will defend the customer and pay the amount of any adverse judgments or settlements that result from the lawsuit, as long as the customer used the guardrails and content filters we have built into our products.
This is a nice gesture, but it doesn't necessarily solve the problem. Sure, if you get sued for copyright infringement, Microsoft will defend you and pay any legal damages. But, there's a lot more than just legal damages involved. Companies hit by these lawsuits, especially those that might lose, also lose their reputation.
Take, for example, a news website. Instead of writing an article about a particular computer processor from scratch, they use a generative AI system to write it from concept. The AI system goes to its catalog of data it has ingested and interpreted and writes the article. But, because it's a new device, there is very little data available. So, it finds things that are close and it pulls whole passages from a product review on another website. When the original website finds the theft and sues, Microsoft's protection will not undo the reputation hit that comes along with a news site being accused of stealing content from another site. No amount of Microsoft's legal dollars is going to help repair that reputation.