Microsoft will assume responsibility for Copilot plagiarized content

posted Sunday Sep 10, 2023 by Scott Ertz

In recent months, AI technologies, especially Generative AI, have become a hot topic. It seems every company wants to provide the service, but consumers are worried about the ethical and legal position of the output. In particular, people are worried about prosecutable plagiarism coming from these systems. In order to help alleviate some of those fears, Microsoft has created the Copilot Copyright Commitment, a promise to take responsibility for any issues from its AI Copilot technologies.

How Generative AI works

The way these Generative AI systems work is in three steps: ingest, interpret, and output. In the ingestion phase, the system finds, collects, or is fed a large collection of content. If the system is about text, then tons of textual content is fed into the system. If the system is about images, then tons of images are fed into the system. This is the content that everything will be based upon going forward.

The next phase is interpretation. This is the actual training process that we hear about all the time. Using the content that has been fed into it, the system then finds patterns and trends. That series of patterns is what allows the system to create knowledge graphs and virtual neural connections between concepts and data. That data is then used to understand the world, or at least the data it is aware of.

The final phase is to create output. That output is based on the data that has been fed into the system and the interpretation of that data. When asked to create output, the system goes back to its training and the data that it is based upon. Using those neural connections, it finds things similar to what it needs to create and builds a conglomerate of that data, which appears to be a new creation. However, it is really just a mash-up of other content that was fed into it, as these systems are incapable of actually creating something new.
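To make the three phases concrete, here is a minimal sketch using a toy word-chain model. This is an assumption-laden illustration, not how Copilot or any commercial system is built: real systems use large neural networks, and the function names (ingest, interpret, generate) are hypothetical labels borrowed from the phases described above. It does show one relevant behavior: when the system has seen very little data, its output can reproduce the source word for word.

```python
import random
from collections import defaultdict


def ingest(documents):
    """Ingest: collect raw text and split it into word tokens."""
    return [doc.split() for doc in documents]


def interpret(token_lists):
    """Interpret ("training"): record which words follow each word."""
    transitions = defaultdict(list)
    for tokens in token_lists:
        for current, following in zip(tokens, tokens[1:]):
            transitions[current].append(following)
    return dict(transitions)


def generate(transitions, start, max_words=20, seed=0):
    """Output: walk the learned word patterns to produce text."""
    rng = random.Random(seed)
    words = [start]
    # Stop at the length limit or when the last word has no known successor.
    while len(words) < max_words and words[-1] in transitions:
        words.append(rng.choice(transitions[words[-1]]))
    return " ".join(words)


# Hypothetical training data: a single sentence from a made-up review.
corpus = ["the new processor runs cool and fast under heavy load"]
model = interpret(ingest(corpus))
text = generate(model, "the")
print(text)
```

Because every word in this one-sentence corpus has exactly one possible successor, the generated text is the original sentence verbatim. Feeding in more and more varied documents gives the model more choices at each step, which makes exact repetition of any single source less likely but does not rule it out.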

The concerns over Generative AI

Because the output is really just a rehashing of the content fed into it, consumers are understandably worried that the output could ethically or legally violate the rights of the original content creators. In particular, training these systems on data without the permission of those creators could be an ethical violation.

Avram has shown several times over the past few months that content generated by these AI systems has taken whole sentences and passages from the original works and presented them as new content. In a Piltch Point episode, he shared some of his research showing Google SGE stealing content wholesale with no attribution, links, or acknowledgements.
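One simple way to check for this kind of copying is to look for runs of consecutive words that appear in both the generated text and a suspected source. The sketch below is a hypothetical illustration of that idea and is not the method used in the research mentioned above; the function names and the five-word window are assumptions chosen for the example.

```python
def word_ngrams(text, n):
    """Return the set of all runs of n consecutive words in the text."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def shared_passages(generated, source, n=5):
    """Word runs of length n that appear in both texts."""
    return word_ngrams(generated, n) & word_ngrams(source, n)


def overlap_ratio(generated, source, n=5):
    """Fraction of the generated text's word runs also found in the source."""
    gen = word_ngrams(generated, n)
    if not gen:
        return 0.0  # Text shorter than n words has nothing to compare.
    return len(gen & word_ngrams(source, n)) / len(gen)


# Made-up example texts, not taken from any real review or AI output.
source = "the chip stays cool under heavy load and sips power at idle"
generated = "reviewers found that the chip stays cool under heavy load in most tests"

for passage in sorted(shared_passages(generated, source)):
    print(" ".join(passage))
print(round(overlap_ratio(generated, source), 2))
```

A check like this only catches word-for-word reuse. Lightly reworded passages would slip through, and a high score is a reason to look closer, not a legal finding.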

Microsoft's Copilot solution

So, if Google is reproducing copyrighted materials and calling it new, the other companies are likely in the same boat. Because of this, Microsoft is trying to get out ahead of the problem. Rather than claiming that Copilot's output is always 100% new and unique, the company is saying that if there is a problem, it is on us, not on you. The announcement says, in part,

Specifically, if a third party sues a commercial customer for copyright infringement for using Microsoft's Copilots or the output they generate, we will defend the customer and pay the amount of any adverse judgments or settlements that result from the lawsuit, as long as the customer used the guardrails and content filters we have built into our products.

This is a nice gesture, but it doesn't necessarily solve the problem. Sure, if you get sued for copyright infringement, Microsoft will defend you and pay any legal damages. But there's a lot more than just legal damages involved. Companies hit by these lawsuits, especially those that lose, also take a hit to their reputation.

Take, for example, a news website. Instead of writing an article about a particular computer processor from scratch, the staff uses a generative AI system to write it from a concept. The AI system goes to the catalog of data it has ingested and interpreted and writes the article. But, because the processor is a new device, there is very little data available. So, the system finds things that are close and pulls whole passages from a product review on another website. When the original website finds the theft and sues, Microsoft's protection will not undo the reputation hit that comes along with a news site being accused of stealing content from another site. No amount of Microsoft's legal dollars is going to help repair that reputation.

F5 Live: Refreshing Technology

September 10, 2023 - Episode 654

Sunday Sep 10, 2023 (01:52:19)

Description

This week, Microsoft is defending Copilot users, ReedPop is abandoning E3, X wants to keep its secrets, and Sony sues a TV Museum for copyright violation.

Participants

Scott Ertz

Host

Scott is a developer who has worked on projects of varying sizes, including all of the PLUGHITZ Corporation properties. He is also known in the gaming world for his time supporting the rhythm game community, through DDRLover and hosting tournaments throughout the Tampa Bay Area. Currently, when he is not working on software projects or hosting F5 Live: Refreshing Technology, Scott can often be found returning to his high school days working with the Foundation for Inspiration and Recognition of Science and Technology (FIRST), mentoring teams and helping with ROBOTICON Tampa Bay. He has also helped found a student software learning group, the ASCII Warriors, currently housed at AMRoC Fab Lab.

Avram Piltch

Host

Avram's been in love with PCs since he played original Castle Wolfenstein on an Apple II+. Before joining Tom's Hardware, for 10 years, he served as Online Editorial Director for sister sites Tom's Guide and Laptop Mag, where he programmed the CMS and many of the benchmarks. When he's not editing, writing or stumbling around trade show halls, you'll find him building Arduino robots with his son and watching every single superhero show on the CW.

Opening

Nifty Gifties

Microsoft will assume responsibility for Copilot plagiarized content

Piltch Point with Avram Piltch

Extra Life

E3 is almost certainly done for good as ReedPop abandons sinking ship

The past decade or so has been rough for the former gaming behemoth: the Electronic Entertainment Expo, better known as E3. The managing organization, the ESA, has been a powder keg of chaos and general incompetence. The COVID-19 pandemic, or more specifically the lockdowns, created massive problems for the already struggling event. ReedPop jumped in to help try to right the ship, but this week, the company said it was ending its relationship with E3 and the ESA was on its own going forward.

News From the Tubes

X Corp. sues California to keep its secret sauce from the users

Since Elon Musk took over Twitter, things have been chaotic, to say the least. Between name changes, policy changes, and a nearly complete employee change, keeping up with what's going on can be a challenge. One thing that was promised upon takeover was that the new Twitter, which is now called X, would be open, transparent, and less regulated. That hasn't exactly been the reality, and California's content moderation law AB 587 looks to force open the doors revealing the way social networks, including X, work inside.

DRM Not Included

Preservation vs Copyright: Sony issues strikes against TV Museum

There is a fine line between what does and doesn't fall under fair use. That line came into clear view this week as the Museum of Classic Chicago TV received a series of copyright strikes from Sony Pictures Entertainment. The strikes revolve around episodes of Bewitched from the 1960s that had been posted to YouTube under the concept of preservation. The strikes would have terminated the channel had SPE followed through on the threat, but backed down when Chief Curator Rick Klein removed the content.
