OpenAI’s news blues
A New York Times lawsuit won’t be the last challenge to LLMs’ free-riding. PLUS: A note on Substack and Nazis
Happy New Year! Today, let’s talk about one of the biggest pieces of news from the break — and explore why the days of freely training large language models on every public website are likely over for good.
The news, of course, is that the New York Times is suing OpenAI and Microsoft. Here are Michael M. Grynbaum and Ryan Mac:
The Times is the first major American media organization to sue the companies, the creators of ChatGPT and other popular A.I. platforms, over copyright issues associated with its written works. The lawsuit, filed in Federal District Court in Manhattan, contends that millions of articles published by The Times were used to train automated chatbots that now compete with the news outlet as a source of reliable information.
The suit does not include an exact monetary demand. But it says the defendants should be held responsible for “billions of dollars in statutory and actual damages” related to the “unlawful copying and use of The Times’s uniquely valuable works.” It also calls for the companies to destroy any chatbot models and training data that use copyrighted material from The Times.
First, a reminder / disclosure: I co-host the Hard Fork podcast for the Times. As such, I want to be clear that any opinions here are my own, and in fact I’m not going to speak to the merits of the case at all — other than to say that I think the complaint makes for compelling reading. (Particularly the portions alleging that ChatGPT regurgitated long sections of Times articles verbatim after being prompted.)
OpenAI, for its part, told the Times reporters that it had been in negotiations with the paper and was “surprised and disappointed” by the lawsuit.
I’m interested in the case because I think generative AI has the potential to reshape the economics of journalism and the web to favor the builders of AI models over digital publishers. Already I find myself regularly looking up non-critical information via AI chatbot rather than Google, a habit that is generally faster than the alternatives but also deprives publishers of the advertising revenue they might otherwise get from me visiting their websites.
The Times case is important because it tests the legality of these fast-growing services on copyright grounds. The question is whether the specific ways that LLMs process data will be found to be covered by fair use — an as-yet unsettled but critical question for the future of digital media.
The Times reporters have the relevant argument from the complaint:
“Defendants seek to free-ride on The Times’s massive investment in its journalism,” the complaint says, accusing OpenAI and Microsoft of “using the Times’s content without payment to create products that substitute for The Times and steal audiences away from it.”
In Bloomberg, Noah Feldman weighs arguments for and against the Times’ complaint and concludes that the company is right to believe that LLMs are using its work in a way that harms its business interests. At the same time, he writes, copyright law alone will likely be insufficient to protect journalism:
Here is where the fundamental public interest in the maintenance of the free press becomes relevant to the fair use question. If you can get information more cheaply from an LLM than from the New York Times, you might drop your subscription. But if everyone did that, there would be no New York Times at all. Put another way, OpenAI and Microsoft need the New York Times and other news organizations to exist if they are to provide reliable news as part of their service. Rationally and economically, therefore, they ought to be obligated to pay for the information they are using.
Fitting this powerful public interest into copyright law won’t be simple for the courts. Literal copying is the easiest form of infringement to punish. In ordinary legal circumstances, if LLMs change words sufficiently to be summarizing rather than copying, that weakens the Times’ case. Yet summaries in different words would still be sufficient to kill the Times and similar organizations — and leave us newsless.
For these reasons, AI model makers and publishers are both incentivized to build partnerships here. And over the past few months, OpenAI has made a handful of licensing deals, including with the Associated Press and with Axel Springer, the parent company of Politico and Business Insider.
But as the Times case shows, not every publisher will come to terms with LLM makers. What then?
The recent history of Google and Meta offers one possibility.
As Google search and Facebook grew in popularity during the 2010s, they became key destinations for consumers to get news. They also built advertising products far superior to those of most publishers, and the news business continued to shrink as a result.
Platforms’ outsized success is often perceived to have come at the expense of journalism, leading to calls for Google and Meta to subsidize the industry. Most publishers continue to benefit more from Google and Meta than the reverse: tech companies send a ton of traffic to publishers, which publishers monetize via ads. But platforms can mostly live without news.
In any case, platforms did cut various deals with digital publishers over the years, just as OpenAI is seeking to do now. But as social sharing habits shifted, news became less important to Meta than it had been in the 2010s, and the company wound down its partnerships. And while news remained more important to Google as a necessary component of search, the company’s journalism initiatives never amounted to more than a rounding error in its annual budget.
Then in 2021, regulators in Australia had an idea. Why not tax Google and Meta for the right to display links to news publishers? The country passed a law requiring the platforms to negotiate fees with publishers or be forced into binding arbitration with an Australian panel; Meta briefly stopped displaying news links in the country as a result. In the end, though, both platforms caved.
Now similar legislation is popping up around the globe. Most recently, Canada passed a link tax of its own. New Zealand and the United Kingdom have floated their own variations.
I don’t like the idea of taxing companies for displaying links; the web relies on people being able to freely link to one another, and the laws passed in Australia and Canada are little more than shakedowns that do almost nothing to ensure that the resulting revenues are spent on journalism.
At the same time, now that two countries have shown it is possible to wring a few dollars out of platforms this way, many more seem likely to follow.
And that seems notable in the context of OpenAI, which also displays links to publishers when citing sources in ChatGPT. (Microsoft, which uses OpenAI’s APIs in its Bing search results, does so as well.)
Assuming OpenAI and its rivals continue to grow quickly — possibly at the expense of the digital publishers they are citing in links that few people ever click on — it seems plausible that they will be pulled into a regulatory scheme similar to the one Google and Meta already operate under.
A better solution would be for AI developers to pay a fair price to any publisher from whom they are deriving significant, ongoing value. If they don’t, though, the experience of Meta and Google suggests that eventually big publishers will get paid one way or another.
Substack and Nazis
Thanks to everyone who wrote to us over the break regarding Substack’s response to criticism that it is hosting and monetizing dozens of extremist publications, including some that are openly advocating for genocide against Jewish people.
In the wake of Jonathan M. Katz’s November article on the subject in The Atlantic, 247 Substack writers published an open letter asking the company to clarify its policies. On December 21, Substack co-founder Hamish McKenzie delivered the company’s answer in a blog post. While allowing that “we don’t like Nazis either” and stating that Substack wished “no-one held those views,” McKenzie said that “some people do hold those and other extreme views.”
“Given that, we don't think that censorship (including through demonetizing publications) makes the problem go away — in fact, it makes it worse,” he wrote. “We believe that supporting individual rights and civil liberties while subjecting ideas to open discourse is the best way to strip bad ideas of their power.”
This statement appeared to be a tacit admission that Substack is not always enforcing its own content guidelines, which state that “Substack cannot be used to publish content or fund initiatives that incite violence based on protected classes.” It also seems all but certain to worsen the problem by inviting Nazis to Substack and telling them explicitly that they can make money there. (Substack takes 10 percent of subscription revenue from every paid site on the network, including Platformer.)
Content moderation often involves difficult trade-offs, but this is not one of those cases. Rolling out a welcome mat for Nazis is, to put it mildly, inconsistent with our values here at Platformer. We have shared this in private discussions with Substack and are scheduled to meet with the company later this week to advocate for change.
Meanwhile, we’re now building a database of extremist Substacks. Katz kindly agreed to share with us a full list of the extremist publications he reviewed prior to publishing his article, most of which were not named in the piece. We’re currently reviewing them to get a sense of how many of these accounts are active or monetized, and how many display Nazi imagery or use genocidal rhetoric.
We plan to share our findings both with Substack and, if necessary, its payments processor, Stripe. Stripe’s terms prohibit its service from being used by “any business or organization that a. engages in, encourages, promotes or celebrates unlawful violence or physical harm to persons or property, or b. engages in, encourages, promotes or celebrates unlawful violence toward any group based on race, religion, disability, gender, sexual orientation, national origin, or any other immutable characteristic.”
It is our hope that Substack will reverse course and remove all pro-Nazi material under its existing anti-hate policies. If it chooses not to, we plan to leave the platform.
We’ll share our complete findings in another post when it is ready. In the meantime, we want to hear from you. Have you unsubscribed or considered unsubscribing from Platformer or other Substacks over this issue? If so, email us and we’ll share a selection of the feedback with Substack.
Talk about this edition with us in Discord: This link will get you in for the next week.
Governing
X lost its bid to block a California law that requires companies to disclose content moderation policies and how they’re enforced. (Peter Blumberg and Malathi Nayak / Bloomberg)
Facebook suspended the far-right Libs of TikTok account for violating community standards, but restored it after owner Chaya Raichik appealed. (Tracy Connor / Daily Beast)
Donald Trump’s former fixer Michael D. Cohen says he accidentally gave his lawyer fake legal citations generated by Google Bard, not realizing that Bard is a generative AI service. (Benjamin Weiser and Jonah E. Bromwich / The New York Times)
The effective altruism movement is influencing policymakers to prioritize the existential risks of AI. Now, effective altruists are clashing with AI optimists and others concerned about AI’s more immediate dangers. (Brendan Bordelon / Politico)
US Supreme Court Chief Justice John Roberts is urging “caution and humility” as AI reshapes judicial work. (John Kruzel / Reuters)
Google will settle a 2020 lawsuit that alleged Chrome’s Incognito mode still tracks user data. (Eric Bangeman / Ars Technica)
A large network of YouTube channels spreading Chinese propaganda was leveraging AI-generated voice-overs. It got more than 120 million views before anyone at YouTube realized what was going on. Not great! (Tiffany Hsu and Steven Lee Myers / The New York Times)
Some apps that allow people to earn money, like eBay and Airbnb, will report those earnings to tax authorities, under a new international agreement that includes the US, UK, and many European countries. (Ben Lovejoy / 9to5Mac)
EU competition chief Margrethe Vestager argues that the AI Act will enhance technology and create “legal certainty” for companies. (Javier Espinoza / Financial Times)
Industry
OpenAI reportedly topped $1.6 billion in annualized revenue due to the growth of ChatGPT — a 20 percent jump over two months. (Maria Heeter, Amir Efrati and Stephanie Palazzolo / The Information)
Alexander Reben, an MIT-educated technologist who studies the impact of innovations on creativity, will become OpenAI’s first artist-in-residence. (Leslie Katz / The New York Times)
Microsoft’s AI Image Creator, powered by DALL-E 3, can be used to generate realistic violent images of women, minorities, politicians, and celebrities. (Geoffrey A. Fowler / Washington Post)
The Microsoft Copilot app is now being rolled out on iOS and iPadOS, giving users access to GPT-4 and DALL-E 3. (Emma Roth / The Verge)
Fidelity marked down the value of its X shares again. The financial institution now says X is worth 71.5 percent less than it was when Elon Musk bought it. (Dan Primack / Axios)
X is bringing back headlines in posts that contain links, but now the font will be tiny. Whatever! (Ben Schoon / 9to5Google)
Telegram’s latest Android update has a new look for voice and video calls, and a new delete animation. (Ben Schoon / 9to5Google)
Writers could use AI the same way poets and novelists once used séances and Ouija boards – as a source of inspiration, this essay argues. (A.O. Scott / The New York Times)
Brands have found a new way to cut advertising costs – using AI-generated virtual influencers rather than real-life ones. It turns out AI influencers have fewer opinions. (Cristina Criddle / Financial Times)
People are creating AI-generated replicas of living experts without their consent. (Mohar Chatterjee / Politico)
Free event planning app Partiful is booming in popularity as a trendy alternative to Facebook Events. (Dan Rosenzweig-Ziff / Washington Post)
Those good posts
For more good posts every day, follow Casey’s Instagram stories.
Talk to us
Send us tips, comments, questions, and copyrighted LLM output: casey@platformer.news and zoe@platformer.news.