OpenAI’s news blues
A New York Times lawsuit won’t be the last challenge to LLMs’ free-riding. PLUS: A note on Substack and Nazis
Happy New Year! Today, let’s talk about one of the biggest pieces of news from the break — and explore why the days of freely training large language models on every public website are likely over for good.
The news, of course, is that the New York Times is suing OpenAI and Microsoft. Here are Michael M. Grynbaum and Ryan Mac:
The Times is the first major American media organization to sue the companies, the creators of ChatGPT and other popular A.I. platforms, over copyright issues associated with its written works. The lawsuit, filed in Federal District Court in Manhattan, contends that millions of articles published by The Times were used to train automated chatbots that now compete with the news outlet as a source of reliable information.
The suit does not include an exact monetary demand. But it says the defendants should be held responsible for “billions of dollars in statutory and actual damages” related to the “unlawful copying and use of The Times’s uniquely valuable works.” It also calls for the companies to destroy any chatbot models and training data that use copyrighted material from The Times.
First, a reminder / disclosure: I co-host the Hard Fork podcast for the Times. As such, I want to be clear that any opinions here are my own, and in fact I’m not going to speak to the merits of the case at all — other than to say that I think the complaint makes for compelling reading. (Particularly the portions alleging that ChatGPT regurgitated long sections of Times articles verbatim after being prompted.)
OpenAI, for its part, told the Times reporters that it had been in negotiations with the paper and was “surprised and disappointed” by the lawsuit.
I’m interested in the case because I think generative AI has the potential to reshape the economics of journalism and the web to favor the builders of AI models over digital publishers. Already I find myself regularly looking up non-critical information via AI chatbot rather than Google, a habit that is generally faster than the alternatives but also deprives publishers of the advertising revenue they might otherwise get from me visiting their websites.
The Times case is important because it tests the legality of these fast-growing services on copyright grounds. The question is whether the specific ways that LLMs process data will be found to be covered by fair use — an as-yet unsettled but critical question for the future of digital media.
The Times reporters have the relevant argument from the complaint:
“Defendants seek to free-ride on The Times’s massive investment in its journalism,” the complaint says, accusing OpenAI and Microsoft of “using the Times’s content without payment to create products that substitute for The Times and steal audiences away from it.”
In Bloomberg, Noah Feldman weighs arguments for and against the Times’ complaint and concludes that the company is right to believe that LLMs are using its work in a way that harms its business interests. At the same time, he writes, copyright law alone will likely be insufficient to protect journalism:
Here is where the fundamental public interest in the maintenance of the free press becomes relevant to the fair use question. If you can get information more cheaply from an LLM than from the New York Times, you might drop your subscription. But if everyone did that, there would be no New York Times at all. Put another way, OpenAI and Microsoft need the New York Times and other news organizations to exist if they are to provide reliable news as part of their service. Rationally and economically, therefore, they ought to be obligated to pay for the information they are using.
Fitting this powerful public interest into copyright law won’t be simple for the courts. Literal copying is the easiest form of infringement to punish. In ordinary legal circumstances, if LLMs change words sufficiently to be summarizing rather than copying, that weakens the Times’ case. Yet summaries in different words would still be sufficient to kill the Times and similar organizations — and leave us newsless.
For these reasons, AI model makers and publishers are both incentivized to build partnerships here. And over the past few months, OpenAI has made a handful of licensing deals, including with the Associated Press and with Axel Springer, the parent company of Politico and Business Insider.
But as the Times case shows, not every publisher will come to terms with LLM makers. What then?
The recent history of Google and Meta offers one possibility.
As Google search and Facebook grew in popularity during the 2010s, they became key destinations for consumers to get news. They also built advertising products far superior to those of most publishers, and the news business continued to shrink as a result.
Platforms’ outsized success is often perceived to have come at the expense of journalism, leading to calls for Google and Meta to subsidize the industry. Most publishers continue to benefit more from Google and Meta than the reverse: tech companies send a ton of traffic to publishers, which publishers monetize via ads. But platforms can mostly live without news.
In any case, platforms did cut various deals with digital publishers over the years, just as OpenAI is seeking to do now. But as social sharing habits shifted, news became less important to Meta than it had been in the 2010s, and the company wound down its partnerships. And while news remained more important to Google as a necessary component of search, the company’s journalism initiatives never amounted to more than a rounding error in its annual budget.
Then in 2021, regulators in Australia had an idea. Why not tax Google and Meta for the right to display links to news publishers? The country passed a law requiring the platforms to negotiate fees with publishers or be forced into binding arbitration with an Australian panel; Meta briefly stopped displaying news links in the country as a result. In the end, though, both platforms caved.
Now similar legislation is popping up around the globe. Most recently, Canada passed a link tax of its own. New Zealand and the United Kingdom have floated their own variations.
I don’t like the idea of taxing companies for displaying links; the web relies on people being able to freely link to one another, and the laws passed in Australia and Canada are little more than shakedowns that do almost nothing to ensure that the resulting revenues are spent on journalism.
At the same time, now that two countries have shown it is possible to wring a few dollars out of platforms this way, many more seem likely to follow.
And that seems notable in the context of OpenAI, which also displays links to publishers when citing sources in ChatGPT. (Microsoft, which uses OpenAI’s APIs in its Bing search results, does so as well.)
Assuming OpenAI and its rivals continue to grow quickly — possibly at the expense of the digital publishers they are citing in links that few people ever click on — it seems plausible that they will be pulled into a regulatory scheme similar to the one Google and Meta already operate under.
A better solution would be for AI developers to pay a fair price to any publisher from whom they are deriving significant, ongoing value. If they don’t, though, the experience of Meta and Google suggests that eventually big publishers will get paid one way or another.
Substack and Nazis
Thanks to everyone who wrote to us over the break regarding Substack’s response to criticism that it is hosting and monetizing dozens of extremist publications, including some that are openly advocating for genocide against Jewish people.
In the wake of Jonathan M. Katz’s November article on the subject in The Atlantic, 247 Substack writers published an open letter asking the company to clarify its policies. On December 21, Substack co-founder Hamish McKenzie delivered the company’s answer in a blog post. While allowing that “we don’t like Nazis either” and stating that Substack wished “no-one held those views,” McKenzie said that “some people do hold those and other extreme views.”
“Given that, we don't think that censorship (including through demonetizing publications) makes the problem go away — in fact, it makes it worse,” he wrote. “We believe that supporting individual rights and civil liberties while subjecting ideas to open discourse is the best way to strip bad ideas of their power.”
This statement appeared to be a tacit admission that Substack is not always enforcing its own content guidelines, which state that “Substack cannot be used to publish content or fund initiatives that incite violence based on protected classes.” It also seems all but certain to worsen the problem by inviting Nazis to Substack and telling them explicitly that they can make money there. (Substack takes 10 percent of subscription revenue from every paid site on the network, including Platformer.)
Content moderation often involves difficult trade-offs, but this is not one of those cases. Rolling out a welcome mat for Nazis is, to put it mildly, inconsistent with our values here at Platformer. We have shared this in private discussions with Substack and are scheduled to meet with the company later this week to advocate for change.
Meanwhile, we’re now building a database of extremist Substacks. Katz kindly agreed to share with us a full list of the extremist publications he reviewed prior to publishing his article, most of which were not named in the piece. We’re currently reviewing them to get a sense of how many of these accounts are active or monetized, and how many display Nazi imagery or use genocidal rhetoric.
We plan to share our findings both with Substack and, if necessary, its payments processor, Stripe. Stripe’s terms prohibit its service from being used by “any business or organization that a. engages in, encourages, promotes or celebrates unlawful violence or physical harm to persons or property, or b. engages in, encourages, promotes or celebrates unlawful violence toward any group based on race, religion, disability, gender, sexual orientation, national origin, or any other immutable characteristic.”
It is our hope that Substack will reverse course and remove all pro-Nazi material under its existing anti-hate policies. If it chooses not to, we plan to leave the platform.
We’ll share our complete findings in another post when it is ready. In the meantime, we want to hear from you. Have you unsubscribed or considered unsubscribing from Platformer or other Substacks over this issue? If so, email us and we’ll share a selection of the feedback with Substack.
Talk about this edition with us in Discord: This link will get you in for the next week.
Governing
X lost its bid to block a California law that requires companies to disclose content moderation policies and how they’re enforced. (Peter Blumberg and Malathi Nayak / Bloomberg)
Facebook suspended the far-right Libs of TikTok account for violating community standards, but restored it after owner Chaya Raichik appealed. (Tracy Connor / Daily Beast)
Donald Trump’s former fixer Michael D. Cohen says he accidentally gave his lawyer fake legal citations generated by Google Bard, not realizing that Bard is a generative AI service. (Benjamin Weiser and Jonah E. Bromwich / The New York Times)
The effective altruism movement is influencing policymakers to prioritize the existential risks of AI. Now, effective altruists are clashing with AI optimists and others concerned about AI’s more immediate dangers. (Brendan Bordelon / Politico)
US Supreme Court Chief Justice John Roberts is urging “caution and humility” as AI reshapes judicial work. (John Kruzel / Reuters)
Google will settle a 2020 lawsuit that alleged Chrome’s Incognito mode still tracks user data. (Eric Bangeman / Ars Technica)
A large network of YouTube channels spreading Chinese propaganda was leveraging AI-generated voice-overs. It got more than 120 million views before anyone at YouTube realized what was going on. Not great! (Tiffany Hsu and Steven Lee Myers / The New York Times)
Some apps that allow people to earn money, like eBay and Airbnb, will report those earnings to tax authorities, under a new international agreement that includes the US, UK, and many European countries. (Ben Lovejoy / 9to5Mac)
EU competition chief Margrethe Vestager argues that the AI Act will enhance technology and create “legal certainty” for companies. (Javier Espinoza / Financial Times)
Industry
OpenAI reportedly topped $1.6 billion in annualized revenue due to the growth of ChatGPT — a 20 percent jump over two months. (Maria Heeter, Amir Efrati and Stephanie Palazzolo / The Information)
Alexander Reben, an MIT-educated technologist who studies the impact of innovations on creativity, will become OpenAI’s first artist-in-residence. (Leslie Katz / The New York Times)
Microsoft’s AI Image Creator, powered by DALL-E 3, can be used to generate realistic violent images of women, minorities, politicians, and celebrities. (Geoffrey A. Fowler / Washington Post)
The Microsoft Copilot app is now being rolled out on iOS and iPadOS, giving users access to GPT-4 and DALL-E 3. (Emma Roth / The Verge)
Fidelity marked down the value of its X shares again. The financial institution now says X is worth 71.5 percent less than it was when Elon Musk bought it. (Dan Primack / Axios)
X is bringing back headlines in posts that contain links, but now the font will be tiny. Whatever! (Ben Schoon / 9to5Google)
Telegram’s latest Android update has a new look for voice and video calls, and a new delete animation. (Ben Schoon / 9to5Google)
Writers could use AI the same way poets and novelists once used séances and Ouija boards – as a source of inspiration, this essay argues. (A.O. Scott / The New York Times)
Brands have found a new way to cut advertising costs – using AI-generated virtual influencers rather than real-life ones. It turns out AI influencers have fewer opinions. (Cristina Criddle / Financial Times)
People are creating AI-generated replicas of living experts without their consent. (Mohar Chatterjee / Politico)
Free event planning app Partiful is booming in popularity as a trendy alternative to Facebook Events. (Dan Rosenzweig-Ziff / Washington Post)
Those good posts
For more good posts every day, follow Casey’s Instagram stories.
Talk to us
Send us tips, comments, questions, and copyrighted LLM output: casey@platformer.news and zoe@platformer.news.