OpenAI wants to moderate your content
Can a chatbot solve the hardest problem on the internet? Moderation experts weigh in
Today let’s talk about a potentially transformative new tool that platforms are considering to bolster their trust and safety systems. Early reviews are good — but as always with platforms, how these systems are implemented will bear close watching.
On Tuesday, OpenAI said that it has been using GPT-4, its latest publicly available large language model, to moderate content — and suggested that other platforms consider using its API to do the same.
“Content moderation demands meticulous effort, sensitivity, a profound understanding of context, as well as quick adaptation to new use cases, making it both time consuming and challenging,” the company said in a blog post. “Traditionally, the burden of this task has fallen on human moderators sifting through large amounts of content to filter out toxic and harmful material, supported by smaller vertical-specific machine learning models. The process is inherently slow and can lead to mental stress on human moderators.”
That last line is, to my mind, an understatement. In 2019, I wrote a series of articles about the long-term effects of moderating content on the contract workers at Facebook and Google entrusted with the task. Several of the people I spoke with developed post-traumatic stress disorder and other mental health challenges.
Much of the discussion around those articles focused rightly on the human toll of moderation. But in reporting them, I was also struck by the practical difficulties of large-scale content moderation. Platforms’ community standards change constantly, and thus must be enforced differently from day to day. Communicating those changes to a global workforce speaking dozens of languages introduces daily opportunities for mass confusion.
For platform policy makers, this introduces significant friction into the development and implementation of new rules. Mostly it just means that everything takes more time — time to distribute the policy, time to educate the moderators, time to test the effect of implementing it, and so on.
The promise of GPT-4 and other generative AI tools in content moderation is that they speed up these cycles. “You get the feedback in minutes, not months,” said Dave Willner, former head of trust and safety at OpenAI, who tested the API’s moderation abilities.
At the same time, you reduce the need for humans to manually review disturbing content that could leave them with PTSD or worse.
So how does it work?
Here’s how OpenAI describes it:
1. Once a policy guideline is written, policy experts can create a golden set of data by identifying a small number of examples and assigning them labels according to the policy.
2. Then, GPT-4 reads the policy and assigns labels to the same dataset, without seeing the answers.
3. By examining the discrepancies between GPT-4’s judgments and those of a human, the policy experts can ask GPT-4 to come up with reasoning behind its labels, analyze the ambiguity in policy definitions, resolve confusion and provide further clarification in the policy accordingly. We can repeat steps 2 and 3 until we are satisfied with the policy quality.
4. This iterative process yields refined content policies that are translated into classifiers, enabling the deployment of the policy and content moderation at scale.
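To make that loop concrete, here’s a minimal sketch of what a single iteration might look like using the OpenAI Python client. The policy text, label names, golden examples, and prompt wording are placeholders I’ve invented for illustration; OpenAI hasn’t published reference code for this workflow, so treat this as one plausible shape rather than the company’s actual implementation.

```python
# A minimal, hypothetical sketch of one iteration of the loop described above.
# The policy text, label names, and golden examples are invented placeholders.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

POLICY = "Label content K1 if it encourages self-harm; otherwise label it K0."

# The "golden set": a handful of examples hand-labeled by policy experts.
golden_set = [
    {"text": "Everyone would be better off without me.", "label": "K1"},
    {"text": "I stubbed my toe and it hurts so much.", "label": "K0"},
]

def gpt4_label(text: str) -> tuple[str, str]:
    """Ask GPT-4 to apply the policy, returning (label, reasoning)."""
    response = client.chat.completions.create(
        model="gpt-4",
        temperature=0,
        messages=[
            {
                "role": "system",
                "content": (
                    f"Apply this content policy:\n{POLICY}\n"
                    "Reply with the label on the first line and your reasoning on the second."
                ),
            },
            {"role": "user", "content": text},
        ],
    )
    label, _, reasoning = response.choices[0].message.content.partition("\n")
    return label.strip(), reasoning.strip()

# Surface disagreements so policy experts can tighten ambiguous wording.
for example in golden_set:
    label, reasoning = gpt4_label(example["text"])
    if label != example["label"]:
        print(f"Mismatch on {example['text']!r}: human={example['label']}, model={label}")
        print(f"Model reasoning: {reasoning}")
```

Each mismatch then becomes a prompt for the experts: either the policy language gets clarified or the golden label gets revisited, and the loop runs again until the model and the humans agree often enough.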
OK, but is it accurate? As easy as it is for me to understand the appeal of AI moderation, particularly insofar as it can reduce worker suffering, it still seems a bit too good to be true.
I asked Willner how accurate GPT-4 moderation is today. He pointed me to a scientific paper from earlier this year that found that an older version of ChatGPT outperformed human workers on average for tasks involving text labeling, which is functionally similar to moderation.
Top-level human moderators likely still outperform the AI — but perhaps not for long.
“Is it more accurate than me? Probably not,” Willner said. “Is it more accurate than the median person actually moderating? It’s competitive for at least some categories. And, again, there’s a lot to learn here. So if it’s this good when we don’t really know how to use it yet, it’s reasonable to believe it will get there, probably quite soon.”
OK, I hear you asking, but how do people who didn’t work at OpenAI feel about this? I asked around, and the answers I heard were notably positive.
Alex Stamos, director of the Stanford Internet Observatory, told me that students in his trust and safety engineering course this spring had tested GPT-4-based moderation tools against their own models, Google/Jigsaw’s Perspective model, and others.
“GPT-4 was often the winner, with only a little bit of prompt engineering necessary to get to good results,” said Stamos, who added that overall he found that GPT-4 works “shockingly well for content moderation.”
One challenge his students found was that GPT-4 is chattier than the models they’re used to building with; instead of returning a simple number reflecting how likely a piece of content is to violate a policy, it responds with paragraphs of text.
Still, Stamos said, “my students found it to be completely usable for their projects.”
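That chattiness is largely a prompting problem. As a hypothetical example, and not the students’ actual setup, you can instruct the model to reply with nothing but a score and treat anything else as a parse failure:

```python
# Hypothetical sketch: constraining a chatty model to a single numeric score.
# The prompt wording and fallback behavior are illustrative only.
from openai import OpenAI

client = OpenAI()

def violation_score(text: str, policy: str) -> float:
    """Return a 0-1 likelihood that `text` violates `policy`, or -1.0 if the
    model's reply couldn't be parsed as a number."""
    response = client.chat.completions.create(
        model="gpt-4",
        temperature=0,
        messages=[
            {
                "role": "system",
                "content": (
                    f"Content policy:\n{policy}\n"
                    "Rate how likely the user's message is to violate this policy. "
                    "Respond with ONLY a number between 0 and 1. No other text."
                ),
            },
            {"role": "user", "content": text},
        ],
    )
    raw = response.choices[0].message.content.strip()
    try:
        return max(0.0, min(1.0, float(raw)))
    except ValueError:
        return -1.0  # the model ignored the format instruction; flag for review
```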
Assuming AI takes over more front-line moderation tasks, what will be left for humans to do? Kate Klonick, a law professor at St. John’s University who studies content moderation, told me platforms had an opportunity to refocus workers on addressing appeals from users who believe the AI has gotten it wrong.
“Or areas of trust and safety or content moderation that require care,” added Klonick, who was briefed by OpenAI on its moderation API last week. “Obviously that’s a cost center, so maybe the saved money just gets saved instead of rerouted to better user experience. But ideally.”
If there’s a risk, she said, it’s that AI will tend to be less forgiving of edge cases than human moderators, who might extend more grace to their fellow humans.
“AI has a great chance at creating a higher baseline for good trust and safety,” she said. “But also, yes, probably more censorship at the margins. As is ever the case.”
I also asked Yoel Roth, former head of trust and safety at Twitter, what he made of the potential for LLM-based moderation. He offered several questions that I think trust and safety practitioners would want to consider before implementing ChatGPT or a similar tool. Among them:
How well-aligned are GPT-4's verdicts with the specific policies companies provide, rather than a generic version of "hate" or "toxicity"?
How consistent are decisions made by LLMs — and, especially importantly for platforms eyeing Digital Services Act compliance, can companies readily explain those decisions when they're required to do so?
Can ChatGPT moderation deal with the nuances of coded speech and reclaimed slang?
Does it work in languages other than English and cultural contexts other than the United States?
“AI for moderation might have great answers to all of these questions,” Roth told me, “but we need actual, comparative data-driven answers to know for sure.”
It’s also worth saying that, as with everything else related to generative AI, using GPT-4 for content moderation is more expensive than other automated tools. For the moment, platforms will likely have to layer it in with other, cheaper software.
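In practice, “layering it in” usually means a tiered pipeline: a cheap existing classifier screens everything, only the uncertain middle band goes to the expensive model, and humans handle what neither can resolve. Here’s a generic sketch of that pattern, reusing the hypothetical violation_score helper from above; the thresholds and the stubbed cheap_toxicity_score are placeholders, not any platform’s actual architecture.

```python
# Generic sketch of a tiered moderation pipeline: cheap model first, the LLM
# only for the uncertain middle band, humans for what neither can resolve.
# Thresholds are illustrative; cheap_toxicity_score is a stub to swap out.

def cheap_toxicity_score(text: str) -> float:
    """Stand-in for a small, vertical-specific classifier of the kind
    platforms already run. Replace with a real model."""
    raise NotImplementedError

def moderate(text: str, policy: str) -> str:
    cheap = cheap_toxicity_score(text)
    if cheap < 0.10:
        return "allow"    # confidently benign: skip the expensive LLM call
    if cheap > 0.95:
        return "remove"   # confidently violating: skip the expensive LLM call
    llm = violation_score(text, policy)  # the GPT-4 helper sketched earlier
    if llm < 0:
        return "human_review"  # unparseable model reply: escalate to a person
    return "remove" if llm >= 0.5 else "allow"
```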
We’re clearly only at the beginning of the story of how generative AI could change moderation. For the moment, though, I’m struck by the optimism it has inspired in the handful of top practitioners I’ve spoken to about it.
It won’t be a panacea for our conflicts over speech issues on the internet — nothing will. Still, Willner told me, “it feels like a real breakthrough in a way that we haven’t had in quite a while.”
Pushback
Thanks to everyone who wrote in with their thoughts on yesterday’s piece about Elon Musk. While some of you appreciated it, I got more notes than usual from readers who are extremely over reading about the owner of X, particularly opinion-driven pieces about his personality. Even when he is making dark threats about a fellow CEO. You told me that you come to Platformer for news and analysis, and that yesterday I strayed too far into the realm of the hot take.
Message received. I have now said all I can imagine saying about Musk’s personality, and while surely there will be future antics that drop my jaw, I will avoid writing about them at column length. (Interested readers may still find news items about these events in our links section.)
We’re still interested in the story of X as a business, and to the extent that Musk’s actions affect it, I consider them fair game. (Particularly when we can contribute original reporting to the conversation.)
But if he’s just being a creep online, we’ll do our best to look away.
Talk about this edition with us in Discord: This link will get you in for the next week.
Want to hang out with me in person? Applications are open for this year’s Code Conference, hosted by me, The Verge’s Nilay Patel, and CNBC’s Julia Boorstin. Join us for live, on-stage journalism with X/Twitter CEO Linda Yaccarino, GM CEO Mary Barra, Microsoft CTO Kevin Scott, and many more speakers to come. It’s all happening September 26th and 27th at The Ritz-Carlton, Laguna Niguel. Follow the latest here.
Governing
China’s sweeping AI regulations, which take effect on Aug. 15, will try to balance much-needed safety and privacy protections with a desire to help Chinese firms compete on the global stage. (Sarah Zheng and Jane Zhang / Bloomberg)
President Biden’s administration urged the U.S. Supreme Court to strike down portions of social media laws in Florida and Texas that restrict how platforms can moderate speech. (Greg Stohr / Bloomberg)
Microsoft announced a new strike system for enforcing its Xbox community standards that will ban players for a year from using multiplayer services or voice chat after eight strikes. (Ash Parrish / The Verge)
YouTube announced a new moderation approach to “harmful or ineffective” cancer treatment videos, saying it will apply its medical misinformation policy to remove such content. (Jon Porter / The Verge)
Industry
Google announced a new feature for its experimental Search Generative Experience that will generate AI summaries of news articles. Google said it won’t work with paywalled articles, and is only available after the user has clicked the publisher’s link. (Jay Peters / The Verge)
X is throttling traffic to The New York Times, Meta products including Instagram and Threads, and Substack, using the platform’s link-shortening domain. One more reason for news organizations to abandon it. (Jeremy B. Merrill and Drew Harwell / The Washington Post)
X told advertising partners it would wind down its promoted follower ads, which have historically generated $100 million per year, to try to encourage other ad formats. The decision came from X’s product group, not its ad sales or revenue divisions. (Sara Fischer / Axios)
A far-reaching scam linked to advertising firm CPABuild has been using the promise of Fortnite and Roblox rewards to farm internet traffic and app downloads from young internet users. (Matt Burgess / Wired)
Substack added another social network-like feature with a new following option, so users can follow writers they like without having to fully commit to subscribing to their newsletter. I like this. (Jay Peters / The Verge)
YouTube Music added a new discovery feature called “Samples” that will feature 30-second music videos and other content uploaded directly by artists. (Abner Li / 9to5Google)
WhatsApp’s latest beta release contains a feature for creating custom in-app stickers using AI. (Techmeme)
Those good posts
For more good posts every day, follow Casey’s Instagram stories.
Talk to us
Send us tips, comments, questions, and AI moderators: casey@platformer.news and zoe@platformer.news.
I appreciated your post on Musk and I think it was important to emphasise the wider social and deeper personal context of his behaviour, especially given the reluctance/inability of other publishers to do so.
I think I could have easily dismissed it as ‘ah, more blowhard bs’, but placing it in the context of his history as an individual, his responsibilities as a business leader, and the communities he apes/values (e.g. Edgelord, Twitter) helped me appreciate the dangerous scale of his buffoonery.
So, thanks for writing it, even as I am also absolutely tired of the guy.
"So if it’s this good when we don’t really know how to use it yet, it’s reasonable to believe it will get there, probably quite soon."
I do not understand how people in tech have this level of apparently unfounded optimism after everything that's happened in the past 20 years.
OpenAI has said that AI could be very dangerous. Existing tech platforms have facilitated genocides and empowered dictators. But yeah I guess it's totally reasonable to assume this thing we don't understand will work out fine.
I can understand why this particular use of automation elicits a hopeful reaction though. I don't wish the job of sifting through the worst parts of the Internet on anyone. It would be good if we had effective content moderation tools that weren't cursed monkey paws so that humans didn't need to be exposed to that level of evil.
It's just that this AI stuff looks pretty paw-shaped at a distance.