AI's missing user interface
Do next-generation chatbots need to be more specialized — or just easier to use?
Today I want to use the launch of another high-profile generative AI bot to ask two questions. One, how many chatbots are most people going to use? And two, what role can design play in the success or failure of these platforms?
I.
The questions came to mind this week when I met Mustafa Suleyman. Suleyman co-founded the artificial intelligence pioneer DeepMind in 2010; Google acquired it in 2014 for a reported £400 million. Last year he co-founded a new AI company called Inflection with LinkedIn co-founder and investor Reid Hoffman; Hoffman was excited enough about its prospects that he quit the board of OpenAI to remove the conflict of interest. (The company has raised $225 million to date.)
Today, Inflection released its first product: Pi, a “personal AI” designed to help with more intimate, emotional requests. Relationships, family, work, stress, health, and mental health are among the subjects that Pi offered to help me with when I chatted with it. The idea is that you’ll use it less as a replacement for Google or Wikipedia and more to get advice and help plan your day.
And, like most chatbots of this kind, its interface is a box: you type into it, get a reply, and type again. (A voice-based version is also in testing; the synthetic voices sounded great to my ears.)
In time, Suleyman told me, most interfaces are going to be conversational like this: serving as confidants, coaches, sounding boards and creative partners as circumstances require. Inflection is betting that the best chatbots will also be specialized in some way — and so rather than one or two big winners, the AI market will produce many more.
For Pi, that means working harder than other chatbots do to understand your intent, and also to support you after your query — sometimes days afterward. Alex Konrad explained how in a nice profile of the company today in Forbes:
Test users have been putting Pi through its paces for the past several months. Whereas other chatbots might provide a handful of options to answer a query, Pi follows a dialog-focused approach; ask Pi a question, and it will likely respond with one of its own. Through 10 or 20 such exchanges, Pi can tease out what a user really wants to know, or is hoping to talk through, more like a sounding board than a repackaged Wikipedia answer, Suleyman said. And unlike other chatbots, Pi remembers 100 turns of conversation with logged-in users across platforms, supporting a web browser (heypi.com), phone app (iOS only to start), WhatsApp and SMS messages, Facebook messages and Instagram DMs. Ask Pi for help planning a dinner party in one, and it will check in on how the party went when you talk later on another.
As Konrad notes elsewhere in the piece, though, there is no shortage of bots out there attempting to do very similar work. OpenAI’s ChatGPT, Anthropic’s Claude, and Google’s Bard are all freely available and will also answer personal questions — although, at least in my tests, often without the softer touch that Pi provided. During a demo, I had Suleyman ask Pi a question that had shown up in the company’s user logs: “why does my husband want a divorce?” The first thing Pi did was tell me how sorry it was to hear that; it then offered some gentle suggestions for where my fictional marriage might have gone awry.
Contrast that with ChatGPT, which began: “As an AI language model, I don't have access to personal information or individual circumstances, so I can't say for sure why your husband asked you for a divorce.” (It did go on to say some supportive things, though.) Bard, helpfully, absolved me of all responsibility: “I can't tell you for sure why your husband asked for a divorce. There are many possible reasons, and it's important to remember that it's not your fault.”
If you find your eyes glazing over here, as mine often do when reading AI-generated text, you may be feeling the same thing that I am: that differences in answers between chatbots matter somewhat, but maybe not as much as the founders of AI companies might hope. At least in their current state, the ceiling on how compelling these bots’ text can be feels rather low — well suited to rote tasks like drafting emails or high school essays, but lousy at fully simulating a human therapist.
As a result, I suspect most people will primarily use the first bot they find useful, rather than employ a bunch of different ones for specialized uses. One, because today’s bots just aren’t all that differentiated. And two, because to the extent that they can become differentiated, it will likely be because you’ve agreed to share lots of personal data with them. And I can’t imagine most people wanting to type or upload lots of personal data to more than a small handful of these companies.
That’s not to say that we won’t use AI features in dozens or hundreds of apps — I’m confident that we will. But when it comes to a text- or voice-based interface you use to ask any question you would ask a search engine, coach, therapist, or personal assistant, I still can’t see why there would be more than two or three big winners there.
Listening to founders pitch their chatbots over the past couple months, I feel like a reporter covering search engines in 1998 — with the CEO of Infoseek telling me that people will use their engine for some queries, Excite for others, and Yahoo for still others. In reality, of course, we were just a couple years away from Google steamrolling over the whole field.
II.
Of course, it’s hard to project yourself too far forward in an industry that is evolving at an exponential pace. But there is one dimension where all the players are moving slowly enough to have a considered opinion: design. And it’s worth thinking about as we consider the prospects of all these would-be AI giants.
Amelia Wattenberger, a research engineer at GitHub, laid out the problem in a sharp (and beautifully designed) piece on her blog. The blank interfaces offered by all of these chatbots, she writes, are awful at telling you what they can do:
Good tools make it clear how they should be used. And more importantly, how they should not be used. If we think about a good pair of gloves, it's immediately obvious how we should use them. They're hand-shaped! We put them on our hands. And the specific material tells us more: metal mesh gloves are for preventing physical harm, rubber gloves are for preventing chemical harm, and leather gloves are for looking cool on a motorcycle.
Compare that to looking at a typical chat interface. The only clue we receive is that we should type characters into the textbox. The interface looks the same as a Google search box, a login form, and a credit card field.
Of course, users can learn over time what prompts work well and which don't, but the burden to learn what works still lies with every single user. When it could instead be baked into the interface.
Chatbots aren’t the only consumer technology to face this problem. In fact, the previous generation of chatbots fell victim to it in 2016.
More recently, the problem has afflicted smart speakers and AI assistants like Siri. I rarely ask my Echo or Apple Watch to do much more than set an alarm or check the weather — because, in a profound way, I don’t understand what else they can do.
Invisible interfaces like these can seem elegant to some designer types, if only for their lack of visual clutter. But they also put a real burden on the user, and at least in the case of Siri and smart speakers, that burden has absolutely limited their potential.
I continue to think that Pi-style chatbots have more potential than smart speakers, if only because they are already obviously so much more useful. Chegg, a publicly traded education tech company, watched its stock price drop by almost half today after announcing that so many students have replaced its products with ChatGPT that it can no longer offer full-year revenue guidance. When the alternative is actually doing your homework, students are happy to hack and slash their way through mystery interfaces.
At the same time, it doesn’t seem ideal that getting a great image out of a text-to-image generator like Midjourney requires appending something like “futuristic, character design, cinematic lighting, epic fantasy, hyper realistic, detail 8k --ar 9:16” to your query — tags you’d never know to add unless you had searched through various forums and Discord chatrooms.
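Here’s one way to picture the fix. Below is a minimal sketch, in TypeScript, of how an interface could compose those tags from structured controls instead of asking users to memorize them. (Everything here is hypothetical: the option names and the buildPrompt helper are invented for illustration, and Midjourney’s real parameter grammar is richer than this.)

```typescript
// A hypothetical sketch: exposing Midjourney-style prompt tags as
// structured options instead of memorized incantations.

type AspectRatio = "1:1" | "9:16" | "16:9";

interface ImagePromptOptions {
  subject: string;
  style?: "character design" | "epic fantasy" | "hyper realistic";
  lighting?: "cinematic lighting";
  detail?: "8k";
  aspectRatio?: AspectRatio;
}

// Compose the flat prompt string the model actually expects, so the
// UI can offer dropdowns and toggles while the user never has to
// learn the tag syntax.
function buildPrompt(opts: ImagePromptOptions): string {
  const tags = [
    opts.subject,
    opts.style,
    opts.lighting,
    opts.detail ? `detail ${opts.detail}` : undefined,
  ].filter((t): t is string => Boolean(t));
  const flags = opts.aspectRatio ? ` --ar ${opts.aspectRatio}` : "";
  return tags.join(", ") + flags;
}

// Example: a UI with a few controls produces the same string a power
// user would type by hand.
console.log(
  buildPrompt({
    subject: "a lighthouse at dusk",
    style: "hyper realistic",
    lighting: "cinematic lighting",
    detail: "8k",
    aspectRatio: "9:16",
  })
);
// => "a lighthouse at dusk, hyper realistic, cinematic lighting, detail 8k --ar 9:16"
```

The model still receives a plain string; the interface just stops pretending that string is the right surface for humans.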
In her post, Wattenberger suggests finding ways to make these features visible — and tuning models to ask us follow-up questions after our initial queries so that they can get us closer to what we want. “We can add controls, information, and affordances to our chatbot interfaces to make them more usable,” she writes.
To its credit, Inflection’s Pi does ask the user questions like this, to good effect: answering its follow-ups improved the responses I got. But an otherwise invisible interface leaves the rest of what the bot can do an ongoing puzzle for users to solve.
When the first round of chatbots began landing last year, it made sense that they presented themselves as all-purpose tools — ideal for attracting a broad audience and increasing the amount of feedback they would generate to improve their underlying large language models.
But as the second round lands, something different may be required — and it might not be what founders are thinking. Yes, it could be valuable to create an AI model that is highly specialized. But it might be even more valuable to create one that is, first and foremost, highly usable.
Talk about this edition with us in Discord: This link will get you in for the next week.
Governing
Pornhub blocked access to its website for anyone living in Utah following the state’s new age verification requirement, which goes into effect on Tuesday. It’s a good day to be selling VPN service in Utah. (Samantha Cole / Motherboard)
The current wave of child safety bills targeting internet platforms echoes a similar fight over obscenity and online speech in the ‘90s, which ultimately resulted in the Supreme Court striking down anti-indecency provisions. (Kashmir Hill and Natasha Singer / The New York Times)
Google continues to place ads alongside climate-denying content on its search engine and YouTube, despite promising in 2021 to stop doing so. (Nico Grant and Steven Lee Myers / The New York Times)
AI pioneer Geoffrey Hinton, who quit Google on Monday, said he “suddenly switched” his views on the dangers of generative AI and now worries it may influence elections and be used in warfare. (Will Douglas Heaven / MIT Technology Review)
An in-depth exploration of the legal debate around AI-generated songs illustrates how ill-equipped modern copyright law is for the ongoing generative music boom. (Mia Sato / The Verge)
Apple and Google are partnering on a Bluetooth update to prevent AirTags from being used to track people without their knowledge. (Kif Leswing / CNBC)
Industry
The Writers Guild of America called a strike after failing to negotiate with Hollywood studios on a new contract that would include higher pay and guardrails on how AI is used in the screenwriting process. (Lucas Ropek / Gizmodo)
Twitter said it would restore free API access for emergency, weather, and transportation alerts. The service is becoming less useful by the day, and this time even Elon Musk noticed. (Jon Fingas / Engadget)
Eric Han, the head of US trust and safety for TikTok, is leaving the company. Han was leading the project to separate US data from the rest of ByteDance; his departure will leave a void. (Alex Heath / The Verge)
IBM paused hiring for new jobs that managers believe can be replaced using AI, including non-consumer-facing roles like HR. CEO Arvind Krishna said he expects that about 30% of such roles, or roughly 7,800 jobs, will likely become automated. (Brody Ford / Bloomberg)
Samsung banned the use of ChatGPT among its workforce after employees uploaded sensitive internal code to the platform, raising concerns about data privacy. (Mark Gurman / Bloomberg)
Microsoft is planning a version of ChatGPT that will run on dedicated servers and keep customer data private and separate from other users to prevent data leaks. (Aaron Holmes and Jon Victor / The Information)
Snap said it would start testing sponsored links within its My AI chatbot by analyzing conversations and providing users with relevant links to businesses. (Sarah Perez / TechCrunch)
Meta announced new features for Facebook Reels, including personalization controls to see more content you prefer and discovery improvements that will surface relevant Reels within Facebook Watch. (Aisha Malik / TechCrunch)
BeReal launched a curated timeline of updates from high-profile users as it grapples with declining momentum and concerns about future growth. This is basically just a feed of verified users, and I do not understand it at all. (Kris Holt / Engadget)
Amazon had big ambitions for its Halo health and fitness product line, including an AI-powered personal trainer, before it abruptly shut it down last week. (Chris Welch / The Verge)
Those good tweets
For more good tweets every day, follow Casey’s Instagram stories.
Talk to us
Send us tips, comments, questions, and Pi transcripts: casey@platformer.news and zoe@platformer.news.
To add to this, Adobe's Firefly AI beta offers some nice interface enhancements for image generation: a browser for style, lighting, composition, and more, each represented by a chiclet containing the style name and a thumbnail of what it looks (or does) like. Clicking on one adds the chiclet to your prompt. It's a nice visual way to represent what would be command-line-style arguments in Midjourney. It's exciting to see a company embracing and experimenting with a visual approach to AI generation.
> I had Suleyman ask Pi a question that had shown up in the company’s user logs: “why does my husband want a divorce?”
Uh, should we be concerned that the CEO of a company that takes intimate, emotional requests from users is able to view those requests?