Voice is (mostly) Bad UI

Humane’s AI Pin is in the news and unsurprisingly, it’s been universally panned by reviewers. I don’t have anything to add there. I appreciate the courage to invent a new paradigm, but there are no excuses for shipping something so pointless after raising hundreds of millions.

What I want to discuss is voice as an interface.

Movies like Her have led us to believe that voice is the ultimate user interface: why use a keyboard or mouse when you can just talk things out? Just a few days ago, I heard about a talk claiming that it’s only a matter of time before every app becomes a voice interface that uses AI agents to accomplish tasks.

I can’t shake the feeling that these ideas reflect a shallow understanding of how the human mind actually works.

Of course, voice input is nowhere near polished enough today, but let’s assume it gets there in the near future. Is voice poised to take over as the universal interface? I don’t believe so, for several reasons.

Voice is inherently incapable of representing abstract thoughts: When we use tools, we rely heavily on our subconscious and abstract thinking — especially in activities like writing or research. These tasks tap into deeper, often nonverbal processes. I wouldn’t be able to finish an article if I had to narrate the whole thing aloud. That’s because, while I occasionally have fully formed sentences in mind, more often I’m working with fragments — scattered thoughts that need time and space to come together. I have to search for the right words, connect the right ideas, and track down the right information — all of which unfolds more naturally through typing and thinking, not talking.

Speaking engages the conscious part of the mind, which limits access to these subconscious, nonverbal capabilities.
Voice makes quite a few things harder, not easier: In theory, saying “order an Uber to the airport” seems like the easiest way to accomplish the task. But is it? What kind of Uber? UberXL, UberGo? There’s 1.5x surge pricing. Acceptable? Is the pickup point correct? Which would be easier: resolving each of those questions one by one as the computer asks them, or taking a quick look at the app yourself?

Another example is food ordering. Which would you prefer: browsing the menus of dozens of restaurants yourself, or constantly nudging the AI toward the option you want? Technological improvement can only help so much here, since users themselves often don’t know exactly what they want.
Voice invades privacy: Most people are naturally self-conscious and don’t like the idea of being seen talking to a computer in public. Even around friends or a partner, it can feel awkward. And it’s not just about making suspicious or unusual requests; people often worry about being judged for completely normal things. Take checking messages, for example. That alone could raise silent questions like, “Why does he need to check so often?” or “Doesn’t he have anything better to do?” The fear of judgment, even over something trivial, can be surprisingly strong.
A UI helps you understand the limits of software: The UI isn’t a limitation. It’s what helps us understand what a piece of software can actually do. Take Google Maps, for example. If you want a route that avoids narrow or poorly maintained roads, or one that takes you through scenic areas, it likely can’t help, and that’s not because of the UI but because it doesn’t have that kind of data. On the other hand, without the UI, you might never realize you can check traffic conditions between two locations for a specific departure time. The interface reveals the software’s capabilities and, sometimes, its limitations.
Users don’t like talking all the time: Using voice for simple tasks like setting a timer or alarm is fine, but relying on it for everything can quickly become frustrating. We often don’t realize how many micro-tasks we breeze through each day thanks to familiar interfaces — checking the calendar, reading emails, browsing messages, or just scrolling the web. Sure, some of that is about passing the time, but even that — the act of killing time — is a deeply ingrained part of our culture, and it’s not going anywhere.
Voice can’t resolve real-world ambiguities: Voice-only interfaces can easily lead to misunderstandings. For example, you might say, “Sell 500 shares,” thinking that’s 50% of your holdings. But in reality, 500 shares represent your entire holding. The system won’t know your intent—it will just do what it’s told, even if your instruction was based on a wrong assumption.

Of course, you can make the same mistake in a UI, but it’s harder to, since you get visual feedback about your action. An audio confirmation prompt won’t be as effective, because it’s not easy to visualize numbers in your head.

Overall, I’m not convinced that voice is the future. There are too many fundamental challenges with not having a screen to interact with. And even when you add a screen, voice often complicates tasks that could be done more efficiently with a well-designed UI. Even if AI could perfectly understand human speech with zero latency, I still don’t think we’d arrive at the kind of future companies like Humane are imagining.