Apr 16, 2024

Voice is (mostly) Bad UI

Humane’s AI Pin is in the news and, unsurprisingly, it’s been universally panned by reviewers. I don’t have anything to add there. I appreciate the courage to invent a new paradigm, but there is no excuse for shipping something so pointless after raising hundreds of millions.

What I want to discuss is voice as an interface.

Movies like Her have led us to believe that voice is the ultimate user interface. Why bother with a keyboard and mouse when you can just “talk it out”? Just a few days ago, I heard about a talk where someone claimed that it’s only a matter of time before every app becomes a voice interface that uses AI agents to accomplish tasks.

I can’t help but think that these assumptions rest on only a surface-level understanding of the human mind.

Of course, voice input is nowhere near polished enough today, but let’s assume it gets there in the near future. Is Voice poised to take over as the universal interface? I don’t believe so, for many reasons.

  1. Voice is inherently incapable of representing abstract thoughts: When we use tools, we rely heavily on our subconscious mind and abstract thoughts. Writing and researching, for example, both lean heavily on those abilities. I wouldn’t be able to finish any article if I had to narrate everything. That’s because while I sometimes have sentences fully formed in my brain, more often I only have bits and pieces, and I have to take time to string them together. I have to search for the right words, link the right ideas, or find the right research.

    Speaking, as I see it, pulls that work into the conscious part of the mind, where every thought has to be fully formed before it is said, and that limits these capabilities.

  2. Voice makes quite a few things harder, not easier: Theoretically, saying “order an Uber to the airport” seems like the easiest way to accomplish the task. But is it? What kind of Uber? UberXL, UberGo? There’s 1.5x surge pricing. Acceptable? Is the pickup point correct? What would be easier: resolving each of those queries by having a computer ask you questions, or taking a quick look at the app yourself?

    Another example is food ordering. What would you prefer: going through the menus of dozens of restaurants yourself, or constantly nudging the AI toward the desired option? Technological improvement can only help so much here, since users themselves don’t clearly know what they want.

  3. Voice is privacy-invading: Most humans are quite self-conscious, and they don’t want to be seen in public making requests to a computer. Hell, even with friends or a partner, it might get awkward. It’s not just about suspicious requests; people can get paranoid about being judged even for banal ones. Checking messages, for example. This could raise questions in onlookers’ minds like: “How often does this guy need to check his messages?”, “Doesn’t he have anything else to do?”.

  4. UI helps us understand the limits of software: UI is not a hindrance; it helps us understand the capabilities of software. Take Google Maps, for example. Say you want a route that avoids narrow or bad roads, or one that takes you through beautiful surroundings. It won’t be able to do that because it doesn’t have that data, and the UI makes that obvious by not offering the option. Nor would it be easy, without the UI, to discover that you can check the traffic conditions between two locations for a certain departure time.

  5. Users don’t like talking all the time: Using voice to accomplish simple tasks like setting a timer or an alarm is okay, but using it for everything can quickly become annoying. We don’t even realize how many micro-tasks we accomplish throughout the day through familiar interfaces: checking the calendar, emails, messages, browsing. Of course, some of these tasks are a way to kill time, but killing time, too, is an important part of our culture, and it isn’t going anywhere.

  6. Voice can’t resolve real-world ambiguities: There can be tons of ambiguities in an instruction that can’t be easily resolved by a voice-only interface. For example, say you want to sell 500 shares of a certain company, believing that amounts to 50% of your holdings. But that was a mistaken assumption: 500 shares actually represent 100% of your holdings. AI won’t be able to sense what you want to accomplish because it can’t read your mind.

    Of course, you can make the same mistake in a UI, but it would be more difficult since there’s visual feedback about your action. An audio confirmation prompt won’t be as effective, as it’s not easy to visualize the numbers in your head.

Overall, I am not convinced Voice is the future. There are too many inherent problems in not having a screen you can interact with. And even if you add a screen, Voice makes it harder to achieve many things that can be done more easily with a better UI. Even if AI were perfectly capable of understanding the human voice and had zero latency, I don’t think we would reach the future that Humane is envisioning.
