AI is already powerful. But in a future where AI is even more powerful, will our main way of interacting with it still be sitting at a computer, typing and staring at a screen?
I don't think so. I believe voice is the endgame of AI interaction. This article explains why.
1. Start with how executives actually work
My first job was in management consulting, where I worked closely with the top executives of many large companies. I noticed they all shared one trait: they rarely use computers.
How do they actually work? They hold meetings, make phone calls, send messages from their phones, and at most glance at a document on an iPad. In practice, their core work comes down to three things:
- Listening to briefings
- Making decisions
- Issuing instructions
Apart from the second step, which happens in their heads, the first and third steps are done mostly with the ears and mouth. English has the phrase "TL;DR" (too long, didn't read), and executives live by it. When they do read a document carefully, it is usually because they have to report to their own boss.
The most efficient way to take in information is conversation and distillation, not reading.
Conversation lets you zoom in quickly on what you care about, ask follow-up questions, and draw out insight. Static text can't do that.
2. Now everyone can work like an executive
Have you noticed? What you do when you work with AI looks almost exactly like the executive's three tasks:
- You hand AI a pile of material and have it pull out the key points. That's listening to a briefing.
- You make a judgment based on AI's analysis. That's making a decision.
- You have AI write code, draft an article, or do research. That's issuing instructions.
In the past, only executives could work this way, because they had people filtering information and carrying out instructions for them. Ordinary people didn't, so you had to read the documents yourself, write the reports yourself, and do all the typing yourself.
But now AI is becoming everyone's assistant. In principle, anyone can adopt the executive's way of working: using voice to direct AI to get things done.
3. So why can't you work as effortlessly as an executive yet?
If voice is so efficient, why are you still typing and staring at a screen? There are three reasons.
First: AI doesn't "get you" well enough yet
A good assistant is valuable not because of raw capability, but because they understand exactly what the boss needs and can tune information to the right level of detail for the boss's preferences, priorities, and decision-making style.
Today's AI can't do this yet. You still have to spell out your needs in a structured way, and pure voice is poorly suited to conveying highly structured content. Text still has the advantage here.
Second: response latency is still too high
I've felt this most acutely while building voice AI products myself: in a text chat I'll happily wait three seconds, but three seconds of silence in a voice conversation makes me want to hang up. People tolerate far less delay in voice than in text. Voice is also real-time, so the system has to handle being interrupted at any moment and respond again quickly. Current technology can't yet deliver a truly fluid experience.
Third: you're still thinking in terms of the tools you already have
This one isn't a technical problem but a mindset problem. Many people use AI as a "better search engine" or a "better text editor," reasoning about a new tool with the logic of old ones. But if you step back and look at where AI is heading, it's clear you should already be preparing for the way we'll collaborate with it in the future.
4. These problems will be solved
Given how fast AI is progressing, the first two technical problems won't last long. Models are getting better at understanding context, and they are getting faster and cheaper. Latency in speech recognition and synthesis keeps falling too.
Once you have an AI assistant that is smart enough, remembers enough, and responds fast enough, will you still need to sit at a computer staring at a screen?
I won't. I'll just ask it: "What does this document say? How does it affect my work?" Then we'll talk it through over a few rounds of voice discussion, and I'll have it go do the work.
5. In the future, everyone is an executive
So why do I believe voice is the future of AI interaction?
Because AI is giving everyone their own "team of assistants." And once you have good enough assistants, the most efficient way to work is no longer typing and staring at a screen. It's listening, thinking, and speaking.
In the future, everyone will be an executive. So start learning how executives work now.
This is what I work on every day: making voice interaction between people and AI genuinely useful. The road ahead is long, but I'm sure of the direction. This blog will keep exploring the same theme: how AI lets everyone work "the executive way."
What do you think: would you rather type or talk to AI? Let me know in the comments.
Direct your AI agents by voice: start with HeraldVox
HeraldVox is a voice interaction layer for coding agents. It works on both mobile and desktop, is end-to-end encrypted, and the current version is completely free.