What is HeraldVox
HeraldVox (唤达) is a voice interface layer for Coding Agents. The core flow is simple: user speaks → speech recognition → natural language understanding → command sent to the Coding Agent → Agent executes → result read back to the user via TTS.
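The flow above can be sketched as a chain of stages. This is only an illustration, not HeraldVox's actual code: the `VoicePipeline` class, its field names, and the stub stages are all hypothetical, and each real stage (STT, NLU, agent, TTS) is replaced by a trivial stand-in so the example runs on its own.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class VoicePipeline:
    """Chains the stages named in the text; every stage is an injected callable."""
    transcribe: Callable[[bytes], str]  # speech recognition: audio -> transcript
    understand: Callable[[str], str]    # NLU: transcript -> agent instruction
    run_agent: Callable[[str], str]     # Coding Agent executes the instruction
    speak: Callable[[str], bytes]       # TTS: result text -> audio

    def handle(self, audio: bytes) -> bytes:
        transcript = self.transcribe(audio)
        instruction = self.understand(transcript)
        result = self.run_agent(instruction)
        return self.speak(result)


# Stub stages (assumptions) so the sketch runs without any speech or agent backend.
pipeline = VoicePipeline(
    transcribe=lambda audio: audio.decode("utf-8"),
    understand=lambda text: text.strip().lower(),
    run_agent=lambda instruction: f"done: {instruction}",
    speak=lambda text: text.encode("utf-8"),
)

reply = pipeline.handle(b"  Run the tests ")
```

Keeping each stage behind a plain callable means any one of them could be swapped out without touching the others.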
The Problem: Terminals Lock Most People Out
Coding Agents are currently the most powerful way to use AI: they can operate your computer, run commands, write code, and automate almost anything. But the default interface looks like this:
I have two kids, one just over 1 and one just over 3. Much of my day goes to looking after them and doing housework, so I am constantly stepping away from my computer. But every time I needed to give an Agent a new instruction, I had to sit down, open a terminal, and type. Every time I left, the Agent just sat there waiting for me.
The problem wasn't that AI wasn't powerful enough. The problem was that the interface chained me to my desk.
Voice Engine: Fully Self-Built Pipeline
This is the most technically complex part of the product. From wake word to final TTS output, the entire pipeline is self-built, with no dependency on any third-party voice platform.
Wake word detection runs locally and offline, with no network dependency; audio captured before activation never reaches any server. STT is specifically tuned for technical terminology: in Coding Agent scenarios, users say many function names, component names, and file paths, which general-purpose STT often recognizes poorly. TTS is streaming: speech is synthesized as the text is generated, which significantly reduces time-to-first-audio.
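One common way to get the "synthesize as the text generates" behavior is to cut the incoming token stream at sentence boundaries and hand each finished sentence to the synthesizer immediately. The sketch below shows only that chunking step, under assumptions: `sentence_chunks` and the `BOUNDARY` rule are hypothetical, and the source does not say how HeraldVox actually segments text.

```python
import re
from typing import Iterable, Iterator

# Assumed boundary rule: CJK sentence punctuation always ends a chunk; ASCII
# punctuation only counts when whitespace follows, so "app.py" stays intact.
BOUNDARY = re.compile(r"[。!?]|[.!?](?=\s)")


def sentence_chunks(tokens: Iterable[str]) -> Iterator[str]:
    """Yield complete sentences as soon as they appear in a token stream."""
    buffer = ""
    for token in tokens:
        buffer += token
        # Emit every sentence that is already complete in the buffer.
        while (match := BOUNDARY.search(buffer)):
            sentence = buffer[:match.end()].strip()
            buffer = buffer[match.end():]
            if sentence:
                yield sentence
    # Flush whatever is left once the stream ends.
    tail = buffer.strip()
    if tail:
        yield tail


# Tokens as they might arrive from a text generator.
chunks = list(sentence_chunks(["Tests ", "passed. ", "Edited app", ".py", " next."]))
```

Each yielded chunk could be sent to a synthesizer while later tokens are still arriving, so the first audio does not have to wait for the full reply.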
Supported Agents
HeraldVox currently supports four major Coding Agents. You can run several in parallel and switch between them with a single voice command.
End-to-End Encrypted Communication
Communication between the phone and the desktop Agent is built on libsodium, using X25519 key exchange plus XSalsa20-Poly1305 symmetric encryption. The relay server only forwards encrypted packets and cannot read any of their contents.
Built on an open-source project with 16,000+ GitHub stars. The code is auditable, and no data is collected.
How I Did It: Using AI Agents to Build AI Agents
The most interesting part of this project: the tools I used to build HeraldVox are the very tools it now supports. My workflow looked like this:
I built this startup while raising two kids: days with them, nights writing code. Sometimes I was up at 3am debugging, then woken again at 6 when the younger one cried. That rhythm went on for nearly a year. The product's core business code now exceeds 200,000 lines.
It's live now
It's a web app, adapted for both mobile and desktop, with no download or install needed. The current version is completely free.
Start for free
Free · End-to-end encrypted · Zero data collection