Face-to-face AI interaction that feels human
We are building a human foundation model with emotional intelligence — it reads tone, expression, and even hesitation, and responds in real time.
The Mission
Most of a conversation is never said. A raised eyebrow. A pause before answering. A shift in tone. People communicate constantly through everything beyond words.
Every interface in computing has moved closer to the human. We are creating a world where people can finally talk to every product, face to face.
The Full-Duplex Engine
Human conversation is a simultaneous act. You perceive while you speak, react while you listen, and respond within a few hundred milliseconds — all at once, never in turns.
Nuance Labs is building the first audiovisual model that works this way: one system that sees, hears, reasons, speaks, and expresses in the same moment, in real time.

Expressivity & naturalness
Speech and video that cross the uncanny valley

Real-time interactivity
500 ms is the hard ceiling for conversational latency

Engaging personality
A warm, engaging personality that is genuinely interesting to talk to

Active listening
Interjections, nodding, backchanneling — the full texture of real dialogue




multimodal intelligence
Human Foundation Model
Auto-regressive transformers learned language through next-token prediction.
The same machinery can learn human behavior—predicting the next audio and visual token instead of the next word. We're building that model.
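To make the idea concrete, here is a minimal sketch of how next-token prediction could be set up over audio and visual tokens. It is an illustration only, not a description of Nuance Labs' model: the vocabulary sizes, the one-audio-one-visual-token-per-frame interleaving, and the helper names `interleave` and `next_token_pairs` are all assumptions made for this example.

```python
from typing import List, Tuple

# Hypothetical vocabulary layout (an assumption for illustration):
# audio token ids occupy [0, AUDIO_VOCAB); visual token ids are shifted
# into [AUDIO_VOCAB, AUDIO_VOCAB + VISUAL_VOCAB) so both share one id space.
AUDIO_VOCAB = 1024
VISUAL_VOCAB = 1024


def interleave(audio: List[int], visual: List[int]) -> List[int]:
    """Merge per-frame tokens into one sequence: a0, v0, a1, v1, ..."""
    if len(audio) != len(visual):
        raise ValueError("expected one audio and one visual token per frame")
    seq: List[int] = []
    for a, v in zip(audio, visual):
        seq.append(a)                 # audio token keeps its id
        seq.append(v + AUDIO_VOCAB)   # visual token is offset into its own range
    return seq


def next_token_pairs(seq: List[int]) -> List[Tuple[List[int], int]]:
    """Build (context, target) pairs: predict token i from tokens before i."""
    return [(seq[:i], seq[i]) for i in range(1, len(seq))]


# Three frames of made-up token ids.
audio_tokens = [5, 9, 2]
visual_tokens = [7, 1, 3]

sequence = interleave(audio_tokens, visual_tokens)
pairs = next_token_pairs(sequence)

for context, target in pairs:
    print(f"context={context} -> target={target}")
```

Each pair asks a model to predict the next audio or visual token from everything that came before it, which is the same objective language models use for the next word.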
Our Team
Built by PhDs from MIT, UW, Oxford, CMU, and Johns Hopkins—with experience from Apple, Meta, Amazon AGI, and Discord.
Fangchang Ma
Cofounder & CEO
PhD in Robotics/Machine Learning from MIT; previously an Engineering Manager at Apple.
Edward Zhang
Cofounder & CTO
PhD in Computer Graphics from the University of Washington; previously a Senior Research Scientist at Apple.
Karren Yang
Cofounder & Chief Scientist
PhD in Audio-Visual Synthesis from MIT; previously a Senior Research Scientist at Apple.


