Face-to-face AI interaction that feels human

We are building a human foundation model with emotional intelligence — it reads tone, expression, and even hesitation, and responds in real time.

The Mission

Most of a conversation is never said. A raised eyebrow. A pause before answering. A shift in tone. People communicate constantly through everything beyond words.

Every interface in computing has moved closer to the human. We are creating a world where people can finally talk to every product, face to face.

The Full-Duplex Engine

Human conversation is a simultaneous act. You perceive while you speak, react while you listen, and respond within a few hundred milliseconds — all at once, never in turns.

Nuance Labs is building the first audiovisual model that works this way: one system that sees, hears, reasons, speaks, and expresses in the same moment, in real time.

Expressivity & naturalness

Speech and video that cross the uncanny valley

Real-time interactivity

500 ms is the hard ceiling for conversational latency

Engaging personality

A warm, engaging personality that is genuinely interesting to talk to

Active listening

Interjections, nodding, backchanneling — the full texture of real dialogue

multimodal intelligence

Human Foundation Model

Auto-regressive transformers learned language through next-token prediction.

The same machinery can learn human behavior—predicting the next audio and visual token instead of the next word. We're building that model.
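
To make the idea of predicting the next audio and visual token concrete, here is a toy sketch. It is not the model described above: it swaps the transformer for a simple count table, and the token names (`a_hi`, `v_smile`, and so on), the interleaving order, and every function name are assumptions made purely for illustration. What it shows is only the training setup, where audio and visual tokens share one stream and each position is predicted from the tokens before it.

```python
from collections import Counter, defaultdict

def interleave(audio, video):
    """Merge per-frame audio and visual tokens into one stream: a0, v0, a1, v1, ...
    (Hypothetical layout; a real system may order or group tokens differently.)"""
    if len(audio) != len(video):
        raise ValueError("audio and video must have the same number of frames")
    stream = []
    for a, v in zip(audio, video):
        stream.extend([a, v])
    return stream

def next_token_pairs(stream, context=2):
    """Build (context, target) pairs: each token is predicted from the
    `context` tokens that precede it."""
    return [
        (tuple(stream[i - context:i]), stream[i])
        for i in range(context, len(stream))
    ]

def fit_counts(pairs):
    """Stand-in for training: count which token follows each context.
    A transformer would learn this mapping with far longer contexts."""
    table = defaultdict(Counter)
    for ctx, target in pairs:
        table[ctx][target] += 1
    return table

def predict_next(table, ctx):
    """Return the most frequent continuation for a context, or None if unseen."""
    counts = table.get(tuple(ctx))
    if not counts:
        return None
    return counts.most_common(1)[0][0]

# Made-up tokens standing in for discretized audio and video frames.
audio = ["a_hi", "a_pause", "a_hi", "a_pause"]
video = ["v_smile", "v_nod", "v_smile", "v_nod"]

stream = interleave(audio, video)
pairs = next_token_pairs(stream, context=2)
table = fit_counts(pairs)
prediction = predict_next(table, ("a_hi", "v_smile"))
```

Because both modalities live in one sequence, the same prediction step can emit a sound or a facial movement depending on what comes next in the stream.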

Our Team

Built by PhDs from MIT, UW, Oxford, CMU, and Johns Hopkins—with experience from Apple, Meta, Amazon AGI, and Discord.

Fangchang Ma

Cofounder & CEO

PhD in Robotics/Machine Learning from MIT; previously an Engineering Manager at Apple.

Edward Zhang

Cofounder & CTO

PhD in Computer Graphics from the University of Washington; previously a Senior Research Scientist at Apple.

Karren Yang

Cofounder & Chief Scientist

PhD in Audio-visual synthesis from MIT; previously a Senior Research Scientist at Apple.
