Face-to-face AI interaction that feels human

We are building a human foundation model with emotional intelligence — it reads tone, expression, and even hesitation, and responds in real time.

backed by
Accel
Lightspeed
South Park Commons
NVIDIA

The Mission

Most of a conversation is never said. A raised eyebrow. A pause before answering. A shift in tone. People communicate constantly through everything beyond words.

Every interface in computing has moved closer to the human. We are creating a world where people can finally talk to every product, face to face.

The Full-Duplex Engine

Human conversation is a simultaneous act. You perceive while you speak, react while you listen, and respond within a few hundred milliseconds — all at once, never in turns.

Nuance Labs is building the first audiovisual model that works this way: one system that sees, hears, reasons, speaks, and expresses in the same moment, in real time.

Expressivity & naturalness

Speech and video that cross the uncanny valley

Real-time interactivity

500 ms is the hard ceiling for conversational latency

Engaging personality

A warm, engaging personality that is genuinely interesting to talk to

Active listening

Interjections, nodding, backchanneling — the full texture of real dialogue

multimodal intelligence

Human Foundation Model

Auto-regressive transformers learned language through next-token prediction.

The same machinery can learn human behavior—predicting the next audio and visual token instead of the next word. We're building that model.

Our Team

Built by PhDs from MIT, UW, Oxford, CMU, and Johns Hopkins—with experience from Apple, Meta, Amazon AGI, and Discord.

Fangchang Ma

Cofounder & CEO

PhD in Robotics/Machine Learning from MIT; previously an Engineering Manager at Apple.

Edward Zhang

Cofounder & CTO

PhD in Computer Graphics from the University of Washington; previously a Senior Research Scientist at Apple.

Karren Yang

Cofounder & Chief Scientist

PhD in Audio-Visual Synthesis from MIT; previously a Senior Research Scientist at Apple.

Join us in building the emotional layer of artificial intelligence

© 2025–2026 Nuance Labs. All rights reserved.
