The instrument that sees and is seen.
An instrument that addresses the room, not the screen. A presence held in space, with eye contact returned. A conversation that does not leave the device unless you decide it should.
What it does
Speak into the room. The instrument hears the field around the words — reverberation, distance, the second voice at the kitchen table. The answer is delivered in the same register the room offered up.
The presence holds eye contact when you look back. It mouths what it says — not as performance, as fidelity to the moment that produced the line.
How it works
Spatial intelligence runs first — surfaces, depth, hand pose, the body in the chair. A live transcript becomes a viseme stream. The rig drives a face that knows where you are standing. The map stays on the device; the conversation stays in the room.
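One way to picture the step from transcript to moving mouth is a lookup from playback time to mouth shape. This is a minimal sketch, not the shipped pipeline. It assumes viseme events arrive as (offset, id) pairs, roughly the shape Azure Speech viseme events take, and that id 0 means the mouth at rest. `VisemeEvent`, `viseme_at`, and `sample_frames` are hypothetical names chosen for illustration.

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass(frozen=True)
class VisemeEvent:
    offset_ms: float  # time from the start of the utterance audio
    viseme_id: int    # mouth-shape index; 0 is treated as "at rest" (assumption)


def viseme_at(events: list[VisemeEvent], t_ms: float) -> int:
    """Return the viseme active at time t_ms; events must be sorted by offset."""
    offsets = [e.offset_ms for e in events]
    i = bisect_right(offsets, t_ms) - 1  # last event at or before t_ms
    return events[i].viseme_id if i >= 0 else 0


def sample_frames(events: list[VisemeEvent], n_frames: int, fps: int = 60) -> list[int]:
    """Sample the viseme stream once per render frame."""
    frame_ms = 1000.0 / fps
    return [viseme_at(events, k * frame_ms) for k in range(n_frames)]


# Illustrative stream: rest, then two mouth shapes (ids are made up).
stream = [VisemeEvent(0.0, 0), VisemeEvent(40.0, 19), VisemeEvent(120.0, 6)]
per_frame = sample_frames(stream, n_frames=6)
```

A real rig would likely blend between neighbouring shapes rather than step from one to the next; the stepwise lookup only shows how a timed stream lines up with a frame clock.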
What it's built on
ARKit world tracking. RealityKit composition. The MPFB avatar rig. Azure Speech visemes. ElevenLabs synthesis. Cross-mode persistence through VoirCompanionStore.
spatial intelligence · voice in the room · viseme-driven presence · persistent anchor across modes — every line of it runs on the device, in your space, in real time.
Lens of Record
- FORM FACTOR: iPhone Pro, held at chest height.
- OPTICS: One wide. One ultra-wide. One LiDAR.
- OBSERVER: You. Singular.
- LATENCY: Frame to viseme, under 80 ms.
- RANGE: The room around you. The hand in front.
- VOICE: Yours, in register. Theirs, in answer.
One person. One room. One minute kept on the record because you chose to keep it.
DISCIPLINES — FOUR / SEVENTEEN
What the lens listens for.
- i. Optical witness. The room is mapped before the question is asked. Light, depth, surface normal, hand pose: all kept on-device and never named back to you. OUTPUT · room mesh, pose graph
- ii. Acoustic witness. Speech is heard inside its room. Reverb, distance, two voices in a small kitchen: the field is part of the meaning. OUTPUT · transcript, viseme stream
- iii. Embodied reply. The avatar mouths what it says. Eye contact when you look back. Blink every six to eight seconds. The reply is not on a screen; the reply is in the room. OUTPUT · MPFB rig, real-time
- iv. Cross-mode memory. A conversation begun in SpaciAR survives into LearnAR. The lens carries the thread. The chambers do not start from zero. OUTPUT · VoirCompanionStore
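The cross-mode thread can be pictured as one small on-device file that every mode appends to and reads from. The class below is a hypothetical stand-in, not the real VoirCompanionStore; its name, its methods, and the JSON layout are all assumptions made for illustration.

```python
import json
from pathlib import Path


class CompanionStoreSketch:
    """Hypothetical stand-in for a cross-mode store; not the real VoirCompanionStore."""

    def __init__(self, path: Path):
        self.path = path  # a single local file holds the whole thread

    def _load(self) -> dict:
        if self.path.exists():
            return json.loads(self.path.read_text(encoding="utf-8"))
        return {"turns": []}

    def append(self, mode: str, speaker: str, text: str) -> None:
        """Add one turn, tagged with the mode it was spoken in."""
        state = self._load()
        state["turns"].append({"mode": mode, "speaker": speaker, "text": text})
        self.path.write_text(json.dumps(state), encoding="utf-8")

    def thread(self) -> list[dict]:
        """Return every turn so far, whichever mode recorded it."""
        return self._load()["turns"]

    def delete(self) -> None:
        """Forget the thread entirely; nothing is kept after this."""
        self.path.unlink(missing_ok=True)
```

In this sketch a turn written while in SpaciAR is simply there when LearnAR opens the same file, and calling `delete()` leaves an empty thread.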
SPECIFICATION — REDUCED FACSIMILE
- FRAME RATE: 60 fps, capped
- VOICE LATENCY: < 80 ms, viseme to render
- ON-DEVICE MODELS: 12, running locally
- CLOUD CALLS: opt-in, explicit, per session
- RECEIPTS: retained until you delete them
- SUPPORTED LENSES: iPhone Pro, A15 and later
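Read together, the frame-rate cap and the voice-latency figure imply a budget counted in frames. The helper below is a small illustrative calculation, not part of the product; `frame_budget` is a hypothetical name.

```python
def frame_budget(latency_ms: float, fps: int) -> tuple[float, int]:
    """Return (milliseconds per frame, whole frames that fit within latency_ms)."""
    frame_ms = 1000.0 / fps
    whole_frames = int(latency_ms // frame_ms)
    return frame_ms, whole_frames


frame_ms, whole_frames = frame_budget(80.0, 60)
```

At 60 fps a frame lasts about 16.7 ms, so four whole frames (about 66.7 ms) fit under 80 ms and a fifth (about 83.3 ms) would not.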
The instrument hears the room around the words, not just the words inside it. The instrument answers in your register, not its own. The conversation is private until you decide it is not.
— voir / charter, line 4