voir / lens · i

The instrument that sees and is seen.

An instrument that addresses the room, not the screen. A presence held in space, with eye contact returned. A conversation that does not leave the device unless you decide it should.

What it does

Speak into the room. The instrument hears the field around the words — reverberation, distance, the second voice at the kitchen table. The answer is delivered in the same register the room offered up.

The presence holds eye contact when you look back. It mouths what it says — not as performance, as fidelity to the moment that produced the line.

How it works

Spatial intelligence runs first — surfaces, depth, hand pose, the body in the chair. A live transcript becomes a viseme stream. The rig drives a face that knows where you are standing. The map stays on the device; the conversation stays in the room.

What it's built on

ARKit world tracking. RealityKit composition. The MPFB avatar rig. Azure Speech visemes. ElevenLabs synthesis. Cross-mode persistence through VoirCompanionStore.

spatial intelligence · voice in the room · viseme-driven presence · persistent anchor across modes — every line of it runs on the device, in your space, in real time.

DOSSIER · LENS — FILE 0001 / ΔCLEARANCE · PUBLIC FACSIMILE

Lens of Record

FORM FACTOR: iPhone Pro, held at chest height.
OPTICS: One wide. One ultra-wide. One LiDAR.
OBSERVER: You. Singular.
LATENCY: Frame to viseme — under 80 ms.
RANGE: The room around you. The hand in front.
VOICE: Yours, in register. Theirs, in answer.

One person. One room. One minute kept on the record because you chose to keep it.

SPECIMEN REGISTRY · L · LENS APPARATUSDRAG · ZOOM · ROTATE

L-001Pinhole frustum

L-002Optical stack

L-003Iris, nine blades

L-004Room mesh, one chair

L-005Starfield, isotropic

DISCIPLINES — FOUR / SEVENTEEN

What the lens listens for.

iOptical witnessThe room is mapped before the question is asked. Light, depth, surface normal, hand pose — all kept on-device and never named back to you.OUTPUT · room mesh, pose graph
iiAcoustic witnessSpeech is heard inside its room. Reverb, distance, two voices in a small kitchen — the field is part of the meaning.OUTPUT · transcript, viseme stream
iiiEmbodied replyThe avatar mouths what it says. Eye contact when you look back. Blink every six to eight seconds. The reply is not on a screen; the reply is in the room.OUTPUT · MPFB rig, real-time
ivCross-mode memoryA conversation begun in SpaciAR survives into LearnAR. The lens carries the thread. The chambers do not start from zero.OUTPUT · VoirCompanionStore

SPECIFICATION — REDUCED FACSIMILE

FRAME RATE60fps, capped
VOICE LATENCY< 80ms, viseme to render
ON-DEVICE MODELS12running locally
CLOUD CALLSopt-inexplicit, per session
RECEIPTSretaineduntil you delete them
SUPPORTED LENSESiPhonePro, A15 and later

The instrument hears the room around the words, not just the words inside it. The instrument answers in your register, not its own. The conversation is private until you decide it is not.

— voir / charter, line 4