Architecture

Protoface is built as three planes, each with a single job. You only ever talk to the public surface — the rest is internal, but understanding the shape helps you reason about latency, capacity, and failure modes.

┌─────────────────────────────────────────────────────────────┐
│  Public surface                                             │
│  REST API (/v1/*), SDKs, docs, examples                     │
└──────────────────────────┬──────────────────────────────────┘
                           │ public API contract (this site)
┌──────────────────────────▼──────────────────────────────────┐
│  Control plane (SaaS)                                       │
│  auth, orgs/keys, avatars, sessions, jobs,                  │
│  worker registry, scheduler, usage, session events          │
└──────────────────────────┬──────────────────────────────────┘
                           │ worker protocol (internal)
┌──────────────────────────▼──────────────────────────────────┐
│  Worker / data plane                                        │
│  job runner → avatar runtime → transport → media out        │
│  heartbeat, healthcheck, usage events                       │
└─────────────────────────────────────────────────────────────┘
  • Public surface — the REST API at https://api.protoface.com/v1 plus the SDKs and docs. It never talks to a worker directly.
  • Control plane — the only persistent source of truth: accounts, API keys, avatars, sessions, the scheduler, usage aggregation, and the per-session event timeline.
  • Worker / data plane — GPU workers that run the avatar model and publish media over a transport. Workers are portable and self-contained; they receive a typed job and run it.

Creating a session is asynchronous by design. POST /v1/sessions validates, allocates an ID, writes the session and a worker job, and returns — all in under 300 ms (p95). It does not block on worker availability.
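
As a rough illustration of the create call, the sketch below builds (but does not send) a POST /v1/sessions request with Python's standard library. The base URL comes from this page. The bearer-token Authorization header and the shape of the JSON payload are assumptions for illustration, not the documented contract; check the API reference for the real fields.

```python
import json
import urllib.request

API_BASE = "https://api.protoface.com/v1"  # base URL stated on this page


def build_create_session_request(api_key: str, payload: dict) -> urllib.request.Request:
    """Build, without sending, a POST /v1/sessions request.

    `payload` is passed through untouched: the request fields are not
    described on this page, so the caller decides what goes in it.
    """
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url=f"{API_BASE}/sessions",
        data=body,
        method="POST",
        headers={
            # Assumption: bearer-token auth. The real scheme may differ.
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )
```

Sending it is then a matter of `urllib.request.urlopen(req, timeout=5)`. Because the call returns without waiting for a worker, a short client timeout is reasonable; the session's progress is observed afterwards through its status.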

  1. created

    The API accepted your request and queued a job. You have a sess_... ID.

  2. queued

    The job is written and awaiting a worker claim.

  3. starting

    A worker claimed the job and is initializing (joining your room, warming up).

  4. running

    The avatar is publishing video. first_frame_at is set on the first frame; running is reported after sustained output.

  5. ending → ended / failed / canceled

    The session drains and terminates — cleanly (ended), on error (failed), or because you ended it before it ran (canceled).
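
The lifecycle above lends itself to a small client-side model. The following is a minimal sketch, assuming the API reports a status string matching the state names listed here. `fetch_status` is a hypothetical callable that you would implement against whatever status endpoint the API reference describes; it is not part of any SDK shown on this page.

```python
import time
from enum import Enum
from typing import Callable, Iterable


class SessionStatus(str, Enum):
    CREATED = "created"
    QUEUED = "queued"
    STARTING = "starting"
    RUNNING = "running"
    ENDING = "ending"
    ENDED = "ended"
    FAILED = "failed"
    CANCELED = "canceled"


# Once a session reaches one of these, it never changes state again.
TERMINAL = frozenset(
    {SessionStatus.ENDED, SessionStatus.FAILED, SessionStatus.CANCELED}
)


def wait_for_status(
    fetch_status: Callable[[], str],
    targets: Iterable[SessionStatus],
    timeout_s: float = 30.0,
    interval_s: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> SessionStatus:
    """Poll until the session reaches a target status or any terminal one.

    Returning on terminal states as well keeps a caller waiting for
    `running` from spinning forever on a session that already failed.
    """
    wanted = frozenset(targets)
    deadline = clock() + timeout_s
    while True:
        status = SessionStatus(fetch_status())
        if status in wanted or status in TERMINAL:
            return status
        if clock() >= deadline:
            raise TimeoutError(f"session still {status.value} after {timeout_s} s")
        sleep(interval_s)
```

A caller would typically wait for `SessionStatus.RUNNING` and then check the returned value, since `failed` or `canceled` can come back instead.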

These are the performance targets the API is designed around. v0 runs the mock runtime; the “Target (real, later)” column applies once the production face model ships.

| Metric                              | Target (mock, v0) | Target (real, later) |
| ----------------------------------- | ----------------- | -------------------- |
| Session create → 201 (p95)          | < 300 ms          | < 300 ms             |
| Create → first frame, cold          | < 5 s             | < 3 s                |
| Create → first frame, warm          | < 1 s             | < 500 ms             |
| Steady-state audio → video latency  | < 400 ms          | < 250 ms             |
| Output frame rate                   | 25 fps            | 25–30 fps            |
| Recoverable reconnect window        | 10 s              | 10 s                 |

These targets surface as session events (session.first_frame, session.audio_gap_started, session.audio_gap_recovered) on the internal event timeline.
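
To make the first-frame targets concrete, here is a small sketch that checks a measured create-to-first-frame latency against the mock (v0) budgets from the table above. `first_frame_at` is the field named earlier on this page. `created_at`, and the idea that both values are available as timestamps you can subtract, are assumptions for illustration.

```python
from datetime import datetime, timedelta

# Create → first frame targets for the mock runtime (v0), from the table above.
FIRST_FRAME_BUDGET = {
    "cold": timedelta(seconds=5),
    "warm": timedelta(seconds=1),
}


def first_frame_latency(created_at: datetime, first_frame_at: datetime) -> timedelta:
    """Time from session creation to the first published frame."""
    return first_frame_at - created_at


def within_budget(created_at: datetime, first_frame_at: datetime, start: str) -> bool:
    """True if the latency is strictly under the budget for a 'cold' or 'warm' start."""
    return first_frame_latency(created_at, first_frame_at) < FIRST_FRAME_BUDGET[start]


def frame_interval_ms(fps: float) -> float:
    """Time between consecutive frames at a given output frame rate."""
    return 1000.0 / fps
```

At the 25 fps target, `frame_interval_ms(25)` gives 40 ms per frame, so the v0 steady-state budget of 400 ms corresponds to roughly ten frames of audio-to-video delay.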

API keys are environment-scoped. A staging key never authenticates against prod and vice versa — the key prefix is the same (sk_live_), but it’s bound to the environment it was minted in.

| Environment | Who             | Base URL                  | LiveKit          |
| ----------- | --------------- | ------------------------- | ---------------- |
| prod        | Customers       | api.protoface.com         | Yours (BYO)      |
| staging     | Internal / CI   | api.staging.protoface.com | Shared sandbox   |
| local       | Engineer laptop | localhost                 | Your own sandbox |

The control plane is deployed identically across environments — config flips, not code branches.
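
Since the key prefix does not reveal which environment a key belongs to, client configuration has to carry the environment explicitly alongside the key. The sketch below shows one way to do that. The hosts come from the table above; `ClientConfig` and its fields are hypothetical names, and applying the `https` scheme and `/v1` path to staging, as well as the default local origin, are assumptions.

```python
from dataclasses import dataclass

# Hosts from the environment table. The https scheme and /v1 path are
# stated on this page only for prod; reusing them for staging is an assumption.
HOSTS = {
    "prod": "api.protoface.com",
    "staging": "api.staging.protoface.com",
}


@dataclass(frozen=True)
class ClientConfig:
    environment: str  # "prod", "staging", or "local"
    api_key: str      # must have been minted in the same environment
    # Assumption: scheme and port of a local control plane depend on your setup.
    local_origin: str = "http://localhost"

    @property
    def base_url(self) -> str:
        """Resolve the API base URL for the configured environment."""
        if self.environment == "local":
            return f"{self.local_origin}/v1"
        try:
            host = HOSTS[self.environment]
        except KeyError:
            raise ValueError(f"unknown environment: {self.environment!r}") from None
        return f"https://{host}/v1"
```

Keeping the key and the environment in one immutable object makes it harder to pair a staging key with the prod base URL by accident, which would otherwise only show up as an authentication failure at request time.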

  • The public surface never reaches a worker, so an issue in the data plane can’t cascade into your API calls.
  • The control plane is the single source of truth and never imports worker-specific code (no model libraries, no LiveKit SDK).
  • Workers are portable and stateless per job, which is what lets the scheduler place, reclaim, and scale them freely.