Model Processing Unit

One Model.
One Chip.
Instant Inference.

MPU bakes neural network weights directly into silicon: no memory, no bottleneck. A full multimodal LLM on a chip smaller than a nano SIM card.

Inference hardware is specializing

Chip class      Approach                               tok/s (Llama 8B equivalent)

GPU             Any model, any framework               ~350
TPU / NPU       NN-optimized datapath                  (not given)
Inference ASIC  On-chip SRAM, deterministic            ~2,000
Arch-ASIC       Transformer-only silicon               ~5,000
Model-ASIC      Weights are the circuit. Zero memory.  15K+

The Idea

Weights are the circuit.

Every inference chip today reads model weights from memory. That read is the bottleneck.

MPU skips it. We encode weights as physical wiring on the chip — no DRAM, no SRAM, no memory wall. The model doesn't run on the chip. It is the chip.

One chip, one model. When models are cheap and inference is expensive, dedicated silicon wins.
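
To see why the weight read dominates, here is a back-of-the-envelope sketch in Python. It assumes single-stream decoding in which every weight is read from memory once per generated token, so throughput is capped at memory bandwidth divided by model size. The 8-billion-parameter size comes from the Llama 8B comparison on this page; the bytes-per-weight and bandwidth figures are illustrative assumptions, not measurements of any specific chip.

```python
def weight_bytes(params_billion: float, bytes_per_param: float) -> float:
    """Total size of the model weights in bytes."""
    return params_billion * 1e9 * bytes_per_param


def max_tokens_per_s(params_billion: float, bytes_per_param: float,
                     bandwidth_gb_s: float) -> float:
    """Upper bound on decode speed if all weights are read once per token."""
    return bandwidth_gb_s * 1e9 / weight_bytes(params_billion, bytes_per_param)


def required_bandwidth_tb_s(params_billion: float, bytes_per_param: float,
                            tokens_per_s: float) -> float:
    """Weight bandwidth (TB/s) a memory-bound design needs for a target speed."""
    return tokens_per_s * weight_bytes(params_billion, bytes_per_param) / 1e12


if __name__ == "__main__":
    # Assumed: 8B parameters, 2 bytes per weight, 1,000 GB/s of memory bandwidth.
    print(max_tokens_per_s(8, 2, 1000))           # 62.5
    # Assumed: 8B parameters, 1 byte per weight, 15,000 tok/s target.
    print(required_bandwidth_tb_s(8, 1, 15_000))  # 120.0
```

Under these assumptions, a design that fetches weights from memory would need roughly 120 TB/s of weight bandwidth to sustain 15K tok/s for a single stream. Batching many requests changes the arithmetic, so treat this as a rough bound rather than a benchmark.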

Why Now

The window for model-specific silicon is open.

Models are stable enough for silicon. Agent frameworks reach human performance in targeted domains. Architectures aren't changing every quarter — it's time to commit to hardware.

60%+ of AI compute is inference. H100s cost $2–7/hr and are prohibitive for edge. The bottleneck isn't training — it's running models at scale.

Billions of edge devices need local AI. AR glasses, drones, robots, vehicles — all need real-time inference at under 5 watts. No existing chip can do this.

First-mover window is open. Etched ($5B), Groq (~$20B), and Taalas ($219M) prove the thesis — but all target cloud. No one has shipped a model-specific edge ASIC.

First Product

MPU Nano

A complete multimodal LLM — text, speech, and vision — on a single chip smaller than a nano SIM card.

One unified brain — not three models stitched together.

Text, speech, and vision share a single semantic core hardwired in metal. Frontends and task heads are lightweight and swappable. Adding a modality costs a small endpoint — not another chip.

Per-block power gating adapts to the task. A simple voice command uses a fraction of the chip. Full multimodal reasoning lights up everything — still under 5 watts.
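
The power-gating idea above can be sketched as a toy model in Python: each task switches on only the blocks it needs, and the power draw is the sum over active blocks. The block names, the task-to-block mapping, and every milliwatt value below are hypothetical placeholders chosen for illustration; they are not MPU Nano specifications. The only figure taken from this page is the 5 W peak budget.

```python
# Hypothetical per-block power draw in milliwatts (placeholders, not real specs).
BLOCK_POWER_MW = {
    "audio_frontend": 200,
    "vision_frontend": 800,
    "semantic_core_lower": 1000,
    "semantic_core_upper": 2000,
    "task_heads": 500,
}

# Hypothetical mapping from task to the blocks that stay powered on.
TASK_BLOCKS = {
    "voice_command": {"audio_frontend", "semantic_core_lower", "task_heads"},
    "multimodal_reasoning": set(BLOCK_POWER_MW),  # everything on
}

PEAK_BUDGET_MW = 5000  # "under 5 watts" from the page


def active_power_mw(task: str) -> int:
    """Sum the power of the blocks a task keeps on; gated blocks draw nothing here."""
    return sum(BLOCK_POWER_MW[block] for block in TASK_BLOCKS[task])


if __name__ == "__main__":
    for task in TASK_BLOCKS:
        mw = active_power_mw(task)
        print(f"{task}: {mw} mW, within budget: {mw < PEAK_BUDGET_MW}")
```

This model ignores leakage in gated blocks and wake-up cost, both of which a real design would have to account for.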

Performance: 15K+ tok/s (Llama 8B equivalent)

Modality: Text, speech, vision (unified core)

Power: Under 5 W peak, ~1 W idle inference

Size: ~100 mm² (embeddable module)

Starting where GPUs can't reach

AI/AR Glasses

112M units by 2030

120 FPS, <10ms latency, <5W, ~100mm². No existing GPU or NPU meets all four constraints simultaneously.

Autonomous Drones

10M+ units by 2030

On-board AI currently cuts flight time by 80%. Ultra-low power inference changes the trade-off entirely.
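
The four AI/AR glasses constraints listed above (120 FPS, under 10 ms latency, under 5 W, about 100 mm²) have to hold at the same time, which a small Python check makes explicit. This is a sketch only: the `ChipSpec` type and the two candidate chips are hypothetical, and reading "~100mm²" as a hard 100 mm² ceiling is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChipSpec:
    """Hypothetical spec sheet for a candidate inference chip."""
    name: str
    fps: float
    latency_ms: float
    power_w: float
    area_mm2: float


def failed_constraints(spec: ChipSpec, max_area_mm2: float = 100.0) -> list[str]:
    """Return the names of the glasses constraints the chip does not meet."""
    checks = {
        "fps": spec.fps >= 120,
        "latency": spec.latency_ms < 10,
        "power": spec.power_w < 5,
        "area": spec.area_mm2 <= max_area_mm2,  # assumed ceiling for "~100mm²"
    }
    return [name for name, ok in checks.items() if not ok]


if __name__ == "__main__":
    # Both candidates are made-up examples, not real products.
    candidates = [
        ChipSpec("hypothetical_npu", fps=60, latency_ms=8, power_w=3, area_mm2=90),
        ChipSpec("hypothetical_gpu_module", fps=120, latency_ms=9, power_w=15, area_mm2=400),
    ]
    for chip in candidates:
        print(chip.name, failed_constraints(chip))
```

A chip qualifies only when the returned list is empty; missing any single constraint rules it out.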
