Engineering · 8 min read · April 28, 2026

Tessera: Cache-Line Encryption for Edge AI Without Bandwidth Loss

A hardware architecture that decrypts neural network weights at 64-byte granularity, hiding cryptographic overhead within DRAM fetch latency on shared-memory edge accelerators.

Source: arxiv/cs.LG · Animan Naskar · open original ↗

Tessera decrypts DNN weights inline at cache-line granularity, achieving near-zero overhead on UMA edge devices by parallelizing AES-256-CTR with DRAM access.

  • UMA systems expose plaintext model weights to OS-level and physical attacks because CPU and NPU share DRAM.
  • Page-level encryption (4 KB granularity) wastes bandwidth by fetching entire pages for small tensor tiles, incurring up to a 32x bandwidth penalty.
  • Tessera intercepts 64-byte AXI bursts and computes AES-256-CTR keystreams in parallel with DRAM fetches, hiding crypto latency.
  • Decrypted weights stream directly into isolated NPU SRAM, eliminating permanent memory carve-outs required by trusted execution environments.
  • Measured across three SoC platforms, Tessera sustains 98.4% of theoretical bandwidth, a 1.6% overhead.
  • Architecture neutralizes DRAM extraction, rogue DMA, and compute hijacking attacks while preventing plaintext leakage across sparse tensors.
  • Design maintains constant 1x memory footprint across all layer geometries, unlike page-level schemes that degrade with irregular tensor shapes.
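The core trick is that CTR-mode decryption is a single XOR once the keystream is ready, so the keystream for a 64-byte line can be computed while DRAM services the fetch. The sketch below illustrates that per-cache-line structure; it uses SHA-256 as a stand-in PRF (the real design uses AES-256, which is not in the Python standard library), and the address-derived counter layout is an assumption for illustration, not Tessera's actual format.

```python
import hashlib

CACHE_LINE = 64  # bytes per AXI burst

def keystream_block(key: bytes, base_addr: int) -> bytes:
    """Keystream for one cache line, derived from its address.

    Hardware would run AES-256 over a counter derived from the
    physical address; SHA-256 here is a stdlib stand-in PRF.
    Two 32-byte digests cover one 64-byte line.
    """
    ctr = base_addr // CACHE_LINE
    return b"".join(
        hashlib.sha256(key + ctr.to_bytes(8, "big") + bytes([i])).digest()
        for i in range(2)
    )

def decrypt_line(key: bytes, addr: int, ciphertext: bytes) -> bytes:
    """XOR a fetched 64-byte burst with its keystream.

    Because the keystream depends only on (key, address), it can be
    computed concurrently with the DRAM fetch; the XOR itself adds
    negligible latency once the burst arrives.
    """
    assert len(ciphertext) == CACHE_LINE and addr % CACHE_LINE == 0
    ks = keystream_block(key, addr)
    return bytes(c ^ k for c, k in zip(ciphertext, ks))
```

Note that CTR mode makes encryption and decryption the same operation, so applying `decrypt_line` twice with the same key and address round-trips a cache line.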

Frequently asked

  • Why is page-level encryption wasteful? Page-level encryption operates at 4 KB granularity. When a neural network layer accesses a small tensor tile (e.g., 64 bytes), the system must fetch the entire 4 KB page, decrypt it, and extract the needed bytes. This forces unnecessary data movement and cache pollution. Tessera avoids this by decrypting at 64-byte cache-line granularity, matching the actual memory access size.
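The bandwidth cost of a coarse decryption granularity can be expressed as an amplification factor: bytes moved over the bus per useful byte. A minimal sketch, assuming aligned tiles and a fixed fetch granularity (the 128-byte tile below is an illustrative size, not a figure from the paper):

```python
import math

PAGE = 4096  # page-level decryption granularity, bytes
LINE = 64    # cache-line granularity used by Tessera, bytes

def amplification(tile_bytes: int, granularity: int) -> float:
    """Bytes fetched per useful byte for an aligned tensor tile."""
    fetched = math.ceil(tile_bytes / granularity) * granularity
    return fetched / tile_bytes

# Page-level: a 128-byte tile drags in a whole 4 KB page -> 32.0x
print(amplification(128, PAGE))
# Cache-line: the same tile costs exactly two 64-byte lines -> 1.0x
print(amplification(128, LINE))
```

This is why the page-level penalty scales with how poorly tile sizes match the page, while cache-line decryption stays at 1x for any line-aligned geometry.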
