Engineering · 8 min read · April 28, 2026

Tessera: Cache-Line Encryption for Edge AI Without Bandwidth Loss

A hardware architecture that decrypts neural network weights at 64-byte granularity, hiding cryptographic overhead within DRAM fetch latency on shared-memory edge accelerators.

Source: arxiv/cs.LG · Animan Naskar · open original ↗

Tessera decrypts DNN weights inline at cache-line granularity, achieving near-zero overhead on UMA edge devices by parallelizing AES-256-CTR with DRAM access.

  • UMA systems expose plaintext model weights to OS-level and physical attacks because CPU and NPU share DRAM.
  • Page-level encryption (4 KB granularity) wastes bandwidth by fetching entire pages for small tensor tiles, incurring up to a 32x bandwidth penalty.
  • Tessera intercepts 64-byte AXI bursts and computes AES-256-CTR keystreams in parallel with DRAM fetches, hiding crypto latency.
  • Decrypted weights stream directly into isolated NPU SRAM, eliminating permanent memory carve-outs required by trusted execution environments.
  • Measured across three SoC platforms, Tessera sustains 98.4% of theoretical bandwidth, a 1.6% overhead.
  • Architecture neutralizes DRAM extraction, rogue DMA, and compute hijacking attacks while preventing plaintext leakage across sparse tensors.
  • Design maintains constant 1x memory footprint across all layer geometries, unlike page-level schemes that degrade with irregular tensor shapes.
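The core trick is that CTR-mode decryption is a single XOR once the keystream is ready, so the keystream for a 64-byte line can be computed while DRAM services the fetch. The sketch below illustrates that per-cache-line structure; it uses SHA-256 as a stand-in PRF (the real design uses AES-256, which is not in the Python standard library), and the address-derived counter layout is an assumption for illustration, not Tessera's actual format.

```python
import hashlib

CACHE_LINE = 64  # bytes per AXI burst

def keystream_block(key: bytes, base_addr: int) -> bytes:
    """Keystream for one cache line, derived from its address.

    Hardware would run AES-256 over a counter derived from the
    physical address; SHA-256 here is a stdlib stand-in PRF.
    Two 32-byte digests cover one 64-byte line.
    """
    ctr = base_addr // CACHE_LINE
    return b"".join(
        hashlib.sha256(key + ctr.to_bytes(8, "big") + bytes([i])).digest()
        for i in range(2)
    )

def decrypt_line(key: bytes, addr: int, ciphertext: bytes) -> bytes:
    """XOR a fetched 64-byte burst with its keystream.

    Because the keystream depends only on (key, address), it can be
    computed concurrently with the DRAM fetch; the XOR itself adds
    negligible latency once the burst arrives.
    """
    assert len(ciphertext) == CACHE_LINE and addr % CACHE_LINE == 0
    ks = keystream_block(key, addr)
    return bytes(c ^ k for c, k in zip(ciphertext, ks))
```

Note that CTR mode makes encryption and decryption the same operation, so applying `decrypt_line` twice with the same key and address round-trips a cache line.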

Frequently asked

  • Why is page-level encryption wasteful? Page-level encryption operates at 4 KB granularity. When a neural network layer accesses a small tensor tile (e.g., 64 bytes), the system must fetch the entire 4 KB page, decrypt it, and extract the needed bytes. This forces unnecessary data movement and cache pollution. Tessera avoids this by decrypting at 64-byte cache-line granularity, matching the actual memory access size.
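The bandwidth cost of a coarse decryption granularity can be expressed as an amplification factor: bytes moved over the bus per useful byte. A minimal sketch, assuming aligned tiles and a fixed fetch granularity (the 128-byte tile below is an illustrative size, not a figure from the paper):

```python
import math

PAGE = 4096  # page-level decryption granularity, bytes
LINE = 64    # cache-line granularity used by Tessera, bytes

def amplification(tile_bytes: int, granularity: int) -> float:
    """Bytes fetched per useful byte for an aligned tensor tile."""
    fetched = math.ceil(tile_bytes / granularity) * granularity
    return fetched / tile_bytes

# Page-level: a 128-byte tile drags in a whole 4 KB page -> 32.0x
print(amplification(128, PAGE))
# Cache-line: the same tile costs exactly two 64-byte lines -> 1.0x
print(amplification(128, LINE))
```

This is why the page-level penalty scales with how poorly tile sizes match the page, while cache-line decryption stays at 1x for any line-aligned geometry.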
