1.85×
faster than Float-FAISS-GPU at msmarco-10M (H100 PCIe)
AQEA Gen-5 · Shader
Hyperscale vector search on the AQEA Gen-5 Substrate — bit-identical across CUDA, Vulkan, Metal and WebGPU.
Section 01
23×
smaller index storage at equal retrieval recall
100%
Bit-Identity verified vs CPU brute-force reference
4 / 4
GPU backends bit-identical (Vulkan · Metal · WebGPU · CUDA)
99.68%
reversible decoding at 1M scale
13 / 13
cross-modal validation domains above 80% floor
Today's production retrieval stacks (FAISS-GPU, NeMo Retriever, RAPIDS cuVS) are CUDA-only by construction. AQEA Shader is the only system that delivers exact (not approximate) results across NVIDIA, AMD, Apple, Intel, ARM and browser hardware — from a single shader source.
Section 02
Most ANN systems trade recall for speed. AQEA Shader does not. Bit-Identity means the Top-K set we return equals the brute-force Top-K: same documents, same order, same distance values, compared byte for byte.
We verified this against an independent CPU reference at both 1M and 10M corpus scales, and between NVIDIA (Vulkan) and Apple (Metal) hardware.
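As an illustration of what a Bit-Identity check of this kind could look like, the following Python sketch compares a candidate Top-K result against a brute-force reference byte for byte. It is not AQEA code: the function names, the squared-L2 distance, the tie-break by document id, and the serialization layout are all assumptions made for this example.

```python
import heapq
import struct
from typing import List, Sequence, Tuple

Result = List[Tuple[float, int]]  # (distance, document id), best first


def brute_force_top_k(query: Sequence[float],
                      corpus: Sequence[Sequence[float]],
                      k: int) -> Result:
    """Exact Top-K by squared L2 distance (assumed metric for this sketch).

    Ties are broken by document id because tuples compare element-wise.
    """
    scored = []
    for doc_id, vec in enumerate(corpus):
        dist = sum((q - v) ** 2 for q, v in zip(query, vec))
        scored.append((dist, doc_id))
    return heapq.nsmallest(k, scored)


def to_bytes(result: Result) -> bytes:
    """Serialize as little-endian (float64 distance, uint32 id) records.

    The layout is hypothetical; any fixed layout works as long as both
    sides use the same one.
    """
    return b"".join(struct.pack("<dI", dist, doc_id) for dist, doc_id in result)


def is_bit_identical(candidate: Result, reference: Result) -> bool:
    """True only if documents, order, and distance bits all match."""
    return to_bytes(candidate) == to_bytes(reference)


corpus = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0], [0.5, 0.5]]
reference = brute_force_top_k([0.0, 0.0], corpus, k=2)
print(is_bit_identical(list(reference), reference))
```

Comparing serialized bytes rather than using a floating-point tolerance is the point of the check: a result with the right documents but a distance that differs in the last bit still fails.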
Section 03
Pipeline diagrams: CPU Pipeline · GPU Pipeline
Section 04
FAISS-GPU, NeMo Retriever, RAPIDS cuVS — all CUDA-only by construction. For hyperscalers diversifying away from single-vendor compute (AMD MI-series, custom silicon, Apple-based edge), for enterprises with multi-cloud or sovereign-cloud requirements, and for edge/browser deployments where CUDA isn't present — there is no portable, exact alternative. We are it.
Section 05
| Workload | AQEA R@10 | Float R@10 | R@10 ratio (AQEA / Float) | Speed vs Float (batch QPS or p50 latency) | Energy vs Float |
|---|---|---|---|---|---|
| msmarco-100k (CPU) | 0.9472 | 0.9725 | 97.4% | 13.7× higher | 6.6× lower |
| msmarco-1M (GPU) | — | — | — | 1.63× faster (p50) | 1.73× lower |
| msmarco-10M (GPU) | — | — | 100% Bit-Identity | 1.85× faster (p50) | ~2.5× lower (est.) |
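The 97.4% figure in the first row follows from the two R@10 columns. The sketch below reproduces that arithmetic and shows one common way to compute recall at k. The source does not say how R@10 was measured, so the `recall_at_k` definition here (fraction of relevant documents found in the top k) is an assumption, and the function name is hypothetical.

```python
from typing import Sequence, Set


def recall_at_k(retrieved: Sequence[int], relevant: Set[int], k: int = 10) -> float:
    """Fraction of the relevant documents that appear in the top-k results.

    Assumed definition for illustration; other benchmarks normalize by k
    or by min(k, number of relevant documents) instead.
    """
    if not relevant:
        raise ValueError("relevant set must not be empty")
    hits = sum(1 for doc_id in retrieved[:k] if doc_id in relevant)
    return hits / len(relevant)


# Values taken from the msmarco-100k (CPU) row of the table.
aqea_r10 = 0.9472
float_r10 = 0.9725
ratio_pct = round(aqea_r10 / float_r10 * 100, 1)
print(f"R@10 ratio: {ratio_pct}%")
```

With the table values, the ratio rounds to 97.4%, matching the reported number.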
Section 06
Step 01
4–6 weeks
Joint benchmark on your retrieval workload and hardware. Recommended starting point.
Step 02
8–12 weeks
Shadow-mode deployment in your production retrieval stack.
Step 03
Engagement
Hardware-specific tuning, encoder-family extensions, bespoke deployment.
Run our test fixtures on your own hardware — Apple M3 to NVIDIA H100 to AMD MI300X.