High-Performance On-Device Vector Search for the Edge
High-performance on-device brute-force vector search engine
QingMing (青冥) is a cross-platform, brute-force vector search engine engineered for the edge. By optimizing for memory bandwidth and cache residency, it delivers full-precision search results (Recall@1 above 99% in the benchmarks below) with zero index construction time.
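To make the "brute-force, no index" idea concrete, here is a minimal scalar sketch in C++. It is not QingMing's implementation: the names `l2_squared` and `flat_search`, the row-major layout, and the choice of squared Euclidean distance are all assumptions made for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Squared Euclidean distance between two dim-length vectors.
static float l2_squared(const float* a, const float* b, std::size_t dim) {
    float sum = 0.0f;
    for (std::size_t i = 0; i < dim; ++i) {
        const float d = a[i] - b[i];
        sum += d * d;
    }
    return sum;
}

// Hypothetical helper: scan every database vector and return the ids of the
// k nearest ones, closest first. `base` holds n vectors of `dim` floats,
// stored contiguously (row-major). No index is built beforehand.
std::vector<std::size_t> flat_search(const std::vector<float>& base,
                                     std::size_t dim,
                                     const float* query,
                                     std::size_t k) {
    assert(dim > 0 && base.size() % dim == 0);
    const std::size_t n = base.size() / dim;

    // Score every vector: this full scan is what "brute force" means.
    std::vector<std::pair<float, std::size_t>> scored(n);
    for (std::size_t i = 0; i < n; ++i) {
        scored[i] = {l2_squared(&base[i * dim], query, dim), i};
    }

    // Order only the k best candidates instead of sorting all n.
    k = std::min(k, n);
    std::partial_sort(scored.begin(),
                      scored.begin() + static_cast<std::ptrdiff_t>(k),
                      scored.end());

    std::vector<std::size_t> ids(k);
    for (std::size_t i = 0; i < k; ++i) ids[i] = scored[i].second;
    return ids;
}
```

Because every query reads the whole dataset once, the cost of a scan like this is dominated by how fast the vectors can be streamed from memory, which is why the engine focuses on bandwidth and cache residency.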
Supported Architectures:
- NVIDIA GPU: `qingming.cu` (Optimized for RTX 5090D / Server GPUs)
- AMD GPU: `qingming.cpp` (HIP/ROCm for 7900 XTX)
- Mobile NPU/CPU: `qingming-mobile.cpp` (NEON-accelerated for Snapdragon/Apple Silicon)
Data Source: ANN-Benchmarks Datasets
Environment: Ubuntu 24.04 + CUDA 12.8
# Compiler Version: nvcc 12.8.93 (sm_120)
nvcc -O3 -arch=sm_120 qingming-flat.cu -o flat-hdf5 \
-I/usr/include/hdf5/serial -L/usr/lib/x86_64-linux-gnu/hdf5/serial \
-lhdf5_cpp -lhdf5
1. SIFT-1M (128-dim)
Result: 9354 QPS @ ~5.5ms Latency (Batch=10k) | Recall@1: 99.26%
=======================================================
QINGMING-ENGINE v1.0.0 PRO [GPU-FLAT]
ADAPTER: NVIDIA RTX 5090D v2 (24GB)
=======================================================
Dataset Vectors : 1,000,000
Dimension : 128
VRAM Usage : 497.17 MB
-------------------------------------------------------
Recall@1 : 99.260 % (FP32 Precision)
Recall@10 : 100.000 %
-------------------------------------------------------
Max Throughput : 9354.902 QPS
Latency P50 : 5.365 ms
Latency P99 : 5.573 ms
=======================================================
2. GIST-1M (960-dim)
Result: High-Dimensional Throughput Test
Dataset Vectors : 1,000,000
Dimension : 960
VRAM Usage : 3702.74 MB
-------------------------------------------------------
Recall@1 : 99.400 %
Recall@10 : 100.000 %
-------------------------------------------------------
Max Throughput : 621.076 QPS
Latency P99 : 1.186 ms
=======================================================
3. Deep-10M (96-dim)
Result: 10 Million Vectors Flat Search
Dataset Vectors : 9,990,000
Dimension : 96
VRAM Usage : 3666.12 MB
-------------------------------------------------------
Recall@1 : 99.960 %
Recall@10 : 99.990 %
-------------------------------------------------------
Max Throughput : 1137.779 QPS
Latency P99 : 1.188 ms
=======================================================
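The "VRAM Usage" figures above can be sanity-checked against the raw size of the vectors themselves. The sketch below assumes 4-byte FP32 storage and that the engine reports mebibytes (1024 × 1024 bytes); neither is stated explicitly in the logs, and `raw_fp32_mib` is a hypothetical helper name.

```cpp
#include <cassert>
#include <cstdint>

// Raw size, in MiB, of n FP32 vectors with `dim` components each.
// Assumption: vectors are stored uncompressed at 4 bytes per component.
double raw_fp32_mib(std::uint64_t n, std::uint64_t dim) {
    assert(n > 0 && dim > 0);
    const std::uint64_t bytes = n * dim * sizeof(float);
    return static_cast<double>(bytes) / (1024.0 * 1024.0);
}
```

Under those assumptions the raw data comes to about 488.28 MiB for SIFT-1M (1,000,000 × 128), 3662.11 MiB for GIST-1M (1,000,000 × 960), and 3658.45 MiB for Deep-10M (9,990,000 × 96), against reported usage of 497.17, 3702.74, and 3666.12 MB. The vectors therefore account for nearly all of the reported footprint, which is consistent with a flat layout with no index structure; the source does not say what makes up the remainder.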
Environment: Ubuntu 24.04 + ROCm 6.2
/opt/rocm-6.2.4/bin/amdclang++ -x hip -O3 --offload-arch=gfx1100 qingming.cpp -o qingming_amd \
-I/usr/include/hdf5/serial -L/usr/lib/x86_64-linux-gnu/hdf5/serial \
-lhdf5_cpp -lhdf5
SIFT-1M (128-dim)
=======================================================
QINGMING-ENGINE v1.0.0 PRO [REDMOON]
PLATFORM: AMD Radeon RX 7900 XTX (24GB)
Dataset Vectors : 1,000,000
Dimension : 128
Recall@1 : 99.260 %
Max Throughput : 6275.720 QPS
Latency P99 : 11.214 ms
=======================================================
GIST-1M (960-dim)
Dataset Vectors : 1,000,000
Dimension : 960
Recall@1 : 99.400 %
Max Throughput : 470.285 QPS
Latency P99 : 25.755 ms
=======================================================
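The Recall@1 and Recall@10 values in the runs above are not defined in the logs. One common reading, sketched here as an assumption, is the fraction of queries whose true nearest neighbor appears among the first k returned ids. The function name `recall_at_k` and the input layout are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// results[q]      : ids returned for query q, closest first
// ground_truth[q] : id of the true nearest neighbor of query q
// Returns the fraction of queries whose true neighbor is within the top k.
double recall_at_k(const std::vector<std::vector<std::size_t>>& results,
                   const std::vector<std::size_t>& ground_truth,
                   std::size_t k) {
    assert(results.size() == ground_truth.size());
    if (results.empty()) return 0.0;

    std::size_t hits = 0;
    for (std::size_t q = 0; q < results.size(); ++q) {
        const std::vector<std::size_t>& ids = results[q];
        // Look only at the first k ids (or fewer if the list is shorter).
        const auto last =
            ids.begin() + static_cast<std::ptrdiff_t>(std::min(k, ids.size()));
        if (std::find(ids.begin(), last, ground_truth[q]) != last) ++hits;
    }
    return static_cast<double>(hits) / static_cast<double>(results.size());
}
```

With this definition Recall@10 can never be lower than Recall@1 for the same result lists, which matches the pattern in the reported numbers.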
Core Tech: NEON SIMD + L3 Cache Residency (Zero-Copy)
$TOOLCHAIN/aarch64-linux-android34-clang++ -O3 -static-libstdc++ -flto \
-march=armv8.2-a+fp16+dotprod qingming-mobile.cpp -o qingming_8gen5
adb push qingming_8gen5 /data/local/tmp/
adb shell chmod +x /data/local/tmp/qingming_8gen5
adb shell /data/local/tmp/qingming_8gen5 100000
Latency: Brute-force search over 100k 128-dim vectors achieves ~8 ms latency per query.
Power: 1000 consecutive queries resulted in negligible thermal increase.
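As an illustration of what a NEON-accelerated distance kernel can look like, here is a minimal FP32 sketch. It is an assumption about the general technique, not the code in `qingming-mobile.cpp`: the function name `l2_squared_simd` is hypothetical, and the sketch ignores the FP16 and dot-product extensions enabled by the build flags above. On non-AArch64 targets it falls back to a plain scalar loop so it still compiles.

```cpp
#include <cassert>
#include <cstddef>
#if defined(__aarch64__)
#include <arm_neon.h>
#endif

// Squared L2 distance between two dim-length FP32 vectors.
// Uses 4-wide NEON lanes on AArch64 and a scalar loop everywhere else.
float l2_squared_simd(const float* a, const float* b, std::size_t dim) {
    assert(a != nullptr && b != nullptr);
    std::size_t i = 0;
    float sum = 0.0f;
#if defined(__aarch64__)
    float32x4_t acc = vdupq_n_f32(0.0f);
    for (; i + 4 <= dim; i += 4) {
        // Load four components from each vector and take the difference.
        const float32x4_t d = vsubq_f32(vld1q_f32(a + i), vld1q_f32(b + i));
        acc = vfmaq_f32(acc, d, d);  // acc += d * d, lane by lane
    }
    sum = vaddvq_f32(acc);  // add the four lanes together
#endif
    // Scalar tail for leftover components (or the whole vector off-ARM).
    for (; i < dim; ++i) {
        const float d = a[i] - b[i];
        sum += d * d;
    }
    return sum;
}
```

A production kernel would likely unroll the loop with several accumulators and process multiple database vectors per query load, but the load, subtract, fused multiply-add pattern stays the same.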
QingMing Engine offers a dual-licensing model optimized for Edge and Server deployments.
Target: Android, iOS, Automotive (Smart Cockpit), IoT, Smart Cameras.
Tech: ARM CPU + NEON Optimization.
| Region | Pricing Strategy | Terms |
|---|---|---|
| Global | $9.99 / device / year | Converts to perpetual license after 3 years. Or $19.99 one-time perpetual. |
| China | ¥9.99 / device / year | Converts automatically to perpetual license after 3 years. Or ¥19.99 one-time perpetual. |
Target: NVIDIA / AMD Servers, Workstations.
Tech: CUDA / HIP.
| Region | Pricing Strategy | Terms |
|---|---|---|
| Global | $99 / GPU / year | Or $199 one-time perpetual. |
| China | ¥99 / GPU / year | Or ¥199 one-time perpetual. |
Note: Vehicles/IoT devices using the CPU-based `qingming-mobile.cpp` fall under Mobile Pricing, even if a GPU is present.
Email: zhangxiaolong950@gmail.com
Volume licensing, source code access, and custom enterprise integration available upon request.