hanfei-shu 术

Making zero-knowledge proofs run on GPUs.

"法不阿贵,绳不挠曲" — The plumb line does not bend for the crooked. — 韩非子

hanfei-shu data flow: Your Code → Smart Dispatch → GPU Kernel (locked) → Verified Result

What does this do?

If you're building on Halo2 zero-knowledge proofs — for AI verification, blockchain privacy, or anything with the Pallas curve — there's one operation eating most of your proving time: multi-scalar multiplication (MSM).

MSM is a giant dot product over elliptic curve points: multiply each point by its scalar, then add up the results. It accounts for 60-70% of proof generation time. In the CPU benchmark below, half a million points take about 3.5 seconds. That adds up.
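To make the "giant dot product" concrete, here is a minimal sketch of MSM done two ways: naively, and with the bucket method that Pippenger-style algorithms use. It is not the crate's code. It assumes a toy group (integers modulo a prime under addition) as a stand-in for the Pallas curve, and the names `msm_naive`, `msm_buckets`, and the window width `c` are hypothetical.

```rust
/// Toy stand-in for an elliptic-curve group: integers modulo a prime under addition.
/// On a real curve, `add` is point addition and is far more expensive.
const P: u64 = 1_000_000_007;

fn add(a: u64, b: u64) -> u64 {
    (a + b) % P
}

/// Scalar multiplication by double-and-add, using only the group operation.
fn scalar_mul(mut k: u64, mut point: u64) -> u64 {
    let mut acc = 0; // group identity
    while k > 0 {
        if k & 1 == 1 {
            acc = add(acc, point);
        }
        point = add(point, point);
        k >>= 1;
    }
    acc
}

/// Naive MSM: one scalar multiplication per point, then sum everything.
fn msm_naive(scalars: &[u64], bases: &[u64]) -> u64 {
    scalars
        .iter()
        .zip(bases)
        .fold(0, |acc, (&s, &b)| add(acc, scalar_mul(s, b)))
}

/// Bucket-method (Pippenger-style) MSM with `c`-bit windows.
fn msm_buckets(scalars: &[u64], bases: &[u64], c: u32) -> u64 {
    let windows = (64 + c - 1) / c;
    let mask = (1u64 << c) - 1;
    let mut total = 0;
    for w in (0..windows).rev() {
        // Shift the running total up by one window: c doublings.
        for _ in 0..c {
            total = add(total, total);
        }
        // Drop each base into the bucket named by its scalar's digit in this window.
        let mut buckets = vec![0u64; 1 << c];
        for (&s, &b) in scalars.iter().zip(bases) {
            let digit = ((s >> (w * c)) & mask) as usize;
            if digit != 0 {
                buckets[digit] = add(buckets[digit], b);
            }
        }
        // Running-sum trick: yields sum(digit * bucket[digit]) with additions only.
        let (mut running, mut window_sum) = (0, 0);
        for &bucket in buckets[1..].iter().rev() {
            running = add(running, bucket);
            window_sum = add(window_sum, running);
        }
        total = add(total, window_sum);
    }
    total
}

fn main() {
    let scalars = [3u64, 5, u64::MAX, 0, 123_456_789_012];
    let bases = [2u64, 7, 11, 13, 999_999_999];
    // Both methods must agree, just as the GPU and CPU paths must.
    assert_eq!(msm_naive(&scalars, &bases), msm_buckets(&scalars, &bases, 4));
    println!("naive and bucket MSM agree");
}
```

The bucket method replaces per-point scalar multiplications with plain additions into buckets, and most of the savings on large inputs come from that.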

hanfei-shu moves this to your NVIDIA GPU. Same math, same results, just faster.

How much faster?

RTX 3090 vs Ryzen 9 5950X CPU Pippenger — v0.2.1 installed from crates.io:

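| Points      | CPU     | GPU     | Speedup |
|-------------|---------|---------|---------|
| 64K (k=16)  | 516 ms  | 110 ms  | 4.7x    |
| 128K (k=17) | 961 ms  | 202 ms  | 4.8x    |
| 256K (k=18) | 1780 ms | 386 ms  | 4.6x    |
| 512K (k=19) | 3453 ms | 1043 ms | 3.3x    |
| 1M (k=20)   | 6744 ms | 1776 ms | 3.8x    |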

Every result is bit-exact — the GPU produces the same answer as the CPU.

No GPU alternatives exist for Pallas

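| Library    | BN254 | BLS12-381 | Pallas |
|------------|-------|-----------|--------|
| ICICLE     | Yes   | Yes       | No     |
| cuZK       | Yes   | No        | No     |
| Blitzar    | Yes   | Yes       | No     |
| hanfei-shu |       |           | Yes    |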

Quick start

# Cargo.toml
[dependencies]
hanfei-shu = { git = "https://github.com/GeoffreyWang1117/hanfei-shu" }

// Rust
use hanfei_shu::{gpu_best_multiexp, is_gpu_available};
use hanfei_shu::cpu::pippenger_msm; // CPU reference

// `scalars` and `bases` are your MSM inputs: scalars and Pallas curve points, one scalar per point.
let gpu = gpu_best_multiexp(&scalars, &bases);
let cpu = pippenger_msm(&scalars, &bases);
assert_eq!(gpu, cpu); // bit-exact

"But where's the GPU code?"

The CUDA kernels ship as prebuilt binaries rather than source code, the same distribution model as NVIDIA's cuBLAS or Intel's MKL.

Why: you don't need the CUDA Toolkit installed. Run cargo add hanfei-shu and it works.

The Rust code is fully open — including a complete CPU Pippenger implementation so you can verify every GPU result yourself.

What's included

GPU MSM: CUDA kernels for A100, RTX 3090, RTX 4090. Auto-detects your GPU.
CPU Pippenger: Pure Rust reference implementation. Works without GPU.
Smart dispatch: Automatically uses GPU for large inputs, CPU for small ones.
5 examples: AI inference demo, CPU vs GPU comparison, full benchmarks.
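The smart dispatch idea can be sketched as a size threshold plus an availability check. This is an illustration only, not the crate's implementation: `Backend`, `choose_backend`, and the `GPU_MIN_POINTS` cutoff are hypothetical names and values, and a real cutoff would have to come from benchmarking on the target hardware.

```rust
/// Which backend the dispatcher picked.
#[derive(Debug, PartialEq, Clone, Copy)]
enum Backend {
    Cpu,
    Gpu,
}

/// Hypothetical crossover point; a real value has to come from benchmarking.
const GPU_MIN_POINTS: usize = 1 << 14;

/// Pick a backend from the input size and GPU availability.
fn choose_backend(num_points: usize, gpu_available: bool) -> Backend {
    if gpu_available && num_points >= GPU_MIN_POINTS {
        Backend::Gpu
    } else {
        // Small inputs, or no usable GPU: stay on the CPU.
        Backend::Cpu
    }
}

fn main() {
    for &(n, gpu) in &[(1_000, true), (1 << 16, true), (1 << 16, false)] {
        println!("{n} points, gpu={gpu} -> {:?}", choose_backend(n, gpu));
    }
}
```

A threshold like this is a common design choice for GPU offload, since moving data to the device and launching kernels has a fixed cost that small inputs cannot amortize.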

Part of HanFei 韩非

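| Crate         | Concept   | What                                |
|---------------|-----------|-------------------------------------|
| hanfei-shu 术 | Technique | GPU MSM engine (this crate)         |
| hanfei-shi 势 | Power     | Full GPU proving pipeline (planned) |
| hanfei-fa 法  | Law       | ZK proof framework (planned)        |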

Extracted from ChainProve, a system for verifiable transformer inference with zero-knowledge proofs.