hanfei-shu 术

Making zero-knowledge proofs run on GPUs.

"法不阿贵，绳不挠曲" — The plumb line does not bend for the crooked. — 韩非子

hanfei-shu data flow: Your Code → Smart Dispatch → GPU Kernel (locked) → Verified Result

What does this do?

If you're building on Halo2 zero-knowledge proofs — for AI verification, blockchain privacy, or anything with the Pallas curve — there's one operation eating most of your proving time: multi-scalar multiplication (MSM).

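MSM is a giant dot product over elliptic curve points: each point is multiplied by a scalar and the products are summed. It takes 60-70% of proof generation time. On CPU, half a million points (k=19) takes about 3.5 seconds in the benchmark below. That adds up.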
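To make "giant dot product" concrete, here is a minimal sketch of a naive MSM. It is not the crate's code: it swaps Pallas curve points for a toy group (integers modulo a prime, under addition) so it runs with no dependencies. The names `MODULUS`, `add`, `scalar_mul`, and `naive_msm` are made up for illustration.

```rust
// Toy stand-in for an elliptic curve group: integers modulo a prime, under addition.
// Real MSM uses Pallas curve points; only the group law differs.
const MODULUS: u64 = 1_000_000_007;

/// Group addition in the toy group (inputs are assumed to be below MODULUS).
fn add(a: u64, b: u64) -> u64 {
    (a + b) % MODULUS
}

/// Scalar multiplication by double-and-add, the same loop shape used for curve points.
fn scalar_mul(mut scalar: u64, mut point: u64) -> u64 {
    let mut acc = 0; // group identity
    while scalar > 0 {
        if scalar & 1 == 1 {
            acc = add(acc, point);
        }
        point = add(point, point); // "doubling"
        scalar >>= 1;
    }
    acc
}

/// Naive MSM: the sum of scalars[i] * points[i].
fn naive_msm(scalars: &[u64], points: &[u64]) -> u64 {
    assert_eq!(scalars.len(), points.len());
    scalars
        .iter()
        .zip(points)
        .fold(0, |acc, (&s, &p)| add(acc, scalar_mul(s, p)))
}

fn main() {
    let scalars = [3, 5, 7];
    let points = [10, 20, 30];
    // 3*10 + 5*20 + 7*30 = 340
    println!("{}", naive_msm(&scalars, &points));
}
```

With hundreds of thousands of terms and real curve arithmetic in place of `add`, this loop is what dominates proving time.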

hanfei-shu moves this to your NVIDIA GPU. Same math, same results, just faster.

How much faster?

RTX 3090 vs Ryzen 9 5950X CPU Pippenger — v0.2.1 installed from crates.io:

Points	CPU	GPU	Speedup
64K (k=16)	516ms	110ms	4.7x
128K (k=17)	961ms	202ms	4.8x
256K (k=18)	1780ms	386ms	4.6x
512K (k=19)	3453ms	1043ms	3.3x
1M (k=20)	6744ms	1776ms	3.8x

Every result is bit-exact — the GPU produces the same answer as the CPU.
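The speedup column is just CPU time divided by GPU time, rounded to one decimal. A small sketch that recomputes it from the numbers in the table above (the `ROWS` constant and `speedup` helper are illustrative names, not part of the crate):

```rust
/// Benchmark rows copied from the table: (label, CPU ms, GPU ms).
const ROWS: [(&str, f64, f64); 5] = [
    ("64K (k=16)", 516.0, 110.0),
    ("128K (k=17)", 961.0, 202.0),
    ("256K (k=18)", 1780.0, 386.0),
    ("512K (k=19)", 3453.0, 1043.0),
    ("1M (k=20)", 6744.0, 1776.0),
];

/// Speedup is the ratio of CPU time to GPU time.
fn speedup(cpu_ms: f64, gpu_ms: f64) -> f64 {
    cpu_ms / gpu_ms
}

fn main() {
    for &(label, cpu_ms, gpu_ms) in ROWS.iter() {
        println!("{}: {:.1}x", label, speedup(cpu_ms, gpu_ms));
    }
}
```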

No alternatives exist for Pallas

Library	BN254	BLS12-381	Pallas
ICICLE	Yes	Yes	No
cuZK	Yes	No	No
Blitzar	Yes	Yes	No
hanfei-shu	—	—	Yes

Quick start

# Cargo.toml
[dependencies]
hanfei-shu = { git = "https://github.com/GeoffreyWang1117/hanfei-shu" }

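use hanfei_shu::{gpu_best_multiexp, is_gpu_available};
use hanfei_shu::cpu::pippenger_msm; // CPU reference

// `scalars` and `bases` are your MSM inputs: equal-length slices of
// Pallas scalars and curve points, prepared elsewhere.
let gpu = gpu_best_multiexp(&scalars, &bases); // GPU path
let cpu = pippenger_msm(&scalars, &bases);     // pure-Rust CPU path
assert_eq!(gpu, cpu); // bit-exact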

"But where's the GPU code?"

The CUDA kernels are prebuilt compiled libraries, not source code. Same model as NVIDIA's cuBLAS or Intel's MKL.

Why: you don't need the CUDA Toolkit installed. Run cargo add hanfei-shu and it works.

The Rust code is fully open — including a complete CPU Pippenger implementation so you can verify every GPU result yourself.
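If you're curious what a Pippenger-style MSM looks like, here is a rough sketch of the bucket method, checked against a plain sum the same way you would check GPU output against the CPU reference. This is not the crate's implementation. It assumes a toy group (integers modulo a prime, under addition) and 64-bit scalars, and `WINDOW_BITS`, `toy_pippenger_msm`, and `naive_msm` are hypothetical names chosen for this example.

```rust
// Toy stand-in group: integers modulo a prime under addition (assumption for illustration).
const MODULUS: u64 = 1_000_000_007;
// Hypothetical window size; real implementations tune this to the input size.
const WINDOW_BITS: u32 = 4;

/// Group addition in the toy group (inputs are assumed to be below MODULUS).
fn add(a: u64, b: u64) -> u64 {
    (a + b) % MODULUS
}

/// Reference result: plain sum of scalar * point, computed with wide arithmetic.
fn naive_msm(scalars: &[u64], points: &[u64]) -> u64 {
    scalars.iter().zip(points).fold(0, |acc, (&s, &p)| {
        let term = (s as u128 * p as u128 % MODULUS as u128) as u64;
        add(acc, term)
    })
}

/// Bucket-method (Pippenger-style) MSM over 64-bit scalars.
fn toy_pippenger_msm(scalars: &[u64], points: &[u64]) -> u64 {
    assert_eq!(scalars.len(), points.len());
    let num_windows = 64 / WINDOW_BITS;
    let mask = (1u64 << WINDOW_BITS) - 1;
    let mut result = 0u64;

    // Process windows from most significant to least significant.
    for w in (0..num_windows).rev() {
        // Shift the accumulated result up by WINDOW_BITS via repeated doubling.
        for _ in 0..WINDOW_BITS {
            result = add(result, result);
        }

        // Drop each point into the bucket named by its scalar's digit in this window.
        let mut buckets = vec![0u64; 1 << WINDOW_BITS];
        for (&s, &p) in scalars.iter().zip(points) {
            let digit = ((s >> (w * WINDOW_BITS)) & mask) as usize;
            if digit != 0 {
                buckets[digit] = add(buckets[digit], p);
            }
        }

        // Running-sum trick: yields sum(digit * bucket[digit]) using only additions.
        let mut running = 0u64;
        let mut window_sum = 0u64;
        for &b in buckets[1..].iter().rev() {
            running = add(running, b);
            window_sum = add(window_sum, running);
        }
        result = add(result, window_sum);
    }
    result
}

fn main() {
    let scalars = [3, 5, 7, 0xDEAD_BEEF_u64];
    let points = [10, 20, 30, 123_456_789];
    // Both methods must agree exactly, mirroring a GPU-vs-CPU check.
    assert_eq!(toy_pippenger_msm(&scalars, &points), naive_msm(&scalars, &points));
    println!("bucket method matches naive sum");
}
```

The point of the buckets is that each window costs roughly one addition per input point plus a fixed number of additions per bucket, instead of a full scalar multiplication per point.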

What's included

GPU MSM
CUDA kernels for A100, RTX 3090, RTX 4090. Auto-detects your GPU.

CPU Pippenger
Pure Rust reference implementation. Works without GPU.

Smart dispatch
Automatically uses GPU for large inputs, CPU for small ones.

5 examples
AI inference demo, CPU vs GPU comparison, full benchmarks.
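The smart dispatch idea from the list above can be sketched as a size threshold plus a GPU availability check. This is an assumption about the general shape, not the crate's actual policy: the cutoff `GPU_MIN_POINTS`, the `Backend` enum, and the `choose_backend` and `dispatch_msm` functions are all hypothetical, and the two MSM backends are passed in as placeholder closures.

```rust
/// Hypothetical cutoff: below this many points, transfer and launch overhead
/// is assumed to outweigh the GPU's advantage.
const GPU_MIN_POINTS: usize = 1 << 14;

#[derive(Debug, PartialEq)]
enum Backend {
    Cpu,
    Gpu,
}

/// Choose a backend from input size and GPU availability.
fn choose_backend(num_points: usize, gpu_available: bool) -> Backend {
    if gpu_available && num_points >= GPU_MIN_POINTS {
        Backend::Gpu
    } else {
        Backend::Cpu
    }
}

/// Run an MSM through whichever backend the policy selects.
/// `cpu_msm` and `gpu_msm` are placeholders for real implementations.
fn dispatch_msm<T>(
    scalars: &[u64],
    points: &[T],
    gpu_available: bool,
    cpu_msm: impl Fn(&[u64], &[T]) -> T,
    gpu_msm: impl Fn(&[u64], &[T]) -> T,
) -> T {
    match choose_backend(points.len(), gpu_available) {
        Backend::Gpu => gpu_msm(scalars, points),
        Backend::Cpu => cpu_msm(scalars, points),
    }
}

fn main() {
    // Stand-in "MSM" over plain integers so the sketch runs anywhere.
    let dot = |s: &[u64], p: &[u64]| s.iter().zip(p).map(|(a, b)| a * b).sum::<u64>();
    let scalars: [u64; 3] = [1, 2, 3];
    let points: [u64; 3] = [4, 5, 6];
    // Small input and no GPU: the CPU closure runs.
    let out = dispatch_msm(&scalars, &points, false, dot, dot);
    println!("{}", out);
}
```

Because both backends return the same answer, callers never need to know which path ran.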

Part of HanFei 韩非

Crate	Concept	What
hanfei-shu 术	Technique	GPU MSM engine (this crate)
hanfei-shi 势	Power	Full GPU proving pipeline (planned)
hanfei-fa 法	Law	ZK proof framework (planned)

Extracted from ChainProve, a system for verifiable transformer inference with zero-knowledge proofs.