Protocol Labs Research
2021.3.24 / Blog
CryptoComputeLab announces proofs release 6.1.0

Today we’re proud to announce the release of rust-fil-proofs v6.1.0. This release contains a number of significant refactors and performance optimizations, and we’d like to dig deeper into a couple of them to show some of their real-world impact.

But first, on the security front: this release ships with blst v3.3.0 as an optional backend for BLS12-381 pairing operations. It is an alternative to the existing pairing backend, which remains the default for now. Most notably, this specific version of blst has been audited for security, and in the future we intend to make it the default backend for all BLS12-381 pairing operations in proofs. The proofs code incorporates blst via blstrs, a Rust wrapper library, and this release updates blstrs from v0.1.3 to v0.2.2. More on that below.

The latest rust-fil-proofs v6.1.0 also reorganizes the source code significantly. Several source directories, including phase2 and fr32, have been moved out into their own ‘crates’ (the name used for a Rust package). The whole code base also got a once-over to remove so-called glob imports (imports of the form use some_module::*), a style that had accrued over time and that makes it harder to see where a name comes from.
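To illustrate the glob import cleanup, here is a minimal sketch using only the Rust standard library. It is not taken from the rust-fil-proofs code base, and the word_counts function is a hypothetical example; it only shows the difference between an explicit import and a glob import.

```rust
// Explicit import: a reader can see exactly which name this file depends on.
use std::collections::HashMap;
// The glob form `use std::collections::*;` would also compile here, but it
// pulls in every public name from the module and hides where `HashMap`
// comes from.

/// Counts how often each whitespace-separated word occurs in `text`.
/// (Hypothetical helper, used only to give the import something to do.)
fn word_counts(text: &str) -> HashMap<&str, usize> {
    let mut counts = HashMap::new();
    for word in text.split_whitespace() {
        *counts.entry(word).or_insert(0) += 1;
    }
    counts
}
```

With explicit imports, a name clash or a removed item shows up at the import line itself rather than somewhere deep in the module.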

On the performance side, this release speeds up pairing operations via blstrs, improves parallelism in both PoSt proving and verification, and optionally allows access to neptune’s improved OpenCL backend. Benchmarks of the updated blstrs library alone show improvements in key operations: G1/G2 multiplication and FP12 inversion. A performance comparison between the previous version and the latest one is shown below:

Operation      Blstrs v0.1.3 (nanoseconds)   Blstrs v0.2.2 (nanoseconds)   Speed-up
G1 multiply    102,865                       79,089                        1.3x
G2 multiply    252,315                       180,283                       1.4x
FP12 inverse   9,314                         5,520                         1.7x
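
The speed-up column is simply the old time divided by the new time, rounded to one decimal place. A small sketch that reproduces it from the timings in the table (the function and constant names are ours, chosen for illustration):

```rust
/// Speed-up factor: old time divided by new time (above 1.0 means faster).
fn speed_up(old_ns: f64, new_ns: f64) -> f64 {
    old_ns / new_ns
}

/// (operation, blstrs v0.1.3 ns, blstrs v0.2.2 ns), copied from the table.
const BENCHMARKS: [(&str, f64, f64); 3] = [
    ("G1 multiply", 102_865.0, 79_089.0),
    ("G2 multiply", 252_315.0, 180_283.0),
    ("FP12 inverse", 9_314.0, 5_520.0),
];

/// Formats one line per operation, with the ratio rounded to one decimal.
fn report() -> Vec<String> {
    BENCHMARKS
        .iter()
        .map(|(op, old, new)| format!("{}: {:.1}x", op, speed_up(*old, *new)))
        .collect()
}
```

Calling report() yields the same 1.3x, 1.4x and 1.7x figures listed in the table.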

 

Switching gears: by default, the proofs code uses neptune’s GPU backend, which accelerates the Pre-commit Phase 2 stage of sealing by building merkle trees efficiently on the GPU. This code has served proofs well for some time and is a significant performance leap over building those same trees on modern CPUs. However, the GPU code inside neptune has been updated significantly and now includes a pure OpenCL implementation that builds those same trees even more efficiently. For the sake of this post, we’ll refer to the legacy/default tree building as gpu and to the new optional feature as gpu2. With this release, you can use the new implementation today by compiling rust-fil-proofs with the gpu2 feature. The only reason gpu2 remains optional at this time is that we would like to see wider testing before making it the default, though we don’t expect it to remain optional for much longer. That said, let’s dive into some comparisons of the two and see what kind of impact we’re looking at!

For the testing hardware, we used an AMD Ryzen 9 3950X 16-core processor (32 threads) with 128GiB RAM and an NVIDIA GeForce RTX 2080 Ti Rev. A GPU. For the test setup, we used the window post benchmark included in the proofs code base. This release allows us to configure this benchmark to skip all of the other sealing steps and isolate the ‘Pre-commit Phase 2’ stage, which is primarily tree building. Isolating that stage lets us benchmark the GPU work directly, giving us an apples-to-apples performance comparison.

The proofs team is happy to share that the results of this testing are very promising!

 

The image above summarizes our findings, which compare wall clock time for GPU tree building. The tested feature combinations are blst with gpu, pairing with gpu, blst with gpu2, and pairing with gpu2. Lower wall clock time is better.

The table below lists the specific times for each feature combination enabled during the test run. Times are in milliseconds. Both CPU time and wall time are shown; expect them to vary with the specific CPU and GPU used.



Features enabled   CPU time (milliseconds)   Wall time (milliseconds)
Pairing and gpu    2,647,533                 1,281,010
Pairing and gpu2   3,338,394                 687,922
Blst and gpu       2,660,596                 1,261,724
Blst and gpu2      5,390,836                 696,688
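
To put those numbers in perspective, the sketch below derives the wall-clock speed-up of gpu2 over gpu for each pairing backend, using only the figures from the table. The Run struct and the function names are hypothetical helpers for this post, not part of rust-fil-proofs. For these measurements the ratio works out to roughly 1.8x to 1.9x in favor of gpu2.

```rust
/// One benchmark run from the table; times are in milliseconds.
struct Run {
    features: &'static str,
    cpu_ms: f64,
    wall_ms: f64,
}

static RUNS: [Run; 4] = [
    Run { features: "Pairing and gpu", cpu_ms: 2_647_533.0, wall_ms: 1_281_010.0 },
    Run { features: "Pairing and gpu2", cpu_ms: 3_338_394.0, wall_ms: 687_922.0 },
    Run { features: "Blst and gpu", cpu_ms: 2_660_596.0, wall_ms: 1_261_724.0 },
    Run { features: "Blst and gpu2", cpu_ms: 5_390_836.0, wall_ms: 696_688.0 },
];

/// Wall-clock speed-up of gpu2 over gpu for a backend label ("Pairing" or
/// "Blst"). Returns None if either run is missing from the table.
fn gpu2_wall_speed_up(backend: &str) -> Option<f64> {
    let wall = |suffix: &str| {
        RUNS.iter()
            .find(|r| r.features == format!("{} and {}", backend, suffix))
            .map(|r| r.wall_ms)
    };
    Some(wall("gpu")? / wall("gpu2")?)
}

/// CPU time divided by wall time: a rough measure of how much CPU work
/// ran concurrently during the benchmark.
fn cpu_to_wall_ratio(run: &Run) -> f64 {
    run.cpu_ms / run.wall_ms
}
```

The CPU-to-wall ratio is included because gpu2 uses more total CPU time while finishing sooner, which the wall-time column alone does not show.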

 

Another notable improvement included in this release is an updated version of the bellperson library (used for building our zk-SNARK circuits). This update allows the large parameter files that proofs requires to be loaded in parallel, which can help both proving and verification times.
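
As a rough illustration of why parallel loading helps, here is a generic sketch that decodes fixed-size records from a byte buffer across several threads. It is not bellperson’s code or API: the parse_parallel function, the 8-byte record size, and the little-endian u64 encoding are all assumptions made for this example, standing in for the much larger elements in a real parameter file.

```rust
use std::thread;

/// Decodes little-endian u64 records from `bytes`, splitting the work
/// across up to `workers` threads. Trailing bytes that do not fill a whole
/// record are ignored. (Hypothetical stand-in for parameter parsing.)
fn parse_parallel(bytes: &[u8], workers: usize) -> Vec<u64> {
    const RECORD: usize = 8; // assumed record size for this sketch
    let records = bytes.len() / RECORD;
    let workers = workers.max(1);
    // Records per worker, rounded up, and never zero.
    let per_worker = ((records + workers - 1) / workers).max(1);

    thread::scope(|s| {
        // Spawn one thread per contiguous chunk of whole records.
        let handles: Vec<_> = bytes[..records * RECORD]
            .chunks(per_worker * RECORD)
            .map(|chunk| {
                s.spawn(move || {
                    chunk
                        .chunks_exact(RECORD)
                        .map(|r| {
                            let mut buf = [0u8; RECORD];
                            buf.copy_from_slice(r);
                            u64::from_le_bytes(buf)
                        })
                        .collect::<Vec<u64>>()
                })
            })
            .collect();
        // Join in spawn order so the output keeps the original ordering.
        handles
            .into_iter()
            .flat_map(|h| h.join().unwrap())
            .collect()
    })
}
```

Because each chunk is independent, the decoding cost is spread across cores, which is the general idea behind loading large parameter files in parallel.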

For the full list of changes, click here or see below!

CryptoComputeLab
We at CryptoComputeLab welcome further discussion of our research topics, and we're always on the lookout for opportunities to answer questions and develop collaborations. Please reach out via email (research@protocol.ai) to start the conversation! If you’re interested in working with us on issues at the intersection of cryptography, high-performance computing, and programming language design, please contribute to the Open Problems in our Research Repo and check out our Open Positions.