vcube optimal solver


May 12, 2018
I have just released the source code for vcube, a fast optimal Rubik's Cube solver. It is a rewrite of one of the solvers behind the "optimal scrambler" that I posted about several months ago. The code, which is licensed under GPLv3, is available on GitHub at:


This new version borrows heavily from Tomas Rokicki's nxopt solver, using his pruning table design which is superior to the 1.6-bit format that I used before, and incorporates many of his optimizations.
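For readers wondering how a table entry can take 1.6 bits: one common way to get there is to store each entry as a value from 0 to 2 and pack five of them into a byte, since 3^5 = 243 fits in 8 bits (8 / 5 = 1.6). The sketch below shows only that packing arithmetic. Whether the old solver's format worked exactly this way is an assumption, and the names `pack5`, `unpack5` and `lookup` are made up for illustration.

```python
ENTRIES_PER_BYTE = 5  # 3**5 = 243 <= 256, so five base-3 digits fit in one byte


def pack5(values):
    """Pack five values in {0, 1, 2} into one byte; values[0] is the lowest digit."""
    assert len(values) == ENTRIES_PER_BYTE and all(0 <= v < 3 for v in values)
    byte = 0
    for v in reversed(values):
        byte = byte * 3 + v
    return byte


def unpack5(byte):
    """Inverse of pack5: recover the five base-3 digits, lowest first."""
    out = []
    for _ in range(ENTRIES_PER_BYTE):
        out.append(byte % 3)
        byte //= 3
    return out


def lookup(table, index):
    """Read entry `index` from a bytes-like table of packed base-3 digits."""
    byte = table[index // ENTRIES_PER_BYTE]
    return (byte // 3 ** (index % ENTRIES_PER_BYTE)) % 3
```

Reading an entry costs a division and a modulo (or a small lookup table), which is part of why denser-but-awkward formats are not automatically faster than simpler ones.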

From my old solver, it inherits a SIMD-optimized cube model which takes advantage of the AVX2 instruction set introduced with Intel's Haswell microarchitecture. Among other things, it can compose (multiply) two cube positions in 5 CPU instructions; an edges-only cube requires only 3 instructions.
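To make the composition idea concrete, here is a scalar Python sketch of an edges-only compose. The byte layout (low 4 bits for the piece index, bit 4 for the flip) and the name `compose_edges` are assumptions for illustration and are not taken from the vcube source. In a SIMD version, the per-slot lookup in the loop is the kind of work a single byte shuffle can do for all 12 edges at once, with the flip handled by a mask and an XOR.

```python
PIECE_MASK = 0x0F  # assumed layout: low 4 bits = which edge piece sits in the slot
FLIP_BIT = 0x10    # assumed layout: bit 4 = whether that piece is flipped

IDENTITY = list(range(12))  # solved edges: piece i in slot i, nothing flipped


def compose_edges(a, b):
    """Apply b to state a: slot i receives whatever a holds in slot (b[i] & PIECE_MASK),
    with its flip toggled if b flips that slot."""
    return [a[b[i] & PIECE_MASK] ^ (b[i] & FLIP_BIT) for i in range(12)]


# A made-up "move": 4-cycle over slots 0..3 that flips two of the edges it carries.
EXAMPLE_MOVE = [0x11, 0x02, 0x13, 0x00] + list(range(4, 12))

state = IDENTITY
for _ in range(4):
    state = compose_edges(state, EXAMPLE_MOVE)
print(state == IDENTITY)  # the 4-cycle returns to solved after four applications
```

Corners need a little more work because their orientation is counted mod 3 rather than toggled with a single bit, which is consistent with the full cube costing more instructions than the edges-only case.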

On my 32GB i7-7700K (4.20 GHz, 4 core, 8 thread) desktop, using a 22GB pruning table, vcube is able to solve random cube positions at a rate of 6.0 cubes/second. This is an improvement over nxopt (modified to support 1GB huge pages), which I measured at 3.8 cubes/second on the same hardware and data set.

Some additional speed tests, run on Linode virtual servers:

64GB, E5-2680 v3 @ 2.50 GHz, 16 core virtual server:
- vcube with 32GB pruning table: 6.9 cubes/second
- vcube with 58GB pruning table: 15.1 cubes/second

192GB, E5-2697 v4 @ 2.30 GHz, 32 core virtual server:
- vcube with 170GB pruning table: 55.0 cubes/second
- nxopt with 170GB pruning table: 49.2 cubes/second

I also ran a series of tests with the 22GB pruning table on the E5-2697 v4 @ 2.30 GHz virtual server at varying concurrency levels, to get an idea of how that hardware scales and how it compares to my desktop:
- 32 threads: 9.8 cubes/second
- 16 threads: 6.4 cubes/second
- 8 threads: 3.5 cubes/second
- 4 threads: 1.8 cubes/second
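
A quick way to read the scaling numbers above is to compute the speedup relative to the 4-thread run and divide by the increase in thread count. The short script below does only that arithmetic on the quoted figures; the function name `scaling_rows` and the choice of the 4-thread run as the baseline are mine.

```python
# threads -> cubes/second, 22GB table on the E5-2697 v4 virtual server (figures from the post)
results = {4: 1.8, 8: 3.5, 16: 6.4, 32: 9.8}

BASE_THREADS = 4


def scaling_rows(results, base_threads=BASE_THREADS):
    """Return (threads, rate, speedup vs baseline, parallel efficiency) per run."""
    base_rate = results[base_threads]
    rows = []
    for threads in sorted(results):
        rate = results[threads]
        speedup = rate / base_rate
        efficiency = speedup / (threads / base_threads)
        rows.append((threads, rate, round(speedup, 2), round(efficiency, 2)))
    return rows


for row in scaling_rows(results):
    print("%2d threads: %4.1f cubes/s, %.2fx vs 4 threads, efficiency %.2f" % row)
```

Going from 4 to 32 threads is 8 times the threads for roughly 5.4 times the throughput, so the per-thread rate falls off noticeably at the top end.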

The results for my desktop were 6.0 cubes/second at 8 threads, and 3.9 cubes/second at 4 threads. Much of the speed difference can be attributed to the widely differing CPU frequencies. In addition, I was unable to perform certain tuning on the virtual servers. If I get the opportunity, it would be interesting to test the larger tables on a physical server.
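
As a rough sanity check on the frequency explanation, the measured desktop/server ratios can be set against the ratio of the nominal clock speeds. This is a back-of-the-envelope sketch only: it uses the advertised base clocks from the post (turbo behaviour, memory speed and virtualization overhead are all ignored), and the helper name `compare` is made up.

```python
# Figures from the post: nominal clock in GHz and cubes/second by thread count (22GB table)
desktop = {"ghz": 4.20, "rates": {4: 3.9, 8: 6.0}}  # i7-7700K
server = {"ghz": 2.30, "rates": {4: 1.8, 8: 3.5}}   # E5-2697 v4 virtual server

clock_ratio = desktop["ghz"] / server["ghz"]


def compare(threads):
    """Return (measured desktop/server ratio, the part of it not explained by nominal clock)."""
    measured = desktop["rates"][threads] / server["rates"][threads]
    residual = measured / clock_ratio
    return measured, residual


for threads in (4, 8):
    measured, residual = compare(threads)
    print("%d threads: desktop is %.2fx the server; %.2fx after dividing out the %.2fx clock ratio"
          % (threads, measured, residual, clock_ratio))
```

The desktop comes out around 2.2x faster at 4 threads and 1.7x at 8 threads, against a nominal clock ratio of about 1.8x, so clock speed accounts for most of the gap but not all of it in either direction.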