I decided to take a crack at writing my own 4x4x4 solving program. If anyone else has started on such a program, or if one is already in existence, please let me know if I am re-inventing the wheel. I am taking some of my chess code and adapting it for a fast bit-move generator that should outperform matrix calculations.

For example, you can replace 4 separate matrix swaps with bit-AND/bit-OR operations that should execute much more quickly.

U+ keeps the down face the same, and F[1]=R[1], F[2]=R[2], F[3]=R[3], and F[4]=R[4].

Likewise R[n]=B[N] for N = 1 to 4, and B[N]=L[N] and L[N]= the original F[N], so you need to "copy" the 4 front cubes before swapping them to the right.

There is a mathematical/coding trick to do all of these "swaps" with just one operation.

You lay out the 16 cubes per face as bits such as 0000 for cube 1, which means you have 2 cubes/byte.

The upper 4 bits = cube 1, the lower 4 bits = cube 2, etc.

You need 8 bytes, a single long long integer, to store a cube face.

So the new F face after a U+ becomes f_face_variable = (f_face_variable & FFFF000000000000) | (r_face_variable & 0000FFFFFFFFFFFFFFFF);

You replace 4 swaps with only one statement that operates on two 8-byte variables and two 8-byte hexadecimal constants. It's very fast.

When you move just the f slice clockwise (by itself) the math is f_slice_variable = (f_slice_variable & 0000FFFF00000000) | (r_slice_variable & FFFF0000FFFFFFFF);

I am using procedure pointers to speed up the solve by eliminating all of the IF branchpoints that could result.

I just encode the moves as triplets, like F+, F-, and F2, then I make sure no subsequent move would undo the most recent move, or add to it in the same fashion.

So the program won't do F+ F- and think that is a valid "depth 2" position. Nor will it do F2 F+ or F2 F- back to back.

It will allow F+ T+ F+ though, as it should, etc.

I still have to code the notation part. I entered a random 10-move scramble and it found the answer pretty quickly.

I just had no way to see the answer