# Most Efficient Way to Digitally Store CubeStates?

#### Chree

##### Member
After a random chain of thoughts I had last night, the question just occured to me: If I wanted to store a digital representation of every single possible cube state, how much space would that take up? This naturally led to the question: what is the most efficient way to store any single cube state in a digital format? I don't think that I came up with "The" answer, but I knocked out a few ideas. I searched the forums for a better answer and found none. I hope y'all can help out.

My first instinct was to just look at a scramble picture like we get from TNoodle, which is just a picture of where the stickers wind up after the scramble. TNoodle spits out the familiar (and easy to read) layout. 6 faces with 9 stickers each in a "T" shape, 54 stickers total. I could just as easily imagine a 1x54 grid of colors, where each segment of the grid is assigned to a particular sticker on the cube. To store that digitally, I can assign each color a value, then create an array or string with the sticker colors encoded in sequence. Since there are 6 possible colors for each sticker, I would need a minimum of 3 bits to say what color was asigned to any one sticker.

6 faces * 9 stickers per face * 3 bits per sticker = 162 bits per cube state

Ok... but technically, I don't need to store the center stickers. I can just as easily write a decoder that assumes the 6 center stickers never change, and we can just ignore it. That removes 1 sticker per face that we need to store.

6 faces * 8 stickers per face * 3 bits per sticker = 138 bits per cube state

To cut it down further, I can make one big assumption: the cube is always solvable. For a solvable cube, if I know the location and values for 11 edges, we know that we can derive the location and orientation of the 12th edge. Same goes for when we know the state of 7 of 8 corners. In my decoder, I could program it to figure out the last pieces and calculate their orientation. This eliminates the need to store the values for 2 edge stickers and 3 corner stickers, 5 total. So...

((6 faces * 8 stickers per face) - 5 stickers) * 3 bits per sticker = 123 bits per cube

Then I decided to take a different approach. What if, instead, only want to code the properties of each piece? There are many fewer total Pieces than Stickers, so it'll probably be more efficient.

Instead we'll just store the permutation and orientation of each piece. As far as the "Array" is concerned, we can imagine the first value stored is the location of the WBO corner. The next value is for WBR, then WGO, etc. The order that you store the pieces is arbitrary, as long as your decoder knows which values are assigned to which pieces. Then we assign each location a value, so UBL is "0", UBR is "1", UFL is "2", etc. Once again, the actual value is arbitrary, we just need 1 unique value for each unique location.

There are 8 corners with 8 possible permutations: 3 bits each.
Each corner has 3 possible orientations: 2 bits each.
12 edges with 12 possible locations: 4 bits each.
And each edge has 2 possible permutations: just 1 more bit each.

(Edit: Just realized that's really just 5 bits per piece)

So:

8 corners * (3 bits per permutation + 2 bits per orientation) + 12 edges * (4 bits per permutation + 1 bit per orientation) = 100 bits per cube state

Quite a jump. And like before, we can ignore the last edge and corner, and just program the decoder to calculate the permutation and orientation based on the state of the rest of the cube.

7 corners * (3 bits per permutation + 2 bits per orientation) + 11 edges * (4 bits per permutation + 1 bit per orientation) = 90 bits per cube state

So I'm down to 90 bits per cube. If I wanted to use this "code" to store every single possible cube state, such that I could program a decoder to pluck out any one of them and give me a visual representation of cube state... store all 4.3 quintillion states would cost 387 quintillion bits... or at 8 bits per bytes, 48ish quintillion bytes, so something like 48 million petabytes.

I haven't even bothered to do the exact calculation on this because I'm positive that there has to be a more efficient way of encoding a cube state.

Last edited:

#### RicardoRix

##### Member
What about assigning a predetermined order, where each order means given cube state. So State 0 would be the solved state. Then you would only need to store the order number between 0 and 43 quintillion. However many bytes that is, 8 I think.

#### AlphaSheep

##### Member
Using Kociemba's coordinates I think you can get it down to one bit more than the limit (66 bits = ceil(log base 2 of 43 quintillion))...
EP is 0..479001599
CP is 0..40319
EO is 0..2047
CO is 0..2186

So EP+(CP*12!)+(EO*12!*8!)+(CO*12!*8!*2^11) will give an integer between 0 and 86 quintillion.

#### obelisk477

##### Member
What about assigning a predetermined order, where each order means given cube state. So State 0 would be the solved state. Then you would only need to store the order number between 0 and 43 quintillion. However many bytes that is, 8 I think.
So you just use the devils algorithm (3x3 hamiltonian circuit), and define each cube as a certain distance into the algorithm. Maybe not quite what OP was looking for....

EDIT: I wonder if there's a program that actually does this in reverse. You submit a cube state, and it tells you how far along in the hamiltonian circuit that the cube occurs.

#### Chree

##### Member
What about assigning a predetermined order, where each order means given cube state. So State 0 would be the solved state. Then you would only need to store the order number between 0 and 43 quintillion. However many bytes that is, 8 I think.
That beats the 90 bit formula from above. What would the data actually look like?

So you just use the devils algorithm (3x3 hamiltonian circuit), and define each cube as a certain distance into the algorithm. Maybe not quite what OP was looking for....

EDIT: I wonder if there's a program that actually does this in reverse. You submit a cube state, and it tells you how far along in the hamiltonian circuit that the cube occurs.
As long as you can assign a small value to a location within the Hamiltonian, then yeah, I think that could work. Although, if the decoder was forced to traverse the Hamiltonian in order to spit out a particular cube state, it would be a bit unfeasible to actually recall any particular cube state. But I feel like I'm don't understand enough about the Hamiltonian to say that that's an actual issue.

As for your second question: that feels worthy of exploration, even if it's doesn't pertain to this thread. That'd be fun.

#### qqwref

##### Member
It's quite doable to just store a number from 0 to 43 quintillion, using something like what AlphaSheep posted (although you can also avoid the extra factor of 2 for parity permutation with some cleverness). Then the problem becomes how to quickly convert that into a more usable cube state. Sadly, it's just a bit bigger than 64 bits.

You mention, though, wanting to store a digital representation of every single possible cube state. But at this point, do you really need to? Your representation is just a list of the numbers 0, 1, 2, 3, ... up to 43 quintillion. So you could store nothing, and then if someone asks for the n'th position, you would just put n into your number-to-cube-position function, and not have to actually look anything up.

#### Chree

##### Member
That certainly negates any real need to store the Cube State, itself. Although, as you point out, it makes recalling any random position problematic.

So I could put a further constraint on the question: The Possibility of Human Recall: "Store the cubestate in such a way that a Human Being can look at the data and easily figure out the cube state".

I'm guessing it'd be sorta tough for a human to see the number 1,405,678,923,345 and be able to tell where the White-Green-Orange corner is positioned. But that's fairly easy to do with the methods I layed out above... and it's especially easy with the last method listed.

That said, you're right: going the Hamiltonian route could render this entire question moot.

#### cuBerBruce

##### Member
So you just use the devils algorithm (3x3 hamiltonian circuit), and define each cube as a certain distance into the algorithm. Maybe not quite what OP was looking for....

EDIT: I wonder if there's a program that actually does this in reverse. You submit a cube state, and it tells you how far along in the hamiltonian circuit that the cube occurs.
I've thought about trying to create a function that would take a cube state and return the index of the position within my Hamiltonian circuit, but have never gotten around to it. It would probably involve some degree of searching, and for that you would have to be using some other way of representing cube states. It would certainly not be as bad as simply iterating the Hamiltonian circuit 1 move at a time. For example, my Hamiltonian circuit has only 2728 places where EO changes. So by determining which cubies are oriented and which aren't, you can obviously restrict the search to only that section or sections where that EO occurs.

But the bottom line is that using an index into a Hamiltonian circuit would be an impractical method for representing cube states.

##### Member
Could you just use BLD memo as a basis for a compression technique?

So for corners you could use the first 3 bits to represent the face you are talking about and the second 2 to indicate which sticker on that face eg 00101 may represent the R face (001) and the top right sticker (01). Then go through all the cycles for corners leaving twisted corners (I'll explain why later) and leaving a null at the end or something that could never come up such as 111 for the face.

Then you could do edges in the same way again using 5 bits each (again leaving flipped edges)

Now, to deal with flipped edges and twisted corners, we can use the fact that they will not have been included in the cycles. We can start from the UBL and number each corner 1-8. If the program can be written to work out which pieces are not affected in the cycles, we can represent all the orientations a corner can be in and proceed through these going UBL UBR DBL DBR etc (obviously only using the ones which don't come up in the cycles). This would take an additional 2 bits per cubie. Leave a null and the same can then be done for edges but they only need 1 bit each.

I would guess that this method would take about 20- bytes of data (or at least 160- bits) though the program decoding it would have to be fairly complex (though not too complex I don't think) and I also sort of dispense with the byte as the standard unit.

I think there would be about 4 nulls, 8 corner targets, 12 edges targets, 2 flipped/twisted pieces.

I know this could be a bit of a headache for a programmer but this is the best way i could think of for compression with efficiency so what does everyone else think?

#### RicardoRix

##### Member
Given there are 43 quintillion states to distinguish uniquely, then there is no way of compressing the data any smaller than that. I got the original 8 bytes calculation wrong, it's actually 9 bytes.

"Store the cubestate in such a way that a Human Being can look at the data and easily figure out the cube state".
That's not what you originally asked for, but considering the above statement and to make the data 'readable' then you have to introduce some inefficiencies in the data storage, something like you originally described I would guess, or some slightly clever idea like AlphaSheep\shadowslice's suggestion or possible a compressed version of the 20 move scramble (20 moves * 4bits = 10bytes (would need 4 bits for 0-12 states for all the different moves R,R',B,B',etc.)).

But the real question to this statement should be 'why do you want it readable straight from the DB or screen'? Whenever you need to do this, the program reads the value, then you write a function to translate the stored value to cube state description or an image or a scramble or whatever it is you consider human readable.

#### Chree

##### Member
But the real question to this statement should be 'why do you want it readable straight from the DB or screen'? Whenever you need to do this, the program reads the value, then you write a function to translate the stored value to cube state description or an image or a scramble or whatever it is you consider human readable.
Oh, it doesn't need to be that way at all. I was just trying to bounce us off the Hamiltonian path... which Bruce wound up doing better than I could. It was more 'food for thought' than an actual request. And it's not like this is a question for practical application or anything, anyway. I'm just curious to see what else people can come up with.

For instance, the 90bit cube state method I described could be readable by humans. But if there was a way to compress or reencode it further to take up less space, maybe then only a program could read it.

I rather liked shadowslice's BLD code idea. Although there are tons of cube states that would introduce cycle breaks, flipped/twisted pieces, or parity... there'd also be lots of states with solved pieces, too, and those would take up less space. That might warrant some exploration to see if the savings outweight the costs.

##### Member
Well, I did some thinking and I think that the BLD suggestion can store any cube state in <15 bytes (each sticker takes 4 bits of memory to store);

This will only occur if you have the maximum number of 2 swaps and hence cycle breaks on a cube.

Number of corner targets=10 (40bits)
Null (4/44)
Number of edge targets=16 (64/108)
Null (4/112)
No twisted corners (0/112)
Null(may not be needed as the program could work out there are no twisted corners) (4/116)
No flipped edges (0/116)
Null (same as above) (4/120)

So the whole cube could be stored in 120/8 =15 bytes or perhaps less if the nulls are eliminated (so would be 14 bytes)

And again, this is the worse case that I can think of so I think this could be quite a good method of compression though the program may not be very nice to program. It's also quite nice because you don't need a large database for each cubestate.

#### SenorJuan

##### Member
I came up with a lower bit-count solution, just a modification of your edges & corners strategy:
7 corners x 3 bits (for the 8 positions) = 21 bits, deduce the 8th position by what the other 7 are.
Consider the orientation of the corners. For one face, eg the D face, there's 81 orientations (3x3x3x3), which could be defined with a 7-bit value. For the opposite U face, you only need to specify 3 corners orientation, so (3x3x3) = 27 possibilities = a 5-bit value. The final corner is deduced from what's already specified. So that's 21+7+5 = 33 bits for corners.
Edges can have two orientations, only 11 need specifying, so 11 bits are needed. Now for positions, it seems wasteful to use a 4 bit number for 12 cases. (12 x 12 x 12) can be represented with 11 bits. So this 'compression' can be used three times for the first 9 edges. The final 2 can be done with 8 bits (4 bits per, as in the simple case). Total = 11 + 11 + 11 + 8 = 41 bits. So edges total = 41 + 11 = 52 bits.

Edit: If you compressed 5 edge positions, you would need (12 x 12 x 12 x 12 x 12) states, which could be represented by an 18-bit number. So encoding as (5-edges + 3-edges + 3 edges) would use 18 + 11 + 11 = 40 bits. It would need a 256K x 20-bit lookup table, though.

This totals 85 bits. (or 84 with the 5+3+3 edge position encoding)
It does need modest look-up tables to do the 'decompression'. Eg the 11-bit to 12-bit converter is a whopping 3 kilobytes. You may need a similar 'compression' table, for speed.

I can't help think the 'factorial' nature of specifying positions offers scope for bit-saving. Ie. once you've stated where the first corner is, there's then only 7 possible locations for the next one, and after that, only 6 possibilities for the 3rd corner etc etc.

Last edited:

#### Chree

##### Member
That's pretty awesome! I remember thinking that 4 bits for EP was wasteful and wondered how 'compression' of this sort could be implemented. Nice work!

And don't worry about the space taken up by a lookup table. You just saved over 2 trillion gigabytes of overall storage #### mark49152

##### Super Moderator
Staff member
Can you get away with storing only 10 edge positions? Both the remaining 2 can be deduced by avoiding parity. Total would be reduced to 80 bits if grouping the edges as 5+5.

Last edited:

#### ch_ts

##### Member
I think that the most straightforward way is just encode EP, EO, CP, and CO separately then concatenate them all together:
EP from 0 to 12!-1, use 29 bits
EO from 0 to 2^11-1, use 11 bits
CP from 0 to 8!-1, use 16 bits
CO from 0 to 3^7-1, use 12 bits
Total of 68 bits.

For permutations, that would mean something like this (let's use 8 pieces as an example):
0,1,2,3,4,5,6,7 -> 0 -> 0000000000000000
0,1,2,3,4,5,7,6 -> 1 -> 0000000000000001
0,1,2,3,4,6,5,7 -> 2 -> 0000000000000010
0,1,2,3,4,6,7,5 -> 3
...
7,6,5,4,3,2,1,0 -> 40319 -> 1001110101111111

Edge orientation: use 0 if oriented, 1 if flipped:
something like 00100101110
(Deduce orientation of 12th edge from the other 11 so it's not necessary to include it)

For corners orientation: 0 if correct, 1 if twisted CW, 2 if twisted CCW:
something like 1002201
Get a base 3 number, convert to binary.
(Deduce orientation of 8th corner from the other 7 so it's not necessary to include it)

http://www.jaapsch.net/puzzles/compindx.htm

#### Chree

##### Member
Can you get away with storing only 10 edge positions? Both the remaining 2 can be deduced by avoiding parity. Total would be reduced to 80 bits if grouping the edges as 5+5.
Is this true? It seems true. Is that true? I mean... orientation is independent, and would still need to be stored... but permutation. Huh.

I think that the most straightforward way is just encode EP, EO, CP, and CO separately then concatenate them all together:
EP from 0 to 12!-1, use 29 bits
EO from 0 to 2^11-1, use 11 bits
CP from 0 to 8!-1, use 16 bits
CO from 0 to 3^7-1, use 12 bits
Total of 68 bits.

...

http://www.jaapsch.net/puzzles/compindx.htm
*bows*

Last edited:

#### nvpendsey

##### Member
How's this for CP

1st corner has 8 positions so 3 bits 2nd corner has 7 positions so 3 bits
3rd corner has 6 positions so 3 bits 4th corner has 5 positions so 3 bits
5th corner has 4 positions so 2 bits 6th corner has 3 positions so 2 bits
7th corner has 2 positions so 1 bits 1st corner has 1 positions so 0 bits

Total 17 bits.

(This might be wrong but I don't know anything about how this works but just had this idea so I posted it)

##### Member
That seems like the best shot overall. What does the database look like?

I suppose it would be relatively easy to write the code which could generate the numbers and work out what they mean though. Just wondering when this would become better than the BLD thing I posted above which averaged something like 10-12 bytes but seemed to have less in the database.

Also, I think we can. Now really see the power of collaboration reducing the initial total by over a factor of 10.

#### SenorJuan

##### Member
An alternative approach:
If you only need 20 moves to reach any state, and there are (6 x 3) = 18 possible ways to make a turn, then you would need theoretically 4.17 bits x 20 = 83.4 bits.
Edit: see later post: only 81 bits are needed.
This also brings up the fact that 20 moves is the max, you may need only 15, say, so less bits would be needed for these cases. Which makes me think that rather than the rigid '85 bits' or whatever, a flexible strategy where some states may only need 50 bits to describe them should give a lower average bit-count, assuming you were wanting to store lots of states.

I was also contemplating if you could somehow specify the position of (say) a corner in a relative way, rather than an absolute way, ie. after specifying the first corner, state what the next corner is from the remaining 7 choices, as clearly you can't have two pieces in the same location. In theory this may trim off 1 bit from corner positions, and 1 from edge positions?

An edge position 'test' run:
1st edge: 4 bits...wasteful
2nd/3rd edges: 11 x 10 =110 states, this can be stored as a 7-bit value.
4th/5th, 6th/7th, 8th/9th edges: as above.
10th edge: 4 bits..wasteful
11th edge: as Mark R. pointed out, not necessary to specify it.
Total 4 + 4 x 7 + 4 = 36 bits.
It's the same total as the [(5 edges + 5 edges) = 18 + 18 bits] result, but with tiny, rather than massive, look-up tables.

Something tells me the 1st edge and 10th edge should combine into a 7 bit value if you're able to specify any of the last 3 edges as your 10th one, ie. 12 x 10 states, =120, = 7 bits. (this assumes that if you can specify 10 out of the 11 positions, and there are 3 edges still unspecified, then you're going to 'hit' one.)
Combining 1st edge, 2nd/3rd edge and 11th edge gives a total no. of states that can be represented with 14 bits. That would give a total of 14 + 3 x 7 = 35 bits. Modest size look-up table needed.

Last edited: