# Announcing: New 4x4x4 Brute Force Solver

Status
Not open for further replies.

#### unsolved

##### Member
Corners will help prune the *search* a lot. You spend most of the time
in any standard IDA* search right at the outer limits of your pruning table.
Your search tree includes many move sequences that disturb the corners.
Let's say you are eight moves into a search, with a target depth of 15,
and the table says you are eight moves away from a corner solution. You
know then that you can prune your search immediately and backtrack.
No need to consider any further subnodes.
Originally I thought you meant to scan a database of corners-only to aid in the search for solutions where corners were unsolved in the original cube. I think I understand what you really meant now. So, what you do is, generate face moves only from the cube from a solved position, store the corner orientations along with how far they are from the solved position, and these are like "roadmaps" to your distance from being solved. Is that correct?

I would think at some point, you would duplicate a configuration, even though you are a different distance from the solution.

Or did someone already map every corner config possible to a "turns-from-solved" number?

#### cuBerBruce

##### Member
Originally I thought you meant to scan a database of corners-only to aid in the search for solutions where corners were unsolved in the original cube. I think I understand what you really meant now. So, what you do is, generate face moves only from the cube from a solved position, store the corner orientations along with how far they are from the solved position, and these are like "roadmaps" to your distance from being solved. Is that correct?
I would say "roadmap" is not the right analogy. The corner "distance-from-solved" database will give you a LOWER BOUND on the number of moves needed to solve the entire 4x4x4 position. This often allows aborting a search path before reaching the leaf nodes for your current search depth.
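In code, the cutoff this lower bound enables looks roughly like the sketch below. `corner_depth_stub` is a hypothetical stand-in for a real pruning-table probe, not anyone's actual implementation:

```c
#include <assert.h>

/* Hypothetical pruning-table lookup: in a real solver, corner_depth[i]
 * would hold the exact number of moves needed to solve the corners of the
 * position with index i.  Here we fake a tiny table just to exercise the
 * bound check. */
static unsigned char corner_depth_stub(unsigned long idx) {
    return (unsigned char)(idx % 11);   /* stand-in for a real table probe */
}

/* Core of the IDA* cutoff: if the moves already made plus the corner
 * lower bound exceed the target depth, the whole subtree can be skipped. */
static int should_prune(int moves_made, int target_depth, unsigned long corner_idx) {
    return moves_made + corner_depth_stub(corner_idx) > target_depth;
}
```

This matches the example in the thread: eight moves in, target depth 15, corner bound 8, so 8 + 8 > 15 and the branch is abandoned.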

Such a database is usually created using a breadth-first search. This is a different type of search from a depth-first search. This is all talked about in Jaap's Computer Puzzling page I referenced earlier.
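The breadth-first construction can be sketched on a toy state space; here a 12-cycle stands in for the real corner space, and the two "moves" are an assumption for illustration only:

```c
#include <assert.h>
#include <string.h>

#define NSTATES 12   /* toy state space standing in for the 8!*3^7 corner space */

/* Toy move set: step left/right around a 12-cycle.  In a real solver the
 * moves would be the face turns applied to a corner-coordinate index. */
static int apply_move(int s, int m) {
    return m == 0 ? (s + 1) % NSTATES : (s + NSTATES - 1) % NSTATES;
}

/* Breadth-first fill: depth[s] = minimum number of moves from the solved
 * state (index 0) to state s.  This is the standard way pruning tables
 * are generated. */
static void build_table(unsigned char depth[NSTATES]) {
    int queue[NSTATES], head = 0, tail = 0, m;
    memset(depth, 0xFF, NSTATES);          /* 0xFF marks "not reached yet" */
    depth[0] = 0;
    queue[tail++] = 0;
    while (head < tail) {
        int s = queue[head++];
        for (m = 0; m < 2; m++) {
            int t = apply_move(s, m);
            if (depth[t] == 0xFF) {        /* first visit = shortest distance */
                depth[t] = (unsigned char)(depth[s] + 1);
                queue[tail++] = t;
            }
        }
    }
}
```

Because breadth-first search reaches every state first along a shortest path, the recorded depth is exact, which is what makes the table a true lower bound when used during IDA*.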

I would think at some point, you would duplicate a configuration, even though you are a different distance from the solution.
That's not important. What is important is that you get a true LOWER BOUND from how far you are from the solved state.

Or did someone already map every corner config possible to a "turns-from-solved" number?
Yes, it was done a long time ago. I note that my optimal 4x4x4 solver, which I wrote in 2006, uses such a corner database. But such breadth-first searches had been done long before that.

#### Stefan

##### Member
Or did someone already map every corner config possible to a "turns-from-solved" number?
Korf, 1997 is the earliest I know (and quite a nice paper, I think).

#### cuBerBruce

##### Member
Or did someone already map every corner config possible to a "turns-from-solved" number?
Korf, 1997 is the earliest I know (and quite a nice paper, I think).
If you just care about solving the corners relative to each other instead of with respect to the centers (i.e. 2x2x2 cube), it was done as early as Sept. 9, 1981 (according to Mark Longridge's page).


#### Stefan

##### Member
Ah yes, I should have thought of 2x2, but only thought of pruning tables for something larger.

#### qqwref

##### Member
I know you have a bunch of experience already, in programming and computation-intensive code, but please listen! Some of the people posting (e.g. rokicki and cuBerBruce) have an amazing amount of experience in programming cube solvers, and the advice they are giving is definitely helpful. When you don't understand something, make sure not to just assume it's unhelpful - do some research, or ask questions, and try to see what the advice means. It may save you a lot of work later on.

I understand now. Speed cubers solve the corners last. In my lame way of solving, I do them much, much earlier. I solve the centers and dedges last.
As far as I can tell, you are doing something to prune the heck out of the tree, whereas I only have the filtering mechanism suggested by Jakube implemented (plus the tfs databases).
I am guessing by cubie you mean the 3x3x3? There are so many solvers out there for that now, I am letting them enjoy their peace.

I want to concentrate on 4x4x4 for now.

Anyway, you ARE making some good progress. I just wish you'd give your variables shorter names, so the code wouldn't be bloated with the same thing written several times (like original_4x4x4_cube.cube_right[0])

#### unsolved

##### Member
I know you have a bunch of experience already, in programming and computation-intensive code, but please listen! Some of the people posting (e.g. rokicki and cuBerBruce) have an amazing amount of experience in programming cube solvers, and the advice they are giving is definitely helpful. When you don't understand something, make sure not to just assume it's unhelpful - do some research, or ask questions, and try to see what the advice means. It may save you a lot of work later on.
Yes, I bow to their expertise and I readily acknowledge I am the noob of all noobs; I have no problem admitting that at all. Please don't mistake my exuberance for anything but the extreme joy I found in programming a 4x4x4 solver. I had no idea how in the dark I was until I saw Bruce's program prune the heck out of the search tree like a hot knife through butter.

It is possible to have a fair amount of programming experience and be a total dork when it comes to cubing. Dork #1, right here <------. I do read the posts/links/web pages and PDFs, but that does not necessarily translate into a complete understanding. That is to say, I do make some underlying and oftentimes incorrect conjectures about what I am reading. Please understand I am weak on the nomenclature, and some of what I am reading is not crystal clear to me. Not by a longshot.

When I do grasp something, I have the innate ability to fully delve into it and, at times, even raise the bar. I am not at the fully-grasping phase yet. I do understand the concepts of indexing, pruning, and searching, just not as they apply to cubing. To illustrate this, here is a fairly complex indexing function for the game of checkers where one side has 3 kings and 1 checker and the other side has 2 kings and 2 checkers. This function will magically enumerate from 0 to n-1, where n is the total number of possible arrangements, without duplicating or skipping any index.

Code:
/*********************************************************************/
/*                                                                   */
/* General Indexing Function Combinatoric Macros                     */
/* ---------------------------------------------                     */
/*                                                                   */
/* Given there are N identical pieces of the same color, how many    */
/* ways can they be arranged uniquely on a game board, independent   */
/* of the size of the board?                                         */
/*                                                                   */
/* The answer involves producing a polynomial expansion of the form  */
/*                                                                   */
/* a + (b-1)(b-2)/2! + (c-1)(c-2)(c-3)/3! + .. (z-1)(z-2)..(z-N)/N!  */
/*                                                                   */
/*********************************************************************/

#define _2_same_pieces_subindex(a,b) 		 ((a) + ((((b)-1)*((b)-2))/2))
#define _3_same_pieces_subindex(a,b,c)		 ((a) + ((((b)-1)*((b)-2))/2) + ((((c)-1)*((c)-2)*((c)-3))/6))
#define _4_same_pieces_subindex(a,b,c,d)	 ((a) + ((((b)-1)*((b)-2))/2) + ((((c)-1)*((c)-2)*((c)-3))/6) + ((((d)-1)*((d)-2)*((d)-3)*((d)-4))/24))
#define _5_same_pieces_subindex(a,b,c,d,e)	 ((a) + ((((b)-1)*((b)-2))/2) + ((((c)-1)*((c)-2)*((c)-3))/6) + ((((d)-1)*((d)-2)*((d)-3)*((d)-4))/24) + ((((e)-1)*((e)-2)*((e)-3)*((e)-4)*((e)-5))/120))

unsigned long long get_08_piece_index_3K1C_AGAINST_2k2c(unsigned char wk1, unsigned char wk2, unsigned char wk3, unsigned char wc1, unsigned char rk1, unsigned char rk2, unsigned char rc1, unsigned char rc2)
{
unsigned long long index_function_value;
unsigned long long index_part_1, index_part_2;
unsigned long combined_checker_contribution_to_index, index_part_3;

index_function_value = 0;

/*********************************************/
/* order of placement on the board: wrrWWWRR */
/*********************************************/

/* NOTE: the bodies of these conditional index adjustments (which increment
   the index components) were elided from the original post; only the guard
   comparisons remain, so this excerpt will not compile as-is. */

if(wk1 > wc1)
if(wk1 > rc1)
if(wk1 > rc2)

/**********************/

if(wk2 > wc1)
if(wk2 > rc1)
if(wk2 > rc2)

/**********************/

if(wk3 > wc1)
if(wk3 > rc1)
if(wk3 > rc2)

/**********************/
/**********************/

if(rk1 > wc1)
if(rk1 > rc1)
if(rk1 > rc2)

if(rk1 > wk1)
if(rk1 > wk2)
if(rk1 > wk3)

/**********************/

if(rk2 > wc1)
if(rk2 > rc1)
if(rk2 > rc2)

if(rk2 > wk1)
if(rk2 > wk2)
if(rk2 > wk3)

/**********************/

/*************************************************************************************************/
/*                                                                                               */
/*  February 12, 2009                                                                     */
/*                                                                                               */
/* Completely changed this indexing function. First, the order in which data is passed in now    */
/* matches the CONSTANT identifying the slice: k_WWWwRRrr -- this is now standard for all of the */
/* indexing functions. Second, the all checkers indexing function for 3 pieces is used. Why not  */
/* use what is already available? Third, the colors were reversed to match the spec.             */
/*                                                                                               */
/* July 11, 2001                                                                            */
/*                                                                                               */
/* With one black checker in the king row, count from 1-378 for each of the 4 squares.           */
/* The white checkers can reside on 28*27/2  = 378 squares, so the equation is of the form:      */
/*                                                                                               */
/* (rc1-1)* 378 + _2_same_pieces_subindex(wc1, wc2)                                               */
/*                                                                                               */
/* When the black checker has moved to square 5 or greater, then you add 378*4 (1512) to the     */
/* count and determine which of the 27*26/2 remaining combination of squares the white checkers  */
/* are on. With 27*26/2 = 351, the equation looks like...                                        */
/*                                                                                               */
/* (rc1-5)* 351 + _2_same_pieces_subindex(wc1, wc2) + 1512                                        */
/*                                                                                               */
/* This technique treats the black and white checkers as one "sub-index", rather than separate   */
/* components. We count from 1 to (4*378 + 24*351) = 9936, then pass this as the "Q" to          */
/* the precompiled macro. This "Q" is a substitute for "rc1" and "wc1" and "wc2".                 */
/*                                                                                               */
/*  September 4, 2010                                                                         */
/*                                                                                               */
/* Changed the indexing function again, hopefully for the last time. Dealing with that wonderful */
/* Microsoft Compiler bug where intermediate calculations exceeding 2^31 are always treated as a */
/* signed 32-bit result, which invariably resulted in a negative number, which invariably would  */
/* decrease the index when an increase was expected, and, consequently, duplicate indices would  */
/* always be produced. It's a good thing I decided to re-test the indexing functions!!           */
/*                                                                                               */
/*************************************************************************************************/

combined_checker_contribution_to_index = 1 + get_03_piece_index_0K2C_AGAINST_0k1c(33-rc2, 33-rc1, 33-wc1);

/*************************************************************************************************************/
/*                                                                                                           */
/* September 4, 2010                                                                                */
/*                                                                                                           */
/* Gradually converting more GOOD, WORKING indexing functions over to this new format. See my comments from  */
/* August 26, 2010 for a full explanation. Thanks to Billy Gates for more wasted time.                       */
/*                                                                                                           */
/*************************************************************************************************************/

index_part_1 = (unsigned long long)(1187550) * (unsigned long)(combined_checker_contribution_to_index - 1);

/* index_part_2 and index_part_3 were computed by the elided king-placement blocks above */
index_function_value = (unsigned long long)(index_part_1 + index_part_2 + index_part_3);

return (index_function_value - 1);
}
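As a sanity check on the combinatoric macros above, the 2-piece subindex should map each unordered pair of distinct squares 1..32 to a distinct value in 1..496 (= 32*31/2). A minimal, self-contained harness (the macro is reproduced from the post):

```c
#include <assert.h>
#include <string.h>

/* Same macro as in the post, reproduced here so the check is self-contained. */
#define _2_same_pieces_subindex(a,b)  ((a) + ((((b)-1)*((b)-2))/2))

/* Verify the ranking is a bijection: each unordered pair of distinct
 * squares 1..32 maps to a distinct index in 1..496, with no gaps. */
static int check_bijection(void) {
    static char seen[497];
    int a, b;
    memset(seen, 0, sizeof seen);
    for (b = 2; b <= 32; b++)
        for (a = 1; a < b; a++) {
            int idx = _2_same_pieces_subindex(a, b);
            if (idx < 1 || idx > 496 || seen[idx]) return 0;
            seen[idx] = 1;
        }
    return 1;   /* all 496 indices hit exactly once */
}
```

The same style of exhaustive check extends to the 3-, 4-, and 5-piece macros, and is a cheap way to catch exactly the kind of duplicate-index bug described in the comments above.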
So, I am thinking about how to index just the corners, and just the centers. I'd like to pre-solve turns-from-solution (tfs) for the corners, and all of the centers that are reasonable (1 missing per side, 2 per side, maybe no more).

If I could create such an indexing function, there would be no need to store positions at all: just transform a position to an index, write to that index during the solve, and seek that many bytes into the file when probing.

The problem is: Creating the indexing function!

Checkers and chess I had no problem with: flat board, easy to enumerate the state space. Corners, ugh! 3D. Orientation variations. I am not sure how to map a turned vs. unturned cube. Also, some configurations are illegal positions, such as just one corner being twisted, so I am not sure how to go about it.

What I am doing: Sharing all of the portions of my code that would be of interest, in hopes that some of the guys who did write some easy-to-read code might be interested in doing the same. If I could get a lean/mean version of a program running, some of the hardware I have access to might crank out some very interesting finds!

The pillars of civilization were built upon ideas shared and improved upon over vast spans of time. Who knows, we might brainstorm some killer code through cooperation.

Korf, 1997 is the earliest I know (and quite a nice paper, I think).
Yes, very nice. Some code would have been even better.

Interestingly, the paper mentions the work of Dr. Jonathan Schaeffer. He cited one of the papers I wrote in two of his own papers (on checkers) and he was nice enough to thank me (and some others, of course) on his website after he solved the game of checkers.

I would say "roadmap" is not the right analogy. The corner "distance-from-solved" database will give you a LOWER BOUND on the number of moves needed to solve the entire 4x4x4 position. This often allows aborting a search path before reaching the leaf nodes for your current search depth.

Such a database is usually created using a breadth-first search. This is a different type of search from a depth-first search. This is all talked about in Jaap's Computer Puzzling page I referenced earlier.
If you want to see some really cool endgame databases used at run-time during a search, check out my 80-square chess program,
which can announce a mate in 268 moves in the endgame of King + Queen + 1 Pawn vs. King + Queen. I added 2 pieces to the board:
one that moves like a Knight + Rook, which I called the Chancellor; the other moves like a Knight + Bishop, the Archbishop.

The King + Archbishop vs. King ending has a longest win of 21 moves, I think, and it is really hard to administer for the side
with the Archbishop.

Anyway, if you like chess, this program will surprise you with some really strong moves on this 80-square board.

It's about 64 MB compressed with all of the 3-piece and some of the 4-piece endings already included.
A menu item will load the position for some of the longest win configurations.

lightningcloudcomputing.com/vortex.zip

The corner "distance-from-solved" database will give you a LOWER BOUND on the number of moves...
Ah ha! Lightbulb moment! So if you are at a depth of 3 plies in a 10-ply search, you are 7 plies from your terminal node, and you can use the INEXACT information in the corner-tfs to check your own corner arrangement and discard every branch whose corner lower bound exceeds the 7 remaining turns. Nice! You see, I am "too used to" the EXACT distance-to-win data in the databases I have experience with, so that experience was working against me!

So is there an indexing function that generates only one instance of each of the 8! * 3^7 corner configurations? Or would that just be the IDA* move generator with only face turns?


#### Herbert Kociemba

##### Member
So is there an indexing function that generates only one instance of each of the 8! * 3^7 corner configurations? Or would that just be the IDA* move generator with only face turns?
Since all combinations of orientations and permutations occur, you can use for example permIndex(corners)*2187+oriIndex(corners) for this index.
Examples for permIndex and oriIndex are here: http://kociemba.org/math/coordlevel.htm
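A sketch of those two coordinates in C, under assumed array conventions (perm[i] = which corner occupies slot i, ori[i] = its twist); this follows the verbal description on the linked page rather than Kociemba's actual code:

```c
#include <assert.h>

/* Orientation coordinate: the twists (0,1,2) of corners 0..6 form a
 * base-3 number.  Corner 7's twist is forced by the other seven, so it
 * is omitted.  Range: 0 .. 3^7 - 1 = 2186. */
static int ori_index(const int ori[8]) {
    int i, idx = 0;
    for (i = 0; i < 7; i++)
        idx = 3 * idx + ori[i];
    return idx;
}

/* Permutation coordinate via the factorial number system (Lehmer code):
 * for each slot, count how many later slots hold a smaller corner.
 * Range: 0 .. 8! - 1 = 40319. */
static int perm_index(const int perm[8]) {
    static const int fact[8] = {1, 1, 2, 6, 24, 120, 720, 5040};
    int i, j, idx = 0;
    for (i = 0; i < 8; i++) {
        int smaller_after = 0;
        for (j = i + 1; j < 8; j++)
            if (perm[j] < perm[i]) smaller_after++;
        idx += smaller_after * fact[7 - i];
    }
    return idx;
}

/* Combined corner coordinate, 0 .. 8!*3^7 - 1.  The multiplier must be
 * 2187 = 3^7 (anything smaller lets distinct states collide). */
static long corner_index(const int perm[8], const int ori[8]) {
    return (long)perm_index(perm) * 2187 + ori_index(ori);
}
```

This is exactly the kind of function that answers the earlier question in the thread: it generates each of the 8! * 3^7 corner configurations exactly once, with no search required.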


#### unsolved

##### Member
Since all combinations of orientations and permutations occur, you can use for example permIndex(corners)*2187+oriIndex(corners) for this index.
Examples for permIndex and oriIndex are here: http://kociemba.org/math/coordlevel.htm
Thank you

And where can one see the definitions of things like Pred(BR) and PEdge^[ed].o?

#### Herbert Kociemba

##### Member
Thank you

And where can one see the definitions of things like Pred(BR) and PEdge^[ed].o?
Sorry, I thought more about the verbal explanations given (ternary number system, variable base number system etc.) and less about the code snippets when I gave the link. But no problem: PEdge^[ed].o is just the orientation of the edge ed, and Pred(BR) is the predecessor of the - in my program - "last" of the 12 edges, BR. So in the loop the last edge BR is ignored, because its orientation is forced by the other 11 edges.
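A small sketch of what that means in C (edge numbering is assumed here, with the omitted edge placed last):

```c
#include <assert.h>

/* Edge-orientation coordinate as Herbert describes it: the flips of 11
 * edges form a base-2 number, and the 12th edge ("BR" in his program) is
 * skipped because the total flip count on a legal cube is always even. */
static int edge_ori_index(const int o[12]) {
    int i, idx = 0;
    for (i = 0; i < 11; i++)        /* edge 11 deliberately omitted */
        idx = 2 * idx + o[i];
    return idx;                      /* 0 .. 2^11 - 1 = 2047 */
}

/* Recover the forced flip of the omitted edge from the parity constraint. */
static int forced_last_flip(const int o[12]) {
    int i, sum = 0;
    for (i = 0; i < 11; i++) sum += o[i];
    return sum % 2;                  /* flips must sum to an even number */
}
```

This is also why illegal states like "just one edge flipped" never appear in the index range: the 12th orientation is derived, never stored.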

#### unsolved

##### Member
Sorry, I thought more about the verbal explanations given (ternary number system, variable base number system etc.) and less about the code snippets when I gave the link. But no problem: PEdge^[ed].o is just the orientation of the edge ed, and Pred(BR) is the predecessor of the - in my program - "last" of the 12 edges, BR. So in the loop the last edge BR is ignored, because its orientation is forced by the other 11 edges.
Just a question: Why not post code people can actually compile to check it out? I would never have figured out what you meant there, in like a million years.

#### cuBerBruce

##### Member
Earlier in this thread, rokicki suggested a pruning table for two adjacent corners, the two edges in between, and the four center pieces of one of the faces sharing those edges. I have implemented such a pruning table in my solver. The distance distribution is given below for both single slice turn metric and block turn metric. One of the two corners is regarded as a fixed reference cubie with respect to which the other cubies must be solved.
Code:
Single slice turn metric:
dist  0: count =          1 total          1
dist  1: count =         21 total         22
dist  2: count =        361 total        383
dist  3: count =       6256 total       6639
dist  4: count =      97107 total     103746
dist  5: count =    1225834 total    1329580
dist  6: count =   10781087 total   12110667
dist  7: count =   48898689 total   61009356
dist  8: count =   58361514 total  119370870
dist  9: count =    3805702 total  123176572
dist 10: count =         20 total  123176592

Block turn metric:
dist  0: count =          1 total          1
dist  1: count =         27 total         28
dist  2: count =        556 total        584
dist  3: count =      12320 total      12904
dist  4: count =     234483 total     247387
dist  5: count =    3329496 total    3576883
dist  6: count =   27983342 total   31560225
dist  7: count =   75468139 total  107028364
dist  8: count =   16143357 total  123171721
dist  9: count =       4871 total  123176592

#### unsolved

##### Member
Earlier in this thread, rokicki suggested a pruning table for two adjacent corners, the two edges in between, and the four center pieces of one of the faces sharing those edges.
I have decided to focus on what I am good at for now. I don't understand how to create an indexing function for the 2x2x2 subset of corners in order to build such a pruning table. I did, however, come up with a "magic bitboard" move generator which is over 20 times faster than my older code, and then I parallelized it to run on the multicore servers here at the office. Running on a dual E5-2697 (24 cores total, 48 with hyperthreading) it averages 12.2 billion nodes/second.

My move generator for U' is reduced to this:

Code:
// 3 lines of code to handle the 64 bits that wrap around 4 faces of the cube at once

preserve_left_side = x_axis[0] & 0x000F000F000F000F;
x_axis[0] >>= 4;
x_axis[0] |= (preserve_left_side << 12);

t_rotation[0] = top_rotation_bits & 0x1111;
t_rotation[1] = top_rotation_bits & 0x2222;
t_rotation[2] = top_rotation_bits & 0x4444;
t_rotation[3] = top_rotation_bits & 0x8888;

//a precomputed 16-bit rotation lookup table: pass in a 16-bit pattern as an index, get a counterclockwise rotated bit pattern back

new_rotated_top_bits = GLOBAL_COUNTER_CLOCKWISE_BIT_ROTATIONS[t_rotation[0] | t_rotation[1] | t_rotation[2] | t_rotation[3]];

x_axis[0] |= (new_rotated_top_bits >> TOP_BITBOARD_SHIFT_OFFSET) & SLAP_NEW_TOP_BITBOARD_BITS_IN_PLACE_BITMASK;

//shortened the variable names by request :)
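The lookup-table idea can be illustrated with a self-contained sketch: treat a 16-bit value as a 4x4 grid of single-bit cells and precompute the 90-degree counterclockwise rotation of every pattern. The bit layout here (bit r*4+c = row r, column c) is an assumption; the post does not spell out its encoding:

```c
#include <assert.h>
#include <stdint.h>

/* ROT_CCW[p] = pattern p with its 4x4 grid rotated 90 degrees
 * counterclockwise.  128 KB of table buys a one-instruction "rotation"
 * at search time, which is the whole point of the trick. */
static uint16_t ROT_CCW[65536];

static void build_rotation_table(void) {
    uint32_t p;
    int r, c;
    for (p = 0; p < 65536; p++) {
        uint16_t out = 0;
        for (r = 0; r < 4; r++)
            for (c = 0; c < 4; c++)
                if (p & (1u << (c * 4 + (3 - r))))   /* new (r,c) <- old (c, 3-r) */
                    out |= (uint16_t)(1u << (r * 4 + c));
        ROT_CCW[p] = out;
    }
}
```

A handy property for testing any such table: applying the rotation four times must return the original pattern.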
My code for handling t slice moves is ridiculously small, 3 lines of code, since I don't have to worry about rotating 16 cubes of a face:

Code:
preserve_left_side = x_axis[1] & 0x000F000F000F000F;
x_axis[1] >>= 4;
x_axis[1] |= (preserve_left_side << 12);
Using bit patterns not only speeds up the move generator, the evaluation routine to determine if the cube is solved, or ANY cube configuration, is just 8 short comparisons.

Code:
solved = (x_axis[0] == TOP_RING_SOLVED) && (x_axis[1] == TOP_SLICE_SOLVED) && (x_axis[2] == BOTTOM_SLICE_SOLVED) && (x_axis[3] == BOTTOM_RING_SOLVED) &&\
(y_axis[0] == BACK_RING_SOLVED) && (y_axis[1] == BACK_SLICE_SOLVED) && (y_axis[2] == FRONT_SLICE_SOLVED) && (y_axis[3] == FRONT_RING_SOLVED);
In this case, the "TOP" variables are everything but the top and bottom. The front, right, back, and left faces wrap about the cube in the x-direction. 4 bits per face x 4 faces x 4 rows around.
The "BACK" variables are 90 degrees separated, looping from the left to the top to the right to the bottom. There are 8 faces instead of 6, since I stratify horizontal and vertical moves. All of the turns and moves along the z-axis are precomputed in terms of x- and y-, then transformed to reflect the correct bitboards for updating. It was easier this way.

The beauty of this approach is that I can add ANY position with a known optimal solve distance to the evaluation routine with literally no impact on the search speed. And, if the position is hit during a probe, it returns optimal distance-to-solve back down to the game tree. If the distance is "N" from a leaf node, I now know there is a win from that leaf node. If a shorter distance-to-solve is not found, then striving for that position is the best way to go.

http://www.lightningcloudcomputing.com/depth%209.jpg

I also solved the 7-turn-from-solution database with its 18,032,462,352 positions and came up with a new scheme for indexing those positions in a RAM buffer. So from depth 1 I actually complete an 8-ply search, depth 2 = 9 plies, etc.

I imagine if I ever figure out how to implement a pruning table like you have, I could probably do optimal solutions for 18-20 plies.

I'm afraid that's all the code I am going to share until other people reciprocate. I've given quite a few examples of ways to speed up your own code if you want to.


#### rokicki

##### Member
Wow, very quick work, Bruce! Thank you very much. If I can get a couple of
smaller things I'm working on done I hope to verify these numbers.

So the distance isn't as high as I was hoping. The tables are pretty small, and
the pruning table can be applied many times to a given position, but the distance
just isn't very high. Indeed, in the block turn metric it's almost depressing how
low it is. I guess that's just the high-ply factor at work.

If your code is general enough it might be illuminating to try it with some scattered
cubies (I expect the average distance to fall somewhat), and/or perhaps with
edges-only or centers-only up to some reasonable count of cubies. My earliest
explorations of computer cubing were just such explorations on the 3x3x3.

#### cuBerBruce

##### Member
I have decided to focus on what I am at good at for now. I don't understand how to create an indexing function for the 2x2x2 subset of corners in order to build such a pruning table.
Well, your use of a facelet-level representation does make it a little more involved to calculate an index for these types of cubie-based pruning tables. The cost of calculating pruning table indices could be large compared to the time required to apply a move.

Using bit patterns not only speeds up the move generator, the evaluation routine to determine if the cube is solved, or ANY cube configuration, is just 8 short comparisons.
Your solved state check seems to assume the orientation of the puzzle matters. (And by the way, it looks like you have some "=" operations that should be "==".) This would seem to require locating a particular corner cubie, then applying the particular whole cube rotation that results in putting that corner cubie in its standard position. And then applying your check to see if the resulting cube state is equal to the standard solved state. Or else you could define 24 solved states, one for each possible orientation of the cube, and check if the candidate position matches one of these 24 solved states.

I also solved the 7-turn-from-solution database with its 18,032,462,352 positions and came up with a new scheme how to index those positions in a RAM buffer. So, from depth-1, I actually complete an 8-ply search, depth-2 = 9 plies, etc.
Your depth-5 or depth-7 database can be thought of as a "perfect" pruning table. It's perfect in the sense that instead of a lower bound for the distance from solved, it gives the exact distance. The drawbacks of such a pruning table are poor pruning depth and high memory consumption. Storing 18 billion positions in RAM? That would seem to require the better part of a terabyte at 48 bytes/position without using symmetry reduction, and still a lot with symmetry reduction.

In a 3x3x3 solver, I've experimented with using a hash table to store all positions up to 8 moves from solved. I used symmetry to significantly reduce the number of table entries. Still, I needed other pruning functions for higher pruning depth to make the solver compete with existing optimal solvers. And for homing in on the solved state once you're close, a small pruning table can probably do almost as well as a perfect pruning table. It seems like "perfect pruning" does not provide the best tradeoff between performance and memory usage.

Another way the moves-from-solved hash table could be used on the 3x3x3 is checking if your sequence of moves already made in your current search branch is an optimal move sequence or not. If you detect that it is non-optimal, there is no way that some extension of the sequence will be optimal, so no point in pursuing the branch any farther. For the 4x4x4, however, you would need to use a supercube moves-from-solved table, which is not the same as a regular 4x4x4 moves-from-solved table. There is also the issue of whether this will add more time than it will save.

#### rokicki

##### Member
Another way the moves-from-solved hash table could be used on the 3x3x3 is checking if your sequence of moves already made in your current search branch is an optimal move sequence or not.
From our comparison of the count of canonical sequences of length d with positions at distance d, and the
close correspondence seen there, we know such a check on the 3x3x3 is not beneficial.

I believe this is still an open question on the 4x4x4.

#### unsolved

##### Member
Your solved state check seems to assume the orientation of the puzzle matters.
I initiate the solver with a Top = top, Front = front, Right = right orientation, but the multi-threaded code tackles one rotated/revolved orientation per thread. So rather than run into the solution first encountered by a single move generator, there are several instances of the same move generator working on cubes laying on the side, rotated so that front = top, etc. Pure laziness on my part, but the best way to make use of so many cores.

(And by the way, it looks like you have some "=" operations that should be "==".)
Yes. I knew some of the readers here hate my long variable names, so I retyped them, and I retyped that entire line of code as well, ending up with a single = where my actual code has ==.

Or else you could define 24 solved states
That fits in nicely with my laziness philosophy.

Your depth-5 or depth-7 database can be thought of as a "perfect" pruning table. It's perfect in the sense that instead a lower bound for the distance from solved, it gives the exact distance. The drawbacks of such a pruning table is poor pruning depth and high memory consumption. Storing 18 billion positions in RAM? That would seem to require the better part of a terabyte at 48 bytes/position without using symmetry reduction, and still a lot with symmetry reduction.
Yes, but as I alluded to, I found a much better way to handle it that requires only 1 byte per position!

It seems like "perfect pruning" does not provide the best tradeoff between performance and memory usage.
I refer to your corner-cube pruning as Leaf Node Pruning, but Perfect Pruning is more like Trunk Pruning. Perfect Pruning primarily lets you complete your overall search at least one iteration sooner, but it is useless at the leaf nodes. For example, if at depth == 5 I find a 7-turn solution for a 12-ply total solution, you might think I would have to keep searching until depth == 11 is completed. However, as I continue to search, I am guaranteed to find at least one "tfs-1" solution at the next iterative depth. When I search two consecutive depths, and the optimal distance has not improved, then I can stop searching altogether.

Suppose it finds a tfs (turns from solution) == 7 @ ply 5, then the best tfs == 6 @ ply 6. If I complete ply 6 and only hit tfs == 6, then I know the 12-ply solution is optimal. I've completed a 12-ply search after only 6 plies of nominal depth. So this pruned the trunk of the tree, and did literally nothing to the leaf nodes.

When I probe the leaf nodes, at present I have no other comparison to make to halt the interim search. The only improvement would be if the depth + tfs best < the corner prune distance, then I could shave the tree much more.


##### Member
Yes, indeed. But it is a start and it can only improve from here. You have to understand, I am programming on a computer that has a total of 16 CPU cores (Dual Intel E5-2687W with 8 CPU cores per chip) and I am running this in single-core mode. Once I get it to run as a parallel search, the speedup in nodes/second is guaranteed to be linear (16x faster or 1.6 billion turns/second)
Speedup is never guaranteed to be linear. I would be very surprised if you get linear scaling up to 16 cores on a dual socket machine.

#### rokicki

##### Member
For this particular problem, which tends to be CPU-heavy, no shared read/write data, random memory
access (to hash tables and pruning tables, read-only), you actually get pretty good scalability. Almost
all of my twisty puzzle search programs over the years have scaled very well.

Dual sockets is a big deal, though. You normally get a pretty good speedup per-core by locking your
threads and memory to a single core, and using the machine essentially as two UP machines. (For
a four-socket machine, this is even more dramatic.) This does mean replicating your pruning tables
to memory attached to each socket rather than using one big pool.

It does depend on the program. Lowering the instructions-per-lookup is critical; indexing is often the
main bottleneck, followed by memory access itself.

#### rokicki

##### Member
Yes, but as I alluded to, I found a much better way to handle it that requires only 1 byte per position!
Information theory demands that, in order to accomplish this, you had to make some sort of tradeoff.
Either that or you've found a significant pattern in positions-at-a-distance that has eluded us so far.
Would you care to share your trick?

Is your hash table exact (that is, given a position, I can get a definitive answer in constant time that
tells me the distance of the position if the distance is <= n, or tells me definitively that the distance is
> n), and works for all positions? For this problem, indexing generally does not work well (the table is
too sparse) so you usually end up having to explicitly store each position (which takes more than a byte
simply because of the count of possible positions).

Tricks such as bloom filters can give you approximate answers in somewhat less memory---but they
are approximate, not exact.
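For reference, a minimal Bloom-filter sketch showing why its answers are one-sided: a miss is definitive, a hit is only probable. The sizes and hash mixing here are toy choices, not a tuned design:

```c
#include <assert.h>
#include <stdint.h>

/* Toy Bloom filter: k = 3 hash functions each set one bit per stored
 * position.  A lookup that finds any bit clear proves the position was
 * never added; a lookup that finds all bits set is only probably right. */
#define BLOOM_BITS (1u << 20)

static uint8_t bloom[BLOOM_BITS / 8];   /* zero-initialized = empty filter */

static uint32_t hash_pos(uint64_t pos, uint32_t seed) {
    pos ^= seed * 0x9E3779B97F4A7C15ULL;     /* seed separates the k hashes */
    pos *= 0xFF51AFD7ED558CCDULL;            /* 64-bit mixing step */
    pos ^= pos >> 33;
    return (uint32_t)pos % BLOOM_BITS;
}

static void bloom_add(uint64_t pos) {
    uint32_t s;
    for (s = 1; s <= 3; s++) {
        uint32_t b = hash_pos(pos, s);
        bloom[b / 8] |= (uint8_t)(1u << (b % 8));
    }
}

static int bloom_maybe_contains(uint64_t pos) {
    uint32_t s;
    for (s = 1; s <= 3; s++) {
        uint32_t b = hash_pos(pos, s);
        if (!(bloom[b / 8] & (1u << (b % 8))))
            return 0;                        /* definitely not stored */
    }
    return 1;                                /* probably stored */
}
```

The one-sidedness is exactly the limitation being pointed out: as a pruning aid, a false positive would merely fail to prune a branch, but you can never read an exact distance back out of it.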

Last edited: