• Welcome to the Speedsolving.com, home of the web's largest puzzle community!
    You are currently viewing our forum as a guest which gives you limited access to join discussions and access our other features.

    Registration is fast, simple and absolutely free so please, join our community of 35,000+ people from around the world today!

    If you are already a member, simply login to hide this message and begin participating in the community!

Loopover god's number upper bounds: 4×4 (25 STM), asymptotics (O(N log^2 N)), etc.

Dec 24, 2015
Thread starter #1
Loopover = this game; STM = single-tile metric (the one Loopover implementations use to count moves); MTM = multi-tile metric (a seemingly more sensible choice of metric).

STM bound of 26 moves is one move less than the best I could find (27 according to this spreadsheet); MTM bound has not been previously calculated before, as far as I can tell, although chances are that I'm just not aware of it. This puzzle has "only" 21 trillion states and we can get 128-fold symmetry+antisymmetry reduction, so in theory it shouldn't be too hard to BFS the whole thing. (I did also recently get a new 4 TB hard drive…) The solving procedure is essentially: solve a bunch of pieces optimally, then solve the remaining pieces also optimally. The second part is something that earlier work did not do, which is why we can get an immediate one-move improvement.

(I might bother to do a more sophisticated analysis later on; there's still a lot of low-hanging fruit left to pick.)

Solving the top two rows optimally takes at most 10 MTM, 13 STM.
(0, 1)
(1, 18)
(2, 255)
(3, 3648)
(4, 48787)
(5, 605886)
(6, 6632955)
(7, 56497090)
(8, 253413670)
(9, 199503998)
(10, 2212092)

(nice formatting wew)
total states: 518918400
distance 0: 1 nodes
distance 1: 12 nodes
distance 2: 114 nodes
distance 3: 1084 nodes
distance 4: 9931 nodes
distance 5: 86036 nodes
distance 6: 703024 nodes
distance 7: 5240586 nodes
distance 8: 32884017 nodes
distance 9: 142199734 nodes
distance 10: 258379100 nodes
distance 11: 78645932 nodes
distance 12: 768545 nodes
distance 13: 284 nodes
average distance: 9.717073

Solving the bottom two rows optimally once the top two rows are solved takes at most 11 MTM, 13 STM. Note that almost all of the optimal solutions here temporarily destroy the initially solved rows before finally restoring them (because there are no free columns at all), with the only exceptions being the solutions for the sixteen states that can be solved by moving just the two free rows.
move count stats:
 0:         1
 1:         6
 2:         9
 3:        24
 4:       288
 5:      1132
 6:      1340
 7:      8936
 8:     16721
 9:      8992
10:      2703
11:       168
 0:         1
 1:         4
 2:         6
 3:        20
 4:       137
 5:       428
 6:       934
 7:      3172
 8:      8410
 9:     11024
10:      9792
11:      5276
12:       880
13:       236

From these, we conclude an upper bound of 21 MTM and 26 STM.

Alternatively (for STM), following earlier work, it is known that solving a 3×3 block takes at most 13 STM; running the optimal solver on the 7! = 5040 possible permutations of the last side last layer leads to a maximum of 13 STM as well, so we have a total of 26 STM.
move count stats:
 0:         1
 1:         4
 2:        10
 3:        24
 4:        58
 5:       140
 6:       334
 7:       726
 8:      1258
 9:      1442
10:       829
11:       176
12:        30
13:         8

Similarly, solving a 3×2 block takes at most 11 STM; running the optimal solver on the 10! = 3628800 permutations of the last ten pieces leads to a maximum of 15 STM, again a total of 26 STM.
move count stats:
 0:         1
 1:         6
 2:        23
 3:        96
 4:       451
 5:      1996
 6:      8370
 7:     33748
 8:    125081
 9:    397755
10:    927554
11:   1230082
12:    748542
13:    150702
14:      4378
15:        15

The solver I coded for this doesn't "truly" exploit much of the inherent symmetry of the puzzle. The puzzle state is stored as a 64-bit integer with four bits for each cell, rather than as a permutation coordinate; this allows for switching the top and bottom halves of the puzzle just by doing a bit rotation and xor-ing by a constant. Only one pruning table is used (solving the top eight pieces); this pruning table is stored very inefficiently by using a 32-bit index, but that's fine, because I have 2 GB RAM to spare and it's only one pruning table. This pruning table is applied to the puzzle state itself, a shifted version of the puzzle state (solving the bottom two rows instead of the top two rows, effectively), as well as doing those two things to the inverse state as well to increase average pruning depth.

When solving the last two rows in MTM, this managed to do about 3500 solves per second, with all 8! = 40320 cases being solved in 11.65 seconds.

The solver has not been thoroughly tested for correctness (I literally just wrote it yesterday and haven't reviewed it), but there is a failsafe check to ensure that the generated solutions really do solve the input puzzle state. In other words, if there's a bug, at worst it just means that the solutions are not optimal, as opposed to being completely wrong. Thus this will not affect the correctness of the upper bounds in the rest of this post. (Unless there's a bug in the checking code…)

I tried doing the thing where the solver switches to the inverse scramble opportunistically (like this) but it ended up being 40% slower than just doing regular IDA* (because of branch mispredicts?). Maybe I just didn't implement it right, idk.


Update (2019-09-10): STM bound has been reduced to 25 STM.

Each of the 284 eight-piece cosets that are 13 moves away from solved have the eight pieces ABCDEFGH in the opposite half of the board, e.g.

(For this specific board, it takes 13 moves to solve either ABCDEFGH or IJKLMNOP, and 17 moves to solve the whole thing.) Note that there are four different partitions into 2×4 blocks we can consider:




The only way a piece can be in the opposite block of its home position under all four of these partitions simultaneously is if it's at the antipodal position, i.e.

But it takes only 12 moves to solve ABCDEFGH on this board (shift the bottom two rows by two cells (4 moves), then shift every column by two cells (8 moves)). Thus there is no state that is in a 13-move eight-piece coset under all four partitions simultaneously, and it always takes at most 12 moves to solve some 2×4 block (although solving a specific block may sometimes take 13 moves). This leads to a total upper bound of 12 + 13 = 25 STM.


TL note: "future" probably means like in the rest of the week, depending.

Consider the asymptotic behaviour of GN in the single-tile and multi-tile metrics. In the latter, a counting argument shows that it is bounded below by Ω(n^2) and a 3-cycle method shows that it is bounded above by O(n^2), so god's number in MTM is necessarily Θ(n^2). The former seems to be a lot more complicated. Again, a counting argument gives a lower bound of Ω(n^2), but the 3-cycle method gives an upper bound of O(n^3). I think I have a way of bringing the upper bound down to something like O(n^2 (log n)^4) but I haven't worked out the details (so the exponent of the log factor is probably wrong); I'm pretty sure the actual asymptotic behaviour is Θ(n^2), but I can't figure out how to prove that.

As for a lower bound, there are two basic approaches to consider. As mentioned above, we have a counting argument leading to a lower bound of cn^2 + o(n^2) for some constant c ≥ 2, but for small sizes we can also consider a bound given by Manhattan distance. As the sides wrap around (or should I say, they loop over), the maximum possible distance in either axis is ⌊n/2⌋, so the maximum possible Manhattan distance for each piece is 2⌊n/2⌋. Each move affects n pieces, the Manhattan distances of which change by at most ±1; hence the maximum possible change in the sum of Manhattan distances is ±n. The sum of Manhattan distances for a whole board is bounded by 2⌊n/2⌋ × n^2, so this gives a GN lower bound of 2⌊n/2⌋ × n. One example of a board hitting this bound is by shifting all of the rows by half of the board size, then shifting all of the columns by half of the board size. (If n is odd, it doesn't matter whether we round down or round up.)

In particular, this gives a lower bound of 36 for 6×6, which is better than the counting bound of 31, at least according to the formula in the Loopover Research spreadsheet. (I haven't verified it, but it seems roughly correct.)
Last edited:
Dec 24, 2015
Thread starter #3
I don't understand

TL;DR please?
The tl;dr version is the thread title. ;) Explanations are, by their very nature, the exact opposite of tl;dr, so in case that's what you really wanted:

God's number is the maximum among optimal solution lengths: GN := max {min {length of S | S is a solution to P} | P is a puzzle state}.

But how do you measure the length of a solution? Cary's original Loopover counts moving by one cell as one move, moving by two cells as two moves, et cetera. This is the single-tile metric (STM). Besides STM, we can also treat moving by multiple cells as one move instead of multiple moves; this leads to the multi-tile metric (MTM). Naturally, god's number for the different metrics can be expected to be different (just as god's number for 3×3×3 is 20 in FTM and 26 in QTM, or god's number for 15 puzzle is 40-something in MTM and 80-something in STM).

Generally, a lower bound for GN is obtained either by counting move sequences (if there are only X move sequences of length at most N, and there are more than X puzzle states in total, then GN must be greater than X), or by explicitly finding some puzzle state that takes so-and-so many moves to solve. On the other hand, an upper bound is obtained by devising a solving algorithm that can solve any puzzle state within a certain number of moves. Here, the algorithm is to solve the top half of the puzzle while ignoring the bottom half (500 million different cases; at most 13 STM) and then the bottom half of the puzzle while preserving the top half (40320 different cases; coincidentally, also at most 13 STM); this gives an upper bound of 26 STM.

Again, like in cubing, we can consider analogues of "colour neutrality". In this case, instead of always solving the top 2×4 rectangle on the board for the first step, what if we choose to start with the bottom half instead? The left half? The right half? Or what if we switch to the inverse like we do for FMC? Et cetera. In this case, the 13 STM cases for the first step can always be avoided, so the first step takes at most 12 STM with "colour neutrality". This reduces the upper bound to 12 + 13 = 25 STM.
Last edited:
Dec 24, 2015
Thread starter #4
So, I said this:
>I think I have a way of bringing the upper bound down to something like O(n^2 (log n)^4)

Turns out my original idea didn't work, but I made it stupider and that gave me a bound of \( O(n^2\log^2n) \). And unlike most of my other theoryposts, this one comes with code. It might be possible to drop this further to \( O(n^2\log n) \) by using a parallelism trick similar to what Demaine et al used for showing that GN for n×n×n Rubik's cube is \( \Theta(n^2/\log n) \), and if that works, combining this trick with a hypothetical \( O(n\log n) \) sorting network of exactly the type of structure we need (I believe the existence of such sorting networks is an open problem) would prove a bound of \( \Theta(n^2) \).

More specifically, let the board size be \( R\times C \) (with \( R\ge2 \) rows and \( C\ge2 \) columns), and let \( N:=R\cdot C \). The solving algorithm presented results in a solution of length \( O(N\log^2N) \), regardless of the ratio \( R/C \); in particular, we do not assume \( R\ll C\ll R \).

(The hidden constant for this new solving algorithm is relatively large (\( \approx2.6N\log^2 N \) moves on random roughly-square boards), so it only wins over spamming basic 3-cycles (\( \approx0.85N^{1.5} \) moves on random roughly-square boards) when N exceeds around 250000, with both methods taking over 100 million moves. Needless to say, this algorithm is not very relevant for typical board sizes like 4×4 to 20×20.)

Definition 1: An involution is a permutation \( \pi \) such that \( \pi^2=\mathrm{id} \). (We consider the identity permutation to be an involution here; other sources may instead define involutions to be permutations of order 2, excluding the identity permutation. This distinction is of little significance here.)

Definition 2: Let \( \pi,\pi'\in\mathrm{Sym}(\Omega) \) be permutations on the same set. Say \( \pi'\prec\pi \) if, for all \( x\in\Omega \), we have that \( \pi'(x)\in\{x,\pi(x)\} \). (In other words, the non-trivial cycles of \( \pi' \) are a subset of the non-trivial cycles of \( \pi \).)

Definition 3: An involution \( \pi \) on a set of integers is said to be of uniform distance if there exists some \( d>0 \) such that for every transposition \( (a,b)\prec\pi \), we have that \( |a-b|=d \). (Note that \( d \) is uniquely determined except when \( \pi=\mathrm{id} \). I don't know if this has a proper name in literature.)

Definition 4: More generally, an involution \( \pi\in\mathrm{Sym}(G) \) on an additive (abelian) group \( G \) is said to be of uniform displacement if there exists some \( d\in G \) such that for every transposition \( (a,b)\prec\pi \), we have that \( a-b\in\{d,-d\} \).

Proposition 5: Let \( \pi\in\mathrm{Sym}(\{0,\cdots,n-1\}) \) be an arbitrary permutation on \( n \) elements. Then there exists a decomposition of \( \pi \) as the composition of \( O(\log^2n) \) uniform-distance involutions. Furthermore, if there is some threshold \( t\ge0 \) such that \( \pi(x)=x \) for all \( x<t \), then every term of the decomposition fixes every point in \( \{0,\cdots,t-1\} \).

We do not provide a full proof of this proposition, but the basic idea is to apply an \( O(n\log^2n) \) sorting network that can be divided into \( O(\log^2n) \) stages of comparators, where the comparators of a stage all have the same distance, and the swaps performed by the comparators are all disjoint. Note that this is a weaker condition than the comparators being disjoint; for example, the provided implementation of our algorithm makes use of Shell sort with 3-smooth gaps, with one stage for each gap used. Within a stage, except around the endpoints, every element is involved in two comparators, so the comparators are very much not disjoint, but with 3-smooth Shell sort, an element is never swapped more than once for each gap, i.e. the swaps are disjoint.

Proposition 6: Let \( \pi\in\mathrm{Sym}(\{0,\cdots,R-1\}\times\{0,\cdots,C-1\}) \) be an arbitrary permutation on \( N=RC \) elements. Then there exists a decomposition into \( O(\log^2N) \) uniform-displacement involutions (taking \( G=\mathbb Z^2 \) in the above definition of a uniform-displacement involution). (In particular, the displacement is not allowed to "wrap" around the sides.) Furthermore, if there is some threshold \( t\ge0 \) such that \( \pi(0,x)=(0,x) \) for all \( x<t \), then every term of the decomposition fixes every point in \( \{(0,0),\cdots,(0,t-1)\} \).

Proof: Identify the underlying set with \( \{0,\cdots,N-1\} \) with the bijection \( (r,c)\mapsto Cr+c \). Per the previous proposition, this leads to a decomposition into \( O(\log^2N) \) uniform-distance involutions. For each of these uniform-distance involutions \( \pi_i \) with distance \( d \), switching back to the original underlying set \( (\{0,\cdots,R-1\}\times\{0,\cdots,C-1\} \), there are only two possible displacements (up to sign): \( (\lfloor d/C\rfloor,d\% C) \) and \( (\lfloor d/C\rfloor+1,d\% C-C) \); this leads to a decomposition \( \pi_i=\sigma_i\tau_i \) for each \( \pi_i\ \), where \( \sigma_i,\tau_i \) are of those two displacements, respectively. Then \( \pi=\sigma_0\tau_0\sigma_1\tau_1\cdots \), a decomposition with at most twice as many nontrivial terms as the earlier (one-dimensional) decomposition, and hence also \( O(\log^2N) \). The preserved points carry over from the one-dimensional case.

Lemma 7: Let \( a,b,c\in\{0,\cdots,R-1\}\times\{0,\cdots,C-1\} \) be three distinct positions on an \( R\times C \) Loopover board. Then there exists a sequence of moves cycling the three pieces at those positions of length \( \le\frac32(\|a-b\|_1+\|b-c\|_1+\|c-a\|_1) \) where \( \|\cdot\|_1 \) denotes the Manhattan distance on the board.

Proof outline: If the three pieces are all in distinct rows and distinct columns, slide any one of the three pieces so that two of them are in the same row and the third is in a different row. If the three pieces are all in the same row/column, slide one of them out of the way. Now, with two pieces in the same row (or column), slide the row (or column) containing those two pieces so that one of the pieces is in the same column (or row) as the third piece, then do a basic commutator to cycle the three pieces and undo all the setup moves. For details, see the code.

Lemma 8: Let \( a,b,c,d\in\{0,\cdots,R-1\}\times\{0,\cdots,C-1\} \) be four distinct positions on an \( R\times C \) Loopover board. Then there exists a sequence of moves swapping the piece at \( a \) with the piece at \( b \) and the piece at \( c \) with the piece at \( d \) of length \( \le3\|a-b\|_1+3\|c-d\|_1+6\|a-c\|_1 \).

Proof: Do the 3-cycles \( (a,b,c) \) and \( (a,d,c) \). The total move count is bounded by \( \frac32(\|a-b\|_1+\|c-d\|_1+2\|a-c\|_1+\|b-c\|_1+\|a-d\|_1) \), which simplifies to \( 3\|a-b\|_1+3\|c-d\|_1+6\|a-c\|_1 \) with the triangle inequality.

Lemma 9: Let \( \pi\in\mathrm{Sym}(\{0,\cdots,R-1\}\times\{0,\cdots,C-1\}) \) be an even uniform-displacement involution on \( N=RC \) elements. Then there exists a sequence of \( O(N) \) moves with the effect of permuting the pieces on an \( R\times C \) Loopover board by \( \pi \).

Proof: Will anyone even read this far… I'll type this out later. The argument is kind of long and messy. In the meantime, check the code (and the comments).

Theorem 10: Any scrambled \( R\times C \) Loopover board can be solved in \( O(N\log^2N) \) moves.

Proof: We first solve the first two pieces of row 0 (either directly or via 3-cycles); this takes \( O(N) \) moves. If the board now has odd parity, apply a move in either the last column (if \( R \) is even) or in the last row (if \( C \) is even) to bring it to an even-parity state, still with the first two pieces solved.

Per the previous propositions, the inverse of the board state (as a permutation) can be decomposed into \( O(\log^2N) \) uniform-displacement involutions. For each of these involutions \( \pi_i \), let \( \pi_i':=\pi_i \) if it is an even permutation and \( \pi_i':=\pi_i\circ(a,a+d) \) for some \( (a,a+d)\prec\pi_i \) if it is an odd permutation. Then \( \pi_i' \) is always an even permutation. In the former case, we directly invoke Lemma 9 to get a move sequence of length \( O(N) \); in the latter case, we first apply the transpositions \( (a,a+d)(0,1) \), which takes \( O(R+C)\subseteq O(N) \) moves and then invoke Lemma 9 on \( \pi_i' \) to get a move sequence of total length \( O(N) \). In either case, the length is bounded by \( O(N) \) for each uniform-displacement involution. Since the board state at the beginning of this step is even and the moves applied for each uniform-distance involution all have zero net effect on the parity, it follows that the first two elements will be swapped an even number of times in total, and hence the board will be fully solved after applying all of the involutions in the decomposition.

This leads to a total move count of \( O(N)+O(\log^2N)\cdot O(N)=O(N\log^2N) \). □

Edit (2019-09-15): Some typos and minor mistakes fixed. The constant factors might still be wrong but good thing those aren't the main point of this post. The code has also been updated to fix an inefficiency on non-square boards (which made it not \( O(N\log^2N) \) for extreme aspect ratios), as well as a silly bug in generating random states where it doesn't check for parity on odd-sized boards.
Last edited:
Feb 4, 2019
oh cool, i used to own the loopover discord server and own/helped a lot with the research sheet you linked. im not remotely smart enough to understand what's happening but its cool seeing others working on GN for loopover. ill add this to the research sheet
Dec 24, 2015
Thread starter #6
tl;dr STM GN for 5×5 is at least 22.

Precise counts of "canonical" 5×5 move sequences in STM (where "canonical" means that a consecutive bunch of row moves are done starting from the top row to the bottom, no redundant moves are made, and the shorter path is always taken; likewise for column moves):

We divide move sequences into syllables and count them via generating functions. For a single syllable corresponding to row/horizontal moves, we have \( 1+2x+2x^2 \) for how an individual row can move, and thus \( (1+2x+2x^2)^5 \) for how the five rows can move. One of those corresponds to a trivial move sequence (move all of the rows by 0), so subtract one to exclude that: \( s(x):=(1+2x+2x^2)^5-1 \). By symmetry, we get the same expression for column/vertical moves.

The trivial move sequence consists of no syllables: 1. Any non-trivial canonical move sequence (uniquely) decomposes into a finite sequence of nontrivial syllables of alternating directions, i.e. it must be either H-V-H-V-H-etc. or V-H-V-H-V-etc. This is just a standard geometric progression and we use the standard formula for that: \( \frac{2s(x)}{1-s(x)} \). Add these two cases together and we get the generating function \( 1+\frac{2s(x)}{1-s(x)} \) for the number of canonical move sequences. Expand this to 24 terms or so:

\( 1 + 20 x + 300 x^2 + 4320 x^3 + 62120 x^4 + 893584 x^5 + 12854320 x^6 + 184910080 x^7 + 2659940480 x^8 \\
+ 38263376000 x^9 + 550420568192 x^{10} + 7917827258880 x^{11} + 113898339051520 x^{12} + 1638433274027520 x^{13} \\
+ 23568944163699200 x^{14} + 339040434418122752 x^{15} + 4877113517349276160 x^{16} + 70157520597607214080 x^{17} \\
+ 1009219424336639902720 x^{18} + 14517671630674993315840 x^{19} + 208837428703515031881728 x^{20} \\
+ 3004136802167631885004800 x^{21} + 43214657363697298461511680 x^{22} + 621644996231283351323279360 x^{23} + O(x^{24}) \)

We note that the 5×5 board has \( 25!/2 \approx 7.8\cdot10^{24} \) states, and up to 21 moves, there are less than \( 4\cdot10^{24} \) possible move sequences. Therefore GN for 5×5 in STM must be at least 22 moves. This improves on the Manhattan-distance bound of 20 moves.

Bonus: This also lets us approximate the branching ratio (with e.g. brute force or IDA* search); taking the two largest terms printed above and dividing, we get a value of around 14.38.

Bonus #2: There is a different notion of canonicalisation where we defer "global" shifts to the last two syllables (if there are at least two syllables); "global shift" meaning something like shifting every row by the same amount, which can be cancelled by a later syllable shifting every row by the same amount in the opposite direction. This reduces the effective move count a bit compared to the naïve canonicalisation, and is roughly akin to excluding stuff like L2 R2 U2 D2 L R' on the Rubik's cube. The calculation required is considerably more complicated and (i) I don't feel like doing it; (ii) it probably won't affect the above calculation much.

Bonus #3: It's probably possible to increase the bound to 24-25 by just trying to solve a few random boards; we don't even need to let the solves run to completion. Gonna try this later.
(putting an update in this post because it seems too spammy to have a separate post for each minor result)

STM GN for 4×4 is at most 24.

We divide the board into its top half and its bottom half, and consider the cases where the pieces in the top half require 12 or 13 moves to solve and same for the pieces in the bottom half. Naïvely this might seem like it would have (768545+284)^2 = 591,098,031,241 cases, which is a very large number, but most combinations are impossible, in the sense that there would be duplicate pieces (e.g. there are two "A"s on the board) and missing pieces (e.g. there is no "B" on the board). The number of possible combinations is only 953,528,011, which is still large, but more manageable.

Among those 954 million positions, 951,485,926 (~99.79%) of them have a 2×4 block that can be solved in at most 11 moves, in one of these four positions, on normal or on inverse:
. . . .
. . . .

. . . .
. . . .

A B . .
E F . .
I J . .
M N . .

. . C D
. . G H
. . K L
. . O P
By solving this 2×4 block first, then solving the rest, we obtain an upper bound of 24 moves for these positions. For the remaining positions, an optimal solver was used on all 2042085 such positions and all of them can be solved in at most 18 moves; an example of an 18-mover is:
with optimal solution (in a sorta SiGN-like notation) 1U 2U 1L 1U 1L 4L2 2U 2L 2U 4L 1U' 2L 3U 4L 1U2 1L'. (Interestingly, this has a lot of symmetry; it's equivalent to just transposing the solved board.)

Either there is a 2×4 block (on normal or on inverse) that can be solved in at most 11 moves, in which case the entire board can be solved in at most 11+13=24 moves, or all eight of the 2×4 blocks looked at require at least 12 moves and the entire board can be solved in at most 18 moves; therefore GN is at most 24.

I had to optimise the performance of my solver a lot for this approach to be feasible. As before, the "main" heuristic used is still the 8-piece pattern database for solving ABCDEFGH / IJKLMNOP, but now we also transpose the board (this can be done pretty quickly with SWAR magic) and apply the pattern database to check ABEFIJMN and CDGHKLOP as well. If all four values returned are the same (which is rarely the case on random states, but not-so-rarely the case for the specific states we're looking at), then we can bump up the heuristic by one.

(The SWAR magic: )
static long transposeSymmetry(long state) {
	// transpose the cells
	state = (state & 0xff00_ff00_00ff_00ffl) | ((state & 0x00ff_00ff_0000_0000l) >>> 24) | ((state & 0x0000_0000_ff00_ff00l) << 24);
	state = (state & 0xf0f0_0f0f_f0f0_0f0fl) | ((state & 0x0f0f_0000_0f0f_0000l) >>> 12) | ((state & 0x0000_f0f0_0000_f0f0l) << 12);
	// then relabel
	state = ((state & 0x3333_3333_3333_3333l) << 2) | ((state & 0xcccc_cccc_cccc_ccccl) >>> 2);
	return state;
In addition, I also added the walking distance (WD) heuristic. This provides a very tiny speedup (like a few percent), but it doesn't seem to be worse and I already spent the time coding it, so I might as well just leave it in. By itself, the WD heuristic also allows solving random states reasonably quickly (a few positions per second) without large lookup tables; this is the distribution:

distance 0: 1 double cosets / 1 cosets
distance 1: 2 double cosets / 512 cosets
distance 2: 46 double cosets / 72544 cosets
distance 3: 540 double cosets / 3322528 cosets
distance 4: 2781 double cosets / 24128154 cosets
distance 5: 4350 double cosets / 27447240 cosets
distance 6: 1886 double cosets / 7456910 cosets
distance 7: 492 double cosets / 625312 cosets
distance 8: 49 double cosets / 9799 cosets
total number of double cosets: 10147
average (for one axis): 4.647083646512218
average (for both axes): 9.294167293024437

While the average value of the WD heuristic is smaller than the 8-piece PDB (9.29 versus >9.7), it also has a larger maximum (16 versus 13). Since we're already computing the transposed board for the 8-piece PDB, we reuse that instead of transposing again. (Note that walking distance is a uniform improvement over Manhattan distance, in the sense that WD ≥ MD for any board, so it makes no sense to compute MD once we have WD. Furthermore, viewing WD as distance in the double-coset space (with the subgroups on both sides being generated by arbitrary row permutations), it turns out that the WD of a board is always the same as the WD of its inverse, so there's no need to explicitly compute that either (like we do for the 8-piece PDBs).)

The last optimisation I did was the very obvious thing of making use of the permutation parity of the board to exclude known-fruitless searches. E.g. if the board has even parity, we can skip searches of odd-length solutions. This provides something like a 3× speedup. (This parity constraint only works if the width and height are both even, so e.g. it wouldn't work on 3×4 or 5×5.) All in all, my current optimal solver manages around 240-250 random positions per second, compared to 27-ish before yesterday.

The main motivation for doing this was really just to check the effectiveness of the WD heuristic, to prepare for writing a 5×5 optimal solver. While it's not especially useful on 4×4 (thanks to the 8-piece PDB being at just about the right size to fit in free RAM), it seems that n-piece PDBs would have to be really big on 5×5 in order to be effective. Unlike 15-puzzle/24-puzzle, it's not clear how to create useful additive pattern databases on Loopover because every move has "nonlocal" effects. Stuff like linear conflict or invert distance that apply for 15-puzzle are basically useless on Loopover.
Last edited:
Sep 28, 2019
tl;dr STM GN for 5×5 is at least 22.
Applying the same method to other n gives the following bounds:

3 - 6
4 - 14
5 - 22
6 - 35
7 - 48
8 - 66
9 - 86
10 - 109
11 - 134
12 - 163
13 - 194
14 - 229
15 - 265
16 - 306
17 - 348
18 - 394
19 - 442
20 - 495
21 - 548
22 - 606
23 - 666
24 - 730
25 - 796
26 - 866
27 - 937
28 - 1013
29 - 1091
30 - 1173
31 - 1256
32 - 1344
33 - 1434
34 - 1528
35 - 1623
36 - 1724
37 - 1825
38 - 1931
39 - 2039
40 - 2151
41 - 2265
42 - 2383
43 - 2503
44 - 2628
45 - 2753
46 - 2884
47 - 3016
48 - 3153
49 - 3291
50 - 3434

The branching factor is 1/x where x is the positive real root of \( (1+2x+2x^2+\cdots+2x^{(n-1)/2})^n=2 \) for odd n and \( (1+2x+2x^2+\cdots+2x^{n/2-1}+x^{n/2})^n=2 \) for even n. For n=5, this is \( \displaystyle{\frac{2}{\sqrt{2^{6/5}-1}-1}\approx 14.3850497528\cdots} \)
Last edited: