
Loopover god's number upper bounds: 4×4, asymptotics, etc.


Dec 24, 2015
Summary of current knowledge (2020-05-07):

STM:
4×4: ≤22 (by coolcomputery); ≥18
5×5: ≤53 (by coolcomputery)
6×6: ≤103 (by coolcomputery)
m×n: \( O(N\log^2N) \) where \( N=mn \); ≥\( \lfloor m/2\rfloor n+m\lfloor n/2\rfloor \) (from considering Manhattan distance)
2×n: \( \le n(\log_2n+20) \)

MTM:
4×4: ≤21
m×n: Θ(mn) (from counting arguments)


Loopover = this game; STM = single-tile metric (the one Loopover implementations use to count moves); MTM = multi-tile metric (a seemingly more sensible choice of metric).

STM bound of 26 moves is one move better than the best I could find (27, according to this spreadsheet); the MTM bound has not been calculated before, as far as I can tell, although chances are that I'm just not aware of it. This puzzle has "only" 21 trillion states and we can get 128-fold symmetry+antisymmetry reduction, so in theory it shouldn't be too hard to BFS the whole thing. (I did also recently get a new 4 TB hard drive…) The solving procedure is essentially: solve a bunch of pieces optimally, then solve the remaining pieces also optimally. The second part is something that earlier work did not do, which is why we get an immediate one-move improvement.

(I might bother to do a more sophisticated analysis later on; there's still a lot of low-hanging fruit left to pick.)

Solving the top two rows optimally takes at most 10 MTM, 13 STM. The MTM distance distribution, as (distance, number of states) pairs:
(0, 1)
(1, 18)
(2, 255)
(3, 3648)
(4, 48787)
(5, 605886)
(6, 6632955)
(7, 56497090)
(8, 253413670)
(9, 199503998)
(10, 2212092)

(nice formatting wew)

STM distance distribution; total states: 518918400
distance 0: 1 nodes
distance 1: 12 nodes
distance 2: 114 nodes
distance 3: 1084 nodes
distance 4: 9931 nodes
distance 5: 86036 nodes
distance 6: 703024 nodes
distance 7: 5240586 nodes
distance 8: 32884017 nodes
distance 9: 142199734 nodes
distance 10: 258379100 nodes
distance 11: 78645932 nodes
distance 12: 768545 nodes
distance 13: 284 nodes
average distance: 9.717073

Solving the bottom two rows optimally once the top two rows are solved takes at most 11 MTM, 13 STM. Note that almost all of the optimal solutions here temporarily destroy the initially solved rows before finally restoring them (because there are no free columns at all), with the only exceptions being the solutions for the sixteen states that can be solved by moving just the two free rows.
move count stats (MTM):
 0:         1
 1:         6
 2:         9
 3:        24
 4:       288
 5:      1132
 6:      1340
 7:      8936
 8:     16721
 9:      8992
10:      2703
11:       168

move count stats (STM):
 0:         1
 1:         4
 2:         6
 3:        20
 4:       137
 5:       428
 6:       934
 7:      3172
 8:      8410
 9:     11024
10:      9792
11:      5276
12:       880
13:       236

From these, we conclude an upper bound of 21 MTM and 26 STM.

Alternatively (for STM), following earlier work, it is known that solving a 3×3 block takes at most 13 STM; running the optimal solver on the 7! = 5040 possible permutations of the seven remaining pieces leads to a maximum of 13 STM as well, so we have a total of 26 STM.
move count stats:
 0:         1
 1:         4
 2:        10
 3:        24
 4:        58
 5:       140
 6:       334
 7:       726
 8:      1258
 9:      1442
10:       829
11:       176
12:        30
13:         8

Similarly, solving a 3×2 block takes at most 11 STM; running the optimal solver on the 10! = 3628800 permutations of the last ten pieces leads to a maximum of 15 STM, again a total of 26 STM.
move count stats:
 0:         1
 1:         6
 2:        23
 3:        96
 4:       451
 5:      1996
 6:      8370
 7:     33748
 8:    125081
 9:    397755
10:    927554
11:   1230082
12:    748542
13:    150702
14:      4378
15:        15

The solver I coded for this doesn't "truly" exploit much of the inherent symmetry of the puzzle. The puzzle state is stored as a 64-bit integer with four bits for each cell, rather than as a permutation coordinate; this allows switching the top and bottom halves of the puzzle just by doing a bit rotation and xor-ing by a constant. Only one pruning table is used (for solving the top eight pieces); this pruning table is stored very inefficiently using a 32-bit index, but that's fine, because I have 2 GB RAM to spare and it's only one pruning table. The pruning table is applied to the puzzle state itself, to a shifted version of the puzzle state (effectively solving the bottom two rows instead of the top two rows), and to those same two things on the inverse state, to increase the average pruning depth.

When solving the last two rows in MTM, this managed to do about 3500 solves per second, with all 8! = 40320 cases being solved in 11.65 seconds.

The solver has not been thoroughly tested for correctness (I literally just wrote it yesterday and haven't reviewed it), but there is a failsafe check to ensure that the generated solutions really do solve the input puzzle state. In other words, if there's a bug, at worst it just means that the solutions are not optimal, as opposed to being completely wrong. Thus this will not affect the correctness of the upper bounds in the rest of this post. (Unless there's a bug in the checking code…)

I tried doing the thing where the solver switches to the inverse scramble opportunistically (like this) but it ended up being 40% slower than just doing regular IDA* (because of branch mispredicts?). Maybe I just didn't implement it right, idk.


Update (2019-09-10): STM bound has been reduced to 25 STM.

Each of the 284 eight-piece cosets that are 13 moves away from solved has the eight pieces ABCDEFGH in the opposite half of the board, e.g.

(For this specific board, it takes 13 moves to solve either ABCDEFGH or IJKLMNOP, and 17 moves to solve the whole thing.) Note that there are four different partitions into 2×4 blocks we can consider:




The only way a piece can be in the opposite block of its home position under all four of these partitions simultaneously is if it's at the antipodal position, i.e.

But it takes only 12 moves to solve ABCDEFGH on this board (shift the bottom two rows by two cells (4 moves), then shift every column by two cells (8 moves)). Thus there is no state that is in a 13-move eight-piece coset under all four partitions simultaneously, and it always takes at most 12 moves to solve some 2×4 block (although solving a specific block may sometimes take 13 moves). This leads to a total upper bound of 12 + 13 = 25 STM.


TL note: "future" probably means like in the rest of the week, depending.

Consider the asymptotic behaviour of GN in the single-tile and multi-tile metrics. In the latter, a counting argument shows that it is bounded below by Ω(n^2) and a 3-cycle method shows that it is bounded above by O(n^2), so god's number in MTM is necessarily Θ(n^2). The former seems to be a lot more complicated. Again, a counting argument gives a lower bound of Ω(n^2), but the 3-cycle method gives an upper bound of O(n^3). I think I have a way of bringing the upper bound down to something like O(n^2 (log n)^4) but I haven't worked out the details (so the exponent of the log factor is probably wrong); I'm pretty sure the actual asymptotic behaviour is Θ(n^2), but I can't figure out how to prove that.

As for a lower bound, there are two basic approaches to consider. As mentioned above, we have a counting argument leading to a lower bound of cn^2 + o(n^2) for some constant c ≥ 2, but for small sizes we can also consider a bound given by Manhattan distance. As the sides wrap around (or should I say, they loop over), the maximum possible distance in either axis is ⌊n/2⌋, so the maximum possible Manhattan distance for each piece is 2⌊n/2⌋. Each move affects n pieces, the Manhattan distances of which change by at most ±1; hence the maximum possible change in the sum of Manhattan distances is ±n. The sum of Manhattan distances for a whole board is bounded by 2⌊n/2⌋ × n^2, so this gives a GN lower bound of 2⌊n/2⌋ × n. One example of a board attaining this bound is obtained by shifting all of the rows by half of the board size, then shifting all of the columns by half of the board size. (If n is odd, it doesn't matter whether we round down or round up.)

In particular, this gives a lower bound of 36 for 6×6, which is better than the counting bound of 31, at least according to the formula in the Loopover Research spreadsheet. (I haven't verified it, but it seems roughly correct.)
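The bound is easy to sanity-check numerically; here's a minimal sketch (class and method names are mine, not from the solver) that also verifies that the "shift everything by half the board" scramble attains the maximum total Manhattan distance:

```java
public class ManhattanBound {
    // Lower bound from the Manhattan-distance argument: each move changes
    // the board's total (toroidal) Manhattan distance by at most n, and the
    // total is at most 2*floor(n/2) per piece over n^2 pieces.
    static int lowerBound(int n) { return 2 * (n / 2) * n; }

    // Toroidal distance between cells i and j along one axis of length n.
    static int wrapDist(int i, int j, int n) {
        int d = Math.abs(i - j) % n;
        return Math.min(d, n - d);
    }

    // Total Manhattan distance of the board where every row, then every
    // column, has been shifted by floor(n/2).
    static int antipodalTotal(int n) {
        int s = n / 2, total = 0;
        for (int r = 0; r < n; r++)
            for (int c = 0; c < n; c++)
                total += wrapDist(r, (r + s) % n, n) + wrapDist(c, (c + s) % n, n);
        return total;
    }

    public static void main(String[] args) {
        System.out.println(lowerBound(6));         // 36, as quoted above
        System.out.println(antipodalTotal(6) / 6); // 36: the bound is attained
    }
}
```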


Dec 24, 2015
I don't understand

TL;DR please?
The tl;dr version is the thread title. ;) Explanations are, by their very nature, the exact opposite of tl;dr, so in case that's what you really wanted:

God's number is the maximum among optimal solution lengths: GN := max {min {length of S | S is a solution to P} | P is a puzzle state}.

But how do you measure the length of a solution? Cary's original Loopover counts moving by one cell as one move, moving by two cells as two moves, et cetera. This is the single-tile metric (STM). Besides STM, we can also treat moving by multiple cells as one move instead of multiple moves; this leads to the multi-tile metric (MTM). Naturally, god's number for the different metrics can be expected to be different (just as god's number for 3×3×3 is 20 in FTM and 26 in QTM, or god's number for 15 puzzle is 40-something in MTM and 80-something in STM).
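To make the difference between the two metrics concrete, here's a toy sketch (the function names are my own) of how a single row or column shift is counted in each:

```java
public class Metrics {
    // Shifting a row/column of an n-cell board by k cells:
    // STM counts the number of single-cell shifts along the shorter way
    // around the loop; MTM counts any nonzero shift as one move.
    static int stmCost(int k, int n) {
        int r = ((k % n) + n) % n;
        return Math.min(r, n - r);
    }

    static int mtmCost(int k, int n) {
        return ((k % n) + n) % n == 0 ? 0 : 1;
    }

    public static void main(String[] args) {
        // On a 4-wide board, shifting a row by 2 costs 2 STM but 1 MTM;
        // shifting by 3 is the same as shifting by -1, so it costs 1 STM.
        System.out.println(stmCost(2, 4) + " " + mtmCost(2, 4)); // 2 1
        System.out.println(stmCost(3, 4) + " " + mtmCost(3, 4)); // 1 1
    }
}
```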

Generally, a lower bound for GN is obtained either by counting move sequences (if there are only X move sequences of length at most N, and there are more than X puzzle states in total, then GN must be greater than N), or by explicitly finding some puzzle state that takes so-and-so many moves to solve. On the other hand, an upper bound is obtained by devising a solving algorithm that can solve any puzzle state within a certain number of moves. Here, the algorithm is to solve the top half of the puzzle while ignoring the bottom half (500 million different cases; at most 13 STM) and then the bottom half of the puzzle while preserving the top half (40320 different cases; coincidentally, also at most 13 STM); this gives an upper bound of 26 STM.

Again, like in cubing, we can consider analogues of "colour neutrality". In this case, instead of always solving the top 2×4 rectangle on the board for the first step, what if we choose to start with the bottom half instead? The left half? The right half? Or what if we switch to the inverse like we do for FMC? Et cetera. In this case, the 13 STM cases for the first step can always be avoided, so the first step takes at most 12 STM with "colour neutrality". This reduces the upper bound to 12 + 13 = 25 STM.


Dec 24, 2015
So, I said this:
>I think I have a way of bringing the upper bound down to something like O(n^2 (log n)^4)

Turns out my original idea didn't work, but I made it stupider and that gave me a bound of \( O(n^2\log^2n) \). And unlike most of my other theoryposts, this one comes with code. It might be possible to drop this further to \( O(n^2\log n) \) by using a parallelism trick similar to what Demaine et al. used for showing that GN for the n×n×n Rubik's cube is \( \Theta(n^2/\log n) \), and if that works, combining this trick with a hypothetical \( O(n\log n) \) sorting network of exactly the type of structure we need (I believe the existence of such sorting networks is an open problem) would prove a bound of \( \Theta(n^2) \).

More specifically, let the board size be \( R\times C \) (with \( R\ge2 \) rows and \( C\ge2 \) columns), and let \( N:=R\cdot C \). The solving algorithm presented results in a solution of length \( O(N\log^2N) \), regardless of the ratio \( R/C \); in particular, we do not assume that \( R \) and \( C \) are comparable.

(The hidden constant for this new solving algorithm is relatively large (\( \approx2.6N\log^2 N \) moves on random roughly-square boards), so it only wins over spamming basic 3-cycles (\( \approx0.85N^{1.5} \) moves on random roughly-square boards) when N exceeds around 250000, with both methods taking over 100 million moves. Needless to say, this algorithm is not very relevant for typical board sizes like 4×4 to 20×20.)

Definition 1: An involution is a permutation \( \pi \) such that \( \pi^2=\mathrm{id} \). (We consider the identity permutation to be an involution here; other sources may instead define involutions to be permutations of order 2, excluding the identity permutation. This distinction is of little significance here.)

Definition 2: Let \( \pi,\pi'\in\mathrm{Sym}(\Omega) \) be permutations on the same set. Say \( \pi'\prec\pi \) if, for all \( x\in\Omega \), we have that \( \pi'(x)\in\{x,\pi(x)\} \). (In other words, the non-trivial cycles of \( \pi' \) are a subset of the non-trivial cycles of \( \pi \).)

Definition 3: An involution \( \pi \) on a set of integers is said to be of uniform distance if there exists some \( d>0 \) such that for every transposition \( (a,b)\prec\pi \), we have that \( |a-b|=d \). (Note that \( d \) is uniquely determined except when \( \pi=\mathrm{id} \). I don't know if this has a proper name in the literature.)

Definition 4: More generally, an involution \( \pi\in\mathrm{Sym}(G) \) on an additive (abelian) group \( G \) is said to be of uniform displacement if there exists some \( d\in G \) such that for every transposition \( (a,b)\prec\pi \), we have that \( a-b\in\{d,-d\} \).

Proposition 5: Let \( \pi\in\mathrm{Sym}(\{0,\cdots,n-1\}) \) be an arbitrary permutation on \( n \) elements. Then there exists a decomposition of \( \pi \) as the composition of \( O(\log^2n) \) uniform-distance involutions. Furthermore, if there is some threshold \( t\ge0 \) such that \( \pi(x)=x \) for all \( x<t \), then every term of the decomposition fixes every point in \( \{0,\cdots,t-1\} \).

We do not provide a full proof of this proposition, but the basic idea is to apply an \( O(n\log^2n) \) sorting network that can be divided into \( O(\log^2n) \) stages of comparators, where the comparators of a stage all have the same distance, and the swaps performed by the comparators are all disjoint. Note that this is a weaker condition than the comparators being disjoint; for example, the provided implementation of our algorithm makes use of Shell sort with 3-smooth gaps, with one stage for each gap used. Within a stage, except around the endpoints, every element is involved in two comparators, so the comparators are very much not disjoint, but with 3-smooth Shell sort, an element is never swapped more than once for each gap, i.e. the swaps are disjoint.
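To illustrate the kind of staged structure being used, here is a sketch (class and method names mine) of Shell sort with 3-smooth gaps, i.e. Pratt's gap sequence; when gap h is reached the array is already 2h- and 3h-ordered, so by Pratt's lemma a single compare-exchange pass per gap suffices, and each pass is one stage of same-distance swaps:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class PrattShell {
    // All 3-smooth numbers 2^p * 3^q below n, in decreasing order;
    // there are O(log^2 n) of them.
    static List<Integer> gaps(int n) {
        List<Integer> g = new ArrayList<>();
        for (long a = 1; a < n; a *= 2)
            for (long b = a; b < n; b *= 3)
                g.add((int) b);
        g.sort(Collections.reverseOrder());
        return g;
    }

    // One compare-exchange pass per gap, O(n log^2 n) comparators in total.
    // The swaps actually performed within a stage turn out to be disjoint,
    // which is the property used in the argument above.
    static void sort(int[] a) {
        for (int h : gaps(a.length))
            for (int i = h; i < a.length; i++)
                if (a[i - h] > a[i]) { int t = a[i - h]; a[i - h] = a[i]; a[i] = t; }
    }

    public static void main(String[] args) {
        int[] a = {5, 2, 4, 1, 3, 0};
        sort(a);
        System.out.println(java.util.Arrays.toString(a)); // [0, 1, 2, 3, 4, 5]
    }
}
```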

Proposition 6: Let \( \pi\in\mathrm{Sym}(\{0,\cdots,R-1\}\times\{0,\cdots,C-1\}) \) be an arbitrary permutation on \( N=RC \) elements. Then there exists a decomposition into \( O(\log^2N) \) uniform-displacement involutions (taking \( G=\mathbb Z^2 \) in the above definition of a uniform-displacement involution). (In particular, the displacement is not allowed to "wrap" around the sides.) Furthermore, if there is some threshold \( t\ge0 \) such that \( \pi(0,x)=(0,x) \) for all \( x<t \), then every term of the decomposition fixes every point in \( \{(0,0),\cdots,(0,t-1)\} \).

Proof: Identify the underlying set with \( \{0,\cdots,N-1\} \) via the bijection \( (r,c)\mapsto Cr+c \). Per the previous proposition, this leads to a decomposition into \( O(\log^2N) \) uniform-distance involutions. For each of these uniform-distance involutions \( \pi_i \) with distance \( d \), switching back to the original underlying set \( \{0,\cdots,R-1\}\times\{0,\cdots,C-1\} \), there are only two possible displacements (up to sign): \( (\lfloor d/C\rfloor,d\% C) \) and \( (\lfloor d/C\rfloor+1,d\% C-C) \); this leads to a decomposition \( \pi_i=\sigma_i\tau_i \) for each \( \pi_i \), where \( \sigma_i,\tau_i \) consist of the transpositions with those two displacements, respectively. Then \( \pi=\sigma_0\tau_0\sigma_1\tau_1\cdots \), a decomposition with at most twice as many nontrivial terms as the earlier (one-dimensional) decomposition, and hence also \( O(\log^2N) \). The preserved points carry over from the one-dimensional case.

Lemma 7: Let \( a,b,c\in\{0,\cdots,R-1\}\times\{0,\cdots,C-1\} \) be three distinct positions on an \( R\times C \) Loopover board. Then there exists a sequence of moves cycling the three pieces at those positions of length \( \le\frac32(\|a-b\|_1+\|b-c\|_1+\|c-a\|_1) \) where \( \|\cdot\|_1 \) denotes the Manhattan distance on the board.

Proof outline: If the three pieces are all in distinct rows and distinct columns, slide any one of the three pieces so that two of them are in the same row and the third is in a different row. If the three pieces are all in the same row/column, slide one of them out of the way. Now, with two pieces in the same row (or column), slide the row (or column) containing those two pieces so that one of the pieces is in the same column (or row) as the third piece, then do a basic commutator to cycle the three pieces and undo all the setup moves. For details, see the code.

Lemma 8: Let \( a,b,c,d\in\{0,\cdots,R-1\}\times\{0,\cdots,C-1\} \) be four distinct positions on an \( R\times C \) Loopover board. Then there exists a sequence of moves swapping the piece at \( a \) with the piece at \( b \) and the piece at \( c \) with the piece at \( d \) of length \( \le3\|a-b\|_1+3\|c-d\|_1+6\|a-c\|_1 \).

Proof: Do the 3-cycles \( (a,b,c) \) and \( (a,d,c) \). The total move count is bounded by \( \frac32(\|a-b\|_1+\|c-d\|_1+2\|a-c\|_1+\|b-c\|_1+\|a-d\|_1) \), which simplifies to \( 3\|a-b\|_1+3\|c-d\|_1+6\|a-c\|_1 \) with the triangle inequality.
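Spelling out the triangle-inequality step (using \( \|b-c\|_1\le\|a-b\|_1+\|a-c\|_1 \) and \( \|a-d\|_1\le\|a-c\|_1+\|c-d\|_1 \)):

```latex
\begin{aligned}
&\tfrac32\bigl(\|a-b\|_1+\|c-d\|_1+2\|a-c\|_1+\|b-c\|_1+\|a-d\|_1\bigr)\\
&\quad\le \tfrac32\bigl(\|a-b\|_1+\|c-d\|_1+2\|a-c\|_1+(\|a-b\|_1+\|a-c\|_1)+(\|a-c\|_1+\|c-d\|_1)\bigr)\\
&\quad= \tfrac32\bigl(2\|a-b\|_1+2\|c-d\|_1+4\|a-c\|_1\bigr)
= 3\|a-b\|_1+3\|c-d\|_1+6\|a-c\|_1.
\end{aligned}
```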

Lemma 9: Let \( \pi\in\mathrm{Sym}(\{0,\cdots,R-1\}\times\{0,\cdots,C-1\}) \) be an even uniform-displacement involution on \( N=RC \) elements. Then there exists a sequence of \( O(N) \) moves with the effect of permuting the pieces on an \( R\times C \) Loopover board by \( \pi \).

Proof: Will anyone even read this far… I'll type this out later. The argument is kind of long and messy. In the meantime, check the code (and the comments).

Theorem 10: Any scrambled \( R\times C \) Loopover board can be solved in \( O(N\log^2N) \) moves.

Proof: We first solve the first two pieces of row 0 (either directly or via 3-cycles); this takes \( O(N) \) moves. If the board now has odd parity, apply a move in either the last column (if \( R \) is even) or in the last row (if \( C \) is even) to bring it to an even-parity state, still with the first two pieces solved.

Per the previous propositions, the inverse of the board state (as a permutation) can be decomposed into \( O(\log^2N) \) uniform-displacement involutions. For each of these involutions \( \pi_i \), let \( \pi_i':=\pi_i \) if it is an even permutation and \( \pi_i':=\pi_i\circ(a,a+d) \) for some \( (a,a+d)\prec\pi_i \) if it is an odd permutation. Then \( \pi_i' \) is always an even permutation. In the former case, we directly invoke Lemma 9 to get a move sequence of length \( O(N) \); in the latter case, we first apply the transpositions \( (a,a+d)(0,1) \), which takes \( O(R+C)\subseteq O(N) \) moves, and then invoke Lemma 9 on \( \pi_i' \) to get a move sequence of total length \( O(N) \). In either case, the length is bounded by \( O(N) \) for each uniform-displacement involution. Since the board state at the beginning of this step is even and the moves applied for each uniform-displacement involution have zero net effect on the parity, it follows that the first two elements will be swapped an even number of times in total, and hence the board will be fully solved after applying all of the involutions in the decomposition.

This leads to a total move count of \( O(N)+O(\log^2N)\cdot O(N)=O(N\log^2N) \). □

Edit (2019-09-15): Some typos and minor mistakes fixed. The constant factors might still be wrong but good thing those aren't the main point of this post. The code has also been updated to fix an inefficiency on non-square boards (which made it not \( O(N\log^2N) \) for extreme aspect ratios), as well as a silly bug in generating random states where it doesn't check for parity on odd-sized boards.


Feb 4, 2019
oh cool, i used to own the loopover discord server and own/helped a lot with the research sheet you linked. im not remotely smart enough to understand what's happening but its cool seeing others working on GN for loopover. ill add this to the research sheet


Dec 24, 2015
tl;dr STM GN for 5×5 is at least 22.

Precise counts of "canonical" 5×5 move sequences in STM (where "canonical" means that a consecutive bunch of row moves are done starting from the top row to the bottom, no redundant moves are made, and the shorter path is always taken; likewise for column moves):

We divide move sequences into syllables and count them via generating functions. For a single syllable corresponding to row/horizontal moves, we have \( 1+2x+2x^2 \) for how an individual row can move, and thus \( (1+2x+2x^2)^5 \) for how the five rows can move. One of those corresponds to a trivial move sequence (move all of the rows by 0), so subtract one to exclude that: \( s(x):=(1+2x+2x^2)^5-1 \). By symmetry, we get the same expression for column/vertical moves.

The trivial move sequence consists of no syllables: 1. Any non-trivial canonical move sequence (uniquely) decomposes into a finite sequence of nontrivial syllables of alternating directions, i.e. it must be either H-V-H-V-H-etc. or V-H-V-H-V-etc. This is just a standard geometric progression and we use the standard formula for that: \( \frac{2s(x)}{1-s(x)} \). Add these two cases together and we get the generating function \( 1+\frac{2s(x)}{1-s(x)} \) for the number of canonical move sequences. Expand this to 24 terms or so:

\( 1 + 20 x + 300 x^2 + 4320 x^3 + 62120 x^4 + 893584 x^5 + 12854320 x^6 + 184910080 x^7 + 2659940480 x^8 \\
+ 38263376000 x^9 + 550420568192 x^{10} + 7917827258880 x^{11} + 113898339051520 x^{12} + 1638433274027520 x^{13} \\
+ 23568944163699200 x^{14} + 339040434418122752 x^{15} + 4877113517349276160 x^{16} + 70157520597607214080 x^{17} \\
+ 1009219424336639902720 x^{18} + 14517671630674993315840 x^{19} + 208837428703515031881728 x^{20} \\
+ 3004136802167631885004800 x^{21} + 43214657363697298461511680 x^{22} + 621644996231283351323279360 x^{23} + O(x^{24}) \)

We note that the 5×5 board has \( 25!/2 \approx 7.8\cdot10^{24} \) states, and up to 21 moves, there are less than \( 4\cdot10^{24} \) possible move sequences. Therefore GN for 5×5 in STM must be at least 22 moves. This improves on the Manhattan-distance bound of 20 moves.
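The expansion above is easy to reproduce with truncated power-series arithmetic; here's a sketch (class and method names mine):

```java
import java.math.BigInteger;
import java.util.Arrays;

public class CanonicalCounts {
    static final int D = 24; // work with series truncated at x^D

    static BigInteger[] mul(BigInteger[] a, BigInteger[] b) {
        BigInteger[] c = new BigInteger[D];
        Arrays.fill(c, BigInteger.ZERO);
        for (int i = 0; i < D; i++)
            for (int j = 0; i + j < D; j++)
                c[i + j] = c[i + j].add(a[i].multiply(b[j]));
        return c;
    }

    // Coefficients of 1 + 2s/(1-s), where s = (1+2x+2x^2)^5 - 1.
    static BigInteger[] coeffs() {
        BigInteger[] row = new BigInteger[D];
        Arrays.fill(row, BigInteger.ZERO);
        row[0] = BigInteger.valueOf(1);
        row[1] = BigInteger.valueOf(2);
        row[2] = BigInteger.valueOf(2);
        BigInteger[] s = new BigInteger[D];
        Arrays.fill(s, BigInteger.ZERO);
        s[0] = BigInteger.ONE;
        for (int k = 0; k < 5; k++) s = mul(s, row); // (1+2x+2x^2)^5
        s[0] = BigInteger.ZERO;                      // subtract 1
        // 2s/(1-s) = 2(s + s^2 + ...); s^k has no terms of degree < k
        BigInteger[] f = new BigInteger[D];
        Arrays.fill(f, BigInteger.ZERO);
        BigInteger[] pow = s.clone();
        for (int k = 1; k < D; k++) {
            for (int i = 0; i < D; i++) f[i] = f[i].add(pow[i]);
            pow = mul(pow, s);
        }
        for (int i = 0; i < D; i++) f[i] = f[i].add(f[i]); // times 2
        f[0] = BigInteger.ONE;
        return f;
    }

    public static void main(String[] args) {
        BigInteger[] f = coeffs();
        BigInteger seqs = BigInteger.ZERO;  // canonical sequences of length <= 21
        for (int i = 0; i <= 21; i++) seqs = seqs.add(f[i]);
        BigInteger states = BigInteger.ONE; // 25!/2
        for (int i = 2; i <= 25; i++) states = states.multiply(BigInteger.valueOf(i));
        states = states.divide(BigInteger.valueOf(2));
        System.out.println(f[5]);                       // 893584
        System.out.println(seqs.compareTo(states) < 0); // true, so GN >= 22
    }
}
```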

Bonus: This also lets us approximate the branching ratio (relevant when doing e.g. brute force or IDA* search); taking the two largest terms printed above and dividing, we get a value of around 14.38.

Bonus #2: There is a different notion of canonicalisation where we defer "global" shifts to the last two syllables (if there are at least two syllables); "global shift" meaning something like shifting every row by the same amount, which can be cancelled by a later syllable shifting every row by the same amount in the opposite direction. This reduces the effective move count a bit compared to the naïve canonicalisation, and is roughly akin to excluding stuff like L2 R2 U2 D2 L R' on the Rubik's cube. The calculation required is considerably more complicated and (i) I don't feel like doing it; (ii) it probably won't affect the above calculation much.

Bonus #3: It's probably possible to increase the bound to 24-25 by just trying to solve a few random boards; we don't even need to let the solves run to completion. Gonna try this later.
(putting an update in this post because it seems too spammy to have a separate post for each minor result)

STM GN for 4×4 is at most 24.

We divide the board into its top half and its bottom half, and consider the cases where the pieces in the top half require 12 or 13 moves to solve, and likewise for the pieces in the bottom half. Naïvely this might seem to give (768545+284)^2 = 591,098,031,241 cases, which is a very large number, but most combinations are impossible, in the sense that there would be duplicate pieces (e.g. two "A"s on the board) and missing pieces (e.g. no "B" on the board). The number of possible combinations is only 953,528,011, which is still large, but more manageable.

Among those 954 million positions, 951,485,926 (~99.79%) have a 2×4 block that can be solved in at most 11 moves, in one of these four positions, on normal or on inverse:
A B C D
E F G H
. . . .
. . . .

. . . .
. . . .
I J K L
M N O P

A B . .
E F . .
I J . .
M N . .

. . C D
. . G H
. . K L
. . O P
By solving this 2×4 block first, then solving the rest, we obtain an upper bound of 24 moves for these positions. For the remaining 2042085 positions, an optimal solver was used, and all of them can be solved in at most 18 moves. One example of an 18-mover has the optimal solution (in a sorta SiGN-like notation) 1U 2U 1L 1U 1L 4L2 2U 2L 2U 4L 1U' 2L 3U 4L 1U2 1L'. (Interestingly, this has a lot of symmetry; it's equivalent to just transposing the solved board.)

Either there is a 2×4 block (on normal or on inverse) that can be solved in at most 11 moves, in which case the entire board can be solved in at most 11+13=24 moves, or all eight of the 2×4 blocks looked at require at least 12 moves and the entire board can be solved in at most 18 moves; therefore GN is at most 24.

I had to optimise the performance of my solver a lot for this approach to be feasible. As before, the "main" heuristic used is still the 8-piece pattern database for solving ABCDEFGH / IJKLMNOP, but now we also transpose the board (this can be done pretty quickly with SWAR magic) and apply the pattern database to check ABEFIJMN and CDGHKLOP as well. If all four values returned are the same (which is rarely the case on random states, but not-so-rarely the case for the specific states we're looking at), then we can bump up the heuristic by one.

The SWAR magic:
static long transposeSymmetry(long state) {
	// transpose the 4x4 grid of 4-bit cells
	state = (state & 0xff00_ff00_00ff_00ffL) | ((state & 0x00ff_00ff_0000_0000L) >>> 24) | ((state & 0x0000_0000_ff00_ff00L) << 24);
	state = (state & 0xf0f0_0f0f_f0f0_0f0fL) | ((state & 0x0f0f_0000_0f0f_0000L) >>> 12) | ((state & 0x0000_f0f0_0000_f0f0L) << 12);
	// then relabel the pieces (swap the row and column bits of each label)
	state = ((state & 0x3333_3333_3333_3333L) << 2) | ((state & 0xcccc_cccc_cccc_ccccL) >>> 2);
	return state;
}
In addition, I also added the walking distance (WD) heuristic. This provides a very tiny speedup (like a few percent), but it doesn't seem to be worse and I already spent the time coding it, so I might as well just leave it in. By itself, the WD heuristic also allows solving random states reasonably quickly (a few positions per second) without large lookup tables; this is the distribution:

distance 0: 1 double cosets / 1 cosets
distance 1: 2 double cosets / 512 cosets
distance 2: 46 double cosets / 72544 cosets
distance 3: 540 double cosets / 3322528 cosets
distance 4: 2781 double cosets / 24128154 cosets
distance 5: 4350 double cosets / 27447240 cosets
distance 6: 1886 double cosets / 7456910 cosets
distance 7: 492 double cosets / 625312 cosets
distance 8: 49 double cosets / 9799 cosets
total number of double cosets: 10147
average (for one axis): 4.647083646512218
average (for both axes): 9.294167293024437

While the average value of the WD heuristic is smaller than the 8-piece PDB (9.29 versus >9.7), it also has a larger maximum (16 versus 13). Since we're already computing the transposed board for the 8-piece PDB, we reuse that instead of transposing again. (Note that walking distance is a uniform improvement over Manhattan distance, in the sense that WD ≥ MD for any board, so it makes no sense to compute MD once we have WD. Furthermore, viewing WD as distance in the double-coset space (with the subgroups on both sides being generated by arbitrary row permutations), it turns out that the WD of a board is always the same as the WD of its inverse, so there's no need to explicitly compute that either (like we do for the 8-piece PDBs).)

The last optimisation I did was the very obvious thing of making use of the permutation parity of the board to exclude known-fruitless searches. E.g. if the board has even parity, we can skip searches of odd-length solutions. This provides something like a 3× speedup. (This parity constraint only works if the width and height are both even, so e.g. it wouldn't work on 3×4 or 5×5.) All in all, my current optimal solver manages around 240-250 random positions per second, compared to 27-ish before yesterday.
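The parity check itself is cheap; here's a sketch (class and method names mine) of computing a board's permutation parity via cycle counting:

```java
public class Parity {
    // Parity of a permutation: (number of elements) - (number of cycles),
    // mod 2. On even-by-even boards every single-cell move is a cycle of
    // even length (an odd permutation), so an even-parity board can only
    // be solved by even-length move sequences and odd depths can be skipped.
    static int parity(int[] perm) {
        boolean[] seen = new boolean[perm.length];
        int par = 0;
        for (int i = 0; i < perm.length; i++)
            for (int j = i; !seen[j]; j = perm[j]) {
                seen[j] = true;
                if (j != i) par ^= 1; // one flip per non-leading cycle element
            }
        return par;
    }

    public static void main(String[] args) {
        System.out.println(parity(new int[]{0, 1, 2, 3})); // 0 (identity)
        System.out.println(parity(new int[]{1, 2, 3, 0})); // 1 (a 4-cycle)
    }
}
```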

The main motivation for doing this was really just to check the effectiveness of the WD heuristic, to prepare for writing a 5×5 optimal solver. While it's not especially useful on 4×4 (thanks to the 8-piece PDB being at just about the right size to fit in free RAM), it seems that n-piece PDBs would have to be really big on 5×5 in order to be effective. Unlike 15-puzzle/24-puzzle, it's not clear how to create useful additive pattern databases on Loopover because every move has "nonlocal" effects. Stuff like linear conflict or invert distance that apply for 15-puzzle are basically useless on Loopover.

Ben Whitmore

Sep 28, 2019
tl;dr STM GN for 5×5 is at least 22.
Applying the same method to other n gives the following bounds:

3 - 6
4 - 14
5 - 22
6 - 35
7 - 48
8 - 66
9 - 86
10 - 109
11 - 134
12 - 163
13 - 194
14 - 229
15 - 265
16 - 306
17 - 348
18 - 394
19 - 442
20 - 495
21 - 548
22 - 606
23 - 666
24 - 730
25 - 796
26 - 866
27 - 937
28 - 1013
29 - 1091
30 - 1173
31 - 1256
32 - 1344
33 - 1434
34 - 1528
35 - 1623
36 - 1724
37 - 1825
38 - 1931
39 - 2039
40 - 2151
41 - 2265
42 - 2383
43 - 2503
44 - 2628
45 - 2753
46 - 2884
47 - 3016
48 - 3153
49 - 3291
50 - 3434

The branching factor is 1/x where x is the positive real root of \( (1+2x+2x^2+\cdots+2x^{(n-1)/2})^n=2 \) for odd n and \( (1+2x+2x^2+\cdots+2x^{n/2-1}+x^{n/2})^n=2 \) for even n. For n=5, this is \( \displaystyle{\frac{2}{\sqrt{2^{6/5}-1}-1}\approx 14.3850497528\cdots} \)
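The n=5 value can be double-checked numerically; a sketch (names mine) comparing bisection on the defining polynomial against the closed form:

```java
public class BranchingRatio {
    // (1 + 2x + 2x^2)^5 - 2: its positive real root x gives the branching
    // factor 1/x for n = 5 (the odd-n case of the formula above).
    static double f(double x) {
        double base = 1 + 2 * x + 2 * x * x;
        return Math.pow(base, 5) - 2;
    }

    static double positiveRoot() {
        double lo = 0, hi = 1; // f(0) = -1 < 0, f(1) = 5^5 - 2 > 0
        for (int i = 0; i < 80; i++) {
            double mid = (lo + hi) / 2;
            if (f(mid) < 0) lo = mid; else hi = mid;
        }
        return (lo + hi) / 2;
    }

    public static void main(String[] args) {
        double numeric = 1 / positiveRoot();
        // closed form: solve 1 + 2x + 2x^2 = 2^{1/5} as a quadratic in x
        double closed = 2 / (Math.sqrt(Math.pow(2, 6.0 / 5) - 1) - 1);
        System.out.println(numeric); // ~14.385
        System.out.println(closed);  // ~14.385
    }
}
```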


Nov 21, 2019
4x4 STM GD is at most 23

The solved 4x4 board will look like this:
0 1 2 3
4 5 6 7
8 9 10 11
12 13 14 15
1. Enumerate all ways of building the 2x3 block whose upper-left corner is (0, 0) (i.e. the pieces 0, 1, 2, 4, 5, and 6) and store the BFS tree.
this gives GD 11
0: 1
1: 10
2: 89
3: 786
4: 6292
5: 44605
6: 267634
7: 1163031
8: 2605100
9: 1586474
10: 91700
11: 38
2. Improve the resulting distribution by checking whether any permutation at depth D can be avoided by solving a translated 2x3 block (e.g. the pieces {1, 2, 3, 5, 6, 7} or {4, 5, 6, 8, 9, 10}) that is guaranteed to be solvable at depth <D.
This makes a very small improvement.
8: 2604856
9: 1572528
10: 58932
11: 0
no permutations with depth <8 are improved
3. Enumerate all ways of building the last two rows and the last column and store the BFS tree.
This gives GD 18. (It's not the optimal solver xyzzy mentioned; I don't know how to program it. This solver only moves the last two rows and the last column, so it does not allow temporarily destroying the solved 2x3.)
0: 1
1: 6
2: 23
3: 84
4: 301
5: 1056
6: 3622
7: 11838
8: 36287
9: 101888
10: 251901
11: 535540
12: 929991
13: 1040929
14: 584196
15: 122857
16: 8076
17: 202
18: 2
(Steps 1 and 3 take 36.042 seconds; Step 2 takes 408.836 seconds.)
4. Create every permutation that will be solved in 24 moves or more when applying these BFS trees to it directly (i.e. solving pieces {0, 1, 2, 4, 5, 6}, then all other pieces).
Try solving each permutation by applying the BFS trees, but translated (e.g. first solving {1, 2, 3, 5, 6, 7}, then the rest; or first solving {4, 5, 6, 8, 9, 10}, then the rest).

There are 270,178,609,964 such permutations. Running them took 359547.840 seconds (about 4 days and 4 hours) on an Intel i5 and 16 GB RAM.
To make everything run fast, I compressed certain data into 64-bit ints and compressed specific sequences of moves into single permutation operations.
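The move-compression idea can be illustrated with a minimal sketch (hypothetical code, not the author's actual implementation; `apply`, `compose`, and the 4x4 row-move tables are my own names):

```python
# A 4x4 board is a list of 16 tiles; a move is a permutation p such that
# applying it puts the tile formerly at cell p[i] into cell i.
def apply(board, p):
    return [board[i] for i in p]

def compose(p, q):
    """Single permutation equivalent to applying p first, then q."""
    return [p[i] for i in q]

# Shifting row 0 right by one: cell (0, c) receives the tile from (0, (c-1) % 4).
row0_right = [3, 0, 1, 2] + list(range(4, 16))
row0_left = [1, 2, 3, 0] + list(range(4, 16))

# A whole sequence of moves collapses into one permutation operation:
seq = compose(row0_right, row0_left)  # right then left = identity
assert seq == list(range(16))
```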

IMPORTANT UPDATE: the stuff crossed out below is incorrect; it assumes that the pieces in the 5x5 that are not in the 3x4 region are in specific positions and thus its upper bound might be more optimistic than it should be.
Update: 5x5 STM GD is at most 53
From earlier work on this Google Sheets page, the GDs of making a 2x3, expanding to a 3x4, and expanding that to the entire board are 13, 17, and 25 respectively.
0: 1
1: 10
2: 94
3: 880
4: 7558
5: 58432
6: 402862
7: 2335840
8: 10374813
9: 30419907
10: 47958515
11: 30962779
12: 4936603
13: 53706
0: 1
1: 4
2: 20
3: 102
4: 482
5: 2222
6: 9748
7: 39836
8: 150138
9: 511391
10: 1506067
11: 3451772
12: 5352468
13: 5069035
14: 2687355
15: 693629
16: 60221
17: 549
I applied the same improvement algorithm as above, this time to making a 2x3 and expanding it to a 3x4, reducing the combined GD of those two phases to 28 moves.
Total of 5973908667 permutations; took 8508.611 seconds to run.
I decided not to do the single BFS improvement like I did in step 2 of the above procedure because I didn't think it would give any significant improvement, and skipping it means one less algorithm for others to verify (unless someone has already checked my code).
Also, I had to abandon using an int[][][] table to reduce memory usage (I'm not writing data to any files, so everything has to fit in Java heap space).
I might try reducing it to 27 moves in the future.

UPDATE: So far this algorithm cannot further reduce the 4x4 STM GD; I will add new transformations (e.g. rotations, reflections) or try something completely different in the future.


Nov 21, 2019
5x5 STM GD is indeed at most 53
I improved the 0x0->2x3 and the 2x3->3x4 steps differently: instead of solving translated regions of each scramble (which requires testing over all permutations of where the non-3x4 pieces can go in a 5x5), I added 1 prefix move before running the block-building to lower the solution length, so that I only have to consider pieces in the 3x4 region. A total of 5,973,908,667 scrambles took 22581.416 seconds (including the BFS tree building time) to process. See https://github.com/coolcomputery/Loopover-Brute-Force-and-Improvement for more detail.

UPDATE: 6x6 STM GD is at most 111 (previously 114)
Naively the GD of these two phases is 30. Using the same improvement algorithm as above for the 5x5 brings this down to 27 moves (133,268,049 scrambles processed in 240.018 seconds, including BFS tree building time). So far I can't improve this further.
(2020-3-14) The GD of 4x5->5x5 and 5x5->6x6 was reduced from 43 to 40. 161364 scrambles were processed in 1.716 seconds, along with 3.365 seconds of preprocessing. The BFS trees took 84.170 seconds to build. So far I can't improve this further.
(2020-3-15) UPDATE: 6x6 STM GD is at most 103.
The GD of building 3x3->4x4 is 21. 4,475,671,200 scrambles took 12029.851 seconds to calculate.


Nov 21, 2019
4x4 STM GD is at most 22
I added row reflection, column reflection, and transposition (flipping along the main diagonal) to my previous algorithm on improving the 0x0->2x3 and 2x3->4x4 block-building.
A total of 1,647,318,090,782 scrambles that required >=23 moves under the naive block-building took about 17 days to process (I had to interrupt the search twice to restart my computer, but I made sure it printed to stdout how far it was in the search so that I could resume it). Adding the transposition was very important, as previous search attempts without it failed within the first hour.
See https://github.com/coolcomputery/Loopover-Brute-Force-and-Improvement for more detail. The version of the algorithm I used for this search should be a much faster version than the one I used for proving that 4x4 STM GD is at most 23.


Dec 24, 2015
GN for \( 2\times n \) in STM is at most \( n(\lfloor\log_2n\rfloor+20)-12\lfloor\log_2n\rfloor \).

Algorithm for 2×n implemented in Python, together with an analysis of the algorithm in the embedded comments.

(Reminder: I'm using "matrix notation" for dimensions here, so this means two rows and n columns; a width of n cells and a height of 2.)

The gist of the algorithm is to shift all the pieces that belong to the left half of the board to the top row first (this takes \( O(n) \) moves), then move the right half of the top row into the left half of the bottom row (also \( O(n) \) moves); repeat this on the left and on the right sub-boards and eventually it will be almost solved. Sorting within a column takes at most 1 move, so finishing the solve takes \( O(n) \) additional moves. This gives a bound of \( O(n\log n) \).
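The resulting move count satisfies the standard divide-and-conquer recurrence (a sketch; \( c \) absorbs the \( O(n) \) per-level work described above):

```latex
T(n) \le 2\,T(n/2) + c\,n, \qquad T(1) = O(1)
\quad\Longrightarrow\quad T(n) = O(n \log n).
```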

The top row is fixed throughout, and only the bottom row is used for moving pieces between different columns; by making use of the same bottom-row movement for the different sub-boards simultaneously, the constant factor in the \( O(n\log n) \) bound can be made as small as \( 1/\log2 \).

More generally, for any fixed \( R \), GN for \( R\times n \) Loopover is \( O(n\log n) \). The hidden constant here depends on \( R \), so this is not a uniform improvement over my earlier result of \( RC\log^2(RC) \) with a hidden constant independent of \( R,C \). I haven't analysed this as carefully as the \( 2\times n \) case, so it might be incorrect. The algorithm follows the same idea:

Bring \( n \) of the pieces that belong in the left half to the top row, then transfer the right half of the top row to the left half of the second row.
Bring \( n \) of the pieces not in the already-done section that belong in the left half to the third row, then transfer the right half of the third row to the left half of the fourth row.
Do the same thing with the 5th/6th rows, 7th/8th rows, etc. until the left half has only pieces that belong in the left half. (I think this takes \( O(R^3n) \) moves in total. Also, the last row needs special handling if \( R \) is odd.)
If \( R \) is odd, make sure the left half and the right half both have even parity. (This takes at most 7 moves.)
Recurse into the subboards until the board is sorted into columns.
Make use of Shell sort within the columns to decompose the resulting permutation into \( O(\log^2R) \) uniform-displacement involutions and finish the solve in \( O(Rn\log^2R) \) more moves.


The above construction for \( 2\times n \) can be adapted to apply any even permutation to a \( 2\times n \) section of a board of any size, still in \( O(n\log n) \) moves (although the hidden constant will necessarily be a lot bigger). The key adjustment is to replace any part of the algorithm that needs a column move (a 2-cycle) with a 3-cycle of the same effect.

For example, to insert a left-belonging piece from the bottom row to the top row, we split into these two cases:
If the piece in the top row one column to the right is also a left-belonging piece, do an anticlockwise cycle with the two pieces in the current column and that piece.
Otherwise, the piece in the top row one column to the right is a right-belonging piece; do a clockwise cycle instead.
(Plus two more cases if we're at the last column, where we do the mirror.)

This leads to only a constant-factor increase in the number of moves at worst.

An additional step we need to make in the algorithm is to ensure that the parities of the left and the right halves are both even. This can be done by swapping the two pieces in the rightmost column of the left half and the two pieces in the leftmost column of the right half, which takes 8 moves. This leads to \( O(n) \) additional moves.

(We're now restricted to doing only even permutations because the net movement of the rows and columns is forced to be zero.)

I initially thought this could be used as a building block for an \( O(RC\log^2\min(R,C)) \) algorithm but I made a mistake somewhere and I can't repair the argument. My idea was to use a variant of odd-even merge sort or bitonic sort to sort the board by rows, but it runs into the problem that rows that are far apart might need \( \Theta(RC) \) setup moves to bring them together first. (Relevant reading: this wall of text I wrote for DS&A homework that I'm pretty sure the TA never read…)


On another note, new lower bound for solving LSLL (last side last layer, i.e. last row + last column), moving only the last row and the last column. This is isomorphic to rotational graph puzzles of type (1+, 1, 1+) on Jaap's Puzzle Page.

(In group theory terms, this is a bound on the word metric on \( \langle(1,2,\cdots,m),(m,m+1,\cdots,m+n-1)\rangle \). It's possible to phrase everything about GN as a word metric problem, but the general case is messy to describe and we gain nothing. We also gain nothing here (lol) but at least it's easy to describe. Note that group theory is often more concerned with the word metric of finitely-generated infinite groups, where the choice of generators doesn't matter much, while this post is more concerned with asymptotic bounds on the word metrics of a family of finite groups wrt specific generating sets.)

First, the counting argument bound. There are at most four possible choices of move to make (up, down, left, right) and either \( (R+C-1)! \) or \( (R+C-1)!/2 \) states, so this gives a lower bound of \( \log_4((R+C-1)!/2)=\Omega((R+C)\log(R+C))\equiv\Omega(\max(R,C)\log\max(R,C)) \). The constant factor can be tightened by counting canonical move sequences more carefully (see e.g. posts #6 and #8 in this thread).
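For concreteness, the counting bound can be evaluated directly (a sketch; the function name is mine, and the computation is just \( \lceil\log_4((R+C-1)!/2)\rceil \)):

```python
import math

def lsll_counting_bound(R, C):
    """Lower bound ceil(log_4((R+C-1)!/2)) on 2-gen LSLL god's number,
    using the pessimistic state count (R+C-1)!/2."""
    states = math.factorial(R + C - 1) // 2
    return math.ceil(math.log(states, 4))

print(lsll_counting_bound(4, 4))  # 6
print(lsll_counting_bound(5, 5))  # 9
```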

The new result: GN for 2-gen LSLL is \( \Omega\left(\max\left(\frac{R^2}C,\frac{C^2}R\right)\right) \) as \( R+C\to\infty \). Note that this bound is worse than the counting argument bound if \( R=\Theta(C) \) (i.e. the aspect ratio is bounded), but is better than the counting argument bound if we fix \( R \) and let \( C \) increase.

The first thing we need to do is to change perspective: instead of moving the bottom row, we imagine that we keep the bottom row fixed and move all the other rows instead. If this:
. . . . . a
. . . . . b
. . . . . c
d e f g h i
is a solved state, then so are all of these:
. . . . . a
. . . . . b
. . . . . c
d e f g h i

. . . . a .
. . . . b .
. . . . c .
e f g h i d

. . . a . .
. . . b . .
. . . c . .
f g h i d e

. . a . . .
. . b . . .
. . c . . .
g h i d e f

. a . . . .
. b . . . .
. c . . . .
h i d e f g

a . . . . .
b . . . . .
c . . . . .
i d e f g h
(This is only a matter of changing perspective to make analysis easier; on the actual Loopover board, we're still making moves on only the bottom row while leaving the other rows untouched.)

Now consider the sum of horizontal distances (SOHD), keeping the left-right wrapping in mind and ignoring all pieces that don't belong to LSLL. Given any LSLL state, we can compute its SOHD from any one of the solved states, and that will give a lower bound on the number of moves needed to bring it to that solved state. To get a correct lower bound for the underlying Loopover board, we need to use the minimum of the SOHD over all solved states. As an example, consider the following LSLL state:
. . . . . i
. . . . . c
. . . . . e
a b d f g h
The SOHD to the six solved states (listed above) are 10, 17, 20, 17, 10 and 7 respectively, so the minimal SOHD to solved is 7. (I calculated this by hand; it might not be correct, but you get the idea.)

Each horizontal move affects the horizontal distance of \( R-1 \) pieces by at most 1 each, so the SOHD changes by at most \( R-1 \) each move. This gives a lower bound of \( \lceil\text{minimal SOHD}/(R-1)\rceil \) horizontal moves needed. In the above example, this is \( \lceil7/3\rceil=3 \) moves.
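The SOHD table for this example is easy to reproduce mechanically (a sketch; the encoding of the state and all names are mine):

```python
R, C = 4, 6  # dimensions of the example board above

def hdist(a, b):
    """Cyclic horizontal distance between columns a and b."""
    d = abs(a - b) % C
    return min(d, C - d)

# The scrambled LSLL state from above: piece -> current column.
current = {'i': 5, 'c': 5, 'e': 5,                           # last column
           'a': 0, 'b': 1, 'd': 2, 'f': 3, 'g': 4, 'h': 5}   # bottom row

def solved_columns(k):
    """Piece -> column in the k-th solved state (bottom row rotated left by k)."""
    cols = {p: 5 - k for p in 'abc'}  # the column pieces move together
    row = 'defghi'
    for j in range(C):                # cell j of the bottom row holds row[(j+k) % C]
        cols[row[(j + k) % C]] = j
    return cols

sohds = [sum(hdist(current[p], solved_columns(k)[p]) for p in current)
         for k in range(C)]
print(sohds)       # [10, 17, 20, 17, 10, 7]
print(min(sohds))  # minimal SOHD: 7
```

This confirms the hand calculation: the minimum of 7, divided by \( R-1=3 \) and rounded up, gives at least 3 horizontal moves.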

How large can the minimal SOHD be? Consider the reversal permutation on the bottom row, with the other pieces in the column solved. (This might not be a solvable state due to parity if both \( R \) and \( C \) are odd.) The SOHD for the row's pieces (i.e. ignoring the column) to any of the solved states is:
  • \( \frac{(C-1)(C+1)}4 \) if \( C \) is odd. (The horizontal distances will be some arrangement of 0 and two copies of \( 1,\cdots,(C-1)/2 \). Proof: exercise.)
  • \( \frac{C^2}4 \) if \( C\equiv0\pmod4 \).
  • \( \frac{(C-2)(C+2)}4 \) if \( C\equiv2\pmod4 \).
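These three cases can be spot-checked numerically for small \( C \) (a sketch; I take the minimum over the \( C \) rotated solved states, which is the quantity the lower bound actually uses, and all names are mine):

```python
def min_reversal_sohd(C):
    """Minimal SOHD of the bottom-row reversal over all C rotated solved states."""
    def hdist(a, b):
        d = abs(a - b) % C
        return min(d, C - d)
    best = None
    for k in range(C):
        # The piece at position j belongs at position (C - 1 - j + k) % C.
        s = sum(hdist(j, (C - 1 - j + k) % C) for j in range(C))
        best = s if best is None else min(best, s)
    return best

def formula(C):
    if C % 2 == 1:
        return (C - 1) * (C + 1) // 4
    if C % 4 == 0:
        return C * C // 4
    return (C - 2) * (C + 2) // 4

for C in range(3, 13):
    assert min_reversal_sohd(C) == formula(C)
```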

When \( C=2 \), then \( C^2/R\to0 \) as \( R+C\to\infty \), so it is "trivially" true that we have a lower bound of \( \Omega(C^2/R) \).

Otherwise, when \( C\ge3 \), the SOHD for the row's pieces is \( \Theta(C^2) \). The reversal permutation may not lead to a solvable state, but swapping any two adjacent pieces to fix parity changes the SOHD by at most 2, which is still \( \Theta(C^2) \). (Exercise: As written, this is not an airtight argument. I'm lazy, but maybe you're not; work out the details.) Since \( R=\Theta(R-1) \), we have a lower bound of \( \Omega(C^2/R) \) here as well.

By transposing the above argument, we get a lower bound of \( \Omega(R^2/C) \); these can be combined to get \( \Omega\left(\max\left(\frac{R^2}C,\frac{C^2}R\right)\right) \).

An interesting consequence of this lower bound is that when solving LSLL on a \( 2\times n \) board, restricting yourself to not use the other columns means you have to do \( \Omega(n^2) \) moves in the worst case, while using the other columns lets you solve it in \( O(n\log n) \) moves.


Dec 24, 2015
First explicit example of a 5×5 state that requires 22 moves to solve:

(as a zero-indexed list)
18, 19, 15, 16, 17,
23, 12, 20, 21, 22,
3,  4,  24, 1,  2,
8,  9,  5,  6,  7,
13, 14, 10, 11, 0,
Optimal solution (SiGN-like notation): 2U2' 3U' 4U' 5U' 1L' 2L' 3L' 4L' 5L2' 5U' 4L' 3U' 4U' 3L' 2U' 1L' 2L' 1U' 5L' 4U' (22 STM)

It was already known that 5×5 STM GN ≥ 22, but only via "nonconstructive" counting arguments. Finite counting arguments are constructive only in a weak sense: they tell you there are some 22-move states out there, so you could find them via brute force in principle, but they don't tell you how to find them quickly.

This was found by using a slightly enhanced version of the Manhattan distance heuristic. With the usual MD heuristic, we compute the sum of each tile's MD, divide by 5 (since that's the maximum change to the sum in a single move), and finally round up to the nearest integer. We can squeeze more precision out of this approach by upgrading to the walking distance heuristic (see spoilerbox in post #6 and the link within), but WD seemed a bit troublesome to implement performantly. After some thinking, I came up with this compromise, which I'll call the enhanced MD (EMD) heuristic.

In EMD, we don't immediately sum up each tile's distance; instead, we compute the number of tiles that are in the correct column, the number that need to move one cell left, the number that need to move two cells left, the number that need to move three cells left (= two cells right), and the number that need to move four cells left (= one cell right). This gives a 5-tuple \( (x_0,x_1,x_2,x_3,x_4) \) adding up to 25 (since there are 25 tiles), with the additional constraint that the weighted sum \( \sum_{i=0}^4ix_i \) is divisible by 5 (since each move has zero change to this sum mod 5). There are 4751 such 5-tuples, and we can do a breadth-first search among these 5-tuples.

total states: 4751
distance 0: 1 nodes / 1 weighted
distance 1: 2 nodes / 106260 weighted
distance 2: 17 nodes / 1046003800 weighted
distance 3: 101 nodes / 1092278370160 weighted
distance 4: 385 nodes / 260012086280560 weighted
distance 5: 715 nodes / 6047479429371242 weighted
distance 6: 1069 nodes / 25764503650691770 weighted
distance 7: 1083 nodes / 23320404112024440 weighted
distance 8: 804 nodes / 4110971424442750 weighted
distance 9: 440 nodes / 100044933950310 weighted
distance 10: 130 nodes / 135814034332 weighted
distance 11: 4 nodes / 115000 weighted
average distance (weighted): 6.423998
It is a priori not obvious whether every one of these 5-tuples is actually possible on a real Loopover board. I'm guessing that's the case, but I have no proof. It's also not relevant for our current purposes: if there's an "impossible" tuple, that would make EMD weaker, which retains admissibility.

The weighting here is approximate, with each 5-tuple \( (a,b,c,d,e) \) being weighted by \( \frac{25!}{a!b!c!d!e!} \). This is definitely not the exact weight (which is hard to compute), but it's probably close enough. I'll try a Monte Carlo approximation later on to see how accurate this approximation is.
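The tuple BFS itself is easy to reproduce (a sketch; I model a move as in the description above — five tiles each shift their residue class by the same ±1 mod 5 — and all names are mine):

```python
from collections import deque

def all_tuples():
    """All (x0,...,x4) with sum 25 and weighted sum divisible by 5."""
    out = []
    for x0 in range(26):
        for x1 in range(26 - x0):
            for x2 in range(26 - x0 - x1):
                for x3 in range(26 - x0 - x1 - x2):
                    x4 = 25 - x0 - x1 - x2 - x3
                    if (x1 + 2 * x2 + 3 * x3 + 4 * x4) % 5 == 0:
                        out.append((x0, x1, x2, x3, x4))
    return out

def compositions(bounds, total, prefix=()):
    """All tuples with 0 <= a[i] <= bounds[i] and sum(a) == total."""
    i = len(prefix)
    if i == len(bounds) - 1:
        if total <= bounds[i]:
            yield prefix + (total,)
        return
    for a in range(min(bounds[i], total) + 1):
        yield from compositions(bounds, total - a, prefix + (a,))

def neighbours(x):
    """One move shifts 5 tiles, each from class i to class (i+d) mod 5."""
    for a in compositions(x, 5):
        for d in (1, -1):
            y = list(x)
            for i, ai in enumerate(a):
                y[i] -= ai
                y[(i + d) % 5] += ai
            yield tuple(y)

start = (25, 0, 0, 0, 0)  # solved: every tile already in its column
dist = {start: 0}
q = deque([start])
while q:
    x = q.popleft()
    for y in neighbours(x):
        if y not in dist:
            dist[y] = dist[x] + 1
            q.append(y)

assert set(dist) == set(all_tuples())  # the BFS reaches every valid 5-tuple
hist = [sum(1 for d in dist.values() if d == k)
        for k in range(max(dist.values()) + 1)]
print(len(dist), hist)
```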

Edit (2020-05-23): Actual distribution, from 10^8 samples:
total samples: 100000000
distance 0: 0 samples
distance 1: 0 samples
distance 2: 7 samples
distance 3: 2662 samples
distance 4: 503231 samples
distance 5: 10467480 samples
distance 6: 42636133 samples
distance 7: 38886264 samples
distance 8: 7297714 samples
distance 9: 206104 samples
distance 10: 405 samples
distance 11: 0 samples
average distance (weighted): 6.426197
The tails of the distribution are underestimated with the multinomial approximation, but distances 5 to 8 are fairly close and the weighted averages obtained with sampling and with the multinomial approximation are very close.

The original MD heuristic has an average of about 6.4 and a maximum of 10 (per axis). While EMD prunes only very slightly further on average, the maximum is one higher. The four distance-11 5-tuples are:
(0, 1, 3, 21, 0)
(0, 0, 21, 3, 1)
(0, 2, 1, 22, 0)
(0, 0, 22, 1, 2)

The board given at the start of this post was constructed to have (0,0,22,1,2) both horizontally and vertically, which means that it needs at least 11 horizontal moves and 11 vertical moves to be solved, for a total of 22 moves. This was done by starting with the solved board and shifting everything two cells down and two cells right, then applying a 3-cycle along the diagonal. Feeding this to an optimal solver, it turns out that optimal is indeed 22 moves.
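One can verify directly that the state above realizes (0, 0, 22, 1, 2) along both axes (a sketch; `emd_tuple` is my own helper, counting how many cells left/up (mod 5) each tile must move):

```python
# The 5x5 state from the start of this post, as a zero-indexed list.
board = [18, 19, 15, 16, 17,
         23, 12, 20, 21, 22,
          3,  4, 24,  1,  2,
          8,  9,  5,  6,  7,
         13, 14, 10, 11,  0]

def emd_tuple(board, axis):
    """5-tuple (x0,...,x4): xi = number of tiles that must move i cells
    left (axis=0) or up (axis=1), mod 5."""
    x = [0] * 5
    for cell, tile in enumerate(board):
        cur = cell % 5 if axis == 0 else cell // 5
        tgt = tile % 5 if axis == 0 else tile // 5
        x[(cur - tgt) % 5] += 1
    return tuple(x)

print(emd_tuple(board, 0))  # (0, 0, 22, 1, 2) horizontally
print(emd_tuple(board, 1))  # (0, 0, 22, 1, 2) vertically
```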


Note that MD ≤ EMD ≤ WD always holds. Just as EMD is a refinement of MD, WD can be thought of as a refinement of EMD. Compared to EMD, it's harder to start from a WD pattern and manually construct an example state that has that specific WD pattern, which was also part of why I decided to implement EMD first.

While WD is still feasible to implement for 5×5, I don't think it's feasible for 6×6. On the other hand, EMD is feasible up to maybe 8×8. But that said, EMD also provides very little improvement over MD on average. Memory requirement for MD is basically nothing (you can speed it up with a lookup table, but that's optional); memory requirement for EMD is roughly n^(2n); memory requirement for WD is… hard to estimate, but likely around n^(n^2).

The analogue of EMD on 15-puzzle is just the same thing as the original MD heuristic, since on 15-puzzle we're only ever moving one piece at a time and there's no looping around the sides.


I'm running my optimal solver on the transposed state, and after a few hours it has finished exhausting depth 21, so this must also take at least 22 moves. It will probably take over a day to exhaust depth 22 (if there are no depth-22 solutions). My naïve expectation is that this is either an antipode or close to it, considering that on 4×4, the transposed state is believed to be an antipode. (Thanks to symmetry, we do know it must be a "local maximum": applying any one move must reduce its depth.)