
Probability Thread

Though I do not see the logic behind this, we now have the same numbers, and I am glad that my combinatorial approach was correct. Thanks for confirming the numbers with your program.
I would argue that when a face (layer) is solved, that defines the color scheme of the cube. For a second layer to be solved, it must conform to that color scheme. If two parallel faces are solved with respect to two different color schemes, then they can't both be solved: one is solved and the other is one turn from being solved.

But this is a mere quibble. Our numbers agree. I congratulate you on performing a first-principles analysis, which I shied away from as "too complicated"; I went with a brute-force approach instead.
 
There's actually more to it than that. You can use FreeFOP and have 3/4 of a cross and use F moves to insert pairs, or you can finish some pairs before completing the cross.
OK, then I guess my question is really hard to answer. But I mean using standard CFOP and doing the cross normally, without forcing free pairs and things like that.
 
(i) Please don't reply to nine-year-old posts unless it's absolutely relevant and necessary.

(ii) This is almost correct, but it doesn't account for scrambles with multiple crosses already solved… or it would be, if you actually calculated your approximation correctly. Your calculation is off by a factor of two and you should've gotten 1/31680.

(iii) The exact probability is around 1/31704.
The recent post about the probability of a solved layer on a random 2x2x2 motivated me to do a paper-and-pencil computation of the probability of at least one cross on a random 3x3x3, which looks quite difficult at first sight if you want to include multiple crosses. I did not find anything better than the reply above in this probability thread.
I got 15471187/490497638400 which matches the depth 0 count for color neutral cross solving in http://www.cubezone.be/crossstudy.html (30,942,374 cases out of 980,995,276,800).
If anybody is interested I can give a relatively simple derivation here.
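
As a quick sanity check (my own one-liner, not part of the derivation itself), the two fractions do agree:

Python:
from fractions import Fraction

# 15471187/490497638400 is the same value as 30,942,374 cases out of 980,995,276,800
print(Fraction(30942374, 980995276800) == Fraction(15471187, 490497638400))  # True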
 
Incomplete results, research in progress, might make standalone thread later. But for now, more 5×5×5 scramble analysis.

We use this regularity metric: count the number of 2×1 centre bars of each colour (note: overlaps are counted multiple times), then take the maximum across all six colours. In the event of a first centre skip, this value will be 12. The legal range of values is {0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 12}. (Note that 11 is not a possible value.)

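For concreteness, here is a minimal Python sketch of this metric as I read it (my own illustration; the face/colour representation is made up): each face's nine centre stickers are treated as a 3×3 grid, adjacent same-colour pairs are counted per colour across all faces (so overlapping bars count multiple times), and the maximum over colours is taken.

Python:
def centre_bar_metric(faces):
    """faces: dict mapping a face name to a 3x3 grid of centre-sticker colours."""
    counts = {}
    for grid in faces.values():
        for r in range(3):
            for c in range(3):
                colour = grid[r][c]
                # horizontal 2x1 bar with the sticker to the right
                if c < 2 and grid[r][c + 1] == colour:
                    counts[colour] = counts.get(colour, 0) + 1
                # vertical 2x1 bar with the sticker below
                if r < 2 and grid[r + 1][c] == colour:
                    counts[colour] = counts.get(colour, 0) + 1
    return max(counts.values(), default=0)

# a solved centre contributes 6 horizontal + 6 vertical bars = 12 for its colour
print(centre_bar_metric({"U": [["W"] * 3 for _ in range(3)]}))  # 12
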
With stratified sampling, we can obtain reasonably good estimates of the distribution of this metric with a smaller sample size. What I did here was to go through all \( \binom{24}{4}^2 \) ways of placing the blue centre pieces and then, for each, pick a fixed number of random states with that layout of blue centre pieces; this reduces the sampling variance by a factor of around 2 (empirically determined). This is the distribution for random-state scrambles (the ideal) with a sample size of \( 4\binom{24}{4}^2=451647504 \).
Code:
0	1076
1	1556631
2	51692008
3	190485759
4	150556101
5	46701969
6	9230481
7	1217028
8	193092
9	11410
10	1933
11	0
12	16
We can actually compute how frequently first-centre skips happen in closed form (it's roughly 1 in 18,800,000), so this sample slightly underrepresents centre skips. The estimated mean is 3.49321(3).
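
For reference, here is how I would reproduce that closed-form figure (my own back-of-envelope check, assuming uniformly random arrangements of the x-centres and of the +-centres, and neglecting the vanishingly rare case of two centres being solved at once):

Python:
from math import comb

p_one_face = 1 / comb(24, 4)**2   # all 4 x-centres and all 4 +-centres of one colour land on its face
p_skip = 6 * p_one_face           # any of the six colours (double-solved centres neglected)
print(1 / p_skip)                 # ~18,818,646, i.e. roughly 1 in 18.8 million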

We can do the same thing with random-move scrambles and wide-first random-move scrambles. The table below also includes a new scramble type I'll explain later.

Scramble type          | Average number of centre bars
Random state           | 3.49321(3)
40 random moves        | 3.6701(2)
50 random moves        | 3.5566(2)
60 random moves        | 3.5166(2)
70 random moves        | 3.5016(2)
80 random moves        | 3.4965(2)
Wide random + 25 moves | 3.49362(13)
Wide random + 30 moves | 3.49355(13)
Wide random + 35 moves | 3.49332(13)
Wide random + 40 moves | 3.49309(13)
Wide random + 45 moves | 3.49325(13)
New 40 moves           | 3.53268(13)
New 50 moves           | 3.50331(13)
New 60 moves           | 3.49571(13)
New 70 moves           | 3.49388(13)
New 80 moves           | 3.49358(13)

(The 3×3×3 states for the wide-first random-move scrambles were actually sampled with MCMC because I don't have code to generate random states here. The cube is initialised with a few hundred random moves, and the (n+1)th 3×3×3 state is the nth state with some random moves applied. With sample sizes on the order of 10^7, this hopefully does not impact the results. Also, the first syllable of the random-move part is restricted to never include wide moves.)

The standard errors are different because different sample sizes were used. 2e7 for pure random-move, 5e7 for the rest.

We can model centre states (× scrambler state, if relevant) as a Markov chain, and the second-largest eigenvalue of the transition matrix tells us the convergence rate (the largest eigenvalue of a stochastic matrix is always 1). This matrix is too large to handle, but we can still estimate the second-largest eigenvalue from the above table (albeit to low precision). Normal random-move scrambling and wide-first scrambling have the same transition matrices and hence the same eigenvalues; the second-largest eigenvalue is approximately 0.905. The new scrambling method is naturally quantised to multiples of 10 moves, so we'll report the 1/10th power of its second-largest eigenvalue, which is approximately 0.873. (The error bars are huge and you should not think of these figures as really having three decimal places of precision. Unfortunately, I'm not sure what the correct way of estimating the sizes of the error bars is.)

One way of thinking about this is that with random-move scrambles, every additional move reduces the centre bias by a factor of 0.905, while with this new scrambling method, every additional move reduces the centre bias by a factor of 0.873. We can compare this on a log scale: \( \log(0.873)/\log(0.905)\approx1.36 \), so the new scrambling method is about 36% better! N moves done with the new method will have a scramble quality comparable to 1.36 N random moves. As a sanity check, 50 moves new method is roughly the same as 65-70 moves classic.
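
For what it's worth, here's a rough sketch of how such an estimate could be backed out of the table above (my own fitting approach, not necessarily the one used): fit the excess over the random-state baseline as C·λ^n on a log scale, using the classic random-move rows.

Python:
import math

baseline = 3.49321                                    # random-state mean from above
classic = {40: 3.6701, 50: 3.5566, 60: 3.5166, 70: 3.5016, 80: 3.4965}

# least-squares fit of log(mean(n) - baseline) against n; the slope gives log(lambda)
xs = list(classic)
ys = [math.log(classic[n] - baseline) for n in xs]
n = len(xs)
slope = (n * sum(x * y for x, y in zip(xs, ys)) - sum(xs) * sum(ys)) \
        / (n * sum(x * x for x in xs) - sum(xs) ** 2)
print(math.exp(slope))                                # ~0.905 per move
print(math.log(0.873) / math.log(0.905))              # ~1.36, the log-scale comparison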

So what is this new method anyway? Probably easier to explain with an example scramble sequence:

Code:
u2 d' l2 f  b2 L' U  D' F' B2
f  l  b' r  u2 L2 R' F  B  D'
f  u  r  l  b  U  F' D  R2 L
b  u2 r' f  l' L2 B  F' D  R2
u  d' f  l' b  R' B' D  F  U'
r' d  b2 f2 l  U' L2 R2 B2 F
1. Pick five random faces in a random order, do double-layer moves on them by a random nonzero amount.
2. Pick five random faces in a random order, do single-layer moves on them by a random nonzero amount.
3. Repeat.
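
A minimal sketch of that procedure (my own code; lowercase letters denote double-layer moves, as in the example above):

Python:
import random

FACES = ["U", "D", "L", "R", "F", "B"]
AMOUNTS = ["", "2", "'"]                                   # the three nonzero turn amounts

def new_scramble(lines=6):
    """Each line: five double-layer moves on five distinct random faces,
    then five single-layer moves on five distinct random faces."""
    rows = []
    for _ in range(lines):
        line = []
        for wide in (True, False):
            for face in random.sample(FACES, 5):           # five random faces in a random order
                line.append((face.lower() if wide else face) + random.choice(AMOUNTS))
        rows.append(" ".join(line))
    return "\n".join(rows)

print(new_scramble())                                      # 6 lines of 10 moves, like the example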

The biggest con of this scrambling method is that the reachable state space is (much) smaller. Each section of 5 moves has \( 6!\cdot3^5=174960 \) possibilities, so with 60 moves (= 12 sections), there are \( 174960^{12}\approx8.23\cdot10^{62} \) possible scramble sequences. Due to equivalences (e.g. U D = D U), the number of distinct scrambles is somewhat smaller. The wide part has 113436 distinct possibilities, while the normal part has 113616 distinct possibilities; we thus get \( (113436\cdot113616)^6\approx4.58\cdot10^{60} \) locally canonical scramble sequences with this method. (The discrepancy between normal moves and wide moves is because of centre orientation, e.g. Rw2 Fw2 Bw2 Lw2 Dw2 = Uw2 Lw2 Fw2 Bw2 Rw2, but R2 F2 B2 L2 D2 ≠ U2 L2 F2 B2 R2.)
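
The counting in the previous paragraph can be checked quickly (my own arithmetic check; the 113436 and 113616 figures are taken as given):

Python:
from math import factorial

per_row = factorial(6) * 3**5            # 720 ordered choices of 5 faces (= 6!) times 3^5 nonzero amounts
print(per_row)                           # 174960
print(float(per_row**12))                # ~8.23e62 raw 60-move sequences (12 rows of 5 moves)
print(float((113436 * 113616)**6))       # ~4.58e60 locally canonical sequences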

In contrast, the number of canonical scramble sequences of length 60 is approximately \( 1.13\cdot10^{87} \), and the total number of states on a 5×5×5 is \( 2.83\cdot10^{74} \). Like Pochmann megaminx scrambles, this new scrambling method only covers a small part of the state space (much less than 1%), but the hope is that these states are not meaningfully distinguishable from true random states (or at least, not any more so than the current random-move scrambles). It is not currently known whether this new scrambling method suffers from "accidental small subgroups" as much as random-move scrambles do (see [1]).

With sufficiently many rows of moves, this new type of scramble can reach any legal state. This can be accomplished with these algorithms that emulate F and Fw, respectively, with 4 rows of moves each:
u2 d2 l2 r2 f2 U2 D2 L2 R2 F2
u2 d2 l2 r2 f2 U2 D2 L2 R2 F2
u2 d2 l2 r2 f2 U2 D2 L2 R2 F2
u2 d2 l2 r2 f2 U2 D2 L2 R2 F'

f' u2 d2 l2 r2 F2 U2 D2 L2 R2
f2 u2 d2 l2 r2 F2 U2 D2 L2 R2
f2 u2 d2 l2 r2 F2 U2 D2 L2 R2
f2 u2 d2 l2 r2 F2 U2 D2 L2 R2

The stationary distribution is thus equivalent to the uniform distribution over all legal states.

The code used for the computation.

There are some cons of wide-first scrambles that we should pay attention to. Let's say, for the sake of example, we're using wide + 35 moves.

1. Inconsistent scramble sequence length. Not an issue when it comes to using online/app timers, since they often just display scrambles as run-on sequences, but TNoodle formats scramble sequences in rows of 10 moves in a monospace font, with each move padded to a fixed length. In theory, scramble length can range from 35 to 56 moves, or more practically, from 51 to 56 moves. This probably looks ugly. (yes, a very important consideration)

2. While wide-first mixes up centres very effectively, I did a separate analysis on bars anywhere on the cube (i.e. including centre-wing, wing-corner, etc.) a few years ago and if memory serves, there's a slight increase in the number of non-centre bars compared to classic random-move scrambles. Intuitively, this is probably because the initial wide scramble leaves a lot of corner-wing, wing-x and midge-t pairs around, and they only get broken up and mixed for 35 moves.

Edit/update: I actually looked at a TNoodle scramble sheet for 555 and it turns out I was wrong. It's not formatted in six rows of 10, but five rows of 12.
 
What are the odds of having 3 edges solved after solving the first four edges in the yau method on 5x5? This happened to me and all I had to do was the parity alg and I was done! It wasn't PB cause I screwed up earlier. (If the odds are astronomically low then I probably thought this happened but it didn't. I'm not trying to lie about this.)
 
What are the odds of having 3 edges solved after solving the first four edges in the yau method on 5x5? This happened to me and all I had to do was the parity alg and I was done! It wasn't PB cause I screwed up earlier. (If the odds are astronomically low then I probably thought this happened but it didn't. I'm not trying to lie about this.)
100%
After you solve 4 edges, you'll always have at least 3 edges solved.

(yes, I know this is not what you mean)
 
What are the odds of having 3 edges solved after solving the first four edges in the yau method on 5x5? This happened to me and all I had to do was the parity alg and I was done! It wasn't PB cause I screwed up earlier. (If the odds are astronomically low then I probably thought this happened but it didn't. I'm not trying to lie about this.)
If I didn't mess up the math, it should be 1 in 20160, which is a little lower than a LL skip. Pretty impressive, but not impossible.
 
If I didn't mess up the math, it should be 1 in 20160, which is a little lower than a LL skip. Pretty impressive, but not impossible.
Wow, that's crazy. I've probably done about 150-300 5x5 solves, as I don't practice or care for the event. Meanwhile, for 3x3, it took me 2 years to get my first LL skip, and I was sub-15 when my first one occurred. It also had no AUF, and that solve was my first sub-10 on stackmat and my first ever sub-9. Keep in mind that I had 8 OLL skips into either an H, Na, or Nb perm before the LL skip occurred (those PLLs and a PLL skip are tied for the rarest PLL), and the odds of that happening are 0.75^8, which is about a 1/10 chance. I guess I'm significantly more lucky on 5x5 than 3x3.
 
If I didn't mess up the math, it should be 1 in 20160, which is a little lower than a LL skip. Pretty impressive, but not impossible.
According to my calculations, this is the probability that you would get if the edges were GGGB or GGGG. However, there are five total ways that this could happen: GGGB, GGBG, GBGG, BGGG, or GGGG. Each has an equal probability, so we need to multiply 1/20160 by 5/2, which results in a 1/8064 chance, which is still quite lucky.

To understand how this works, consider the following problem: What is the probability that at least one coin in 2 fair coin flips will end up as heads? Well, if one coin came up heads, then the other doesn't matter, so it's 1/2, right? Actually, it's 3/4 because we made a mistake in fixing which coin came up as heads.
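
To spell out both calculations, here is a quick check of my own (the 1/20160 figure is taken as given from the post above):

Python:
from fractions import Fraction
from itertools import product

# the correction from above: 1/20160 times 5/2
print(Fraction(1, 20160) * Fraction(5, 2))                  # 1/8064

# the coin analogy: P(at least one head in two fair flips) = 3/4, not 1/2
outcomes = list(product("HT", repeat=2))
print(sum("H" in o for o in outcomes), "/", len(outcomes))  # 3 / 4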
 
According to my calculations, this is the probability that you would get if the edges were GGGB or GGGG. However, there are five total ways that this could happen: GGGB, GGBG, GBGG, BGGG, or GGGG. Each has an equal probability, so we need to multiply 1/20160 by 5/2, which results in a 1/8064 chance, which is still quite lucky.

To understand how this works, consider the following problem: What is the probability that at least one coin in 2 fair coin flips will end up as heads? Well, if one coin came up heads, then the other doesn't matter, so it's 1/2, right? Actually, it's 3/4 because we made a mistake in fixing which coin came up as heads.
Oh, I missed that. Thanks for correcting!
 
Does anyone have the probability distribution for Squan CS cases? I would assume it isn't an even distribution given the symmetry of some shapes (e.g. square) but I don't know if it works the same as say 3x3 PLL for calculating probabilities.
 
My "first-order" estimate for quad CN cross is that it should be around 5.01 moves average, interpolating between the dual CN and full CN values on Lars's site.

Python:
dualcn = [53759, 806253, 8484602, 74437062, 506855983, 2031420585, 2311536662, 175751822, 3672]
fullcn = [30942374, 462820266, 4839379314, 41131207644, 239671237081, 543580917185, 151019930400, 258842496, 40]
cdf = [sum(dualcn[:k+1])/sum(dualcn) for k in range(9)] # this is exact
cdf2 = [1-(1-x)**2 for x in cdf] # computing quad CN from dual CN, assuming approximate independence
cdf3 = [1-(1-x)**3 for x in cdf] # computing full CN from dual CN, assuming approximate independence

extrapolated_quadcn_mean = sum(1-x for x in cdf2) # ~ 4.984
extrapolated_fullcn_mean = sum(1-x for x in cdf3) # ~ 4.753

# since we know the ground truth for the full CN mean, we can use that to roughly
# quantify how much the crosses on different axes fail to be independent
correction = sum(i*x for i,x in enumerate(fullcn))/sum(fullcn) - extrapolated_fullcn_mean

corrected_quadcn_mean = extrapolated_quadcn_mean + correction/2
print(corrected_quadcn_mean) # prints 5.012148701428693

Probably won't be too hard to compute a much better approximation using random sampling, or with enough CPU time, the true exact value.

---

Wrote a bit of code, spent a bit of CPU time. Quad cross with 2^27 ~ 134 million total samples:

cross colours: white, yellow, red, orange
depth | samples  | proportion
0     | 2826     | 0.000021
1     | 42502    | 0.000317
2     | 443883   | 0.003307
3     | 3827135  | 0.028514
4     | 24023111 | 0.178986
5     | 70399118 | 0.524514
6     | 35158823 | 0.261954
7     | 320330   | 0.002387
8     | 0        | 0.000000
Mean: 5.01942(7) moves

My estimate above was pretty close! Also, nobody asked for this, but here's the same table for triple CN (@OreKehStrah may be interested?):

cross colours: white, red, green (three mutually adjacent choices)
depth | samples  | proportion
0     | 2162     | 0.000016
1     | 31567    | 0.000235
2     | 331994   | 0.002474
3     | 2887422  | 0.021513
4     | 18711235 | 0.139410
5     | 63047910 | 0.469744
6     | 47742095 | 0.355706
7     | 1463343  | 0.010903
8     | 0        | 0.000000
Mean: 5.18663(7) moves

cross colours: white, yellow, red (two opposite choices + a third one)
depth | samples  | proportion
0     | 2068     | 0.000015
1     | 31820    | 0.000237
2     | 333504   | 0.002485
3     | 2897351  | 0.021587
4     | 18868616 | 0.140582
5     | 63934436 | 0.476349
6     | 46916485 | 0.349555
7     | 1233448  | 0.009190
8     | 0        | 0.000000
Mean: 5.17570(7) moves

NOTE: 8-move crosses are vanishingly rare when you can do at least three cross colours and my sampling just happened to miss them. They're obviously not impossible.

To put all the information together:
crosses            | mean   | %
1                  | 5.8120 | 0%
2 (adjacent)       | 5.4081 | 40.3%
2 (opposite)       | 5.3872 | 42.4%
3 (adjacent)       | 5.1866 | 62.4%
3 (opp + another)  | 5.1757 | 63.5%
4 (all except adj) | 5.0259 | 78.4%
4 (all except opp) | 5.0194 | 79.1%
5                  | 4.9054 | 90.4%
6                  | 4.8095 | 100%
The last column is the "percentage benefit relative to going from fixed cross to full CN", so e.g. going from fixed cross to quad CN is about 79% as good as going to full CN.
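
The last column can be reproduced directly from the means; a small sketch of that calculation as I understand it:

Python:
means = {"1": 5.8120, "2 (adjacent)": 5.4081, "2 (opposite)": 5.3872,
         "3 (adjacent)": 5.1866, "3 (opp + another)": 5.1757,
         "4 (all except adj)": 5.0259, "4 (all except opp)": 5.0194,
         "5": 4.9054, "6": 4.8095}

full_gain = means["1"] - means["6"]          # total improvement from fixed cross to full CN
for label, mean in means.items():
    print(f"{label}: {100 * (means['1'] - mean) / full_gain:.1f}%")
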
I don't know how I missed this ping by almost 3 years, but this is super neat. I actually used white-red-green 3N for a while, so it's cool to see the stats on it. If you happen to have (or have the time to write) a program to go through every case to get the true values, I'd be glad to run it!
 