
"How fast should I solve 5x5?" - Time across WCA events

SlowerCuber

Member
Joined
Mar 11, 2021
Messages
12
YouTube
Visit Channel
Hi there,
Occasionally I see questions like "I'm sub-XX in 3x3, how fast should I solve 5x5?". So I ran some simple statistics on people's times in WCA competitions and tried to find the correlations. Here are some examples: https://github.com/slower-cuber/wca-time/tree/master/plots

[Attached plots: 3oh-3.png, 7-6.png]
I have a video on how to read those plots, along with my thoughts on questions like "How fast should I solve XXX?". Hope you enjoy it.
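For anyone curious what this kind of analysis looks like, here is a minimal hypothetical sketch: the times below are made up for illustration (the real plots are built from the WCA results export), but the computation is the same: pair up each person's averages in two events, then compute the Pearson correlation and a least-squares line.

```python
import numpy as np

# Made-up paired averages (seconds) for six hypothetical competitors
t3 = np.array([10.2, 12.5, 15.0, 18.3, 22.1, 30.4])      # 3x3 averages
t5 = np.array([75.0, 90.1, 110.2, 135.7, 160.0, 220.5])  # 5x5 averages

r = np.corrcoef(t3, t5)[0, 1]             # Pearson correlation
slope, intercept = np.polyfit(t3, t5, 1)  # least-squares line: t5 ~ slope*t3 + intercept

print(f"r = {r:.3f}")
print(f"predicted 5x5 average for a 14 s 3x3 solver: {slope * 14 + intercept:.1f} s")
```

The prediction for "a sub-14 solver" then just reads off the fitted line, which is exactly the kind of estimate the plots are meant to give at a glance.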


This is my first post here, so please tell me if I'm doing anything inappropriate or violating any rules.

Regards,
Slower Cuber
 

Kit Clement

Premium Member
Joined
Aug 25, 2008
Messages
1,610
Location
Portland, OR
WCA
2008CLEM01
YouTube
Visit Channel
Haven't watched the video yet, but note that least squares linear regression and Pearson's correlation are not valid measures for heteroskedastic data like this. The correlation will be more heavily weighted by the larger times, thus making the predictions from the linear regression line very inaccurate for the faster times in your data.
 

qwr

Member
Joined
Jul 24, 2019
Messages
2,615
YouTube
Visit Channel
Kit Clement said:
Haven't watched the video yet, but note that least squares linear regression and Pearson's correlation are not valid measures for heteroskedastic data like this. The correlation will be more heavily weighted by the larger times, thus making the predictions from the linear regression line very inaccurate for the faster times in your data.
It's true, but the relationship does appear to be linear judging by eye, so I think it's a reasonable approximation.
 

Kit Clement

Premium Member
Joined
Aug 25, 2008
Messages
1,610
Location
Portland, OR
WCA
2008CLEM01
YouTube
Visit Channel
qwr said:
It's true, but the relationship does appear to be linear judging by eye, so I think it's a reasonable approximation.

Linearity is only one of the assumptions when fitting data using least squares. The approximation given by the line is not valid when the criterion used to determine the line assumes constant variability across all values of x and the data clearly do not have it.
 

qwr

Member
Joined
Jul 24, 2019
Messages
2,615
YouTube
Visit Channel
Kit Clement said:
Linearity is only one of the assumptions when fitting data using least squares. The approximation given by the line is not valid when the criterion used to determine the line assumes constant variability across all values of x and the data clearly do not have it.
You can fit any kind of model, including a linear one, with least squares (it is a free country); it is a general technique of empirical risk minimization using squared loss. The inference results might not be as nice as the ones from ordinary least squares, but there are some more complicated inference results you can still make.
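To illustrate the "least squares as empirical risk minimization" view with a hypothetical sketch (all numbers made up): minimizing the mean squared loss of a linear model by plain gradient descent lands on the same line as the closed-form least-squares fit, even when the noise is heteroskedastic; the fit is well defined either way, whatever one thinks of its inferential properties.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, 200)
y = 3.0 * x + 1.0 + rng.normal(0, x)  # noise scale grows with x (heteroskedastic)

def risk(a, b):
    """Empirical risk of the linear model y ~ a*x + b under squared loss."""
    return np.mean((y - (a * x + b)) ** 2)

# Minimize the empirical risk by gradient descent
a, b = 0.0, 0.0
lr = 0.02
for _ in range(5000):
    resid = y - (a * x + b)
    a -= lr * (-2.0 * np.mean(resid * x))  # d(risk)/da
    b -= lr * (-2.0 * np.mean(resid))      # d(risk)/db

a_ols, b_ols = np.polyfit(x, y, 1)  # closed-form least-squares fit
print(a, a_ols)  # the two minimizers agree
```

Nothing about the minimization itself requires constant variance; what changes under heteroskedasticity is how trustworthy the usual standard errors and predictions are.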
 

Kit Clement

Premium Member
Joined
Aug 25, 2008
Messages
1,610
Location
Portland, OR
WCA
2008CLEM01
YouTube
Visit Channel
qwr said:
You can fit any kind of model, including a linear one, with least squares (it is a free country); it is a general technique of empirical risk minimization using squared loss. The inference results might not be as nice as the ones from ordinary least squares, but there are some more complicated inference results you can still make.

I mean, sure, you can fit anything to whatever you want. Throw it into the black box, see what comes out. But because you're minimizing the sum of squared residuals, the line will be pulled toward the values with larger variability, since those produce the largest squared differences from the line. As a result, the line can completely miss the data in sections with less variability: the minimization doesn't need to follow the signal of the data there, because it gains more by shrinking the large squared differences, at the expense of accuracy in the low-variability regions. This effect is most pronounced in the plot shown at 1:30 in the video.
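A hypothetical simulation of that effect (made-up numbers, not WCA data): when the spread grows with x, an ordinary least-squares line predicts much less reliably at the fast end than a weighted fit that down-weights the noisy large-x points.

```python
import numpy as np

rng = np.random.default_rng(1)

def simulate():
    x = rng.uniform(5, 60, 300)                  # e.g. 3x3 averages in seconds
    y = 6.0 * x + 10.0 + rng.normal(0, 0.5 * x)  # spread grows with x
    return x, y

ols_at_8, wls_at_8 = [], []
for _ in range(500):
    x, y = simulate()
    a, b = np.polyfit(x, y, 1)       # ordinary least squares: equal weight everywhere
    # Weighted least squares: np.polyfit's w multiplies the residuals,
    # so w = 1/sigma (here sigma is proportional to x) gives every point equal say.
    aw, bw = np.polyfit(x, y, 1, w=1.0 / x)
    ols_at_8.append(a * 8 + b)       # prediction for a fast (8 s) solver
    wls_at_8.append(aw * 8 + bw)

print("spread of OLS predictions at x=8:", np.std(ols_at_8))
print("spread of WLS predictions at x=8:", np.std(wls_at_8))
```

Across repeated simulations, the weighted fit's predictions for the fast solvers cluster much more tightly, which is exactly the region where the original plots' regression lines are least trustworthy.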
 

xyzzy

Member
Joined
Dec 24, 2015
Messages
2,469
This is a bit off-topic but: thank you so much for including closed captions. (And even Chinese translations!)

More on topic, there's also the "2019 Cubing Time Standards", and more recent iterations of that with slightly different formulas. The 2019 CTS is fundamentally centred on single rankings, with the single standards being directly based on percentiles and the average standards being calculated from the averages of people at certain percentiles when ranked by singles. (Sounds wtf-y, but this is to account for people who have singles but not averages.) This more recent thing just straight up uses a fixed multiplier against top-ranked results, which is more in the spirit of Kinchranks.

I had this idea once upon a time (then didn't write the code to actually do it): I wanted to sort everyone by some variation of Kinchranks, so there's a global notion of skill across all events (*), then compute averages across deciles/percentiles/whatever to get a time standard there. Of course, you have to deal with how most people haven't done all 17 events (note: there were 18 events when I first thought about this), so there has to be some way of "filling in the blanks".

(*) Does it make sense to include the BLD events with the sighted events? And even FMC? Probably not, but this also ties into my crazy idea for ranking reform that I teased but never finished writing up. The attention deficiency is real.
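A toy sketch of that decile idea, with entirely fabricated data (only two events, and everyone has done both, so no blank-filling is needed): rank everyone by a Kinch-style score, split into deciles, and read off each decile's mean time per event as its "standard".

```python
import numpy as np

rng = np.random.default_rng(2)
n = 1000
# Fabricated per-person averages in seconds (log-normal-ish spread)
t333 = rng.lognormal(np.log(15.0), 0.4, n)
t555 = rng.lognormal(np.log(100.0), 0.4, n)

# Kinch-style score: best time / your time, in percent, averaged over events
best333, best555 = t333.min(), t555.min()
kinch = 100.0 * (best333 / t333 + best555 / t555) / 2.0

order = np.argsort(-kinch)            # best overall first
deciles = np.array_split(order, 10)   # ten equal-sized skill bands
for i, d in enumerate(deciles, 1):
    print(f"decile {i}: 3x3 {t333[d].mean():.1f} s, 5x5 {t555[d].mean():.1f} s")
```

The hard part the sketch dodges is exactly the "filling in the blanks" problem: with 17 events and most people missing several, the per-event terms in the score need some imputation or renormalization before the global ranking means anything.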
 