# "How fast should I solve 5x5?" - Time across WCA events

#### SlowerCuber

##### Member
Hi there,
Occasionally I see questions like "I'm sub-XX in 3x3, how fast should I solve a 5x5?". So I ran some simple statistics on people's times in WCA competitions and tried to find the correlation. Here are some examples: https://github.com/slower-cuber/wca-time/tree/master/plots

I also made a video on how to read those plots, along with my thoughts on questions like "How fast should I solve XXX?". Hope you enjoy it.

This is my first post here, so please tell me if I'm doing anything inappropriate or violating any rules.

Regards,
Slower Cuber

#### Kit Clement

Haven't watched the video yet, but note that least squares linear regression and Pearson's correlation are not valid measures for heteroskedastic data like this. The correlation will be more heavily weighted by the larger times, thus making the predictions from the linear regression line very inaccurate for the faster times in your data.
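To make the effect concrete, here's a quick simulation (all numbers invented): the noise in the "5x5 time" grows with the "3x3 time", and the residuals of an ordinary least squares fit balloon at the slow end, which is exactly the constant-variance violation.

```python
import random

random.seed(1)

# Made-up heteroskedastic data: the spread of the "5x5 time" grows with
# the "3x3 time", roughly like the WCA scatter plots.
xs = [random.uniform(10, 60) for _ in range(500)]       # 3x3 times
ys = [2.5 * x + random.gauss(0, 0.15 * x) for x in xs]  # 5x5 times, noise grows with x

# Ordinary least squares via the closed-form formulas.
n = len(xs)
mx = sum(xs) / n
my = sum(ys) / n
slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
        sum((x - mx) ** 2 for x in xs)
intercept = my - slope * mx

# Compare the average absolute residual at the fast vs slow ends.
resid = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
fast = [abs(r) for x, r in zip(xs, resid) if x < 25]
slow = [abs(r) for x, r in zip(xs, resid) if x > 45]
print(sum(fast) / len(fast), sum(slow) / len(slow))  # residuals grow with x
```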

#### qwr

##### Member
> Haven't watched the video yet, but note that least squares linear regression and Pearson's correlation are not valid measures for heteroskedastic data like this. The correlation will be more heavily weighted by the larger times, thus making the predictions from the linear regression line very inaccurate for the faster times in your data.

That's true, but the relationship does appear to be linear judging by eye, so I think it's a reasonable approximation.

#### Kit Clement

> That's true, but the relationship does appear to be linear judging by eye, so I think it's a reasonable approximation.

Linearity is only one assumption when fitting data with least squares. The approximation isn't valid when the criterion used to determine the line assumes constant variability across all values of x.

#### qwr

##### Member
> Linearity is only one assumption when fitting data with least squares. The approximation isn't valid when the criterion used to determine the line assumes constant variability across all values of x.

You can fit any kind of model, including a linear one, with least squares because it is a free country; it's just a general technique of empirical risk minimization with squared loss. The inference results might not be as nice, and won't be the standard ordinary least squares ones, but there are more sophisticated inference results you can derive.
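To illustrate the ERM view with a minimal sketch (made-up data): minimizing the mean squared loss directly by gradient descent, with no distributional assumptions at all, lands on the same line the closed-form OLS formulas give.

```python
# Least squares as empirical risk minimization: minimize the mean squared
# loss over (slope, intercept) directly by gradient descent. Nothing here
# assumes Gaussian errors or constant variance just to *fit* the line.
data = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2), (4.0, 7.8), (5.0, 10.1)]

a, b = 0.0, 0.0  # slope, intercept
lr = 0.02        # step size, small enough to converge on this tiny problem
for _ in range(20000):
    # Gradient of (1/n) * sum((a*x + b - y)^2)
    ga = sum(2 * (a * x + b - y) * x for x, y in data) / len(data)
    gb = sum(2 * (a * x + b - y) for x, y in data) / len(data)
    a -= lr * ga
    b -= lr * gb

print(round(a, 2), round(b, 2))  # slope ~2, matching the closed-form OLS answer
```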

#### Kit Clement

> You can fit any kind of model, including a linear one, with least squares because it is a free country; it's just a general technique of empirical risk minimization with squared loss. The inference results might not be as nice, and won't be the standard ordinary least squares ones, but there are more sophisticated inference results you can derive.

I mean, sure, you can fit anything to whatever you want. Throw it into the black box, see what comes out. But because you're minimizing the sum of squared errors, the line will be pulled toward the values with larger variability, since those produce the largest squared differences from the line. As a result, the line can completely miss the data in sections with less variability: the minimization doesn't need to follow the signal of the data there, because it cares more about shrinking the large squared differences, at the expense of accuracy in the low-variability regions. This effect is most pronounced in the plot shown at 1:30 in the video.

#### qwr

##### Member
OK, sure. We could use weighted least squares and allow the higher times more variance (i.e., give them lower weight).
I think it's nice that any kind of statistics was tried at all.
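Here's a rough sketch of what that weighting could look like, assuming the spread of times grows roughly in proportion to x, so each point gets inverse-variance weight 1/x² (simulated data, invented numbers):

```python
import random

random.seed(7)

# Simulated heteroskedastic data: standard deviation proportional to x.
xs = [random.uniform(10, 60) for _ in range(1000)]
ys = [2.5 * x + random.gauss(0, 0.2 * x) for x in xs]

def fit(xs, ys, ws):
    # Closed-form weighted least squares for y = a*x + b.
    sw = sum(ws)
    mx = sum(w * x for w, x in zip(ws, xs)) / sw
    my = sum(w * y for w, y in zip(ws, ys)) / sw
    a = sum(w * (x - mx) * (y - my) for w, x, y in zip(ws, xs, ys)) / \
        sum(w * (x - mx) ** 2 for w, x in zip(ws, xs))
    return a, my - a * mx

ols = fit(xs, ys, [1.0] * len(xs))            # every point weighted equally
wls = fit(xs, ys, [1.0 / x ** 2 for x in xs])  # weight = inverse variance
print(ols, wls)  # similar slopes, but WLS no longer lets slow times dominate
```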

#### xyzzy

##### Member
This is a bit off-topic but: thank you so much for including closed captions. (And even Chinese translations!)

More on topic, there's also the "2019 Cubing Time Standards", and more recent iterations of it with slightly different formulas. The 2019 CTS is fundamentally centred on single rankings: the single standards are based directly on percentiles, and the average standards are calculated from the averages of people at certain percentiles when ranked by singles. (Sounds wtf-y, but this is to account for people who have singles but not averages.) The more recent version just straight up uses a fixed multiplier against top-ranked results, which is more in the spirit of Kinchranks.
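My reading of that scheme, as a sketch with completely made-up numbers: the single standard is a straight percentile of singles, and the average standard at a percentile is the mean of the known averages of people sitting near that percentile when ranked by single, so people without an official average still count toward the single ranking.

```python
# Singles already sorted ascending; averages aligned to the same people
# (in single-rank order), None = the person has no official average.
# All numbers invented for illustration.
singles = [8.1, 9.1, 9.8, 10.5, 11.2, 12.3, 13.7, 15.0, 17.6, 20.4]
averages = [11.9, 12.8, 13.0, 14.9, 15.7, None, 18.8, 20.1, None, 26.0]

def single_standard(p):
    # p-th percentile of singles (nearest-rank method).
    k = max(0, min(len(singles) - 1, round(p / 100 * len(singles)) - 1))
    return singles[k]

def average_standard(p, window=2):
    # Mean of the known averages of people within `window` single-ranks
    # of the p-th percentile; people with no average are simply skipped.
    k = round(p / 100 * len(singles)) - 1
    nearby = [a for a in averages[max(0, k - window):k + window + 1] if a is not None]
    return sum(nearby) / len(nearby)

print(single_standard(50), average_standard(50))
```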

I had this idea once upon a time (then didn't write the code to actually do it): I wanted to sort everyone by some variation of Kinchranks, so there's a global notion of skill across all events (*), then compute averages across deciles/percentiles/whatever to get a time standard there. Of course, you have to deal with how most people haven't done all 17 events (note: there were 18 events when I first thought about this), so there has to be some way of "filling in the blanks".

(*) Does it make sense to include the BLD events with the sighted events? And even FMC? Probably not, but this also ties into my crazy idea for ranking reform that I teased but never finished writing up. The attention deficiency is real.
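For what it's worth, the decile idea could be sketched like this, with toy numbers, a made-up `kinch` helper, and the blank-filling problem completely glossed over by scoring missing events as 0, the way Kinchranks does:

```python
# Toy sketch: score everyone Kinch-style (100 * world-best / your time per
# event, 0 for unattempted events, averaged), sort by that global score,
# then take the mean time per event within each bin as the "standard".
# All numbers below are invented.
world_best = {"333": 5.0, "555": 35.0}
people = [
    {"333": 9.0, "555": 70.0},
    {"333": 12.0},                # no 5x5 results: that event scores 0
    {"333": 7.0, "555": 55.0},
    {"333": 20.0, "555": 150.0},
]

def kinch(person):
    # Average of per-event scores; missing events contribute 0.
    return sum(100 * world_best[e] / person[e] if e in person else 0
               for e in world_best) / len(world_best)

ranked = sorted(people, key=kinch, reverse=True)
# With 4 people, "deciles" collapse to halves; real data would use 10 bins.
top_half = ranked[:2]
std_555 = sum(p["555"] for p in top_half if "555" in p) / \
          sum(1 for p in top_half if "555" in p)
print(std_555)  # mean 5x5 time among the top half
```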