
Avg of 100 discussion

keemy

To sum up the following message: how should an avg of 100 (or any avg of n) be defined in a way that will be consistent throughout the community and in timers? (For more details, read the quoted text.)


jfly said:
Hey guys,

I would like to take this opportunity to formally apologize for the day I wrote the piece of cct that computes what I deemed to call a "session average". This definition is flawed (see explanation below). I would like to take this opportunity to atone for this grave mistake by asking you guys to come up with a new definition for me. I've spent (wasted) a not inconsiderable amount of my (and Leyan, Andy, and Patricia's) time researching this. To hopefully inspire new ideas and introduce you to the common pitfalls of a potential definition of "session average", what follows are the various ideas people have had and the flaws associated with them. If we can agree on a good statistic to use when talking about large numbers of times, then I'm sure that between Michael and I, we can force the whole cubing community to start using it. As things stand right now, I have no idea what someone means when they say they got an XX.YY average of 100.

If you didn't already know, cct defines a session average as the average of all finite (non-dnf) times in your session. This encourages users to DNF times slower than whatever average they're shooting for! I've done this myself. When I was just barely sub15, I distinctly remember DNFing times >17 as I neared 100 solves in order to keep the session average under 15. SO BAD.

qqTimer has a session ave and a session mean. A qqTimer session *mean* is a cct session average. This is flawed exactly as cct's session average is. A qqTimer session average is a trimmed average of all the times in your session. This means your average will quickly turn into a DNF if you are used to the pampered world of cct. The only possible flaw with this statistic is that it is just *too* harsh. It may be reasonable to only allow 1 DNF in 100 solves. But the same restriction is ridiculous if you want to talk about ave 1000 (which qqTimer *does* support). Or even 10,000 solves. At some point, the number of DNFs we allow just has to scale with the number of solves.

This thought of scaling the number of trimmed solves with the number of solves is very appealing to me. To state this precisely, I want a function trimmed(n) = the # of trimmed solves out of n solves. Obviously, trimmed(n) is always even. The WCA has already defined trimmed(5) = 2, and by convention, trimmed(12) = 2. qqTimer's session average also defines trimmed(100) = 2. We don't have to fit all of these data points, although it would be nice to preserve the definition of Ra 5 and Ra 12.
Here are a few possible definitions of trimmed(n) (a rough sketch comparing them follows the list):

* trimmed(n) = 2. This is precisely qqTimer's session average. As previously discussed, this works, but it's just too harsh.
* trimmed(n) = 2*round(n/10). This gives us trimmed(5) = 2, trimmed(12) = 2, and trimmed(100) = 20. This is nice because it's easy for a human to compute (just round to the nearest 10, drop the trailing zero, and multiply by 2). Does this just seem too damn arbitrary? Or perhaps it grows too fast?
* trimmed(n) = 2*round(log10(n)). This gives trimmed(5) = 2, trimmed(12) = 2, and trimmed(100) = 4. This grows a bit slower, which is nice. But it probably grows too slowly. If you do 1 million solves, you trim all of 12 solves, big whoop.
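A minimal Python sketch (not from the original email; the function names are mine) tabulating the three candidates above, using round-half-up so the values quoted for n = 5 and n = 12 come out as stated:

```python
import math

def round_half_up(x):
    # round to the nearest integer, with .5 rounding up
    return math.floor(x + 0.5)

def trimmed_constant(n):
    # always drop the single best and single worst time (qqTimer's session average)
    return 2

def trimmed_linear(n):
    # roughly 2 trimmed per 10 solves: trimmed(5) = 2, trimmed(12) = 2, trimmed(100) = 20
    return 2 * round_half_up(n / 10)

def trimmed_log(n):
    # grows with the number of digits: trimmed(5) = 2, trimmed(12) = 2, trimmed(100) = 4
    return 2 * round_half_up(math.log10(n))

for n in (5, 12, 100, 1000, 1_000_000):
    print(n, trimmed_constant(n), trimmed_linear(n), trimmed_log(n))
```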

A while back, Leyan proposed the idea of looking at session medians. I jumped at the idea, because it's a well defined concept that *just works* even in the presence of infinite times. After doing a couple of averages of 100 and looking at the session median, I noticed that it was consistently a good bit (0.3-0.5 seconds) better than my session average. Apparently I **** up only occasionally, but when I do, the **** really hits the fan. It so happens that the session median is just a special case of trimmed averages, where trimmed(n) = 2*floor((n-1)/2). In plain English, a session median is a session average where you trim as many times as possible until you have exactly 1 or 2 times remaining. This just seems too lenient to me.

Curious about where the trimmed average of 12 convention came from, I asked Lars Petrus. He said Jessica Fridrich and Mirek Goljan first started talking about it, and everyone (including the WCA) adopted it. Lars's suggestion for defining a session average that scales to n solves is to cut off exactly 1 of your best times for each DNF. To use the notation I've used so far, that would be trimmed(n) = (# of DNFs in those n solves)*2. I suppose this doesn't deal with the case that >= half of your solves are DNFs, so to be terribly precise, you could do something like trimmed(n) = min((# of DNFs in those n solves)*2, 2*floor((n-1)/2)). I really liked this idea at first, since it seems to just do a regular average if there are no DNFs, and if there are, it penalizes you by removing your corresponding best times. However, if you allow the user to go back and change times to DNFs, they can always force this to be a median. Or something in between. Basically, it's like letting the user choose how many solves to trim. You could disallow users from changing times after they've entered them, but I don't like that from a usability standpoint. I also fear that it would train users to learn when to DNF times efficiently. Knowing when to DNF a time because it will help your average is not a speedcubing skill.
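A short sketch of Lars's rule as I read it (the DNF-as-infinity representation and the names are my own, not cct's or qqTimer's): DNFs sort as the worst times, one best time is dropped per DNF, and the trim count is capped so at least one or two times always remain.

```python
import math

DNF = math.inf  # treat a DNF as an infinitely slow time so it sorts last

def trimmed_average(times, trimmed):
    # drop trimmed/2 best and trimmed/2 worst times, then take the plain mean;
    # if any DNF survives the trimming, the result is inf (i.e. a DNF average)
    assert trimmed % 2 == 0 and trimmed < len(times)
    kept = sorted(times)[trimmed // 2 : len(times) - trimmed // 2]
    return sum(kept) / len(kept)

def lars_trim_count(times):
    # trim one best time per DNF (plus the DNF itself), capped at the median-style maximum
    n = len(times)
    dnfs = sum(1 for t in times if t == DNF)
    return min(2 * dnfs, 2 * ((n - 1) // 2))

# example: 10 solves with 2 DNFs -> trim 4 (the 2 DNFs and the 2 best times)
times = [11.2, 12.5, DNF, 13.1, 10.8, 12.9, DNF, 11.7, 12.2, 13.4]
print(trimmed_average(times, lars_trim_count(times)))
```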

If this all comes across as a slightly ridiculous amount of effort for what is essentially an arbitrary number that doesn't *really* mean anything, then I agree with you. If you think this email has gotten comically long, then I also agree with you. I just hope you can see the appeal of having some well defined statistic without any of the flaws of what I've discussed above.

Ideas? Cubing times are a very odd set of data. Nothing I've thought of seems to match it. Sometimes you get lucky and skip a step that normally takes 2 seconds (PLL skip), sometimes you **** up and don't even solve the thing (DNF), sometimes you get penalized by 2 seconds, and most of the time you are still jumping around your "average" (whatever that means) by quite a bit. Running a mile is running a mile is running a mile. Sometimes you may get disqualified or break your leg, but you never skip the last 100 meters. And I'm not sure that anyone does averages of 100 one-mile times. Maybe this would be better suited to a forum, but I am awful about checking speedsolving.com, and I'm not sure how good you guys are about it. I wanted to make sure that at least each of you saw this. Does anyone know if this has been discussed before? If so, I'd love to see the results of that discussion.

If you actually bothered to read this, then I'd love to hear your thoughts!

Thanks,
Jeremy
 

qqwref

Next time you do something like this, don't put it in a quote, because then it's impossible to respond properly without copy paste.

Anyway.
jfly said:
If we can agree on a good statistic to use when talking about large numbers of times, then I'm sure that between Michael and I, we can force the whole cubing community to start using it.
haha :)

jfly said:
A qqTimer session *mean* is a cct session average. This is flawed exactly as cct's session average is.
Yes. But it's really meant as a statistic that you can look at in the case where a session average gives you a DNF. I don't expect anyone to celebrate a session mean, just to use it to get a sense of how fast you are.

jfly said:
The only possible flaw with [trimming two] is that it is just *too* harsh.
Yeah. DNFs happen. So:

jfly said:
* trimmed(n) = 2*round(n/10). This gives us trimmed(5) = 2, trimmed(12) = 2, and trimmed(100) = 20. This is nice because it's easy for a human to compute (just round to the nearest 10, drop the trailing zero, and multiply by 2). Does this just seem too damn arbitrary? Or perhaps it grows too fast?
I like a variation of this. 20 is a lot though. How about this: The number of trimmed solves is (n/10) rounded up to the nearest even integer. This remains the same for 5 and 12, but above that it is roughly 10%. So for an avg100, we trim a total of 10 solves; for an avg1000 we trim a total of 100. I think this would be a good compromise, as it allows you to get a reasonable number of DNFs (up to 1 every 20 solves) but doesn't get rid of too many solves.
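A quick sketch of that rule (my code, not qqTimer's): n/10 rounded up to the nearest even integer is the same as 2*ceil(n/20).

```python
import math

def trim_count(n):
    # n/10 rounded up to the nearest even integer, i.e. 2 * ceil(n / 20)
    return 2 * math.ceil(n / 20)

for n in (5, 12, 100, 1000):
    print(n, trim_count(n))  # -> 2, 2, 10, 100
```

So an avg100 tolerates up to 5 DNFs (1 every 20 solves) before the whole average becomes a DNF.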

jfly said:
A while back, Leyan proposed the idea of looking at session medians.
Yeah, the problem is that the median doesn't account for the shape of your distribution - you can't tell whether someone tends to get almost no very fast times, or very many, or what. It's just the 50th percentile and that is boring. Also, of course, you have no incentive to try if a solve feels slow.

jfly said:
Knowing when to DNF a time because it will help your average is not a speedcubing skill.
I completely agree. An average of N should never improve when you change a time to a DNF.
 

dimwmuni

I like a variation of this. 20 is a lot though. How about this: The number of trimmed solves is (n/10) rounded up to the nearest even integer. This remains the same for 5 and 12, but above that it is roughly 10%. So for an avg100, we trim a total of 10 solves; for an avg1000 we trim a total of 100. I think this would be a good compromise, as it allows you to get a reasonable number of DNFs (up to 1 every 20 solves) but doesn't get rid of too many solves.

I think if we want an equation to govern how many solves to trim for any value of n, we should arbitrarily decide how many we would trim for an average of 100 and then create an equation that fits. To be even more precise, we could arbitrarily assign more data points to get a function that fits exactly what we want/expect for certain averages (e.g. average of 500, average of 1000) and gives a basis for deciding other averages (e.g. average of 18, average of 53).
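One way to do what's described above (purely an illustration; the anchor points here are made up, not values anyone has agreed on): pick trim counts by hand for a few round solve counts, then interpolate and round to an even number for everything in between.

```python
import numpy as np

# hypothetical, hand-picked anchor points: (number of solves, number of solves to trim)
anchor_n    = [5, 12, 100, 500, 1000]
anchor_trim = [2,  2,  10,  40,   60]

def trim_count(n):
    # interpolate between the anchors, then round to the nearest even integer
    raw = float(np.interp(n, anchor_n, anchor_trim))
    return int(2 * round(raw / 2))

for n in (18, 53, 100, 250, 1000):
    print(n, trim_count(n))
```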
 

StachuK1992

Something that I did maybe a year ago:

Do five 'average(s) of five' and ofc take the mean of the middle three for each one.

Take those 'averages' and take the average of 5 of THOSE times.
Essentially an
average of five averages of five

Just a thought.
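A sketch of this, reading "average" the cubing way (trimmed) at both levels; this is my interpretation, not Stachu's code:

```python
def average_of_5(times):
    # WCA-style average of 5: drop the best and worst, mean of the middle three
    assert len(times) == 5
    return sum(sorted(times)[1:4]) / 3

def average_of_5_averages_of_5(times):
    # 25 solves -> five averages of 5 -> an average of 5 of those sub-averages
    assert len(times) == 25
    subs = [average_of_5(times[i:i + 5]) for i in range(0, 25, 5)]
    return average_of_5(subs)
```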
 

Joker

I'll read all that text later.
But when I do an avg 100, I just do 8 averages of 12 (meaning 8 sets of 10/12 averages).
Then I throw in 4 extra solves (because 12*8 is only 96).
Since I dropped the best and worst time from each of the 8 avg 12s (8 best and 8 worst in total), I just find the average of the remaining 84 solves.
In other words, I do an 84/100, dropping the best 8 and the worst 8 (only if they fall within one of the 8 sets of avg 12; if they are among the 4 extra solves, I don't drop them).
I don't know if that made sense, but whatever.
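If I've read that right, it comes out to something like this (my sketch, with the 4 extra solves assumed to be the last 4 of the 100):

```python
def joker_avg100(times):
    # 8 groups of 12 plus 4 extra solves; drop the best and worst time from each
    # group of 12 (16 solves removed in total), then take the plain mean of the
    # remaining 84 solves
    assert len(times) == 100
    kept = []
    for i in range(0, 96, 12):
        group = sorted(times[i:i + 12])
        kept.extend(group[1:11])  # keep the middle 10 of each group of 12
    kept.extend(times[96:])       # the 4 extra solves are never dropped
    return sum(kept) / len(kept)  # mean of 84 solves
```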
 

cyoubx

I almost don't even see the point of caring about skips and lucky solves anymore.

Std dev is definitely important, as well as outliers, but I quite honestly think that solves with skipped steps are as valid as non-lucky solves. In reality, we've tried to simplify the cube, but in its most basic form, a cube is either solved or unsolved. If a person skips PLL, it shouldn't matter, because the steps we use are arbitrary. They're used to make the puzzle systematic, but if you think about it, it shouldn't matter.

Example: If I used Fridrich and skipped F2L, the solve would essentially be deemed "lucky" not because I skipped a step, but ULTIMATELY because I skipped a few MOVES.
Following that same logic, should we be discounting all Petrus solves, since they use fewer moves than Fridrich? Absolutely not!

I think it's best to just see cubing as a process to restore the scrambled cube to its original state.
Sorry if that didn't make that much sense.

EDIT: Obviously, pops will affect your time, but things like skips should not be considered as anything unique.
 

Rpotts

Something that I did maybe a year ago:

Do five 'average(s) of five' and ofc take the mean of the middle three for each one.

Take those 'averages' and take the average of 5 of THOSE times.
Essentially an
average of five averages of five

Just a thought.

An interesting thought, but I would probably prefer to just take a straight avg12+ rather than bother with that.

I'll read all that text later.
But when I do an avg 100, I just do 8 averages of 12 (meaning 8 sets of 10/12 averages).
Then I throw in 4 extra solves (because 12*8 is only 96).
Since I dropped the best and worst time from each of the 8 avg 12s (8 best and 8 worst in total), I just find the average of the remaining 84 solves.
In other words, I do an 84/100, dropping the best 8 and the worst 8 (only if they fall within one of the 8 sets of avg 12; if they are among the 4 extra solves, I don't drop them).
I don't know if that made sense, but whatever.

It did make sense. But it kind of defeats the purpose of doing an avg100 if you're dropping 16% of the solves.

Just take an average and don't DNF yourself; this is not possible if your cube pops often.
 

Joker

My cube pops often.
But other than that it's a good cube.
And yeah, dropping 16% of the solves might defeat the purpose...but when I do an avg 100, I usually get a lot more than 1 really good/lucky solve and one really bad/poppy/locky solve.
I probably will change it to drop only the top and bottom 5 rather than 8 if I get back into cubing.
Cyoubx: valid point. But I personally still consider it lucky, because as you said, you skip moves.
I'll have to think about that.
 

dimwmuni

I almost don't even see the point of caring about skips and lucky solves anymore.

Std dev is definitely important, as well as outliers, but I quite honestly think that solves with skipped steps are as valid as non-lucky solves. In reality, we've tried to simplify the cube, but in its most basic form, a cube is either solved or unsolved. If a person skips PLL, it shouldn't matter, because the steps we use are arbitrary. They're used to make the puzzle systematic, but if you think about it, it shouldn't matter.

Example: If I used Fridrich and skipped F2L, the solve would essentially be deemed "lucky" not because I skipped a step, but ULTIMATELY because I skipped a few MOVES.
Following that same logic, should we be discounting all Petrus solves, since they use fewer moves than Fridrich? Absolutely not!

I think it's best to just see cubing as a process to restore the scrambled cube to its original state.
Sorry if that didn't make that much sense.

EDIT: Obviously, pops will affect your time, but things like skips should not be considered as anything unique.

I agree. Skips are lucky, but we shouldn't count them as anything special, because you could just say that you coincidentally used an alg from a method (I can't remember if it has a name) that solves both OLL and PLL in one alg, so it looked like you got a PLL skip, but you didn't; you did a full-step solve of whatever that method is called (something like OPLL or COPLL).

That's how I look at skips at least.
(If that's confusing or stupid please tell me)
 

qqwref

If you want to know your overall average, how many skips you get does indeed matter. You shouldn't just ignore solves that turn out to be good by luck, any more than you should ignore solves that turn out to be bad by luck. This is why we should trim a constant proportion of solves (for averages longer than avg12) rather than trying to discount skips or worry about how outliers affect your time. If more than 5% of your times are extremely bad or extremely good, I think it says something about your technique, not about your luck.
 

Dene

Std dev is definitely important, as well as outliers

I disagree with both of these points.
Having a low standard deviation doesn't show anything more than consistency, which is not important in cubing.
Outliers aren't important, because as long as you average low enough (sub-14ish), then unless you get an obscenely lucky solve, you are not going to have low outliers. However, you are always going to have high outliers unless you get on an extremely good roll, because everyone screws up. All an outlier says is that you screwed up. It is in no way a reflection of your cubing ability.

As for the issue at hand, I have to say I think jfly cares way too much about something that doesn't really matter. A good cuber is not going to come in bragging about their latest awesome avg100 where they manipulated their DNFs to get a better overall average. Just cut off the best time for every DNF. There shouldn't be more than a few in there.
 

Stefan

I've been thinking about this as well, mostly because I'd like to see average-of-100 equivalents for stuff other than 3x3x3 speed. For 7x7x7 or gigaminx or one-handed megaminx, a 100-solve session is unreasonable. And having standard average formats for higher numbers would be good for this.

I like the 10% or 20% suggestions; that's about what I've been thinking of. And if that grows too quickly and log grows too slowly, square root might be nice.

Not sure Jeremy's occasional disasters are the only reason for his median consistently being better than his mean. Should they be the same? Ideally? Surely if your distribution is symmetric, like a perfect bell curve. But is cubing really like that? When you average about 60 seconds, your standard deviation is probably higher than when you average 20 seconds. In other words, the times are spread around further. Could this be in the nature of the thing, that the density increases at faster times? That would be a skewed distribution for a reason other than occasionally screwing up. And when I do an average of 100, I have periods where I feel great and get many good times in a row, and periods when I feel tired and get many bad times in a row. Not extremely good or bad ones, but like around 15 seconds for a while and then around 17 seconds for a while. So in a way, I'm a "slow cuber" part of the time and a "fast cuber" another part of the time. While I'm fast, my standard deviation might be smaller and thus lead to a higher density at faster times.
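A toy simulation of that "fast part of the time, slow part of the time" picture (all numbers made up for illustration): mix a tighter fast period with a wider slow one, and the median lands below the mean even with no disasters in the data.

```python
import random
import statistics

random.seed(1)

# hypothetical session: 600 solves while feeling fast (tighter spread),
# 400 solves while feeling tired (slower and more spread out)
fast = [random.gauss(15.0, 1.0) for _ in range(600)]
slow = [random.gauss(17.0, 1.6) for _ in range(400)]
times = fast + slow

print("mean:  ", round(statistics.mean(times), 2))    # around 15.8
print("median:", round(statistics.median(times), 2))  # around 15.6, below the mean
```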
 

Cyrus C.

I don't like the idea of trimming 40% of all my solves.
Avg of 5 :p

I've never really had a problem with this (since Nebraska I finish all my pops). I like Stachu's idea though: an average of averages of 5. Using just Ao5 obviously has its limits. To make it easier, you could split the session into fifths, average the fifths, then average those averages. In really large averages, though, you may still have the DNF problem.
 

keemy

Not sure Jeremy's occasional disasters are the only reason for his median consistently being better than his mean. Should they be the same? Ideally? Surely if your distribution is symmetric, like a perfect bell curve. But is cubing really like that? When you average about 60 seconds, your standard deviation is probably higher than when you average 20 seconds. In other words, the times are spread around further. Could this be in the nature of the thing, that the density increases at faster times? That would be a skewed distribution for a reason other than occasionally screwing up.

leyan gave a pretty large set of data that suggests that cubing is not a perfect bell curve (this was already hinted at by jfly saying his median was faster than his mean) and that in fact the distribution is skewed to the right (I have noticed this before in my solves, and it should be obvious: it's a lot more common to get a 12.xx than a 7.xx second solve if you average 10).

Quoted from leyan (qq asked me not to use quotes for things like this so just using a spoiler):
I took a look at the 1159 solves I had on TNT and have come to the conclusion that there is a tiny advantage to the median. Here are the results (pardon my slowness. Also, I didn't get any DNFs):

* Number of solves: 1159
* Mean: 13.84s
* Median: 13.75s
* Mode: 13.52s
* Standard Deviation: 1.55s
* Min: 9.46s
* Max: 21.21s

* Times within 1 stdev of Mean: 70.15%
* Times within 1 stdev of Median: 70.06%
* Times within 1 stdev of Mode: 70.06%

* Times within 0.5s of Mean: 25.88%
* Times within 0.5s of Median: 26.49%
* Times within 0.5s of Mode: 27.09%

* Times within 0.25s of Mean: 13.37%
* Times within 0.25s of Median: 14.93%
* Times within 0.25s of Mode: 14.41%

I would say that the median is a better statistic than the mean because more data falls within a 1s window and a 0.5s window around the median than around the mean. 2.33% more solves fall within a 1s window of the median than of the mean, and 11.61% more solves fall within a 0.5s window of the median than of the mean.

In conclusion, if our goal is to measure where most of the solves will fall, I would like to propose that the median is a better measurement than the mean.
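For anyone who wants to reproduce these numbers on their own session, a small sketch (the `times` list is a placeholder; leyan's raw TNT data isn't included here):

```python
import statistics

times = [13.84, 12.91, 14.55, 13.20, 15.02, 13.67]  # replace with your own solve times

mean = statistics.mean(times)
median = statistics.median(times)
stdev = statistics.stdev(times)

def fraction_within(center, radius):
    # fraction of times inside [center - radius, center + radius]
    return sum(abs(t - center) <= radius for t in times) / len(times)

print(f"mean {mean:.2f}, median {median:.2f}, stdev {stdev:.2f}")
for label, center in (("mean", mean), ("median", median)):
    for radius in (stdev, 0.5, 0.25):
        print(f"within {radius:.2f}s of {label}: {fraction_within(center, radius):.1%}")
```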


Histograms!
[Two histogram images of the solve-time distribution: 58FKJ.png and aMaHQ.png]
 