Probability Problem

hawkmp4

Member
Hi all-
So, as I'm sure many of you here are also, I am a fan of xkcd. However, I've never read all of them. When I get interested and the time, I just click 'random' and read that strip.
A probability problem came to mind while I was doing this- right now, there are 650 comics. If I were to start with a random strip, and after that view a random strip, how many clicks of the random button will it take on average to view them all?
I have honestly no clue how to approach this problem. So I coded up a small C++ program to run trials. When I did 50000 trials and averaged them, I got 4789.12 clicks. If anyone wants the source code I can post it but I think that if you have a C++ compiler on your computer you're likely to have the knowledge to do this program just as quickly.

miniGOINGS

Member
If there are 650 comics, after viewing the first one, you would have a 649/650 chance of getting one you haven't gotten before. After that you have a 648/650 and so on. So, I'm not sure where I'm going with this...

Kian

Member
Hi all-
So, as I'm sure many of you here are also, I am a fan of xkcd. However, I've never read all of them. When I get interested and the time, I just click 'random' and read that strip.
A probability problem came to mind while I was doing this- right now, there are 650 comics. If I were to start with a random strip, and after that view a random strip, how many clicks of the random button will it take on average to view them all?
I have honestly no clue how to approach this problem. So I coded up a small C++ program to run trials. When I did 50000 trials and averaged them, I got 4789.12 clicks. If anyone wants the source code I can post it but I think that if you have a C++ compiler on your computer you're likely to have the knowledge to do this program just as quickly.
Obviously you're saying there is no memory, so you can view comics more than once, but can you view the same comic twice in a row?

hawkmp4

Member
Sorry, yes. The same comic can be read twice in a row. I've had it happen.

Lucas Garron

Member
Code:
650/Range[650] // Total // N
4585.72

(Exactly 63831456166927021027897688982409523840079723655177756757385154819327249055220395337577425411943059928738366402548673763637871816237574812922515807829815912878123022246488297162498137983372957779194332711228426589279104837112919469830209420512539312536831601470985277560735016189123/13919608497009248302265225001144540266268274743416992682659057601117912620608837589733207717708687740132008672980575760393152350834153132281208077972742408454120214120753562179083476204756521752848358813656470038396901009606006304950742293374056103259950525515941400574057171200)

EDIT: That's
650/650 clicks on average to see the 1st new strip
+650/649 clicks on average to see the 2nd new strip
+650/648 clicks on average to see the 3nd new strip
...
+650/1 clicks on average to see the 650th new strip

EDIT 2: (Which is obviously 650*H650)

EDIT 3: (Which I just realized Wolfram|Alpha would handle: http://www.wolframalpha.com/input/?i=650*%28harmonic+number+650%29)

Last edited:

hawkmp4

Member
Code:
650/Range[650] // Total // N
4585.72

(Exactly 63831456166927021027897688982409523840079723655177756757385154819327249055220395337577425411943059928738366402548673763637871816237574812922515807829815912878123022246488297162498137983372957779194332711228426589279104837112919469830209420512539312536831601470985277560735016189123/13919608497009248302265225001144540266268274743416992682659057601117912620608837589733207717708687740132008672980575760393152350834153132281208077972742408454120214120753562179083476204756521752848358813656470038396901009606006304950742293374056103259950525515941400574057171200)

EDIT: That's
650/650 clicks on average to see the 1st new strip
+650/649 clicks on average to see the 2nd new strip
+650/648 clicks on average to see the 3nd new strip
...
+650/1 clicks on average to see the 650th new strip
...I was overthinking that just a little. Thanks Lucas.

qqwref

Member
Code:
650/Range[650] // Total // N
4585.72
For those of you who don't read obfuscated Mathematica, this is the sum from i = 1 to 650 of 650/i. As Lucas explained, for each i, the function is the expected number of strips you would have to ask for to get a new one, if there were exactly i of them left to see.

cmhardw

I'm proud of myself for actually figuring out how to do this without reading Lucas' or Michael's posts. I treated it as a set of nested geometric series problems. Let's take a fair die with 6 sides. I consider the probability of success to be the probability of seeing a new, previously unseen side. This probability at the start is 1. 1/p is the mean of a geometric series of probability of success p, so I have to toss a die 1 time on average to see a new side on my first toss

Now it gets interesting. I have seen one of the sides, so the probability of success (of seeing a previously unseen side) is now 5/6. 1/(5/6) is the average number of tosses required to see a new side or 6/5.

Doing this through all 6 numbers gives the same expression: 6/6 + 6/5 + 6/4 + 6/3 + 6/2 + 6/1 = 6*sum(i=1 to i=6, 1/i)

Now, my question is this though: Can this be used as a rating for how good a computer random number generator is?

For example, the average number of tosses required for seeing all 6 sides of a die is 14.7 using the expression above. If I use a computer's random number generator to see how many trials it takes to see all of the numbers from 1 to 6 (generating only numbers from 1 to 6), and write a program to do this millions of times, can this experimental average be compared to the theoretical average in some meaningful way?

Chris

qqwref

Member
There are statistical tests we can do to see how closely the stats we get approximate the assumption that the computer is truly random, but all we can get is a probability: there is no way to say for certain that something is or isn't random, without knowing the algorithm by which the numbers are generated, because a true random number generator could output any sequence.

From what I've heard there are some very sophisticated tests to check if a RNG is close to random, though, and there are only a few generators that can pass all of the tests (and of course you can never pass all tests all the time anyway).

Johannes91

Member
There are statistical tests we can do to see how closely the stats we get approximate the assumption that the computer is truly random, but all we can get is a probability: there is no way to say for certain that something is or isn't random, without knowing the algorithm by which the numbers are generated, because a true random number generator could output any sequence.

From what I've heard there are some very sophisticated tests to check if a RNG is close to random, though, and there are only a few generators that can pass all of the tests (and of course you can never pass all tests all the time anyway).
For example, Diehard tests. Some widely used algorithms perform so badly it's scary. :/

krazedkat

Member
The least amount of clicks would be 649 (since you start out viewing one) and the most would be inf. considering you could get one in a row inf times.

hawkmp4

Member
The least amount of clicks would be 649 (since you start out viewing one) and the most would be inf. considering you could get one in a row inf times.
I'm not sure how to respond to this without copious amounts of sarcasm...
*twitches*