Warning, dirty statistics that will probably make real statisticians cringe.
It's sufficiently large, with "sufficiently" being the key word. It will also never "average out"; it will only approach the average as the number of repetitions approaches infinity.
100 repetitions with 20% probability of success:
Expected mean (µ) = 100 * 0.2 = 20
Standard deviation (σ) = sqrt(100 * 0.2 * 0.8) = 4
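For anyone who wants to plug in their own numbers, here is a minimal sketch of where the 20 and the 4 come from. It assumes every attempt is independent with the same chance of success (a binomial setup); the function name binomial_mean_sd is made up for this example.

```python
import math

def binomial_mean_sd(n, p):
    """Mean and standard deviation of the success count in n independent tries.

    Assumes every try has the same success chance p (hypothetical helper).
    """
    mean = n * p                     # expected number of successes
    sd = math.sqrt(n * p * (1 - p))  # standard deviation of that count
    return mean, sd

mean, sd = binomial_mean_sd(100, 0.2)
print(f"mean = {mean:.2f}, sd = {sd:.2f}")
```

Swapping in 1000 for the first argument gives the figures used further down.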
So plus or minus 8 (two standard deviations) is well within the norm, meaning anything from 12 to 28 out of 100 is unremarkable. Even then, a result outside that range still has a roughly 1 in 20 chance of occurring, so you really shouldn't be too concerned unless you're 3 standard deviations from the expected mean, which would be fewer than 8 or more than 32 successes in 100 attempts.
That band is 6 standard deviations wide, which is a lot more spread than most people expect, and it's a big issue with how people think about what they "should" get when it comes to repeated chance. On top of that, 100 repetitions can't really be considered a large sample. Consider this: the range within 3 standard deviations (32 - 8 = 24) is 24% of your domain of 100 repetitions.
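The "1 in 20" figure is the usual normal-curve rule of thumb. As a rough check, the sketch below adds up the exact binomial probabilities for the 12-28 and 8-32 bands instead. It assumes independent attempts at a fixed 20% chance, and prob_between is a name invented for this example.

```python
from math import comb

def prob_between(n, p, low, high):
    """Exact chance that the success count lands in low..high (inclusive).

    Hypothetical helper; assumes n independent tries with success chance p.
    """
    return sum(
        comb(n, k) * p**k * (1 - p) ** (n - k)
        for k in range(low, high + 1)
    )

within_2sd = prob_between(100, 0.2, 12, 28)  # mean +/- 2 standard deviations
within_3sd = prob_between(100, 0.2, 8, 32)   # mean +/- 3 standard deviations

print(f"inside 12-28: {within_2sd:.3f}  outside: {1 - within_2sd:.3f}")
print(f"inside 8-32:  {within_3sd:.4f}  outside: {1 - within_3sd:.4f}")
```

The exact values won't match the rule of thumb to the decimal, since the binomial is lopsided at 20%, but they should land in the same ballpark.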
For 1000 repetitions with 20% probability:
Expected mean(µ) = 200
Standard deviation(σ) = 12.65
Low end of third deviation = 200 - (3*12.65) = ~162
High end of third deviation = 200 + (3*12.65) = ~238
Range = 238-162 = 76
% of third deviation over domain = 76/1000 = 7.6%
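The same arithmetic is easy to repeat for other sample sizes. This is just a sketch of the calculation above, assuming independent rolls at a fixed chance; three_sd_range_share is a made-up name.

```python
import math

def three_sd_range_share(n, p):
    """Width of the mean +/- 3 standard deviation band as a fraction of n.

    Hypothetical helper; assumes n independent tries with success chance p.
    """
    sd = math.sqrt(n * p * (1 - p))
    return (6 * sd) / n  # band runs from mean - 3*sd to mean + 3*sd

for n in (100, 1000, 10000):
    print(f"{n:>6} rolls: {three_sd_range_share(n, 0.2):.1%} of the domain")
```

For 100, 1000 and 10000 rolls this prints 24.0%, 7.6% and 2.4%.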
I guess I could have compared the standard deviation to the expected mean instead; the same thing would have happened. As the repetitions increase, the ratio of deviation to expected mean decreases (4/20 = 20% at 100 repetitions, 12.65/200 = about 6.3% at 1000). As the repetitions approach infinity, that ratio approaches 0, which is the whole "with enough rolls, reality will approach theory" idea, better known as the law of large numbers.
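To watch that ratio shrink, here is a small sketch that prints deviation/expected for growing sample sizes. Same assumption as before (independent rolls, fixed 20% chance), and sd_to_mean_ratio is a name invented here.

```python
import math

def sd_to_mean_ratio(n, p):
    """Standard deviation divided by expected mean for n independent tries.

    Hypothetical helper. Algebraically this is sqrt((1 - p) / (n * p)),
    so it falls off like 1 / sqrt(n).
    """
    mean = n * p
    sd = math.sqrt(n * p * (1 - p))
    return sd / mean

for n in (100, 1_000, 10_000, 100_000, 1_000_000):
    print(f"{n:>9} rolls: deviation/expected = {sd_to_mean_ratio(n, 0.2):.4f}")
```

Every 100-fold increase in rolls cuts the ratio by a factor of 10, so it heads toward 0 but never actually gets there.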
Finally, what most people mistake for proof that the percentages/RNG/whatever are off is runs of successes or failures. People find it odd that they got 5 20% successes in a row or 5 80% failures in a row. Over a large number of repetitions, it's actually STRANGER for runs NOT to happen. I'm not going into that math, because it makes my brain hurt to calculate.
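For anyone who does want numbers on runs, the computer can do the painful part. The sketch below is one possible way to do it, not anything from the original math: it tracks the current streak length roll by roll and adds up the chance of hitting a streak of a given length at least once. It assumes independent rolls, and prob_of_run is a made-up name.

```python
def prob_of_run(n, run_length, p):
    """Chance of at least one streak of run_length hits in n independent tries.

    Hypothetical helper. p is the per-try chance of the outcome being
    streaked (use the failure chance to ask about failure streaks).
    """
    # state[j] = chance that no full streak has happened yet
    #            and the current streak is exactly j long
    state = [0.0] * run_length
    state[0] = 1.0
    hit = 0.0  # chance that a full streak has already happened

    for _ in range(n):
        new_state = [0.0] * run_length
        for j, chance in enumerate(state):
            new_state[0] += chance * (1 - p)  # miss: streak resets to 0
            if j + 1 == run_length:
                hit += chance * p             # hit completes the streak
            else:
                new_state[j + 1] += chance * p  # hit extends the streak
        state = new_state
    return hit

for n in (100, 1_000, 10_000):
    print(f"{n:>6} rolls: P(5 in a row at 20%) = {prob_of_run(n, 5, 0.2):.3f}")
```

The chance climbs steadily with the number of rolls, which is the point: over a long enough session, a streak like that stops being surprising.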
tldr:
A reasonable spread on 100 rolls at a 20% chance is ±12 (3 standard deviations), and that spread covers 24% of your entire domain. A better sample size before you even begin to worry about any deviation would be about 10000, which would shrink that reasonable spread to 2.4% of your domain.



