I don't see how you cannot immediately recognize the importance of #1 when looking at the data for #2. Going from #1 to #2 we have an increase in both dex and parry, yet the parry rate for #2 is less than for #1. If we believe that no thresholds were hit, they share the same mean. The parry rate for #2 is highly unlikely to be 25% (2.957 sigma away). If we assume a parry rate of 23%, then the test falls outside a 3 sigma range. We know that #2 must be greater than or equal to #1. If we assume it is 24% for both, both values are within 1.5 sigma of the mean, which is very acceptable.
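
For anyone who wants to check sigma distances like these themselves, here is a minimal sketch, assuming each swing is an independent parry/no-parry trial (plain binomial, normal approximation). The function name and the counts in the usage line are made up for illustration; they are not the actual numbers from test #1 or #2.

```python
import math

def sigma_distance(parries, trials, assumed_rate):
    """How many standard errors the observed parry rate sits from an assumed true rate.

    Assumes independent trials (binomial) and uses the normal approximation.
    """
    observed = parries / trials
    # Standard error of a proportion under the assumed rate
    std_err = math.sqrt(assumed_rate * (1 - assumed_rate) / trials)
    return (observed - assumed_rate) / std_err

# Hypothetical counts only (NOT the real parse data): 1160 parries in 5000 swings
for rate in (0.23, 0.24, 0.25):
    print(f"assumed {rate:.0%}: {sigma_distance(1160, 5000, rate):+.2f} sigma")
```

A negative result means the observed rate is below the assumed one. Run the same check on both tests with the same assumed rate and see which rate keeps both inside a sensible range.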


The fact that you say #1 is irrelevant makes me think that you have absolutely no idea what you are talking about. The numbers have to reconcile with each other. Do you actually think you'll get good convergence on 5000 trials? We're dealing with roughly ±2% at this n and at these parry rates. The problem is not the method of the test. The problem is that to get good convergence on the mean we're going to need somewhere in the vicinity of 100,000 trials, which would take literally days and days of non-stop parsing. This is why #1 is so important. With #1 and #2 together, we can establish with some confidence that the parry rate for both of them is 24%. Once we go beyond this point, we're largely speculating. We can rule out possibilities, but that's really about it.
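
To put rough numbers on the convergence point, here is a quick sketch under the same assumption of independent trials with a true rate near 24%. The function names and the 3 sigma default are my own choices for illustration, not anything from the parser.

```python
import math

def margin(rate, trials, sigmas=3):
    """Half-width of the +/- band around the parry rate at the given sigma level."""
    return sigmas * math.sqrt(rate * (1 - rate) / trials)

def trials_needed(rate, target_margin, sigmas=3):
    """Approximate number of trials needed to shrink the band to target_margin."""
    return math.ceil(sigmas ** 2 * rate * (1 - rate) / target_margin ** 2)

for n in (5000, 100000):
    print(f"n={n}: +/-{margin(0.24, n):.2%} at 3 sigma")

print("trials for +/-0.5% at 3 sigma:", trials_needed(0.24, 0.005))
```

At 5000 trials the 3 sigma band comes out a little under 2 points wide on each side, which is why a single 5000-swing test cannot separate 23%, 24% and 25% on its own. At 100,000 trials it narrows to well under half a point.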