There is a huge difference between the testing process (i.e. how Seiken Valk and I actually get the data and information) and what you see presented in our posts / LJ.
The best way I can put this: what you read sounds like we went line by line - I tested this, I got this result, therefore I proceed to this, etc. But in reality the testing process is more like: "wtf happened? I don't know, maybe this explanation? Let's test it." Then after about 10 tests, 9 of them probably irrelevant, we reach some conclusion based on our best judgment and of course, unfortunately, our own biases. Once we've settled on a conclusion, we present the strongest case possible to back it up. That is the product everyone sees and reads.
We both keep and exchange a mind-numbingly large data bank of all our data, probably about 70%+ of which never gets posted in a public arena. If you ask us specific questions, we can maybe point to a test, but much of it is never shown. To emphasize this point again, the testing results contain biases and extrapolations (for example, testing only +70 crit gear and extending this to situations of 300+). All these things are written in my LJ posts, but they are almost always ignored (unfortunately). Basically, the test could be flawed, the analysis of the data could be biased or flawed, and the conclusions may be extrapolations. If you truly want to get into the "science" of this, these are all things you must accept.
Despite these potential errors and flaws, we try to present a clean case with as strong an explanation as we can. We wait a long time between testing and actually posting to make sure we have it right. Basically, the two of us, Valk and I, are our own strongest critics. But you still need to realize that it's not about us being right or wrong. It's about us having the strongest data / analysis out there. Until someone else presents a similarly in-depth case to counter what we say, it's difficult to match the strength of our presentation.
****
Regarding the argument between delay and base damage: this is not something we talk about. We talk about formulas and very basic game mechanics. Rarely do we get into very in-depth application (for example, maxing DPS). There is a huge jump between saying "this is how the game calculates damage" and "this gear set is better than that gear set". For me to come in here and say I have some clairvoyant understanding of this topic would be very arrogant on my part. So I say have an honest debate about it, maybe even test a bit. If the test sucks, call out the flaws and come up with a better one. In the long run the truth comes out. Hurray science.