That has been posted here a few times. The comments section does a fairly good job of dissecting some of the main problems with that post.
Outcome Measures
I think that if someone provides you with a set of data, it's worthwhile to critically evaluate it for yourself rather than accepting everything presented at face value. Is the primary variable being measured relevant to you? Is it something that would change your decision-making? If it isn't, then the rest of the discussion has no value to you. GIGO: garbage in, garbage out.
Measuring total damage taken is a big red herring. Not all damage taken has equal importance. Are you mitigating a tank buster that would otherwise kill you, or are you mitigating a random auto that would have been soaked up by regens/fairy? A lot of healers won't even bother throwing out direct heals unless you need to meet a specific HP threshold to survive.
Inclusion/Exclusion Criteria
Next, look at the inclusion/exclusion criteria. The post states that it's looking at a subset of optimal play. Optimal play for whom? And did achieving that level of mitigation come at a statistically significant cost in combined tank dps?
There are some situations in which using TBN will be optimal for a given run, and some in which it won't be. The exclusion criteria vaguely state that "Excessive TBN usage will skew results", but don't clarify what measures were implemented to evaluate or prevent this.
If we're not really being strict about dps optimisation, it's worth noting that the exclusion criteria intrinsically exclude all usages of Unchained, despite it being a relatively minor dps loss.
It’s not clear what subset this is supposed to be representative of. Mitigation-light speedrun optimisation? Mitigation-heavy early clears?
The section on Living Dead states that "all damage for the duration is counted as mitigation". The poster has previously gone on record as stating that Living Dead is 10-20 seconds of invulnerability. Most of us know that this is factually incorrect.
It also states that the initial data was collected from voluntary submissions before switching over to random sampling. Were these initial samples played with a different mindset? Did any of the players involved know what the data was being submitted for, or were they blinded to the process? If not, that skews the results.
Methodology
One of the difficulties in evaluating mitigation is that it's multiplicative. The damage that registers is only the apparent damage. To exclude the effect of party mitigation, you need to know what the true incoming damage was before buffs. That means looking at every active mitigation effect and working backwards to remove its individual contribution. On every single hit.
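To put numbers to that, here's a minimal sketch of the back-calculation, assuming purely multiplicative stacking; the hit value and buff percentages are just for illustration:

```python
# Minimal sketch: recovering pre-buff damage from a logged hit,
# assuming all active mitigation buffs stack multiplicatively.

def true_damage(apparent_damage, active_mitigations):
    """Work backwards from the damage that registered to the damage
    that would have landed with no mitigation active.

    active_mitigations: fractional reductions active on that hit,
    e.g. 0.20 for a 20% buff, 0.10 for a 10% buff (illustrative values).
    """
    multiplier = 1.0
    for mit in active_mitigations:
        multiplier *= 1.0 - mit
    return apparent_damage / multiplier

# A logged 30,000 hit taken under a 20% and a 10% mitigation was
# really a ~41,667 hit before buffs. Now repeat for every hit.
print(true_damage(30_000, [0.20, 0.10]))  # ~41666.7
```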
The rule-set also doesn't clarify what happens to single-target healer mitigation effects, like Adlo and Benison. How are these accounted for? If they're excluded, how so? If they aren't, then how does healer composition influence the results?
There is a lot of variation in how tanks split up responsibilities in a fight. Sometimes the responsibility is roughly equal, with regular swaps; sometimes it's very one-sided. Does WD (WAR/DRK) mean that the WAR was only active for add pick-ups, with the DRK holding the boss? Does it mean that they swapped after tank busters? Is WD one homogeneous population, or does it represent multiple, distinct populations, each of which divvies up the tanking responsibility differently?
Which brings us to the last point. If your ANOVA shows that there isn't sufficient evidence to reject the null hypothesis, then what was the power calculation? Phrased differently: with only 10 cases per group, what was your likelihood of a type II error? And even if the dataset isn't underpowered, does failing to reject the null hypothesis make the null hypothesis true?
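For a ballpark, here's what that power calculation looks like with statsmodels; the number of groups and the effect size are my own assumptions, since the post reports neither:

```python
# Rough power sketch for a one-way ANOVA with 10 cases per group.
# Effect size (Cohen's f) and group count are assumptions, not values
# taken from the original post.
from statsmodels.stats.power import FTestAnovaPower

k_groups = 3        # assume three tank pairings were compared
n_per_group = 10    # as reported in the post
effect_size = 0.25  # a "medium" effect by Cohen's convention

power = FTestAnovaPower().power(
    effect_size=effect_size,
    nobs=k_groups * n_per_group,  # statsmodels takes the total sample size
    alpha=0.05,
    k_groups=k_groups,
)
print(f"power = {power:.2f}, type II error rate = {1 - power:.2f}")
# Under these assumptions, power lands around 0.2 -- roughly an 80%
# chance of missing a real medium-sized effect.
```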
There's a reason why peer-reviewed journals preferentially publish data that reaches statistical significance. Absence of evidence is not evidence of absence.
Wrap-up
It’s fantastic that people took the time and effort to gather this data, even if it’s for a parameter that generally doesn’t influence player decision-making (But who knows? Maybe you're interested in maximising your mitigation per second).
The main weakness is the write-up. The data presented is a red herring that doesn't really have anything to do with the meme conclusions (i.e. DRK is fine as it is, git gud). The essay is formatted like a scientific, peer-reviewed journal article, a rhetorical device used here to lend it undue authority and disguise the fact that it's an opinion piece. The title is similarly sensationalistic, and likewise has nothing to do with the data collected. The only disappointment in all this was the read.
Do you know what really gets people to play a job? Showing them how much you enjoy playing it. Watching people clear content with eight WARs back in ARR did a lot to popularise the job in the community. It showcased the job and what it's capable of.
Likewise, when genuinely talented players, like the world-first group for the first Ultimate, come out and say that they love playing DRK, it sends a message to the player base. Not that DRK is free of problems, or that it's perfectly balanced, but that it can be a lot of fun to take into difficult content.
There’s no ego there. No sense of superiority. Just a genuine love of the game.
We need more of that.



