That's because Blizzard releases the PTR as a playground, open to subscribed and non-subscribed players alike with free gear; in its broad strokes, that climate is even less useful as a source of actual information than the cacophony of the live servers. Compare that to closed beta testing, where Blizzard specifically invites leading guilds to join and test new content: players who know what they're doing, who can identify which apparent flaws will actually penalize loadout or composition choices and by what margins, who can say what feels "off" to a skilled player, and who can perhaps even suggest what might be done about those issues.
You can read Jeff Kaplan's comments on what would be necessary to make the PTR a useful testing ground in Overwatch. It's not an easy process. In the end, any game's mechanics must be developed with their optimal outputs in mind, but a random testing bay is not going to spontaneously produce the player conditions needed to find or make use of those outputs.
But that doesn't remove the value of letting those who can actually make something of the information do so. It just means there's very little usable information discrepancy between theorycrafting from the paper data and theorycrafting from the practical data, so long as the paper data includes fight conditions. In all other cases, theorycrafting done well is usually more accurate to the outcomes of a given tier than the experiential findings of a PTR populated by people who, frankly, are highly divorced from optimal play. Their opinions on the gameplay, numbers taken out of the equation, are essential and likely matter even more than the voices of those who tend to theorycraft, simply because they, unlike dedicated performers, are the majority. But when people start arguing performance based on experiential findings in non-optimal settings (or worse yet, without any objective data collection, e.g. detailed parsers), that's where **** hits the fan.
tl;dr: Your average engaged (i.e. willing to learn) player best informs on gameplay appeal to the overall playerbase. Your high performers best inform on performance. Mixing the two data sets unknowingly can greatly diminish test accuracy. That nonetheless doesn't invalidate the data; it just makes it harder to sort when the testing grounds are not well set up or defined.
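As a rough illustration of the data-mixing problem (a hypothetical sketch with invented numbers, not real parse data), pooling performance samples from casual and optimal players buries the performance ceiling that balance decisions actually hinge on:

```python
import random
import statistics

random.seed(42)

# Hypothetical DPS parses for one spec from two player populations.
# All numbers are made up purely to illustrate the point.
casual_parses = [random.gauss(45_000, 6_000) for _ in range(900)]   # bulk of PTR testers
optimal_parses = [random.gauss(72_000, 3_000) for _ in range(100)]  # dedicated performers

pooled = casual_parses + optimal_parses

# Pooled analysis: what an unsorted PTR data dump looks like.
print(f"pooled mean DPS:  {statistics.mean(pooled):>10,.0f}")

# Stratified analysis: the number that predicts tier outcomes is the
# ceiling of optimal play, not the population average.
print(f"casual mean DPS:  {statistics.mean(casual_parses):>10,.0f}")
print(f"optimal mean DPS: {statistics.mean(optimal_parses):>10,.0f}")
print(f"optimal 95th pct: {statistics.quantiles(optimal_parses, n=20)[-1]:>10,.0f}")
```

Balancing around the pooled number answers neither question well; keeping the strata separate keeps "is it fun for most players?" and "is it competitive at the top?" as the distinct questions they are.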