There may also be something going on with how cities and encampments are rendered compared to out in the field, kind of like how FFXI would nosedive in some cities and areas because of specific structures. Remember North Sandy during the Christmas event? If you ran through there with your camera pointed in the direction of that glorious tree, frames would drop about 30%, even when you couldn't see the tree because you were inside your nation's office. And the sandworms and raptors in Abyssea could make things go bonkers when a bunch were on screen too.

In general things are optimized well, but when certain things come into view (single or en masse), they may be straining a marginal setup. The trick is narrowing down exactly what those things are (possibly the crystals, like it was with maws and confluxes in XI?).