Another aspect may be in play here too: fill rate may be taking a hit from the sheer amount of data getting dumped. It could be choking on its way through the network layers or through memory (system or graphics). On my 4800 series cards I got much more "bang for the buck" from raising the clocks on the graphics memory than from raising the core clocks. I also saw diminishing returns as my CPU got closer to a 4GHz overclock, but by revamping the timings to use a lower multiplier and a faster FSB I saw slight improvements in fill rate (and load times).
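
To put rough numbers on the multiplier/FSB trade, here is a quick sketch. The values are made up for illustration and are not my actual settings. It only assumes the usual relationship, core clock = FSB x multiplier, and shows two ways to land on the same 4GHz core clock. The second one runs the bus faster, which is presumably where any extra throughput would come from.

```python
def core_clock_mhz(fsb_mhz, multiplier):
    """Core clock in MHz, assuming core clock = FSB x multiplier."""
    return fsb_mhz * multiplier


# Hypothetical example settings (illustrative only, not measured values).
configs = [
    ("higher multiplier, slower FSB", 400.0, 10.0),
    ("lower multiplier, faster FSB", 500.0, 8.0),
]

for label, fsb, mult in configs:
    core = core_clock_mhz(fsb, mult)
    print(f"{label}: FSB {fsb:.0f} MHz x {mult:g} = {core:.0f} MHz core")
```

Both rows come out to the same core clock, so any difference you see between two setups like these would be down to the bus and memory side rather than raw CPU speed.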

It may be worth tinkering with to see if you can mitigate the slowdowns a little. Try raising the system and graphics memory clock speeds a bit and see what happens.