In 1.x the intention was to take the graphical styling of XI and push the fidelity as far as they could. To that end, the graphics had an ASTOUNDINGLY high polygon count, to the point where even simple objects were hitting counts that character models reach today (aka: the OP flower pots). However, in chasing that ideal they went a good bit too far and ended up with a graphical system that almost no computer of the day could handle. Even by today's standards the graphics were overkill in terms of processing power.
Because of this, the cutscenes needed to be just as graphically stunning, so they all had to be prerendered cinematics, whereas the cutscenes we have today are mostly rendered with the in-game engine. The result (since the game had to be rebuilt graphically from the ground up to a sensible standard) is that the cutscenes have much lower graphical fidelity, but they can also do a lot of things that the pre-renders can't, such as account for differences in the player character's appearance, class and gear.
Would it be nice to have that level of graphical detail back? Yes, but the computing resources needed to make that feasible in-game are significant, and it would likely come at the expense of playability.
Edit-
For the voices... Voice actors move on to other things, some stop doing VA work, and some (as we unfortunately know all too well) pass away while the characters they voiced live on. Unless the content can be produced quickly and consistently, you're probably going to see voice changes here and there, which is why so many MMOs have little to no voice acting for recurring characters.