To be honest, the thing I think would take the most investment, but provide the most long-term mileage, would be to devote a portion of their engineering staff to unwinding all that 1.0 spaghetti code. It's not an easy thing to do; I've spent 3+ years already at my current job trying to do something similar with a codebase perhaps 1% the size. But there are so many serious improvements they could make to this game that are more or less hard-blocked by legacy code and decade-old server design. The fact that they had to wipe partial-level job XP along with the XP scaling changes in EW is a good example: they said doing it without wiping those records would have meant taking the servers down for weeks. That's a really big red flag about how their persistence system is set up.
That's not to blame them for it. 1.0 played rather fast and loose with development, and most of what came after was either a frenetic recovery effort for the game as a whole, or an attempt to keep up with content without toppling the house of cards they somehow managed to balance on all that 1.0 broken glass. The devs have done admirably with the hand they've had to play. But I think they'd gain an immense amount of mileage, in terms of unlocking and streamlining further development, by unwinding some of that and putting better, more stable, more maintainable and extensible foundations in place. That's especially true given the massive boom the game has seen lately, the corresponding strain on its infrastructure, and the flaws that strain has exposed at times.
The 2002 error punting you out of the queue at EW launch was a brilliant example of this, actually. The way the client used to be configured, every 15 minutes you spent in the queue, it would drop and reopen its connection to the server. The server treated that as a brand-new connection, the same as someone hitting "play" from the main menu, and throttled those connections based on server capacity, rejecting a portion of them to keep itself from becoming overloaded. And the client wouldn't retry: if the server rejected its connection, it would simply give up. It wouldn't even fail fast. Despite knowing essentially instantly whether the connection had succeeded, it would wait 20-30s to tell the user. And then users were only given ~60s to reconnect to the server or lose their place in line (and that 60s included the 20-30s the client had just sat on the failure before telling the user about it).
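To make the failure mode concrete, here's a rough sketch of that loop in Python. The timing constants are the ones described above; everything else (the names, the load model, the error handling) is made up for illustration and obviously isn't the actual client or server code.

```python
import random
import time

# Illustrative sketch of the queue behavior described above. The timing
# constants come from the post; the function/variable names and the load
# model are invented for the example, not taken from the real client.

QUEUE_REFRESH_INTERVAL = 15 * 60   # client drops/reopens its connection this often
REJECT_NOTICE_DELAY = 25           # client sits on a rejection ~20-30s before telling you
RECONNECT_WINDOW = 60              # total time allowed to get back in before losing your spot

def server_accepts(load: float) -> bool:
    """Server treats every refresh as a brand-new login and sheds a
    fraction of them when it's under heavy load."""
    return random.random() > load          # e.g. load=0.3 -> ~30% of refreshes rejected

def queue_refresh(load: float) -> None:
    time.sleep(QUEUE_REFRESH_INTERVAL)     # wait out the refresh period
    if server_accepts(load):
        return                             # still in line; nothing visible to the user
    # Rejected: no automatic retry, and the client delays the error dialog,
    # eating most of the window the user would need to reconnect manually.
    time.sleep(REJECT_NOTICE_DELAY)
    remaining = RECONNECT_WINDOW - REJECT_NOTICE_DELAY
    raise SystemExit(f"2002: reconnect within ~{remaining}s or lose your place in line")
```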
Nearly all of that is somewhere north of "questionable" from a network and application design and robustness perspective, but it never mattered until now, because the server had never been placed under enough strain that it had to start blocking connections to save itself from overload. There's a lot of ground to be gained here, even if it's not "hey, look, a new raid!"
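For contrast, and purely as a sketch of the kind of robustness I mean (the function name, the backoff numbers, and the attempt_connection callback are all mine, not SE's):

```python
import random
import time

# Equally hypothetical: roughly what a more forgiving client could do instead.
# The two key changes are (a) retry the refresh automatically with backoff and
# (b) only bother the user once the reconnect window is nearly gone.

def refresh_with_backoff(attempt_connection, window: float = 55.0) -> bool:
    """Retry a rejected queue refresh with capped, jittered exponential backoff,
    surfacing an error only when the window is about to run out."""
    deadline = time.monotonic() + window
    delay = 1.0
    while time.monotonic() < deadline:
        if attempt_connection():           # e.g. the refresh handshake
            return True                    # still in line; user never sees anything
        time.sleep(min(delay, max(0.0, deadline - time.monotonic())))
        delay = min(delay * 2, 10.0) + random.uniform(0.0, 0.5)   # capped backoff + jitter
    return False                           # only now show the 2002 and start the clock
```

Nothing exotic there, just standard retry hygiene, but it's exactly the kind of foundation-level work that never shows up as a bullet point on a patch trailer.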