Casting a spell works like this:

The client sends a command to the server. The server processes the command, records the event start, and sends the client confirmation that the event has started, at which point the client starts the animation. The server also records the event completion as pending at a certain time. The clock keeps running, and eventually the heartbeat reaches the prerecorded event time and the pending event executes. That is when the actual spell calculation is made and the results are computed and applied to the persistent world. The server then sends the client an event notification saying that the event completed and what its results were (damage/status messages, pretty graphics, or failure).

Even attack rounds work like this, although they're invisible to the client. When you initiate an attack round, the server records the action, and when the action's completion time (your delay) is reached, it processes the event and sends you the message that you just swung your sword.

Certain client-initiated actions can interrupt this timer. The most noticeable are job abilities: the server processes the ability and then inserts approximately 60 delay units' worth of wait into your pending event.
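A minimal sketch of that start/complete flow, assuming a single global queue of pending events ordered by completion tick and a heartbeat that advances one tick at a time. `EventServer`, `start_event`, `heartbeat`, and the `outbox` list standing in for messages to clients are hypothetical names, not SE's code.

```python
import heapq
import itertools
from dataclasses import dataclass, field
from typing import Callable


@dataclass(order=True)
class PendingEvent:
    due: int                                           # tick at which the event completes
    seq: int                                           # tie-breaker: equal ticks run in start order
    actor: str = field(compare=False)
    resolve: Callable[[], str] = field(compare=False)  # would compute/apply results; returns client message


class EventServer:
    """Hypothetical heartbeat-driven event loop (illustration only)."""

    def __init__(self) -> None:
        self.now = 0
        self.pending: list[PendingEvent] = []
        self.outbox: list[tuple[str, str]] = []        # (actor, message) sent to clients
        self._seq = itertools.count()

    def start_event(self, actor: str, name: str, duration: int,
                    resolve: Callable[[], str]) -> None:
        # Record the start and the pending completion, then confirm to the
        # client, which begins its animation on receipt.
        event = PendingEvent(self.now + duration, next(self._seq), actor, resolve)
        heapq.heappush(self.pending, event)
        self.outbox.append((actor, f"start {name}"))

    def heartbeat(self) -> None:
        # Advance the clock and execute every pending event whose time has come.
        self.now += 1
        while self.pending and self.pending[0].due <= self.now:
            event = heapq.heappop(self.pending)
            self.outbox.append((event.actor, event.resolve()))


server = EventServer()
server.start_event("rdm", "Fire", duration=3, resolve=lambda: "Fire: 120 damage")
for _ in range(3):
    server.heartbeat()
```

The spell's result only exists after the third heartbeat; until then the client has nothing but the start confirmation to animate.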

Client initiates attack round -> record start
Client initiates Provoke -> process Provoke -> insert wait
Server processes attack round
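The Provoke timeline can be sketched like this, assuming delay is measured in the same units as the clock and that a job ability simply pushes the pending attack round's completion time back. The ~60-unit figure comes from the description above and is approximate; the 240 weapon delay is made up, and `Actor`, `start_attack_round`, `use_job_ability`, and `tick` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional

JOB_ABILITY_WAIT = 60  # "approx 60 delay units"; treat as approximate


@dataclass
class Actor:
    name: str
    delay: int                                  # weapon delay, in delay units
    pending_due: Optional[int] = None           # completion time of the pending attack round
    log: list[str] = field(default_factory=list)


def start_attack_round(actor: Actor, now: int) -> None:
    # Record the start; the round completes once the actor's delay has elapsed.
    actor.pending_due = now + actor.delay


def use_job_ability(actor: Actor, ability: str) -> None:
    # The ability itself is processed right away...
    actor.log.append(f"{actor.name} uses {ability}")
    # ...and wait is inserted into the pending attack round, if there is one.
    if actor.pending_due is not None:
        actor.pending_due += JOB_ABILITY_WAIT


def tick(actor: Actor, now: int) -> None:
    # Heartbeat check: process the attack round once its completion time is hit.
    if actor.pending_due is not None and now >= actor.pending_due:
        actor.log.append(f"{actor.name} swings")
        actor.pending_due = None


war = Actor("war", delay=240)
start_attack_round(war, now=0)
use_job_ability(war, "Provoke")   # completion moves from 240 to 300
```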

In the case of auto-attack, that just means the server automatically creates a new event starting your next attack round as soon as your previous attack round finishes.
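Auto-attack chaining can be sketched as a completion handler that schedules the next round, assuming a fixed weapon delay and no interruptions. `run_auto_attack` is a hypothetical helper.

```python
import heapq


def run_auto_attack(delay: int, until: int) -> list[int]:
    """Return the ticks at which attack rounds complete, up to `until`."""
    pending = [delay]               # first round started at tick 0
    completed: list[int] = []
    while pending and pending[0] <= until:
        due = heapq.heappop(pending)
        completed.append(due)                    # process this round (the swing)
        heapq.heappush(pending, due + delay)     # its finish starts the next round
    return completed


print(run_auto_attack(delay=240, until=1000))   # [240, 480, 720, 960]
```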

Theoretically, there is nothing stopping the server from having multiple event starts and completions intermixed:

Client starts attack round -> process event start
Client starts spell cast -> process event start
Attack round completes -> process attack round
Spell completes -> process spell event
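If the engine did allow it, each actor could keep its own small queue of pending events. This sketch, assuming a per-actor min-heap keyed by completion tick, reproduces the interleaving above; `MultiEventActor` is a hypothetical name, and nothing here reflects how SE's engine actually works.

```python
import heapq


class MultiEventActor:
    """Hypothetical actor that may hold several pending events at once."""

    def __init__(self) -> None:
        self.pending: list[tuple[int, str]] = []   # min-heap of (due_tick, event_name)

    def start(self, now: int, event: str, duration: int) -> str:
        # Process the event start and queue its completion alongside any others.
        heapq.heappush(self.pending, (now + duration, event))
        return f"{now}: start {event}"

    def heartbeat(self, now: int) -> list[str]:
        # Complete every pending event that is due, in completion-time order.
        done = []
        while self.pending and self.pending[0][0] <= now:
            _, event = heapq.heappop(self.pending)
            done.append(f"{now}: complete {event}")
        return done


rdm = MultiEventActor()
log = [rdm.start(0, "attack round", 4), rdm.start(1, "spell cast", 5)]
for now in range(1, 7):
    log += rdm.heartbeat(now)
```

Between ticks 1 and 4 the actor holds two pending events, which is exactly what the single-slot model rules out.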

It would seem that SE's server is programmed so that an actor (any object capable of independently processed events) can only have one pending action. This limits what the programmers can do, but it saves memory and helps keep the server responsive. Having multiple pending actions would create a much larger processing load, since each action would have to be evaluated during every heartbeat. With fewer than 32 players this isn't an issue, but as the number of players and AI-controlled monsters goes up, so does the amount of memory and processing power required. It was a cost decision on their part, though kind of archaic by today's standards.
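For contrast, the one-pending-action model needs only a fixed slot per actor, and the heartbeat does a single comparison per actor per tick. What the real server does when a second pending action arrives (refuse it, cancel the first, or delay it) isn't stated; this hypothetical `SingleSlotActor` simply refuses it as an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SingleSlotActor:
    """Hypothetical actor that can hold at most one pending action."""
    name: str
    pending: Optional[tuple[int, str]] = None   # (due_tick, action) or None

    def start(self, now: int, action: str, duration: int) -> bool:
        # Assumption: a second pending action is refused while the slot is full.
        if self.pending is not None:
            return False
        self.pending = (now + duration, action)
        return True

    def heartbeat(self, now: int) -> Optional[str]:
        # One comparison per actor per tick, however busy the actor is.
        if self.pending is not None and self.pending[0] <= now:
            _, action = self.pending
            self.pending = None
            return action
        return None


rdm = SingleSlotActor("rdm")
rdm.start(0, "attack round", 4)
rdm.start(1, "spell cast", 5)   # refused: the slot is still occupied
```

The fixed-size slot is what keeps per-actor memory and per-tick work constant as the actor count grows.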

Also, reaction events like evades, guards, and parries are not generated by the client. They happen in response to an event generated by another actor (the monster) and are therefore calculated during that monster's action.
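A sketch of reactions resolved inside the attacker's event, assuming independent evade, parry, and guard checks made in that order (the real order and formulas aren't given here). `Defender`, `resolve_monster_swing`, and the probability fields are hypothetical; `roll` is injectable so the outcome can be made deterministic.

```python
import random
from dataclasses import dataclass
from typing import Callable


@dataclass
class Defender:
    name: str
    evade_chance: float   # hypothetical per-swing probabilities in [0, 1)
    parry_chance: float
    guard_chance: float


def resolve_monster_swing(monster: str, target: Defender, damage: int,
                          roll: Callable[[], float] = random.random) -> str:
    # Runs as part of the monster's event. The target has no pending event
    # for its reactions; they are rolled here on its behalf.
    if roll() < target.evade_chance:
        return f"{target.name} evades {monster}'s attack"
    if roll() < target.parry_chance:
        return f"{target.name} parries {monster}'s attack"
    if roll() < target.guard_chance:
        return f"{target.name} guards {monster}'s attack"
    return f"{monster} hits {target.name} for {damage} damage"


mnk = Defender("mnk", evade_chance=0.2, parry_chance=0.0, guard_chance=0.3)
print(resolve_monster_swing("Goblin", mnk, damage=45))
```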

What may seem like an easy solution to a player can often be very complex from the server's point of view. Having multiple pending events for a single actor isn't impossible, or even really that hard, BUT the server's event engine has to be programmed for it. This wouldn't be a small programming effort; it's not just a matter of changing a few action scripts, recompiling the module, and restarting the server.

In short, SE's not going to do it just for RDM. Maybe for BLU, WAR, or SAM, but not RDM.