  1. #11
    Player
    Ayrie
    Join Date
    Aug 2013
    Posts
    136
    Character
    Ayrie Lumire
    World
    Mateus
    Main Class
    Dancer Lv 80
    Quote: Originally Posted by TaleraRistain
    We know this isn't entirely true. They have taken down individual servers before more than once. And you have a pretty poor system if you design it in such a way that you can't take problem pieces down to fix them and have to take the whole thing down every time there is a problem with one area. Which it doesn't appear that they do, since as I mentioned, they've taken just the problem pieces down before.
    That specific, targeted server fix was likely a manual process, carried out by a particular person or team in response to something unplanned (like a failing SAN, or a lost blade on a cluster).

    I know I'm getting a little deep here, but the point is: emergency maintenance is likely reserved for exactly that kind of unplanned failure. This would not fall under "emergency" maintenance. They're not going to sit and measure server performance against load, and then reboot a server when the performance doesn't match what is expected for that load (in other words, people are idle and not consuming system resources). That would be what is known as a process. A temporary process, but a process. And you want processes to be divorced from human intervention as much as possible. Humans work slower, are less efficient, and are more accident-prone than scripts and code. If possible, you want code to handle the event.

    Likely, what I would do is set up a cron job (a task scheduler entry) to send out the notices and then actually kick off the reboot job. Your team then monitors the job, because this is pretty high profile and you'll want eyes on it, to make sure everything executes properly and to respond quickly if things don't go well.
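
    To make that concrete, here's a rough sketch of the kind of script a cron entry could call once a minute. This is purely my own illustration, not anything Square actually runs: the warning offsets, the function names, and the example date are all made up, and the "execute" callback stands in for whatever would really broadcast the notice or restart the server.

```python
from datetime import datetime, timedelta

# Hypothetical warning offsets, in minutes before the reboot (an assumption).
NOTICE_OFFSETS_MIN = (30, 15, 5, 1)


def build_schedule(reboot_at, offsets=NOTICE_OFFSETS_MIN):
    """Return a time-ordered list of (when, action) pairs ending in the reboot."""
    events = [
        (reboot_at - timedelta(minutes=m), f"notice: restart in {m} min")
        for m in sorted(offsets, reverse=True)
    ]
    events.append((reboot_at, "reboot"))
    return events


def run_due(events, now, done, execute):
    """Run every event that is due and not yet done.

    Safe to call repeatedly (e.g. once a minute from cron): actions already
    recorded in `done` are skipped, so nothing fires twice.
    """
    for when, action in events:
        if when <= now and action not in done:
            execute(action)
            done.add(action)


if __name__ == "__main__":
    # Arbitrary example time; print the plan instead of touching any server.
    for when, action in build_schedule(datetime(2030, 1, 1, 4, 0)):
        print(when.isoformat(), action)
```

    In a real job the "done" set would have to be saved between runs (a small state file, say), and the output would go to a log so the team watching the job can see each step as it fires.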

    I would also venture to say that Square's team is meagre in size when compared to their server footprint.
    Last edited by Ayrie; 06-30-2017 at 09:32 PM.