Over-extended downtime 2010.07.01

We had to schedule an extended downtime today to do some overdue database maintenance.

Over the 2 months that we have been live, vast amounts of logs have been produced, and this has been slowly degrading SQL server performance. It first became an issue last Sunday. Database load, coupled with the node deaths it caused, was the biggest reason the network did not set a new simultaneous users record last Sunday (we have set a new record each Sunday since release), even though signup/churn last week was more than enough for a new one.

We decided to do the monthly DB shrink, which is planned as part of the maintenance schedule, today instead of at the same time as the patch. The patch will be on Thursday and will unfortunately mean another extended downtime (it should be shorter, given the smaller DB).

During the operation today we saw opportunities for improvement, so future monthly database log cleanup sessions should be much shorter.

We have also escalated the acquisition of additional database cluster resources. The database cluster was built to grow with the network, so we have quite a lot of headroom there.

With the database cleanup done, we are good for some more weeks at the current rate of subscription growth. By that time we will have acquired the DB resources required, so we still have quite a bit of room to grow before we have to start a new cluster (shard the world).
