As most of you know, we experienced our first major hiccup since we opened over a year ago. Over a period of 72 hours, several of our key DNS systems went offline while we were in the process of transitioning to our new in-house DNS system.
Shared Hosting took the full force of the hit as it knocked web, FTP, mail, and DNS offline all at the same time.
We totally understand that we screwed up big time this week and we would like to assure all of our members that we have added several brand new redundancy systems to ensure that if one system goes down, it won’t take everything else with it in the future.
While we always try to be 100% excellent in everything we do, from time to time things do go wrong. The important thing to do next is to learn why it went wrong, what should have been done differently, and then to add extra safeguards to make sure it doesn’t happen again.
Why did this happen?
The UPS System
To take advantage of the service loss we were experiencing, we decided to move deployment of our new DNS system forward so that we would only have one block of downtime instead of two. Clearly this was a mistake.
Squidwolf Syndicate was originally a service provided by Project: Hazel. While most of our systems now operate completely independently of them, DNS was the last service they still provided. On 15 December we were (and still are) scheduled to activate our own in-house DNS system.
Remedy: No more unscheduled deployments, ever.
DNS not configured
It turns out that instead of severing ourselves from Project: Hazel and activating our own DNS, the two systems were competing with each other inside our own network. This is why nobody could receive a DNS record during the crisis.
Remedy: After Potion Forest DNS is deployed, we will no longer be reliant on third-party services.
Why am I telling you this?
When we screw up, we will always tell you. We will never make excuses as that doesn’t help anyone. We learn from our mistakes and we listen to your concerns. This is something that is written in our Charter and we intend to stick to it.
No two crises are the same, and as you can see from this one, there isn’t a single failure that we can blame.
Full transparency, always.
What we did to compensate our members: We aim for 99.9% uptime, which allows only about 43 to 45 minutes of unscheduled downtime each month, depending on the length of the month. As we massively exceeded this, nobody paid for services for the month of November 2014. Credit is automatically added to all member accounts.
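For anyone who wants to check the downtime allowance themselves, here is a minimal sketch of the arithmetic. It assumes the budget is simply 0.1% of the calendar month's length, which may not match exactly how any given SLA defines a month. The function names `downtime_budget_seconds` and `format_budget` are made up for this illustration.

```python
import calendar


def downtime_budget_seconds(year: int, month: int, uptime_target: float = 0.999) -> float:
    """Seconds of downtime allowed in one calendar month at the given uptime target.

    Assumption: the budget is (1 - target) of the month's total length.
    """
    days_in_month = calendar.monthrange(year, month)[1]
    total_seconds = days_in_month * 24 * 60 * 60
    return total_seconds * (1.0 - uptime_target)


def format_budget(seconds: float) -> str:
    """Render a number of seconds as 'M min S.s s'."""
    # Round first so floating-point noise does not leak into the output.
    minutes, secs = divmod(round(seconds, 1), 60)
    return f"{int(minutes)} min {secs:.1f} s"


# November 2014 has 30 days.
print(format_budget(downtime_budget_seconds(2014, 11)))  # 43 min 12.0 s
```

Under this assumption a 31-day month gets a slightly larger allowance than a 30-day one, so the exact figure shifts a little from month to month.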
If you still have questions, leave a comment below and we will reply to it. Alternatively, if you want to talk to us privately, just open a ticket!
All the best,