Postmortem: April 26th, 2019 Downtime

Null

Ooperator
kiwifarms.net
TL;DR: We are a big boy site with big boy problems that cost money to adequately deal with. If you want to support the site, get Brave, get an Uphold account, and regularly contribute BAT. It's as good as cash. It feeds me and pays the bills.

Problem
The server's hard drive failed.

Timeline
At about 8pm Moscow time, the server shut off. No one pinged me, so it wasn't until I naturally F5'd the site at 9pm Moscow time and saw it wasn't responsive that I started my diagnosis. I determined pretty quickly that the server was off and booted it. The web server VM was read-only and nothing worked. I determined the disk was in read-only mode, ran a check on it, and encountered an error. I rebooted the server and found that parts of the disk were corrupt. I checked these and hoped that only a few random attachments were lost, because I can restore those easily. I did restore those attachments, but the database failed to boot and basically said it was fucked. I checked the database backups, which occur at 12-hour intervals, and saw the last one had been taken 11 hours and 30 minutes before the site went down. The import process took 2 hours. The site came up at about 3:30am Moscow time.
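
For anyone who wants the numbers spelled out, here is a rough sketch of the arithmetic implied by the timeline. The clock times are the approximate ones given above; the calendar dates and all variable names are assumptions made only so the subtraction works, and only the differences matter.

```python
from datetime import datetime, timedelta

# Approximate Moscow-time clock readings from the timeline.
# The calendar dates are an assumption; only the differences matter.
server_down = datetime(2019, 4, 26, 20, 0)   # server shut off at about 8pm
noticed = datetime(2019, 4, 26, 21, 0)       # outage noticed at 9pm
site_back = datetime(2019, 4, 27, 3, 30)     # site came up at about 3:30am

# Figures stated directly in the timeline.
backup_age_at_failure = timedelta(hours=11, minutes=30)
import_duration = timedelta(hours=2)

# Derived quantities.
detection_delay = noticed - server_down      # time before anyone noticed
total_downtime = site_back - server_down     # shutdown until the site was back
hands_on_time = site_back - noticed          # diagnosis, repair, and import
data_loss_window = backup_age_at_failure     # writes newer than the last backup

def hours(delta: timedelta) -> float:
    """Express a timedelta as fractional hours."""
    return delta.total_seconds() / 3600

print(f"detection delay:  {hours(detection_delay):.1f} h")
print(f"total downtime:   {hours(total_downtime):.1f} h")
print(f"hands-on time:    {hours(hands_on_time):.1f} h")
print(f"of which import:  {hours(import_duration):.1f} h")
print(f"data loss window: {hours(data_loss_window):.1f} h")
```

With backups every 12 hours, the worst case is losing just under 12 hours of writes, so a backup that was 11 hours and 30 minutes old is close to that worst case.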

Remedies going forward
All services that are not the actual KF have been moved off the KF's server to new VPSs. In particular, our in-house analytics suite is being reinstalled on a different machine. The analytics DB is actually bigger than the KF's, so this will about halve the strain on the server and the disk -- I hope.

I'm also going to go ahead and start up replication because, now that the DB is so big, SQL file backups are not feasible as a primary backup system. The backups themselves take many minutes to complete, which can create issues in restoration because the data doesn't match up. Replication is going to run on existing hardware I own.
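
To illustrate the "data doesn't match up" problem, here is a toy model in Python. It assumes a dump that copies tables one at a time while the site keeps taking writes, with no consistent snapshot; whether that matches the actual backup setup isn't stated here. The table layout and every function name are hypothetical.

```python
import copy

def dump_without_snapshot(db, table_order, writes_during_dump):
    """Copy tables one at a time; live writes land between the copies."""
    snapshot = {}
    for table in table_order:
        snapshot[table] = copy.deepcopy(db[table])
        # Writes that arrive after this table has already been copied.
        for write in writes_during_dump.get(table, []):
            write(db)
    return snapshot

def register_and_post(db):
    """A new user signs up and immediately makes a post."""
    db["users"].append({"id": 2})
    db["posts"].append({"id": 11, "user_id": 2})

def orphaned_posts(snapshot):
    """Posts whose author is missing from the same snapshot."""
    known_users = {user["id"] for user in snapshot["users"]}
    return [post for post in snapshot["posts"] if post["user_id"] not in known_users]

live_db = {
    "users": [{"id": 1}],
    "posts": [{"id": 10, "user_id": 1}],
}

# "users" is copied first, then the signup and post happen, then "posts" is copied.
backup = dump_without_snapshot(
    live_db,
    table_order=["users", "posts"],
    writes_during_dump={"users": [register_and_post]},
)

orphans = orphaned_posts(backup)
print(orphans)  # [{'id': 11, 'user_id': 2}]
```

The live database is consistent the whole time; only the backup ends up with a post whose author it never captured. The longer the dump takes, the more room there is for this kind of mismatch.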


My big fear right now is that the main server is actually fucked. If the disk corrupts again, I have to replace the corrupted disks and RAID new ones. Our installation is now several terabytes, so that's not cheap: it's to the tune of hundreds of dollars that I don't have. So fingers crossed it's just some random fluke issue.
 

Reverend

Avatar of Change
kiwifarms.net
Mentioned in the Discord server, but once I graduate from college in about two years or so, would you be able to let me volunteer to help out with back-side server stuff?

Tech moves as fast as a Hot Pockets+Mt. Dew Tendie mix through a 4Chanr's small intestine. If you want to help out on the site, learn the ins and outs of Relational Database Systems, Load Balancing, and Cross Data Center replication, and get either AWS/Azure/Google Cloud certs.

Then, and really only then, would I allow someone fresh out of university to help out with infrastructure. Beyond that all I'd want you to do is check a dashboard and say "yup, the site's down, time to message the On Call crew."

Best Advice: You build, run, and maintain your own system, your own playground, you GitGud3.0 on your own VPSs, and then you have the knowledge. Not the best, not the most, but you at least know what's run in the real world and not in some University/College simulation.
 

Sidon's fleshlight

fig 24. A "balanced and nutritious" vegan meal
kiwifarms.net
I'm well aware of the cloud/server certs that are current at the moment, so I'll be looking into those, especially since I want to look for a job maintaining the security of such technologies. I'm just getting out of sophomore year of college, and it's a process to get through everything, but I'm still pushing my way through it.
 

Reverend

Avatar of Change
kiwifarms.net
AWS Solutions Architect was the hardest cert I ever studied for, but it's given me the best ROI yet.
 

KeyserBroze

Sleep is for people who run out of Cocaine
kiwifarms.net
Any volunteers for the Farms Tupperware party?

In all seriousness, is there a public BTC/XMR address for the site?
 

BOLDYSPICY!

ONE MORE COD REJECTED, I AM THE PUFF INSIDE YOU
True & Honest Fan
kiwifarms.net
Brave is actually pretty easy to use & donate & isn't absolute fucking garbage on mobile. It takes a little getting used to, but it's a good browser.

EDIT: I actually like doing shit this way much better than futzing around with other cryptocurrency, opening up a Bitcoin wallet, etc. Seriously, donating takes all of two seconds.
 