In-reply-to » This weekend (as some of you may now) I accidently nuke this Pod's entire data volume 🤦‍♂️ What a disastrous incident 🤣 I decided instead of trying to restore from a 4-month old backup (we'll get into why I hadn't been taking backups consistently later), that we'd start a fresh! 😅 Spring clean! 🧼 -- Anyway... One of the things I realised was I was missing a very critical Safety Controls in my own ways of working... I've now rectified this...

This is an example of what I believe every SRE should master and whatever Post Incident Review (PIR) should focus on. Where did the system fail. What are the missing or incomplete Safety Controls.

⤋ Read More