I am frequently asked how to get better at incident response. My answer is typically not better observability, ChatOps, or automation. Although these are important ingredients, my response is simply: Practice, Practice, Practice.
This week's 30sec SRE features "Wheel of Misfortune". It's a game that aims to build confidence in on-call engineers via simulated outage scenarios. The exercise involves spinning a metaphorical wheel that randomly selects a failure scenario for the team to work through. The goal is to see how well the team can respond to these simulated incidents, recover from them, and mitigate their impact.
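For teams that want to try this, the "spin" itself can be as simple as a random pick from a list of prepared scenarios. The following is only a minimal sketch: the `Scenario` class, the `spin_wheel` function, and the three example scenarios are hypothetical and made up for illustration. A real exercise would draw on your own past incidents and have a facilitator play the role of the system.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Scenario:
    """One simulated outage the on-call engineer has to work through."""
    name: str
    symptom: str  # what the facilitator tells the engineer at the start


# Hypothetical scenarios for illustration; replace with your own.
SCENARIOS = [
    Scenario("database failover", "Write latency spikes after a primary switch."),
    Scenario("expired certificate", "Clients report TLS handshake errors."),
    Scenario("full disk", "A logging volume reaches 100% and the service stalls."),
]


def spin_wheel(scenarios, rng=None):
    """Pick one scenario at random; pass a seeded rng for a repeatable pick."""
    if not scenarios:
        raise ValueError("the wheel needs at least one scenario")
    if rng is None:
        rng = random.Random()
    return rng.choice(scenarios)


if __name__ == "__main__":
    scenario = spin_wheel(SCENARIOS)
    print(f"Today's misfortune: {scenario.name}")
    print(f"Opening symptom: {scenario.symptom}")
```

Passing a seeded `random.Random` is optional. It is there only so a facilitator can reproduce the same pick when preparing the session in advance.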
------------------------------
Ingo Averdunk
Distinguished Engineer
IBM
------------------------------