Most Difficult Outages
Most challenging outages
In this chapter we detail some of the most challenging outages we have faced over many years, how we responded to them, and what we learned.
List of the most challenging outages
- Production keytab invalidated
- Production home folder deleted
- All boxes rebooted on New Year's morning
- Shared infrastructure oddly slows down
- Overloaded databases (deserves its own chapter)
Kafka-specific outages we faced
- Kafka home disks full
- Kafka running out of file descriptors
- Kafka running out of memory
- Kafka running out of network bandwidth
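Full disks and exhausted file descriptors are both resource-exhaustion failures, and a periodic headroom check can flag them before they turn into outages. The following is a minimal illustrative sketch, not the tooling used during these incidents. It assumes a Linux host (it reads `/proc/<pid>/fd` and uses `resource.prlimit`), and the thresholds and the function names `disk_free_fraction`, `fd_usage_fraction`, and `check_headroom` are hypothetical. It covers only disk space and file descriptors, not memory or network bandwidth.

```python
import os
import resource
import shutil

# Hypothetical alert thresholds; not taken from any real deployment.
DISK_FREE_MIN_FRACTION = 0.15  # warn when less than 15% of the disk is free
FD_USED_MAX_FRACTION = 0.80    # warn when 80% of the fd soft limit is in use


def disk_free_fraction(path: str) -> float:
    """Fraction of free space on the filesystem that holds `path`."""
    usage = shutil.disk_usage(path)
    return usage.free / usage.total


def fd_usage_fraction(pid: int) -> float:
    """Open file descriptors of `pid` divided by its RLIMIT_NOFILE soft limit.

    Linux-only: reads /proc/<pid>/fd and uses resource.prlimit, both of which
    need permission to inspect the target process (same user or root).
    """
    soft, _hard = resource.prlimit(pid, resource.RLIMIT_NOFILE)
    if soft == resource.RLIM_INFINITY:
        return 0.0  # no effective limit, so no meaningful fraction
    open_fds = len(os.listdir(f"/proc/{pid}/fd"))
    return open_fds / soft


def check_headroom(data_dir: str, pid: int) -> list[str]:
    """Return human-readable warnings for disk and fd pressure (empty if healthy)."""
    warnings = []
    free = disk_free_fraction(data_dir)
    if free < DISK_FREE_MIN_FRACTION:
        warnings.append(f"disk: only {free:.0%} free on {data_dir}")
    used = fd_usage_fraction(pid)
    if used > FD_USED_MAX_FRACTION:
        warnings.append(f"fds: {used:.0%} of the soft limit in use by pid {pid}")
    return warnings


if __name__ == "__main__":
    # Demo against this script's own process and working directory; in practice
    # you would pass the broker's pid and the directory holding its log data.
    for warning in check_headroom(".", os.getpid()):
        print("WARNING", warning)
```

Checking fractions rather than absolute counts keeps the same thresholds meaningful across hosts with different disk sizes and `ulimit -n` settings.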