Opening Book
Why invest in an outage process?
Many software teams will not experience a severe software outage more than once a year. To those teams that have achieved this high degree of quality in software release and operations engineering, bravo!
The state of software process in 2023 has not changed much since 1998, when I began my professional software career. Each team of developers invents its own standard for releasing software and managing outages. Most of these environments are not robust, and many of the quality standards are low or non-existent. Software change process is, in itself, enough material for a book on change management. Here we will instead focus on what to do when an outage is taking place. Later we will touch lightly on a few best practices for software change, but that is not the goal of this particular book.
This book is about what to do when things go wrong. Even if outages happen only once a year, or only occasionally, a given outage may be solved in a matter of minutes, or it may drag on for hours or days. With a small investment in preparation, you can handle these events with relative ease and a well-understood response when they do occur.
This book will walk through everything you need to know and do well during an outage. The primary goal of any business is to make money, and that means saving time and money wherever possible. By handling outages quickly and effectively, you will reduce reputational risk to your firm. You will also save time triaging the problem itself. Faster triage makes your team more effective, because less time is spent on call managing problems. The time, energy, and focus saved can be directed at improving your product, or at keeping the engineering staff from being overworked.