Despite the name, a well-planned Chaos Day delivers a carefully designed series of experiments that test for specific weaknesses within an application or system. Here, we run through the key steps in planning your next chaos engineering game day.
Before you start
There are a number of questions that it’s important to ask before you start planning the specifics of a Chaos Day. Without a clear goal in mind, you could just end up causing chaos that doesn’t deliver any learning or system improvements.
So ask yourself – who do I need to attend my Chaos Day? What systems and processes do I want to focus on? How will we measure success? Do we have a specific budget? Where will the Chaos Day take place?
Whether you’re experimenting on a single service or at scale on an entire digital platform, planning your Chaos Day is essential to make the most of your investment of time and energy. While the process is similar regardless of scale, the organisational complexity, commitment and elapsed time increase with the number of services and teams involved. Because of this, our advice is to start small, so you can learn and adapt the process to your particular situation. Start with one service or team, not an entire engineering platform, then grow incrementally with each subsequent Chaos Day.
The risks involved with chaos also increase with the number of systems and teams involved. For this reason, we advise building in small, time-boxed system risk assessments using a framework such as FAIR (Factor Analysis of Information Risk). This gives you the chance to explore which failures might happen, how often they might occur and the magnitude of their impact.
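FAIR estimates risk from two main factors: loss event frequency and loss magnitude. As a rough sketch, assuming single point estimates rather than the calibrated ranges and Monte Carlo simulation a full FAIR analysis would normally use, you could rank candidate failure scenarios by the product of the two. The `FailureScenario` type, the scenario names and all the figures below are hypothetical placeholders.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class FailureScenario:
    name: str
    loss_events_per_year: float  # estimated loss event frequency (hypothetical)
    loss_per_event: float  # estimated loss magnitude in any currency (hypothetical)

    @property
    def annualised_loss(self) -> float:
        # Point-estimate simplification: frequency x magnitude
        return self.loss_events_per_year * self.loss_per_event


def rank_by_risk(scenarios: list[FailureScenario]) -> list[FailureScenario]:
    # Highest expected annual loss first, so the riskiest failures are assessed first
    return sorted(scenarios, key=lambda s: s.annualised_loss, reverse=True)


# Hypothetical scenarios and figures, for illustration only
scenarios = [
    FailureScenario("database primary fails over", 2.0, 5_000.0),
    FailureScenario("payment provider times out", 12.0, 800.0),
    FailureScenario("network partition between zones", 0.1, 250_000.0),
]

for scenario in rank_by_risk(scenarios):
    print(f"{scenario.name}: {scenario.annualised_loss:,.0f} per year")
```

A rare but severe failure can outrank a frequent, cheap one, which is exactly the kind of trade-off a time-boxed assessment should surface before you decide what to break.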
Identify target teams
Running a Chaos Day takes up people’s time and access to systems, so it needs to be carefully scheduled. The benefits of a Chaos Day may seem less compelling to engineers who are busy working on new features, so let people know the Chaos Day is happening and communicate its value and expected outcomes.
Several weeks before your Chaos Day, start to plan your Chaos Day team and ask people to ‘save the date’. Check out our separate post on when is the right time to schedule a Chaos Day. Make sure you have an appropriate venue secured for the day.
Your agents of chaos are the people who will carry out the experiments. Gathering this team together to brainstorm ‘what if’ scenarios helps you generate ideas for candidate experiments.
Identify target services and malfunctions
At the same time as planning your Chaos Day team, you need to think about what experiments you’ll run, and what you’re testing for. You might have a specific system that needs testing for resilience ahead of an expected peak in traffic. Or you might be considering a new process or integration that needs to be tested for potential bugs or unexpected events.
Being clear about which systems are to be tested, and why, also helps senior stakeholders understand the value and purpose of the Chaos Day.
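One simple way to seed the ‘what if’ brainstorm is to pair each target service with each malfunction you care about, then prune the list down to the combinations that matter for your upcoming traffic peak or new integration. The sketch below assumes a hypothetical `CandidateExperiment` type, and the service names and malfunctions are placeholders to replace with your own.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class CandidateExperiment:
    service: str
    malfunction: str

    def what_if(self) -> str:
        # Phrase each candidate as a question for the brainstorm
        return f"What if {self.service} experiences {self.malfunction}?"


# Hypothetical target services and malfunctions
services = ["checkout-api", "search-service"]
malfunctions = ["an instance crash", "added network latency"]

# Every service paired with every malfunction, as a starting list to prune
candidates = [CandidateExperiment(s, m) for s, m in product(services, malfunctions)]

for candidate in candidates:
    print(candidate.what_if())
```

The full cross-product grows quickly, which is another reason to start with one service or team and widen the scope on later Chaos Days.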
Plan and design experiments
Start by agreeing on baseline data that describes normal system performance (often called the steady state), then plan experiments that let you monitor performance when the system is put under strain. Typically, chaos experiments introduce variables that reflect events such as a server crash, network failure or hard drive malfunction.
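As an illustration of comparing against a baseline, the sketch below summarises normal behaviour as a 95th-percentile latency and an error rate, then checks whether the metrics observed while a fault is injected stay within an agreed tolerance. The `SteadyState` type, the `measure` and `within_tolerance` helpers, the sample data and the tolerance thresholds are all hypothetical; agree the real metrics and limits with your team before the day.

```python
import statistics
from dataclasses import dataclass


@dataclass
class SteadyState:
    p95_latency_ms: float
    error_rate: float


def measure(latencies_ms, errors, requests):
    # statistics.quantiles(n=20) returns 19 cut points; the last is the 95th percentile
    return SteadyState(
        p95_latency_ms=statistics.quantiles(latencies_ms, n=20)[-1],
        error_rate=errors / requests,
    )


def within_tolerance(baseline, observed, latency_factor=1.5, max_error_rate_increase=0.01):
    # Hypothetical thresholds: agree real ones with the team before the Chaos Day
    return (
        observed.p95_latency_ms <= baseline.p95_latency_ms * latency_factor
        and observed.error_rate <= baseline.error_rate + max_error_rate_increase
    )


# Hypothetical samples: normal running vs. while an instance crash is simulated
baseline = measure([120, 125, 130, 128, 135, 140, 122, 131, 138, 126], errors=2, requests=1000)
during_fault = measure([180, 240, 260, 210, 300, 275, 190, 250, 320, 230], errors=40, requests=1000)

print("hypothesis held" if within_tolerance(baseline, during_fault) else "hypothesis rejected")
```

Writing the tolerance down before the experiment turns “the system should cope” into a hypothesis that either holds or is rejected, rather than a judgement made after the fact.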
We recommend allowing around two to four weeks for experiment design, making sure to confirm dates and target environments with stakeholders. Be sure to communicate with stakeholders about what, why, where, when and how experiments will happen. For more guidance on what experiments to run on a Chaos Day, see our Playbook.
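To keep the what, why, where, when and how consistent across stakeholder updates, you might capture each experiment as a small structured brief. The sketch below assumes a hypothetical `ExperimentBrief` type and placeholder values; the default four-week design window simply mirrors the upper end of the suggested design period.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class ExperimentBrief:
    what: str
    why: str
    where: str  # target environment
    when: date  # the Chaos Day itself
    how: str

    def design_window_start(self, weeks: int = 4) -> date:
        # Date by which experiment design should begin
        return self.when - timedelta(weeks=weeks)

    def summary(self) -> str:
        # Plain-text summary suitable for a stakeholder update
        return (
            f"What: {self.what}\n"
            f"Why: {self.why}\n"
            f"Where: {self.where}\n"
            f"When: {self.when.isoformat()}\n"
            f"How: {self.how}"
        )


# Hypothetical experiment, for illustration only
brief = ExperimentBrief(
    what="Terminate one checkout-api instance",
    why="Confirm traffic is routed around the failed instance",
    where="staging",
    when=date(2024, 6, 12),
    how="Stop the instance manually and watch latency and error dashboards",
)

print(brief.summary())
print("Start design by:", brief.design_window_start())
```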
Communicate at all times
Throughout your planning process, ensure that stakeholders are aware of what is being planned, and the potential impact on business as usual.
Ahead of your Chaos Day, you may also need to distribute pre-event materials to your chosen agents of chaos, or an agenda for the day, so they are fully prepared.
If you’d like to find out more about how to plan, organise and run your own Chaos Day, don’t miss our Chaos Days Playbook.