How to Handle Planned Downtime for Your API
When you turn the handle on your bathroom faucet, you feel confident that water will stream from the spigot. There is a lot of infrastructure, at either the municipal or property level, to keep the supply flowing. The same is true of APIs. When you request data from an API, you expect it to be available, though most developers would not place the same level of confidence in getting a 200 response from an API as they would in getting water from their faucet.
API downtime happens, and most of the time it can’t be avoided or predicted. But sometimes, such as for maintenance or upgrades, you do know when your API will be down, and you should let your users know, much as the city would announce a planned water outage. This post covers four steps API providers should take to handle planned downtime, with communication before, during, and after the outage:
- Avoid planned downtime
- Give developers a warning
- Update your status page
- Help developers retry
Nobody likes downtime. The four steps, described in detail below, will help you minimize its impact.
Step 1: Avoid Planned Downtime
Today’s apps operate in an era of high availability. Much as we do with our water taps, we can turn on computing capacity and pay for what we use, down to the hour. That likely means many of the reasons for planned downtime can be avoided with preparation.
Let’s say you need to upgrade your database servers. Could you keep copies running as masters while you perform the upgrade? Yes, but you’ll have to figure out how to sync any data written in the meantime before you switch back. Or could you accept downtime for writes and still be available for reads?
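As a rough illustration of the read-only option, here is a minimal Python sketch of a gate that keeps serving reads while writes are paused. The function name `gate_request`, the `writes_paused` flag, and the 30-minute window are all assumptions made for the example, not part of any particular framework.

```python
# Methods that only read data and can still be served while writes are paused.
READ_METHODS = {"GET", "HEAD", "OPTIONS"}


def gate_request(method: str, writes_paused: bool) -> tuple[int, dict[str, str]]:
    """Decide whether a request may proceed during a write freeze.

    Returns (status, extra_headers). A status of 200 means "hand the
    request to the real handler"; 503 means "reject it for now".
    """
    if writes_paused and method.upper() not in READ_METHODS:
        # Hypothetical 30-minute freeze, expressed in seconds.
        return 503, {"Retry-After": "1800"}
    return 200, {}
```

With `writes_paused` set, a GET passes through untouched while a POST is turned away with a hint about when to try again.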
Any approach that helps avoid downtime will take more effort. The outcome of those efforts is expanded trust from your users, a valuable asset for anyone who wants to see their API relied on.
Step 2: Give Developers a Warning
Some planned downtime is as unavoidable as sewer work shutting off your water for a few hours. Take a page from the water bureau playbook and make sure developers know in advance.
Unfortunately, it may not be as easy as hanging a note on every door handle in the neighborhood, but here are some ways to reach developers about your planned downtime:
- Schedule downtime in advance. Give at least a week of warning or this might not feel so “planned.” Remember that you’re giving developers a chance to prepare their systems to ensure they can handle the downtime, which will take time (and they have a lot of other things going on!).
- Send an email (or several). You have email addresses for developers, right? They'll want to receive an email announcing unavoidable planned downtime.
- Post a message on your site. Your documentation site and other developer portals are great places to warn of upcoming planned downtime. You’ll catch your most-engaged developers who are the most likely to be impacted by your maintenance.
- Post on forums, Twitter, et cetera. You want to get your message anywhere that developers might check. Heck, put it in your API response headers leading up to your downtime, similar to the API deprecation headers we’ve recommended previously:
X-API-Downtime-Date: 2017-08-03T06:00:00Z
Your goal with these communications is to reach as many developers as possible before you take your API down for planned maintenance.
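The response header idea above could be sketched in Python along these lines. The header name and timestamp come from the example; the function `with_downtime_notice` and the constant `DOWNTIME_START` are hypothetical names, and how you attach headers will depend on your own framework.

```python
from datetime import datetime, timezone
from typing import Optional

# Hypothetical start of the maintenance window, matching the header example.
DOWNTIME_START = datetime(2017, 8, 3, 6, 0, 0, tzinfo=timezone.utc)


def with_downtime_notice(headers: dict, now: Optional[datetime] = None) -> dict:
    """Return a copy of headers, adding the notice until maintenance begins."""
    now = now or datetime.now(timezone.utc)
    result = dict(headers)
    if now < DOWNTIME_START:
        # ISO 8601 timestamp in UTC, as in the header example.
        result["X-API-Downtime-Date"] = DOWNTIME_START.strftime("%Y-%m-%dT%H:%M:%SZ")
    return result
```

Because the notice stops being added once the window starts, the same code can stay deployed after the maintenance is over.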
Step 3: Update Your Status Page
Not every developer will see your maintenance announcement, though. If your API is unreachable or does not send expected results, a status page is the first place a developer will check for more information. A good status page clearly displays any active issues and provides insight into previous (and potentially future) downtime.
For example, you can check on Zapier’s own status at status.zapier.com. In the case of downtime (planned or otherwise), we would create an incident. You can see previous incidents to get a feel for what we post on our own status page.
Typically a status page is reactive, but you want to make sure your planned downtime proactively goes on here, too. Our page is powered by StatusPage.io, which has a scheduled maintenance feature that can even email and Tweet about upcoming maintenance.
When you schedule downtime, your status page should note it in a scheduled maintenance section, and should also alert anyone subscribed to receive notifications. For extended downtime, you may want to also send your own notifications. And while the maintenance is ongoing, you can update your scheduled maintenance post to share progress.
Regardless of whether you use StatusPage.io, another similar tool (such as status.io or the self-hosted Stashboard), or your own creation (a simple blog hosted elsewhere can work well), make sure you post details on the status page before, during, and after your downtime.
Step 4: Help Developers Retry
Communicating your scheduled maintenance is important, but so is helping developers deal with the downtime itself. The most important thing you can do is not go dark.
While this isn’t a repeat of Step 1 (avoid planned downtime), it’s very much related. The usable version of your API may not be reachable, but it can still provide a response. This is not nearly as hard as avoiding the downtime altogether, and helps developers differentiate your scheduled, temporary downtime from the accidental variety whose duration may not be known.
Include these two helpful pieces of data in your downtime API response:
- Status code
503 Service Unavailable
- Header
Retry-After: Wed, 21 Oct 2015 23:29:00 GMT
The HTTP status code 503 and Retry-After header will help both machines and developers understand that your downtime is temporary and tell them when you expect to return. These are the same suggestions Google gives to webmasters for website downtime.
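On the provider side, the 503 status and Retry-After header can be prepared ahead of time as a static response. The Python sketch below is one possible way to build it; the function name `maintenance_response` and the JSON error body are assumptions for illustration only.

```python
import json
from datetime import datetime, timezone
from email.utils import format_datetime


def maintenance_response(back_at: datetime) -> tuple[int, dict[str, str], bytes]:
    """Build a 503 response announcing when the API is expected back.

    back_at must be a timezone-aware datetime in UTC.
    """
    headers = {
        # usegmt=True yields the HTTP date form, e.g. "... 23:29:00 GMT".
        "Retry-After": format_datetime(back_at, usegmt=True),
        "Content-Type": "application/json",
    }
    # Hypothetical body; any short explanation for humans would do.
    body = json.dumps({"error": "scheduled_maintenance"}).encode("utf-8")
    return 503, headers, body
```

Calling it with `datetime(2015, 10, 21, 23, 29, tzinfo=timezone.utc)` produces the same Retry-After value shown in the example header.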
Retry-After, specified in RFC 2616 in 1999 and carried forward in the current HTTP specification (RFC 9110), takes either a full date value in GMT or a number of seconds to wait before trying again. We prefer the date version because it provides more information and can be prepared as a static response.
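On the consuming side, a client has to cope with both forms of the header. Here is a minimal Python sketch, assuming the caller has already read the header value from a 503 response. The function name `seconds_until_retry` is hypothetical, and malformed values are not handled.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def seconds_until_retry(retry_after: str, now: datetime) -> float:
    """Convert a Retry-After value (seconds or HTTP date) into a wait time."""
    value = retry_after.strip()
    if value.isdigit():
        # Delay form: a plain number of seconds.
        return float(value)
    # Date form: parse the HTTP date and measure the gap from now.
    retry_at = parsedate_to_datetime(value)
    if retry_at.tzinfo is None:
        # Treat a date without zone information as UTC.
        retry_at = retry_at.replace(tzinfo=timezone.utc)
    # Never return a negative wait if the date is already in the past.
    return max(0.0, (retry_at - now).total_seconds())
```

A client would typically sleep for the returned number of seconds, perhaps with a small random delay added, before sending the request again.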
Developers (And Your Users) Are Worth Your Effort
The four steps outlined in this post will take some work. At a minimum, you’ll have to spend some time thinking about and discussing how best to handle maintenance projects. You may end up spending hours of preparation to avoid minutes of downtime, or to avoid making those minutes of downtime stressful for those who use your API. To make that tradeoff, you need to know that your API is important to your business.
Remember that most developers are either customers themselves or building tools used by your customers. API uptime builds and maintains trust, which will keep developers building.