What If None of My Webhook Notifications Can Be Delivered?

The preceding article explains what happens if a single event notification (or two, three, or any other number of notifications) can’t be delivered. But suppose something more catastrophic happens and none of your event notifications can be delivered. What happens then?

Let’s start by taking a look at failure on the receiving end; for example, what happens if your listener endpoint crashes and can’t be restarted?

At first, nothing will change: Webhooks v3 will continue to send event notifications and, when those notifications can’t be delivered, will assign each one to the retry cycle. That continues for 24 hours. If 24 hours go by and Webhooks v3 has not been able to deliver any notifications to a subscription, that subscription will be disabled, and Webhooks v3 will stop delivering notifications for it. However, you can re-enable the subscription by calling the /webhooks/subscriptions/{subscriptionId} API endpoint with the PATCH method. After the subscription has been re-enabled, deliveries will restart immediately.
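The following Python sketch shows one way to build the re-enable request with the standard library. The base URL, the bearer-token `Authorization` header, and the JSON body with an `enabled` field are all assumptions for illustration, not the documented Webhooks v3 contract; check the API reference for the actual payload, content type, and authentication scheme before relying on it.

```python
import json
import urllib.parse
import urllib.request

# Hypothetical placeholders: substitute your tenant's API base URL and a valid token.
BASE_URL = "https://example.invalid/api"
ACCESS_TOKEN = "<access-token>"


def build_reenable_request(subscription_id: str) -> urllib.request.Request:
    """Build (but do not send) a PATCH request that re-enables a subscription.

    The {"enabled": true} body is an assumed payload shape, used here only
    to illustrate the call; the real field name may differ.
    """
    body = json.dumps({"enabled": True}).encode("utf-8")
    # Percent-encode the ID so it is safe to embed in the URL path.
    path_id = urllib.parse.quote(subscription_id, safe="")
    return urllib.request.Request(
        url=f"{BASE_URL}/webhooks/subscriptions/{path_id}",
        data=body,
        method="PATCH",
        headers={
            "Authorization": f"Bearer {ACCESS_TOKEN}",  # assumed auth scheme
            "Content-Type": "application/json",
        },
    )
```

To send it, pass the request to `urllib.request.urlopen()`, for example `urllib.request.urlopen(build_reenable_request("sub-123"))`. The function only builds the request so that its method, URL, and body can be inspected before anything goes over the network.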

Now, what happens if the problem lies on the Identity Cloud side of things? Here’s a look at the things that could go wrong with Webhooks v3 and what the ramifications are:

The Webhooks API nodes go down

If some of the API nodes go down, the Amazon Web Services (AWS) auto scaler (a service that monitors application workload and then adds or removes virtual machines as needed to match demand) will create new API nodes to replace the failed nodes. In that case, Webhooks management could, at worst, slow down a bit until the replacement nodes are all up and running.

If all the API nodes go down, organizations will not be able to manage their Webhooks subscriptions until service has been restored. However, even an extreme case like that won’t prevent webhook deliveries: events will still be generated and added to the event queue, and dispatchers will still collect those events and send event notifications, with little discernible impact on performance. Only webhooks management is affected: organizations won’t be able to make API calls against the webhooks database, nor will they be able to create, update, or manage webhooks subscriptions.

The Webhooks event monitoring nodes go down

If some of the event monitoring nodes go down, the AWS auto scaler will create nodes to replace the failed ones; in that case, notifications could slow down a little (until all the failed nodes have been replaced) but will not stop. If all the nodes fail, no new events will be added to the event store. Dispatchers will continue to deliver events already in the store, but once the store has been cleared, deliveries will stop (because there won’t be anything left that requires delivery). As soon as even one monitoring node is available, events will again be added to the store, although more slowly until the full complement of monitoring nodes has been restored.

To be honest, it’s unlikely that all the nodes will be unavailable at the same time. Should that happen, it’s also unlikely that the outage would last very long (the AWS architecture allows new instances of virtual servers to be created in a remarkably short amount of time). If it does happen, however, events generated during the outage are never added to the event store, so no data (and no notifications) will be available for them.
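To make this failure mode concrete, here is a toy Python simulation; it is an illustration, not a model of the actual Identity Cloud implementation. Each tick produces one event, monitoring nodes record it in the event store unless they are down, and dispatchers deliver one stored event per tick. The tick-based timing, the `backlog` parameter, and the `old-`/`evt-` event names are all hypothetical.

```python
from collections import deque
from collections.abc import Container


def simulate_monitoring_outage(
    ticks: int, outage_ticks: Container[int], backlog: int = 0
) -> tuple[list[str], list[str]]:
    """Toy model of a monitoring-node outage (not the real implementation).

    One event occurs per tick. Monitoring nodes record it in the event store
    unless the tick falls inside the outage; dispatchers deliver one stored
    event per tick whenever the store is not empty.
    """
    # Events that were already waiting in the store before the simulation starts.
    store = deque(f"old-{i}" for i in range(backlog))
    delivered: list[str] = []
    lost: list[str] = []
    for tick in range(ticks):
        event = f"evt-{tick}"
        if tick in outage_ticks:
            # No monitoring node recorded the event, so it can never be delivered.
            lost.append(event)
        else:
            store.append(event)
        # Dispatchers keep draining whatever is already in the store.
        if store:
            delivered.append(store.popleft())
    return delivered, lost
```

Running `simulate_monitoring_outage(5, range(1, 4), backlog=2)` delivers `old-0`, `old-1`, `evt-0`, and `evt-4`, and reports `evt-1`, `evt-2`, and `evt-3` as lost: dispatchers keep draining the backlog during the outage, deliveries stop once the store is empty, and the events that occurred during the outage are never delivered.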

The Webhooks delivery nodes go down

If some of the delivery nodes go down, the AWS auto scaler will create replacements for the failed nodes; in that case, notifications could slow down a bit (until all the failed nodes have been replaced) but will not stop. If all the nodes fail, event deliveries will be suspended until a replacement node is available to deliver notifications. However, new events will continue to be added to the event store while the nodes are down, and as soon as new nodes are in place they can begin delivering those event notifications. There should be no loss of data even if all the nodes go down; at worst, event deliveries might be delayed a bit.
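A toy Python simulation, again an illustration rather than a model of the actual implementation, shows why a delivery-node outage delays notifications without losing them. Events keep being recorded in the event store every tick, while delivery nodes send up to a hypothetical `deliveries_per_tick` stored events per tick whenever they are up. The tick-based timing and event names are assumptions.

```python
from collections import deque
from collections.abc import Container


def simulate_delivery_outage(
    ticks: int, outage_ticks: Container[int], deliveries_per_tick: int = 2
) -> tuple[list[str], int, list[str]]:
    """Toy model of a delivery-node outage (not the real implementation).

    Monitoring nodes record one event per tick no matter what; delivery nodes
    send up to `deliveries_per_tick` stored events per tick unless the tick
    falls inside the outage.
    """
    store: deque[str] = deque()
    delivered: list[str] = []
    max_backlog = 0
    for tick in range(ticks):
        # New events keep arriving in the store even while delivery is down.
        store.append(f"evt-{tick}")
        if tick not in outage_ticks:
            # Delivery nodes work through the backlog in arrival order.
            for _ in range(min(deliveries_per_tick, len(store))):
                delivered.append(store.popleft())
        # Track how many events are still waiting at the end of the tick.
        max_backlog = max(max_backlog, len(store))
    return delivered, max_backlog, list(store)
```

`simulate_delivery_outage(7, range(1, 4))` delivers all seven events, `evt-0` through `evt-6`, in order. The number of events still waiting at the end of a tick peaks at three during the outage, and the store is empty when the simulation ends. Because spare delivery capacity is what clears the backlog, a larger `deliveries_per_tick` shortens the delay after recovery.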

The Webhooks database goes down

Identity Cloud databases rely on “hot standbys” to provide high availability. When an event is written to the events database, copies of that event are instantly synced to one or more hot standbys: mirror databases located in other AWS availability zones. Should the primary database fail, one of the hot standbys can take over, with little (if any) disruption in service.

In the unlikely event that neither the primary database nor any of the hot standbys is available, the event service will temporarily stop: new events will not be added to the event queue, dispatchers will not be able to deliver those events, and organizations will not be able to use the APIs to manage their Webhooks subscriptions. When database service is restored, Webhooks will immediately restart; however, no data will be available for any events that occurred during the outage.
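The failover behavior described in this section can be sketched with a toy Python class. It is an illustration under simplifying assumptions, not how Identity Cloud databases are actually built: every write is copied synchronously to each healthy replica, the first zone in the list holds the primary, and the zone names are hypothetical. Recovery and resynchronization of a failed replica are not modeled.

```python
from typing import Optional


class ReplicatedEventStore:
    """Toy model of a primary database with synchronously updated hot standbys."""

    def __init__(self, zones: list[str]) -> None:
        # The first zone hosts the primary; the remaining zones host hot standbys.
        self.zones = list(zones)
        self.replicas: dict[str, list[str]] = {zone: [] for zone in self.zones}
        self.healthy = {zone: True for zone in self.zones}

    def fail(self, zone: str) -> None:
        # Simulate losing the database in one availability zone.
        self.healthy[zone] = False

    def active_zone(self) -> Optional[str]:
        # Use the primary if it is healthy; otherwise promote the first healthy standby.
        for zone in self.zones:
            if self.healthy[zone]:
                return zone
        return None

    def write(self, event: str) -> bool:
        # With no database available, the event is not recorded at all.
        if self.active_zone() is None:
            return False
        # Copy the event to every healthy replica so any of them can take over.
        for zone in self.zones:
            if self.healthy[zone]:
                self.replicas[zone].append(event)
        return True

    def read_all(self) -> list[str]:
        # Reads are served by whichever replica is currently active.
        zone = self.active_zone()
        return list(self.replicas[zone]) if zone is not None else []
```

For example, after writing `evt-1` to a store built with `["zone-a", "zone-b", "zone-c"]`, failing `zone-a`, and writing `evt-2`, `active_zone()` returns `zone-b` and `read_all()` returns both events, so the standby took over without losing data. Once every zone has failed, `write()` returns `False`, which corresponds to events that occur during a full outage never being recorded.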