TaskCall Blog

How to communicate with your clients during an AWS outage

By Asif Al Shahriar
October 21, 2025

It was a calm morning at the clinic until the receptionist picked up the phone and heard nothing. The service that connects patient calls to nurses had stopped responding. A patient who had just arrived wanted to speak with a triage nurse, but when the receptionist pressed the button, nothing happened. She tried several times - still no luck. The patient grew worried, asking, “Did you get my call?” Meanwhile, the clinic staff tried to figure out what was wrong. Was their phone line broken? Was the call routing service down? There were no alerts, no updates and no status page from the service provider - just silence.

What seemed like a small technical issue quickly turned into frustration and confusion. The clinic's reputation took a hit, showing how even minor system failures can cause major communication problems when there's no clear way to find out what's happening.

That scenario might sound niche, but it highlights a broader truth: when a critical service fails and there is no public or stakeholder-facing communication channel, the damage multiplies. The technical fault is one problem; the lack of transparency compounds the harm.


The major outage: when the cloud went dark

Now fast-forward to the big leagues: on Monday, October 20, 2025, Amazon Web Services (AWS) experienced a significant outage in its US-EAST-1 region (Northern Virginia). Services around the globe - Snapchat, Reddit, Venmo, Twilio, Signal, Spotify, Roblox, you name it - were affected. Even Amazon's own Alexa ecosystem was not spared.

Here is a rough timeline (all Eastern Time):


  • 3:11 a.m.: AWS reported an “operational issue” affecting the infrastructure.
  • 5:01 a.m.: Engineers identified the root cause as a fault with the DynamoDB endpoint and began working on remediation.
  • 6:35 a.m.: The database issue was “fully mitigated” and services began recovering.
  • 10:14 a.m.: Despite the fix, AWS confirmed “significant API errors and connectivity issues” were still impacting multiple services in US-EAST-1.
  • 11:35 a.m.: They acknowledged renewed network connectivity problems, with user-reported issues rising across services.

From midday onward: While many services gradually stabilized, recovery continued through the afternoon; fluctuating issues lingered as the backlog of requests cleared.


The initial culprit

The initial culprit was a degraded endpoint in AWS's DynamoDB service. AWS traced the fault to DNS resolution for the DynamoDB API endpoint: in effect, the “phone-book” lookup that other services use to locate their data stopped returning answers. For several hours, apps were effectively separated from their data stores. Later, the failure spread: an internal subsystem responsible for monitoring the health of network load-balancers also misbehaved, complicating recovery further.


The wider takeaways

This event showcased how deeply the internet relies on a small number of large cloud providers. One region, one major provider, one misbehaving subsystem - and suddenly a global cascade of disruptions. For businesses, it underscores a fundamental risk: even “outsourced” infrastructure must be accompanied by clear communication channels when things go wrong.


Why a status page matters, in this context

So what is the link between that clinic-call failure, the AWS outage, and a status page? Essentially: transparency, trust and control. A status page is a public (or private) dashboard that shows the current health of your services - and can deliver updates during incidents. The good news is that services like TaskCall support this explicitly: their documentation explains how Public Status Pages provide a channel of communication to keep business stakeholders updated on business impacts caused by technical incidents.

Here are the key reasons why having a status page is no longer optional but a must:

1. Immediate visibility for stakeholders

When something goes wrong, your users, clients or internal teams ask: “Is the service up?” Without a dedicated status page they often go to social media, support chat or just assume the worst. With a status page, you publish the incident, show which components are impacted, the current status, and updates on resolution. For example, you can publish a “Major outage” notice, list the impacted components and send notifications to subscribers.
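To make the shape of such a notice concrete, here is a minimal sketch of an incident record with a running timeline of updates. The class names, field names and state labels are illustrative assumptions, not the data model of TaskCall or any other status-page product.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class Update:
    posted_at: datetime
    state: str    # assumed labels, e.g. "Investigating", "Identified", "Resolved"
    message: str


@dataclass
class Incident:
    title: str
    severity: str           # e.g. "Major outage"
    components: list[str]   # impacted components shown on the page
    updates: list[Update] = field(default_factory=list)

    def post(self, state: str, message: str) -> None:
        """Append a timestamped update to the incident timeline."""
        self.updates.append(Update(datetime.now(timezone.utc), state, message))

    @property
    def current_state(self) -> str:
        """The state of the latest update, or "Unpublished" if there is none."""
        return self.updates[-1].state if self.updates else "Unpublished"


# Hypothetical incident, loosely based on the clinic scenario.
incident = Incident("Call routing unavailable", "Major outage", ["Live-call routing"])
incident.post("Investigating", "We are aware that calls are not connecting.")
incident.post("Identified", "An upstream provider fault has been identified.")
print(incident.current_state)  # Identified
```

Keeping every update in a list, rather than overwriting a single message, is what lets the page show a timeline that readers can follow. The built-in generic annotations (`list[str]`) assume Python 3.9 or later.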


2. Reduced support burden

If you are down and users do not know what is happening, support channels flood. “Why can't I speak to a nurse?”, “My app won't load.”, “What's wrong?” All of that clogs your team. A good status page deflects repetitive queries because it provides a central source of truth.


3. Building trust through transparency

In many industries - including healthcare, financial services, SaaS - how you respond to failure is as important as avoiding failure. If you have a visible incident page that shows “we are aware, working on it, ETA 30 minutes”, users feel less abandoned. That builds trust and can mitigate reputational damage.


4. Segmented communication: internal vs external

Some incidents only affect internal stakeholders or specific clients. Tools like TaskCall support internal status pages (stakeholder-facing) and external status pages (public-facing) so you can tailor visibility appropriately.


5. Historical record and metrics

A status page often holds a history of incidents, uptime percentages and component availability. That helps you analyze patterns, show reliability to customers and feed into your service-level commitments. As one “ultimate guide” to status pages points out: “provide historical reliability data for prospective customers.”
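As a rough illustration of how an uptime percentage can be derived from incident history, the sketch below subtracts recorded downtime from a reporting window. The function name and the sample outage are assumptions made for the example; real status-page tools may count partial outages or maintenance windows differently.

```python
from datetime import datetime, timedelta


def uptime_percent(window_start, window_end, outages):
    """Return availability over a window as a percentage.

    `outages` is a list of (start, end) datetime pairs, assumed not to
    overlap each other.
    """
    window = (window_end - window_start).total_seconds()
    if window <= 0:
        raise ValueError("window_end must be after window_start")
    down = 0.0
    for start, end in outages:
        # Clip each outage to the reporting window before counting it.
        clipped_start = max(start, window_start)
        clipped_end = min(end, window_end)
        if clipped_end > clipped_start:
            down += (clipped_end - clipped_start).total_seconds()
    return 100.0 * (window - down) / window


# Hypothetical example: a 30-day window containing one 3-hour outage.
month_start = datetime(2025, 10, 1)
month_end = month_start + timedelta(days=30)
outage = (datetime(2025, 10, 20, 3, 0), datetime(2025, 10, 20, 6, 0))
print(round(uptime_percent(month_start, month_end, [outage]), 3))  # 99.583
```

Three hours out of 720 is enough to drop a month below "three nines" (99.9%), which is why even short incidents are worth recording accurately.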


6. Better incident workflow integration

When integrated with monitoring and incident-management tools, a status page can automatically update based on component failures, speed up communication and reduce human error. Many modern platforms support API integration, automation of notices, subscriber emails, etc.


Putting it all together: Best practice steps for your organization

Given what we have seen - a small clinic client unable to call, a global cloud provider outage and the benefits of status pages - here is a recommended approach for integrating status pages into your operations:


1. Map your critical services and components

List the services your users rely on: e.g., live-call routing, web dashboard, payment processing, data API. For each, identify dependencies (database, network, external API). For example, TaskCall's dependency graph shows workflows around services, incident actions and business services.
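A service map does not need special tooling to get started. The following is a minimal sketch, with made-up service names, of a dependency map and a helper that answers the question "if this dependency fails, which user-facing services are affected?" It is not a representation of TaskCall's dependency graph.

```python
# Hypothetical dependency map: each service lists what it depends on directly.
DEPENDS_ON = {
    "live-call routing": ["telephony API", "routing database"],
    "web dashboard": ["data API"],
    "payment processing": ["payment gateway", "data API"],
    "data API": ["routing database"],
}


def impacted_by(failed: str) -> list[str]:
    """Return every service that depends on `failed`, directly or transitively."""
    impacted = set()
    frontier = [failed]
    while frontier:
        current = frontier.pop()
        for service, deps in DEPENDS_ON.items():
            if current in deps and service not in impacted:
                impacted.add(service)
                frontier.append(service)  # follow the chain upward
    return sorted(impacted)


print(impacted_by("routing database"))
# ['data API', 'live-call routing', 'payment processing', 'web dashboard']
```

In this made-up map, a single database fault reaches all four services, which mirrors how one degraded endpoint cascaded during the AWS outage.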


2. Decide page scope and audience

Determine whether you need one page (public) or separate internal and external pages. Should clients, internal staff or both see it? Decide which components are visible to which audience. If you are using TaskCall, you can even create a status page and choose not to make it visible.
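One simple way to think about scope is to tag each component with the audiences allowed to see it. The sketch below assumes two audience labels, "public" and "internal", and invented component names; how a given product models visibility will differ.

```python
# Hypothetical component list; each entry names the audiences that may see it.
COMPONENTS = [
    {"name": "Live-call routing", "audiences": {"public", "internal"}},
    {"name": "Web dashboard", "audiences": {"public", "internal"}},
    {"name": "Billing batch jobs", "audiences": {"internal"}},
]


def visible_components(audience: str) -> list[str]:
    """Return the component names that the given audience is allowed to see."""
    return [c["name"] for c in COMPONENTS if audience in c["audiences"]]


print(visible_components("public"))    # ['Live-call routing', 'Web dashboard']
print(visible_components("internal"))  # all three components
```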


3. Define status levels and communication protocols

Choose clear terminology: “Operational”, “Degraded Performance”, “Partial Outage”, “Major Outage”. Define when to publish a notice, who drafts it and who updates it.
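Writing the levels down as code forces the thresholds to be explicit. This is a minimal sketch: the four level names come from the list above, while the error-rate thresholds and function names are assumptions you would tune to your own service.

```python
from enum import IntEnum


class Status(IntEnum):
    OPERATIONAL = 0
    DEGRADED_PERFORMANCE = 1
    PARTIAL_OUTAGE = 2
    MAJOR_OUTAGE = 3


def classify(error_rate: float) -> Status:
    """Map an error rate (0.0 to 1.0) to a status level using assumed thresholds."""
    if error_rate >= 0.50:
        return Status.MAJOR_OUTAGE
    if error_rate >= 0.10:
        return Status.PARTIAL_OUTAGE
    if error_rate >= 0.01:
        return Status.DEGRADED_PERFORMANCE
    return Status.OPERATIONAL


def page_status(component_statuses) -> Status:
    """The page as a whole is only as healthy as its worst component."""
    return max(component_statuses, default=Status.OPERATIONAL)


print(classify(0.02).name)                                             # DEGRADED_PERFORMANCE
print(page_status([Status.OPERATIONAL, Status.PARTIAL_OUTAGE]).name)  # PARTIAL_OUTAGE
```

Using an ordered enum means "worst status wins" is a one-line `max`, and nobody has to argue mid-incident about which label applies.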


4. Choose and configure a status-page solution

Whether via TaskCall (if part of your stack) or another platform, set up your URL (e.g., status.yourdomain.com), branding, subscriber options, component list and automation. TaskCall's “Public Status Pages” feature is part of the Business and Digital Operations plan, where you can design your status page as you wish.


5. Integrate monitoring/incident tools

Link your monitoring alerts (e.g., API failure, database down, high error rate) to trigger status-page notices. Automate where possible, but make sure human review is built in. TaskCall's external status page lets you go either way - human review or complete automation.
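The difference between "human review" and "complete automation" can be sketched as a single flag on the alert handler. Everything here is hypothetical: the alert shape, the `Notice` class and the function names are assumptions for illustration and do not correspond to TaskCall's API or any monitoring tool's payload format.

```python
from dataclasses import dataclass


@dataclass
class Notice:
    component: str
    message: str
    published: bool = False


def handle_alert(alert: dict, auto_publish: bool) -> Notice:
    """Turn a monitoring alert into a notice; keep it as a draft unless auto_publish."""
    notice = Notice(
        component=alert["component"],
        message=f"We are investigating an issue with {alert['component']}.",
    )
    notice.published = auto_publish
    return notice


def approve(notice: Notice) -> Notice:
    """Human review step: a person reads the draft and publishes it."""
    notice.published = True
    return notice


# With review: the alert only produces a draft until someone approves it.
draft = handle_alert({"component": "Data API"}, auto_publish=False)
print(draft.published)           # False
print(approve(draft).published)  # True
```

A reasonable middle ground is to auto-publish only for the components and severities where a false alarm is cheap, and to require approval everywhere else.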


6. Train your team and embed in your processes

Make sure your ops, support and business stakeholders know the process: “If this happens then update status page, notify subscribers, log initial incident, update timeline.” Running drills or “table-top” incident exercises helps.


7. Embed the status page and promote awareness

Make the status page reachable from your website footer, help center, or inside applications. Encourage users to subscribe to notifications. The better your visibility, the more trust you build.


8. Post-incident review & transparency

After each major incident (like the AWS event), publish a post-mortem: what went wrong, what steps you are taking, what the ETA was and how you communicated. This improves accountability and future trust.


Reflections and the takeaway

The clinic's failed call routing was not glamorous, but it was real: when your service falters and you have no clear communication channel, your customers and stakeholders are left in the dark - a breeding ground for frustration, churn and reputational damage. On the other end of the spectrum, the AWS US-EAST-1 outage is a stark reminder that even the biggest providers can fail, and that failure propagates fast when dependencies are layered and invisible.

When an incident happens, your communication infrastructure matters as much as your technical infrastructure. A well-configured status page is more than a “nice-to-have” - it's a trust anchor.

For organizations of all sizes, the lesson is clear: build transparency before you need it. Configure your status page now, train your teams and embed it in your process. Because when the lights go out - either locally or in the cloud - the first thing your users will ask is: “What's happening?”
