Investigating issues with connectivity via all API endpoints

Incident Report for Supabase

Postmortem

Incident Summary

Date: 2025-06-12

Service Impacted: Supabase Platform (Multiple Services)

Affected Region(s): Global

Duration: ~2 hours 25 minutes (18:05–20:30 UTC)

Between 18:05 UTC and 20:30 UTC on June 12, 2025, Supabase services experienced widespread degradation caused by an outage in our upstream provider’s infrastructure. The outage impacted the majority of HTTP and WebSocket traffic to Supabase APIs globally.

Direct Postgres and connection pooler access remained unaffected.
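
This is because direct connections speak the Postgres wire protocol straight to the project’s database and never touch the affected HTTP proxy layer. As a minimal sketch, assuming the standard node-postgres (pg) client and placeholder host and credentials, a direct query like the following would have continued to work during the incident window:

    // Minimal sketch: query a Supabase project directly over Postgres,
    // bypassing the HTTP API Gateway entirely. Assumes the `pg` package
    // (node-postgres); the connection string below is a placeholder.
    import { Client } from "pg";

    async function main() {
      const client = new Client({
        connectionString:
          "postgresql://postgres:your-password@db.your-project-ref.supabase.co:5432/postgres",
      });
      await client.connect();
      const { rows } = await client.query("select now() as server_time");
      console.log(rows[0].server_time);
      await client.end();
    }

    main().catch(console.error);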

Who was affected?

  • All Supabase users interacting with the platform or projects via HTTP or WebSocket, including:

    • Auth, Storage, Edge Functions, and Data APIs (via API Gateway)
    • Dashboard operations (partial functionality)
    • Realtime/WebSocket connections
    • Logging and observability

What happened?

Supabase relies on the upstream provider for proxying, DNS, and routing across our platform services and user projects. On June 12, a bug in the infrastructure of that provider’s own cloud provider caused Workers KV storage to fail, and the failure cascaded across their systems.

This caused Supabase’s API Gateway to stop responding, leading to major service degradation, since the gateway is the entry point for the vast majority of HTTP and WebSocket traffic to the platform.
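
A failure at this layer has a distinctive signature from the outside: HTTP requests to the gateway time out or error while the database underneath stays reachable. Below is a minimal probe sketch in TypeScript; the project URL is a placeholder and the health path is an assumption rather than a confirmed endpoint.

    // Minimal sketch: probe the API Gateway to tell a gateway outage
    // apart from an application or database failure. URL and path are
    // placeholders/assumptions, not confirmed endpoints.
    const GATEWAY_HEALTH = "https://your-project-ref.supabase.co/auth/v1/health";

    async function probeGateway(): Promise<string> {
      try {
        const res = await fetch(GATEWAY_HEALTH, {
          signal: AbortSignal.timeout(5_000), // fail fast on a hung gateway
        });
        return res.ok ? "gateway healthy" : `gateway reachable, HTTP ${res.status}`;
      } catch {
        // A timeout or network error here, while direct Postgres still
        // answers, points at the gateway/proxy layer, not the database.
        return "gateway unreachable";
      }
    }

    probeGateway().then(console.log);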

Incident timeline (UTC):

  • 18:17 — Internal alert triggered: HTTP APIs, Dashboard, WebSocket traffic degraded.
  • 18:28 — Incident declared. We were unable to log in to our provider’s dashboard, so we opened a support ticket with them and started a conversation about the root cause and the ETA for a fix.
  • 18:36 — We observed timeouts from the Management API globally.
  • 18:45–19:05 — A spike of Management API requests affected an internal middleware database; we took measures to return it to a healthy state.
  • 19:05–19:13 — Trialled a bypass-proxy mitigation.
  • 20:09 — Provider systems began recovering.
  • 20:20 — We retried platform operations that had failed during the downtime.
  • 20:30 — Supabase services began to stabilize as upstream infrastructure recovered.
  • 21:18 — Supabase moved the incident status to “Monitoring”.
  • 22:05 — Incident closed; services confirmed operational.

Why did it happen?

  • The root cause was an issue within the infrastructure of Cloudflare’s upstream cloud provider.
  • This took down several of Cloudflare’s services, including Workers KV, a key building block for the rest of their platform.
  • Supabase services were affected because of our heavy reliance on this upstream provider:

    • The API Gateway failure blocked access to HTTP APIs and WebSocket connections.
    • The Management API and Dashboard were affected through their dependency on the provider’s proxy and on the API Gateway.
    • Logflare was degraded by the same cloud provider issues, limiting our observability.

What did we do during the incident?

  • Opened support communications with Cloudflare immediately.
  • Coordinated response across teams and set up fallback comms channels.
  • Attempted mitigations while awaiting upstream recovery.
  • Paused billing exports and resumed them once services stabilized.
  • Recovered internal middleware systems affected by retry spikes (see the backoff sketch after this list).
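
Retry spikes like the one above are a common failure amplifier: when an API goes down, clients that retry immediately and in lockstep multiply the load on whatever still stands behind it. Below is a minimal sketch of retrying with exponential backoff and full jitter, the usual way to blunt such spikes; the function and parameters are illustrative and not part of any Supabase API.

    // Minimal sketch: retry a failed async operation with exponential
    // backoff plus full jitter, so many clients don't stampede a
    // recovering service in lockstep. All names here are illustrative.
    async function withBackoff<T>(
      op: () => Promise<T>,
      maxAttempts = 5,
      baseDelayMs = 500,
    ): Promise<T> {
      for (let attempt = 0; ; attempt++) {
        try {
          return await op();
        } catch (err) {
          if (attempt + 1 >= maxAttempts) throw err;
          // Exponential backoff capped at 30 s, randomized ("full jitter").
          const cap = Math.min(30_000, baseDelayMs * 2 ** attempt);
          const delayMs = Math.random() * cap;
          await new Promise((resolve) => setTimeout(resolve, delayMs));
        }
      }
    }

Clients wrapped this way spread their retries out over time instead of re-sending the moment an outage begins.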

What will we do to prevent this in future?

  1. Migrate critical services away from single points of failure in provider infrastructure.
  2. Continue work on API Gateway redesign for isolation and resilience.
  3. Improve logging and observability tooling to survive upstream outages.
  4. Update internal incident comms processes to reflect global outages accurately and quickly.
  5. Engage with the upstream provider for earlier notification of high-impact changes/issues.
  6. Tune circuit breakers and alerts for the Management API and similar workloads (a minimal circuit-breaker sketch follows this list).
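
On the last point: a circuit breaker wraps calls to a dependency and fails fast once errors pile up, protecting downstream systems (such as the middleware database mentioned earlier) from retry load while the dependency recovers. Below is a minimal sketch of the pattern in TypeScript; the names and thresholds are illustrative, not our production configuration.

    // Minimal circuit-breaker sketch. After `maxFailures` consecutive
    // errors the circuit opens and calls fail fast; once `resetMs` has
    // elapsed, a trial call is let through to test recovery.
    // Names and thresholds are illustrative only.
    class CircuitBreaker {
      private failures = 0;
      private openedAt = 0;

      constructor(
        private maxFailures = 5,
        private resetMs = 30_000,
      ) {}

      async call<T>(op: () => Promise<T>): Promise<T> {
        const open = this.failures >= this.maxFailures;
        if (open && Date.now() - this.openedAt < this.resetMs) {
          throw new Error("circuit open: failing fast");
        }
        try {
          const result = await op();
          this.failures = 0; // success closes the circuit again
          return result;
        } catch (err) {
          this.failures += 1;
          if (this.failures >= this.maxFailures) {
            this.openedAt = Date.now(); // (re)open the circuit
          }
          throw err;
        }
      }
    }

In practice a breaker like this would wrap each outbound call to a dependency such as the Management API, often combined with the backoff approach sketched earlier.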

What actions do users need to take?

No action required. This incident was caused by upstream infrastructure failure. All services have returned to normal.

Conclusion

This was a serious outage that significantly impacted Supabase customers. While the root cause lay with our upstream provider, our own dependencies and architecture contributed to the extent of the disruption. We’ve identified concrete steps to mitigate future incidents, including reducing dependency on single infrastructure points and improving our internal and external communication during outages.

We’re grateful for the patience and understanding of our community and remain committed to building a more resilient and transparent platform.

Posted Jun 24, 2025 - 11:50 UTC

Resolved

This incident has been resolved.
Posted Jun 12, 2025 - 22:05 UTC

Monitoring

Our upstream provider is reporting that all services have been restored and are operational. We are also seeing greatly improved error rates, which have returned to baseline. We will continue to monitor for any further issues.
Posted Jun 12, 2025 - 21:17 UTC

Update

Our upstream provider's services are recovering quickly around the globe. We are seeing improvement in error rates, and systems are returning to an operational state. We are continuing to monitor and will post updates as we have more information.
Posted Jun 12, 2025 - 20:36 UTC

Update

Our upstream provider reports that errors across their services remain intermittent. We are seeing slow improvement in error rates, but they do remain elevated. We are continuing to monitor and will post updates as we have additional information.
Posted Jun 12, 2025 - 19:46 UTC

Update

Our upstream provider still reports that errors across their services are intermittent; however, we are seeing error rates returning to normal levels. We are continuing to keep an eye on things and will provide more updates as information becomes available.
Posted Jun 12, 2025 - 19:08 UTC

Identified

We believe connectivity issues are the result of a broad outage with our upstream provider. We are currently working with them, and they are posting updates to their own status page here: https://www.cloudflarestatus.com/incidents/25r9t0vz99rp
Posted Jun 12, 2025 - 18:40 UTC

Investigating

We are seeing widespread reports of connectivity issues across all regions. Given the scope, we suspect an issue with an upstream provider; however, we are still investigating. We will post updates here as they become available.
Posted Jun 12, 2025 - 18:31 UTC
This incident affected: API Gateway.