Egress uploads to Azure

Incident Report for LiveKit

Postmortem

Timeline

  • 6/24 22:30 UTC: A dependency update for Egress was deployed to production.
  • 6/25 15:00 UTC: Investigation began after receiving customer reports of Azure upload failures.
  • 6/25 16:30 UTC: The cause was identified and fixed.
  • 6/25 17:15 UTC: The fix was fully deployed to production.

Remediation

Since Azure blob storage is only used by a small percentage of our customers, and upload failures due to issues such as bad credentials are common, failures were not easy to recognize by our logging or standard error/failure monitoring. We’ve completed the following steps to prevent similar incidents in the future:

  • Integration tests now include uploading to Azure before every deployment
  • End to end tests now include periodic Egress requests which will upload to Azure, and alert on failures
Posted Jun 25, 2025 - 12:47 PDT

Resolved

This incident has been resolved.
Posted Jun 25, 2025 - 11:58 PDT

Monitoring

A fix has been implemented and we are monitoring the results.
Posted Jun 25, 2025 - 10:53 PDT

Identified

The issue has been identified and a fix is being implemented.
Posted Jun 25, 2025 - 10:31 PDT

Investigating

Identified an issue affecting some egress uploads to Azure.
Posted Jun 25, 2025 - 10:28 PDT
This incident affected: Global Egress.