Resolved -
This issue is now fully resolved. We will be posting a detailed RCA.
Apr 15, 21:09 PDT
Monitoring -
Our fix is fully implemented, and we are no longer seeing failures or elevated latencies in the RoomService APIs. We did observe a 15-minute period of elevated API failure rates while mitigation steps were being applied. We are continuing to monitor the issue.
Apr 15, 20:51 PDT
Investigating -
While applying a fix for the API latencies, we are temporarily seeing increased failure rates in RoomService APIs, including CreateRoom, UpdateRoomMetadata, and DeleteRoom. We are actively working on mitigating this. Impact has been upgraded to major.
Apr 15, 20:45 PDT
Identified -
We continue to see elevated RoomService API latencies, which are now also affecting other regions. The latency increases appear to originate from a specific table in our distributed database. We have escalated the issue to the database vendor and are working on a workaround to reduce the API latencies. Other services are not impacted.
Apr 15, 19:10 PDT
Update -
We believe these elevated latencies began around 22:00 UTC (15:00 PDT). We have confirmed that only API requests in US-West should be impacted. The current list of impacted APIs appears to be CreateRoom, DeleteRoom, and UpdateRoomMetadata. We are working on mitigating the issue to return latencies to normal.
Apr 15, 18:07 PDT
Update -
We are continuing to investigate this issue.
Apr 15, 17:13 PDT
Investigating -
We are investigating reports of increased latencies in RoomService APIs in the US West region, specifically the CreateRoom, DeleteRoom, and UpdateRoomMetadata APIs.
Apr 15, 16:23 PDT