Dear clients,
I wanted to update you on the outages we had on January 24th, 2023 during a planned release. First of all, I sincerely apologize on behalf of the company. This was our error. From our Root Cause Analysis, two events contributed to the outage.
The first event was an issue with a database update on one of our large tables. Given the update applied to tens of millions of rows, the database locked up a specific transactional table which caused checkout to fail. The second event was an unnoticed code merge conflict that caused a front end component to fail, which impacted our ACME hosted eCommerce pages.
Moving forward, while we manually review database scripts, our release management process will be improved. First, we are improving our process to estimate the time it takes to run database scripts into production like size tables. This will help us determine in advance if a release requires downtime. Secondly, we are improving our smoke testing steps to catch what manual code reviews may not.
We continue to invest into our tooling and release management team size to improve our service availability. Clearly we did fall short of expectations in this release. The team has already started incorporating the process changes, via updating our Roll Out Plan procedures for the upcoming planned releases. We will ensure the same events do not happen again.
Echeyde Cubillo, Chief Technology Officer/Co-founder