ZEN and the art of Reliability

At daily peak, Zendesk handles approximately 250,000 requests per second into our infrastructure, and more than half of those requests need to read from or write to a database. At our core we’re a humble Ruby on Rails application that is partitioned and heavily sharded. Our infrastructure was simple 10 years ago: Nginx with a Ruby backend and a single MySQL database.

After a decade on this journey, it feels like we’ve been tested in every way possible. We’ve seen consistent misuse, both intentional and unintentional, from external users and from ourselves. We’ve reached the scaling limits of our core technologies. We’ve found short-term partitioning strategies just to make it through. We’ve architected our way through scaling bottlenecks.

