Infrastructure scale review
Our Infra team reviews the scale trends of the past quarter and defines action items for the next one to be ready for upcoming scale.
At monday.com, we have experienced constant hypergrowth over the last two years, with our customer base growing 3X each year. Our internal engineering team is also doubling each year, and the product is becoming more and more complex. For example, our automations and integrations execution volume grew from 0 to 3M runs per day last year!
This growth means always being ready to serve more customers. In many cases, solutions we chose to build a year ago no longer handle today's scale, and we have to rethink their implementation. Moreover, we of course want to be proactive rather than reactive, and find solutions to problems before receiving alerts about their malfunction.
We therefore decided to create a quarterly scale review process. Our Infrastructure team reviews the scale trends of the past quarter and defines action items for the next one to be ready for upcoming scale.
What we usually review during this process:
1. Machines — we review our servers for utilization and concurrency. We check the growth trend of RPM (requests per minute) over time, align it with CPU/RAM usage, and compare both against the response times of the apps running on those servers. We might end up scaling a specific pool of servers (we have app, slow, and api pools), changing the machine types we use, and/or scaling up or down to handle the upcoming load. We apply this technique to the older generation of infrastructure still in use, and similar principles to our modern k8s clusters, where we review autoscaling policies.
2. Databases — we review our SQL databases' data growth over time, both in total and per table, as well as IOPS usage and CPU utilization. We might decide to optimize query performance, add more application-level caching, move some calls to a read replica, or even create another replica. We review our NoSQL database in the same way and decide if work is needed there too.
3. Product — we look at Bigbrain, our BI tool that stores all of our business metrics, for growth trends in core feature-related events.
For example, since our users' main entity is a Board, we would look at the number of Board loads over time and decide if we need to optimize the code or infrastructure behind that feature.
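The three checks above boil down to the same trend arithmetic: estimate a growth rate from the quarter's samples, then project when a metric crosses its limit. Here is a minimal sketch of that math in Python; the pool name, table name, and every number are illustrative placeholders, not real monday.com data:

```python
# Illustrative trend math behind a scale review; all figures are made up.

def avg_growth_ratio(samples):
    """Average period-over-period growth ratio from a series of samples."""
    ratios = [b / a for a, b in zip(samples, samples[1:])]
    return sum(ratios) / len(ratios)

def periods_until_limit(current, limit, growth_ratio):
    """How many periods until the metric crosses its limit at the current trend."""
    if growth_ratio <= 1:
        return None  # flat or shrinking: no projected breach
    periods = 0
    value = current
    while value < limit:
        value *= growth_ratio
        periods += 1
    return periods

# Machines: weekly peak RPM of a hypothetical "api" pool vs. its capacity.
api_rpm = [120_000, 126_000, 132_300, 138_900]
rpm_rate = avg_growth_ratio(api_rpm)
print(f"api pool: {rpm_rate - 1:.1%}/week, "
      f"~{periods_until_limit(api_rpm[-1], 200_000, rpm_rate)} weeks to capacity")

# Databases: monthly size (GB) of a hypothetical activity_logs table vs. disk.
logs_gb = [300, 340, 385, 436]
gb_rate = avg_growth_ratio(logs_gb)
print(f"activity_logs: {gb_rate - 1:.1%}/month, "
      f"~{periods_until_limit(logs_gb[-1], 1_000, gb_rate)} months to disk limit")
```

In practice the inputs come from monitoring and BI dashboards rather than hard-coded lists, and the output feeds the decision itself: whether a projected breach lands inside the next quarter determines if it becomes a roadmap item now or just stays on the watch list.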
We also talk about what can impact our scale in the next quarter, including all the new features we plan to release, and decide how to get ready for them in terms of data, performance, and serving.
We finish the review by creating a clear roadmap for the Infrastructure team for the next quarter, based on the insights gained. We define which efforts must not fail (the big pillars) and what will be tackled iteratively on an ongoing basis.
At the end, we present to the main stakeholders what we did this quarter, our scale trends, what can impact our scale in the next quarter, and what we are going to do to be ready for it. This is the agenda slide from our last meeting:
This approach has proved very effective in scaling and evolving our environment quickly, as we are constantly monitoring and improving our readiness for our growing scale.
Found this interesting? Want to join us?
We build our infrastructure solutions the same way we build everything at monday.com. We're looking specifically for full stack developers with a special passion for infrastructure. Like every engineer at monday.com, if we need to change application code to accommodate infrastructure changes, we do it ourselves instead of waiting for someone else to do it.
If you're a team player with strong communication skills, this just may be the team for you. You can see here all of our team's open positions, which include Development Experience Engineer, Infrastructure Engineer (SRE), Infrastructure Backend Engineer, Production Engineer, and DBA.
If you're excited by what you just read and by our challenges, we'd be happy to meet and share knowledge. Let's be in touch! 🙂