How we manage software infrastructure at monday.com


David Virtser

As you may know, at monday.com, we’re transforming the way people work by building simple and intuitive SaaS software that connects teams around the world to their workplace processes, while improving collaboration and communication along the way.

Imagine you’re running a super successful company with 10 employees, 30 different clients, and 50 projects. As a manager, you decide to run your entire business (external projects and internal communication) on monday.com. An excellent choice if we do say so ourselves! monday.com then becomes the core of your company’s operations. While this is just one example, it is the reality for the 35,000+ teams that rely on our platform. As a result, we at monday.com simply cannot afford outages or system downtime. So, we’re always working as hard as possible to provide the best user experience we can and to keep the service up at all times.

Now a little about our R&D department, which is responsible for building this platform. As of now, we’re around 30 engineers, split into a few teams. All our engineers are true full-stack engineers. And when we say full stack, we don’t just mean that they write backend and frontend code. It means they own a concept from thinking it through (together with product and UX experts), to executing it (by writing the required code), to deploying it (creating the infrastructure and monitoring). As a full-stack engineer at monday.com, you work end to end, owning the whole process. We want to maintain that culture for as long as we can.

At monday.com, we’re obsessed with transparency, which helps us operate everything in a more visible way. When you visit the monday.com HQ, you will notice a lot of screens (over 70 to date!) showcasing data on dashboards. Many of these dashboards are unique to R&D. They show how many deployments we did each day, who is deploying right now, what’s failing, what our current test coverage is and much more.

We don’t have a formal QA process, but we have a good suite of tests that allows us to safely do Continuous Delivery with about 20–30 deployments per day. We also have a proprietary A/B testing framework, backed by BigBrain, which further increases our confidence in making so many changes per day.

Although all engineers are capable of understanding and applying infrastructure changes at any given point, we have a dedicated infrastructure team that focuses on providing the right tools to maximize developer productivity and to build a framework to make infrastructure changes possible for any engineer across the company.

We have a 24-hour on-call rotation shared by all engineers. This means that each engineer must be capable of resolving any issue, whether it’s a bug in the code or an infrastructure change that needs to be fixed. We invest a lot of time in education and training, and we maintain a wiki that describes different types of incidents and how to solve them.

The Infrastructure Team’s Main KPIs

Developer Productivity

We created a Docker (and Docker Compose) based environment that lets engineers spin up the whole environment on their own laptops for development and testing. We also continuously measure how long our build and deployment pipelines take and work to make them faster. You can take a look at some interesting metrics in the dashboard below:

Application Infrastructure

Production Stability & Security

We manage all production incidents on a dedicated monday.com board. Each time we detect an incident, we record it on this board. The information we update includes the incident title, who took care of it, the date it happened, time to resolution, root cause, which service was affected, incident severity, current status and more. This allows us to keep track of production incidents, implement action items, and learn from our mistakes.

You can see this example in its entirety in one of our newest offerings, monday stories.
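
For readers who want something more concrete, the incident fields listed above could be modeled roughly as follows. This is only a minimal sketch in Python, not how the monday.com board is actually implemented: the names Incident, Severity, time_to_resolution and mean_time_to_resolution, the severity levels, and the status values are all assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from enum import Enum
from typing import List, Optional


class Severity(Enum):
    # Hypothetical severity levels; the real board may use different ones.
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"


@dataclass
class Incident:
    # One row of a hypothetical incident board, using the fields from the text.
    title: str
    owner: str                      # who took care of it
    started_at: datetime            # the date it happened
    service: str                    # which service was affected
    severity: Severity
    status: str = "open"            # current status (assumed values)
    root_cause: Optional[str] = None
    resolved_at: Optional[datetime] = None

    def time_to_resolution(self) -> Optional[timedelta]:
        # Unresolved incidents have no resolution time yet.
        if self.resolved_at is None:
            return None
        return self.resolved_at - self.started_at


def mean_time_to_resolution(incidents: List[Incident]) -> Optional[timedelta]:
    # Average over resolved incidents only; None if nothing is resolved.
    durations = [
        d for d in (i.time_to_resolution() for i in incidents) if d is not None
    ]
    if not durations:
        return None
    return sum(durations, timedelta()) / len(durations)
```

Keeping time to resolution as a value derived from two timestamps, instead of a field typed in by hand, is one way to make a metric like mean time to resolution easy to show on a dashboard.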

So, what kind of people are we looking for to join our infrastructure team?

If you’re a team player with strong communication skills, this may just be the team for you. The full job description for our current open role of Infrastructure Engineer (SRE) is available here. We are always looking for new team members, and if you’re excited by what you just read, we’d love to meet you!