Why did we change our team from DBA, and what is DBRE anyway?

Liron Amitzi

Databases are an important part of what we do here at monday.com, as they are in most companies that work with and manage data. Keeping the databases up and running, while maintaining good performance, and assisting developers when creating new features or systems, requires professionals who know and understand the database best. Traditionally, these were DBAs.

The thing is that the world is changing and being a DBA is not “cool” anymore. But if I’m being more serious, the DBA profession started to change as companies changed the way they develop products, manage them, and expand.

From Monolith to Microservices

monday.com, like many other companies, started with one large service that did basically everything (called a monolith). As time passed, the company grew, more developers joined and created new teams, and both fast development and deployment and maintaining one large codebase became a real challenge. This made us shift (like many other companies) to a microservices architecture.

A microservices architecture breaks everything in the app into smaller pieces, managed by different applications (and databases where needed). It allows much quicker development and deployment, reduces the dependencies between teams, and it is the only way to keep up with the required pace. But what is this fast-paced development? Just to give you some numbers, currently we deploy about once a day to our monolith (which by now contains only specific parts of the application), while the total number of microservices deployments is about 30-60 each day. This could never have been done with a single monolithic application.

How is this related to databases?

In the cloud computing and DevOps world there is a saying that we should treat our servers as cattle and not as pets. The concept is that we don’t treat every server like a pet (I remember a time when each server had a special name and I remembered the names, IP addresses, and more information about each server). Today, with the growing number of servers (and mainly with stateless ones) we want to treat them like cattle. We want to manage them as a group of servers with a specific role and characteristics, because we can’t actually take care of or manage each one individually.

In the database world it’s a bit more complex: while Kubernetes workloads and applications can easily be stateless, databases are still stateful and have unique characteristics. However, managing hundreds of databases (we currently have over 200 relational databases, including read replicas, plus some NoSQL ones, and we keep creating more all the time) can’t be done as pets anymore either. And the traditional DBA toolkit, which served us well in the past, is not built to manage databases at this scale.

Enter DBRE

DBRE stands for Database Reliability Engineer, and it’s exactly the profession that aims to solve these challenges. The main concepts are similar to the ones DevOps has: automation, self-service, availability, performance, monitoring, etc.

I’m not going to drill into any of those here, but if you think about a small team managing a large number of databases, you can see how this can’t be done in the old-fashioned way of connecting to each server to perform operations, or even by writing scripts to do these things. We need to think big! Automation is not a single script we write to help us perform a specific operation. It’s something we need to manage as a team, improve over time, and it should do many wonderful things for us.

So DBAs are now writing more code, and not just small bash scripts, but much larger projects (that should be managed in a repository just like any other coding project). As part of that we also want to provide self-service options. If we have microservices that pop up all the time, why should we create all the databases for them by hand? We should have infrastructure code that will create them for us with the exact standards and configuration we want. And if we have such a tool, why not expose it to developers so they can run it?
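
To make the idea a bit more concrete, here is a minimal sketch in Python of what such a self-service entry point could look like. It is not our actual tooling: the names (`STANDARDS`, `SIZES`, `DatabaseSpec`, `build_spec`) and all the values are hypothetical, chosen only for illustration. The point is that the developer supplies a few choices, and the standards are filled in by code the DBRE team owns.

```python
from dataclasses import dataclass, asdict

# Hypothetical team-wide standards; the values are illustrative assumptions.
STANDARDS = {
    "engine": "postgres",
    "backup_retention_days": 7,
    "encrypted": True,
}

# Hypothetical size presets a developer can choose from (storage in GB).
SIZES = {"small": 20, "medium": 100, "large": 500}


@dataclass(frozen=True)
class DatabaseSpec:
    name: str
    owner_team: str
    storage_gb: int
    engine: str
    backup_retention_days: int
    encrypted: bool


def build_spec(service: str, owner_team: str, size: str = "small") -> DatabaseSpec:
    """Turn a minimal self-service request into a fully standardized spec."""
    if size not in SIZES:
        raise ValueError(f"unknown size {size!r}; choose one of {sorted(SIZES)}")
    if not service.replace("-", "").isalnum():
        raise ValueError("service name may contain only letters, digits, and '-'")
    return DatabaseSpec(
        name=f"{service.lower()}-db",
        owner_team=owner_team,
        storage_gb=SIZES[size],
        **STANDARDS,  # developers cannot override the team standards
    )


if __name__ == "__main__":
    # A developer asks for a medium database for a (made-up) billing service.
    print(asdict(build_spec("billing", "payments", size="medium")))
```

In a real setup the resulting spec would be handed to whatever provisioning tool the team uses; that part is deliberately left out here.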

Now how about upgrades? The same. If we have a standard procedure for upgrades, let’s automate it. And once it’s automated, let’s expose it as a self-service option and shift the responsibility left to the developers (or support, or operations, or whoever that might be).
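
As a rough illustration of turning a standard procedure into code, the sketch below models an upgrade as an ordered list of steps that stops at the first failure. Everything in it is an assumption made for the example: the step names, the placeholder functions, and the `run_upgrade` helper are hypothetical and do not describe our real upgrade procedure.

```python
from typing import Callable, List, Tuple

# A step is a name plus a function that returns True on success.
Step = Tuple[str, Callable[[str], bool]]


def run_upgrade(database: str, steps: List[Step]) -> List[str]:
    """Run the steps in order and stop at the first failure.

    Returns the names of the steps that completed successfully.
    """
    completed: List[str] = []
    for name, action in steps:
        if not action(database):
            print(f"{database}: step '{name}' failed, stopping")
            break
        completed.append(name)
    return completed


# Placeholder steps: real ones would call the cloud provider or the database.
def take_snapshot(database: str) -> bool:
    return True


def upgrade_replica(database: str) -> bool:
    return True


def promote_replica(database: str) -> bool:
    return True


def verify_health(database: str) -> bool:
    return True


# Hypothetical standard procedure, shared by every database of the same kind.
STANDARD_UPGRADE: List[Step] = [
    ("snapshot", take_snapshot),
    ("upgrade replica", upgrade_replica),
    ("promote replica", promote_replica),
    ("verify health", verify_health),
]

if __name__ == "__main__":
    print(run_upgrade("orders-db", STANDARD_UPGRADE))
```

Because the procedure is just data, the same list can be run against any database in the group, which is what makes it reasonable to hand over as a self-service option.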

Diagnostic scripts? Alerts? Anything else we do regularly? It’s all the same and should be added to our toolkit.

Our Team

Our team is responsible for all of the databases of the microservices that make up the monday.com service. In the past the team was called “data engineer”, and DBA before that. But DBA (as I explained) does not represent all we do, and we are definitely not data engineers. DBRE is the role we perform now, even though we are not fully there yet. We are writing more code and more automations, and we are trying to build more solutions and expose as much as possible as self-service in the future. This transition is not easy or fast, but it’s super challenging and interesting.

My Concern

With all of this shift, I do have one concern. In my over 20 years of experience as a consultant, I have seen many roles change and the one that bummed me out the most was the disappearance of the sysadmins. Many companies in the past had physical servers and server rooms, and therefore had Linux/UNIX sysadmins who knew the bits and bytes of the operating system. With the move to the cloud and DevOps, it seems that this has been lost a bit. As a consultant, I’ve seen companies that lost this knowledge, and if something happened or a server misbehaved, they just spun up a new server to replace it. This is cool from an operational perspective, but valuable knowledge was lost.

With the overall move to DBRE in the world, I hope the same loss will not occur. The in-depth knowledge of databases should stay, with people understanding the internal structures, behavior, and mechanisms of the database, so we can still figure out what’s going on and solve problems because we understand them, and not by simply changing the server class or adding a new replica.

And with that, I’ll do my best to continue understanding and researching how things work, and I’ll keep challenging myself and the team to preserve the knowledge we gained as “old-school” DBAs as we transition towards DBREs.