What Made Our Resource Catalog Useful Before It Was “Done”?

Ben Yitzhaki

As monday.com grew, managing infrastructure like databases and queues became more complex. Developers often had to rely on support tickets, which made the process slower and less transparent than we wanted. We heard this feedback repeatedly during a 2024 offsite focused on our internal developer platform, Sphera. I had the opportunity to lead a project that solved the issue by building a Resource Catalog, which puts developers in control of their infrastructure.

Seven months in, the Resource Catalog is not only improving visibility into the resources we own, but is also used daily by different R&D teams. It has even helped reduce developers’ mean time to respond during incidents, which, to be honest, wasn’t something we set out to solve.

In this post, we’ll dive into the motivations behind building our resource catalog and the approach we took during development. If you’re looking for ways to reduce blockers while empowering developers and enabling easy management of cloud resources, this post is for you.

How we worked (and why it didn’t scale)

Creating resources wasn’t as intuitive as creating a board in monday.com. If a developer needed a new database, they would start by writing the required Terraform files. We had a service in our internal developer platform that could automate this, but it wasn’t easy to use and usually required help from someone with experience. The process typically dragged on through a series of pull requests and involvement from multiple teams.
This was clearly a frustrating flow for both sides. Developers lost their autonomy and had a hard time getting what they needed. Tickets were often followed by back-and-forth to make sure the right resources were being created in the right way. Meanwhile, Infra engineers were overwhelmed by the volume of tickets and became the bottleneck.

Oh, and when the resource was ready, devs had no idea how to use it. They lacked guidance and documentation. Where’s the connection string? How do we get it working locally? Which package should we use? These questions came up all the time and added more frustration to the process.

It became clear that while this approach had worked in the past, it wasn’t scaling as the company grew. We wanted our infrastructure engineers to be enablers, not bottlenecks. We have a lot of trust in every developer, and that trust shaped our approach to a better solution.

How research shifted our focus

Before we got our hands dirty, we took the time to do some research. We wanted to understand whether the problem was bigger than just an overload of tickets and a slow creation process. 

We ran a series of sessions with developers from different teams and departments. What came up again and again was that because the creation process ended in a ticket, it required a lot of manual work. That’s where creativity started to sneak in, and while that’s not always a bad thing, in this case it led to things being done differently each time. As a result, conventions were loosely followed, and standardization became a real challenge.

We also went through our entire pool of tickets and saw the same pattern. Beyond reviewing pull requests for new resources, Infra was spending a lot of time handling requests for manual changes. Things like increasing HPA (Horizontal Pod Autoscaler) limits or purging queues often ended with either granting temporary permissions to developers or doing it for them. It felt like we were starting to uncover action items that could really move the needle, not just help us build a nice-looking UI for creating resources.

The last thing we did was look into existing solutions. Some companies are building SaaS tools in this space that you can simply plug and play. We drew inspiration from them, but eventually we realized that most of these tools are very resource-centric, while we needed a service-centric solution that would feel more natural for our developers. 

In addition to managing cloud resources, we wanted to go beyond that and support logical resources as well. For example, creating a consumer might involve a bundle that includes an SQS queue, a Kubernetes deployment that listens to that queue, and an SNS topic to allow sending messages to it. We also wanted to support running custom operations on those resources, thereby reducing the need for developers to request permissions or rely on manual intervention. That level of flexibility was too specific to our needs and would have required too many changes to make existing solutions work for us.
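
To make the idea of a logical resource concrete, here is a rough sketch of what such a bundle definition could look like. Everything in it, the ConsumerBundle type, the field names, the image tag, is an illustrative assumption rather than our actual schema.

```typescript
// Illustrative only: a hypothetical shape for a "consumer" bundle that groups
// an SQS queue, an SNS topic, and a Kubernetes deployment into one logical resource.
interface ConsumerBundle {
  name: string;
  queue: { type: "sqs"; fifo?: boolean };            // the queue the consumer reads from
  topic: { type: "sns"; subscribeQueue: boolean };   // lets other services publish messages to it
  deployment: { type: "k8s-deployment"; image: string; replicas: number };
}

// Example bundle a team might request in a single step.
const emailConsumer: ConsumerBundle = {
  name: "email-consumer",
  queue: { type: "sqs" },
  topic: { type: "sns", subscribeQueue: true },
  deployment: { type: "k8s-deployment", image: "email-consumer:latest", replicas: 2 },
};
```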

Starting small with the right wins

Since existing solutions didn’t meet our needs, we decided to build our own. We created a new monorepo service through our IDP, which included a microservice, a microfrontend, and a shared package for common logic. We began by focusing on visibility into existing resources, which laid the groundwork for everything that followed. This alone gave teams, for the first time, the ability to see which resources their service was using, without needing to dig through Terraform files or try to piece things together from secrets.

With our top pains in mind, we decided to focus on quick wins. We started by adding custom actions to resources that we knew would reduce tickets and empower developers. We chose to introduce the ability to redrive messages in SQS first, as it was one of the most frequently requested actions. As soon as we added it, it was picked up and used during incidents, helping developers respond faster by removing the usual barriers. They no longer had to request permissions to access AWS or wait for an Infra engineer to step in.
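
For context, redriving a dead-letter queue is something the AWS SDK exposes directly. Here is a minimal sketch of what an action handler might call under the hood, assuming the AWS SDK for JavaScript v3; the handler name and signature are made up, only the SQS call itself is real.

```typescript
import { SQSClient, StartMessageMoveTaskCommand } from "@aws-sdk/client-sqs";

const sqs = new SQSClient({}); // region and credentials come from the environment

// Hypothetical handler signature for a catalog action.
export async function redriveDlq(dlqArn: string): Promise<string | undefined> {
  // With no DestinationArn, StartMessageMoveTask moves messages from the DLQ
  // back to the queue(s) they originally came from.
  const result = await sqs.send(
    new StartMessageMoveTaskCommand({ SourceArn: dlqArn })
  );
  return result.TaskHandle; // can be used later to cancel the move if needed
}
```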

To keep things visible and compliant, we made sure every operation was recorded in our internal audit log. We also sent updates to Slack so that everyone was informed, not just the team that triggered the action, but also the on-calls and even the CX teams who might end up speaking with affected customers.
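
As a sketch of what that flow could look like (the audit-log endpoint and webhook variable are placeholders, not real ones):

```typescript
// Illustrative only: record an action in an internal audit log, then notify Slack.
interface AuditEvent {
  actor: string;      // who triggered the action
  action: string;     // e.g. "sqs.redrive"
  resourceId: string; // which resource it ran on
  timestamp: string;
}

async function recordAndNotify(event: AuditEvent): Promise<void> {
  // Write to a hypothetical internal audit-log service.
  await fetch("https://audit.internal.example/events", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(event),
  });

  // Post a human-readable update to a Slack incoming webhook.
  await fetch(process.env.SLACK_WEBHOOK_URL!, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      text: `${event.actor} ran ${event.action} on ${event.resourceId}`,
    }),
  });
}
```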

Turning a tool into a platform

Very early in the project, as developers started to see the value they were getting, interest grew. They began reaching out, asking to join the effort, create their own actions, or add features to existing ones.

That was the moment we realized we needed to double down on the foundations of the project. We started refactoring the freshly written code because we didn’t just want this to be a one-off solution. We wanted it to become a platform that others could easily extend.

We decided to build it using mostly generic components, driven by configurations that we could define and update with minimal effort. For example, adding a new action would involve adding it to the configuration for that resource and writing a simple handler in the backend to perform the required logic. No need to mess with UI components, expose new controllers, or touch unrelated parts of the system. This approach also makes it easier to generate new code using AI, as we are building on top of a stable structure rather than rewriting core logic or copying and pasting boilerplates.
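
As a rough sketch of what that could look like (all names and shapes here are illustrative assumptions, not the actual catalog code):

```typescript
// Illustrative assumption of a config-driven action registry; not the real code.
type ActionHandler = (resourceId: string, params: Record<string, unknown>) => Promise<void>;

interface ActionConfig {
  id: string;            // unique action id, e.g. "sqs.redrive"
  label: string;         // what the generic UI renders
  resourceType: string;  // which resource type shows this action
  confirmation?: string; // optional prompt before running
}

// Adding a new action means adding one config entry...
const actions: ActionConfig[] = [
  {
    id: "sqs.redrive",
    label: "Redrive messages",
    resourceType: "sqs-queue",
    confirmation: "Move all messages from the DLQ back to the source queue?",
  },
];

// ...and registering one backend handler under the matching id.
const handlers: Record<string, ActionHandler> = {
  "sqs.redrive": async (resourceId, _params) => {
    // call the actual redrive logic for this queue here
  },
};
```

Because the UI is generic and only reads from the configuration, a new action shows up in the catalog without touching frontend components or exposing new controllers.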

A significant part of the motivation was to give developers more end-to-end autonomy and set things up in a way that wouldn’t require us to add work later on. We wanted this to be something the whole R&D org could use and build on, not just a tool owned and maintained by our team.

What we learned along the way

We initiated this project with the goal of simplifying resource creation for developers, reducing ticket overhead, and granting them greater autonomy. Although we haven’t yet added the ability to create resources, the catalog is already being used daily by teams across R&D.

That early adoption came from focusing on the gaps developers faced every day, rather than putting all our effort into flows we thought were the most urgent to improve. We spent time listening, reviewing tickets, and building the capabilities they actually needed.

Because we invested early in solid foundations and treated the catalog as a platform, it was easy to respond to feedback and quickly add new features. Supporting more resource types or adjusting existing ones didn’t require major changes. The system was built to be flexible, and that made a real difference.

The impact is already clear, and there’s growing excitement about what’s coming next.