How our engineers supercharged our BI capabilities: A trilogy

Everything at monday.com starts and ends with data. We’re driven to succeed, and to do that, we must measure everything with precision and accuracy.

That’s why we built our own business intelligence (BI) solution from scratch: our beloved BigBrain. It tracks every single KPI we have here at monday.com, and in the spirit of transparency, we make these numbers readily accessible to everyone on our team.

Our BigBrain team — currently 4 engineers and counting (we’re hiring!) — just wrapped up a big project to supercharge BigBrain as we embark on a new chapter of accelerated growth in our company. In this three-part series, BigBrain developer Daniel Mittelman will share how and why we moved to MemSQL and why it’s awesome.

The man, the myth, the legend: our brilliant fullstack BI developer Daniel Mittelman

Here at monday.com, we’re growing at an incredibly fast rate. More and more people are visiting our website, signing up for new accounts, and trying out the platform. In addition, an increasing number of happy customers are upgrading their plans to let more people on their team work with monday.com.

As we mentioned before, we collect and analyze data about our users’ behavior (with privacy concerns taken into account, of course). This helps us shape our business plan for years to come and make better decisions about how to improve our product.

But…there’s a cost. An increasing number of users, combined with an increasing number of available features in the product, means we’re accumulating data at an incredibly fast rate. Check it out in the chart below:

monday.com’s total number of engagement records generated by users over time.

In January 2016, we had around 50 million records in our engagement data. One year later in January 2017, this number had grown to 265 million records. We kickstarted 2018 with over 1 billion records.

This isn’t a huge dataset by Big Data standards, but at our roughly fourfold year-over-year growth rate, simple math dictates that by 2019 our data pool will surpass 4 billion records, and by 2020 we’ll be storing some 15 billion.

Armed with this knowledge, we realized that our existing Elasticsearch-based solution would reach its limits within a year. Don’t get me wrong — Elasticsearch is a great database for token-based searching, and it performs very well on basic data analysis tasks (mainly counting and bucketing stuff).

However, Elasticsearch lacks the ability to cope with greater challenges, such as performing aggregations over multiple indices and joining them together, or making complex funnel computations. Unsurprisingly, these more complex analyses are the ones we want most here at monday.com, as they reveal a more complete picture about user engagement.

We set out on the search for a new analytical database in mid-2017. We looked at (almost) every database out there: from cloud-based, ready-to-use data warehouses like Amazon’s Redshift, Google BigQuery, and Microsoft’s Cosmos DB, to the expanding new field of GPU-based databases like MapD and Kinetica. We even checked out Amazon’s Athena backed by a JSON-formatted S3 bucket.

Each database had its advantages and downsides. Overwhelmed by all the options out there, we mapped our exact requirements for this new solution. Our checklist was, in descending order of importance:

  • Speed: Queries should return within 6 seconds in most cases, and never take more than 30 seconds
  • Versatile: Capable of joining different data types in a single computation
  • Distributed: Highly available, with replication and fault tolerance
  • Secure: It must meet our needs for encryption, both at rest and in transit
  • Ruby-friendly: Integrates with Ruby on Rails, the framework we use for backend development
  • Easily scalable: We’re growing and we don’t expect to stop anytime soon
  • SQL: Elasticsearch’s JSON DSL is a nightmare
  • Reasonably priced: We paid $3,600 a month for our 9-node Elasticsearch cluster and did not want to exceed this price point

Speed and versatility were our most important requirements, and they are also where most cloud-based solutions fall short. We tested some of our more complex queries on both Redshift and BigQuery, and got disappointing execution times, ranging from 50 seconds to several minutes. Cosmos DB actually showed impressive benchmark results, but it doesn’t have official Ruby support. The same goes for MapD and Kinetica.

Our last contender was a memory-based database called MemSQL. At first sight, its Enterprise edition seemed to match all of our requirements, including versatility and SQL support. But whether it would live up to our speed requirement remained to be seen.

Their website boasts the database’s superior speed over many traditional database engines, but that didn’t stop us from putting it to the test ourselves. The results, as you can probably predict from reading this far, were stellar, and we’ll dive into the technical details in Part 2 (coming soon!).

Here are some real-life examples of how we use MemSQL within BigBrain for non-trivial analysis (everything beyond charts showing aggregations over time):

1. End-to-end marketing performance: Many marketing tools, like AdWords Account Manager or the Facebook Ad Manager, give detailed information about how campaigns perform by displaying the campaign’s cost and how many impressions/clicks it generated.

We took it a few steps further to include analysis of marketing sources, campaigns, and even single banners. We display the number of visitors each banner and associated campaign brought to our website, how many people signed up, and how many people eventually became paying customers.

This allows us to calculate KPIs for each segment, such as its customer acquisition cost (CAC) and the return on investment (ROI). This kind of computation requires us to join six tables, sometimes over tens of millions of records — something that our octa-core PostgreSQL database couldn’t accomplish in under a minute. MemSQL, however, completes the request in a matter of seconds.
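To give a feel for the shape of this kind of query, here’s a simplified sketch in SQL. It joins only four tables instead of our actual six, and all table and column names (ad_spend, visits, signups, subscriptions) are hypothetical stand-ins rather than our real schema:

    -- Simplified marketing attribution sketch (hypothetical schema).
    -- Cost is pre-aggregated per campaign so the later joins don't inflate it.
    SELECT
      c.source,
      c.campaign_name,
      c.total_cost,
      COUNT(DISTINCT v.visitor_id) AS visitors,
      COUNT(DISTINCT s.user_id)    AS signups,
      COUNT(DISTINCT p.account_id) AS paying_customers,
      c.total_cost / NULLIF(COUNT(DISTINCT p.account_id), 0) AS cac
    FROM (
      SELECT campaign_id, source, campaign_name, SUM(cost) AS total_cost
      FROM ad_spend
      WHERE day BETWEEN '2017-07-01' AND '2017-12-31'
      GROUP BY campaign_id, source, campaign_name
    ) c
    LEFT JOIN visits        v ON v.campaign_id = c.campaign_id
    LEFT JOIN signups       s ON s.visitor_id  = v.visitor_id
    LEFT JOIN subscriptions p ON p.user_id     = s.user_id
    GROUP BY c.campaign_id, c.source, c.campaign_name, c.total_cost;

The real queries fan out over far more data, but the pattern is the same: follow the money from ad spend, through visits and signups, all the way to paying accounts, in a single statement.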

2. Funnels: Our funnels tool allows us to construct an event-based process (out of our 1,800 different engagement types) and see how many visitors, users, and accounts pass through the process within a given timeframe.

More importantly, it lets us identify weak points where the user experience isn’t optimal, and in turn fine-tune those parts of the platform. You can think of it as our UX debugging tool. We can also segment our funnels when running A/B tests, so that we can analyze the behavior of the users served each of the test’s variants.

We build funnels using an iterative process of “join and reduce,” such that each step in the funnel requires a different OUTER JOIN to the engagement storage, with special constraints in place. Here’s an example for our signup process funnel:

Our signup process statistics for H2 2017, segmented by browser, starting with 3.6 million unique visitors. Interestingly, people who visit our website in Chrome are statistically significantly more likely to sign up for our trial than those on Firefox. This funnel was computed in 18 seconds.
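To make the “join and reduce” idea concrete, here’s a heavily simplified two-step version of such a funnel in SQL. The engagements table and event names are hypothetical, and this ignores the out-of-order-event complexity we’ll cover in Part 2:

    -- Simplified two-step funnel (hypothetical schema): visitors who viewed
    -- the signup page, and how many of them later completed a signup.
    SELECT
      COUNT(DISTINCT step1.visitor_id) AS reached_signup_page,
      COUNT(DISTINCT step2.visitor_id) AS completed_signup
    FROM engagements AS step1
    LEFT OUTER JOIN engagements AS step2
      ON  step2.visitor_id = step1.visitor_id
      AND step2.event_type = 'signup_completed'
      AND step2.created_at >= step1.created_at
    WHERE step1.event_type = 'signup_page_view'
      AND step1.created_at BETWEEN '2017-07-01' AND '2017-12-31';

Each additional step in the funnel adds another OUTER JOIN back to the engagement storage, with its own set of constraints.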

In my next post, we will dive deeper into how we store our data in MemSQL, how different parts of MemSQL work, and how we adjusted them to run our queries as quickly as possible. We will also show how we built our funnels system so that it can compute any funnel, including non-trivial ones where events don’t necessarily occur in the order specified in the funnel. (Spoiler alert — it’s not as easy as shown in most of the tutorials you will find online). In the third part, we will review the process of launching a MemSQL cluster from scratch and how to set it up as a critical piece of infrastructure in production.