I was reading a post recently about Red Hat removing MongoDB support from Satellite (and yes, some folks say it is because of the license changes). It made me think about how often over the last few years I’ve seen post after angry post about how terrible MongoDB is and how no one should ever use it. However, in all this time, MongoDB has become a much more mature product. So what happened? Did all of the hate truly come from mistakes made in the early implementation/marketing of MongoDB? Or is the problem that people are blaming MongoDB for their own lack of effort when evaluating whether it was a good fit?
If you’re finding yourself at this point foaming at the mouth because it appears that I’m defending MongoDB, please jump to the end of this post and read my disclaimer.
Trendspotting
I’ve been working with software for more years at this point than I like to admit, but even then I’ve only experienced a tiny fraction of the trends that have buffeted our industry. I’ve witnessed the rise of 4GL, AOP, Agile, SOA, Web 2.0, AJAX, blockchain… the list never ends. Every year there are new trends that pop up in the world of software engineering. Some fizzle quickly, while others fundamentally change the way software development is performed.
With any new innovation that starts to gain traction, you’re going to see a general excitement appear around it as people start to pile on board, or see the buzz being generated by others and decide that they too want to get in on the action. This process was codified by Gartner with its hype cycle, which (while controversial) is a decent approximation of what happens with technologies that are eventually found to be valuable.
But every once in a while, a new innovation appears (or in this case reappears) that is driven by one particular implementation of that innovation. In the case of the NoSQL movement, it was driven heavily by the appearance, and meteoric rise, of MongoDB. MongoDB didn’t start the movement; the real driver of the return to non-relational databases was the data challenges at large internet companies. Projects like Google’s Bigtable and Facebook’s Cassandra kicked it off, but MongoDB was the most visible and accessible implementation of an open source NoSQL database that most developers had access to.
Aside: You might be thinking right now that I’m conflating document oriented databases with columnar databases, key/value stores, or any of the numerous other datastore types that fall under the generic NoSQL banner. And you are correct. But this conflation was happening widely at the time. Everyone was jumping into the NoSQL craze; they knew that they absolutely needed NoSQL, but didn’t really understand the different technologies involved. To many people, MongoDB was NoSQL.
And developers pounced on it. The idea of a schema-less database that used json-like documents, could run across multiple servers easily, and magically scaled to meet any challenge was quite alluring. Around 2014 or so, it seemed like everywhere you looked someone was implementing MongoDB in a place where just a year earlier a relational database like MySQL, Postgres, or SQL Server would have been used. When asked why Mongo was being used, the responses ranged from the banal “it’s web scale” to the more thoughtful “my data is very loosely structured and fits well into a schema-less database”.
It is important to remember that MongoDB, and document oriented databases in general, solve a number of problems people had with traditional relational databases:
Strict Schema – With a relational database, if your data model was dynamically shaped, you were forced to either create a bunch of random “miscellaneous” data columns, shove the data in as a blob, or use an EAV setup… all of which had significant downsides.
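To make that pain concrete, here is a small sketch in plain Python (made-up product data, not any real schema) contrasting the EAV workaround with the single self-describing document a store like MongoDB holds:

```python
# EAV workaround in a relational schema: one row per
# (entity, attribute, value) triple, every value degraded to a string.
eav_rows = [
    (101, "name", "Trail Shoe"),
    (101, "color", "red"),
    (101, "waterproof", "true"),
]

# The same entity as a single self-describing document,
# with native types preserved.
document = {"_id": 101, "name": "Trail Shoe", "color": "red", "waterproof": True}

def eav_to_entity(rows, entity_id):
    """Reassemble one entity from its EAV rows (values stay strings)."""
    return {attr: value for (eid, attr, value) in rows if eid == entity_id}

rebuilt = eav_to_entity(eav_rows, 101)
```

Reassembling the EAV rows takes a join or application code, and the boolean comes back as the string "true"; the document form needs neither.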
Difficult Scalability – With a relational database, if your data grew so large that it couldn’t easily fit on one server, scaling it out was a serious undertaking. MongoDB had built-in mechanisms like sharding for spreading data across multiple machines (and replica sets for redundancy).
Difficult Schema Modifications – No migrations! With a relational database, changing the structure of the database can be a huge challenge (especially once your data gets really big). MongoDB promised to make this dramatically simpler. And it made it soooo easy to get started, you could just keep updating your schema and move really quickly.
Write Performance – MongoDB’s performance was good, especially when configured in certain ways. MongoDB’s out-of-the-box write configuration (one of the big things it was criticized for, since the early defaults did not wait for writes to be acknowledged) allowed it to put up some impressive performance numbers.
Caveat Emptor
The potential benefits MongoDB provided were huge, especially for people facing certain classes of problems. Reading the list above without context, or experience, would lead you to believe that it truly was a game-changer when it came to database systems. The only problem was that the benefits listed above came with a number of caveats, some of which I’ve listed below.
To be fair, no one at 10gen/MongoDB Inc. would claim the items below aren’t true; they are just tradeoffs.
Loss of transactions – Transactions are a core feature of many relational databases (no, not all, but most). Having a transaction means that you can perform multiple operations atomically and you can ensure that your data will stay consistent. Sure, with a NoSQL database you can have a transaction within a single document, or you can use tactics like two-phase commits to get transaction-like semantics. But the point is you have to do this work yourself… and it can be challenging and labor intensive to get right. Often you don’t realize how much you’re giving up here until you start seeing data in your database get into invalid states because you couldn’t guarantee the atomicity of operations. Note: As many people have let me know, MongoDB 4.0 introduced multi-document transactions last year, but they come with a number of limitations. So as this post is suggesting, please evaluate whether they will work for your needs.
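As a rough illustration of how much bookkeeping the two-phase-commit tactic pushes into your application, here is a minimal sketch in plain Python (dicts standing in for collections; the account names and transaction log are invented for the example):

```python
# In-memory stand-ins for an "accounts" collection and a transaction log.
accounts = {"A": {"balance": 100, "pending": []},
            "B": {"balance": 50, "pending": []}}
txns = {}  # transaction id -> state record

def transfer(txn_id, src, dst, amount):
    # 1. Record intent before touching any account, so a crash leaves
    #    evidence of in-flight work.
    txns[txn_id] = {"state": "initial", "src": src, "dst": dst, "amount": amount}
    txns[txn_id]["state"] = "pending"
    # 2. Apply each side, tagging the accounts with the txn id so a crash
    #    between steps can later be detected. (A real implementation also
    #    needs recovery code that scans for stuck "pending" transactions.)
    accounts[src]["balance"] -= amount
    accounts[src]["pending"].append(txn_id)
    accounts[dst]["balance"] += amount
    accounts[dst]["pending"].append(txn_id)
    txns[txn_id]["state"] = "applied"
    # 3. Clear the markers and mark the transaction done.
    accounts[src]["pending"].remove(txn_id)
    accounts[dst]["pending"].remove(txn_id)
    txns[txn_id]["state"] = "done"

transfer("t1", "A", "B", 30)
```

All of this state tracking and recovery logic is exactly what a relational database’s `BEGIN`/`COMMIT` gives you for free.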
Loss of relational integrity (foreign keys) – If your data has relationships, then you’re going to have relations. Almost all data has some kind of relations, and if your database doesn’t enforce them, then your application is going to have to. Having a database enforce these relationships can offload a lot of work from your application, and therefore from your engineers.
Lack of ability to enforce data structure – Strong schemas might be a pain in the ass sometimes, but they can also be a powerful mechanism for ensuring that your data is well structured. If you leverage them appropriately, it provides a powerful mechanism for ensuring your data is in the shape you expect. Document databases like MongoDB allow an incredible amount of flexibility around the schema, but that flexibility offloads the responsibility onto the maintainer to keep their data clean. If you don’t put in that effort, then you end up putting a lot of code into your application to account for data that might not be in the shape you expect. As we often like to say at Simple Thread… your app is going to be rewritten one day, your data will live forever. Note: MongoDB supports schema validation, which is useful, but doesn’t provide the same guarantees that you get in a relational database. Primarily, adding or modifying the schema validation doesn’t affect any existing data in the collection, it is up to you to make sure you’re updating your data to match the new schema. So whether or not this is sufficient for your needs is up to you to determine.
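A tiny sketch of that non-retroactivity caveat, in plain Python rather than MongoDB’s actual $jsonSchema machinery (the collection and rule here are invented): validation added after the fact guards new writes but leaves old documents untouched.

```python
# A document written before any validation rule existed: wrong shape,
# already stored.
collection = [{"_id": 1, "email": 42}]

def is_valid(doc):
    """The rule added later: email must be a string."""
    return isinstance(doc.get("email"), str)

def insert(doc):
    """New writes are checked against the rule."""
    if not is_valid(doc):
        raise ValueError("document failed validation")
    collection.append(doc)

insert({"_id": 2, "email": "a@example.com"})  # checked at write time
# ...but nothing revisits what was already there:
still_invalid = [d for d in collection if not is_valid(d)]
```

Migrating the pre-existing documents to the new shape remains your job, which is the guarantee gap described above.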
Custom query language/Loss of tooling ecosystem – SQL was an absolute revolution when it came out, and decades later it is still the lingua franca of the data world. SQL is an incredibly powerful language, but one that can also be challenging. Having to query a database using a custom query language composed of JSON snippets would be considered a big step backwards by folks experienced with SQL. There is a whole world of tools that interoperate with SQL databases, everything from IDEs to reporting tools. Moving to a database that doesn’t support SQL means you can’t use most of these tools, or you have to find a way to get your data into a SQL database so that these tools can be used, and this can be harder than you think.
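To show what “a custom query language composed of JSON snippets” looks like next to SQL, here is a toy in-memory matcher for a tiny subset of Mongo-style operators (illustrative only; real MongoDB supports vastly more, and the user data is made up):

```python
# SQL:  SELECT * FROM users WHERE age > 30 AND city = 'Richmond'
# The equivalent Mongo-style query document:
query = {"age": {"$gt": 30}, "city": "Richmond"}

def matches(doc, query):
    """Return True if doc satisfies the query (supports $gt and equality)."""
    for field, cond in query.items():
        if isinstance(cond, dict):           # operator form, e.g. {"$gt": 30}
            for op, val in cond.items():
                if op == "$gt":
                    if field not in doc or not doc[field] > val:
                        return False
                else:
                    raise NotImplementedError(op)
        elif doc.get(field) != cond:         # bare value means equality
            return False
    return True

users = [
    {"name": "Ann", "age": 34, "city": "Richmond"},
    {"name": "Bob", "age": 28, "city": "Richmond"},
]
over_30_in_richmond = [u for u in users if matches(u, query)]
```

The query document is perfectly serviceable, but it is the thing your SQL-era IDEs, reporting tools, and BI suites don’t speak.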
Many developers who reached for MongoDB didn’t deeply understand the tradeoffs they were making, and often they dove in head-first by using it as the primary datastore for their applications. This meant that it was often incredibly costly to go back on this decision.
What could have been done differently?
Not everyone jumped in head first and slammed into the bottom of the deep end. But enough did that there will be projects for years to come removing MongoDB from places where it just didn’t fit. If many of these organizations had taken a bit of time to think methodically about the technology choices they were making, it is likely that many of them wouldn’t have made the decisions they did.
So how do you decide what technology makes sense for your use case? There have been a few attempts at creating systematic frameworks for evaluating technologies such as “A Framework for Technology Introduction in Software Organizations” and “A Framework for Evaluating Software Technologies”, but I don’t think it needs to be that complicated.
Many technologies can be reasonably evaluated by asking just two main questions, but the challenge is finding individuals who can responsibly answer them, dedicate time to answering them, and answer them without bias.
Question 1: What problems am I trying to solve?
If you’re not facing some kind of problem, you don’t need a new tool. Full stop. Don’t look for solutions and then back into problems. If you’re not facing a problem that a new technology solves significantly better than your existing technology, then your decision is over. If you’re considering using this technology because you’ve seen others using it, it might be useful to think about what problems they are facing, and ask yourself if you’re facing the same problems. It is often easy to reach for a technology because you see another company using it; the difficulty is in determining whether or not you’re facing the same challenges.
Question 2: What am I giving up?
This is definitely the harder of the two questions to answer, because you have to dig in and have a good understanding of both the old technology and the new technology. Sometimes you can’t really understand a new technology until you’ve built something with it, or have access to someone who has spent significant time with the technology.
If you don’t have either, then you should be considering what is the smallest investment you can make to determine if this tool is valuable. And if you make the investment, how hard would it be to undo the decision?
Humans Always Messing Things Up
One thing you’ll have to keep in mind is that you’re going to be fighting human nature when you try to answer these questions in as unbiased a way as possible. There are a number of cognitive biases that must be overcome in order to effectively evaluate a technology; just to name a few:
Bandwagon effect – Everyone knows this, and yet it is still hard to fight against. Just make sure that you’re choosing a technology because it solves real needs for you, not because the cool kids are doing it.
Mere newness bias – Many software developers tend to undervalue technologies they have worked with for a long time, and overvalue the benefits of a new technology. This isn’t specific to software engineers; everyone has the tendency to do this.
Feature-positive effect – We tend to see what is present, and overlook what isn’t there. This can wreak havoc when working in concert with the “Mere newness bias”, since not only are you inherently putting more value on the new technology, but you’re also overlooking the gaps of the new tech.
Looking at things objectively is a challenge, but understanding the biases that may affect you will help you make more rational decisions.
Wrap Up
When a new innovation appears (or reappears), we need to be very careful in answering two questions:
- Does this tool solve a real problem for us?
- Do we thoroughly understand the tradeoffs?
If you can’t confidently answer those two questions, take a few steps back and reevaluate.
So was MongoDB ever the right choice? Yes, of course it was; like most things in engineering, it depends. For teams that answered those two questions, many found value and continue to find value in MongoDB. For those who didn’t, hopefully they learned a valuable, not-too-painful lesson about navigating the hype cycle.
Disclaimer
I want to clarify that I neither love nor hate MongoDB. I simply haven’t run into many problems that I thought it would be the best fit for. I know that 10gen/MongoDB Inc. didn’t do themselves any favors early on by setting unsafe defaults and promoting MongoDB everywhere (especially at hackathons) as the be-all, end-all solution for every data need. Yes, these were probably bad decisions, but I think it backs up the point I’m making here, because these were issues that could be uncovered very quickly with even a cursory evaluation of the technology.
Nice article, although you should probably get up to date on their tech. I believe some of the claims made here are no longer valid, such as the ones about data structure enforcement and transactions.
Thanks for the comment, I was not aware that MongoDB implemented multi-document transactions recently! That is great to see! After a few searches I found that they have a number of limitations vs transactions in traditional databases, which I think goes back to the point of my article. I appreciate the correction though.
I was aware of the schema enforcement, but as with transactions there are limitations. For instance adding or changing the schema enforcement on a collection doesn’t change existing documents. So while this is convenient, it doesn’t provide the same guarantees you get with a relational database.
Again, I’m not saying MongoDB is bad, just that it is critical to do the research and understand the limitations.
I would just like to add that some of these traditional features, such as schema-enforced record consistency and foreign keys, are just not as important anymore. In fact, they fundamentally couple the data API to the data storage.
This is MongoDB’s approach. I believe MongoDB assumes that either the DB will only be used by one app, or there will be a service on top of it. The service can enforce schema and data relationships. With today’s best practices those traditional features aren’t as necessary and actually produce overhead, which is why I think MongoDB has such great performance.
It reminds me of the Go programming language philosophy. They only implemented the necessary parts and nothing superfluous or not 100% needed.
You’re praising the lack of structure as if it were a good thing. Let’s see if you keep with the same opinion after having worked in several companies across different teams. I am most certainly not on your side on this one.
Mongo is nice, but my default choice is Couchbase.
At 10gen they understood from the beginning what most still have not caught up to: most of the data people work with really just doesn’t matter very much.
That list of “other stuff people bought” that you see at the bottom of the Ebay and Amazon pages: how accurate does it need to be? What matters most, overwhelmingly, is that it doesn’t delay putting up the page.
Mongo had to have transactions because it is publicly traded now, but turning on transactions and durable writes slows things down, sometimes a lot. People did not choose Mongo because they wanted slower pages. The large majority of users still use transactions rarely, if ever, and are happier that way.
If your answers really need to be right every time, and if every write has to be still there after any system failure, then by all means use whatever transaction machinery you have access to. Most things just don’t matter much, and good enough is good enough. For those things, convenience and expedience should win.
Just be sure to keep track which is which.
Yes! I couldn’t agree more, and very well put. Whether or not a tool works for you is completely dependent on your use case.
This might be true if you spend your days writing recommendation engines or “user intelligence” software, but many of us still write code where the underlying data has to be correct, every time. Think about anything to do with payments, for example.
You can use other mechanisms to ensure the data is recorded in those cases. Like using message queues with retries and idempotent writes.
Bolting extra technologies, like message queues, onto your stack just so you can avoid running a database that properly supports transactions is not a great idea. If you don’t give a damn about your data, as Nathan Myers suggests, then it’s just extra development, maintenance and overhead for nothing. If you DO give a damn about your data then it’s just extra development, maintenance and overhead to do something that would be better done by a different database.
You definitely need to bolt on extra technologies to make sure your system is up and running all the time. What happens if your database is down in production? Or if it throws exceptions during a sudden spike? Then you are doomed. Introducing queues is such a good concept, and in our project we are benefitting hugely by using queues. We are doing exactly what Jaime mentioned.
Thank you!
I guess your research didn’t uncover that they have BI Connector which allows querying with SQL and using SQL based tools with mongo.
The BI Connector is not part of core Mongo, only as part of their “Enterprise Advanced” product it appears. It also wasn’t around during Mongo’s meteoric rise. But I think that all misses the point. My point isn’t that MongoDB is bad, it is that people didn’t evaluate it properly, and then got upset when they got “burned” by it. Of course MongoDB Inc. is going to attempt to address any limitation of their platform by releasing tools like the BI Connector, any sane company would.
Yes, it’s called “PostgreSQL”.
“If you’re not facing a problem that a new technology solves significantly better than your existing technology, then your decision is over.”
From my own experience the problem is that at the moment the decision is made, the people making it truly believe that the new technology solves the problem significantly better. Even when it does not.
No one likes to admit they are following a trend. Sure other people might behave like so, but they have truly thought this over and their problem must be solved with NoSQL or blockchain, or whatever. They have considered the pros and cons, it just happens that their needs align with the latest tech fashion.
It seems to me that as backend engineering has become a “nice to have” rather than a must for many developers, the hope and expectation that migrations and the like would finally be abstracted away by some new technology has grown. What could be greater than to find out that you don’t have to figure out how to structure or restructure an old-school relational database schema for scalability and flexibility? It seems like a promise most devs below a certain age would love to believe.
Of course, most devs above a certain age are probably no safer assuming that their old RDBMS skills will never go out of style.
You touched on some important phenomena like the bandwagon effect, and that leads into the low-information developer effect. This gives rise to frameworks like Ruby on Rails, whose popularity has everything to do with the application generator and little or nothing to do with Ruby.
The idea is to enable those with domain expertise to deliver successful applications without becoming expert developers or system operators. These communities then outgrow the training wheels, drive changes required outside the “typical” application, and are not always successful. MongoDB, like Ruby on Rails, is in that camp. I’ve got no use for it, nor do I have any use for Ruby on Rails. It’s inferior for large scale applications and does not solve a problem (lack of expertise) for other applications. The licensing issue makes it a total non-starter for most in the service business, but that’s another issue.
MongoDB is Stone Soup. This is the beggar who shows up in the town square promising to create a wonderful soup with nothing but water and stones. But you know, it could use a little salt, as a villager runs off to fetch some. Perhaps some more spices, and some carrots will enhance the product? Now only if we had some tender meat… ahhh.
Document databases are very useful and will continue to expand in cloud infrastructures, but I cannot imagine voluntarily signing up for a vendor lock-in product (where they literally own your entire stack).
MongoDB is highly dependent on either low-information customers or those with no intent of ever providing their own service. All major cloud vendors offer this, and the open-source version of MongoDB is more than sufficient for the applications MongoDB was actually designed for.
This article is very sensible and touches a lot of true points. The only point which I think is a little more nuanced than it seems at first is the bandwagon effect. Indeed, I do agree that a technology should not be chosen because all the cool kids are using it. Yet, there is strength in numbers, not because there is a direct relation between popularity and adequacy, but because of viability.
Besides Question 1 and Question 2, I add another question: “will this technology be around, and supported, for the lifetime of my application?” This is an exceedingly difficult question to answer (Microsoft’s Silverlight gave really good answers to Question 1 and 2 for LOB applications, but I doubt anyone would guess Microsoft would kill Silverlight just a few years after it started becoming really popular). It requires predicting the future, which is always very hard. It requires estimating the lifetime of my own application, which is also hard to predict. However, its importance cannot be overstated.

Obviously, popularity is not an absolute guarantee of long life. Really popular projects also fail. However, popularity creates a network effect. The more people are using a piece of technology, the easier it is to get help, and the more likely it is that it will not be a dead horse in the future. There will be more training materials, more third party utilities, more of everything. There is more assurance in going where everyone else is going as long as Question 1 and 2 are satisfied, not because of the reason of the majority (the majority isn’t always right), but because of the community created by the majority (the analogy with the stone soup by Rick O’Shea is spot on – while the community becomes beneficial to the traveller, it can also be very beneficial to the villagers in the story).