Massoud Mazar

Sharing The Knowledge

How replacing ElasticSearch with Azure DocumentDB (CosmosDB) turned out to be a bad idea

Disclaimer: this is my personal opinion and not the opinion of my colleagues or my employer.

History

We used to store terabytes of data in ElasticSearch in the form of JSON documents. As the amount of data stored in the cluster grew, we had to create new clusters with lots of nodes, and it turned into a maintenance and cost nightmare. The Microsoft Azure team suggested we move to DocumentDB to reduce the cost, on the grounds that it can scale infinitely and so would need no maintenance.

Our Use Case

We need to store JSON documents with an average size of 10-15 KB. These documents are rarely written more than twice, but may be queried many times. At this time, we have more than 100 million documents stored, and we add about 1 million documents per day.
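To put these figures in perspective, here is a rough sizing sketch. It assumes decimal units (1 KB = 1,000 bytes) and ignores index and replication overhead, so treat the results as a lower bound rather than a measurement.

```python
# Back-of-the-envelope storage sizing from the figures quoted above.
# Assumptions: decimal units, no index or replication overhead.
KB = 1_000
GB = 1_000 ** 3
TB = 1_000 ** 4

docs_stored = 100_000_000      # "more than 100 million documents"
docs_per_day = 1_000_000       # "about 1 million documents per day"
avg_doc_kb = (10, 15)          # low and high end of the average document size


def total_bytes(doc_count, doc_size_kb):
    """Raw payload size in bytes for doc_count documents of doc_size_kb each."""
    return doc_count * doc_size_kb * KB


stored_tb = [total_bytes(docs_stored, size) / TB for size in avg_doc_kb]
daily_gb = [total_bytes(docs_per_day, size) / GB for size in avg_doc_kb]

print(f"stored today: {stored_tb[0]:.1f}-{stored_tb[1]:.1f} TB")
print(f"daily growth: {daily_gb[0]:.0f}-{daily_gb[1]:.0f} GB")
```

Under these assumptions the raw payload is roughly 1 to 1.5 TB, growing by 10 to 15 GB per day.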

Cost

Before making the change, I used the Azure cost calculator and estimated that we would need about 20,000 RUs (Request Units) for our normal operation. Our lack of knowledge about how DocDB works (it relies on Partition IDs, and if your queries go cross-partition you are doomed) resulted in underestimating the cost. To make a long story short, we are running at 700,000 RUs, which translates to a monthly cost of $54,000.

If we had used the most elaborate ElasticSearch setup for this purpose, say with a replication factor of 3 on SSD machines, we would need about 45 VMs with 100 GB of SSD each, at $540 per VM per month, for a total monthly cost of $24,300. That is less than half the cost of DocumentDB.

If we used a replication factor of 2 with 400 GB non-SSD VMs, the monthly cost would be less than $5,000. You can come up with many other combinations in between.
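The arithmetic behind this comparison can be checked with a few lines. This is only a sketch using the numbers quoted in this post; the variable names are mine, and real Azure pricing varies by region and over time.

```python
# Cost comparison using only the figures quoted in this post.
docdb_monthly_usd = 54_000     # actual DocumentDB bill at 700,000 RUs
docdb_actual_rus = 700_000
docdb_estimated_rus = 20_000   # original estimate from the cost calculator

es_vm_count = 45               # replication factor 3, 100 GB SSD per VM
es_vm_monthly_usd = 540        # per VM, per month

es_monthly_usd = es_vm_count * es_vm_monthly_usd
ru_ratio = docdb_actual_rus / docdb_estimated_rus
cost_ratio = docdb_monthly_usd / es_monthly_usd

print(f"ElasticSearch (SSD, RF 3): ${es_monthly_usd:,} per month")
print(f"DocumentDB:                ${docdb_monthly_usd:,} per month")
print(f"DocumentDB costs {cost_ratio:.2f}x the ElasticSearch setup")
print(f"Provisioned RUs are {ru_ratio:.0f}x the original estimate")
```

The useful number here is the last one: the throughput we ended up provisioning was 35 times what the calculator estimate suggested.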

What a terrible deal! I hope we got a lot of new features and functionality for the extra money!

Functionality

By moving to DocumentDB, we lost a lot of functionality and features. It is almost impossible to run any query that goes cross-partition. We are unable to run any quick analytics on the data. Any query on more than one partition is so expensive and slow that we just do not bother with it.
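To illustrate why cross-partition queries hurt, here is a toy model of a partitioned store. It is not the DocumentDB API or its real routing algorithm: the class, the `device_id` partition field, and the CRC32 routing rule are all hypothetical, chosen only to show that a query scoped to one partition key touches a single partition while any other query has to fan out to every partition.

```python
import zlib


class ToyPartitionedStore:
    """Toy partitioned document store (hypothetical, not the DocumentDB API)."""

    def __init__(self, partition_count, key_field):
        self.key_field = key_field
        self.partitions = [[] for _ in range(partition_count)]

    def _route(self, key_value):
        # Hypothetical routing rule: a stable hash of the partition key value.
        digest = zlib.crc32(str(key_value).encode("utf-8"))
        return digest % len(self.partitions)

    def insert(self, doc):
        self.partitions[self._route(doc[self.key_field])].append(doc)

    def query(self, predicate, key_value=None):
        """Return (matching_docs, partitions_scanned)."""
        if key_value is not None:
            # Partition key known: only one partition needs to be scanned.
            targets = [self.partitions[self._route(key_value)]]
        else:
            # No partition key: the query fans out to every partition.
            targets = self.partitions
        matches = [doc for part in targets for doc in part if predicate(doc)]
        return matches, len(targets)


store = ToyPartitionedStore(partition_count=50, key_field="device_id")
for i in range(10_000):
    store.insert({
        "device_id": f"dev-{i % 500}",
        "status": "error" if i % 97 == 0 else "ok",
    })

single, scanned_single = store.query(
    lambda d: d["device_id"] == "dev-42", key_value="dev-42")
cross, scanned_cross = store.query(lambda d: d["status"] == "error")

print(f"by partition key: {len(single)} docs, {scanned_single} partition scanned")
print(f"cross-partition:  {len(cross)} docs, {scanned_cross} partitions scanned")
```

In this toy the fan-out only costs extra list scans. In a real service each partition touched consumes provisioned throughput, which is where the cost and latency come from.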

In ElasticSearch, it was so easy to search for anything that we had a lot of wasteful operations gathering data in real time. When we migrated to DocumentDB, we had to turn off a bunch of such operations because they are not possible anymore.

In summary, we lost a lot of functionality by moving to DocumentDB.

Scalability

While migrating the data to DocDB, I learned that the Azure team has to manually change our partition count and scale parameters to accommodate our initial needs. At the start, a new collection was initialized with 250 GB of storage. Assuming that a storage size increase would be a seamless operation, I started the data migration. Many hours into the migration, all instances of my migration tool stopped. I guess something was triggered on the Azure back end and the capacity was reduced to 100 GB. Since it was Friday afternoon, our migration was delayed for 3 days until someone manually increased our capacity to a higher value.

For the next size increase, we asked the nice folks at Azure to increase the capacity before we hit the wall, but the migration crashed again because the size increase required downtime.

Promises of a great future

The Azure team has been genuinely trying to help us move to DocDB, and they have plans for improvements in all the areas mentioned above. It's on us to make the right decision when choosing technologies.

Comments (6) -

Jason Hedges

What is your opinion of Azure CosmosDB vs ElasticSearch if you were dealing with a much, much smaller set of data? Are queries against Cosmos as fast as ElasticSearch? Thanks!

Administrator

It is hard to answer your question without more context. What would you consider much much smaller than 100 Million documents and 1.5 TB? How much do you expect this data to grow over time? What types of queries will you be running?

What I have learned is that you should NOT count on being able to run ANY type of query other than partition-based queries, unless you only have up to a million documents in the collection.

Jason Hedges

Thank you for your reply. I'm asking about smaller data sets, like 10,000 - 100,000 documents. If they averaged 15 KB per document, then that would be 150 MB to 1.5 GB. In our scenario, we are looking at taking relational data that is spread out over many relational DB tables and flattening that data into a document to build a fast, searchable database for read-only queries.

Administrator

Flattening your relational data to a document for fast read-only searches makes complete sense. I would suggest ElasticSearch, and you will not need too many nodes. Unless you are sure CosmosDB is much cheaper and can perform, do not waste your time on it. Just my 2 cents.

Why not try Azure Search for document indexing and searching instead of ElasticSearch? ElasticSearch would be an IaaS service where you have to manage the infrastructure, whereas Azure Search would be PaaS with no infrastructure management. I am sure you must have come across this Azure service, or someone from the Azure team would have recommended it.

Administrator

It's a good question, which I cannot answer. The decision was made by a Microsoft Azure architect who is overseeing our adoption of Azure technologies. Since I do not have empirical data for or against Azure Search regarding cost, features, scalability and performance, I cannot comment on that, but thank you for bringing it up, as some readers may benefit from it.
