ElasticSearch — Search on steroids (the why and how)

Manan Sharma
Dec 8, 2020

I recently worked closely with ElasticSearch and MongoDB to build a faster, more responsive and scalable search experience, and yes, I have been blown away. In this blog, I tell you why you should be too 🤯

What is ElasticSearch? 🙌

It’s always good to start with some basics, right?

Well, for the people who are versed in the tech lingo, ElasticSearch is a near real-time, distributed and open source full-text search and analytics engine for your data.

And for those of you who aren’t, ElasticSearch is search on steroids: it can scale towards petabytes of data and execute queries on it fast. Really, really fast. 🚀

Now let’s break down this definition into simple explainable chunks:

  1. Real-time: ElasticSearch indexes your data in near real-time. By default it refreshes an index every 1 second, and each refresh makes newly indexed documents visible to search. This makes sure there is very little lag between when your data is input and when it starts appearing in search results.
  2. Distributed: ElasticSearch can divide our indexes (the rough equivalent of MongoDB collections) into multiple shards, where each shard can have one or more replicas spread across the nodes of a cluster.
  3. Open Source: ElasticSearch is open source and free to use (the exact license has changed between versions, so check the one that applies to yours). It is maintained by a very active and thriving community. Here is their GitHub repository.
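To make the first two points a little more concrete, here is a minimal sketch of the index settings that control them. The setting names (`number_of_shards`, `number_of_replicas`, `refresh_interval`) are real ElasticSearch index settings, but the index name `users` and the values are assumptions made up for illustration. The snippet only builds the request body; it does not talk to a cluster.

```python
import json

# Hypothetical index name, used only for illustration.
INDEX_NAME = "users"


def build_index_settings(primary_shards, replicas_per_shard, refresh_interval="1s"):
    """Build the JSON body for creating an index with explicit settings."""
    return {
        "settings": {
            "index": {
                # How many pieces (shards) the index is split into.
                "number_of_shards": primary_shards,
                # How many extra copies are kept of each shard.
                "number_of_replicas": replicas_per_shard,
                # How often new documents become visible to search.
                "refresh_interval": refresh_interval,
            }
        }
    }


def total_shard_copies(primary_shards, replicas_per_shard):
    """Every primary shard exists once, plus once per replica."""
    return primary_shards * (1 + replicas_per_shard)


body = build_index_settings(primary_shards=3, replicas_per_shard=1)
print(f"PUT /{INDEX_NAME}")
print(json.dumps(body, indent=2))
print(total_shard_copies(3, 1))  # 3 primaries + 3 replicas = 6
```

With 3 primary shards and 1 replica each, the cluster holds 6 shard copies in total, which is what lets it spread both data and query load across nodes.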

The Technical Stuff 👨‍💻

If you are reading this for an interview, these are some things you should definitely know about. If not, well, this could be slightly overwhelming.

ElasticSearch is an Apache Lucene based search server which allows us to run full-text search queries over our data. It is accessed through a RESTful web service interface and stores data as schema-less JSON documents.
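Here is a rough sketch of what "RESTful plus JSON" looks like in practice. It assumes a node running locally on the default HTTP port 9200 and a hypothetical index called `articles`; the `_doc` and `_search` endpoints and the `match` query follow the 7.x-style REST API. The code only prepares the requests (method, URL, body) and does not send them, so it runs without a cluster.

```python
import json

# Assumptions for illustration: a local node on the default port
# and a hypothetical index name.
BASE_URL = "http://localhost:9200"
INDEX_NAME = "articles"


def build_index_request(doc_id, document):
    """Prepare a request that stores one JSON document under an id."""
    url = f"{BASE_URL}/{INDEX_NAME}/_doc/{doc_id}"
    return "PUT", url, json.dumps(document)


def build_match_search(field, text):
    """Prepare a full-text search request using a `match` query."""
    url = f"{BASE_URL}/{INDEX_NAME}/_search"
    body = {"query": {"match": {field: text}}}
    return "POST", url, json.dumps(body)


# No schema was declared up front: the document is just JSON.
method, url, payload = build_index_request(
    1, {"title": "Search on steroids", "views": 120}
)
print(method, url, payload)

method, url, payload = build_match_search("title", "steroids")
print(method, url, payload)
```

Any HTTP client (curl, a browser extension, an official client library) could send these requests; the point is that everything going in and out is plain JSON over HTTP.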

It is built in Java, which allows it to run across platforms, and it helps us explore very large amounts of data at very high speeds.

How does it work? 🔮

At its core, ElasticSearch works much like a NoSQL document database, storing every individual data entity as a document. A document is a collection of fields defined in the JSON format along with a unique identifier.

This storage of documents is de-normalized, that is, child data is embedded in the parent document rather than fetched using relational joins. This really improves search performance, as every document already carries all the data it needs.
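A small sketch of what de-normalizing means, using made-up `users` and `orders` records (all names and fields here are hypothetical). In the relational style, orders point at users through a `user_id`; in the document style, each user document simply carries its own orders.

```python
# Relational style: two separate "tables" linked by user_id.
users = {
    1: {"name": "Asha"},
    2: {"name": "Ravi"},
}
orders = [
    {"user_id": 1, "item": "keyboard", "price": 40},
    {"user_id": 1, "item": "mouse", "price": 15},
    {"user_id": 2, "item": "monitor", "price": 150},
]


def denormalize(users, orders):
    """Embed each user's orders inside the user document."""
    documents = []
    for user_id, user in users.items():
        documents.append(
            {
                "id": user_id,
                "name": user["name"],
                # Child data lives inside the parent; no join at query time.
                "orders": [
                    {"item": o["item"], "price": o["price"]}
                    for o in orders
                    if o["user_id"] == user_id
                ],
            }
        )
    return documents


for doc in denormalize(users, orders):
    print(doc)
```

The join work is paid once, when the documents are built, instead of on every search. The trade-off is duplicated data and more work when a child record changes.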

What is this “indexing”? 📂

Indexing is a data structure technique which allows us to quickly retrieve records from a database. For example, suppose our user records have fields like name, email, age etc., but we only search frequently on the age of our users. We can create an index on the user records in which every entry stores only the age of a user along with a unique identifier, and execute search queries on this data rather than on the whole user records, giving us faster and more efficient searches.

But it is not plain indexing which gives ElasticSearch its superpowers; it is actually the inverted index.

Inverted Index

Inverted index is, as the name suggests, the exact inverse of this technique. Here’s what we mean by it:

In the above example, indexing stores every user’s unique identifier and its corresponding age. So if I query to fetch users having the age of 25, it will scan through the documents and return those having the age of 25.

But the inverted index will store the age and its corresponding users, that is, for every age value in our records, it will store the corresponding user document identifiers. So now, if I query for all users with the age of 25, it does not scan every document. It just finds the entry for the age of 25 and returns all the documents that entry refers to. Clever, isn’t it? I know. 🥇
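The age example above can be sketched in a few lines. This is a toy model with made-up user ids, meant only to show the difference between scanning and looking up; it is not how ElasticSearch lays out data internally.

```python
from collections import defaultdict

# "Index" from the example: user id -> age (hypothetical data).
forward_index = {"u1": 25, "u2": 31, "u3": 25, "u4": 40}


def find_by_scan(index, age):
    """Check every entry and keep the ones with a matching age."""
    return sorted(uid for uid, user_age in index.items() if user_age == age)


def invert(index):
    """Flip the mapping: age -> list of user ids having that age."""
    inverted = defaultdict(list)
    for uid, user_age in index.items():
        inverted[user_age].append(uid)
    return dict(inverted)


def find_by_lookup(inverted, age):
    """Jump straight to the age and return the users it points to."""
    return sorted(inverted.get(age, []))


inverted_index = invert(forward_index)
print(find_by_scan(forward_index, 25))     # ['u1', 'u3']
print(find_by_lookup(inverted_index, 25))  # ['u1', 'u3']
```

Both functions return the same users, but the scan touches every entry while the lookup goes directly to the one key it needs. That difference is what matters once there are millions of documents.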

Here is a schematic explanation:

The inverted index maps terms to documents which contain the term. This behavior is contrary to forward indexing which maps documents to their terms.
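The same idea for text, as a minimal sketch: three made-up documents, a forward index (document to terms) and an inverted index (term to documents). Splitting on whitespace is a simplifying assumption here; real analysis is more involved.

```python
from collections import defaultdict

# Hypothetical documents: id -> text.
documents = {
    1: "the quick brown fox",
    2: "the lazy dog",
    3: "quick brown dogs",
}


def build_forward_index(docs):
    """Forward index: each document maps to the terms it contains."""
    return {doc_id: text.split() for doc_id, text in docs.items()}


def build_inverted_index(docs):
    """Inverted index: each term maps to the documents containing it."""
    inverted = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.split():
            inverted[term].add(doc_id)
    return inverted


def search_all_terms(inverted, query):
    """Return ids of documents that contain every term of the query."""
    postings = [inverted.get(term, set()) for term in query.split()]
    if not postings:
        return set()
    return set.intersection(*postings)


inverted_index = build_inverted_index(documents)
print(build_forward_index(documents)[2])                       # ['the', 'lazy', 'dog']
print(sorted(inverted_index["quick"]))                         # [1, 3]
print(sorted(search_all_terms(inverted_index, "quick brown")))  # [1, 3]
```

Notice that searching for "dog" would miss document 3, which contains "dogs". Closing that kind of gap is exactly why terms get normalized before they are stored.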

So that different spellings of the same word can match each other, ElasticSearch normalizes every term before storage. As an example, strings may be normalized as:

lowercase -> remove punctuation -> split into terms

This “splitting” of words depends on the type of search you intend to do on your data. That part is out of scope for this story.
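Still, the three normalization steps listed above are easy to sketch. This mirrors the pipeline as written in this post, with a plain whitespace split as an assumption; it is not a reimplementation of any built-in ElasticSearch analyzer.

```python
import string


def normalize(text):
    """Lowercase, strip punctuation, then split into terms."""
    lowered = text.lower()
    # Delete every ASCII punctuation character.
    without_punctuation = lowered.translate(
        str.maketrans("", "", string.punctuation)
    )
    # Simplest possible "splitting": break on whitespace.
    return without_punctuation.split()


print(normalize("Hello, World! It's FAST."))
# ['hello', 'world', 'its', 'fast']
```

Because the same function runs on both the stored text and the search query, "FAST." in a document and "fast" in a query end up as the same term in the inverted index.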

Conclusion 🚀

Well, this was your introduction to ElasticSearch, and I hope it got you interested! 🧩

I am writing a multi-part series to completely decode ElasticSearch to the core, and I hope to see you in the next part!

Happy Coding!✨

Manan Sharma

A Computer Sc. undergrad at Delhi Technological University. Trying to make an impact in this world, one keystroke at a time.