ElasticSearch search performance tips

CoolGuy
5 min read · Sep 3, 2019


ElasticSearch is a very popular database for full-text search, and the AWS ElasticSearch service natively shields us from the hassle of managing clusters ourselves. However, in a production environment it takes a lot of prior knowledge and experience to understand the configuration of clusters, documents, indices, etc. in order to maximize performance and minimize risk.

Recently, I encountered a performance problem with ElasticSearch where we need to issue a large number of queries, each of which is so small/general that it returns a considerable amount of search results. For example, we have 1,000 queries, each query may return a 1 MB response, and the latency requirement is to stay under 500 milliseconds.

TL;DR lessons:

  • The batch API msearch may help reduce round-trip time, but it does not necessarily reduce query time: the batch takes as long as its slowest sub-query.
  • Use a combination of concurrent requests and the batch API.
  • Set a reasonable result size in the query.
  • The # of concurrent search operations supported per data node/instance is int((# of available_processors * 3) / 2) + 1. So with 8 CPU cores per node and 3 data nodes, that is 13 operations per node, or 39 across the cluster ((8 * 3 / 2 + 1) * 3 = 39; see the sketch after this list). All other requests are sent to the queue, and if the queue overflows, requests are rejected. See details at https://tinyurl.com/na4fcge
  • Keep the total # of shards (primary + replicas) equal to the # of data nodes to maximize throughput.
  • Use an in-memory cache on the application side and enable the ElasticSearch-side caches (field data cache, request cache, query cache).
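
As a quick sanity check on the thread-pool arithmetic above, here is a tiny sketch. The core and node counts are just the example numbers from the list, not values you should hard-code:

```python
# Rough capacity estimate for the search thread pool, based on the formula
# size = int((available_processors * 3) / 2) + 1 per data node.
def search_threads_per_node(cpu_cores: int) -> int:
    return int((cpu_cores * 3) / 2) + 1

cores_per_node = 8   # example from the list above
data_nodes = 3       # example from the list above

per_node = search_threads_per_node(cores_per_node)   # 13
cluster_capacity = per_node * data_nodes              # 39

print(f"{per_node} concurrent searches per node, "
      f"{cluster_capacity} across the cluster; "
      "anything beyond that waits in the search queue (default length 1000).")
```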

The long trip before I learned the above lessons:

msearch vs search

When I first used msearch (with 50 sub-queries), it was 5x slower than 50 concurrent requests. That's when I realized msearch is not as fast as I had imagined. But when we use concurrent requests as an alternative, we need to think about the thread pools on the host, the number of concurrent requests the ElasticSearch cluster supports, connection round trips, potential failures, etc. So overall, it is not a simple yes/no decision between msearch and search. We should design our queries based on the query pattern and the stats of the service hosts and the ElasticSearch cluster.
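
For reference, here is a minimal msearch sketch using the official `elasticsearch` Python client; the host, index name, field, and terms are made up for illustration:

```python
# msearch takes alternating header/body pairs; the whole batch is one round trip,
# but its latency is roughly that of the slowest sub-query.
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

terms = ["foo", "bar", "baz"]
body = []
for term in terms:
    body.append({"index": "my-index"})                               # header line
    body.append({"query": {"match": {"title": term}}, "size": 20})   # body line

responses = es.msearch(body=body)["responses"]
for term, resp in zip(terms, responses):
    print(term, resp["hits"]["total"])
```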

In the end, my conclusion was to mix both, meaning a batch of multiple msearch queries. I chunked the list of queries into several msearch requests and sent those msearch requests to ElasticSearch concurrently. Why? First of all, our primary goal is to reduce latency, and this solution gives us two levels of parallelization. It is also a kind of compromise between msearch and concurrent search: sending a single huge msearch query may end up timing out, while sending all queries concurrently may end up throttling our ElasticSearch cluster.
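
A sketch of this "batch of msearch batches" idea is below. The chunk size, index name, and query shape are illustrative only; tune the chunk size and worker count against your cluster's concurrent-search capacity and timeout budget:

```python
# Chunk the query list, build one msearch request per chunk,
# and send the chunks to ElasticSearch concurrently.
from concurrent.futures import ThreadPoolExecutor
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")
CHUNK_SIZE = 10  # illustrative; balance timeout risk vs. cluster throttling

def run_msearch_chunk(queries):
    body = []
    for q in queries:
        body.append({"index": "my-index"})
        body.append({"query": q, "size": 20})
    return es.msearch(body=body)["responses"]

def run_all(queries):
    chunks = [queries[i:i + CHUNK_SIZE] for i in range(0, len(queries), CHUNK_SIZE)]
    results = []
    # One thread per in-flight msearch; keep this well below the cluster's
    # concurrent-search capacity estimated earlier.
    with ThreadPoolExecutor(max_workers=max(1, len(chunks))) as pool:
        for chunk_result in pool.map(run_msearch_chunk, chunks):
            results.extend(chunk_result)
    return results

queries = [{"match": {"title": f"term-{i}"}} for i in range(50)]
all_responses = run_all(queries)
```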

Cluster Configuration

If you are still confused about the concepts of documents, indices, primary shards, replicas, data nodes, master nodes, etc., you should first read the official documentation. The rules below might not be universally true, but they were helpful in my use case:

Rule 1: The more shards, the higher the latency. This is because each query needs to hit every shard, and those shards may live on the same node or on different nodes, which means extra network communication. On the other hand, a single shard should usually hold no more than about 50 GB of data. Unless your data exceeds that, you should have only 1 primary shard for your active index.

Rule 2: The more replicas, the lower the latency. This is pretty straightforward to understand: more replicas means more places to read the data from.

Rule 3: Keep the # of primary shards + replicas equal to the # of data nodes, so that all threads on a data node are dedicated to serving requests for the single shard copy on that node. If you have a limited budget, that's a different story and this rule may not apply.
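
Putting Rules 1 to 3 together for a 3-data-node cluster, a minimal sketch of the index settings might look like this (index name and host are made up):

```python
# 1 primary shard + 2 replicas = 3 shard copies, one per data node.
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

es.indices.create(
    index="my-index",
    body={
        "settings": {
            "number_of_shards": 1,     # Rule 1: single primary while data < ~50 GB
            "number_of_replicas": 2,   # Rules 2-3: total copies == number of data nodes
        }
    },
)
```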

Rule 4: The more CPUs on each node, the lower the latency. This is also straightforward to understand, because the # of cores decides the # of concurrent search requests a node supports. Note that there are also different types of thread pool queues where requests are held once the supported concurrency is exceeded. The default search queue is the fixed type with a length of 1000. If more than 1000 requests are pending, subsequent requests will be rejected by the node, which is what we don't want.
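
One way to keep an eye on this is the _cat thread pool API. Here is a sketch over plain HTTP with the `requests` library (the host is illustrative):

```python
# Watch the search thread pool for queueing and rejections.
import requests

resp = requests.get(
    "http://localhost:9200/_cat/thread_pool/search",
    params={"v": "true", "h": "node_name,name,active,queue,rejected"},
)
print(resp.text)
# A non-zero `rejected` count means requests overflowed the search queue.
```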

Rule 5: Choose a proper instance type. AWS provides different types of instances, such as compute-optimized, memory-optimized, general-purpose, etc. At a similar rate, compute-optimized instances come with more CPU cores but less memory, whereas memory-optimized instances provide more memory but fewer CPU cores. Since our use case doesn't have many update operations (twice a day) and each query is simple enough, we found the compute-optimized type more appropriate for us.

Cache

The above two sections may help with first-time queries. But to improve availability and further reduce latency for frequently repeated queries, you may need a cache.

In-memory cache: ElasticSearch query results are usually non-deterministic because of dynamic scoring. In this case, you need to tie your cache key to the query string in a smarter way and give the cache a shorter TTL. For filters, on the other hand, the search results usually don't change, since each document either matches or it doesn't. In that case you can use a longer TTL, and it is even OK to use the filter value directly as the cache key.
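
Here is a minimal application-side cache sketch: hash the (index, query) pair into a cache key and expire entries after a TTL. Everything here is illustrative; in real code you would likely swap in Redis, Memcached, or a library such as cachetools.

```python
import hashlib
import json
import time

_cache = {}  # cache key -> (expires_at, cached result)

def cache_key(index, query):
    # json.dumps with sort_keys so logically equal queries hash the same way
    raw = json.dumps({"index": index, "query": query}, sort_keys=True)
    return hashlib.sha256(raw.encode()).hexdigest()

def cached_search(es, index, query, ttl_seconds):
    key = cache_key(index, query)
    hit = _cache.get(key)
    if hit and hit[0] > time.time():
        return hit[1]
    result = es.search(index=index, body={"query": query})
    _cache[key] = (time.time() + ttl_seconds, result)
    return result

# Scored full-text queries: use a short TTL. Pure filters: a longer TTL is usually fine.
```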

Remote cache: I am not talking about Memcached here, but about the field data cache and request cache provided by ElasticSearch itself. The field data cache helps if your query is composed of filters only. The request cache is enabled with request_cache=true in the URL and helps if the query terms are usually the same. This blog and the official docs helped me a lot in understanding the caches. The only thing to note is to make sure each data node has enough cache space (by default it is only 1% of the heap size).
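
In the Python client, the same flag can be passed as the request_cache parameter on search. A sketch (index, field, and aggregation are made up; note that by default the request cache only caches size:0 requests):

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

resp = es.search(
    index="my-index",
    body={"size": 0, "aggs": {"by_status": {"terms": {"field": "status"}}}},
    request_cache=True,   # equivalent to ?request_cache=true on the URL
)
```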

Other low-hanging-fruit tunings

The above methods helped my use case, but here are some other useful findings from my investigation:

Cluster-level tuning:

  • Lower the refresh frequency (i.e., increase the refresh interval) if you don't update your index frequently.
  • Reduce the # of primary shards if your entire data set is below 50 GB.
  • Increase the cache size of each data node (see the settings sketch after this list).
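
A sketch of these knobs is below. The refresh interval is an index setting you can change via the API; the request-cache size is a node setting in elasticsearch.yml, so it is only shown as a comment. Host, index name, and values are illustrative.

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

# Refresh less often if the index is only updated a couple of times a day.
es.indices.put_settings(
    index="my-index",
    body={"index": {"refresh_interval": "30s"}},  # default is 1s
)

# Node-level request cache size lives in elasticsearch.yml on each data node, e.g.:
#   indices.requests.cache.size: 2%
```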

Application-level tuning:

  • Use _source to include or exclude fields, reducing the response size as well as I/O (see the sketch after this list).
  • Nested queries and parent-child document queries are expensive.
  • Time out faster, because you don't want to reject 10 smaller queries because of one single huge query.
  • Avoid aggregations and scoring unless you know you will use them.
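
A sketch combining the _source, result-size, and timeout tips (host, index, fields, and values are all illustrative):

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

resp = es.search(
    index="my-index",
    body={
        "_source": ["title", "url"],   # only return the fields you need
        "size": 20,                    # reasonable result size
        "timeout": "200ms",            # fail fast instead of hogging the queue
        "query": {"match": {"title": "elasticsearch"}},
    },
)
```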

That's it. These are just some simple findings from my work. I hope they help you! If they do, please applaud this article so that I know it helped!
