Improving Map Lookup Performance in ClickHouse

· 3 min read

Problem

A Map lookup such as a['key'] has linear complexity (mentioned here) and can be inefficient. Selecting the value for a specific key from a table requires iterating through all keys (~M) in the Map column for each of the N rows, resulting in ~M×N key comparisons.
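
The ~M×N cost can be illustrated with a toy model. This is only a sketch under the assumption that each row's map is a plain list of (key, value) pairs scanned in order; the names `map_lookup` and `select_key` are hypothetical and are not ClickHouse functions.

```python
# Toy model (assumption): one row's Map value is a list of (key, value) pairs,
# so finding a key means scanning the pairs one by one.

def map_lookup(pairs, key):
    """Linear scan over one row's map; returns (value, comparisons made)."""
    comparisons = 0
    for k, v in pairs:
        comparisons += 1
        if k == key:
            return v, comparisons
    return None, comparisons


def select_key(column, key):
    """Look up `key` in every row of the column and count all comparisons."""
    values = []
    total = 0
    for pairs in column:
        value, comparisons = map_lookup(pairs, key)
        values.append(value)
        total += comparisons
    return values, total


# N = 3 rows, M = 4 keys per row.
column = [
    [("k0", r), ("k1", r * 10), ("k2", r * 100), ("k3", r * 1000)]
    for r in range(3)
]

# Worst case: the requested key is the last one in every row.
values, total = select_key(column, "k3")
print(values, total)  # [0, 1000, 2000] 12, i.e. M x N = 4 x 3 comparisons
```

In this model the number of comparisons grows with both the number of keys per row and the number of rows, which is the behaviour the paragraph above describes.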

Does ClickHouse support multi-region replication?

· One min read

The short answer is "yes". However, we recommend keeping latency between all regions/datacenters in the two-digit millisecond range; otherwise write performance will suffer, because writes go through a distributed consensus protocol. For example, replication between the US coasts will likely work fine, but replication between the US and Europe likely won't.

What is a columnar database?

· 2 min read

A columnar database stores the data of each column independently. This allows reading from disk only the columns used in a given query. The cost is that operations that affect whole rows become proportionally more expensive. "Columnar database" is a synonym for "column-oriented database management system". ClickHouse is a typical example of such a system.
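
The difference can be sketched with in-memory lists standing in for disk storage. This is a simplified illustration, not how ClickHouse stores data; the table, the field names, and both functions are hypothetical.

```python
# Hypothetical table with three fields per row.
FIELDS = ("user_id", "url", "duration_ms")
rows = [
    (1, "/home", 120),
    (2, "/docs", 340),
    (3, "/home", 200),
]

# Column-oriented layout: one independent list per column.
columns = {name: [row[i] for row in rows] for i, name in enumerate(FIELDS)}


def total_duration_row_store(rows):
    """Sum duration_ms from a row layout; every field of every row is loaded."""
    values_read = 0
    total = 0
    for row in rows:
        values_read += len(row)  # the whole row is read to reach one field
        total += row[2]
    return total, values_read


def total_duration_column_store(columns):
    """Sum duration_ms from a column layout; only that column is loaded."""
    column = columns["duration_ms"]
    return sum(column), len(column)


print(total_duration_row_store(rows))        # (660, 9)
print(total_duration_column_store(columns))  # (660, 3)
```

Both layouts give the same answer, but the column layout reads a third of the values here. The flip side is that inserting or deleting one whole row in the column layout has to touch every column's list.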

What does “ClickHouse” mean?

· One min read

It’s a combination of "Clickstream" and "Data wareHouse". It comes from the original use case at Yandex.Metrica, where ClickHouse was supposed to keep records of all clicks by people from all over the Internet, and it still does the job. You can read more about this use case on the ClickHouse history page.