ClickHouse: what is behind the fastest open source columnar database
Clarendon Room D | Tue 14 Mar 10:45 a.m.–11:30 a.m.
Presented by
-
Troy started life as a Java developer working on rules engines, but after many years of 'making it work in IE6' he joined Salesforce, where he found a home at Heroku. Today he runs a team of Solution Architects who focus on a whole collection of open source data tools running in public clouds.
-
Olena is a software engineer and developer advocate currently working at Aiven. She is passionate about open source, data, sustainable software development and teamwork. Her knowledge is shaped by the expertise she acquired working at companies such as Nokia, HERE Technologies, AWS and now Aiven. She is one of the Apache Kafka Catalysts and holds two AWS certifications: AWS Certified Developer and AWS Certified Solutions Architect.
Abstract
ClickHouse, an open source columnar database, is in many ways exceptional: it is exceptionally fast, exceptionally efficient, but also, at times, exceptionally confusing.
Its approach to handling data goes against many principles and concepts that we use in other databases. To give some examples: its primary index doesn't index each row and doesn't guarantee uniqueness; a secondary index is used to skip data and doesn't point to specific rows; JOINs are a complex topic and transactions are only partially supported, not to mention that its SQL dialect has a couple of surprises up its sleeve.
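To make the first of those examples a little more concrete, here is a minimal Python sketch of a sparse index over sorted keys. It is a toy model, not ClickHouse's actual implementation: the function names and the granule size of 4 are assumptions chosen for readability (ClickHouse's default index granularity is 8192 rows). The index stores one entry per block of rows rather than one per row, and duplicate keys are allowed.

```python
from bisect import bisect_left, bisect_right

GRANULE_SIZE = 4  # illustrative only; ClickHouse defaults to 8192 rows per granule


def build_sparse_index(sorted_keys, granule_size=GRANULE_SIZE):
    """Keep only the first key of every granule (one mark per granule)."""
    return [sorted_keys[i] for i in range(0, len(sorted_keys), granule_size)]


def candidate_granules(marks, key):
    """Return (first, last) granule numbers that may contain `key`.

    The range is empty (last < first) when the key is smaller than every mark.
    """
    # A granule starting with a smaller key may still end with `key`,
    # so step back one granule from the first mark that is >= key.
    first = max(bisect_left(marks, key) - 1, 0)
    # Granules whose first key is greater than `key` cannot contain it.
    last = bisect_right(marks, key) - 1
    return first, last


def lookup(sorted_keys, marks, key, granule_size=GRANULE_SIZE):
    """Scan only the candidate granules and return matching row positions."""
    first, last = candidate_granules(marks, key)
    hits = []
    for granule in range(first, last + 1):
        start = granule * granule_size
        end = min(start + granule_size, len(sorted_keys))
        for pos in range(start, end):
            if sorted_keys[pos] == key:
                hits.append(pos)
    return hits


# Keys are sorted but not unique, and the index holds 3 marks for 12 rows.
keys = [1, 1, 2, 3, 3, 3, 5, 8, 8, 9, 9, 12]
marks = build_sparse_index(keys)
```

In this sketch the index narrows a search down to a few granules, and the rows inside them are then scanned. That is the sense in which such an index "doesn't index each row".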
But, all that said, if used correctly, ClickHouse is a superb solution for online analytical processing (OLAP).
The goal of this talk is to help you get the most out of ClickHouse and avoid the pitfalls. We'll talk about OLAP and columnar databases. We'll touch on indexing, searching and disk storage. We'll look at the reasons behind the most puzzling concepts of ClickHouse, so that by the end of the talk you find them not only logical, but maybe even fascinating.
If your challenge is analysing terabytes of data, this talk is for you. If you're a data scientist looking for tools to work with big data, this talk is for you. And, of course, if you are just curious about what makes ClickHouse crazy fast, this talk is for you as well.
YouTube: https://www.youtube.com/watch?v=b5E-8YkutJY
LA Archive: http://mirror.linux.org.au/pub/everythingopen/2023/clarendon_room_d/Tuesday/ClickHouse_what_is_behind_the_fastest_open_source_columnar_database.webm