The Lightweight DELETE Statement
The lightweight DELETE statement removes rows from the table [db.]table that match the expression expr. It is only available for the *MergeTree table engine family.
DELETE FROM [db.]table [ON CLUSTER cluster] [IN PARTITION partition_expr] WHERE expr;
It is called "lightweight DELETE" to contrast it with the ALTER TABLE ... DELETE command, which is a heavyweight process.
Examples
-- Deletes all rows from the `hits` table where the `Title` column contains the text `hello`
DELETE FROM hits WHERE Title LIKE '%hello%';
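The syntax above also accepts an optional IN PARTITION clause. As a minimal sketch, assuming the `hits` table is partitioned by month (for example with `PARTITION BY toYYYYMM(EventDate)`, which the original does not state), the delete can be restricted to a single partition:

```sql
-- Assumption: `hits` is partitioned by toYYYYMM(EventDate), so 202401 identifies January 2024.
-- Only rows in that partition are considered for deletion.
DELETE FROM hits IN PARTITION 202401 WHERE Title LIKE '%hello%';
```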
Lightweight DELETE does not delete data immediately
Lightweight DELETE is implemented as a mutation that marks rows as deleted but does not immediately physically delete them.
By default, DELETE statements wait until the rows have been marked as deleted before returning. This can take a long time if the amount of data is large. Alternatively, you can run the statement asynchronously in the background using the setting lightweight_deletes_sync. If it is disabled, the DELETE statement returns immediately, but the data can still be visible to queries until the background mutation is finished.
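A minimal sketch of the asynchronous mode, reusing the `hits` table from the example above. It assumes that the pending mutation is listed in the `system.mutations` table, where it can be polled until it completes:

```sql
-- Do not wait for the rows to be marked as deleted (session-level setting).
SET lightweight_deletes_sync = 0;

-- Returns immediately; matching rows may stay visible until the mutation finishes.
DELETE FROM hits WHERE Title LIKE '%hello%';

-- Check whether the background mutation is still running.
SELECT mutation_id, command, create_time, parts_to_do, is_done
FROM system.mutations
WHERE database = currentDatabase()
  AND table = 'hits'
  AND is_done = 0;
```

An empty result from the last query suggests that no mutations are pending for the table.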
The mutation does not physically delete the rows that have been marked as deleted; this only happens during the next merge. As a result, for an unspecified period the data may remain in storage, marked as deleted but not actually removed.
If you need to guarantee that your data is deleted from storage within a predictable time, consider using the table setting min_age_to_force_merge_seconds. Alternatively, you can use the ALTER TABLE ... DELETE command. Note that deleting data using ALTER TABLE ... DELETE may consume significant resources as it recreates all affected parts.
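Both options can be sketched as follows for the `hits` table. The value of 3600 seconds is purely illustrative, not a recommendation:

```sql
-- Option 1: force merges of parts older than one hour (illustrative value),
-- which bounds how long rows marked as deleted can remain in storage.
ALTER TABLE hits MODIFY SETTING min_age_to_force_merge_seconds = 3600;

-- Option 2: the heavyweight mutation, which rewrites all affected parts.
ALTER TABLE hits DELETE WHERE Title LIKE '%hello%';
```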
Deleting large amounts of data
Large deletes can negatively affect ClickHouse performance. If you are attempting to delete all rows from a table, consider using the TRUNCATE TABLE command.
If you anticipate frequent deletes, consider using a custom partitioning key. You can then use the ALTER TABLE ... DROP PARTITION command to quickly drop all rows associated with that partition.
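As a sketch of the partition-based approach, the hypothetical table `hits_by_month` below is partitioned by month so that a whole month of data can be dropped at once. The table name and columns are assumptions made for illustration:

```sql
-- Hypothetical table, partitioned by month of EventDate.
CREATE TABLE hits_by_month
(
    EventDate Date,
    Title String
)
ENGINE = MergeTree
PARTITION BY toYYYYMM(EventDate)
ORDER BY EventDate;

-- Drop every row from January 2024 in one operation.
ALTER TABLE hits_by_month DROP PARTITION 202401;

-- Remove all rows from the table.
TRUNCATE TABLE hits_by_month;
```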
Limitations of lightweight DELETE
Lightweight DELETEs with projections
By default, DELETE does not work for tables with projections. This is because rows in a projection may be affected by a DELETE operation. This behavior can be changed with the MergeTree setting lightweight_mutation_projection_mode.
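A minimal sketch of changing this setting on the `hits` table. It assumes the value `'rebuild'`, which is understood to rebuild the projections in affected parts. Other values (such as `'drop'`) may be available depending on the ClickHouse version, so check the settings reference before relying on it:

```sql
-- Assumption: 'rebuild' is an accepted value in the ClickHouse version in use.
ALTER TABLE hits MODIFY SETTING lightweight_mutation_projection_mode = 'rebuild';

-- With the setting changed, a lightweight DELETE on a table with projections is accepted.
DELETE FROM hits WHERE Title LIKE '%hello%';
```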
Performance considerations when using lightweight DELETE
Deleting large volumes of data with the lightweight DELETE statement can negatively affect SELECT query performance.
The following can also negatively impact lightweight DELETE performance:
- A heavy WHERE condition in a DELETE query.
- A mutations queue filled with many other mutations. This can lead to performance issues because all mutations on a table are executed sequentially.
- A very large number of data parts in the affected table.
- A lot of data in compact parts. In a Compact part, all columns are stored in one file.
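Some of the factors listed above can be checked before running a large delete. The queries below are a sketch that inspects the `system.mutations` and `system.parts` tables for the `hits` table. The column aliases are illustrative:

```sql
-- How many mutations are still queued for the table?
SELECT count() AS pending_mutations
FROM system.mutations
WHERE database = currentDatabase()
  AND table = 'hits'
  AND is_done = 0;

-- How many active parts does the table have, and how much data is in each part type?
SELECT
    part_type,
    count() AS part_count,
    sum(rows) AS total_rows
FROM system.parts
WHERE database = currentDatabase()
  AND table = 'hits'
  AND active
GROUP BY part_type;
```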
Delete permissions
DELETE requires the ALTER DELETE privilege. To enable DELETE statements on a specific table for a given user, run the following command:
GRANT ALTER DELETE ON db.table TO username;
How lightweight DELETEs work internally in ClickHouse
- A "mask" is applied to affected rows
  When a DELETE FROM table ... query is executed, ClickHouse saves a mask where each row is marked as either “existing” or “deleted”. Those “deleted” rows are omitted from subsequent queries. However, rows are only physically removed later by subsequent merges. Writing this mask is much more lightweight than what is done by an ALTER TABLE ... DELETE query.
  The mask is implemented as a hidden _row_exists system column that stores True for all visible rows and False for deleted ones. This column is only present in a part if some rows in the part were deleted. This column does not exist when a part has all values equal to True.
- SELECT queries are transformed to include the mask
  When a masked column is used in a query, the SELECT ... FROM table WHERE condition query is internally extended by a predicate on _row_exists and is transformed to:
  SELECT ... FROM table PREWHERE _row_exists WHERE condition
  At execution time, the column _row_exists is read to determine which rows should not be returned. If there are many deleted rows, ClickHouse can determine which granules can be fully skipped when reading the rest of the columns.
- DELETE queries are transformed to ALTER TABLE ... UPDATE queries
  The DELETE FROM table WHERE condition query is translated into an ALTER TABLE table UPDATE _row_exists = 0 WHERE condition mutation.
  Internally, this mutation is executed in two steps:
  - A SELECT count() FROM table WHERE condition command is executed for each individual part to determine if the part is affected.
  - Based on the commands above, affected parts are then mutated, and hardlinks are created for unaffected parts. In the case of wide parts, the _row_exists column for each row is updated, and all other columns' files are hardlinked. For compact parts, all columns are re-written because they are all stored together in one file.
  From the steps above, we can see that lightweight DELETE using the masking technique improves performance over traditional ALTER TABLE ... DELETE because it does not re-write all the columns' files for affected parts.
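The masking behavior described above can be observed with a small experiment. This is a sketch under assumptions: the table `mask_demo` is hypothetical, and the query setting `apply_deleted_mask` (which disables the implicit filter on _row_exists) and the ability to select _row_exists directly may depend on the ClickHouse version.

```sql
-- Hypothetical table used only for this demonstration.
CREATE TABLE mask_demo
(
    id UInt32,
    Title String
)
ENGINE = MergeTree
ORDER BY id;

INSERT INTO mask_demo VALUES (1, 'hello world'), (2, 'goodbye'), (3, 'say hello');

-- Marks rows 1 and 3 as deleted by writing the mask.
DELETE FROM mask_demo WHERE Title LIKE '%hello%';

-- The mask is applied implicitly, so only the row with id = 2 is returned.
SELECT id, Title FROM mask_demo ORDER BY id;

-- Assumption: with the mask disabled, masked rows are readable again until a merge
-- removes them, and _row_exists shows which rows were marked as deleted.
SELECT id, Title, _row_exists
FROM mask_demo
ORDER BY id
SETTINGS apply_deleted_mask = 0;
```

If a merge has already rewritten the part, the last query no longer returns the deleted rows, because they have been physically removed.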