Arrow
Description
Apache Arrow comes with two built-in columnar storage formats. ClickHouse supports read and write operations for these formats.
Arrow is Apache Arrow’s "file mode" format. It is designed for in-memory random access.
Data Types Matching
The table below shows the supported data types and how they map to ClickHouse data types in INSERT and SELECT queries.
Arrow data type (INSERT) | ClickHouse data type | Arrow data type (SELECT) |
---|---|---|
BOOL | Bool | BOOL |
UINT8, BOOL | UInt8 | UINT8 |
INT8 | Int8/Enum8 | INT8 |
UINT16 | UInt16 | UINT16 |
INT16 | Int16/Enum16 | INT16 |
UINT32 | UInt32 | UINT32 |
INT32 | Int32 | INT32 |
UINT64 | UInt64 | UINT64 |
INT64 | Int64 | INT64 |
FLOAT, HALF_FLOAT | Float32 | FLOAT32 |
DOUBLE | Float64 | FLOAT64 |
DATE32 | Date32 | UINT16 |
DATE64 | DateTime | UINT32 |
TIMESTAMP, TIME32, TIME64 | DateTime64 | UINT32 |
STRING, BINARY | String | BINARY |
STRING, BINARY, FIXED_SIZE_BINARY | FixedString | FIXED_SIZE_BINARY |
DECIMAL | Decimal | DECIMAL |
DECIMAL256 | Decimal256 | DECIMAL256 |
LIST | Array | LIST |
STRUCT | Tuple | STRUCT |
MAP | Map | MAP |
UINT32 | IPv4 | UINT32 |
FIXED_SIZE_BINARY, BINARY | IPv6 | FIXED_SIZE_BINARY |
FIXED_SIZE_BINARY, BINARY | Int128/UInt128/Int256/UInt256 | FIXED_SIZE_BINARY |
Arrays can be nested and can have a value of the Nullable type as an argument. Tuple and Map types can also be nested.
The DICTIONARY type is supported for INSERT queries. For SELECT queries, the output_format_arrow_low_cardinality_as_dictionary setting allows the LowCardinality type to be output as a DICTIONARY type.
Unsupported Arrow data types: JSON, UUID, ENUM.
The data types of ClickHouse table columns do not have to match the corresponding Arrow data fields. When inserting data, ClickHouse interprets data types according to the table above and then casts the data to the data type set for the ClickHouse table column.
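One way to see how ClickHouse would interpret the fields of an Arrow file before inserting it is to ask for the inferred schema. The sketch below assumes that clickhouse-local is installed and that filename.arrow (a placeholder name) exists in the current directory; when either is missing, it only prints the command it would have run.

```shell
# Placeholder file name; replace it with the path to your own Arrow file.
arrow_file="filename.arrow"

# DESCRIBE over the file() table function reports the ClickHouse types
# inferred from the Arrow schema (the middle column of the table above).
describe_query="DESCRIBE file('${arrow_file}', 'Arrow')"

if command -v clickhouse-local >/dev/null 2>&1 && [ -f "$arrow_file" ]; then
    clickhouse-local --query="$describe_query"
else
    # Tool or file not available: show what would be run instead.
    echo "clickhouse-local --query=\"$describe_query\""
fi
```

If the inferred types differ from the column types of the target table, the cast described above is applied on insert.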
Example Usage
Inserting Data
You can insert Arrow data from a file into a ClickHouse table using the following command:
$ cat filename.arrow | clickhouse-client --query="INSERT INTO some_table FORMAT Arrow"
Selecting Data
You can select data from a ClickHouse table and save it to a file in the Arrow format using the following command:
$ clickhouse-client --query="SELECT * FROM some_table FORMAT Arrow" > filename.arrow
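As a quick sanity check on an exported file, note that the Arrow "file mode" format begins with the 6-byte magic string ARROW1. The helper below is a minimal sketch: is_arrow_file is a hypothetical function name, filename.arrow is a placeholder, and the check inspects only the leading bytes rather than validating the whole file.

```shell
# Succeeds if the file begins with the 6-byte Arrow file magic "ARROW1".
# This is only a header check, not a full validation of the file.
is_arrow_file() {
    [ "$(head -c 6 "$1" 2>/dev/null)" = "ARROW1" ]
}

if is_arrow_file "filename.arrow"; then
    echo "filename.arrow looks like an Arrow file"
else
    echo "filename.arrow is missing or is not in the Arrow file format"
fi
```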
Format Settings
- output_format_arrow_low_cardinality_as_dictionary - output the ClickHouse LowCardinality type as the Arrow Dictionary type. Default value - false.
- output_format_arrow_use_64_bit_indexes_for_dictionary - use a 64-bit integer type for Dictionary indexes. Default value - false.
- output_format_arrow_use_signed_indexes_for_dictionary - use a signed integer type for Dictionary indexes. Default value - true.
- output_format_arrow_string_as_string - use the Arrow String type instead of Binary for String columns. Default value - false.
- input_format_arrow_case_insensitive_column_matching - ignore case when matching Arrow columns with ClickHouse columns. Default value - false.
- input_format_arrow_allow_missing_columns - allow missing columns while reading Arrow data. Default value - false.
- input_format_arrow_skip_columns_with_unsupported_types_in_schema_inference - skip columns with unsupported types during schema inference for the Arrow format. Default value - false.
- output_format_arrow_fixed_string_as_fixed_byte_array - use the Arrow FIXED_SIZE_BINARY type instead of Binary/String for FixedString columns. Default value - true.
- output_format_arrow_compression_method - compression method used in the output Arrow format. Default value - lz4_frame.
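The settings listed above can be supplied per invocation as clickhouse-client command-line options. The following is a minimal sketch, not a recommended configuration: some_table and filename.arrow are placeholders, zstd is used on the assumption that your ClickHouse version accepts it for output_format_arrow_compression_method, and the export runs only if clickhouse-client is found on the PATH.

```shell
# Placeholder names, matching the examples above.
table="some_table"
out_file="filename.arrow"
query="SELECT * FROM ${table} FORMAT Arrow"

if command -v clickhouse-client >/dev/null 2>&1; then
    # Write LowCardinality columns as Arrow dictionaries and
    # compress the output with zstd instead of the default lz4_frame.
    clickhouse-client \
        --output_format_arrow_low_cardinality_as_dictionary=1 \
        --output_format_arrow_compression_method=zstd \
        --query="$query" > "$out_file"
else
    echo "clickhouse-client is not installed; query would be: $query"
fi
```

Input settings such as input_format_arrow_allow_missing_columns can be passed the same way on the INSERT side.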