# ORC

## Description

Apache ORC is a columnar storage format widespread in the Hadoop ecosystem.

## Data Types Matching

The table below shows supported data types and how they match ClickHouse data types in `INSERT` and `SELECT` queries.
| ORC data type (`INSERT`) | ClickHouse data type | ORC data type (`SELECT`) |
|---|---|---|
| Boolean | UInt8 | Boolean |
| Tinyint | Int8/UInt8/Enum8 | Tinyint |
| Smallint | Int16/UInt16/Enum16 | Smallint |
| Int | Int32/UInt32 | Int |
| Bigint | Int64/UInt64 | Bigint |
| Float | Float32 | Float |
| Double | Float64 | Double |
| Decimal | Decimal | Decimal |
| Date | Date32 | Date |
| Timestamp | DateTime64 | Timestamp |
| String, Char, Varchar, Binary | String | Binary |
| List | Array | List |
| Struct | Tuple | Struct |
| Map | Map | Map |
| Int | IPv4 | Int |
| Binary | IPv6 | Binary |
| Binary | Int128/UInt128/Int256/UInt256 | Binary |
| Binary | Decimal256 | Binary |
- Other types are not supported.
- Arrays can be nested and can have a value of the `Nullable` type as an argument. `Tuple` and `Map` types can also be nested.
- The data types of ClickHouse table columns do not have to match the corresponding ORC data fields. When inserting data, ClickHouse interprets data types according to the table above and then casts the data to the data type set for the ClickHouse table column, as the sketch after this list illustrates.
## Example Usage

### Inserting Data

You can insert ORC data from a file into a ClickHouse table with the following command:

$ cat filename.orc | clickhouse-client --query="INSERT INTO some_table FORMAT ORC"
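If your client version supports it, the same insert can be done without the shell pipe by letting the client read the file itself (the table and file names are the placeholders from above):

```bash
# FROM INFILE makes clickhouse-client read and send the file.
$ clickhouse-client --query="INSERT INTO some_table FROM INFILE 'filename.orc' FORMAT ORC"
```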
### Selecting Data

You can select data from a ClickHouse table and save it to a file in the ORC format with the following command:

$ clickhouse-client --query="SELECT * FROM {some_table} FORMAT ORC" > {filename.orc}
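Likewise, `INTO OUTFILE` lets the client write the file directly instead of using shell redirection (placeholder names again; the target file must not already exist):

```bash
$ clickhouse-client --query="SELECT * FROM some_table INTO OUTFILE 'filename.orc' FORMAT ORC"
```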
## Format Settings

- `output_format_arrow_string_as_string` - use Arrow String type instead of Binary for String columns. Default value: `false`.
- `output_format_orc_compression_method` - compression method used in output ORC format. Default value: `none`.
- `input_format_arrow_case_insensitive_column_matching` - ignore case when matching Arrow columns with ClickHouse columns. Default value: `false`.
- `input_format_arrow_allow_missing_columns` - allow missing columns while reading Arrow data. Default value: `false`.
- `input_format_arrow_skip_columns_with_unsupported_types_in_schema_inference` - allow skipping columns with unsupported types during schema inference for the Arrow format. Default value: `false`.
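Any of these settings can be passed per invocation as a clickhouse-client option. A sketch, assuming your build accepts `zstd` as a compression method (check your server version for the supported values):

```bash
# Write a compressed ORC file for this query only.
$ clickhouse-client --output_format_orc_compression_method=zstd \
    --query="SELECT * FROM some_table FORMAT ORC" > compressed.orc
```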
To exchange data with Hadoop, you can use the HDFS table engine.
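For example, a table backed by an ORC file in HDFS might be declared as follows (the namenode URI, path, and schema here are hypothetical):

```bash
# The second argument of HDFS() names the file format.
$ clickhouse-client --query="CREATE TABLE hdfs_orc (name String, value UInt32) ENGINE = HDFS('hdfs://namenode:9000/data/file.orc', 'ORC')"
```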