CapnProto
Description
CapnProto is a binary message format similar to Protocol Buffers and Thrift, and unlike JSON or MessagePack. CapnProto messages are strictly typed and not self-describing, so they need an external schema description. The schema is applied on the fly and cached for each query. See also Format Schema.
Data Types Matching
The table below shows the supported data types and how they match ClickHouse data types in INSERT and SELECT queries.
CapnProto data type (INSERT) | ClickHouse data type | CapnProto data type (SELECT) |
---|---|---|
UINT8, BOOL | UInt8 | UINT8 |
INT8 | Int8 | INT8 |
UINT16 | UInt16, Date | UINT16 |
INT16 | Int16 | INT16 |
UINT32 | UInt32, DateTime | UINT32 |
INT32 | Int32, Decimal32 | INT32 |
UINT64 | UInt64 | UINT64 |
INT64 | Int64, DateTime64, Decimal64 | INT64 |
FLOAT32 | Float32 | FLOAT32 |
FLOAT64 | Float64 | FLOAT64 |
TEXT, DATA | String, FixedString | TEXT, DATA |
union(T, Void), union(Void, T) | Nullable(T) | union(T, Void), union(Void, T) |
ENUM | Enum(8/16) | ENUM |
LIST | Array | LIST |
STRUCT | Tuple | STRUCT |
UINT32 | IPv4 | UINT32 |
DATA | IPv6 | DATA |
DATA | Int128/UInt128/Int256/UInt256 | DATA |
DATA | Decimal128/Decimal256 | DATA |
STRUCT(entries LIST(STRUCT(key Key, value Value))) | Map | STRUCT(entries LIST(STRUCT(key Key, value Value))) |
- Integer types can be converted into each other during input/output.
- To work with Enum in CapnProto format, use the format_capn_proto_enum_comparising_mode setting.
- Arrays can be nested and can have a value of the Nullable type as an argument. Tuple and Map types can also be nested.
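To make the matching rules easier to check, here is a small Python sketch that encodes the INSERT column of the table and applies the Nullable and Array rules recursively. It is illustrative only, not ClickHouse code: INSERT_TYPES, split_type and capnp_types_for are hypothetical names, type parameters such as DateTime64(3) or FixedString(16) are simply ignored, Tuple and Map are not modelled, and neither is the integer-to-integer conversion mentioned in the notes.

```python
from typing import Set, Tuple

# CapnProto types accepted on INSERT for each scalar ClickHouse type,
# copied from the first two columns of the table above.
INSERT_TYPES = {
    "UInt8": {"UINT8", "BOOL"},
    "Int8": {"INT8"},
    "UInt16": {"UINT16"}, "Date": {"UINT16"},
    "Int16": {"INT16"},
    "UInt32": {"UINT32"}, "DateTime": {"UINT32"}, "IPv4": {"UINT32"},
    "Int32": {"INT32"}, "Decimal32": {"INT32"},
    "UInt64": {"UINT64"},
    "Int64": {"INT64"}, "DateTime64": {"INT64"}, "Decimal64": {"INT64"},
    "Float32": {"FLOAT32"},
    "Float64": {"FLOAT64"},
    "String": {"TEXT", "DATA"}, "FixedString": {"TEXT", "DATA"},
    "Enum8": {"ENUM"}, "Enum16": {"ENUM"},
    "IPv6": {"DATA"},
    "Int128": {"DATA"}, "UInt128": {"DATA"},
    "Int256": {"DATA"}, "UInt256": {"DATA"},
    "Decimal128": {"DATA"}, "Decimal256": {"DATA"},
}


def split_type(ch_type: str) -> Tuple[str, str]:
    """Split 'Name(args)' into ('Name', 'args'); a bare name gives ('Name', '')."""
    ch_type = ch_type.strip()
    if "(" not in ch_type:
        return ch_type, ""
    if not ch_type.endswith(")"):
        raise ValueError(f"malformed type: {ch_type}")
    name, _, rest = ch_type.partition("(")
    return name.strip(), rest[:-1]


def capnp_types_for(ch_type: str) -> Set[str]:
    """Return every CapnProto spelling (table notation) accepted for ch_type."""
    name, args = split_type(ch_type)
    if name == "Nullable":
        # Nullable(T) matches a union of T and Void, in either order.
        return {form
                for inner in capnp_types_for(args)
                for form in (f"union({inner}, Void)", f"union(Void, {inner})")}
    if name == "Array":
        # Arrays map to LIST and may nest or wrap Nullable values.
        return {f"LIST({inner})" for inner in capnp_types_for(args)}
    if name in INSERT_TYPES:
        # Parameters such as precision or length do not change the match.
        return set(INSERT_TYPES[name])
    raise ValueError(f"not covered by this sketch: {ch_type}")
```

For instance, capnp_types_for("Array(Nullable(String))") yields the four LIST(union(...)) combinations of TEXT or DATA with Void.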
Example Usage
Inserting and Selecting Data
You can insert CapnProto data from a file into a ClickHouse table with the following command:
$ cat capnproto_messages.bin | clickhouse-client --query "INSERT INTO test.hits SETTINGS format_schema = 'schema:Message' FORMAT CapnProto"
Here, schema.capnp looks like this:
struct Message {
    SearchPhrase @0 :Text;
    c @1 :UInt64;
}
You can select data from a ClickHouse table and save it to a file in CapnProto format with the following command:
$ clickhouse-client --query "SELECT * FROM test.hits FORMAT CapnProto SETTINGS format_schema = 'schema:Message'" > capnproto_messages.bin
Using autogenerated schema
If you don't have an external CapnProto schema for your data, you can still read and write data in CapnProto format using an autogenerated schema. For example:
SELECT * FROM test.hits FORMAT CapnProto SETTINGS format_capn_proto_use_autogenerated_schema=1
In this case, ClickHouse generates a CapnProto schema from the table structure using the function structureToCapnProtoSchema and uses that schema to serialize the data in CapnProto format.
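The general shape of a schema derived from a table structure can be pictured with the Python sketch below. It is not the implementation of structureToCapnProtoSchema, and the real function's output may differ in layout, naming and type choices: generate_schema and CAPNP_NAMES are hypothetical, only flat non-Nullable columns are handled, and the column names "phrase" and "c" in the usage line are made up. It emits a Cap'n Proto file ID line, then one struct field per column with consecutive ordinals, picking types that agree with the matching table.

```python
import secrets
from typing import List, Optional, Tuple

# Hypothetical mapping for this sketch: flat ClickHouse types -> Cap'n Proto
# built-in type names, consistent with the table (String may be Text or Data).
CAPNP_NAMES = {
    "UInt8": "UInt8", "Int8": "Int8",
    "UInt16": "UInt16", "Int16": "Int16",
    "UInt32": "UInt32", "Int32": "Int32",
    "UInt64": "UInt64", "Int64": "Int64",
    "Float32": "Float32", "Float64": "Float64",
    "Date": "UInt16", "DateTime": "UInt32",
    "String": "Data",
}


def generate_schema(columns: List[Tuple[str, str]],
                    struct_name: str = "Message",
                    file_id: Optional[int] = None) -> str:
    """Build .capnp text for flat, non-Nullable columns given as (name, type).

    Column names are used unchanged, so this sketch assumes they are
    already valid Cap'n Proto field identifiers.
    """
    if file_id is None:
        # Cap'n Proto file IDs are 64-bit values with the top bit set.
        file_id = secrets.randbits(64) | (1 << 63)
    lines = [f"@{file_id:#x};", "", f"struct {struct_name} {{"]
    for ordinal, (name, ch_type) in enumerate(columns):
        if ch_type not in CAPNP_NAMES:
            raise ValueError(f"type not covered by this sketch: {ch_type}")
        # Field ordinals are consecutive, starting at @0.
        lines.append(f"  {name} @{ordinal} :{CAPNP_NAMES[ch_type]};")
    lines.append("}")
    return "\n".join(lines) + "\n"


# Usage with an arbitrary fixed file ID so the output is reproducible.
print(generate_schema([("phrase", "String"), ("c", "UInt64")],
                      file_id=0xd9dd7b35452d1c4c))
# @0xd9dd7b35452d1c4c;
#
# struct Message {
#   phrase @0 :Data;
#   c @1 :UInt64;
# }
```

Sequential ordinals matter because Cap'n Proto identifies fields by ordinal, not by name, on the wire; that is also why a file written with one autogenerated schema should be read back with the same schema.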
You can also read a CapnProto file using an autogenerated schema (in this case, the file must have been written with the same schema):
$ cat hits.bin | clickhouse-client --query "INSERT INTO test.hits SETTINGS format_capn_proto_use_autogenerated_schema=1 FORMAT CapnProto"
The setting format_capn_proto_use_autogenerated_schema is enabled by default and applies if format_schema is not set.
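The precedence rule in the previous paragraph can be summarized as a small decision function. This is a hypothetical Python model of the behavior described on this page, not ClickHouse code: resolve_capnp_schema is an invented name, the ".capnp" extension handling follows the 'schema:Message' / schema.capnp example above, and the error raised when neither option is available is an assumption.

```python
import os
from typing import Optional, Tuple


def resolve_capnp_schema(format_schema: Optional[str],
                         use_autogenerated: bool = True) -> Tuple[str, str, str]:
    """Return (source, schema_file, message_name) for a CapnProto query.

    Hypothetical model of the rules on this page, not ClickHouse code.
    """
    if format_schema:
        # An explicit 'file:Message' value takes precedence over autogeneration.
        file_part, sep, message = format_schema.rpartition(":")
        if not sep or not file_part or not message:
            raise ValueError("format_schema must look like 'file:Message'")
        # 'schema:Message' refers to schema.capnp, as in the example above.
        if not os.path.splitext(file_part)[1]:
            file_part += ".capnp"
        return ("external", file_part, message)
    if use_autogenerated:
        # Schema comes from the table structure; no schema file is read.
        return ("autogenerated", "", "")
    # Assumption: with neither option set there is no schema to work with.
    raise ValueError("set format_schema or format_capn_proto_use_autogenerated_schema=1")
```

Since use_autogenerated defaults to True here, mirroring the default described above, calling resolve_capnp_schema(None) falls through to the autogenerated branch, while any non-empty format_schema wins.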
You can also save the autogenerated schema to a file during input/output using the output_format_schema setting. For example:
SELECT * FROM test.hits FORMAT CapnProto SETTINGS format_capn_proto_use_autogenerated_schema=1, output_format_schema='path/to/schema/schema.capnp'
In this case, the autogenerated CapnProto schema will be saved to the file path/to/schema/schema.capnp.