TabSeparated
Input | Output | Alias |
---|---|---|
✔ | ✔ | TSV |
Description
In TabSeparated format, data is written by row. Each row contains values separated by tabs. Each value is followed by a tab, except the last value in the row, which is followed by a line feed. Strictly Unix line feeds are assumed everywhere. The last row also must contain a line feed at the end. Values are written in text format, without enclosing quotation marks, and with special characters escaped.
This format is also available under the name TSV
.
The TabSeparated
format is convenient for processing data using custom programs and scripts. It is used by default in the HTTP interface, and in the command-line client’s batch mode. This format also allows transferring data between different DBMSs. For example, you can get a dump from MySQL and upload it to ClickHouse, or vice versa.
The TabSeparated
format supports outputting total values (when using WITH TOTALS) and extreme values (when ‘extremes’ is set to 1). In these cases, the total values and extremes are output after the main data. The main result, total values, and extremes are separated from each other by an empty line. Example:
SELECT EventDate, count() AS c FROM test.hits GROUP BY EventDate WITH TOTALS ORDER BY EventDate FORMAT TabSeparated
2014-03-17 1406958
2014-03-18 1383658
2014-03-19 1405797
2014-03-20 1353623
2014-03-21 1245779
2014-03-22 1031592
2014-03-23 1046491
1970-01-01 8873898
2014-03-17 1031592
2014-03-23 1406958
Data Formatting
Integer numbers are written in decimal form. Numbers can contain an extra “+” character at the beginning (ignored when parsing, and not recorded when formatting). Non-negative numbers can’t contain the negative sign. When reading, it is allowed to parse an empty string as a zero, or (for signed types) a string consisting of just a minus sign as a zero. Numbers that do not fit into the corresponding data type may be parsed as a different number, without an error message.
Floating-point numbers are written in decimal form. The dot is used as the decimal separator. Exponential entries are supported, as are ‘inf’, ‘+inf’, ‘-inf’, and ‘nan’. An entry of floating-point numbers may begin or end with a decimal point. During formatting, accuracy may be lost on floating-point numbers. During parsing, it is not strictly required to read the nearest machine-representable number.
Dates are written in YYYY-MM-DD format and parsed in the same format, but with any characters as separators.
Dates with times are written in the format YYYY-MM-DD hh:mm:ss
and parsed in the same format, but with any characters as separators.
This all occurs in the system time zone at the time the client or server starts (depending on which of them formats data). For dates with times, daylight saving time is not specified. So if a dump has times during daylight saving time, the dump does not unequivocally match the data, and parsing will select one of the two times.
During a read operation, incorrect dates and dates with times can be parsed with natural overflow or as null dates and times, without an error message.
As an exception, parsing dates with times is also supported in Unix timestamp format, if it consists of exactly 10 decimal digits. The result is not time zone-dependent. The formats YYYY-MM-DD hh:mm:ss
and NNNNNNNNNN
are differentiated automatically.
Strings are output with backslash-escaped special characters. The following escape sequences are used for output: \b
, \f
, \r
, \n
, \t
, \0
, \'
, \\
. Parsing also supports the sequences \a
, \v
, and \xHH
(hex escape sequences) and any \c
sequences, where c
is any character (these sequences are converted to c
). Thus, reading data supports formats where a line feed can be written as \n
or \
, or as a line feed. For example, the string Hello world
with a line feed between the words instead of space can be parsed in any of the following variations:
Hello\nworld
Hello\
world
The second variant is supported because MySQL uses it when writing tab-separated dumps.
The minimum set of characters that you need to escape when passing data in TabSeparated format: tab, line feed (LF) and backslash.
Only a small set of symbols are escaped. You can easily stumble onto a string value that your terminal will ruin in output.
Arrays are written as a list of comma-separated values in square brackets. Number items in the array are formatted as normally. Date
and DateTime
types are written in single quotes. Strings are written in single quotes with the same escaping rules as above.
NULL is formatted according to setting format_tsv_null_representation (default value is \N
).
In input data, ENUM values can be represented as names or as ids. First, we try to match the input value to the ENUM name. If we fail and the input value is a number, we try to match this number to ENUM id. If input data contains only ENUM ids, it's recommended to enable the setting input_format_tsv_enum_as_number to optimize ENUM parsing.
Each element of Nested structures is represented as an array.
For example:
CREATE TABLE nestedt
(
`id` UInt8,
`aux` Nested(
a UInt8,
b String
)
)
ENGINE = TinyLog
INSERT INTO nestedt Values ( 1, [1], ['a'])
SELECT * FROM nestedt FORMAT TSV
1 [1] ['a']
Example Usage
Format Settings
Setting | Description | Default |
---|---|---|
format_tsv_null_representation | Custom NULL representation in TSV format. | \N |
input_format_tsv_empty_as_default | treat empty fields in TSV input as default values. For complex default expressions input_format_defaults_for_omitted_fields must be enabled too. | false |
input_format_tsv_enum_as_number | treat inserted enum values in TSV formats as enum indices. | false |
input_format_tsv_use_best_effort_in_schema_inference | use some tweaks and heuristics to infer schema in TSV format. If disabled, all fields will be inferred as Strings. | true |
output_format_tsv_crlf_end_of_line | if it is set true, end of line in TSV output format will be \r\n instead of \n . | false |
input_format_tsv_crlf_end_of_line | if it is set true, end of line in TSV input format will be \r\n instead of \n . | false |
input_format_tsv_skip_first_lines | skip specified number of lines at the beginning of data. | 0 |
input_format_tsv_detect_header | automatically detect header with names and types in TSV format. | true |
input_format_tsv_skip_trailing_empty_lines | skip trailing empty lines at the end of data. | false |
input_format_tsv_allow_variable_number_of_columns | allow variable number of columns in TSV format, ignore extra columns and use default values on missing columns. | false |