Skip to main content
Skip to main content

CSV

Description

Comma Separated Values format (RFC). When formatting, rows are enclosed in double quotes. A double quote inside a string is output as two double quotes in a row. There are no other rules for escaping characters.

  • Date and date-time are enclosed in double quotes.
  • Numbers are output without quotes.
  • Values are separated by a delimiter character, which is , by default. The delimiter character is defined in the setting format_csv_delimiter.
  • Rows are separated using the Unix line feed (LF).
  • Arrays are serialized in CSV as follows:
    • first, the array is serialized to a string as in TabSeparated format
    • The resulting string is output to CSV in double quotes.
  • Tuples in CSV format are serialized as separate columns (that is, their nesting in the tuple is lost).
$ clickhouse-client --format_csv_delimiter="|" --query="INSERT INTO test.csv FORMAT CSV" < data.csv
Note

By default, the delimiter is , See the format_csv_delimiter setting for more information.

When parsing, all values can be parsed either with or without quotes. Both double and single quotes are supported.

Rows can also be arranged without quotes. In this case, they are parsed up to the delimiter character or line feed (CR or LF). However, in violation of the RFC, when parsing rows without quotes, the leading and trailing spaces and tabs are ignored. The line feed supports: Unix (LF), Windows (CR LF) and Mac OS Classic (CR LF) types.

NULL is formatted according to setting format_csv_null_representation (the default value is \N).

In the input data, ENUM values can be represented as names or as ids. First, we try to match the input value to the ENUM name. If we fail and the input value is a number, we try to match this number to the ENUM id. If input data contains only ENUM ids, it's recommended to enable the setting input_format_csv_enum_as_number to optimize ENUM parsing.

Example Usage

Format Settings

SettingDescriptionDefaultNotes
format_csv_delimiterthe character to be considered as a delimiter in CSV data.,
format_csv_allow_single_quotesallow strings in single quotes.true
format_csv_allow_double_quotesallow strings in double quotes.true
format_csv_null_representationcustom NULL representation in CSV format.\N
input_format_csv_empty_as_defaulttreat empty fields in CSV input as default values.trueFor complex default expressions, input_format_defaults_for_omitted_fields must be enabled too.
input_format_csv_enum_as_numbertreat inserted enum values in CSV formats as enum indices.false
input_format_csv_use_best_effort_in_schema_inferenceuse some tweaks and heuristics to infer schema in CSV format. If disabled, all fields will be inferred as Strings.true
input_format_csv_arrays_as_nested_csvwhen reading Array from CSV, expect that its elements were serialized in nested CSV and then put into string.false
output_format_csv_crlf_end_of_lineif it is set to true, end of line in CSV output format will be \r\n instead of \n.false
input_format_csv_skip_first_linesskip the specified number of lines at the beginning of data.0
input_format_csv_detect_headerautomatically detect header with names and types in CSV format.true
input_format_csv_skip_trailing_empty_linesskip trailing empty lines at the end of data.false
input_format_csv_trim_whitespacestrim spaces and tabs in non-quoted CSV strings.true
input_format_csv_allow_whitespace_or_tab_as_delimiterAllow to use whitespace or tab as field delimiter in CSV strings.false
input_format_csv_allow_variable_number_of_columnsallow variable number of columns in CSV format, ignore extra columns and use default values on missing columns.false
input_format_csv_use_default_on_bad_valuesAllow to set default value to column when CSV field deserialization failed on bad value.false
input_format_csv_try_infer_numbers_from_stringsTry to infer numbers from string fields while schema inference.false