JSONObjectEachRow
Description
In this format, all data is represented as a single JSON Object, each row is represented as a separate field of this object similar to JSONEachRow format.
Example Usage
Example:
{
"row_1": {"num": 42, "str": "hello", "arr": [0,1]},
"row_2": {"num": 43, "str": "hello", "arr": [0,1,2]},
"row_3": {"num": 44, "str": "hello", "arr": [0,1,2,3]}
}
To use an object name as a column value you can use the special setting format_json_object_each_row_column_for_object_name. The value of this setting is set to the name of a column, that is used as JSON key for a row in the resulting object. Examples:
For output:
Let's say we have the table test
with two columns:
┌─object_name─┬─number─┐
│ first_obj │ 1 │
│ second_obj │ 2 │
│ third_obj │ 3 │
└─────────────┴────────┘
Let's output it in JSONObjectEachRow
format and use format_json_object_each_row_column_for_object_name
setting:
select * from test settings format_json_object_each_row_column_for_object_name='object_name'
The output:
{
"first_obj": {"number": 1},
"second_obj": {"number": 2},
"third_obj": {"number": 3}
}
For input:
Let's say we stored output from the previous example in a file named data.json
:
select * from file('data.json', JSONObjectEachRow, 'object_name String, number UInt64') settings format_json_object_each_row_column_for_object_name='object_name'
┌─object_name─┬─number─┐
│ first_obj │ 1 │
│ second_obj │ 2 │
│ third_obj │ 3 │
└─────────────┴────────┘
It also works in schema inference:
desc file('data.json', JSONObjectEachRow) settings format_json_object_each_row_column_for_object_name='object_name'
┌─name────────┬─type────────────┐
│ object_name │ String │
│ number │ Nullable(Int64) │
└─────────────┴─────────────────┘
Inserting Data
INSERT INTO UserActivity FORMAT JSONEachRow {"PageViews":5, "UserID":"4324182021466249494", "Duration":146,"Sign":-1} {"UserID":"4324182021466249494","PageViews":6,"Duration":185,"Sign":1}
ClickHouse allows:
- Any order of key-value pairs in the object.
- Omitting some values.
ClickHouse ignores spaces between elements and commas after the objects. You can pass all the objects in one line. You do not have to separate them with line breaks.
Omitted values processing
ClickHouse substitutes omitted values with the default values for the corresponding data types.
If DEFAULT expr
is specified, ClickHouse uses different substitution rules depending on the input_format_defaults_for_omitted_fields setting.
Consider the following table:
CREATE TABLE IF NOT EXISTS example_table
(
x UInt32,
a DEFAULT x * 2
) ENGINE = Memory;
- If
input_format_defaults_for_omitted_fields = 0
, then the default value forx
anda
equals0
(as the default value for theUInt32
data type). - If
input_format_defaults_for_omitted_fields = 1
, then the default value forx
equals0
, but the default value ofa
equalsx * 2
.
When inserting data with input_format_defaults_for_omitted_fields = 1
, ClickHouse consumes more computational resources, compared to insertion with input_format_defaults_for_omitted_fields = 0
.
Selecting Data
Consider the UserActivity
table as an example:
┌──────────────UserID─┬─PageViews─┬─Duration─┬─Sign─┐
│ 4324182021466249494 │ 5 │ 146 │ -1 │
│ 4324182021466249494 │ 6 │ 185 │ 1 │
└─────────────────────┴───────────┴──────────┴──────┘
The query SELECT * FROM UserActivity FORMAT JSONEachRow
returns:
{"UserID":"4324182021466249494","PageViews":5,"Duration":146,"Sign":-1}
{"UserID":"4324182021466249494","PageViews":6,"Duration":185,"Sign":1}
Unlike the JSON format, there is no substitution of invalid UTF-8 sequences. Values are escaped in the same way as for JSON
.
Any set of bytes can be output in the strings. Use the JSONEachRow
format if you are sure that the data in the table can be formatted as JSON without losing any information.
Usage of Nested Structures
If you have a table with Nested data type columns, you can insert JSON data with the same structure. Enable this feature with the input_format_import_nested_json setting.
For example, consider the following table:
CREATE TABLE json_each_row_nested (n Nested (s String, i Int32) ) ENGINE = Memory
As you can see in the Nested
data type description, ClickHouse treats each component of the nested structure as a separate column (n.s
and n.i
for our table). You can insert data in the following way:
INSERT INTO json_each_row_nested FORMAT JSONEachRow {"n.s": ["abc", "def"], "n.i": [1, 23]}
To insert data as a hierarchical JSON object, set input_format_import_nested_json=1.
{
"n": {
"s": ["abc", "def"],
"i": [1, 23]
}
}
Without this setting, ClickHouse throws an exception.
SELECT name, value FROM system.settings WHERE name = 'input_format_import_nested_json'
┌─name────────────────────────────┬─value─┐
│ input_format_import_nested_json │ 0 │
└─────────────────────────────────┴───────┘
INSERT INTO json_each_row_nested FORMAT JSONEachRow {"n": {"s": ["abc", "def"], "i": [1, 23]}}
Code: 117. DB::Exception: Unknown field found while parsing JSONEachRow format: n: (at row 1)
SET input_format_import_nested_json=1
INSERT INTO json_each_row_nested FORMAT JSONEachRow {"n": {"s": ["abc", "def"], "i": [1, 23]}}
SELECT * FROM json_each_row_nested
┌─n.s───────────┬─n.i────┐
│ ['abc','def'] │ [1,23] │
└───────────────┴────────┘
Format Settings
- input_format_import_nested_json - map nested JSON data to nested tables (it works for JSONEachRow format). Default value -
false
. - input_format_json_read_bools_as_numbers - allow to parse bools as numbers in JSON input formats. Default value -
true
. - input_format_json_read_bools_as_strings - allow to parse bools as strings in JSON input formats. Default value -
true
. - input_format_json_read_numbers_as_strings - allow to parse numbers as strings in JSON input formats. Default value -
true
. - input_format_json_read_arrays_as_strings - allow to parse JSON arrays as strings in JSON input formats. Default value -
true
. - input_format_json_read_objects_as_strings - allow to parse JSON objects as strings in JSON input formats. Default value -
true
. - input_format_json_named_tuples_as_objects - parse named tuple columns as JSON objects. Default value -
true
. - input_format_json_try_infer_numbers_from_strings - try to infer numbers from string fields while schema inference. Default value -
false
. - input_format_json_try_infer_named_tuples_from_objects - try to infer named tuple from JSON objects during schema inference. Default value -
true
. - input_format_json_infer_incomplete_types_as_strings - use type String for keys that contains only Nulls or empty objects/arrays during schema inference in JSON input formats. Default value -
true
. - input_format_json_defaults_for_missing_elements_in_named_tuple - insert default values for missing elements in JSON object while parsing named tuple. Default value -
true
. - input_format_json_ignore_unknown_keys_in_named_tuple - ignore unknown keys in json object for named tuples. Default value -
false
. - input_format_json_compact_allow_variable_number_of_columns - allow variable number of columns in JSONCompact/JSONCompactEachRow format, ignore extra columns and use default values on missing columns. Default value -
false
. - input_format_json_throw_on_bad_escape_sequence - throw an exception if JSON string contains bad escape sequence. If disabled, bad escape sequences will remain as is in the data. Default value -
true
. - input_format_json_empty_as_default - treat empty fields in JSON input as default values. Default value -
false
. For complex default expressions input_format_defaults_for_omitted_fields must be enabled too. - output_format_json_quote_64bit_integers - controls quoting of 64-bit integers in JSON output format. Default value -
true
. - output_format_json_quote_64bit_floats - controls quoting of 64-bit floats in JSON output format. Default value -
false
. - output_format_json_quote_denormals - enables '+nan', '-nan', '+inf', '-inf' outputs in JSON output format. Default value -
false
. - output_format_json_quote_decimals - controls quoting of decimals in JSON output format. Default value -
false
. - output_format_json_escape_forward_slashes - controls escaping forward slashes for string outputs in JSON output format. Default value -
true
. - output_format_json_named_tuples_as_objects - serialize named tuple columns as JSON objects. Default value -
true
. - output_format_json_array_of_rows - output a JSON array of all rows in JSONEachRow(Compact) format. Default value -
false
. - output_format_json_validate_utf8 - enables validation of UTF-8 sequences in JSON output formats (note that it doesn't impact formats JSON/JSONCompact/JSONColumnsWithMetadata, they always validate utf8). Default value -
false
.