-
Notifications
You must be signed in to change notification settings - Fork 1k
Improve Display
for DataType
and Field
#8290
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Display
and Debug
for DataType
Display
and Debug
for DataType
and Field
arrow-schema/src/datatype_format.rs
Outdated
|
||
impl fmt::Display for DataType { | ||
fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result { | ||
// NOTE: `Display` and `Debug` formatting are ALWAYS the same, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it is more common in Rust code to have the Debug implementation try and print out something close to the underlying representation, and Display
is for human consumption
Specifically https://doc.rust-lang.org/std/fmt/trait.Debug.html
Debug should format the output in a programmer-facing, debugging context.
Generally speaking, you should just derive a Debug implementation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know this is uncommon, but unfortunately so many error messages in datafusion and arrow use the Debug
formatting of DataType
instead of Display
, which means we end up with huge difficult-to-read error messages.
There are three solutions to this, afaict:
A) Use Display
=Debug
, like this PR.
It's still programmer-facing, because it contains ALL the info (metadata etc)
B) Replace all uses of {:?}
with {}
when printing datatypes in datafusion, arrow, and other third party crates.
This is VERY hard to do, as I know of no automated tool to find all these places.
C) Improve Debug
formatting by omitting empty/default fields. This will help, but the Debug format for DataType::List
will still be very ugly, since it wraps a Field
.
(or maybe I mistakingly think a lot of places use Debug
instead of Display
because the old Display
implementation for List
used the Debug
formatting…)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added a comment to the code to motivate this choice
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
B) Replace all uses of {:?} with {} when printing datatypes in datafusion, arrow, and other third party crates.
This is VERY hard to do, as I know of no automated tool to find all these places.
I think grepping for type:?}
and {:?}
would catch most of them. Maybe a good think to ask some AI tool to do
(venv) andrewlamb@Andrews-MacBook-Pro-3:~/Software/arrow-rs$ grep -r 'type:?' `find . -name '*.rs'`
./arrow-schema/src/datatype_parse.rs: println!("Input '{data_type_string}' ({data_type:?})");
./arrow-schema/src/datatype_parse.rs: println!("Parsing '{data_type_string}', expecting '{expected_data_type:?}'");
./arrow-data/src/transform/run.rs: _ => panic!("Invalid run end type for RunEndEncoded array: {run_end_type:?}"),
./arrow-data/src/transform/run.rs: _ => panic!("Invalid run end type for RunEndEncoded array: {dest_run_end_type:?}",),
./arrow-ipc/src/compression.rs: "compression type {other_type:?} not supported "
./arrow-string/src/like.rs: "{value_type:?} «{value}» like {pattern_type:?} «{pattern}»"
./arrow-string/src/like.rs: "{value_type:?} «{value}» ilike {pattern_type:?} «{pattern}»"
./arrow-string/src/like.rs: "{value_type:?} «{value}» nlike {pattern_type:?} «{pattern}»"
./arrow-string/src/like.rs: "{value_type:?} «{value}» nilike {pattern_type:?} «{pattern}»"
./arrow-csv/src/reader/mod.rs: "Unsupported dictionary key type {key_type:?}"
./arrow-row/src/list.rs: "Expected FixedSizeListArray, found: {list_type:?}",
./arrow-array/src/array/fixed_size_list_array.rs: panic!("FixedSizeListArray data should contain a FixedSizeList data type, got {data_type:?}")
./arrow-array/src/array/primitive_array.rs: write!(f, "PrimitiveArray<{data_type:?}>\n[\n")?;
./arrow-array/src/array/primitive_array.rs: "Cast error: Failed to convert {v} to temporal for {data_type:?}"
./arrow-array/src/array/primitive_array.rs: "Cast error: Failed to convert {v} to temporal for {data_type:?}"
./arrow-array/src/record_batch.rs: "column types must match schema types, expected {field_type:?} but found {col_type:?} at column index {i}")));
./arrow-array/src/ffi.rs: "The datatype \"{data_type:?}\" doesn't expect buffer at index 0. Please verify that the C data interface is correctly implemented."
./arrow-array/src/ffi.rs: "The datatype \"{data_type:?}\" expects 2 buffers, but requested {i}. Please verify that the C data interface is correctly implemented."
./arrow-array/src/ffi.rs: "The datatype \"{data_type:?}\" expects 2 buffers, but requested {i}. Please verify that the C data interface is correctly implemented."
./arrow-array/src/ffi.rs: "The datatype \"{data_type:?}\" expects 2 buffers, but requested {i}. Please verify that the C data interface is correctly implemented."
./arrow-array/src/ffi.rs: "The datatype \"{data_type:?}\" expects 2 buffers, but requested {i}. Please verify that the C data interface is correctly implemented."
./arrow-array/src/ffi.rs: "The datatype \"{data_type:?}\" expects 3 buffers, but requested {i}. Please verify that the C data interface is correctly implemented."
./arrow-array/src/ffi.rs: "The datatype \"{data_type:?}\" expects 3 buffers, but requested {i}. Please verify that the C data interface is correctly implemented."
./arrow-array/src/ffi.rs: "The datatype \"{data_type:?}\" expects 1 buffer, but requested {i}. Please verify that the C data interface is correctly implemented."
./arrow-array/src/ffi.rs: "The datatype \"{data_type:?}\" expects 2 buffer, but requested {i}. Please verify that the C data interface is correctly implemented."
./arrow-array/src/ffi.rs: "The datatype \"{data_type:?}\" doesn't expect buffer at index 0. Please verify that the C data interface is correctly implemented."
./arrow-array/src/ffi.rs: "The datatype \"{data_type:?}\" is still not supported in Rust implementation"
./arrow-array/src/builder/mod.rs: panic!("Data type {t:?} with key type {key_type:?} is not currently supported")
./arrow-cast/src/cast/mod.rs: "Casting from dictionary type {from_type:?} to {to_type:?} not supported",
./arrow-cast/src/cast/mod.rs: "Casting from type {from_type:?} to dictionary type {to_type:?} not supported",
./arrow-cast/src/cast/mod.rs: "Casting from {from_type:?} to {to_type:?} not supported"
./arrow-cast/src/cast/mod.rs: "Casting from {from_type:?} to {to_type:?} not supported"
./arrow-cast/src/cast/mod.rs: "Casting from {from_type:?} to {to_type:?} not supported",
./arrow-cast/src/cast/mod.rs: "Casting from {from_type:?} to {to_type:?} not supported",
./arrow-cast/src/cast/mod.rs: "Casting from {from_type:?} to {to_type:?} not supported",
./arrow-cast/src/cast/mod.rs: "Casting from {from_type:?} to {to_type:?} not supported",
./arrow-cast/src/cast/mod.rs: "Casting from {from_type:?} to {to_type:?} not supported",
./arrow-cast/src/cast/mod.rs: "Casting from {from_type:?} to {to_type:?} not supported",
./arrow-cast/src/cast/mod.rs: "Casting from {from_type:?} to {to_type:?} not supported",
./arrow-cast/src/cast/mod.rs: "Casting from {from_type:?} to {to_type:?} not supported",
./arrow-cast/src/cast/mod.rs: "Casting from {from_type:?} to {to_type:?} not supported",
./arrow-cast/src/cast/mod.rs: "Casting from {from_type:?} to {to_type:?} not supported",
./arrow-cast/src/cast/mod.rs: "Casting from {from_type:?} to {to_type:?} not supported"
./arrow-cast/src/cast/mod.rs: "Casting from {from_type:?} to {to_type:?} not supported"
./arrow-cast/src/cast/dictionary.rs: "Unsupported type {to_index_type:?} for dictionary index"
./arrow-cast/src/cast/dictionary.rs: "Unsupported output type for dictionary packing: {dict_value_type:?}"
./parquet/benches/arrow_reader_row_filter.rs: let benchmark_name = format!("{filter_type:?}/{proj_case}",);
./parquet/src/record/reader.rs: "Map key type is expected to be a primitive type, but found {key_type:?}"
./parquet/src/arrow/arrow_reader/mod.rs: "data type: {data_type:?}, expected: {expected_err}, got: {err}"
./parquet/src/arrow/arrow_reader/mod.rs: "data type: {data_type:?}, expected: {expected_err}, got: {err}"
./parquet/src/arrow/arrow_writer/mod.rs: "Attempting to write an Arrow type {data_type:?} to parquet that is not yet implemented"
./parquet/src/arrow/buffer/view_buffer.rs: _ => panic!("Unsupported data type: {data_type:?}"),
./parquet/src/schema/visitor.rs: panic!("{list_type:?} is a list type and must be a group type")
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What we really need to enforce this is:
But I've done my best in this PR, and in
Display
and Debug
for DataType
and Field
Display
for DataType
and Field
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @emilk
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
}; | ||
assert_eq!( | ||
t, | ||
r#"Casting from Map(Field { "entries": Struct(key Utf8, value nullable Utf8) }, false) to Map(Field { "entries": Struct(key Utf8, value Utf8) }, true) not supported"# |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that is certainly much nicer!
Green! |
We can merge after #7836. |
#7836 is merged! |
Looks like we should've synced this branch with |
Maybe a merge queue would make sense? |
Oops! Fix in: We should be able to just cherry-pick 6de27e6 to its own PR |
+1 |
Quick fix: |
* Fixes merge-race induced problem from #8290
# Rationale for this change Despite us having `Display` implementations for `DataType`, a lot of error messages still use `Debug`. See for instance: * apache/datafusion#17565 * #8290 Therefor I want to make sure the `Debug` formatting of `Field` (and, by extension, `DataType`) is not _utterly horrible_. This PR makes things… slightly better. # What changes are included in this PR? Omits fields of `Field` that have their "default" values. # Are these changes tested? Yes, there are new tests. # Are there any user-facing changes? Though this changes the `Debug` formatting, I would NOT consider this a breaking change, because nobody should rely on consistent `Debug` formatting. See for instance https://doc.rust-lang.org/std/fmt/trait.Debug.html#stability
# Which issue does this PR close? * Follows #8290 (merge that first, and the diff of this PR will drop) * #7469 * Part of #8351 # Rationale for this change We would previously format structs like so: `Struct(name1 type1, name2 nullable type2)` This will break badly whenever the field name is anything but a simple identifier. In other words: it allows [string injection](https://xkcd.com/327/) if the field name contains an end-paranthesis. Except for that, it is also difficult to debug mistakingly bad field names like " " or "\n". # What changes are included in this PR? We change the `Display` and `Debug` formatting of `Struct` **Before**: `Struct(name1 type1, name2 nullable type2)` **After**: `Struct("name1": type1, "name2": nullable type2)` # Are these changes tested? Yes - I've updated the existing tests. # Are there any user-facing changes? Yes, changing the `Display` formatting is a **breaking change**
This is part of an attempt to improve the error reporting of
arrow-rs
,datafusion
, and any other 3rd party crates.I believe that error messages should be as readable as possible. Aim for
rustc
more thangcc
.Here's an example of how this PR improves some existing error messages:
Before:
After:
Which issue does this PR close?
DataType::List
#7048DataType::List
#7051DataType::Struct
field names inDisplay
formatting #8291Display
formatting ofDataType
:s in error messages datafusion#17565Display
forDataType
#8351Rationale for this change
DataType:s are often shown in error messages. Making these error messages readable is very important.
What changes are included in this PR?
UnifyDebug
andDisplay
TheDisplay
andDebug
ofDataType
are now the SAME.Why? Both are frequently used in error messages (both in arrow, anddatafusion
), and both benefit from being readable yet reversible.Reverted based on PR feedback. I will try to improve the
Debug
formatting in a future PR, with clever use of https://doc.rust-lang.org/std/fmt/struct.Formatter.html#method.debug_structImprove
Display
of listsImprove the
Display
formatting ofDataType::List
DataType::LargeList
DataType::FixedSizeList
Before:
List(Field { name: \"item\", data_type: Int32, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} })
After:
List(nullable Int32)
Before:
FixedSizeList(Field { name: \"item\", data_type: Int32, nullable: false, dict_id: 0, dict_is_ordered: false, metadata: {} }, 5)
After:
FixedSizeList(5 x Int32)
Better formatting of
DataType::Struct
The formatting of
Struct
is now reversible, including nullability and metadata.ImproveDebug
format ofField
Best understood with this diff for an existing test:EDIT: reverted
Are these changes tested?
Yes - new tests cover them
Are there any user-facing changes?
Display/to_string
has changed, and so this is a BREAKING CHANGE.Care has been taken that the formatting contains all necessary information (i.e. is reversible), though the actual
FromStr
implementation is still not written (it is missing onmain
, and missing in this PR - so no change).Let me know if I went to far… or not far enough 😆