Description
Prerequisites:
- Python 3.10
- pandavro==1.8.0
- fastavro==1.9.7
Steps to reproduce the issue:
- Create a dataframe with the following data:
import pandas as pd
data = {
'id': [545, 539, 643, 615, 502, 599, 542, 587, 537, 518],
'first_name': ['caallai', 'Xzaaen', 'olrie', 'Iaairl', 'hfreiio', 'yieri', 'hcninn', 'irannir', 'Cmrnnan', 'Mnaeail'],
'last_name': ['kroaoe', 'trrot', 'haill', 'kolide', 'errhnd', 'aoaoet', 'yBorrd', 'evbceyd', 'Wcnoee', 'eMloen'],
'created_date': ['12/22/1992', '06/02/1992', '09/23/1998', '01/01/1997', '03/26/1990', '06/01/1996', '08/08/1992', '01/14/1995', '06/16/1992', '06/24/1991'],
'Active': [False, False, False, False, False, True, False, False, False, True]
}
df = pd.DataFrame(data=data).astype('object')
- Attempt to save the dataframe to an '.avro' file using the following command:
import pandavro as pdx
path = 'output.avro'
pdx.to_avro(path, df, schema=None)
Expected behavior:
The dataframe should be saved to an '.avro' file without any errors.
Actual behavior:
The following error is raised:
File "fastavro/_write.pyx", line 779, in fastavro._write.writer
File "fastavro/_write.pyx", line 687, in fastavro._write.Writer.__init__
File "fastavro/_schema.pyx", line 173, in fastavro._schema.parse_schema
File "fastavro/_schema.pyx", line 407, in fastavro._schema._parse_schema
File "fastavro/_schema.pyx", line 475, in fastavro._schema.parse_field
File "fastavro/_schema.pyx", line 233, in fastavro._schema._parse_schema
File "fastavro/_schema.pyx", line 263, in fastavro._schema._parse_schema
TypeError: argument of type 'NoneType' is not iterable
The inferred schema is:
{
'fields': [
{'name': 'id', 'type': ['null', None]},
{'name': 'first_name', 'type': ['null', 'string']},
{'name': 'last_name', 'type': ['null', 'string']},
{'name': 'created_date', 'type': ['null', 'string']},
{'name': 'Active', 'type': ['null', 'boolean']}
],
'name': 'Root',
'type': 'record'
}
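Fields like "id" can be detected mechanically before the schema reaches the writer. The following is a minimal sketch. It assumes the inferred schema is available as a plain dict shaped like the one above, and find_untyped_fields is a hypothetical helper that is not part of pandavro or fastavro. It flags any top-level field whose type is None or a union containing None:

```python
from typing import Any


def find_untyped_fields(schema: dict[str, Any]) -> list[str]:
    """Return names of top-level record fields whose type is None or a union containing None.

    Hypothetical helper for illustration; not part of pandavro or fastavro.
    """
    bad = []
    for field in schema.get("fields", []):
        field_type = field.get("type")
        # A union is a list of branches; a bare type is a string or dict.
        branches = field_type if isinstance(field_type, list) else [field_type]
        if any(branch is None for branch in branches):
            bad.append(field["name"])
    return bad


# The schema reported above, copied as a Python dict.
inferred = {
    "fields": [
        {"name": "id", "type": ["null", None]},
        {"name": "first_name", "type": ["null", "string"]},
        {"name": "last_name", "type": ["null", "string"]},
        {"name": "created_date", "type": ["null", "string"]},
        {"name": "Active", "type": ["null", "boolean"]},
    ],
    "name": "Root",
    "type": "record",
}

print(find_untyped_fields(inferred))  # ['id']
```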
Additional Information:
The issue occurs because, when the "id" column has object dtype, its type is inferred as ['null', None] instead of ['null', 'int']. The fastavro schema parser then fails on that None branch of the union, which produces the TypeError above.
When the "id" column has an integer dtype, the '.avro' file is saved successfully.
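An object dtype says nothing about the values it holds, so an inferrer has to look at the Python types of the elements themselves. The sketch below illustrates that idea. It is a hypothetical example, not pandavro's actual inference code, and both the PY_TO_AVRO table and the infer_object_column name are assumptions. Unlike the behavior reported here, it raises a clear error for element types it cannot map instead of emitting None:

```python
from typing import Any, Iterable

# Assumed mapping from Python element types to Avro primitive types (illustrative only).
PY_TO_AVRO = {
    bool: "boolean",
    int: "long",
    float: "double",
    str: "string",
    bytes: "bytes",
}


def infer_object_column(values: Iterable[Any]) -> list[str]:
    """Infer a nullable Avro union for a column stored with object dtype (hypothetical)."""
    # Collect the exact runtime types of the non-null elements. Exact-type lookup
    # means bool values map to "boolean" even though bool subclasses int.
    element_types = {type(v) for v in values if v is not None}
    if not element_types:
        # All-null column: fall back to a nullable string.
        return ["null", "string"]
    avro_types = set()
    for t in element_types:
        if t not in PY_TO_AVRO:
            raise TypeError(f"no Avro mapping for element type {t.__name__}")
        avro_types.add(PY_TO_AVRO[t])
    if len(avro_types) > 1:
        raise TypeError(f"mixed element types: {sorted(avro_types)}")
    return ["null", avro_types.pop()]


print(infer_object_column([545, 539, 643]))    # ['null', 'long']
print(infer_object_column(["caallai", None]))  # ['null', 'string']
print(infer_object_column([False, True]))      # ['null', 'boolean']
```

The sketch maps Python int to Avro "long" rather than "int" because Python integers are not limited to 32 bits. For the ids in this report, either would work.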
Workaround:
As a temporary workaround, explicitly cast the "id" column to an integer dtype before saving the dataframe to an '.avro' file:
df['id'] = df['id'].astype('int')
pdx.to_avro(path, df, schema=None)
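A broader variant of the same workaround avoids casting each column by hand. pandas' DataFrame.infer_objects() re-infers a concrete dtype for every object column, so "id" becomes int64 and "Active" becomes bool. The sketch below only prepares the dataframe, using a shortened copy of the data above. It assumes that pandavro then maps these numpy dtypes the same way it handles the integer case in the workaround above. That assumption was not tested against pandavro 1.8.0:

```python
import pandas as pd

# Shortened copy of the reported data, stored with object dtype as in the report.
data = {
    "id": [545, 539, 643],
    "first_name": ["caallai", "Xzaaen", "olrie"],
    "Active": [False, False, True],
}
df = pd.DataFrame(data=data).astype("object")

# Re-infer concrete dtypes for all object columns at once:
# Python ints -> int64, Python bools -> bool; string columns are not converted to numbers.
df = df.infer_objects()

print(df.dtypes)
```

After this step, the same pdx.to_avro(path, df, schema=None) call can be used unchanged.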