Failed writing a dataframe to '.avro' file #58

**Prerequisites:**

- Python 3.10
- pandavro==1.8.0
- fastavro==1.9.7

**Steps to reproduce the issue:**

- Create a dataframe with the following data and cast every column to _object_:

```
import pandas as pd

data = {
    'id': [545, 539, 643, 615, 502, 599, 542, 587, 537, 518],
    'first_name': ['caallai', 'Xzaaen', 'olrie', 'Iaairl', 'hfreiio', 'yieri', 'hcninn', 'irannir', 'Cmrnnan', 'Mnaeail'],
    'last_name': ['kroaoe', 'trrot', 'haill', 'kolide', 'errhnd', 'aoaoet', 'yBorrd', 'evbceyd', 'Wcnoee', 'eMloen'],
    'created_date': ['12/22/1992', '06/02/1992', '09/23/1998', '01/01/1997', '03/26/1990', '06/01/1996', '08/08/1992', '01/14/1995', '06/16/1992', '06/24/1991'],
    'Active': [False, False, False, False, False, True, False, False, False, True]
}
df = pd.DataFrame(data=data).astype('object')
```

- Attempt to save the dataframe to an _'.avro'_ file using the following command:

```
import pandavro as pdx

path = 'output.avro'
pdx.to_avro(path, df, schema=None)
```

**Expected behavior:**

The dataframe should be saved to an _'.avro'_ file without any errors.

**Actual behavior:**

The following error is raised:


```
  File "fastavro/_write.pyx", line 779, in fastavro._write.writer
  File "fastavro/_write.pyx", line 687, in fastavro._write.Writer.__init__
  File "fastavro/_schema.pyx", line 173, in fastavro._schema.parse_schema
  File "fastavro/_schema.pyx", line 407, in fastavro._schema._parse_schema
  File "fastavro/_schema.pyx", line 475, in fastavro._schema.parse_field
  File "fastavro/_schema.pyx", line 233, in fastavro._schema._parse_schema
  File "fastavro/_schema.pyx", line 263, in fastavro._schema._parse_schema
TypeError: argument of type 'NoneType' is not iterable
```

The inferred schema is:

```
{
    'fields': [
        {'name': 'id', 'type': ['null', None]},
        {'name': 'first_name', 'type': ['null', 'string']},
        {'name': 'last_name', 'type': ['null', 'string']},
        {'name': 'created_date', 'type': ['null', 'string']},
        {'name': 'Active', 'type': ['null', 'boolean']}
    ],
    'name': 'Root',
    'type': 'record'
}
```
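A small check can make the broken entry easier to spot in larger schemas. The sketch below is illustrative only: `fields_with_unresolved_type` is a hypothetical helper, not part of pandavro or fastavro, and `expected_schema` is a hand-written schema (using `long` for _"id"_ is an assumption; `int` would also hold these values).

```python
# Schema as inferred in this issue: the "id" union contains None,
# which is not a valid Avro type name.
inferred_schema = {
    "type": "record",
    "name": "Root",
    "fields": [
        {"name": "id", "type": ["null", None]},
        {"name": "first_name", "type": ["null", "string"]},
        {"name": "last_name", "type": ["null", "string"]},
        {"name": "created_date", "type": ["null", "string"]},
        {"name": "Active", "type": ["null", "boolean"]},
    ],
}

# Hand-written schema for comparison (an assumption, not pandavro output).
expected_schema = {
    "type": "record",
    "name": "Root",
    "fields": [
        {"name": "id", "type": ["null", "long"]},
        {"name": "first_name", "type": ["null", "string"]},
        {"name": "last_name", "type": ["null", "string"]},
        {"name": "created_date", "type": ["null", "string"]},
        {"name": "Active", "type": ["null", "boolean"]},
    ],
}


def fields_with_unresolved_type(schema):
    """Return the names of fields whose type (or union member) is None."""
    bad = []
    for field in schema["fields"]:
        field_type = field["type"]
        members = field_type if isinstance(field_type, list) else [field_type]
        if any(member is None for member in members):
            bad.append(field["name"])
    return bad


print(fields_with_unresolved_type(inferred_schema))
print(fields_with_unresolved_type(expected_schema))
```

A schema like `expected_schema` could presumably be passed through the `schema` argument of `pdx.to_avro` instead of `None`, but that path was not tested in this report.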

**Additional Information:**

The failure occurs because the _"id"_ column, whose data type is _object_, is inferred as _['null', None]_ instead of _['null', 'int']_. Judging by the traceback, fastavro then fails on the _None_ entry while parsing the schema, which raises the _TypeError_ shown above.
When the _"id"_ column has an integer data type, the _'.avro'_ file is saved successfully.
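To see which columns are at risk before saving, the values held in each _object_ column can be inspected with pandas. This is a minimal diagnostic sketch: `describe_object_columns` is a hypothetical helper, the sample frame is a shortened version of the one above, and nothing here claims to mirror how pandavro infers its schema internally.

```python
import pandas as pd


def describe_object_columns(df: pd.DataFrame) -> dict:
    """Map each object-dtype column to the kind of values pandas detects in it."""
    return {
        name: pd.api.types.infer_dtype(df[name], skipna=True)
        for name in df.columns
        if df[name].dtype == object
    }


# Shortened version of the dataframe from the steps to reproduce.
sample = pd.DataFrame(
    {
        "id": [545, 539],
        "first_name": ["caallai", "Xzaaen"],
        "Active": [False, True],
    }
).astype("object")

kinds = describe_object_columns(sample)
print(kinds)
```

Columns reported as `integer` while stored as _object_ are the ones that match the failing _"id"_ case in this issue.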

**Workaround:**

As a temporary workaround, the _"id"_ column can be explicitly cast to an integer type before saving the dataframe to an _'.avro'_ file:

```
df['id'] = df['id'].astype('int')
pdx.to_avro(path, df, schema=None)
```
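When several _object_ columns hold numeric values, casting them one by one becomes tedious. A possible generalization, sketched below, is to let pandas restore the dtypes with `DataFrame.infer_objects()` before saving. Whether pandavro then infers a valid schema for every column is an assumption based on the observation above that an integer _"id"_ column saves successfully. The sample frame is a shortened version of the one in this issue.

```python
import pandas as pd

# Shortened version of the dataframe from the steps to reproduce.
df_object = pd.DataFrame(
    {
        "id": [545, 539, 643],
        "first_name": ["caallai", "Xzaaen", "olrie"],
        "Active": [False, False, True],
    }
).astype("object")

# infer_objects() attempts a soft conversion of object columns
# to more specific dtypes (integers, booleans, ...).
restored = df_object.infer_objects()
print(restored.dtypes)

# The save call itself is unchanged from the workaround above:
# pdx.to_avro(path, restored, schema=None)
```

Columns with mixed or missing values may stay as _object_ after `infer_objects()`, so those would still need an explicit cast.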
