./scripts/pull_all_510k.py fails while parsing delimited file

On Ubuntu 18.04 with Python 3.8, running the pull script fails with:

> Traceback (most recent call last):
>   File "./scripts/pull_all_510k.py", line 7, in <module>
>     db_510k = fda.db.get_510k_db(force_download=True)
>   File "/home/tsbischof/src/tsbischof/fda/test/lib/python3.8/site-packages/fda/db.py", line 24, in get_510k_db
>     db = pandas.concat(map(lambda url: load_510k_db(url, root_dir, 
>   File "/home/tsbischof/src/tsbischof/fda/test/lib/python3.8/site-packages/pandas/util/_decorators.py", line 311, in wrapper
>     return func(*args, **kwargs)
>   File "/home/tsbischof/src/tsbischof/fda/test/lib/python3.8/site-packages/pandas/core/reshape/concat.py", line 294, in concat
>     op = _Concatenator(
>   File "/home/tsbischof/src/tsbischof/fda/test/lib/python3.8/site-packages/pandas/core/reshape/concat.py", line 348, in __init__
>     objs = list(objs)
>   File "/home/tsbischof/src/tsbischof/fda/test/lib/python3.8/site-packages/fda/db.py", line 24, in <lambda>
>     db = pandas.concat(map(lambda url: load_510k_db(url, root_dir, 
>   File "/home/tsbischof/src/tsbischof/fda/test/lib/python3.8/site-packages/fda/db.py", line 40, in load_510k_db
>     data = pandas.read_csv(io.StringIO(raw), delimiter="|")
>   File "/home/tsbischof/src/tsbischof/fda/test/lib/python3.8/site-packages/pandas/util/_decorators.py", line 311, in wrapper
>     return func(*args, **kwargs)
>   File "/home/tsbischof/src/tsbischof/fda/test/lib/python3.8/site-packages/pandas/io/parsers/readers.py", line 586, in read_csv
>     return _read(filepath_or_buffer, kwds)
>   File "/home/tsbischof/src/tsbischof/fda/test/lib/python3.8/site-packages/pandas/io/parsers/readers.py", line 488, in _read
>     return parser.read(nrows)
>   File "/home/tsbischof/src/tsbischof/fda/test/lib/python3.8/site-packages/pandas/io/parsers/readers.py", line 1047, in read
>     index, columns, col_dict = self._engine.read(nrows)
>   File "/home/tsbischof/src/tsbischof/fda/test/lib/python3.8/site-packages/pandas/io/parsers/c_parser_wrapper.py", line 223, in read
>     chunks = self._reader.read_low_memory(nrows)
>   File "pandas/_libs/parsers.pyx", line 801, in pandas._libs.parsers.TextReader.read_low_memory
>   File "pandas/_libs/parsers.pyx", line 857, in pandas._libs.parsers.TextReader._read_rows
>   File "pandas/_libs/parsers.pyx", line 843, in pandas._libs.parsers.TextReader._tokenize_rows
>   File "pandas/_libs/parsers.pyx", line 1925, in pandas._libs.parsers.raise_parser_error
> pandas.errors.ParserError: Error tokenizing data. C error: Expected 22 fields in line 64826, saw 23

No data folder is created. The error suggests a parse problem in one of the pipe-delimited files from the FDA archive: line 64826 has 23 fields where the parser expects 22.
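To narrow this down, a small stdlib-only check could list every line whose field count differs from the header's. This is a rough sketch, not part of `fda.db`: the helper name `find_malformed_lines` is hypothetical, and it assumes the raw text is split on `|` with no quoting, which may not match how the FDA files escape values.

```python
import io


def find_malformed_lines(raw, delimiter="|"):
    """Return (line_number, field_count) for each line whose field count
    differs from the header line. Line numbers are 1-based.

    Assumption: fields are separated by a bare delimiter, with no quoting.
    """
    expected = None
    bad = []
    for number, line in enumerate(io.StringIO(raw), start=1):
        line = line.rstrip("\r\n")
        if not line:
            # Skip blank lines, such as a trailing newline at end of file.
            continue
        count = len(line.split(delimiter))
        if expected is None:
            # The first non-empty line is treated as the header.
            expected = count
        elif count != expected:
            bad.append((number, count))
    return bad


# Made-up sample: the third line has one field too many.
sample = "A|B|C\n1|2|3\n4|5|6|7\n"
print(find_malformed_lines(sample))
```

Running this against the `raw` string that `load_510k_db` passes to `pandas.read_csv` should show whether line 64826 is the only offender. As a stopgap, pandas 1.3 and later accept `on_bad_lines="skip"` (or `"warn"`) in `read_csv`, at the cost of dropping those rows.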
