./scripts/pull_all_510k.py fails while parsing delimited file #4

@tsbischof

On Ubuntu 18.04 with Python 3.8, running the pull script fails with:

Traceback (most recent call last):
  File "./scripts/pull_all_510k.py", line 7, in <module>
    db_510k = fda.db.get_510k_db(force_download=True)
  File "/home/tsbischof/src/tsbischof/fda/test/lib/python3.8/site-packages/fda/db.py", line 24, in get_510k_db
    db = pandas.concat(map(lambda url: load_510k_db(url, root_dir,
  File "/home/tsbischof/src/tsbischof/fda/test/lib/python3.8/site-packages/pandas/util/_decorators.py", line 311, in wrapper
    return func(*args, **kwargs)
  File "/home/tsbischof/src/tsbischof/fda/test/lib/python3.8/site-packages/pandas/core/reshape/concat.py", line 294, in concat
    op = _Concatenator(
  File "/home/tsbischof/src/tsbischof/fda/test/lib/python3.8/site-packages/pandas/core/reshape/concat.py", line 348, in __init__
    objs = list(objs)
  File "/home/tsbischof/src/tsbischof/fda/test/lib/python3.8/site-packages/fda/db.py", line 24, in <lambda>
    db = pandas.concat(map(lambda url: load_510k_db(url, root_dir,
  File "/home/tsbischof/src/tsbischof/fda/test/lib/python3.8/site-packages/fda/db.py", line 40, in load_510k_db
    data = pandas.read_csv(io.StringIO(raw), delimiter="|")
  File "/home/tsbischof/src/tsbischof/fda/test/lib/python3.8/site-packages/pandas/util/_decorators.py", line 311, in wrapper
    return func(*args, **kwargs)
  File "/home/tsbischof/src/tsbischof/fda/test/lib/python3.8/site-packages/pandas/io/parsers/readers.py", line 586, in read_csv
    return _read(filepath_or_buffer, kwds)
  File "/home/tsbischof/src/tsbischof/fda/test/lib/python3.8/site-packages/pandas/io/parsers/readers.py", line 488, in _read
    return parser.read(nrows)
  File "/home/tsbischof/src/tsbischof/fda/test/lib/python3.8/site-packages/pandas/io/parsers/readers.py", line 1047, in read
    index, columns, col_dict = self._engine.read(nrows)
  File "/home/tsbischof/src/tsbischof/fda/test/lib/python3.8/site-packages/pandas/io/parsers/c_parser_wrapper.py", line 223, in read
    chunks = self._reader.read_low_memory(nrows)
  File "pandas/_libs/parsers.pyx", line 801, in pandas._libs.parsers.TextReader.read_low_memory
  File "pandas/_libs/parsers.pyx", line 857, in pandas._libs.parsers.TextReader._read_rows
  File "pandas/_libs/parsers.pyx", line 843, in pandas._libs.parsers.TextReader._tokenize_rows
  File "pandas/_libs/parsers.pyx", line 1925, in pandas._libs.parsers.raise_parser_error
pandas.errors.ParserError: Error tokenizing data. C error: Expected 22 fields in line 64826, saw 23
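
The same failure can be reproduced without downloading the FDA archive. The sketch below feeds pandas a tiny, made-up pipe-delimited string (hypothetical sample data, not taken from the 510(k) files) whose third line has one field too many. It also shows that `on_bad_lines="skip"` (available in pandas 1.3 and later) lets the read complete, but only by silently dropping the offending record, so it hides the problem rather than fixing it.

```python
import io

import pandas

# Hypothetical sample: the header declares 2 fields, but line 3 has 3.
raw = "a|b\n1|2\n3|4|5\n"

try:
    pandas.read_csv(io.StringIO(raw), delimiter="|")
except pandas.errors.ParserError as err:
    # Same failure mode as the traceback above, scaled down.
    message = str(err)
    print(message)
else:
    message = ""

# Stopgap only: the malformed line is discarded, so its record is lost.
skipped = pandas.read_csv(io.StringIO(raw), delimiter="|", on_bad_lines="skip")
print(skipped)
```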

No data folder is created. This looks like a parsing issue with the pipe-delimited files in the FDA archive: pandas expected 22 fields per line but found 23 on line 64826, i.e. one more "|" separator than expected.
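
To check whether the archive itself is malformed, one can count the "|"-separated fields on every line and list the lines that disagree with the header. Below is a minimal standard-library sketch, assuming the downloaded text is already available as a string (as `raw` is in `load_510k_db`); the function name `find_bad_lines` and the sample input are hypothetical and not part of the `fda` package. It uses `csv.reader` so that quoted fields containing "|" are treated the same way pandas treats them by default.

```python
import csv
import io
from typing import List, Tuple


def find_bad_lines(raw: str, delimiter: str = "|") -> List[Tuple[int, int]]:
    """Return (line_number, field_count) for rows whose width differs from the header.

    Line numbers are 1-based and count the header as line 1, which should
    make them comparable with "Expected N fields in line L" from pandas.
    """
    reader = csv.reader(io.StringIO(raw), delimiter=delimiter)
    expected = len(next(reader))  # number of columns declared by the header
    bad = []
    for row in reader:
        if not row:
            continue  # pandas skips blank lines by default, so ignore them here too
        if len(row) != expected:
            bad.append((reader.line_num, len(row)))
    return bad


# Hypothetical sample: line 3 has 4 fields instead of 3.
print(find_bad_lines("a|b|c\n1|2|3\n4|5|6|7\n8|9|10\n"))  # [(3, 4)]
```

Run against the real 510(k) text, this would be expected to list line 64826 with 23 fields, plus any other malformed lines that pandas never reached because it stopped at the first error.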

Labels: bug (Something isn't working)