As CSV files downloaded from banks are notoriously messy, one might need to pre-process them beyond overriding the `header` and `footer` members of `csvbase.CSVReader`. For example, there might be a variable number of header lines. The common solution is to read the file in as a string, do the preprocessing (for instance, skip all lines up to a keyword such as the first column header), then pass the updated string on to the usual processing.
But `csvbase.CSVReader.read()` expects a file name as its argument, not a file-like object or a content string. Therefore, one must do something like this:
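
    import tempfile

    # Base class inferred from the super().read() call below.
    class MyMessyBank(csvbase.CSVReader):
        def read(self, filepath):
            with open(filepath, 'r') as f:
                content = f.read()
            content = my_preprocessing(content)
            # Write the cleaned content to a temporary file, because
            # CSVReader.read() only accepts a file name.
            with tempfile.NamedTemporaryFile('w', suffix='.csv') as t:
                t.write(content)
                t.flush()  # make sure the data is on disk before re-reading
                for row in super().read(t.name):
                    yield row

It would be easier if one could just use `StringIO()` or similar to provide modified data to `CSVReader.read()`.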
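For illustration, the preprocessing step could be as small as the sketch below. The function name `skip_to_header` and the sample data are made up for this example and are not part of `csvbase`; the point is only that the result is a plain string that fits naturally into a `StringIO`.

```python
import io


def skip_to_header(content: str, keyword: str) -> str:
    """Drop every line before the first line that starts with `keyword`."""
    lines = content.splitlines(keepends=True)
    for index, line in enumerate(lines):
        if line.startswith(keyword):
            return "".join(lines[index:])
    raise ValueError(f"header keyword {keyword!r} not found")


# Made-up bank export with a preamble of unknown length.
raw = (
    "Bank of Example\n"
    "Exported 2024-01-31\n"
    "\n"
    "Date,Payee,Amount\n"
    "2024-01-02,Grocer,-12.50\n"
)
cleaned = skip_to_header(raw, "Date")

# This is the object one would like to hand to the reader directly.
buffer = io.StringIO(cleaned)
```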
There are multiple options that come to my mind:

- Allow `read(str, ...)` as well as `read(file, ...)`
- Introduce `read()` usage like `read(None, ..., content: <file>)`
- Add an `open(filename) -> file` function to `importers.Importer` that can be overwritten by subclasses
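
To make the third option a bit more concrete, here is a rough sketch of how an overridable `open()` hook might be used. `SketchReader` is a hypothetical stand-in written only for this example; it is not the real `csvbase.CSVReader` or `importers.Importer`, and the hook name and signature are just the ones proposed above.

```python
import csv
import io


class SketchReader:
    """Hypothetical stand-in for a CSV reader; NOT the real csvbase.CSVReader."""

    def open(self, filepath):
        # Hook: subclasses may return any text file-like object.
        return open(filepath, newline="")

    def read(self, filepath):
        with self.open(filepath) as f:
            yield from csv.DictReader(f)


class MyMessyBank(SketchReader):
    header_keyword = "Date"  # assumed name of the first column

    def open(self, filepath):
        with open(filepath) as f:
            lines = f.readlines()
        # Skip the variable-length preamble before the column header;
        # fall back to the whole file if the keyword is not found.
        start = next(
            (i for i, line in enumerate(lines)
             if line.startswith(self.header_keyword)),
            0,
        )
        return io.StringIO("".join(lines[start:]))
```

With a hook like this, the subclass would no longer need a temporary file: `read()` stays untouched and only the way the data is opened changes.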
johannesjh