Preprocessing a file for csvbase.Importer requires acrobatics with temporary files

CSV files downloaded from banks are notoriously messy, so one might need to pre-process them beyond overriding the `header` and `footer` members of `csvbase.CSVReader`. For example, there might be a variable number of header lines. The common solution is to read the file into a string, do the preprocessing (e.g. skip all lines up to a keyword such as the first column header), then pass the updated string on to the usual processing.
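
To make the preprocessing step concrete, here is a minimal sketch that drops everything before the first line starting with a known column header. The name `my_preprocessing` matches the placeholder used in the snippet in this issue; the `keyword` parameter and its default value `"Date"` are assumptions for illustration only and are not part of any library.

```python
def my_preprocessing(content: str, keyword: str = "Date") -> str:
    """Return content starting at the first line that begins with keyword.

    keyword is a hypothetical column header; if no line starts with it,
    the content is returned unchanged.
    """
    lines = content.splitlines(keepends=True)
    for index, line in enumerate(lines):
        if line.startswith(keyword):
            # Keep the header line and everything after it.
            return "".join(lines[index:])
    return content
```

Matching on the start of the line rather than on a fixed line count is what copes with a variable number of leading junk lines.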

But `csvbase.CSVReader.read()` expects a file name as its argument, not a file-like object or a content string. Therefore, one must do something like this:

```python
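import tempfile

from beangulp.importers import csvbase  # assumed import path for the csvbase module


class MyMessyBank(csvbase.Importer):  # base class assumed from the super().read() call

  def read(self, filepath):
    with open(filepath, 'r') as f:
      content = f.read()
    content = my_preprocessing(content)  # user-defined cleanup of the raw text
    # Open in text mode so a str can be written; the file is deleted on exit.
    with tempfile.NamedTemporaryFile('w', suffix='.csv') as t:
      t.write(content)
      t.flush()  # make sure the data is on disk before it is reopened by name
      yield from super().read(t.name)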
```

It would be easier if one could just use `StringIO()` or similar to provide the modified data to `CSVReader.read()`.

Several options come to mind:

* Allow `read(str, ...)` as well as `read(file, ...)`
* Introduce `read()` usage like `read(None, ... , content: <file>)`
* Add an `open(filename) -> file` method to `importers.Importer` that can be overridden by subclasses
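
To show what the last option could look like, here is a rough, self-contained sketch. The `Reader` class below is a hypothetical stand-in built on the standard `csv` module, not the actual `csvbase` code, and the `"Date"` header keyword is likewise an assumption. The point is only that once `read()` goes through an overridable `open()` hook, a subclass can hand back an `io.StringIO` and no temporary file is needed.

```python
import csv
import io


class Reader:
    """Hypothetical stand-in for a CSV reader; not the actual csvbase API."""

    def open(self, filepath):
        # Default hook: open the file on disk in text mode.
        return open(filepath, newline="")

    def read(self, filepath):
        # All parsing goes through the hook, so subclasses control the input.
        with self.open(filepath) as f:
            yield from csv.DictReader(f)


class MessyBankReader(Reader):

    def open(self, filepath):
        with open(filepath, newline="") as f:
            lines = f.read().splitlines(keepends=True)
        # Skip everything before the (assumed) "Date" header line.
        start = next(
            (i for i, line in enumerate(lines) if line.startswith("Date")), 0
        )
        return io.StringIO("".join(lines[start:]))
```

With this shape, `MessyBankReader().read(path)` yields one dict per data row, and the preprocessing stays entirely in memory.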



Preprocessing a file for csvbase.Importer requires acrobatics with temporary files #196
