Preprocessing a file for csvbase.Importer requires acrobatics with temporary files #196

@mlell

As CSV files downloaded from banks are notoriously messy, one might need to pre-process them beyond overriding the header and footer members of csvbase.CSVReader. For example, there might be a variable number of header lines. The common solution is to read the file into a string, do the preprocessing (e.g. skip all lines up to a keyword such as the first column header), then pass the updated string on to the usual processing.

But csvbase.CSVReader.read() expects a file name as its argument, not a file-like object or a content string. Therefore, one must do something like this:

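import tempfile

# Base class assumed to be csvbase.CSVReader, since super().read() needs one
class MyMessyBank(csvbase.CSVReader):

  def read(self, filepath):
    with open(filepath, 'r') as f:
      content = f.read()
    content = my_preprocessing(content)
    # Text mode, because content is a str (the default mode is binary)
    with tempfile.NamedTemporaryFile(mode='w') as t:
      t.write(content)
      t.flush()  # make the data visible to whoever reopens t.name
      for row in super().read(t.name):
        yield row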

It would be easier if one could just use StringIO() or similar to provide the modified data to CSVReader.read().
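To illustrate the idea, here is a minimal sketch that uses only the standard library. The names read_rows and my_preprocessing and the sample data are hypothetical and are not part of csvbase. It only shows that a reader accepting file-like objects could be fed a StringIO directly, with no temporary file.

```python
import csv
import io


def my_preprocessing(content, keyword="Date"):
    """Hypothetical helper: drop every line before the first one starting with keyword."""
    lines = content.splitlines(keepends=True)
    for i, line in enumerate(lines):
        if line.startswith(keyword):
            return "".join(lines[i:])
    return content  # keyword not found: leave the content untouched


def read_rows(source):
    """Hypothetical reader: accepts a path or an open text file-like object."""
    if hasattr(source, "read"):
        # Already a file-like object (e.g. io.StringIO): parse it directly
        yield from csv.DictReader(source)
    else:
        with open(source, newline="") as f:
            yield from csv.DictReader(f)


# Made-up bank export with a variable-length preamble before the header row
raw = "Bank of Examples\nAccount 1234\n\nDate,Amount\n2024-01-02,-5.00\n"
rows = list(read_rows(io.StringIO(my_preprocessing(raw))))
```

Whether CSVReader.read() could take the same shape depends on how it parses the file internally, which this sketch does not assume.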

Several options come to mind:

  • Allow read(str, ...) as well as read(file, ...)
  • Introduce read() usage like read(None, ... , content: <file>)
  • Add an open(filename) -> file function to importers.Importer that can be overridden by subclasses
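
The third option might look roughly like the sketch below. The Reader class is a hypothetical stand-in written for this example, not the actual csvbase or importers.Importer code, and header_keyword is an invented attribute. The point is only that read() obtains its file object through an overridable open() hook, so a subclass can return pre-processed content as a StringIO.

```python
import csv
import io


class Reader:
    """Hypothetical stand-in for the library's reader class."""

    def open(self, filename):
        # Default hook: hand back the file unchanged
        return open(filename, newline="")

    def read(self, filename):
        # read() never opens the file itself; it always goes through the hook
        with self.open(filename) as f:
            yield from csv.DictReader(f)


class MyMessyBank(Reader):
    header_keyword = "Date"  # hypothetical: first column header of this bank's export

    def open(self, filename):
        with open(filename, newline="") as f:
            lines = f.readlines()
        # Index of the first line starting with the keyword, or 0 if none is found
        start = next(
            (i for i, line in enumerate(lines) if line.startswith(self.header_keyword)),
            0,
        )
        # StringIO is a context manager too, so read() works unchanged
        return io.StringIO("".join(lines[start:]))
```

With this shape, a subclass like MyMessyBank overrides only open() and leaves read() untouched, so no temporary file is needed.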
