Skip to content

Initial code review #1

@wfondrie

Description

@wfondrie

Overall I think it looks great and very useful. Here are my comments for my initial code review of the repository:

  • The README needs some more details 😄

  • What is this src/py.typed? It seems to be empty.

  • Typically, I think folks use from pathlib import Path. It might be something you want to consider, but it doesn't really matter.

  • Why would you use this private function instead of just boto3.Session() directly?

    def _get_boto_session() -> boto3.Session:
    """Creates and returns an active boto3 session.
    Returns:
    boto3.Session: An active boto3 Session.
    """
    session = boto3.Session()
    return session

  • Any reason why you decided to have the bucket and key separate in you parameters, instead of say using the S3 URI? I guess a more general AWS question: Is the S3 URI always in the format s3://<bucket>/key? If so, you could always figure out the bucket and key from a single parameter.

  • I think you'll run into problems if the user has specified sep in **kwargs.

    return pd.read_csv(data, sep="\t", **kwargs)

  • This should mention it is inferred from the suffix by default:

    outputformat (Optional[str], optional): The target output format.
    Can be one of {parquet, txt, csv, tsv}.
    Defaults to None.

  • You've replicated a subset of the pandas saving a loading utilities here. However, I think that functionality is already in pandas using the s3 URI. From https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_csv.html:

    Any valid string path is acceptable. The string could be a URL. Valid URL schemes include http, ftp, s3, gs, and file. For file URLs, a host is expected. A local file could be: file://localhost/path/to/table.csv.

    Have you tried this?

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions