This repository was archived by the owner on Mar 9, 2018. It is now read-only.

Add possibility to check values to index validator #36

Open
markusbaden wants to merge 1 commit into c-bata:master from markusbaden:columns_exist

@markusbaden

Sometimes we want to validate that a DataFrame contains certain columns, without necessarily worrying about the content of those columns.

In my case, I am parsing a file as part of an ETL process and want to check the result. I expect the file to always have the same columns in the same positions. Say the columns I am expecting are ['a', 'b'], then

pd.DataFrame(
    [
        [1, 2], 
        [3, 4],
    ],
    columns=['a', 'b'],
)

would be valid, while both

pd.DataFrame(
    [
        [2, 1], 
        [4, 3],
    ],
    columns=['b', 'a'],
)

and

pd.DataFrame(
    [
        [1, 2], 
        [3, 4],
    ],
    columns=['x', 'y'],
)

would not be valid. This check is very strict (i.e. the order of the columns matters), and we might want to relax this in future iterations.
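To make the intended behaviour concrete, here is a minimal sketch of the check in plain pandas. It is not the code in this PR and does not use the validator's API. The helper name `has_expected_columns` and its `ordered` flag are made up for illustration, and the `ordered=False` branch is only one guess at what a relaxed check could look like.

```python
import pandas as pd


def has_expected_columns(df, expected, ordered=True):
    """Hypothetical helper: check column labels only, never the cell values."""
    if ordered:
        # Strict case: same labels in the same positions.
        return list(df.columns) == list(expected)
    # Relaxed case (assumption): same set of labels, any order.
    # Note that a set comparison ignores duplicate labels.
    return set(df.columns) == set(expected)


expected = ['a', 'b']

valid = pd.DataFrame([[1, 2], [3, 4]], columns=['a', 'b'])
reordered = pd.DataFrame([[2, 1], [4, 3]], columns=['b', 'a'])
renamed = pd.DataFrame([[1, 2], [3, 4]], columns=['x', 'y'])

# Strict check applied to the three example frames above.
strict_results = [
    has_expected_columns(df, expected) for df in (valid, reordered, renamed)
]

# Relaxed check on the frame whose columns are merely reordered.
relaxed_reordered = has_expected_columns(reordered, expected, ordered=False)
```

With the strict check only the first frame passes. The relaxed variant would also accept the reordered frame, which is the behaviour a later iteration might opt into.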
