Scorer __init__ silently fails with empty dataframe due to special encoding in csv file #4

@matsuobasho

Description

I saved the output of the inference step (running run_gpt_v1.5_gpt4_1106_cot.py) in the appropriate directory.
When I get to the scoring step, fread returns an empty list for the data object:

class Scorer:
    def __init__(self, files, data_list_of_dicts=None):
        if not len(files):
            print('No files for evaluation')
            import sys
            sys.exit()

        from efficiency.log import fread
        data_list = []
        for file in sorted(files):
            data = fread(file)
            data_list += data
            print(file, len(data))

fread is designed to return an empty list when read_csv raises an error. This is not good practice: it makes it very difficult to diagnose why the user ends up with an empty dataframe and, eventually, an error here, especially since fread lives in another library:

    def truth_pred_scorer(self, df):
        df.drop(['prompt', 'question_id'], axis=1, inplace=True)

        df = self.apply_score_func(df)

The error occurs because the specified columns for dropping aren't in the dataframe, since it is empty.
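One way to surface the problem earlier would be to fail as soon as a file yields no rows, instead of letting an empty list flow into the dataframe. The following is only a sketch: `load_records` and its `reader` parameter are hypothetical names, not part of this repository or of `efficiency`, and `reader` stands in for any function that, like fread, returns a list of dicts.

```python
def load_records(files, reader):
    """Concatenate rows from every file, failing loudly on an empty read.

    `reader` is a placeholder for any callable that takes a path and
    returns a list of dicts (an assumption made for this sketch).
    """
    records = []
    for path in sorted(files):
        rows = reader(path)
        if not rows:
            # An empty result most likely means the read failed silently.
            raise ValueError(
                f"{path}: reader returned no rows; "
                "check the file's encoding and contents"
            )
        records.extend(rows)
    return records
```

With a guard like this, the failure would name the offending file rather than showing up later as missing columns in `truth_pred_scorer`.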

I was able to load my file by specifying an explicit encoding at line 258 of efficiency/log.py:

data = pd.read_csv(path, encoding = 'cp1252').to_dict(orient="records")

But the fact that I have to modify another library highlights why this isn't ideal.
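For anyone hitting the same problem, a possible workaround that avoids patching the library is to read the file yourself with a list of candidate encodings. This is a minimal sketch under stated assumptions: `read_csv_records` is a hypothetical helper, it uses the standard-library `csv` module rather than `pd.read_csv` (so every value comes back as a string), and the default encoding list is my guess based on the cp1252 fix above.

```python
import csv


def read_csv_records(path, encodings=("utf-8", "cp1252")):
    """Return the CSV rows as a list of dicts, trying each encoding in order."""
    last_error = None
    for encoding in encodings:
        try:
            with open(path, newline="", encoding=encoding) as handle:
                return list(csv.DictReader(handle))
        except UnicodeDecodeError as error:
            # Remember the failure and fall through to the next encoding.
            last_error = error
    # Every candidate failed: raise instead of returning an empty list.
    raise ValueError(
        f"could not decode {path!r} with any of {encodings}"
    ) from last_error
```

Unlike returning an empty list, this raises when no encoding works, so the original decoding error stays visible in the traceback.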
