I have the output from the inference step of running run_gpt_v1.5_gpt4_1106_cot.py saved in the appropriate directory.
When I get to the scoring step, fread returns an empty list for the data object:
class Scorer:
    def __init__(self, files, data_list_of_dicts=None):
        if not len(files):
            print('No files for evaluation')
            import sys
            sys.exit()
        from efficiency.log import fread
        data_list = []
        for file in sorted(files):
            data = fread(file)
            data_list += data
            print(file, len(data))

The intended behavior of fread is to return an empty list when read_csv raises an error. This is not good practice: it makes it very difficult to diagnose why the user gets an empty dataframe and an eventual error here, especially since fread is defined in another library:
def truth_pred_scorer(self, df):
    df.drop(['prompt', 'question_id'], axis=1, inplace=True)
    df = self.apply_score_func(df)

The error occurs because the columns to be dropped are not in the dataframe, since it is empty.
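For illustration, a guard along these lines would surface the problem at load time instead of inside truth_pred_scorer. This is only a sketch: check_loaded and its required argument are hypothetical names, not part of this repo or of efficiency, and the column names are simply the two that truth_pred_scorer drops.

```python
def check_loaded(file, data, required=("prompt", "question_id")):
    """Hypothetical guard: fail loudly when a file yields no usable rows."""
    # An empty list is what fread hands back when the read fails,
    # so treat it as an error rather than passing it on to the scorer.
    if not data:
        raise ValueError(
            f"{file}: no rows were read; the file may be empty or unreadable"
        )
    # Check the columns the scorer will later try to drop.
    missing = [col for col in required if col not in data[0]]
    if missing:
        raise ValueError(f"{file}: missing expected columns {missing}")
    return data
```

In the loop above this could be called as data = check_loaded(file, fread(file)), so the message names the file that failed to load.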
I'm able to import my file by specifying an explicit encoding at line 258 of efficiency/log.py:
data = pd.read_csv(path, encoding='cp1252').to_dict(orient="records")

But the fact that I have to modify another library highlights why this isn't ideal.
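One possible alternative, sketched here under assumptions, is a reader that tries a short list of encodings and raises if none of them works, rather than returning an empty list. read_records and its encodings argument are hypothetical and not the actual fread implementation; only pd.read_csv and to_dict are taken from the line above.

```python
import pandas as pd


def read_records(path, encodings=("utf-8", "cp1252")):
    """Hypothetical reader: try each encoding in turn, raise if all fail."""
    failures = []
    for enc in encodings:
        try:
            return pd.read_csv(path, encoding=enc).to_dict(orient="records")
        except UnicodeDecodeError as exc:
            # Remember why this encoding failed so the final error is useful.
            failures.append(f"{enc}: {exc}")
    raise ValueError(f"Could not decode {path}; tried " + "; ".join(failures))
```

With something like this, a file saved as cp1252 would still load, and a file that cannot be decoded at all would stop with a message naming the path and the encodings that were attempted.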