Clunky formatting when reading picks.csv with pandas or csv #9

@lennijusten

The picks.csv output from PhaseNet is awkwardly formatted: the user has to perform several string manipulations to parse the itp, tp_prob, its, and ts_prob columns.

I will show an example of reading the csv with pandas, although reading it with the csv package runs into the same formatting issues. I will also share the function I wrote to correctly format the entries.

Pandas

import pandas as pd
df = pd.read_csv('output/picks.csv')

The result is a dataframe containing strings in the itp, tp_prob, its, ts_prob columns.

print(df['itp'][0])
>>>  '[   1 6620 8114]'

print(df['ts_prob'][0])
>>>  '[ 0.11291095  0.31720835  0.06021817]'

The values are not uniformly separated either, so splitting on a single space with str.split(' ') leaves empty strings in the result (str.split() with no argument does collapse runs of whitespace, but the brackets still have to be stripped and each token cast first). Ideally, the csv would contain a uniform, comma-separated list of values. Another solution would be to also save a pickle file to the output directory that contains the lists in object form.
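For reference, str.split() called with no argument splits on any run of whitespace, so the uneven spacing by itself can be handled once the brackets are stripped. The following is a minimal sketch of that idea, not part of PhaseNet; the helper name parse_bracketed is hypothetical, and the sample strings are the two values printed above.

```python
def parse_bracketed(cell, cast=float):
    """Parse a string such as '[   1 6620 8114]' into a list of numbers.

    Hypothetical helper: `cell` is one raw entry from picks.csv and
    `cast` is the type applied to every token (int or float).
    """
    # Empty csv cells are read by pandas as NaN (a float), not a string.
    if not isinstance(cell, str):
        return []
    # Strip outer whitespace and brackets, then split on runs of whitespace.
    return [cast(tok) for tok in cell.strip().strip("[]").split()]


print(parse_bracketed('[   1 6620 8114]', int))
print(parse_bracketed('[ 0.11291095  0.31720835  0.06021817]'))
```

With a helper like this, a column could be converted with something like df['itp'].apply(parse_bracketed, cast=int), assuming the column holds strings of the form shown above.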

To fix the formatting with the current picks.csv, I made the following function:

import shlex
import pandas as pd

df = pd.read_csv('output/picks.csv')

def pickConverter(df):
    # Sample-index columns: parse each '[ ... ]' string into a list of ints.
    for col in ['itp', 'its']:
        pick_entry_list = []
        for x in range(len(df)):
            try:
                pick_entry_list.append(list(map(int, shlex.split(df[col][x].strip('[]')))))
            except AttributeError:
                # Empty cells are read as NaN (a float), which has no .strip().
                pick_entry_list.append([])
        df[col] = pick_entry_list

    # Probability columns: same parsing, but the values are floats.
    for col in ['tp_prob', 'ts_prob']:
        prob_entry_list = []
        for x in range(len(df)):
            try:
                prob_entry_list.append(list(map(float, shlex.split(df[col][x].strip('[]')))))
            except AttributeError:
                prob_entry_list.append([])
        df[col] = prob_entry_list
    return df
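An alternative that avoids the post-processing loop is to do the parsing while the file is read, using the converters argument of pandas.read_csv. This is only a sketch under assumptions: the in-memory sample below stands in for output/picks.csv, the fname column and its values are made up for illustration, and to_ints / to_floats are hypothetical helper names.

```python
import io

import pandas as pd


def to_ints(cell):
    """Convert a cell like '[   1 6620 8114]' into a list of ints."""
    if not isinstance(cell, str):
        return []
    return [int(tok) for tok in cell.strip().strip("[]").split()]


def to_floats(cell):
    """Convert a cell like '[ 0.5  0.25]' into a list of floats."""
    if not isinstance(cell, str):
        return []
    return [float(tok) for tok in cell.strip().strip("[]").split()]


# Stand-in for output/picks.csv; the 'fname' column and all values are
# illustrative assumptions, not real PhaseNet output.
sample = io.StringIO(
    "fname,itp,tp_prob,its,ts_prob\n"
    "a.npz,[   1 6620 8114],[ 0.5  0.25  0.125],[],[]\n"
    "b.npz,,,[ 42],[ 0.75]\n"
)

# Each converter is applied to the raw cell text of its column.
picks = pd.read_csv(
    sample,
    converters={
        "itp": to_ints,
        "its": to_ints,
        "tp_prob": to_floats,
        "ts_prob": to_floats,
    },
)

print(picks["itp"][0])
print(picks["ts_prob"][1])
```

For the real file, the io.StringIO object would be replaced by the path 'output/picks.csv'. Empty cells and empty brackets both end up as empty lists, matching the behaviour of pickConverter above.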
