All 3 examples in tutorial do not work, for different reasons #8

@brian-arnold

Hello! As a first test of whether your software works, I went through each example in your tutorial, and each failed for a different reason; one of the errors is the same as one previously reported in another issue. These could be due to problems specific to the data processing in the tutorial examples or to bugs in the software, but I didn't probe further. Have you run these examples on your end? I copied and pasted your commands from the tutorial and double-checked that everything was right, but I suppose I could have missed something.

Errors:

In example 1, during the train/test split, I get the same error that was previously reported:

File "../../scripts/parsers/fasta2explainn.py", line 147, in _to_ExplaiNN
df2 = pd.DataFrame(data, columns=list(range(len(data[0]))))
IndexError: list index out of range
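
For reference, the expression on that line raises this exact error whenever `data` is empty. The sketch below only mirrors the indexing from the traceback; it is not the actual `fasta2explainn.py` code, `column_labels` is a hypothetical name, and whether `data` was really empty in the tutorial run is an assumption that has not been checked.

```python
def column_labels(data):
    # Mirrors the `list(range(len(data[0])))` expression from the traceback.
    # `data[0]` fails with IndexError when `data` has no rows at all.
    return list(range(len(data[0])))


# One row with two fields: indexing works and yields one label per field.
print(column_labels([["seq1", "ACGT"]]))

# No rows (assumed scenario: nothing was parsed from the input).
try:
    column_labels([])
except IndexError as err:
    print(f"IndexError: {err}")
```

If this is what happens, checking whether the input file produced any records before this step would narrow it down.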

In example 2, during the step to subsample 100k sequences:

File "/Users/bjarnold/Princeton_EEB/Kocher/test/ExplaiNN/scripts/utils/subsample-seqs-by-gc.py", line 95, in _subsample_seqs_by_GC
norm_factor = subsample / sum([len(v) for v in gc_regroups.values()])
ZeroDivisionError: division by zero
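
The division on that line fails this way when `gc_regroups` holds no sequences at all, either because the dict is empty or because every group is an empty list. A minimal sketch of that condition, reusing only the expression shown in the traceback; the function wrapper and the sample group names are hypothetical, and it is an unverified assumption that this is what happened in the tutorial run.

```python
def norm_factor(subsample, gc_regroups):
    # Same expression as the line in the traceback: the denominator is the
    # total number of sequences across all GC groups.
    return subsample / sum([len(v) for v in gc_regroups.values()])


# With sequences present, the factor is subsample / total count.
print(norm_factor(100, {"gc_40": ["s1", "s2"], "gc_50": ["s3", "s4"]}))

# With no sequences in any group (assumed scenario), the denominator is 0.
for empty in ({}, {"gc_40": [], "gc_50": []}):
    try:
        norm_factor(100, empty)
    except ZeroDivisionError as err:
        print(f"ZeroDivisionError: {err}")
```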

In example 3, during model training:

File "/Users/bjarnold/miniforge3/envs/explainn/lib/python3.10/site-packages/pandas/io/parsers/c_parser_wrapper.py", line 234, in read
chunks = self._reader.read_low_memory(nrows)
File "parsers.pyx", line 843, in pandas._libs.parsers.TextReader.read_low_memory
File "parsers.pyx", line 904, in pandas._libs.parsers.TextReader._read_rows
File "parsers.pyx", line 879, in pandas._libs.parsers.TextReader._tokenize_rows
File "parsers.pyx", line 890, in pandas._libs.parsers.TextReader._check_tokenize_status
File "parsers.pyx", line 2058, in pandas._libs.parsers.raise_parser_error
pandas.errors.ParserError: Error tokenizing data. C error: Expected 3 fields in line 3, saw 4
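
The message says line 3 of the file being read has 4 fields where 3 were expected. A quick way to locate such rows without pandas is sketched below, using only the standard library. The helper name `find_ragged_lines`, the tab delimiter, and the sample text are all assumptions for illustration; the real file and its separator are not shown in the traceback, and the sketch simply treats the first line as setting the expected field count.

```python
import csv
import io


def find_ragged_lines(text, sep="\t"):
    """Return (line_number, field_count) for rows whose field count
    differs from that of the first row."""
    rows = list(csv.reader(io.StringIO(text), delimiter=sep))
    if not rows:
        return []
    expected = len(rows[0])
    return [(i, len(row)) for i, row in enumerate(rows, start=1)
            if len(row) != expected]


# Hypothetical sample: the third line has one field too many.
sample = "a\tb\tc\nd\te\tf\ng\th\ti\tj\n"
print(find_ragged_lines(sample))  # → [(3, 4)]
```

To run it on an actual file, pass the file's contents as `text` and set `sep` to whatever delimiter the file uses.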

On top of these errors, there are several discrepancies between the tutorial PDF you uploaded and the slides you make available on Google Docs, including slides that are missing entirely (e.g. for example 3) and typos in commands (where the input and output file for a script are the same).
