All 3 examples in tutorial do not work, for different reasons

Hello! As a first test of whether the software works, I ran each example in the tutorial, and each one failed for a different reason. One of the failures is the same error previously reported in another [issue](https://github.com/wassermanlab/ExplaiNN/issues?q=is%3Aopen+is%3Aissue). The failures could stem from the data processing specific to the tutorial examples or from bugs in the software itself; I did not investigate further. Have you run these examples on your end? I copied and pasted the commands from the tutorial and double-checked them, but I may have missed something.

Errors:

In example 1, during the train/test split, I get the same error that was previously reported:

  File "../../scripts/parsers/fasta2explainn.py", line 147, in _to_ExplaiNN
    df2 = pd.DataFrame(data, columns=list(range(len(data[0]))))
IndexError: list index out of range
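
This traceback is consistent with `data` being an empty list when `data[0]` is evaluated, for example if no records were parsed from the input. That is only a guess from the failing line, not something confirmed against the script. A minimal sketch of the failure and a possible guard, where `column_labels` is a hypothetical helper and not part of ExplaiNN:

```python
def column_labels(data):
    """Return 0..n-1 for the n fields of the first row in `data`."""
    if not data:
        # Hypothetical guard; the line in the traceback indexes data[0] directly.
        raise ValueError("no records were parsed; check the input file")
    return list(range(len(data[0])))


# The unguarded expression from the traceback fails on an empty list.
try:
    list(range(len([][0])))
except IndexError as err:
    reproduced = str(err)

print(reproduced)
```

A guard like this would not fix the underlying cause, but it would make the message point at the input rather than at the DataFrame construction.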


In example 2, during the step to subsample 100k sequences:

  File "/Users/bjarnold/Princeton_EEB/Kocher/test/ExplaiNN/scripts/utils/subsample-seqs-by-gc.py", line 95, in _subsample_seqs_by_GC
    norm_factor = subsample / sum([len(v) for v in gc_regroups.values()])
ZeroDivisionError: division by zero
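
The denominator on the failing line is the total number of items across all values in `gc_regroups`, so the division fails whenever that mapping is empty or holds only empty lists. Why it ends up empty in the tutorial run is not clear from the traceback. A minimal sketch, assuming `gc_regroups` maps GC bins to lists of sequences; `compute_norm_factor` is a hypothetical name, not a function in the script:

```python
def compute_norm_factor(subsample, gc_regroups):
    """Scale factor for subsampling: requested count / available count."""
    total = sum(len(v) for v in gc_regroups.values())
    if total == 0:
        # Hypothetical guard; the original line divides without checking.
        raise ValueError("no sequences were grouped by GC content")
    return subsample / total


# An empty grouping reproduces the reported ZeroDivisionError when unguarded.
try:
    100000 / sum(len(v) for v in {}.values())
except ZeroDivisionError as err:
    reproduced = str(err)

print(reproduced)
```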


In example 3, during model training:

  File "/Users/bjarnold/miniforge3/envs/explainn/lib/python3.10/site-packages/pandas/io/parsers/c_parser_wrapper.py", line 234, in read
    chunks = self._reader.read_low_memory(nrows)
  File "parsers.pyx", line 843, in pandas._libs.parsers.TextReader.read_low_memory
  File "parsers.pyx", line 904, in pandas._libs.parsers.TextReader._read_rows
  File "parsers.pyx", line 879, in pandas._libs.parsers.TextReader._tokenize_rows
  File "parsers.pyx", line 890, in pandas._libs.parsers.TextReader._check_tokenize_status
  File "parsers.pyx", line 2058, in pandas._libs.parsers.raise_parser_error
pandas.errors.ParserError: Error tokenizing data. C error: Expected 3 fields in line 3, saw 4
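
The pandas message means that one row of the input has more delimiter-separated fields than the parser inferred from the earlier rows. A quick way to locate such rows without pandas is sketched below, using only the standard library. The tab delimiter is an assumption, and `find_ragged_lines` is a hypothetical helper; line numbers may also differ slightly from those pandas reports, depending on header handling.

```python
import csv
import io


def find_ragged_lines(handle, delimiter="\t"):
    """Return (expected_fields, [(line_no, n_fields), ...]) for rows whose
    field count differs from that of the first row."""
    reader = csv.reader(handle, delimiter=delimiter)
    expected = None
    ragged = []
    for line_no, row in enumerate(reader, start=1):
        if expected is None:
            expected = len(row)
        elif len(row) != expected:
            ragged.append((line_no, len(row)))
    return expected, ragged


# Toy input in which the third line has one extra field.
sample = io.StringIO("a\tb\tc\nd\te\tf\ng\th\ti\tj\n")
expected, ragged = find_ragged_lines(sample)
print(expected, ragged)
```

For a real file, the same function can be called with `open(path, newline="")` in place of the `io.StringIO` object.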



In addition to these errors, there are several discrepancies between the tutorial PDF you uploaded and the slides available on Google Docs: some slides are missing entirely (e.g., those for example 3), and some commands contain typos (e.g., the input and output file for a script are the same).

