fixup openfmri datasets metadata #17

@yarikoptic

Currently we hit the following gotchas while parsing metadata from openfmri datasets with just the bids parser (before enabling any custom ones). Some might have been fixed upstream already.

  • 9
[INFO   ] Aggregate metadata for dataset /mnt/btrfs/datasets-meta6-1-redo1/datalad/crawl/openfmri/ds000009
[WARNING] Failed to load participants info due to: 'ascii' codec can't encode character u'\u2019' in position 72: ordinal not in range(128) [csv.py:next:108]. Skipping the rest of file
  • 30 - same as 9. FOI: took 7:10.73 (7 min) to aggregate! Compressed sizes: 223kB for ds- and 128kB for cn-
[INFO   ] Aggregate metadata for dataset /mnt/btrfs/datasets-meta6-1-redo1/datalad/crawl/openfmri/ds000030
[ERROR  ] Failed to get dataset metadata (bids): 'ascii' codec can't decode byte 0xe2 in position 3325: ordinal not in range(128) [ascii.py:decode:26]
[ERROR  ] Metadata extraction failed (see previous error message, set datalad.runtime.raiseonerror=yes to fail immediately) [aggregate_metadata(/mnt/btrfs/datasets-meta6-1-redo1/datalad/crawl/openfmri/ds000030)]
aggregate_metadata(error): /mnt/btrfs/datasets-meta6-1-redo1/datalad/crawl/openfmri/ds000030 [Metadata extraction failed (see previous error message, set datalad.runtime.raiseonerror=yes to fail immediately)]
  • 53
[INFO   ] Aggregate metadata for dataset /mnt/btrfs/datasets-meta6-1-redo1/datalad/crawl/openfmri/ds000053
[ERROR  ] Failed to get dataset metadata (bids): 'ascii' codec can't decode byte 0xc2 in position 1188: ordinal not in range(128) [ascii.py:decode:26]
[ERROR  ] Metadata extraction failed (see previous error message, set datalad.runtime.raiseonerror=yes to fail immediately) [aggregate_metadata(/mnt/btrfs/datasets-meta6-1-redo1/datalad/crawl/openfmri/ds000053)]
aggregate_metadata(error): /mnt/btrfs/datasets-meta6-1-redo1/datalad/crawl/openfmri/ds000053 [Metadata extraction failed (see previous error message, set datalad.runtime.raiseonerror=yes to fail immediately)]
  • 117
[INFO   ] Aggregate metadata for dataset /mnt/btrfs/datasets-meta6-1-redo1/datalad/crawl/openfmri/ds000117
[ERROR  ] Failed to get dataset metadata (bids): 'ascii' codec can't decode byte 0xe2 in position 1585: ordinal not in range(128) [ascii.py:decode:26]
[ERROR  ] Metadata extraction failed (see previous error message, set datalad.runtime.raiseonerror=yes to fail immediately) [aggregate_metadata(/mnt/btrfs/datasets-meta6-1-redo1/datalad/crawl/openfmri/ds000117)]
aggregate_metadata(error): /mnt/btrfs/datasets-meta6-1-redo1/datalad/crawl/openfmri/ds000117 [Metadata extraction failed (see previous error message, set datalad.runtime.raiseonerror=yes to fail immediately)]
  • 140 - only because README is large, since it includes output of bids-validator. Doing nothing about that for now
[INFO   ] Aggregate metadata for dataset /mnt/btrfs/datasets-meta6-1-redo1/datalad/crawl/openfmri/ds000140
[INFO   ] Removed metadata field(s) due to blacklisting and max size settings: set(['description'])
  • 164
[INFO   ] Aggregate metadata for dataset /mnt/btrfs/datasets-meta6-1-redo1/datalad/crawl/openfmri/ds000164
[ERROR  ] Failed to get dataset metadata (bids): 'ascii' codec can't decode byte 0xc3 in position 1244: ordinal not in range(128) [ascii.py:decode:26]
[ERROR  ] Metadata extraction failed (see previous error message, set datalad.runtime.raiseonerror=yes to fail immediately) [aggregate_metadata(/mnt/btrfs/datasets-meta6-1-redo1/datalad/crawl/openfmri/ds000164)]
aggregate_metadata(error): /mnt/btrfs/datasets-meta6-1-redo1/datalad/crawl/openfmri/ds000164 [Metadata extraction failed (see previous error message, set datalad.runtime.raiseonerror=yes to fail immediately)]
  • 201
[INFO   ] Aggregate metadata for dataset /mnt/btrfs/datasets-meta6-1-redo1/datalad/crawl/openfmri/ds000201
[WARNING] Failed to load participants info due to: "delimiter" must be string, not unicode [csv.py:__init__:79]. Skipping the rest of file
  • 214
[INFO   ] Aggregate metadata for dataset /mnt/btrfs/datasets-meta6-1-redo1/datalad/crawl/openfmri/ds000214
[ERROR  ] Failed to get dataset metadata (bids): 'ascii' codec can't decode byte 0xe2 in position 1412: ordinal not in range(128) [ascii.py:decode:26]
[ERROR  ] Metadata extraction failed (see previous error message, set datalad.runtime.raiseonerror=yes to fail immediately) [aggregate_metadata(/mnt/btrfs/datasets-meta6-1-redo1/datalad/crawl/openfmri/ds000214)]
aggregate_metadata(error): /mnt/btrfs/datasets-meta6-1-redo1/datalad/crawl/openfmri/ds000214 [Metadata extraction failed (see previous error message, set datalad.runtime.raiseonerror=yes to fail immediately)]
  • 216
[INFO   ] Aggregate metadata for dataset /mnt/btrfs/datasets-meta6-1-redo1/datalad/crawl/openfmri/ds000216
[ERROR  ] Failed to get dataset metadata (bids): 'ascii' codec can't decode byte 0xc3 in position 1232: ordinal not in range(128) [ascii.py:decode:26]
[ERROR  ] Metadata extraction failed (see previous error message, set datalad.runtime.raiseonerror=yes to fail immediately) [aggregate_metadata(/mnt/btrfs/datasets-meta6-1-redo1/datalad/crawl/openfmri/ds000216)]
aggregate_metadata(error): /mnt/btrfs/datasets-meta6-1-redo1/datalad/crawl/openfmri/ds000216 [Metadata extraction failed (see previous error message, set datalad.runtime.raiseonerror=yes to fail immediately)]
  • 218 - it is in a somewhat screwy state... For now manually unannexed/git added top-level text files. Fixed participants.tsv header to not have a trailing tab
[INFO   ] Aggregate metadata for dataset /mnt/btrfs/datasets-meta6-1-redo1/datalad/crawl/openfmri/ds000218
[WARNING] Could not determine file-format, assuming TSV
  • 221 - a unicode whitespace was used to separate fields in Authors. Sent a patch upstream as well
[INFO   ] Aggregate metadata for dataset /mnt/btrfs/datasets-meta6-1-redo1/datalad/crawl/openfmri/ds000221
[ERROR  ] Failed to get dataset metadata (bids): No JSON object could be decoded [decoder.py:raw_decode:382]
[ERROR  ] Metadata extraction failed (see previous error message, set datalad.runtime.raiseonerror=yes to fail immediately) [aggregate_metadata(/mnt/btrfs/datasets-meta6-1-redo1/datalad/crawl/openfmri/ds000221)]
aggregate_metadata(error): /mnt/btrfs/datasets-meta6-1-redo1/datalad/crawl/openfmri/ds000221 [Metadata extraction failed (see previous error message, set datalad.runtime.raiseonerror=yes to fail immediately)]
  • 223 - just a single column in participants.tsv -- useless
[INFO   ] Aggregate metadata for dataset /mnt/btrfs/datasets-meta6-1-redo1/datalad/crawl/openfmri/ds000223
[WARNING] Could not determine file-format, assuming TSV
  • 224
[INFO   ] Aggregate metadata for dataset /mnt/btrfs/datasets-meta6-1-redo1/datalad/crawl/openfmri/ds000224
[ERROR  ] Failed to get dataset metadata (bids): 'ascii' codec can't decode byte 0xe2 in position 1805: ordinal not in range(128) [ascii.py:decode:26]
[ERROR  ] Metadata extraction failed (see previous error message, set datalad.runtime.raiseonerror=yes to fail immediately) [aggregate_metadata(/mnt/btrfs/datasets-meta6-1-redo1/datalad/crawl/openfmri/ds000224)]
aggregate_metadata(error): /mnt/btrfs/datasets-meta6-1-redo1/datalad/crawl/openfmri/ds000224 [Metadata extraction failed (see previous error message, set datalad.runtime.raiseonerror=yes to fail immediately)]
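
Most of the [ERROR  ] entries above are the 'ascii' codec choking on bytes such as 0xe2, 0xc2 and 0xc3, which are typical lead bytes of multi-byte UTF-8 sequences. A minimal sketch of the failure and of one possible workaround follows. It assumes the affected files are UTF-8 encoded (not verified for every dataset), and `read_text_utf8` is a hypothetical helper, not an existing datalad function.

```python
import io
import os
import tempfile


def read_text_utf8(path):
    """Read a text file, decoding explicitly as UTF-8.

    UTF-8 is an assumption about the datasets; the point is to avoid
    relying on an implicit (ascii) default codec.
    """
    with io.open(path, encoding="utf-8") as f:
        return f.read()


# A right single quotation mark (U+2019, as in the ds000009 warning)
# is encoded in UTF-8 as the three bytes e2 80 99.
raw = u"subject\u2019s data".encode("utf-8")

try:
    raw.decode("ascii")
    ascii_ok = True
except UnicodeDecodeError:
    # same class of error as "'ascii' codec can't decode byte 0xe2"
    ascii_ok = False

# Round-trip through a temporary file to exercise the helper
fd, tmp_path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "wb") as f:
    f.write(raw)
text = read_text_utf8(tmp_path)
os.remove(tmp_path)
```

Whether the fix belongs in the bids parser or in the datasets themselves is a separate question; the sketch only shows that explicit decoding gets past this class of error.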

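For the participants.tsv gotchas (the trailing tab in the header of 218, and the csv errors in 9 and 201), here is a rough Python 3 sketch of a more tolerant reader. `load_participants` is a hypothetical name, the sample content is made up for illustration, and the sketch assumes tab-delimited text that has already been decoded.

```python
import csv
import io


def load_participants(text):
    """Parse participants.tsv content (already decoded text) into a
    list of dicts, tolerating a trailing tab in the header line."""
    lines = text.splitlines()
    if not lines:
        return []
    # A trailing tab would otherwise yield an extra, empty column name
    lines[0] = lines[0].rstrip("\t")
    reader = csv.DictReader(io.StringIO("\n".join(lines)), delimiter="\t")
    return [dict(row) for row in reader]


# Illustrative content only; the header ends with a stray tab
sample = "participant_id\tage\t\nsub-01\t25\nsub-02\t31\n"
rows = load_participants(sample)
```

Passing the delimiter explicitly also sidesteps format sniffing, which is what falls back to "assuming TSV" in the warnings for 218 and 223.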