IndexError: list index out of range error following "Build QA engine" steps and submitting example query #29

@ltfschoen

I'm using Elasticsearch 7.11.1 and Python 3.7.13.

In the "Build QA engine" section, when I answer the query prompt as follows:

Enter your query here: what does covid-19 cause    

It outputs an error:

WARNING:allennlp.data.fields.sequence_label_field:Your label namespace was 'pos'. We recommend you use a namespace ending with 'labels' or 'tags', so we don't add UNK and PAD tokens by default to your vocabulary.
  See documentation for `non_padded_namespaces` parameter in Vocabulary.                                    
INFO:elasticsearch:GET http://localhost:9200/ [status:200 request:0.520s]                                   
INFO:elasticsearch:POST http://localhost:9200/elastic_index/_search [status:200 request:0.353s]             
The number of datapacks(including query) is 1         
Traceback (most recent call last):                    
  File "./examples/pipeline/inference/search_cord19.py", line 97, in <module>                               
    data_pack = next(nlp.process_dataset()).get_pack_at(1)                                                  
  File "/home/ubuntu/.pyenv/versions/3.7.13/lib/python3.7/site-packages/forte/data/multi_pack.py", line 491, in get_pack_at
    return self.packs[index]                          
IndexError: list index out of range                   
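
The traceback is consistent with the log line just before it: only one pack (the query) exists, so asking for the pack at index 1 fails. A minimal sketch of a guard that turns this into a more descriptive error, using a plain list as a stand-in for the multi pack's `packs` attribute. `pack_at_or_explain` is a hypothetical helper for illustration, not part of forte:

```python
from typing import List, TypeVar

T = TypeVar("T")


def pack_at_or_explain(packs: List[T], index: int) -> T:
    """Return packs[index], or raise an IndexError that hints at the cause.

    Hypothetical helper: `packs` is any list standing in for the packs
    held by the multi pack (query first, retrieved documents after it).
    """
    if not 0 <= index < len(packs):
        raise IndexError(
            f"requested pack {index} but only {len(packs)} pack(s) exist; "
            "the search may have returned no documents"
        )
    return packs[index]
```

With a single-element list (query only), asking for index 1 raises the descriptive error instead of the bare "list index out of range".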

It seems the datasets aren't being read at all, even though I tried to index the sample datasets provided in the previous step with:

python examples/pipeline/indexer/cordindexer.py --data-dir ./data/document_parses/sample_pdf_json 

This printed the following almost immediately, so it doesn't seem to have indexed any data:

WARNING:root:Re-declared a new class named [ConstituentNode], which is probably used in import.                                                                                                                         
INFO:elasticsearch:GET http://localhost:9200/ [status:200 request:0.008s]                                                                                                                                               
/home/ubuntu/.pyenv/versions/3.7.13/lib/python3.7/site-packages/elasticsearch/connection/base.py:200: ElasticsearchWarning: [types removal] Specifying types in bulk requests is deprecated.                            
  warnings.warn(message, category=ElasticsearchWarning)                                                                                                                                                                 
INFO:elasticsearch:POST http://localhost:9200/_bulk?refresh=true [status:200 request:0.338s]
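
One way to confirm whether anything was indexed is to ask Elasticsearch for the document count through its `_count` endpoint. A rough sketch using only the standard library; the host `http://localhost:9200` and the index name `elastic_index` are taken from the log lines above, and it is an assumption that the indexer writes to that same index. The function names are hypothetical:

```python
import json
from urllib.request import urlopen


def count_url(host: str, index: str) -> str:
    """Build the URL of the Elasticsearch _count endpoint for an index."""
    return f"{host.rstrip('/')}/{index}/_count"


def parse_count(body: str) -> int:
    """Extract the "count" field from a _count response body."""
    return int(json.loads(body)["count"])


def indexed_doc_count(
    host: str = "http://localhost:9200",  # assumed from the logs
    index: str = "elastic_index",         # assumed from the search request
) -> int:
    """Query a running Elasticsearch node for the number of documents."""
    with urlopen(count_url(host, index)) as response:
        return parse_count(response.read().decode("utf-8"))
```

Calling `indexed_doc_count()` against the running node and getting 0 would suggest the bulk request did not store any documents in that index.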

That directory contains three dataset files:

  • 55736408816d3f956d830854659f24109444a36c.json
  • aadc3e716b6cb0e898953dff056124378b31483c.json
  • ffff73d17bc392ee68f3f16ef37d25579cb99322.json

I also noticed that the Indexer's config.yml lists the fields doc_id and content (https://github.com/petuum/composing_information_system/blob/main/examples/pipeline/indexer/config.yml#L3). However, the dataset files above don't contain those fields at all; most of the content is in the fields title, text, and section. If I update config.yml to the following, I get the same outcome:

create_index:
  batch_size: 10000
  fields:
    # - doc_id
    # - content
    - title
    - text
    - section
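
To check which of these field names actually occur as keys anywhere in a sample file, something like the following could be used. It is only a sketch: `collect_keys`, `missing_fields`, and `missing_fields_in_file` are hypothetical helpers, and it is an assumption that the configured fields are meant to match keys in the input JSON rather than fields of the Elasticsearch index.

```python
import json
from typing import Any, Iterable, List, Set


def collect_keys(node: Any) -> Set[str]:
    """Recursively gather every dictionary key found in a parsed JSON value."""
    keys: Set[str] = set()
    if isinstance(node, dict):
        for key, value in node.items():
            keys.add(key)
            keys |= collect_keys(value)
    elif isinstance(node, list):
        for item in node:
            keys |= collect_keys(item)
    return keys


def missing_fields(document: Any, fields: Iterable[str]) -> List[str]:
    """Return the configured fields that never appear as a key in the document."""
    present = collect_keys(document)
    return [field for field in fields if field not in present]


def missing_fields_in_file(path: str, fields: Iterable[str]) -> List[str]:
    """Load one JSON file and report which configured fields it lacks."""
    with open(path, encoding="utf-8") as handle:
        return missing_fields(json.load(handle), fields)
```

Running `missing_fields_in_file` on one of the three sample files with `["doc_id", "content", "title", "text", "section"]` lists the fields that do not appear at any nesting level.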
