Hello,
I think the following questions are lower priority compared to the existing PRs and issue #4, so I am intentionally submitting them as a new issue.
-Simpler first-
Is allennlp actually required?
The README lists allennlp as a requirement, but I couldn't find any code in this repository that actually uses allennlp.
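For what it's worth, this is roughly how I checked (a simple grep over the sources; command is my own, run from the repo root):

```shell
# List every Python file that imports allennlp; if nothing is printed,
# the dependency can probably be dropped from the README.
grep -rn --include='*.py' -E '^(import|from) allennlp' . \
  || echo "no allennlp imports found"
```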
Are four NVIDIA Tesla P100s enough?
I used four NVIDIA Tesla V100 SXM2 GPUs (16 GB of memory each), but hit CUDA out of memory with the configuration provided in the README (train_batch_size=32) on both the SQuAD and TriviaQA datasets.
07/08/2019 20:40:20 - INFO - __main__ - output_dir: out/squad_doc/02
07/08/2019 20:40:23 - INFO - __main__ - torch_version: 1.1.0 device: cuda n_gpu: 4, distributed training: False, 16-bits training: False
07/08/2019 20:40:23 - INFO - __main__ - ***** Preparing model *****
07/08/2019 20:40:24 - INFO - __main__ - Loading model from pretrained checkpoint: bert-base-uncased/pytorch_model.bin
07/08/2019 20:40:24 - INFO - __main__ - Weights of BertForRankingAndReadingAndReranking not initialized from pretrained model: ['rank_affine.weight', 'rank_affine.bias', 'read_affine.weight', 'read_affine.bias', 'rerank_affine.weight', 'rerank_affine.bias', 'rank_ffn.dense.weight', 'rank_ffn.dense.bias', 'rank_ffn.affine.weight', 'rank_ffn.affine.bias', 'rerank_ffn.dense.weight', 'rerank_ffn.dense.bias', 'rerank_ffn.affine.weight', 'rerank_ffn.affine.bias']
07/08/2019 20:40:24 - INFO - __main__ - Weights from pretrained model not used in BertForRankingAndReadingAndReranking: ['cls.predictions.bias', 'cls.predictions.transform.dense.weight', 'cls.predictions.transform.dense.bias', 'cls.predictions.transform.LayerNorm.gamma', 'cls.predictions.transform.LayerNorm.beta', 'cls.predictions.decoder.weight', 'cls.seq_relationship.weight', 'cls.seq_relationship.bias']
07/08/2019 20:40:29 - INFO - __main__ - ***** Preparing training *****
07/08/2019 20:40:34 - INFO - __main__ - Loading examples from: data/RE3QA/squad/train_8paras_examples.pkl
07/08/2019 20:42:56 - INFO - __main__ - Loading features from: data/RE3QA/squad/train_8paras_384max_128stride_features.pkl
07/08/2019 20:42:56 - INFO - __main__ - Filtering features randomly
07/08/2019 20:42:58 - INFO - __main__ - Num orig examples = 87599
07/08/2019 20:42:58 - INFO - __main__ - Num split features = 746625
07/08/2019 20:42:58 - INFO - __main__ - Num split filtered features = 343850
07/08/2019 20:42:58 - INFO - __main__ - Batch size for ranker = 69
07/08/2019 20:42:58 - INFO - __main__ - Batch size for reader = 32
07/08/2019 20:42:58 - INFO - __main__ - Num steps = 21490
07/08/2019 20:43:22 - INFO - __main__ - ***** Preparing evaluation *****
07/08/2019 20:43:23 - INFO - __main__ - Loading examples from: data/RE3QA/squad/eval_10paras_examples.pkl
07/08/2019 20:43:52 - INFO - __main__ - Loading features from: data/RE3QA/squad/eval_10paras_384max_128stride_features.pkl
07/08/2019 20:43:52 - INFO - __main__ - Filtering features randomly
07/08/2019 20:43:52 - INFO - __main__ - Num orig examples = 10570
07/08/2019 20:43:52 - INFO - __main__ - Num split features = 122413
07/08/2019 20:43:52 - INFO - __main__ - Num split filtered features = 42279
07/08/2019 20:43:52 - INFO - __main__ - Batch size for ranker = 64
07/08/2019 20:43:52 - INFO - __main__ - Batch size for reader = 32
07/08/2019 20:43:56 - INFO - __main__ - ***** Running training distillation *****
07/08/2019 20:43:56 - INFO - __main__ - Processing example: 0
07/08/2019 20:53:28 - INFO - __main__ - Processing example: 345000
07/08/2019 21:02:34 - INFO - __main__ - Processing example: 690000
07/08/2019 21:04:32 - INFO - __main__ - ***** Reconstruct training data at distill_8paras_4best.pkl *****
07/08/2019 21:04:32 - INFO - __main__ - Filtering features based on: out/squad_doc/02/distill_8paras_4best.pkl
07/08/2019 21:43:07 - INFO - __main__ - Num orig examples = 87599
07/08/2019 21:43:07 - INFO - __main__ - Num split features = 746625
07/08/2019 21:43:07 - INFO - __main__ - Num split filtered features = 349167
07/08/2019 21:43:07 - INFO - __main__ - Batch size for ranker = 68
07/08/2019 21:43:07 - INFO - __main__ - Batch size for reader = 32
07/08/2019 21:43:07 - INFO - __main__ - Num steps = 21822
07/08/2019 21:43:32 - INFO - __main__ - ***** Running eval distillation *****
07/08/2019 21:43:32 - INFO - __main__ - Processing example: 0
07/08/2019 21:44:40 - INFO - __main__ - Processing example: 40000
07/08/2019 21:45:49 - INFO - __main__ - Processing example: 80000
07/08/2019 21:46:58 - INFO - __main__ - Processing example: 120000
07/08/2019 21:47:03 - INFO - __main__ - ***** Reconstruct eval data at test_10paras_4best.pkl *****
07/08/2019 21:47:03 - INFO - __main__ - Filtering features based on: out/squad_doc/02/test_10paras_4best.pkl
07/08/2019 21:47:04 - INFO - __main__ - Num orig examples = 10570
07/08/2019 21:47:04 - INFO - __main__ - Num split features = 122413
07/08/2019 21:47:04 - INFO - __main__ - Num split filtered features = 42279
07/08/2019 21:47:04 - INFO - __main__ - Batch size for ranker = 64
07/08/2019 21:47:04 - INFO - __main__ - Batch size for reader = 32
07/08/2019 21:47:07 - INFO - __main__ - ***** Preparing optimizer *****
07/08/2019 21:47:07 - INFO - __main__ - ***** Running training *****
07/08/2019 21:47:07 - INFO - __main__ - ***** Epoch: 1 *****
/home/ubuntu/.local/share/virtualenvs/RE3QA-pRGEMyAS/lib/python3.6/site-packages/torch/nn/parallel/_functions.py:61: UserWarning: Was asked to gather along dimension 0, but all input tensors were scalars; will instead unsqueeze and return a vector.
  warnings.warn('Was asked to gather along dimension 0, but all '
Traceback (most recent call last):
  File "/usr/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/ubuntu/workspace/RE3QA/bert/run_squad_document_full_e2e.py", line 914, in <module>
    main()
  File "/home/ubuntu/workspace/RE3QA/bert/run_squad_document_full_e2e.py", line 857, in main
    save_path, best_f1, epoch)
  File "/home/ubuntu/workspace/RE3QA/bert/run_squad_document_full_e2e.py", line 487, in run_train_epoch
    input_ids=input_ids, token_type_ids=segment_ids)
  File "/home/ubuntu/.local/share/virtualenvs/RE3QA-pRGEMyAS/lib/python3.6/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/ubuntu/.local/share/virtualenvs/RE3QA-pRGEMyAS/lib/python3.6/site-packages/torch/nn/parallel/data_parallel.py", line 152, in forward
    outputs = self.parallel_apply(replicas, inputs, kwargs)
  File "/home/ubuntu/.local/share/virtualenvs/RE3QA-pRGEMyAS/lib/python3.6/site-packages/torch/nn/parallel/data_parallel.py", line 162, in parallel_apply
    return parallel_apply(replicas, inputs, kwargs, self.device_ids[:len(replicas)])
  File "/home/ubuntu/.local/share/virtualenvs/RE3QA-pRGEMyAS/lib/python3.6/site-packages/torch/nn/parallel/parallel_apply.py", line 83, in parallel_apply
    raise output
  File "/home/ubuntu/.local/share/virtualenvs/RE3QA-pRGEMyAS/lib/python3.6/site-packages/torch/nn/parallel/parallel_apply.py", line 59, in _worker
    output = module(*input, **kwargs)
  File "/home/ubuntu/.local/share/virtualenvs/RE3QA-pRGEMyAS/lib/python3.6/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/ubuntu/workspace/RE3QA/bert/custom_modeling.py", line 306, in forward
    all_encoder_layers, _ = self.bert(self.num_hidden_read, input_ids, token_type_ids, attention_mask)
  File "/home/ubuntu/.local/share/virtualenvs/RE3QA-pRGEMyAS/lib/python3.6/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/ubuntu/workspace/RE3QA/bert/custom_modeling.py", line 166, in forward
    all_encoder_layers = self.encoder(num_hidden_stop, embedding_output, extended_attention_mask)
  File "/home/ubuntu/.local/share/virtualenvs/RE3QA-pRGEMyAS/lib/python3.6/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/ubuntu/workspace/RE3QA/bert/custom_modeling.py", line 131, in forward
    hidden_states = layer_module(hidden_states, attention_mask)
  File "/home/ubuntu/.local/share/virtualenvs/RE3QA-pRGEMyAS/lib/python3.6/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/ubuntu/workspace/RE3QA/bert/modeling.py", line 273, in forward
    intermediate_output = self.intermediate(attention_output)
  File "/home/ubuntu/.local/share/virtualenvs/RE3QA-pRGEMyAS/lib/python3.6/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/ubuntu/workspace/RE3QA/bert/modeling.py", line 246, in forward
    hidden_states = self.intermediate_act_fn(hidden_states)
  File "/home/ubuntu/workspace/RE3QA/bert/modeling.py", line 35, in gelu
    return x * 0.5 * (1.0 + torch.erf(x / math.sqrt(2.0)))
RuntimeError: CUDA out of memory. Tried to allocate 36.00 MiB (GPU 0; 15.75 GiB total capacity; 14.25 GiB already allocated; 30.19 MiB free; 365.16 MiB cached)
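For context, here is my own back-of-envelope estimate of why batch 32 at max_seq_length=384 might not fit in 16 GB for a BERT-base model; all constants below are assumptions (and it naively treats the whole batch as if it sat on one device), not numbers from this repo:

```python
# Rough activation-memory estimate for BERT-base (all constants assumed).
layers, heads, hidden, inter = 12, 12, 768, 3072
batch, seq, fp32 = 32, 384, 4  # fp32 = bytes per float32

# Attention probability maps kept for backward: B x H x S x S per layer.
attn = layers * batch * heads * seq * seq * fp32
# A handful of B x S x hidden activations per layer (Q/K/V, attention
# output, residuals, LayerNorm inputs...); 8 tensors is a rough guess.
per_layer_tensors = 8
hidden_acts = layers * per_layer_tensors * batch * seq * hidden * fp32
# FFN intermediate activations (GELU input and output): B x S x 4*hidden.
ffn_acts = layers * 2 * batch * seq * inter * fp32

total_gb = (attn + hidden_acts + ffn_acts) / 2**30
print(f"~{total_gb:.1f} GiB of activations, very roughly")
```

Even before adding parameters, gradients, and Adam's moment buffers (another couple of GiB for a ~110M-parameter model), and the ranker batch running alongside the reader, this lands uncomfortably close to 16 GB.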
For the above reason, I had to set train_batch_size=16 for both the SQuAD and TriviaQA datasets, and got the results below.
Compared to your reported results, there is a noticeable performance gap, especially on the TriviaQA datasets.
Do you think this is simply due to the smaller training batch size (32 -> 16)?
If you used different configurations to obtain the reported results, could you share them and tell me which table in the paper I should compare against?
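If the batch size is indeed the cause, one workaround I could try on my side is gradient accumulation: two micro-batches of 16 with the loss scaled by 1/2 yield the same gradients as one batch of 32. A minimal PyTorch sketch of the idea (my own illustration, not code from this repo):

```python
import torch

torch.manual_seed(0)
model = torch.nn.Linear(4, 1)
data, target = torch.randn(32, 4), torch.randn(32, 1)
loss_fn = torch.nn.MSELoss()  # default 'mean' reduction

# Reference: one full batch of 32.
model.zero_grad()
loss_fn(model(data), target).backward()
full_grad = model.weight.grad.clone()

# Accumulation: two micro-batches of 16, each loss scaled by 1/2,
# gradients summed across backward calls before the optimizer step.
model.zero_grad()
for x_chunk, y_chunk in zip(data.chunk(2), target.chunk(2)):
    (loss_fn(model(x_chunk), y_chunk) / 2).backward()
accum_grad = model.weight.grad.clone()

assert torch.allclose(full_grad, accum_grad, atol=1e-6)
```

The equivalence is exact for mean-reduced losses, though batch-statistics effects (e.g. in dropout masks drawn per micro-batch) can still make runs differ slightly.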
SQuAD -document-
Ranker, type: distill, step: 0, map: 0.492, mrr: 0.510, top_1: 0.309, top_3: 0.610, top_5: 0.814, top_7: 0.932, retrieval_rate: 0.468
Ranker, type: test, step: 0, map: 0.395, mrr: 0.412, top_1: 0.230, top_3: 0.467, top_5: 0.655, top_7: 0.800, retrieval_rate: 0.345
Ranker, type: distill, step: 0, map: 0.492, mrr: 0.510, top_1: 0.309, top_3: 0.610, top_5: 0.814, top_7: 0.932, retrieval_rate: 0.468
Ranker, type: test, step: 0, map: 0.395, mrr: 0.412, top_1: 0.230, top_3: 0.467, top_5: 0.655, top_7: 0.800, retrieval_rate: 0.345
Ranker, step: 21823, map: 0.888, mrr: 0.906, top_1: 0.872, top_3: 0.938, top_5: 0.956, top_7: 0.961
Reader, step: 21823, em: 45.535, f1: 51.553
Ranker, type: distill, step: 21823, map: 0.955, mrr: 0.967, top_1: 0.945, top_3: 0.988, top_5: 0.997, top_7: 0.999, retrieval_rate: 0.468
Ranker, type: test, step: 21823, map: 0.888, mrr: 0.906, top_1: 0.872, top_3: 0.938, top_5: 0.956, top_7: 0.961, retrieval_rate: 0.345
Ranker, step: 43646, map: 0.889, mrr: 0.907, top_1: 0.871, top_3: 0.939, top_5: 0.956, top_7: 0.962
Reader, step: 43646, em: 76.500, f1: 83.212
Ranker, type: test, step: 43646, map: 0.883, mrr: 0.909, top_1: 0.867, top_3: 0.943, top_5: 0.965, top_7: 0.976, retrieval_rate: 0.223
Reader, type: test, step: 43646, em: 77.550, f1: 84.379
TriviaQA -wiki-
Ranker, type: distill, step: 0, map: 0.632, mrr: 0.670, top_1: 0.537, top_3: 0.745, top_5: 0.850, top_7: 0.909, retrieval_rate: 0.406
Ranker, type: dev, step: 0, map: 0.594, mrr: 0.636, top_1: 0.514, top_3: 0.706, top_5: 0.800, top_7: 0.855, retrieval_rate: 0.349
Ranker, type: distill, step: 0, map: 0.632, mrr: 0.670, top_1: 0.537, top_3: 0.745, top_5: 0.850, top_7: 0.909, retrieval_rate: 0.406
Ranker, type: dev, step: 0, map: 0.594, mrr: 0.636, top_1: 0.514, top_3: 0.706, top_5: 0.800, top_7: 0.855, retrieval_rate: 0.349
Ranker, type: distill, step: 0, map: 0.632, mrr: 0.670, top_1: 0.538, top_3: 0.744, top_5: 0.850, top_7: 0.909, retrieval_rate: 0.406
Ranker, type: dev, step: 0, map: 0.595, mrr: 0.636, top_1: 0.514, top_3: 0.707, top_5: 0.801, top_7: 0.855, retrieval_rate: 0.349
Ranker, type: distill, step: 0, map: 0.632, mrr: 0.670, top_1: 0.538, top_3: 0.744, top_5: 0.850, top_7: 0.909, retrieval_rate: 0.406
Ranker, type: dev, step: 0, map: 0.595, mrr: 0.636, top_1: 0.514, top_3: 0.707, top_5: 0.801, top_7: 0.855, retrieval_rate: 0.349
Ranker, step: 22334, loss: 2.610, map: 0.776, mrr: 0.849, top_1: 0.797, top_3: 0.890, top_5: 0.920, top_7: 0.933
Reader, step: 22334, loss: 2.610, em: 40.636, f1: 53.360
Ranker, type: distill, step: 22334, map: 0.839, mrr: 0.903, top_1: 0.852, top_3: 0.941, top_5: 0.970, top_7: 0.984, retrieval_rate: 0.406
Ranker, type: dev, step: 22334, map: 0.776, mrr: 0.849, top_1: 0.797, top_3: 0.890, top_5: 0.920, top_7: 0.933, retrieval_rate: 0.349
Ranker, step: 44668, loss: 2.388, map: 0.784, mrr: 0.855, top_1: 0.807, top_3: 0.894, top_5: 0.921, top_7: 0.933
Reader, step: 44668, loss: 2.388, em: 51.295, f1: 64.485
Ranker, type: dev, step: 44668, map: 0.784, mrr: 0.855, top_1: 0.807, top_3: 0.894, top_5: 0.921, top_7: 0.933, retrieval_rate: 0.349
Reader, type: dev, step: 44668, em: 51.483, f1: 64.621
TriviaQA -unfiltered-
Ranker, type: distill, step: 0, map: 0.778, mrr: 0.809, top_1: 0.720, top_3: 0.872, top_5: 0.929, top_7: 0.957, retrieval_rate: 0.322
Ranker, type: dev, step: 0, map: 0.616, mrr: 0.645, top_1: 0.559, top_3: 0.701, top_5: 0.758, top_7: 0.791, retrieval_rate: 0.294
Ranker, type: distill, step: 0, map: 0.778, mrr: 0.809, top_1: 0.720, top_3: 0.872, top_5: 0.929, top_7: 0.957, retrieval_rate: 0.322
Ranker, type: dev, step: 0, map: 0.616, mrr: 0.645, top_1: 0.559, top_3: 0.701, top_5: 0.758, top_7: 0.791, retrieval_rate: 0.294
Ranker, step: 28727, loss: 2.729, map: 0.735, mrr: 0.780, top_1: 0.748, top_3: 0.804, top_5: 0.824, top_7: 0.832
Reader, step: 28727, loss: 2.729, em: 57.695, f1: 62.731
Ranker, type: distill, step: 28727, map: 0.900, mrr: 0.941, top_1: 0.912, top_3: 0.964, top_5: 0.981, top_7: 0.988, retrieval_rate: 0.322
Ranker, type: dev, step: 28727, map: 0.735, mrr: 0.780, top_1: 0.748, top_3: 0.804, top_5: 0.824, top_7: 0.832, retrieval_rate: 0.294
Ranker, step: 57454, loss: 2.763, map: 0.738, mrr: 0.781, top_1: 0.750, top_3: 0.804, top_5: 0.824, top_7: 0.832
Reader, step: 57454, loss: 2.763, em: 63.794, f1: 69.462
Ranker, type: dev, step: 57454, map: 0.738, mrr: 0.781, top_1: 0.750, top_3: 0.804, top_5: 0.824, top_7: 0.832, retrieval_rate: 0.294
Reader, type: dev, step: 57454, em: 63.714, f1: 69.311
Thank you!