Some questions to reproduce reported results #5

@yoshitomo-matsubara

Hello,

I think the following questions are lower priority than the existing PR and issue #4, so I am intentionally submitting them as a separate issue.

-Simpler first-

  1. Is allennlp actually required?
    The README lists allennlp as one of the requirements, but I couldn't find any program in this repository that uses it.

  2. Are four NVIDIA Tesla P100 GPUs enough?
    I used four NVIDIA Tesla V100 SXM2 GPUs (16 GB of memory each), but hit a CUDA out-of-memory error when using the configuration provided in the README with train_batch_size=32, for both the SQuAD and TriviaQA datasets.

07/08/2019 20:40:20 - INFO - __main__ -   output_dir: out/squad_doc/02                                           
07/08/2019 20:40:23 - INFO - __main__ -   torch_version: 1.1.0 device: cuda n_gpu: 4, distributed training: False
, 16-bits training: False                                                                                        
07/08/2019 20:40:23 - INFO - __main__ -   ***** Preparing model *****                                            
07/08/2019 20:40:24 - INFO - __main__ -   Loading model from pretrained checkpoint: bert-base-uncased/pytorch_mod
el.bin                                                                                                           
07/08/2019 20:40:24 - INFO - __main__ -   Weights of BertForRankingAndReadingAndReranking not initialized from pr
etrained model: ['rank_affine.weight', 'rank_affine.bias', 'read_affine.weight', 'read_affine.bias', 'rerank_affi
ne.weight', 'rerank_affine.bias', 'rank_ffn.dense.weight', 'rank_ffn.dense.bias', 'rank_ffn.affine.weight', 'rank
_ffn.affine.bias', 'rerank_ffn.dense.weight', 'rerank_ffn.dense.bias', 'rerank_ffn.affine.weight', 'rerank_ffn.af
fine.bias']                                                                                                      
07/08/2019 20:40:24 - INFO - __main__ -   Weights from pretrained model not used in BertForRankingAndReadingAndRe
ranking: ['cls.predictions.bias', 'cls.predictions.transform.dense.weight', 'cls.predictions.transform.dense.bias
', 'cls.predictions.transform.LayerNorm.gamma', 'cls.predictions.transform.LayerNorm.beta', 'cls.predictions.deco
der.weight', 'cls.seq_relationship.weight', 'cls.seq_relationship.bias']                                         
07/08/2019 20:40:29 - INFO - __main__ -   ***** Preparing training *****                                         
07/08/2019 20:40:34 - INFO - __main__ -   Loading examples from: data/RE3QA/squad/train_8paras_examples
.pkl                                                                                                             
07/08/2019 20:42:56 - INFO - __main__ -   Loading features from: data/RE3QA/squad/train_8paras_384max_1
28stride_features.pkl                                                                                            
07/08/2019 20:42:56 - INFO - __main__ -   Filtering features randomly                                            
07/08/2019 20:42:58 - INFO - __main__ -   Num orig examples = 87599                                              
07/08/2019 20:42:58 - INFO - __main__ -   Num split features = 746625                                            
07/08/2019 20:42:58 - INFO - __main__ -   Num split filtered features = 343850                                   
07/08/2019 20:42:58 - INFO - __main__ -   Batch size for ranker = 69                                             
07/08/2019 20:42:58 - INFO - __main__ -   Batch size for reader = 32                                             
07/08/2019 20:42:58 - INFO - __main__ -   Num steps = 21490
07/08/2019 20:43:22 - INFO - __main__ -   ***** Preparing evaluation *****                                       
07/08/2019 20:43:23 - INFO - __main__ -   Loading examples from: data/RE3QA/squad/eval_10paras_examples
.pkl                                                                                                             
07/08/2019 20:43:52 - INFO - __main__ -   Loading features from: data/RE3QA/squad/eval_10paras_384max_1
28stride_features.pkl                                                                                            
07/08/2019 20:43:52 - INFO - __main__ -   Filtering features randomly                                            
07/08/2019 20:43:52 - INFO - __main__ -   Num orig examples = 10570                                              
07/08/2019 20:43:52 - INFO - __main__ -   Num split features = 122413                                            
07/08/2019 20:43:52 - INFO - __main__ -   Num split filtered features = 42279                                    
07/08/2019 20:43:52 - INFO - __main__ -   Batch size for ranker = 64                                             
07/08/2019 20:43:52 - INFO - __main__ -   Batch size for reader = 32                                             
07/08/2019 20:43:56 - INFO - __main__ -   ***** Running training distillation *****                              
07/08/2019 20:43:56 - INFO - __main__ -   Processing example: 0                                                  
07/08/2019 20:53:28 - INFO - __main__ -   Processing example: 345000                                             
07/08/2019 21:02:34 - INFO - __main__ -   Processing example: 690000
07/08/2019 21:04:32 - INFO - __main__ -   ***** Reconstruct training data at distill_8paras_4best.pkl *****      
07/08/2019 21:04:32 - INFO - __main__ -   Filtering features based on: out/squad_doc/02/distill_8paras_4best.pkl 
07/08/2019 21:43:07 - INFO - __main__ -   Num orig examples = 87599                                              
07/08/2019 21:43:07 - INFO - __main__ -   Num split features = 746625                                            
07/08/2019 21:43:07 - INFO - __main__ -   Num split filtered features = 349167                                   
07/08/2019 21:43:07 - INFO - __main__ -   Batch size for ranker = 68                                             
07/08/2019 21:43:07 - INFO - __main__ -   Batch size for reader = 32                                             
07/08/2019 21:43:07 - INFO - __main__ -   Num steps = 21822                                                      
07/08/2019 21:43:32 - INFO - __main__ -   ***** Running eval distillation *****                                  
07/08/2019 21:43:32 - INFO - __main__ -   Processing example: 0                                                  
07/08/2019 21:44:40 - INFO - __main__ -   Processing example: 40000                                              
07/08/2019 21:45:49 - INFO - __main__ -   Processing example: 80000                                              
07/08/2019 21:46:58 - INFO - __main__ -   Processing example: 120000                                             
07/08/2019 21:47:03 - INFO - __main__ -   ***** Reconstruct eval data at test_10paras_4best.pkl *****            
07/08/2019 21:47:03 - INFO - __main__ -   Filtering features based on: out/squad_doc/02/test_10paras_4best.pkl   
07/08/2019 21:47:04 - INFO - __main__ -   Num orig examples = 10570                                              
07/08/2019 21:47:04 - INFO - __main__ -   Num split features = 122413                                            
07/08/2019 21:47:04 - INFO - __main__ -   Num split filtered features = 42279
07/08/2019 21:47:04 - INFO - __main__ -   Batch size for ranker = 64                                    
07/08/2019 21:47:04 - INFO - __main__ -   Batch size for reader = 32                                             
07/08/2019 21:47:07 - INFO - __main__ -   ***** Preparing optimizer *****                                        
07/08/2019 21:47:07 - INFO - __main__ -   ***** Running training *****                                           
07/08/2019 21:47:07 - INFO - __main__ -   ***** Epoch: 1 *****                                                   
/home/ubuntu/.local/share/virtualenvs/RE3QA-pRGEMyAS/lib/python3.6/site-packages/torch/nn/parallel/_functions.py$
61: UserWarning: Was asked to gather along dimension 0, but all input tensors were scalars; will instead unsquee$
e and return a vector.                                                                                           
  warnings.warn('Was asked to gather along dimension 0, but all '                                                
Traceback (most recent call last):                                                                               
  File "/usr/lib/python3.6/runpy.py", line 193, in _run_module_as_main                                           
    "__main__", mod_spec)                                                                                        
  File "/usr/lib/python3.6/runpy.py", line 85, in _run_code                                                      
    exec(code, run_globals)                                                                                      
  File "/home/ubuntu/workspace/RE3QA/bert/run_squad_document_full_e2e.py", line 914, in <module>                 
    main()                                                                                                       
  File "/home/ubuntu/workspace/RE3QA/bert/run_squad_document_full_e2e.py", line 857, in main                     
    save_path, best_f1, epoch)                                                                                   
  File "/home/ubuntu/workspace/RE3QA/bert/run_squad_document_full_e2e.py", line 487, in run_train_epoch          
    input_ids=input_ids, token_type_ids=segment_ids)                                                             
  File "/home/ubuntu/.local/share/virtualenvs/RE3QA-pRGEMyAS/lib/python3.6/site-packages/torch/nn/modules/module$
py", line 493, in __call__                                                                                       
    result = self.forward(*input, **kwargs)                                                                      
  File "/home/ubuntu/.local/share/virtualenvs/RE3QA-pRGEMyAS/lib/python3.6/site-packages/torch/nn/parallel/data_$
arallel.py", line 152, in forward                                                                                
    outputs = self.parallel_apply(replicas, inputs, kwargs)                                                      
  File "/home/ubuntu/.local/share/virtualenvs/RE3QA-pRGEMyAS/lib/python3.6/site-packages/torch/nn/parallel/data_$
arallel.py", line 162, in parallel_apply                                                                         
    return parallel_apply(replicas, inputs, kwargs, self.device_ids[:len(replicas)])                             
  File "/home/ubuntu/.local/share/virtualenvs/RE3QA-pRGEMyAS/lib/python3.6/site-packages/torch/nn/parallel/paral$
el_apply.py", line 83, in parallel_apply
    raise output
  File "/home/ubuntu/.local/share/virtualenvs/RE3QA-pRGEMyAS/lib/python3.6/site-packages/torch/nn/parallel/parallel_apply.py", line 59, in _worker
    output = module(*input, **kwargs)
  File "/home/ubuntu/.local/share/virtualenvs/RE3QA-pRGEMyAS/lib/python3.6/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/ubuntu/workspace/RE3QA/bert/custom_modeling.py", line 306, in forward
    all_encoder_layers, _ = self.bert(self.num_hidden_read, input_ids, token_type_ids, attention_mask)
  File "/home/ubuntu/.local/share/virtualenvs/RE3QA-pRGEMyAS/lib/python3.6/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/ubuntu/workspace/RE3QA/bert/custom_modeling.py", line 166, in forward
    all_encoder_layers = self.encoder(num_hidden_stop, embedding_output, extended_attention_mask)
  File "/home/ubuntu/.local/share/virtualenvs/RE3QA-pRGEMyAS/lib/python3.6/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/ubuntu/workspace/RE3QA/bert/custom_modeling.py", line 131, in forward
    hidden_states = layer_module(hidden_states, attention_mask)
  File "/home/ubuntu/.local/share/virtualenvs/RE3QA-pRGEMyAS/lib/python3.6/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/ubuntu/workspace/RE3QA/bert/modeling.py", line 273, in forward
    intermediate_output = self.intermediate(attention_output)
  File "/home/ubuntu/.local/share/virtualenvs/RE3QA-pRGEMyAS/lib/python3.6/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/ubuntu/workspace/RE3QA/bert/modeling.py", line 246, in forward
    hidden_states = self.intermediate_act_fn(hidden_states)
  File "/home/ubuntu/workspace/RE3QA/bert/modeling.py", line 35, in gelu
    return x * 0.5 * (1.0 + torch.erf(x / math.sqrt(2.0)))
RuntimeError: CUDA out of memory. Tried to allocate 36.00 MiB (GPU 0; 15.75 GiB total capacity; 14.25 GiB already allocated; 30.19 MiB free; 365.16 MiB cached)
  3. Because of the error above, I had to set train_batch_size to 16 for both the SQuAD and TriviaQA datasets, and got the following results.
    Compared with your reported results, I see a performance gap, especially on the TriviaQA dataset.
    Do you think this is only because of the different training batch size (32 -> 16)?
    If you used different configurations to get the reported results, could you provide them and tell me which table in the paper I should refer to for comparison?
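
One way to separate the effect of batch size from the memory limit would be gradient accumulation: run two micro-batches of 16 and average their gradients before each optimizer step, which keeps the effective batch size at 32. The sketch below is a pure-Python toy (a scalar linear model with a mean squared error loss, both chosen only for illustration) showing that the averaged gradient matches the full-batch gradient. Whether the training script in this repository exposes such an option is not confirmed here.

```python
# Toy illustration, not the repository's training loop: for a mean-reduced
# loss, averaging the gradients of two micro-batches of 16 gives the same
# gradient as a single batch of 32.

def mean_squared_error_grad(w, xs, ys):
    """Gradient of mean((w*x - y)^2) with respect to the scalar weight w."""
    n = len(xs)
    return sum(2.0 * (w * x - y) * x for x, y in zip(xs, ys)) / n


def accumulated_grad(w, xs, ys, micro_batch_size):
    """Average the per-micro-batch gradients, as gradient accumulation does.

    Assumes len(xs) is a multiple of micro_batch_size, so every
    micro-batch carries the same weight in the average.
    """
    grads = [
        mean_squared_error_grad(
            w, xs[i:i + micro_batch_size], ys[i:i + micro_batch_size]
        )
        for i in range(0, len(xs), micro_batch_size)
    ]
    return sum(grads) / len(grads)


# Hypothetical toy data: 32 examples of y = 3x, evaluated at w = 0.
xs = [i / 32 for i in range(32)]
ys = [3.0 * x for x in xs]

full = mean_squared_error_grad(0.0, xs, ys)   # one batch of 32
accum = accumulated_grad(0.0, xs, ys, 16)     # two micro-batches of 16
print(abs(full - accum) < 1e-9)
```

The equivalence holds only for mean-reduced losses with equally sized micro-batches and no batch-dependent layers; learning-rate schedules that count optimizer steps would also need the step count of the batch-32 setting.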

SQuAD -document-

Ranker, type: distill, step: 0, map: 0.492, mrr: 0.510, top_1: 0.309, top_3: 0.610, top_5: 0.814, top_7: 0.932, retrieval_rate: 0.468
 
Ranker, type: test, step: 0, map: 0.395, mrr: 0.412, top_1: 0.230, top_3: 0.467, top_5: 0.655, top_7: 0.800, retrieval_rate: 0.345
 
Ranker, type: distill, step: 0, map: 0.492, mrr: 0.510, top_1: 0.309, top_3: 0.610, top_5: 0.814, top_7: 0.932, retrieval_rate: 0.468
 
Ranker, type: test, step: 0, map: 0.395, mrr: 0.412, top_1: 0.230, top_3: 0.467, top_5: 0.655, top_7: 0.800, retrieval_rate: 0.345
 
Ranker, step: 21823, map: 0.888, mrr: 0.906, top_1: 0.872, top_3: 0.938, top_5: 0.956, top_7: 0.961
Reader, step: 21823, em: 45.535, f1: 51.553
 
Ranker, type: distill, step: 21823, map: 0.955, mrr: 0.967, top_1: 0.945, top_3: 0.988, top_5: 0.997, top_7: 0.999, retrieval_rate: 0.468
 
Ranker, type: test, step: 21823, map: 0.888, mrr: 0.906, top_1: 0.872, top_3: 0.938, top_5: 0.956, top_7: 0.961, retrieval_rate: 0.345
 
Ranker, step: 43646, map: 0.889, mrr: 0.907, top_1: 0.871, top_3: 0.939, top_5: 0.956, top_7: 0.962
Reader, step: 43646, em: 76.500, f1: 83.212
 
Ranker, type: test, step: 43646, map: 0.883, mrr: 0.909, top_1: 0.867, top_3: 0.943, top_5: 0.965, top_7: 0.976, retrieval_rate: 0.223
 
Reader, type: test, step: 43646, em: 77.550, f1: 84.379
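
The step counts above appear consistent with the feature counts in the training log. The sketch below checks that arithmetic under two assumptions that are not confirmed by the repository: that the logged "Num steps" is int(features / batch size * epochs) with 2 epochs, and that the batch-16 run kept the same 349167 filtered features. The function names are illustrative only.

```python
import math


def total_steps(num_features, batch_size, num_epochs):
    """Assumed formula behind the logged 'Num steps' value."""
    return int(num_features / batch_size * num_epochs)


def steps_per_epoch(num_features, batch_size):
    """Number of batches needed to see every feature once."""
    return math.ceil(num_features / batch_size)


# Values copied from the log with train_batch_size=32.
print(total_steps(343850, 32, 2))   # compare with "Num steps = 21490"
print(total_steps(349167, 32, 2))   # compare with "Num steps = 21822"

# With train_batch_size=16, one epoch over 349167 features:
print(steps_per_epoch(349167, 16))      # compare with step 21823
print(2 * steps_per_epoch(349167, 16))  # compare with step 43646
```

If these assumptions hold, the batch-16 run performed twice as many optimizer updates as the batch-32 configuration would have, which may interact with the learning-rate schedule.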

TriviaQA -wiki-

Ranker, type: distill, step: 0, map: 0.632, mrr: 0.670, top_1: 0.537, top_3: 0.745, top_5: 0.850, top_7: 0.909, retrieval_rate: 0.406
 
Ranker, type: dev, step: 0, map: 0.594, mrr: 0.636, top_1: 0.514, top_3: 0.706, top_5: 0.800, top_7: 0.855, retrieval_rate: 0.349
 
Ranker, type: distill, step: 0, map: 0.632, mrr: 0.670, top_1: 0.537, top_3: 0.745, top_5: 0.850, top_7: 0.909, retrieval_rate: 0.406
 
Ranker, type: dev, step: 0, map: 0.594, mrr: 0.636, top_1: 0.514, top_3: 0.706, top_5: 0.800, top_7: 0.855, retrieval_rate: 0.349
 
Ranker, type: distill, step: 0, map: 0.632, mrr: 0.670, top_1: 0.538, top_3: 0.744, top_5: 0.850, top_7: 0.909, retrieval_rate: 0.406
 
Ranker, type: dev, step: 0, map: 0.595, mrr: 0.636, top_1: 0.514, top_3: 0.707, top_5: 0.801, top_7: 0.855, retrieval_rate: 0.349
 
Ranker, type: distill, step: 0, map: 0.632, mrr: 0.670, top_1: 0.538, top_3: 0.744, top_5: 0.850, top_7: 0.909, retrieval_rate: 0.406
 
Ranker, type: dev, step: 0, map: 0.595, mrr: 0.636, top_1: 0.514, top_3: 0.707, top_5: 0.801, top_7: 0.855, retrieval_rate: 0.349
 
Ranker, step: 22334, loss: 2.610, map: 0.776, mrr: 0.849, top_1: 0.797, top_3: 0.890, top_5: 0.920, top_7: 0.933
Reader, step: 22334, loss: 2.610, em: 40.636, f1: 53.360
 
Ranker, type: distill, step: 22334, map: 0.839, mrr: 0.903, top_1: 0.852, top_3: 0.941, top_5: 0.970, top_7: 0.984, retrieval_rate: 0.406
 
Ranker, type: dev, step: 22334, map: 0.776, mrr: 0.849, top_1: 0.797, top_3: 0.890, top_5: 0.920, top_7: 0.933, retrieval_rate: 0.349
 
Ranker, step: 44668, loss: 2.388, map: 0.784, mrr: 0.855, top_1: 0.807, top_3: 0.894, top_5: 0.921, top_7: 0.933
Reader, step: 44668, loss: 2.388, em: 51.295, f1: 64.485
 
Ranker, type: dev, step: 44668, map: 0.784, mrr: 0.855, top_1: 0.807, top_3: 0.894, top_5: 0.921, top_7: 0.933, retrieval_rate: 0.349
 
Reader, type: dev, step: 44668, em: 51.483, f1: 64.621

TriviaQA -unfiltered-

Ranker, type: distill, step: 0, map: 0.778, mrr: 0.809, top_1: 0.720, top_3: 0.872, top_5: 0.929, top_7: 0.957, retrieval_rate: 0.322
 
Ranker, type: dev, step: 0, map: 0.616, mrr: 0.645, top_1: 0.559, top_3: 0.701, top_5: 0.758, top_7: 0.791, retrieval_rate: 0.294
 
Ranker, type: distill, step: 0, map: 0.778, mrr: 0.809, top_1: 0.720, top_3: 0.872, top_5: 0.929, top_7: 0.957, retrieval_rate: 0.322
 
Ranker, type: dev, step: 0, map: 0.616, mrr: 0.645, top_1: 0.559, top_3: 0.701, top_5: 0.758, top_7: 0.791, retrieval_rate: 0.294
 
Ranker, step: 28727, loss: 2.729, map: 0.735, mrr: 0.780, top_1: 0.748, top_3: 0.804, top_5: 0.824, top_7: 0.832
Reader, step: 28727, loss: 2.729, em: 57.695, f1: 62.731
 
Ranker, type: distill, step: 28727, map: 0.900, mrr: 0.941, top_1: 0.912, top_3: 0.964, top_5: 0.981, top_7: 0.988, retrieval_rate: 0.322
 
Ranker, type: dev, step: 28727, map: 0.735, mrr: 0.780, top_1: 0.748, top_3: 0.804, top_5: 0.824, top_7: 0.832, retrieval_rate: 0.294
 
Ranker, step: 57454, loss: 2.763, map: 0.738, mrr: 0.781, top_1: 0.750, top_3: 0.804, top_5: 0.824, top_7: 0.832
Reader, step: 57454, loss: 2.763, em: 63.794, f1: 69.462
 
Ranker, type: dev, step: 57454, map: 0.738, mrr: 0.781, top_1: 0.750, top_3: 0.804, top_5: 0.824, top_7: 0.832, retrieval_rate: 0.294
 
Reader, type: dev, step: 57454, em: 63.714, f1: 69.311
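
To compare these numbers against the paper's tables more easily, the metric lines can be parsed into dictionaries. This is a minimal sketch that assumes every line has the "Component, key: value, key: value" shape seen above; `parse_metric_line` is a hypothetical helper, not part of the repository.

```python
import re


def _convert(value):
    """Turn '43646' into an int, '0.395' into a float, and leave 'test' as text."""
    for cast in (int, float):
        try:
            return cast(value)
        except ValueError:
            pass
    return value


def parse_metric_line(line):
    """Split a metric line into its component name and a dict of fields."""
    component, _, rest = line.strip().partition(", ")
    fields = {
        key: _convert(value)
        for key, value in re.findall(r"(\w+): ([^,]+)", rest)
    }
    return component, fields


# Final SQuAD reader line copied from the results above.
component, fields = parse_metric_line(
    "Reader, type: test, step: 43646, em: 77.550, f1: 84.379"
)
print(component, fields["em"], fields["f1"])
```

Collecting the final Reader lines of each dataset this way gives one em/f1 pair per run that can be set side by side with the reported values.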

Thank you!
