Enhanced Elasticsearch indexing with progress tracking #26
ZhishanQ wants to merge 2 commits into oneal2000:main
…ing with progress tracking and robust error handling
…r Elasticsearch indexing
This PR improves the Elasticsearch indexing process by adding prep_elastic_with_tqdm.py, a new script that enhances the original indexing functionality with tqdm progress bars and improved error handling, and by updating the documentation to reference the new script. These improvements enhance reproducibility and provide a better developer experience when working with the DRAGIN framework.
Hi @ZhishanQ,

Thank you so much for this pull request. This is incredibly helpful and addresses a major pain point. More than half of the reproducibility issues we've faced have been related to the Elasticsearch indexing step. The lack of a progress bar was a significant source of this confusion; we had many users asking questions like, "Does this …" Your new script with …

I've done a visual review of the code, and it looks solid. Before merging, we just need to run some internal experiments to verify that this new script doesn't affect the final reproducibility results. We've been overwhelmed by "cannot reproduce" emails and social media messages in the past, so I have to be extra cautious about this step.

This is excellent work, though, and we truly appreciate you taking the initiative to improve this. We'll follow up here as soon as our internal verification is complete.

Thanks again!
You are welcome! This repo of yours is very helpful for my current research project.
Added prep_elastic_with_tqdm.py with tqdm progress bars and improved error handling
Updated documentation to reference the new indexing script
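
For readers who have not opened the diff, the general idea of batched indexing with a progress bar and per-batch error handling can be sketched roughly as below. This is not the code from prep_elastic_with_tqdm.py: the names `batched`, `index_with_progress`, and `index_batch` are hypothetical, the batch size is arbitrary, and the actual Elasticsearch call is left to a caller-supplied function. The sketch uses tqdm when it is installed and runs silently otherwise.

```python
import sys
from typing import Callable, Iterable, Iterator, List, Optional, Tuple

try:
    from tqdm import tqdm  # optional: progress bar if tqdm is installed
except ImportError:
    tqdm = None


def batched(docs: Iterable, size: int) -> Iterator[List]:
    """Yield consecutive lists of at most `size` documents."""
    batch: List = []
    for doc in docs:
        batch.append(doc)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def index_with_progress(
    docs: Iterable,
    index_batch: Callable[[List], None],  # hypothetical hook that sends one batch
    total: Optional[int] = None,
    batch_size: int = 500,                # arbitrary value chosen for illustration
) -> Tuple[int, int]:
    """Send docs in batches and return (indexed, failed) document counts."""
    bar = tqdm(total=total, desc="Indexing", unit="doc") if tqdm else None
    indexed = failed = 0
    try:
        for batch in batched(docs, batch_size):
            try:
                index_batch(batch)
                indexed += len(batch)
            except Exception as exc:
                # Report the failed batch and keep going instead of aborting.
                failed += len(batch)
                print(f"batch of {len(batch)} docs failed: {exc}", file=sys.stderr)
            if bar is not None:
                bar.update(len(batch))
    finally:
        if bar is not None:
            bar.close()
    return indexed, failed
```

With the official Python client, `index_batch` could plausibly wrap `elasticsearch.helpers.bulk(client, actions)`; whether the script in this PR does so is not stated in the conversation.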