
Enhanced Elasticsearch indexing with progress tracking#26

Open
ZhishanQ wants to merge 2 commits into oneal2000:main from ZhishanQ:main


@ZhishanQ

Added prep_elastic_with_tqdm.py with tqdm progress bars and improved error handling
Updated documentation to reference the new indexing script

@ZhishanQ
Author

@oneal2000

This PR improves the Elasticsearch indexing process by adding prep_elastic_with_tqdm.py, a new script that enhances the original indexing functionality with tqdm progress bars and improved error handling.

These improvements enhance reproducibility and provide a better developer experience when working with the DRAGIN framework.
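
For readers who have not opened the diff, the general pattern of batched indexing with a progress bar and per-batch error handling can be sketched roughly as follows. This is only an illustration and not the contents of prep_elastic_with_tqdm.py: the names `batched`, `index_with_progress`, and `send_batch` are hypothetical, and `send_batch` stands in for whatever call actually sends a bulk request to Elasticsearch. tqdm is treated as optional so the sketch runs without it.

```python
from typing import Callable, Iterable, Iterator, List, Optional, Tuple

try:
    from tqdm import tqdm  # third-party progress bar; optional in this sketch
except ImportError:
    def tqdm(iterable, **kwargs):
        # Fallback: no progress bar, just pass the iterable through.
        return iterable


def batched(docs: Iterable[dict], size: int) -> Iterator[List[dict]]:
    """Yield successive lists of at most `size` documents."""
    batch: List[dict] = []
    for doc in docs:
        batch.append(doc)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def index_with_progress(
    docs: Iterable[dict],
    send_batch: Callable[[List[dict]], None],
    batch_size: int = 100,
    total: Optional[int] = None,
) -> Tuple[int, List[Tuple[int, str]]]:
    """Send documents in batches, tracking progress and collecting failures.

    `send_batch` is a hypothetical stand-in for the real bulk request and is
    assumed to raise an exception when a batch cannot be indexed.
    """
    indexed = 0
    failed: List[Tuple[int, str]] = []
    # Ceiling division gives the number of batches when the total is known.
    n_batches = None if total is None else -(-total // batch_size)
    progress = tqdm(batched(docs, batch_size), total=n_batches,
                    desc="Indexing", unit="batch")
    for i, batch in enumerate(progress):
        try:
            send_batch(batch)
            indexed += len(batch)
        except Exception as exc:
            # Record the failure and continue instead of aborting the run.
            failed.append((i, str(exc)))
    return indexed, failed
```

With a known `total`, the bar shows how many batches remain, which addresses the question of whether the process is still making progress; the returned `failed` list records which batches raised an error, so they can be retried or inspected afterwards.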

@oneal2000
Owner

Hi @ZhishanQ,

Thank you so much for this pull request. This is incredibly helpful and addresses a major pain point.

More than half of the reproducibility issues we've faced have been related to the Elasticsearch indexing step. The lack of a progress bar was a significant source of this confusion: we had many users asking questions like, "Does this nohup process need to run in the background forever? Can I Ctrl+C this?"

Your new script with tqdm and better error handling is a fantastic solution.

I've done a visual review of the code, and it looks solid.

Before merging, we just need to run some internal experiments to verify that this new script doesn't affect the final reproducibility results. We've been overwhelmed by "cannot reproduce" emails and social media messages in the past, so I have to be extra cautious about this step.

This is excellent work, though, and we truly appreciate you taking the initiative to improve this. We'll follow up here as soon as our internal verification is complete.

Thanks again!

@ZhishanQ
Author

ZhishanQ commented Nov 6, 2025

You are welcome! This repo of yours is very helpful for my current research project.
If you have any questions regarding the code I submitted, please feel free to contact me at any time.

