Conversation
There was a problem hiding this comment.
comment-zouryou is MIT licensed, so we should probably give credit in our LICENSE file. Just a small attribution at the end like:
Methods and logic borrowed from the following programs. See their respective licenses for more details:
comment-zouryou (https://github.com/tanbatu/comment-zouryou): Copyright (c) 2022 tanbatu, MIT LicenseIf it's easy enough, feel free to add a flag. I think it should be pretty simple with the utility method you added to get the timestamp. Probably --comments-date="2025-05-05 00:00:00" or whatever reasonable granularity makes sense.
|
Hi @AlexAplin, thanks for reviewing my code. I've made changes to resolves most of the issues, but I need clarification for some. I plan to implement the datetime range flag once we have resolved all of this. By default, datetime range will be from current unix timestamp (start) until 2007-03-03 11:59 pm unix timestamp (end) (as set comment-zouryou) and cannot exceed that range. Additionally, by default nndownload will try to fetch all comments in the datetime range or there are no more comments to fetch, whichever comes first. Some videos have millions of comments and it might take a very long time to fetch them all, so I should include a flag for the comment fetching to stop once it has fetched X number of comments (or perhaps timeout after some time? not sure which is better). This limit will take priority, then datetime range, and finally no more comments. I would love to hear your thoughts. |
March 6th is when γ launched, along with the first video uploads, but maybe test data goes back to then. I think it's probably fine to use that as a limit if you specify it came from comment-zouryou.
I'd add
If any of the new flags are specified without Hope this makes sense, but please let me know if you have any questions. Thanks for your efforts! |
I'll specify.
On some videos, such as Anyways, this sounds good to me, just that we might not get the exact number of comments given that per fetch number is variable, but that shouldn't be an issue. Just add in the flag description it might not be exact.
I'll make it datetime instead of just date for sake of granularity. If only date is provided, assume 11:59:59. We should also want to accomodate the flags Off-topic: do you kmow what easy threads/comments are? What makes them different from main? |
|
That additional combination is a good call, feel free to handle that.
On sm9 I get 980 comments received in the browser request to |
|
Hi @AlexAplin, thanks for the reply. Yes, the owner thread can still be fetched ( I've added the flags you've requested. Note that I've changed You can test the following commands:
In addition, I've changed the comment fetching logic such that it will append to global Since downloading comments can take quite a while, I want to implement an estimated progress output. Something that takes one line per thread, like:
If you have any preferences on the format or tool to use for this, let me know. |
|
We settled on using rich for progress bars, so specifying the total should work, and you can set up a task for each thread if desired. |
|
@AlexAplin added the progress bar + a check on whether comments.json exists so that we don't accidentally overwrite previous progress. |
nndownload/nndownload.py
Outdated
| # Save in case of success and on interrupt) | ||
| with open(filename, "w", encoding="utf-8") as file: | ||
| json.dump(COMMENTS_DATA_JSON, file, indent=4, ensure_ascii=False, sort_keys=True) | ||
| json.dump(COMMENTS_DATA_JSON, file, indent=None, ensure_ascii=False, sort_keys=True) |
There was a problem hiding this comment.
halves the space taken from my very limited testing.
|
This got really spaghetti-fied. I've simplified a lot of the logic and made improvements. I've tested the different flows but may need reverification. I also removed the save on Ctrl+C because of the threading changes, but should be decently easy to add back I think. I'll leave some feedback to explain my changes |
nndownload/nndownload.py
Outdated
| # There are no comments before lastTime to fetch | ||
| break | ||
|
|
||
| # If we got the same comments as last time, we should stop |
There was a problem hiding this comment.
This seems really unnecessary, especially iterating over every comment. Isn't it really unlikely that we would retrieve the exact same response twice? If this is a concern, it should be sufficient to see if one specific last ID is present in both requests. I've taken this out for now
There was a problem hiding this comment.
I have gotten this issue before from some videos, and I think that is why comment-zouryou stops fetching when it sees comment id 1-5, but that can lead to possibly missing comment id 1-4 if 5 was found. That is why I introduced this check so that we don't get stuck in a loop getting the same ids again.
Your check is way better though, I'll implement that.
There was a problem hiding this comment.
Are you planning to work on this still?
|
@AlexAplin sorry, been a bit busy. I'll try to finish this up in around 3-5 days. |
No rush, thanks for your efforts |
|
@shovel-kun Hi again, just checking in. I'd like to get this merged before #205. If it's okay, I can make some of the changes on the branch |
|
Yes, sorry, go ahead @AlexAplin |
3889fee to
7235f51
Compare
|
Hi @AlexAplin, I've finally got the time to address the issues you've mentioned. Review when you can, thanks. |
Resolves #134
Comment fetching logic is taken from https://github.com/tanbatu/comment-zouryou.
If you want, a flag could be added for specifying a timestamp.