[WIP] Convert to python package with cli #2
Draft
ahmed-shariff wants to merge 9 commits into vitality-vis:main
Thanks @ahmed-shariff, especially for the kind of API you have in mind - looks very promising. I will set it up locally this weekend and get back to you with a more detailed response.
force-pushed from 96854da to 4e79979
Following the discussion in #1, this PR aims for the following:
I am using poetry to manage the packaging aspect. The package is called `paperscraper`.

- `process`: command group for the different processes that process the data. All subcommands can take one optional flag, `-f`/`--force`. When this flag is not given, a process runs only if its corresponding output doesn't already exist.
  - `run_all`: run all the processes.
  - `db`: clean the XML file
  - `venues`: extract unique venues
  - `data-extraction`: extract the data from the dblp snapshot
  - `collect-data`: scrape additional information
  - `postprocess`: clean and extract unique data
- `search`: takes a `pattern` string and returns any entries that have a match in the title or abstract. By default it uses fuzzy matching. Has the following options:
  - `--venue`: filter by venue. Can have multiple `--venue`. Each can be a partial match to either the full name or the short name.
  - `--author`: filter by author. Can have multiple `--author`. Each can be a partial match.
  - `--re`: a flag; when set, the `pattern` will be treated as a regex.
  - `--fuzzy-max-difference`: the maximum number of differences allowed from the `pattern` to get a match.
- `list`: summary of the data (lists venues)
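To make the `search` semantics described above concrete, here is a minimal sketch in plain Python. It is not the code in this PR. The entry field names (`title`, `abstract`, `venue`, `venue_short`, `authors`), the default of 2 allowed differences, the case-insensitive matching, the "any of the given filters" behaviour for repeated `--venue`/`--author`, and the reading of "differences" as edit distance against the best-matching substring are all assumptions made for illustration.

```python
import re


def min_differences(pattern: str, text: str) -> int:
    """Smallest edit distance between `pattern` and any substring of `text`.

    Assumption: "differences" means insertions, deletions and substitutions.
    """
    pattern, text = pattern.lower(), text.lower()
    # prev[i]: cost of aligning pattern[:i] with a substring ending just
    # before the current text character (initially: the empty text).
    prev = list(range(len(pattern) + 1))
    best = prev[-1]
    for ch in text:
        curr = [0]  # a match may start at any position in the text
        for i, pch in enumerate(pattern, start=1):
            curr.append(min(prev[i - 1] + (pch != ch),  # match or substitute
                            prev[i] + 1,                # extra char in text
                            curr[i - 1] + 1))           # char missing in text
        best = min(best, curr[-1])
        prev = curr
    return best


def search(entries, pattern, venues=(), authors=(), use_regex=False,
           max_difference=2):
    """Return entries whose title or abstract matches `pattern`.

    `venues` and `authors` are partial, case-insensitive filters; an entry
    passes a filter if any one of the given values matches (assumed).
    """
    results = []
    for entry in entries:
        venue_names = (entry["venue"], entry["venue_short"])
        if venues and not any(v.lower() in name.lower()
                              for v in venues for name in venue_names):
            continue
        if authors and not any(a.lower() in name.lower()
                               for a in authors for name in entry["authors"]):
            continue
        fields = (entry["title"], entry["abstract"])
        if use_regex:
            hit = any(re.search(pattern, field) for field in fields)
        else:
            hit = any(min_differences(pattern, field) <= max_difference
                      for field in fields)
        if hit:
            results.append(entry)
    return results


# Hypothetical sample data, only to exercise the sketch.
ENTRIES = [
    {"title": "Visualising citation networks",
     "abstract": "We study graph layouts.",
     "venue": "IEEE Transactions on Visualization and Computer Graphics",
     "venue_short": "TVCG",
     "authors": ["Ada Lovelace"]},
    {"title": "Scraping scholarly metadata",
     "abstract": "A pipeline for dblp.",
     "venue": "Conference on Human Factors in Computing Systems",
     "venue_short": "CHI",
     "authors": ["Alan Turing"]},
]

if __name__ == "__main__":
    # Fuzzy search tolerates the s/z spelling difference.
    for entry in search(ENTRIES, "visualizing citation"):
        print(entry["title"])
```

In this sketch the fuzzy check compares the pattern against every substring of the field rather than the whole field, so a short pattern can still match inside a long abstract. With very short patterns and a generous `max_difference` almost anything matches, which is one reason a real implementation might scale the threshold with the pattern length.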