37 changes: 34 additions & 3 deletions README.md
@@ -1,7 +1,38 @@
# babel_datapipeline
Data processing pipeline for Babel

Luigi tasks are in the luigi_pipeline file.
### Command to run
`luigi --module babel_datapipeline tasks.[task_file].[specific_task] --local-scheduler --date [yyyy-mm-dd]`

This command runs a task locally. Replace `[task_file]` and `[specific_task]` with a task file and one of its tasks from the Current Tasks section.
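
As a minimal sketch of how the pieces of that command fit together, the helper below assembles the argument list from a task file, a task name, and a date. The function name `build_luigi_command` and the task file name `recommenders` are hypothetical and used only for illustration; substitute the actual file that holds the task.

```python
import datetime
import shlex


def build_luigi_command(task_file, specific_task, date):
    """Return the argument list for running one pipeline task locally."""
    # Fail early if the date is not a valid year-month-day calendar date.
    datetime.datetime.strptime(date, "%Y-%m-%d")
    return [
        "luigi",
        "--module", "babel_datapipeline",
        f"tasks.{task_file}.{specific_task}",
        "--local-scheduler",
        "--date", date,
    ]


# "recommenders" is an assumed file name, not confirmed by this README.
command = build_luigi_command("recommenders", "CocitationTask", "2016-01-31")
print(shlex.join(command))
```

The resulting list can be passed to `subprocess.run` as-is, which avoids shell quoting issues.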

### Current Tasks

**Infomap**
- PajekFactory
- InfomapTask

**IO**
- LocalTargetInputs
- AminerS3Targets
- DynamoOutputTask
- *Future dataset S3 targets would go here*

**Parsers**
- AMinerParse
- *Future dataset parser tasks go here*

**Recommenders**
- CocitationTask
- BibcoupleTask
- EFTask

### Configuration
Luigi-specific configuration, as described [here](http://luigi.readthedocs.org/en/stable/configuration.html), can be found in `configs/luigi.cfg`. Dataset configuration can be found in `configs/default.cfg`.

### Output files
Currently, the pipeline writes the output of each step to the directory in which the command is run. In doing so, it creates folders named `citation_dict`, `infomap_output`, `pajek_files`, and `recs`.
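
To see which of these folders a run has produced so far, a small check along the following lines may help. This is only a sketch: the helper name `existing_output_folders` is hypothetical, and it assumes the folders sit directly under the directory the command was run from.

```python
from pathlib import Path

# Folder names as listed in this README.
OUTPUT_FOLDERS = ("citation_dict", "infomap_output", "pajek_files", "recs")


def existing_output_folders(run_dir="."):
    """Return the pipeline output folders that exist under run_dir."""
    base = Path(run_dir)
    return [name for name in OUTPUT_FOLDERS if (base / name).is_dir()]


print(existing_output_folders())
```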

### Requirements
In addition to the packages listed in `setup.py` and `requirements.txt`, this module requires [Infomap](http://www.mapequation.org/code.html#Installation) to be installed and available on the `PATH`. To use any of the AWS IO tasks, [boto3 credentials](http://boto3.readthedocs.org/en/latest/guide/configuration.html) must also be configured. So far, `region`, `aws_access_key_id`, and `aws_secret_access_key` are required.
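
A quick pre-flight check for these requirements could look like the sketch below. It rests on two assumptions: that the Infomap executable is named `Infomap`, and that credentials are supplied through boto3's environment variables. boto3 can also read `~/.aws/credentials` and `~/.aws/config`, which this check does not inspect, so a reported missing variable is not necessarily an error.

```python
import os
import shutil

# Environment variables boto3 reads for region and credentials.
REQUIRED_AWS_VARS = (
    "AWS_DEFAULT_REGION",
    "AWS_ACCESS_KEY_ID",
    "AWS_SECRET_ACCESS_KEY",
)


def missing_requirements(environ=os.environ, infomap_executable="Infomap"):
    """Return the names of requirements not found in the given environment."""
    missing = []
    # "Infomap" as the executable name is an assumption; adjust if it differs.
    if shutil.which(infomap_executable, path=environ.get("PATH")) is None:
        missing.append(infomap_executable)
    # Report every AWS variable that is unset or empty.
    missing.extend(var for var in REQUIRED_AWS_VARS if not environ.get(var))
    return missing


print(missing_requirements())
```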

Current command to run:
`luigi --module luigi_pipeline [parser or recommender task] --local-scheduler --date [yyyy-mm-dd]`
193 changes: 0 additions & 193 deletions babel_datapipeline/database/storage.py

This file was deleted.
