381 update import script to json payload #17
base: main
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -2,3 +2,4 @@ | |
| *.pyc | ||
| *.egg-info | ||
| docs/build | ||
| .vscode | ||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -1,40 +1,56 @@ | ||
| DPimport: A command line glob importer for DPdash | ||
| ================================================= | ||
| # DPimport: A command line glob importer for DPdash | ||
|
|
||
| DPimport is a command line tool for importing files into DPdash using a | ||
| simple [`glob`](https://en.wikipedia.org/wiki/Glob_(programming)) expression. | ||
| simple [`glob`](<https://en.wikipedia.org/wiki/Glob_(programming)>) expression. | ||
|
|
||
| ## Table of contents | ||
|
|
||
| 1. [Installation](#installation) | ||
| 2. [Configuration](#configuration) | ||
| 3. [Usage](#usage) | ||
| 4. [MongoDB](#mongodb) | ||
|
|
||
| ## Installation | ||
| Just use `pip` | ||
|
|
||
| Option 1: Install via `pip` | ||
|
|
||
|
|
||
| ```bash | ||
| pip install git+https://github.com/AMP-SCZ/dpimport.git | ||
| ``` | ||
|
|
||
| Option 2: Clone the repository and run manually | ||
|
|
||
| ```bash | ||
| git clone https://github.com/AMP-SCZ/dpimport.git | ||
| cd dpimport | ||
| python -m venv venv | ||
| source venv/bin/activate | ||
| pip install -r requirements.txt | ||
| python scripts/import.py -c config.yml '/PHOENIX/GENERAL/STUDY_A/SUB_001/DATA_TYPE/processed/*.csv' | ||
| ``` | ||
|
|
||
| ## Configuration | ||
|
|
||
| DPimport requires a configuration file in YAML format, passed as a command | ||
| line argument with `-c|--config`, for establishing a MongoDB database | ||
| connection. You will find an example configuration file in the `examples` | ||
| directory within this repository. | ||
| line argument with `-c|--config`, for connecting with an instance of the DPdash | ||
| application API. The configuration file should contain the following fields: | ||
|
|
||
| api_url - Endpoint for the DPdash API | ||
| api_user - Username for the DPdash API | ||
| api_key - API key for the DPdash API | ||
| verify_ssl - Whether to verify SSL certificates (default: True) | ||
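
The field list above can be illustrated with a minimal Python sketch that checks a parsed configuration and applies the documented default for `verify_ssl`. This is not DPimport's own code: the `normalize_config` helper, the `REQUIRED_FIELDS` tuple, and the example URL, user, and key are all hypothetical, and the sketch assumes the YAML file has already been parsed into a dictionary.

```python
# Hypothetical sketch, not taken from DPimport: validate the documented fields.
REQUIRED_FIELDS = ("api_url", "api_user", "api_key")


def normalize_config(raw):
    """Return a copy of `raw` with defaults applied.

    Raises ValueError if any required field is missing or empty.
    """
    missing = [name for name in REQUIRED_FIELDS if not raw.get(name)]
    if missing:
        raise ValueError("missing config fields: " + ", ".join(missing))
    config = dict(raw)
    # The README states that verify_ssl defaults to True.
    config.setdefault("verify_ssl", True)
    return config


# Placeholder values for illustration only.
example = normalize_config({
    "api_url": "https://dpdash.example.org/api",
    "api_user": "importer",
    "api_key": "not-a-real-key",
})
```

Failing early on a missing field gives a clearer error than a rejected API request later in the import.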
|
|
||
| ## Usage | ||
|
|
||
| The main command line tool is `import.py`. You can use this tool to import any | ||
| DPdash-compatible CSV files or metadata files using the direct path to a file | ||
| DPdash-compatible CSV files or metadata files using the direct path to a file | ||
| or a glob expression (use single quotes to avoid shell expansion): | ||
|
|
||
| ```bash | ||
| import.py -c config.yml '/PHOENIX/GENERAL/STUDY_A/SUB_001/DATA_TYPE/processed/*.csv' | ||
| import.py -c config.yml '/PHOENIX/GENERAL/STUDY_A/SUB_001/DATA_TYPE/processed/*.csv' -n 8 | ||
| ``` | ||
|
|
||
| `-n 8` imports up to 8 files in parallel. The default is `-n 1`. | ||
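
As a rough illustration of what a worker count like `-n` controls, the sketch below fans a list of paths out to a fixed-size pool. It is an assumption-laden example rather than the logic in `import.py`: the use of threads, the `import_all` function, and the `import_file` stand-in are all hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def import_file(path):
    """Hypothetical stand-in for importing one file; returns the path it handled."""
    return path


def import_all(paths, workers=1):
    """Process `paths` with at most `workers` files in flight at once."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map yields results in the same order as the input paths.
        return list(pool.map(import_file, paths))


handled = import_all(["a.csv", "b.csv", "c.csv"], workers=2)
```

With `workers=1` the files are handled one at a time, which mirrors the documented default of `-n 1`.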
|
|
||
|
|
||
| You may also now use the `**` recursive glob expression, for example: | ||
|
|
||
| ```bash | ||
|
|
@@ -55,18 +71,9 @@ and so on. | |
|
|
||
| `directory/*/*.csv` matches only `directory/[subdirectory]/[filename].csv`. With a [recursive glob pattern](https://docs.python.org/3/library/glob.html#glob.glob), `directory/**/*.csv` will additionally match: | ||
|
|
||
| * `directory/[filename].csv` (no subdirectory) | ||
| * `directory/[subdirectory1]/[subdirectory2]/[filename].csv` (sub-subdirectory) | ||
| - `directory/[filename].csv` (no subdirectory) | ||
| - `directory/[subdirectory1]/[subdirectory2]/[filename].csv` (sub-subdirectory) | ||
|
|
||
| and so on, for as many levels deep as exist in the directory tree. | ||
|
|
||
| </details> | ||
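
The difference between `directory/*/*.csv` and `directory/**/*.csv` can be checked with Python's standard `glob` module, which the README links to. The sketch below builds a throwaway directory tree and compares the two patterns. The file names are made up for the demonstration, and whether `import.py` calls `glob.glob` in exactly this way is an assumption.

```python
import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as root:
    # Create CSV files at three depths: root, one level down, two levels down.
    for rel in ("top.csv", "a/one.csv", "a/b/two.csv"):
        path = os.path.join(root, *rel.split("/"))
        os.makedirs(os.path.dirname(path), exist_ok=True)
        open(path, "w").close()

    def matches(pattern, recursive=False):
        """Return sorted matches for `pattern`, relative to `root`."""
        found = glob.glob(os.path.join(root, pattern), recursive=recursive)
        return sorted(os.path.relpath(p, root).replace(os.sep, "/") for p in found)

    single_level = matches("*/*.csv")
    recursive_matches = matches("**/*.csv", recursive=True)

print(single_level)       # ['a/one.csv']
print(recursive_matches)  # ['a/b/two.csv', 'a/one.csv', 'top.csv']
```

Note that `**` only has its recursive meaning when `recursive=True` is passed to `glob.glob`.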
|
|
||
|
|
||
|
|
||
| ## MongoDB | ||
|
Review comment: Is it worth noting something like "This used to require Mongo but doesn't anymore"?
Reply (PR author): The app still uses Mongo, but we don't write to it directly anymore.
||
|
|
||
| This tool requires MongoDB to be running and accessible with the credentials you | ||
| supply in the `config.yml` file. For tips on MongoDB as it is used in DPdash and DPimport, | ||
| see [the DPdash wiki](https://github.com/PREDICT-DPACC/dpdash/wiki/MongoDB-Tips). | ||
|
|
||
This file was deleted.
This file was deleted.
This file was deleted.
Review comment: Why does this need the `<>`s? Is it because of the parens in the URL?
Reply: There were pushes done to main, so maybe this was done for a reason. I'm unsure what.