1 change: 1 addition & 0 deletions .gitignore
@@ -2,3 +2,4 @@
*.pyc
*.egg-info
docs/build
.vscode
53 changes: 30 additions & 23 deletions README.md
@@ -1,40 +1,56 @@
DPimport: A command line glob importer for DPdash
=================================================
# DPimport: A command line glob importer for DPdash

DPimport is a command line tool for importing files into DPdash using a
simple [`glob`](https://en.wikipedia.org/wiki/Glob_(programming)) expression.
simple [`glob`](<https://en.wikipedia.org/wiki/Glob_(programming)>) expression.
Review comment:

Why does this need the <>s? Is it because of the parens in the url?

Reply (Collaborator, Author):

There were pushes done to main so maybe this was done for a reason. I'm unsure what.


## Table of contents

1. [Installation](#installation)
2. [Configuration](#configuration)
3. [Usage](#usage)
4. [MongoDB](#mongodb)

## Installation
Just use `pip`

Option 1: Install via `pip`


```bash
pip install git+https://github.com/AMP-SCZ/dpimport.git
```

Option 2: Clone the repository and run manually

```bash
git clone https://github.com/AMP-SCZ/dpimport.git
cd dpimport
python -m venv venv
source venv/bin/activate
pip install -r requirements.txt
python scripts/import.py -c config.yml '/PHOENIX/GENERAL/STUDY_A/SUB_001/DATA_TYPE/processed/*.csv'
```

## Configuration

DPimport requires a configuration file in YAML format, passed as a command
line argument with `-c|--config`, for establishing a MongoDB database
connection. You will find an example configuration file in the `examples`
directory within this repository.
line argument with `-c|--config`, for connecting with an instance of the DPdash
application API. The configuration file should contain the following fields:

api_url - Endpoint for the DPdash API
api_user - Username for the DPdash API
api_key - API key for the DPdash API
verify_ssl - Whether to verify SSL certificates (default: True)
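
As a rough illustration of the four fields listed above, the following Python sketch checks a configuration that has already been parsed into a dictionary. It is not DPimport's own code: the function name `normalize_config`, the placeholder URL and credentials, and the choice to treat `api_url`, `api_user`, and `api_key` as mandatory are all assumptions made for the example. Only the `verify_ssl` default of `True` comes from the field list.

```python
# Hypothetical helper, not part of DPimport: validates a parsed config mapping.
REQUIRED_FIELDS = ("api_url", "api_user", "api_key")  # assumed to be mandatory


def normalize_config(raw):
    """Return a copy of `raw` with `verify_ssl` defaulted to True.

    Raises ValueError if any assumed-required field is missing or empty.
    """
    missing = [name for name in REQUIRED_FIELDS if not raw.get(name)]
    if missing:
        raise ValueError("missing config fields: " + ", ".join(missing))
    config = dict(raw)
    config.setdefault("verify_ssl", True)  # default stated in the field list
    return config


example = normalize_config({
    "api_url": "https://dpdash.example.org/api",  # placeholder endpoint
    "api_user": "importer",                       # placeholder username
    "api_key": "replace-me",                      # placeholder key
})
print(example["verify_ssl"])
```

Checking the fields before any network call is one way to surface a misconfigured file early, instead of failing on the first API request.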

## Usage

The main command line tool is `import.py`. You can use this tool to import any
DPdash-compatible CSV files or metadata files using the direct path to a file
DPdash-compatible CSV files or metadata files using the direct path to a file
or a glob expression (use single quotes to avoid shell expansion):

```bash
import.py -c config.yml '/PHOENIX/GENERAL/STUDY_A/SUB_001/DATA_TYPE/processed/*.csv'
import.py -c config.yml '/PHOENIX/GENERAL/STUDY_A/SUB_001/DATA_TYPE/processed/*.csv' -n 8
```

`-n 8` imports 8 files in parallel. The default is `-n 1`.


You may also now use the `**` recursive glob expression, for example:

```bash
@@ -55,18 +71,9 @@ and so on.

`directory/*/*.csv` matches only `directory/[subdirectory]/[filename].csv`. With a [recursive glob pattern](https://docs.python.org/3/library/glob.html#glob.glob), `directory/**/*.csv` will additionally match:

* `directory/[filename].csv` (no subdirectory)
* `directory/[subdirectory1]/[subdirectory2]/[filename].csv` (sub-subdirectory)
- `directory/[filename].csv` (no subdirectory)
- `directory/[subdirectory1]/[subdirectory2]/[filename].csv` (sub-subdirectory)

and so on, for as many levels deep as exist in the directory tree.
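
The difference between the two patterns can be checked with Python's standard `glob` module. The sketch below is only an illustration: the helper names `build_demo_tree` and `match_names` and the file names are made up for the example, and it calls `glob.glob` directly with `recursive=True`, which may or may not mirror how `import.py` expands patterns internally.

```python
import glob
import os
import tempfile


def build_demo_tree(root):
    """Create empty CSV files at three depths under `root` (hypothetical names)."""
    for rel in ("top.csv", "a/one.csv", "a/b/two.csv"):
        path = os.path.join(root, rel)
        os.makedirs(os.path.dirname(path), exist_ok=True)
        open(path, "w").close()


def match_names(root, pattern):
    """Return the sorted, root-relative paths that match `pattern`."""
    hits = glob.glob(os.path.join(root, pattern), recursive=True)
    return sorted(os.path.relpath(p, root).replace(os.sep, "/") for p in hits)


with tempfile.TemporaryDirectory() as root:
    build_demo_tree(root)
    single = match_names(root, "*/*.csv")      # exactly one subdirectory level
    recursive = match_names(root, "**/*.csv")  # zero or more subdirectory levels

print(single)     # ['a/one.csv']
print(recursive)  # ['a/b/two.csv', 'a/one.csv', 'top.csv']
```

Note that `**` only recurses when `recursive=True` is passed; without it, `glob.glob` treats `**` like a single `*`.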

</details>



## MongoDB
Review comment:

Is it worth noting something like "This used to require Mongo but doesn't anymore"?

Reply (Collaborator, Author):

The app still uses mongo, but we don't write to it directly anymore


This tool requires MongoDB to be running and accessible with the credentials you
supply in the `config.yml` file. For tips on MongoDB as it is used in DPdash and DPimport,
see [the DPdash wiki](https://github.com/PREDICT-DPACC/dpdash/wiki/MongoDB-Tips).

104 changes: 0 additions & 104 deletions dpimport/__init__.py

This file was deleted.

6 changes: 0 additions & 6 deletions dpimport/__version__.py

This file was deleted.

102 changes: 0 additions & 102 deletions dpimport/database/__init__.py

This file was deleted.
