From cdb8261f0cdc2f507b5e803aeedb85e94ab79aaf Mon Sep 17 00:00:00 2001 From: Norman Hooper Date: Fri, 9 Jan 2026 17:44:17 +0000 Subject: [PATCH] Overhaul documentation --- CONTRIBUTING.md | 195 +++++++++ LICENSE | 23 + README.md | 806 +++-------------------------------- docs/database-integration.md | 373 ++++++++++++++++ docs/development.md | 418 ++++++++++++++++++ docs/index.md | 74 ++++ docs/library-usage.md | 209 +++++++++ docs/minilinq-reference.md | 240 +++++++++++ docs/output-formats.md | 300 +++++++++++++ docs/query-formats.md | 272 ++++++++++++ docs/scheduling.md | 118 +++++ docs/testing.md | 251 +++++++++++ docs/user-location-data.md | 352 +++++++++++++++ 13 files changed, 2894 insertions(+), 737 deletions(-) create mode 100644 CONTRIBUTING.md create mode 100644 LICENSE create mode 100644 docs/database-integration.md create mode 100644 docs/development.md create mode 100644 docs/index.md create mode 100644 docs/library-usage.md create mode 100644 docs/minilinq-reference.md create mode 100644 docs/output-formats.md create mode 100644 docs/query-formats.md create mode 100644 docs/scheduling.md create mode 100644 docs/testing.md create mode 100644 docs/user-location-data.md diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md new file mode 100644 index 0000000..8efdbfc --- /dev/null +++ b/CONTRIBUTING.md @@ -0,0 +1,195 @@ +Contributing to CommCare Export +=============================== + +Thank you for your interest in contributing to CommCare Export! This +document provides guidelines and instructions for contributing. + +> [!TIP] +> This guide covers the contribution process, pull requests, and +> community guidelines. For detailed technical information about the +> codebase, architecture, and development workflows, see the +> [Development Guide](docs/development.md). + + +Getting Started +--------------- + +1. Sign up for GitHub at https://github.com if you haven't already +2. Fork the repository at https://github.com/dimagi/commcare-export +3. 
Follow the setup instructions in the + [Development Guide](docs/development.md) +4. Create a feature branch for your changes + +For detailed environment setup, dependencies, and project structure, see +the [Development Guide](docs/development.md). + + +Making Changes +-------------- + +1. Create a feature branch from `master` +2. Make your edits following code style guidelines +3. Write or update tests for your changes +4. Run tests and type checks +5. Commit with clear messages +6. Push your branch and submit a pull request + +See the [Development Guide](docs/development.md) for detailed workflows, +debugging tips, and common development tasks. + + +Code Style and Standards +------------------------ + +- Follow PEP 8 style guidelines +- Use clear, descriptive names +- Add docstrings to public functions and classes +- Use type hints sparingly and meaningfully +- Run type checks: + `mypy --install-types commcare_export/ tests/ migrations/` + +See the +[Development Guide](docs/development.md#type-hints-and-type-checking) +for detailed coding standards and guidelines. + + +Testing +------- + +All pull requests must include tests: + +- Add tests for new features and bug fixes +- Ensure all tests pass: `pytest` +- Maintain or improve code coverage + +For detailed testing instructions, database setup, and troubleshooting, +see the [Testing Guide](docs/testing.md). + + +Pull Request Guidelines +----------------------- + +### Before Submitting + +- [ ] All tests pass +- [ ] Type checks pass (if applicable) +- [ ] Code follows project style guidelines +- [ ] Commit messages are clear and descriptive +- [ ] Documentation is updated (if applicable) + +### What Makes a Good Pull Request + +1. **Clear description** of what the PR does and why +2. **Single focus** - one feature or bug fix per PR +3. **Tests included** for new functionality or bug fixes +4. **Documentation updates** if behavior changes +5. 
**Clean commit history** (consider squashing if many small commits) + +### Pull Request Process + +1. Submit your PR with a clear title and description +2. Wait for CI checks to complete +3. Address any review comments +4. Once approved, a maintainer will merge your PR + + +Reporting Issues +---------------- + +### Bug Reports + +When reporting bugs, please include: + +- Clear description of the bug +- Steps to reproduce +- Expected behavior +- Actual behavior +- Python version and platform +- Relevant error messages or logs + +### Feature Requests + +When requesting features, please include: + +- Clear description of the feature +- Use case and motivation +- Examples of how it would be used +- Any relevant links or references + + +Release Process +--------------- + +For maintainers only. + +### Creating a Release + +1. **Create a tag** for the release: + ```shell + git tag -a "X.YY.0" -m "Release X.YY.0" + git push --tags + ``` + +2. **Create the distribution**: + ```shell + uv build + ``` + + Ensure that the archives in `dist/` have the correct version number + (matching the tag name). + +3. **Upload to PyPI**: + ```shell + uv publish + ``` + +4. **Verify the upload** at https://pypi.org/project/commcare-export/ + +5. **Create a release on GitHub** at + https://github.com/dimagi/commcare-export/releases + + Once the release is published, a GitHub workflow is kicked off that + compiles Linux and Windows executables of the Data Export Tool (DET) + and adds them to the release as assets.
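Step 2 of "Creating a Release" asks maintainers to confirm that the archives in `dist/` carry the tag's version. The Python sketch below shows one way to automate that check.

- The helper is hypothetical. `check_dist.py`, `archive_version`, `mismatched_archives` and `check_dist` are not part of the repository.
- It assumes tags are plain `X.YY.0` strings with no leading `v`.
- It assumes `uv build` writes standard sdist and wheel filenames, such as `commcare_export-X.YY.0.tar.gz` and `commcare_export-X.YY.0-py3-none-any.whl`.

```python
# check_dist.py: hypothetical helper, not shipped with commcare-export.
# Assumes standard sdist/wheel filenames and a tag of the form "X.YY.0".
import sys
from pathlib import Path
from typing import Iterable, List, Optional


def archive_version(filename: str) -> Optional[str]:
    """Return the version embedded in an sdist or wheel filename, else None."""
    if filename.endswith(".whl"):
        # Wheels are {name}-{version}(-{build})?-{python}-{abi}-{platform}.whl
        # and the name part never contains "-", so the version is field 1.
        parts = filename[: -len(".whl")].split("-")
        return parts[1] if len(parts) >= 5 else None
    if filename.endswith(".tar.gz"):
        # Sdists are {name}-{version}.tar.gz; normalized versions contain no "-",
        # so the version is whatever follows the last "-".
        stem = filename[: -len(".tar.gz")]
        return stem.rsplit("-", 1)[1] if "-" in stem else None
    return None


def mismatched_archives(filenames: Iterable[str], tag: str) -> List[str]:
    """Return the archive names whose embedded version differs from tag."""
    return sorted(
        name
        for name in filenames
        # Skip non-archives (None) and archives that already match the tag.
        if archive_version(name) not in (None, tag)
    )


def check_dist(tag: str, dist_dir: str = "dist") -> None:
    """Exit with an error message if any archive in dist_dir does not match tag."""
    bad = mismatched_archives((p.name for p in Path(dist_dir).iterdir()), tag)
    if bad:
        sys.exit(f"Version mismatch with tag {tag}: {', '.join(bad)}")
    print(f"All archives in {dist_dir}/ match tag {tag}")
```

After `uv build`, you could run it from the repository root with `python -c "from check_dist import check_dist; check_dist('X.YY.0')"`. It reports every mismatching file, including archives left over from earlier builds. Start from an empty `dist/` to avoid false alarms.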
+ +### Release Artifacts + +After publishing a release on GitHub: + +- **Linux executable**: Built automatically via GitHub Actions +- **Windows executable**: Built automatically via GitHub Actions + +For Linux-based users: If you download and use the executable file, make +sure the file has the executable permission enabled: + +```shell +chmod +x commcare-export +``` + + +Community +--------- + +### Getting Help + +- **Documentation**: Check the [docs/](docs/) directory for technical + documentation +- **Discussions**: Use the [CommCare Forum](https://forum.dimagi.com/) + for questions + + +Additional Resources +-------------------- + +- [Development Guide](docs/development.md) - Detailed development setup + and architecture +- [Testing Guide](docs/testing.md) - Comprehensive testing documentation +- [Technical Documentation](docs/index.md) - Full technical + documentation + + +Thank You! +---------- + +We appreciate your contributions to CommCare Export. Your efforts help +improve the tool for everyone in the CommCare community. diff --git a/LICENSE b/LICENSE new file mode 100644 index 0000000..2e0a67e --- /dev/null +++ b/LICENSE @@ -0,0 +1,23 @@ +MIT License +=========== + +Copyright (c) 2013-2026 Dimagi Inc. + +Permission is hereby granted, free of charge, to any person obtaining a +copy of this software and associated documentation files (the +“Software”), to deal in the Software without restriction, including +without limitation the rights to use, copy, modify, merge, publish, +distribute, sublicense, and/or sell copies of the Software, and to +permit persons to whom the Software is furnished to do so, subject to +the following conditions: + +The above copyright notice and this permission notice shall be included +in all copies or substantial portions of the Software. 
+ +THE SOFTWARE IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS +OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF +MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. +IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY +CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, +TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE +SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. diff --git a/README.md b/README.md index d031d5b..3740168 100644 --- a/README.md +++ b/README.md @@ -1,791 +1,123 @@ CommCare Export =============== -https://github.com/dimagi/commcare-export +https://github.com/dimagi/commcare-export -[![Build Status](https://app.travis-ci.com/dimagi/commcare-export.svg?branch=master)](https://app.travis-ci.com/dimagi/commcare-export) +[![Build Status](https://github.com/dimagi/commcare-export/actions/workflows/test.yml/badge.svg)](https://github.com/dimagi/commcare-export/actions) [![Test coverage](https://coveralls.io/repos/dimagi/commcare-export/badge.png?branch=master)](https://coveralls.io/r/dimagi/commcare-export) [![PyPI version](https://badge.fury.io/py/commcare-export.svg)](https://badge.fury.io/py/commcare-export) A command-line tool (and Python library) to generate customized exports from the [CommCare HQ](https://www.commcarehq.org) [REST API](https://wiki.commcarehq.org/display/commcarepublic/Data+APIs). -* [User documentation](https://wiki.commcarehq.org/display/commcarepublic/CommCare+Data+Export+Tool) -* [Changelog](https://github.com/dimagi/commcare-export/releases) -Installation & Quick Start --------------------------- +Features +-------- -Following commands are to be run on a terminal or a command line. 
+- **Flexible Queries**: Create custom exports using Excel or JSON query specifications +- **Multiple Output Formats**: Export to CSV, Excel, JSON, Markdown, or SQL databases +- **Incremental Exports**: Automatically track and export only new/modified data +- **Organization Data**: Export and join user and location data with forms and cases +- **Python Library**: Use as a library to integrate with your own applications +- **Scheduling Support**: Run automated exports on Windows, Linux, or Mac -Once on a terminal window or command line, for simplicity, run commands from the home directory. -### Python +Quick Start +----------- -Check which Python version is installed. - -This tool is tested with Python versions from 3.9 to 3.13. - -```shell -$ python3 --version -``` -If Python is installed, its version will be shown. - -If Python isn't installed, [download and install](https://www.python.org/downloads/) -a version of Python from 3.9 to 3.13. - -## Virtualenv (Optional) - -It is recommended to set up a virtual environment for CommCare Export -to avoid conflicts with other Python applications. - -More about virtualenvs on https://docs.python.org/3/tutorial/venv.html - -Setup a virtual environment using: - -```shell -$ python3 -m venv venv -``` - -Activate virtual environment by running: - -```shell -$ source venv/bin/activate -``` - -**Note**: virtualenv needs to be activated each time you start a new terminal session or command line prompt. - -For convenience, to avoid doing that, you can create an alias to activate virtual environments in -"venv" directory by adding the following to your -`.bashrc` or `.zshrc` file: - -```shell -$ alias venv='if [[ -d venv ]] ; then source venv/bin/activate ; fi' -``` - -Then you can activate virtual environments with simply typing -```shell -$ venv -``` - -## Install CommCare Export - -[uv](https://docs.astral.sh/uv/) is a fast Python package installer and resolver. 
- -```shell -$ uv pip install commcare-export -``` - -## CommCare HQ - -1. Sign up for [CommCare HQ](https://www.commcarehq.org/) if you have not already. - -2. Create a project space and application. - -3. Visit the Release Manager, make a build, click the star to release it. - -4. Use Web Apps and fill out some forms. - -5. Modify one of example queries in the `examples/` directory, modifying the "Filter Value" column - to match your form XMLNS / case type. - See [this page](https://confluence.dimagi.com/display/commcarepublic/Finding+a+Form%27s+XMLNS) to - determine the XMLNS for your form. - -Now you can run the following examples: - -```shell -$ commcare-export \ - --query examples/demo-registration.xlsx \ - --project YOUR_PROJECT \ - --output-format markdown - -$ commcare-export \ - --query examples/demo-registration.json \ - --project YOUR_PROJECT \ - --output-format markdown - -$ commcare-export \ - --query examples/demo-deliveries.xlsx \ - --project YOUR_PROJECT \ - --output-format markdown - -$ commcare-export \ - --query examples/demo-deliveries.json \ - --project YOUR_PROJECT \ - --output-format markdown -``` - -You'll see the tables printed out. Change to `--output-format sql --output URL_TO_YOUR_DB --since DATE` to -sync all forms submitted since that date. - -Example query files are provided in both Excel and JSON format. It is recommended -to use the Excel format as the JSON format may change upon future library releases. - -Command-line Usage ------------------- - -The basic usage of the command-line tool is with a saved Excel or JSON query (see how to write these, below) - -```shell -$ commcare-export --commcare-hq \ - --username \ - --project \ - --api-version \ - --version \ - --query \ - --output-format \ - --output \ - --users \ - --locations \ - --with-organization -``` - -See `commcare-export --help` for the full list of options. 
- -### Logging - -By default, commcare-export writes logs to a file named -`commcare_export.log` in the current working directory. Log entries are -appended to this file across multiple runs to preserve history. - -You can customize the log directory: +### Installation ```shell -$ commcare-export --log-dir /path/to/logs \ - --query my-query.xlsx \ - --project myproject +uv pip install commcare-export ``` -To disable file logging and show all output in the console only: +### Basic Usage ```shell -$ commcare-export --no-logfile \ - --query my-query.xlsx \ - --project myproject -``` - -> [!NOTE] -> The log directory will be created automatically if it doesn't exist. -> If the specified directory cannot be created or written to, -> commcare-export will fall back to console-only logging with a warning -> message. - -There are example query files for the CommCare Demo App (available on the CommCare HQ Exchange) in the `examples/` -directory. - -`--output` - -CommCare Export uses SQLAlachemy's [create_engine](http://docs.sqlalchemy.org/en/latest/core/engines.html) to establish a database connection. This is based off of the [RFC-1738](https://www.ietf.org/rfc/rfc1738.txt) protocol. Some common examples: +# Export forms to Excel +commcare-export \ + --query examples/demo-registration.xlsx \ + --project YOUR_PROJECT \ + --output-format xlsx \ + --output data.xlsx +# Export to SQL database with incremental updates +commcare-export \ + --query examples/demo-registration.xlsx \ + --project YOUR_PROJECT \ + --output-format sql \ + --output postgresql://user:pass@localhost/dbname ``` -# Postgres -postgresql+psycopg2://scott:tiger@localhost/mydatabase - -# MySQL -mysql+pymysql://scott:tiger@localhost/mydatabase -# MSSQL -mssql+pyodbc://scott:tiger@localhost/mydatabases?driver=ODBC+Driver+17+for+SQL+Server -``` - -Excel Queries +Documentation ------------- -An Excel query is any `.xlsx` workbook. Each sheet in the workbook represents one table you wish -to create. 
There are two grouping of columns to configure the table: - - - **Data Source**: Set this to `form` to export form data, or `case` for case data. - - **Filter Name** / *Filter Value*: These columns are paired up to filter the input cases or forms. - - **Field**: The destination in your SQL database for the value. - - **Source Field**: The particular field from the form you wish to extract. This can be any JSON path. - - -JSON Queries ------------- - -JSON queries are a described in the table below. You build a JSON object that represents the query you have in mind. -A good way to get started is to work from the examples, or you could make an Excel query and run the tool -with `--dump-query` to see the resulting JSON query. - - -User and Location Data ----------------------- - -The --users and --locations options export data from a CommCare project that -can be joined with form and case data. The --with-organization option does all -of that and adds a field to Excel query specifications to be joined on. - -Specifying the --users option or --with-organization option will export an -additional table named 'commcare_users' containing the following columns: - -| Column | Type | Note | -|----------------------------------|------|-------------------------------------| -| id | Text | Primary key | -| default_phone_number | Text | | -| email | Text | | -| first_name | Text | | -| groups | Text | | -| last_name | Text | | -| phone_numbers | Text | | -| resource_uri | Text | | -| commcare_location_id | Text | Foreign key to `commcare_locations` | -| commcare_location_ids | Text | | -| commcare_primary_case_sharing_id | Text | | -| commcare_project | Text | | -| username | Text | | - -The data in the 'commcare_users' table comes from the [List Mobile Workers -API endpoint](https://confluence.dimagi.com/display/commcarepublic/List+Mobile+Workers). 
- -Specifying the --locations option or --with-organization options will export -an additional table named 'commcare_locations' containing the following columns: - -| Column | Type | Note | -|------------------------------|------|-----------------------------------------------| -| id | Text | | -| created_at | Date | | -| domain | Text | | -| external_id | Text | | -| last_modified | Date | | -| latitude | Text | | -| location_data | Text | | -| location_id | Text | Primary key | -| location_type | Text | | -| longitude | Text | | -| name | Text | | -| parent | Text | Resource URI of parent location | -| resource_uri | Text | | -| site_code | Text | | -| location_type_administrative | Text | | -| location_type_code | Text | | -| location_type_name | Text | | -| location_type_parent | Text | | -| *location level code* | Text | Column name depends on project's organization | -| *location level code* | Text | Column name depends on project's organization | - -The data in the 'commcare_locations' table comes from the Location API -endpoint along with some additional columns from the Location Type API -endpoint. The last columns in the table exist if you have set up -organization levels for your projects. One column is created for each -organization level. The column name is derived from the Location Type -that you specified. The column value is the location_id of the containing -location at that level of your organization. Consider the example organization -from the [CommCare help page](https://confluence.dimagi.com/display/commcarepublic/Setting+up+Organization+Levels+and+Structure). 
-A piece of the 'commcare_locations' table could look like this: - -| location_id | location_type_name | chw | supervisor | clinic | district | -|-------------|--------------------|--------|------------|--------|----------| -| 939fa8 | District | NULL | NULL | NULL | 939fa8 | -| c4cbef | Clinic | NULL | NULL | c4cbef | 939fa8 | -| a9ca40 | Supervisor | NULL | a9ca40 | c4cbef | 939fa8 | -| 4545b9 | CHW | 4545b9 | a9ca40 | c4cbef | 939fa8 | - -In order to join form or case data to 'commcare_users' and 'commcare_locations' -the exported forms and cases need to contain a field identifying which user -submitted them. The --with-organization option automatically adds a field -called 'commcare_userid' to each query in an Excel specification for this -purpose. Using that field, you can use a SQL query with a join to report -data about any level of you organization. For example, to count the number -of forms submitted by all workers in each clinic: - -```sql -SELECT l.clinic, - COUNT(*) -FROM form_table t -LEFT JOIN (commcare_users u - LEFT JOIN commcare_locations l - ON u.commcare_location_id = l.location_id) -ON t.commcare_userid = u.id -GROUP BY l.clinic; -``` - -Note that the table names 'commcare_users' and 'commcare_locations' are -treated as reserved names and the export tool will produce an error if -given a query specification that writes to either of them. - -The export tool will write all users to 'commcare_users' and all locations to -'commcare_locations', overwriting existing rows with current data and adding -rows for new users and locations. If you want to remove obsolete users or -locations from your tables, drop them and the next export will leave only -the current ones. If you modify your organization to add or delete levels, -you will change the columns of the 'commcare_locations' table and it is -very likely you will want to drop the table before exporting with the new -organization. 
- -Scheduling the DET ------------------- -Scheduling the DET to run at regular intervals is a useful tactic to keep your -database up to date with CommCare HQ. - -A common approach to scheduling DET runs is making use of the operating systems' scheduling -libraries to invoke a script to execute the `commcare-export` command. Sample scripts can be -found in the `examples/` directory for both Windows and Linux. - -### Windows -On Windows systems you can make use of the [task scheduler](https://sqlbackupandftp.com/blog/how-to-schedule-a-script-via-windows-task-scheduler/) -to run scheduled scripts for you. - -The `examples/` directory contains a sample script file, `scheduled_run_windows.bat`, which can be used by the -task scheduler to invoke the `commcare-export` command. - -To set up the scheduled task you can follow the steps below. -1. Copy the file `scheduled_run_windows.bat` to any desired location on your system (e.g. `Documents`) -2. Edit the copied `.bat` file and populate your own details -3. Follow the steps outlined [here](https://sqlbackupandftp.com/blog/how-to-schedule-a-script-via-windows-task-scheduler/), -using the .bat file when prompted for the `Program/script`. - - -### Linux -On a Linux system you can make use of the [crontab](https://www.techtarget.com/searchdatacenter/definition/crontab) -command to create scheduled actions (cron jobs) in the system. - -The `examples/` directory contains a sample script file, `scheduled_run_linux.sh`, which can be used by the cron job. -To set up the cron job you can follow the steps below. -1. Copy the example file to the home directory -> cp ./examples/scheduled_run_linux.sh ~/scheduled_run_linux.sh -2. Edit the file to populate your own details -> nano ~/scheduled_run_linux.sh -3. Create a cron job by appending to the crontab file -> crontab -e - -Make an entry below any existing cron jobs. 
The example below executes the script file at the top of -every 12th hour of every day -> 0 12 * * * bash ~/scheduled_run_linux.sh - -You can consult the [crontab.guru](https://crontab.guru/) tool which is very useful to generate and interpret -any custom cron schedules. - -Python Library Usage --------------------- - -As a library, the various `commcare_export` modules make it easy to +### For End Users - - Interact with the CommCare HQ REST API - - Execute "Minilinq" queries against the API (a very simple query language, described below) - - Load and save JSON representations of Minilinq queries - - Compile Excel configurations to Minilinq queries +See the comprehensive [User Documentation](https://dimagi.atlassian.net/wiki/spaces/commcarepublic/pages/2143955952/CommCare+Data+Export+Tool+DET) for: -To directly access the CommCare HQ REST API: +- Installation and setup +- Creating queries with Excel +- Command-line usage +- Scheduling automated exports +- Common use cases and examples -```python -from commcare_export.checkpoint import CheckpointManagerWithDetails -from commcare_export.commcare_hq_client import CommCareHqClient, AUTH_MODE_APIKEY -from commcare_export.commcare_minilinq import get_paginator, PaginationMode +### For Developers -username = 'some@username.com' -domain = 'your-awesome-domain' -hq_host = 'https://commcarehq.org' -API_KEY= 'your_secret_api_key' +See the [Technical Documentation](docs/index.md) for: -api_client = CommCareHqClient(hq_host, domain, username, API_KEY, AUTH_MODE_APIKEY) -case_paginator=get_paginator(resource='case', pagination_mode=PaginationMode.date_modified) -case_paginator.init() -checkpoint_manager=CheckpointManagerWithDetails(None, None, PaginationMode.date_modified) +- [Python Library Usage](docs/library-usage.md) - Using commcare-export as a Python library +- [MiniLinq Reference](docs/minilinq-reference.md) - Query language documentation +- [Query Formats](docs/query-formats.md) - Excel and JSON query specifications +- 
[Output Formats](docs/output-formats.md) - Available output formats +- [Database Integration](docs/database-integration.md) - SQL database setup and usage +- [Development Guide](docs/development.md) - Contributing to the project -cases = api_client.iterate('case', case_paginator, checkpoint_manager=checkpoint_manager) -for case in cases: - print(case['case_id']) +Examples +-------- -``` - -To issue a `minilinq` query against it, and then print out that query in a JSON serialization: - -```python -import json -import sys -from commcare_export.minilinq import * -from commcare_export.commcare_hq_client import CommCareHqClient -from commcare_export.commcare_minilinq import CommCareHqEnv -from commcare_export.env import BuiltInEnv, JsonPathEnv -from commcare_export.writers import StreamingMarkdownTableWriter - -api_client = CommCareHqClient( - url="http://www.commcarehq.org", - project='your_project', - username='your_username', - password='password', - version='0.5' -) - -source = Map( - source=Apply( - Reference("api_data"), - Literal("form"), - Literal({"filter": {"term": {"app_id": "whatever"}}}) - ), - body=List([ - Reference("received_on"), - Reference("form.gender"), - ]) -) - -query = Emit( - 'demo-table', - [ - Literal('Received On'), - Literal('Gender') - ], - source -) - -print(json.dumps(query.to_jvalue(), indent=2)) - -results = query.eval(BuiltInEnv() | CommCareHqEnv(api_client) | JsonPathEnv()) - -if len(list(env.emitted_tables())) > 0: - with StreamingMarkdownTableWriter(sys.stdout) as writer: - for table in env.emitted_tables(): - writer.write_table(table) -``` - -Which will output JSON equivalent to this: - -```json -{ - "Emit": { - "headings": [ - { - "Lit": "Received On" - }, - { - "Lit": "Gender" - } - ], - "source": { - "Map": { - "body": { - "List": [ - { - "Ref": "received_on" - }, - { - "Ref": "form.gender" - } - ] - }, - "name": null, - "source": { - "Apply": { - "args": [ - { - "Lit": "form" - }, - { - "Lit": { - "filter": { - "term": { - 
"app_id": "whatever" - } - } - } - } - ], - "fn": { - "Ref": "api_data" - } - } - } - } - }, - "table": "demo-table" - } -} -``` +Example query files are provided in the [examples/](examples/) directory for both Excel and JSON formats. All examples work with the CommCare Demo App available on the CommCare HQ Exchange. - -MiniLinq Reference ------------------- - -The abstract syntax can be directly inspected in the `commcare_export.minilinq` module. Note that the choice between functions and primitives is deliberately chosen -to expose the structure of the MiniLinq for possible optimization, and to restrict the overall language. - -Here is a description of the astract syntax and semantics - -| Python | JSON | Which is evaluates to | -|-------------------------------|-----------------------------------------------------|----------------------------------| -| `Literal(v)` | `{"Lit": v}` | Just `v` | -| `Reference(x)` | `{"Ref": x}` | Whatever `x` resolves to in the environment | -| `List([a, b, c, ...])` | `{"List": [a, b, c, ...}` | The list of what `a`, `b`, `c` evaluate to | -| `Map(source, name, body)` | `{"Map": {"source": ..., "name": ..., "body": ...}` | Evals `body` for each elem in `source`. If `name` is provided, the elem will be bound to it, otherwise it will replace the whole env. | -| `FlatMap(source, name, body)` | `{"FlatMap": {"source" ... etc}}` | Flattens after mapping, like nested list comprehensions | -| `Filter(source, name, body)` | etc | | -| `Bind(value, name, body)` | etc | Binds the result of `value` to `name` when evaluating `body` | -| `Emit(table, headings, rows)` | etc | Emits `table` with `headings` and `rows`. Note that `table` is a string, `headings` is a list of expressions, and `rows` is a list of lists of expressions. See explanation below for emitted output. | -| `Apply(fn, args)` | etc | Evaluates `fn` to a function, and all of `args`, then applies the function to the args. 
| - -Built in functions like `api_data` and basic arithmetic and comparison are provided via the environment, -referred to be name using `Ref`, and utilized via `Apply`. - -List of builtin functions: - -| Function | Description | Example Usage | -|--------------------------------|--------------------------------------------------------------------------------|----------------------------------| -| `+, -, *, //, /, >, <, >=, <=` | Standard Math | | -| len | Length | | -| bool | Bool | | -| str2bool | Convert string to boolean. True values are 'true', 't', '1' (case insensitive) | | -| str2date | Convert string to date | | -| bool2int | Convert boolean to integer (0, 1) | | -| str2num | Parse string as a number | | -| format-uuid | Parse a hex UUID, and format it into hyphen-separated groups | | -| substr | Returns substring indexed by [first arg, second arg), zero-indexed. | substr(2, 5) of 'abcdef' = 'cde' | -| selected-at | Returns the Nth word in a string. N is zero-indexed. | selected-at(3) - return 4th word | -| selected | Returns True if the given word is in the value. | selected(fever) | -| count-selected | Count the number of words | | -| json2str | Convert a JSON object to a string | | -| template | Render a string template (not robust) | template({} on {}, state, date) | -| attachment_url | Convert an attachment name into it's download URL | | -| form_url | Output the URL to the form view on CommCare HQ | | -| case_url | Output the URL to the case view on CommCare HQ | | -| unique | Ouptut only unique values in a list | | - -Output Formats --------------- - -Your MiniLinq may define multiple tables with headings in addition to their body rows by using `Emit` -expressions, or may simply return the results of a single query. - -If your MiniLinq does not contain any `Emit` expressions, then the results of the expression will be -printed to standard output as pretty-printed JSON. 
- -If your MiniLinq _does_ contain `Emit` expressions, then there are many formats available, selected -via the `--output-format ` option, and it can be directed to a file with the `--output ` command-line option. - - - `csv`: Each table will be a CSV file within a Zip archive. - - `xls`: Each table will be a sheet in an old-format Excel spreadsheet. - - `xlsx`: Each table will be a sheet in a new-format Excel spreadsheet. - - `json`: The tables will each be a member of a JSON dictionary, printed to standard output - - `markdown`: The tables will be streamed to standard output in Markdown format (very handy for debugging your queries) - - `sql`: All data will be idempotently "upserted" into the SQL database you specify, including creating the needed tables and columns. - - -Dependencies ------------- - -Required dependencies will be automatically installed. Optional dependencies -for specific export formats can be installed as extras: +Try them out: ```shell -# To export "xlsx" -$ uv pip install "commcare-export[xlsx]" - -# To export "xls" -$ uv pip install "commcare-export[xls]" - -# To sync with a Postgres database -$ uv pip install "commcare-export[postgres]" - -# To sync with a mysql database -$ uv pip install "commcare-export[mysql]" - -# To sync with a database which uses odbc (e.g. mssql) -$ uv pip install "commcare-export[odbc]" - -# To sync with another SQL database supported by SQLAlchemy -$ uv pip install "commcare-export[base_sql]" -# Then install the Python package for your database +commcare-export \ + --query examples/demo-deliveries.xlsx \ + --project YOUR_PROJECT \ + --output-format markdown ``` + Contributing ------------ -0\. Sign up for GitHub, if you have not already, at https://github.com. - -1\. Fork the repository at https://github.com/dimagi/commcare-export. +We welcome contributions! Please see [CONTRIBUTING.md](CONTRIBUTING.md) for: -2\. 
Clone your fork, install into a virtualenv, and start a feature branch +- How to set up your development environment +- Code style guidelines +- Testing requirements +- Pull request process +- Release procedures -```shell -$ git clone git@github.com:your-username/commcare-export.git -$ cd commcare-export -$ uv venv -$ source .venv/bin/activate # On Windows: .venv\Scripts\activate -$ uv pip install -e ".[test]" -$ git checkout -b my-super-duper-feature -``` - -3\. Make your edits. -4\. Make sure the tests pass. The best way to test for all versions is to sign up for https://travis-ci.org and turn on automatic continuous testing for your fork. +Community +--------- -```shell -$ py.test -=============== test session starts =============== -platform darwin -- Python 2.7.3 -- pytest-2.3.4 -collected 17 items - -tests/test_commcare_minilinq.py . -tests/test_excel_query.py .... -tests/test_minilinq.py ........ -tests/test_repeatable_iterator.py . -tests/test_writers.py ... - -============ 17 passed in 2.09 seconds ============ -``` +- **Changelog**: See [GitHub Releases](https://github.com/dimagi/commcare-export/releases) for version history +- **Issues**: Report bugs or request features on [GitHub Issues](https://github.com/dimagi/commcare-export/issues) +- **Questions**: Check the [User Documentation](https://dimagi.atlassian.net/wiki/spaces/commcarepublic/pages/2143955952/CommCare+Data+Export+Tool+DET) or open an issue -5\. Type hints are used in the `env` and `minilinq` modules. Check that any changes in those modules adhere to those types: -```shell -$ mypy --install-types @mypy_typed_modules.txt -``` - -6\. Push the feature branch up - -```shell -$ git push -u origin my-super-duper-feature -``` - -7\. Visit https://github.com/dimagi/commcare-export and submit a pull request. - -8\. Accept our gratitude for contributing: Thanks! - -Release process +Python Versions --------------- -1\. 
Create a tag for the release - -```shell -$ git tag -a "X.YY.0" -m "Release X.YY.0" -$ git push --tags -``` - -2\. Create the distribution - -```shell -$ uv build -``` - -Ensure that the archives in `dist/` have the correct version number (matching the tag name). - -3\. Upload to pypi - -```shell -$ uv publish -``` - -4\. Verify upload - -https://pypi.python.org/pypi/commcare-export - -5\. Create a release on github - -https://github.com/dimagi/commcare-export/releases - -Once the release is published a GitHub workflow is kicked off that compiles executables of the DET compatible with -Linux and Windows machines, adding it to the release as assets. - -[For Linux-based users] If you decide to download and use the executable file, please make sure the file has the executable permission enabled, -after which it can be invoked like any other executable though the command line. - - -Testing and Test Databases --------------------------- +CommCare Export is tested with Python 3.9, 3.10, 3.11, 3.12, and 3.13. -The following command will run the entire test suite (requires DB environment variables to be set as per below): -```shell -$ py.test -``` - -To run an individual test class or method you can run, e.g.: - -```shell -$ py.test -k "TestExcelQuery" -$ py.test -k "test_get_queries_from_excel" -``` - -To exclude the database tests you can run: - -```shell -$ py.test -m "not dbtest" -``` +License +------- -When running database tests, supported databases are PostgreSQL, MySQL, MSSQL. - -To run tests against selected databases can be done using test marks as follows: -```shell -$ py.test -m [postgres,mysql,mssql] -``` - -Use Docker and docker-compose to start database services for tests: - -1. Start the services: - ```shell - docker-compose up -d - ``` - -2. Wait for services to be healthy: - ```shell - docker-compose ps - ``` - -3. Run your tests. 
The default environment variables in - `tests/conftest.py` work automatically: - - PostgreSQL: `postgresql://postgres@localhost/` - - MySQL: `mysql+pymysql://travis@/` - - MS SQL Server: `mssql+pyodbc://SA:Password-123@localhost/` - - If needed, you can override with environment variables: - ```shell - export POSTGRES_URL='postgresql://postgres@localhost/' - export MYSQL_URL='mysql+pymysql://root@localhost/' - export MSSQL_URL='mssql+pyodbc://SA:Password-123@localhost/' - ``` -4. Stop the services when done: - ```shell - docker-compose down - ``` - To also remove the data volumes: - ```shell - docker-compose down -v - ``` - -> [!NOTE] -> For MS SQL Server tests, you'll need the ODBC Driver for SQL Server -> installed on your host system for the `pyodbc` connection to work. - -From [learn.microsoft.com](https://learn.microsoft.com/en-us/sql/connect/odbc/linux-mac/installing-the-microsoft-odbc-driver-for-sql-server) -([source](https://github.com/MicrosoftDocs/sql-docs/blob/live/docs/connect/odbc/linux-mac/installing-the-microsoft-odbc-driver-for-sql-server.md)) - -#### Debian/Ubuntu - -```shell -# Download the package to configure the Microsoft repo -curl -sSL -O https://packages.microsoft.com/config/debian/$(grep VERSION_ID /etc/os-release | cut -d '"' -f 2 | cut -d '.' -f 1)/packages-microsoft-prod.deb -# Install the package -sudo dpkg -i packages-microsoft-prod.deb -# Delete the file -rm packages-microsoft-prod.deb - -sudo apt-get update -sudo ACCEPT_EULA=Y apt-get install -y msodbcsql18 - -odbcinst -q -d -``` - -#### Mac OS - -```shell -/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install.sh)" -brew tap microsoft/mssql-release https://github.com/Microsoft/homebrew-mssql-release -brew update -HOMEBREW_ACCEPT_EULA=Y brew install msodbcsql18 -``` - - -Integration Tests ------------------ -Running the integration tests requires API credentials from CommCare HQ -that have access to the `corpora` domain. 
This user should only have -access to the corpora domain. - -These need to be set as environment variables as follows: - -```shell -$ export HQ_USERNAME= -$ export HQ_API_KEY= -``` +MIT License - see [LICENSE](LICENSE) file for details. -For Travis builds these are included as encrypted vars in the travis -config. +Copyright (c) 2013-2026 Dimagi diff --git a/docs/database-integration.md b/docs/database-integration.md new file mode 100644 index 0000000..94b3956 --- /dev/null +++ b/docs/database-integration.md @@ -0,0 +1,373 @@ +# Database Integration + +*Part of [Technical Documentation](index.md)* + +CommCare Export can export data directly to SQL databases, automatically +creating tables and columns as needed, and supporting incremental +updates via checkpoints. + + +Overview +-------- + +When using the SQL output format, CommCare Export will: + +1. Connect to your database using SQLAlchemy +2. Automatically create tables that don't exist +3. Automatically add columns that don't exist +4. "Upsert" data (update existing rows, insert new ones) +5. Track export progress using checkpoints for incremental syncs + + +Connection Strings +------------------ + +CommCare Export uses SQLAlchemy's +[create_engine](http://docs.sqlalchemy.org/en/latest/core/engines.html), +which follows the [RFC-1738](https://www.ietf.org/rfc/rfc1738.txt) URL +format. 
+ +### PostgreSQL + +```shell +# Basic format +postgresql://username:password@host:port/database + +# With psycopg2 driver (recommended) +postgresql+psycopg2://username:password@localhost/mydatabase + +# Example +commcare-export --query forms.xlsx --output-format sql \ + --output postgresql+psycopg2://scott:tiger@localhost/mydatabase +``` + +**Installation:** +```shell +uv pip install "commcare-export[postgres]" +``` + +### MySQL + +```shell +# Basic format +mysql://username:password@host:port/database + +# With pymysql driver (recommended) +mysql+pymysql://username:password@localhost/mydatabase + +# Example +commcare-export --query forms.xlsx --output-format sql \ + --output mysql+pymysql://scott:tiger@localhost/mydatabase +``` + +**Installation:** +```shell +uv pip install "commcare-export[mysql]" +``` + +### MS SQL Server + +```shell +# With pyodbc driver +mssql+pyodbc://username:password@host/database?driver=ODBC+Driver+17+for+SQL+Server + +# Example +commcare-export --query forms.xlsx --output-format sql \ + --output 'mssql+pyodbc://SA:Password-123@localhost/mydatabase?driver=ODBC+Driver+17+for+SQL+Server' +``` + +**Installation:** +```shell +uv pip install "commcare-export[odbc]" +``` + +> [!NOTE] +> Requires the ODBC Driver for SQL Server to be installed on your +> system. See [Testing Guide](testing.md#odbc-driver-installation) for +> instructions. + +### Other Databases + +For other SQLAlchemy-supported databases: + +```shell +# Install base SQL support +uv pip install "commcare-export[base_sql]" + +# Then install your database's Python driver +uv pip install your-database-driver +``` + +Refer to +[SQLAlchemy's documentation](http://docs.sqlalchemy.org/en/latest/core/engines.html) +for connection string formats. 
+ + +Schema Management +----------------- + +### Automatic Table Creation + +When you first run an export to a database, CommCare Export will +automatically create tables for each `Emit` expression in your query: + +```shell +commcare-export --query forms.xlsx --output-format sql \ + --output postgresql://user:pass@localhost/mydb +``` + +If the `forms` table doesn't exist, it will be created with columns +matching your query. + +### Automatic Column Addition + +If you add new fields to your query, CommCare Export will automatically +add the corresponding columns to existing tables on the next run. + +**Example:** + +1. First run with columns: `patient_id`, `name` +2. Add `age` column to your Excel query +3. Next run automatically adds the `age` column to the database table + +### Column Types + +CommCare Export attempts to infer appropriate column types based on the +data: + +- Text fields: `TEXT` or `VARCHAR` +- Numbers: `INTEGER` or `NUMERIC` +- Dates: `TIMESTAMP` +- Booleans: `BOOLEAN` + +> [!NOTE] +> Type inference happens on first creation. If types are incorrect, drop +> the table and re-run to recreate it. + + +Upsert Behavior +--------------- + +CommCare Export performs "upserts" - it updates existing rows and +inserts new ones. + +### Row Identification + +Rows are identified by a composite key, typically including: + +- For forms: `form_id` (or equivalent unique identifier) +- For cases: `case_id` + +### Update vs Insert + +- If a row with the same key exists: **UPDATE** - Replace all column + values +- If no row with the key exists: **INSERT** - Add new row + +This means: +- Re-running exports is safe (no duplicates) +- Updated data in CommCare HQ will update the database +- Deleted data in CommCare HQ will remain in the database (exports don't + delete) + + +Checkpoints +----------- + +Checkpoints enable incremental exports by tracking the last successfully +exported data. + +### How Checkpoints Work + +1. First run: Export all data, save checkpoint +2. 
Subsequent runs: Export only data since last checkpoint +3. Checkpoint updated after successful export + +### Checkpoint Storage + +Checkpoints are stored in the database itself, in tables like: + +- `commcare_export_runs` - Track export runs +- Other checkpoint tables as needed + +### Manual Date Control + +You can override checkpoint behavior with command-line flags: + +```shell +# Export data since a specific date +commcare-export --query forms.xlsx --output-format sql \ + --output postgresql://user:pass@localhost/mydb \ + --since 2023-01-01 + +# Export data in a date range +commcare-export --query forms.xlsx --output-format sql \ + --output postgresql://user:pass@localhost/mydb \ + --since 2023-01-01 --until 2023-12-31 + +# Start fresh (ignore checkpoint) +commcare-export --query forms.xlsx --output-format sql \ + --output postgresql://user:pass@localhost/mydb \ + --start-over +``` + +### Checkpoint Files + +For non-SQL outputs, you can use checkpoint files: + +```shell +commcare-export --query forms.xlsx --output-format xlsx \ + --output data.xlsx \ + --since 2023-01-01 \ + --checkpoint-file checkpoint.json +``` + +This saves checkpoint state to a JSON file for the next run. + + +Performance Considerations +-------------------------- + +### Index Creation + +CommCare Export does not automatically create indexes. For better query +performance, create indexes on frequently queried columns: + +```sql +-- PostgreSQL example +CREATE INDEX idx_patient_id ON forms (patient_id); +CREATE INDEX idx_received_on ON forms (received_on); +``` + +### Large Datasets + +For very large exports: + +1. **Use --since flag**: Only export recent data on subsequent runs +2. **Use checkpoints**: Enable automatic incremental exports +3. **Batch size**: The tool handles pagination automatically +4. **Database tuning**: Configure your database for bulk inserts + +### Connection Pooling + +For repeated exports, the tool creates a new connection each time. 
For +high-frequency exports, consider using a connection pooler like +PgBouncer (PostgreSQL). + + +Troubleshooting +--------------- + +### Connection Issues + +**Problem:** Can't connect to database + +**Solutions:** +- Verify database is running and accessible +- Check connection string format (quote special characters) +- Test connection string with a simple SQLAlchemy script +- Check firewall rules and network connectivity +- Verify username/password are correct + +### Column Type Mismatches + +**Problem:** Data doesn't fit in column + +**Solutions:** +- Drop and recreate the table +- Manually alter the column type +- Update your query to transform data appropriately + +### Permission Errors + +**Problem:** User lacks permission to create tables/columns + +**Solutions:** +- Grant appropriate permissions (CREATE, ALTER, INSERT, UPDATE) +- Use a database superuser for initial setup +- Pre-create tables with appropriate schema + +### Duplicate Key Errors + +**Problem:** Multiple rows with same key in a single export + +**Solutions:** +- Check your query for duplicate data +- Ensure your data source filters are correct +- Review the MiniLinq query structure + + +Security Best Practices +----------------------- + +1. **Use environment variables** for passwords: + ```shell + export DB_URL='postgresql://user:password@localhost/db' + commcare-export --query forms.xlsx --output-format sql --output "$DB_URL" + ``` + +2. **Use read-only credentials** when possible (for queries, not exports) + +3. **Limit network access** to the database + +4. **Use SSL/TLS connections** for remote databases: + ``` + postgresql://user:pass@host/db?sslmode=require + ``` + +5. 
**Avoid putting passwords in scripts** - use environment variables or + credential files + + +Example Workflows +----------------- + +### Initial Setup + +```shell +# Install with PostgreSQL support +uv pip install "commcare-export[postgres]" + +# First export (creates tables and exports all data) +commcare-export \ + --project myproject \ + --query forms.xlsx \ + --output-format sql \ + --output postgresql://user:pass@localhost/mydb +``` + +### Scheduled Incremental Updates + +```shell +# Subsequent runs (exports only new data since last run) +commcare-export \ + --project myproject \ + --query forms.xlsx \ + --output-format sql \ + --output postgresql://user:pass@localhost/mydb + +# Checkpoints are automatic - only new/modified data is exported +``` + +### Complete Refresh + +```shell +# Drop the table +psql -c "DROP TABLE forms;" mydb + +# Re-run the export +commcare-export \ + --project myproject \ + --query forms.xlsx \ + --output-format sql \ + --output postgresql://user:pass@localhost/mydb +``` + + +See Also +-------- + +- [Output Formats](output-formats.md) - All available output formats +- [Scheduling](scheduling.md) - Automating regular exports +- [Testing Guide](testing.md) - Database testing and ODBC setup +- [User and Location Data](user-location-data.md) - Exporting organizational data diff --git a/docs/development.md b/docs/development.md new file mode 100644 index 0000000..1e77ada --- /dev/null +++ b/docs/development.md @@ -0,0 +1,418 @@ +Development Guide +================= + +*Part of [Technical Documentation](index.md)* + +This guide covers setting up a development environment for CommCare +Export and understanding the codebase structure. + + +Setting Up Development Environment +---------------------------------- + +> [!NOTE] +> This guide provides detailed technical setup information. For a quick +> start guide to contributing, see +> [CONTRIBUTING.md](../CONTRIBUTING.md). 
+ +### Prerequisites + +- Python 3.9 or higher +- Git +- [uv](https://docs.astral.sh/uv/) + +### Installation Steps + +1. Fork and clone the repository: + ```shell + git clone git@github.com:your-username/commcare-export.git + cd commcare-export + ``` + +2. Create and activate a virtual environment: + ```shell + uv venv + source .venv/bin/activate # On Windows: .venv\Scripts\activate + ``` + +3. Install in development mode with test dependencies: + ```shell + uv pip install -e ".[test]" + ``` + +4. Verify the installation: + ```shell + commcare-export --version + pytest --version + mypy --version + ``` + +### Optional Dependencies + +For specific database or output format support: + +```shell +# PostgreSQL support +uv pip install -e ".[postgres]" + +# MySQL support +uv pip install -e ".[mysql]" + +# MS SQL Server support +uv pip install -e ".[odbc]" + +# Excel output support +uv pip install -e ".[xlsx,xls]" + +# Everything (for comprehensive development) +uv pip install -e ".[test,postgres,mysql,odbc,xlsx,xls]" +``` + + +Project Structure +----------------- + +### Main Package: `commcare_export/` + +**Core Modules:** + +- `cli.py` (558 lines) - Command-line interface implementation + - Argument parsing + - Main entry point + - Command orchestration + +- `minilinq.py` (593 lines) - MiniLinq query language core + - Abstract syntax tree (AST) definitions + - Query evaluation engine + - Core language primitives + +- `env.py` (639 lines) - Execution environments + - Built-in function environment + - JSON path environment + - Environment composition + +- `excel_query.py` (724 lines) - Excel query parsing + - Workbook parsing + - Query compilation to MiniLinq + - Excel-specific logic + +- `commcare_minilinq.py` (330 lines) - CommCare-specific extensions + - CommCare HQ environment + - API data functions + - Pagination handling + +- `commcare_hq_client.py` (333 lines) - REST API client + - HTTP client for CommCare HQ + - Authentication handling + - Resource iteration + +- 
`writers.py` (629 lines) - Output format writers + - CSV, Excel, JSON, Markdown writers + - SQL writer with upsert logic + - Streaming and buffered writers + +- `checkpoint.py` (523 lines) - Checkpoint management + - Checkpoint storage and retrieval + - Incremental export state tracking + - Multiple checkpoint strategies + +**Supporting Modules:** + +- `builtin_queries.py` - Pre-built queries for users/locations +- `utils.py`, `misc.py` - Utility functions +- `exceptions.py` - Custom exception types +- `data_types.py` - Data type definitions +- `jsonpath_utils.py` - JSON path utilities +- `repeatable_iterator.py` - Iterator utilities +- `specs.py` - Query specifications +- `version.py` - Version management +- `map_format.py` - Data format mapping +- `location_info_provider.py` - Location data handling +- `utils_cli.py` - CLI utilities + +### Other Directories + +- `tests/` - Test suite +- `migrations/` - Alembic database migrations +- `build_exe/` - Executable building configuration +- `examples/` - Example queries and scripts +- `docs/` - Technical documentation (this directory) + + +Code Organization +----------------- + +### MiniLinq Architecture + +The MiniLinq query language has three main components: + +1. **AST (`minilinq.py`)**: Abstract syntax tree defining query + structure + - Literal, Reference, List, Map, FlatMap, Filter, Bind, Emit, Apply + +2. **Evaluation (`minilinq.py`, `env.py`)**: Query execution engine + - Environment-based evaluation + - Lazy evaluation where possible + - Composable environments + +3. 
**Extensions (`commcare_minilinq.py`)**: CommCare-specific functions + - API data fetching + - Pagination + - CommCare-specific built-ins + +### Data Flow + +``` +Excel/JSON Query + ↓ + Parse/Load (excel_query.py) + ↓ + MiniLinq AST (minilinq.py) + ↓ + Evaluate with Env (env.py + commcare_minilinq.py) + ↓ + Fetch Data (commcare_hq_client.py) + ↓ + Transform Data (minilinq.py evaluation) + ↓ + Write Output (writers.py) +``` + +### Key Design Patterns + +1. **Environment Pattern**: Functions and data sources provided via + composed environments + +2. **Visitor Pattern**: AST traversal for evaluation and transformation + +3. **Strategy Pattern**: Multiple writers, pagination strategies, + checkpoint managers + +4. **Builder Pattern**: Query construction from Excel/JSON sources + + +Code Style +---------- + +The project follows standard Python conventions: + +- PEP 8 style guide +- Clear, descriptive names +- Docstrings for public functions + + +Type Hints and Type Checking +----------------------------- + +Type hints are treated as documentation, and as such are used sparingly. +Use type hints when: + +* A parameter's type is not obvious from its name +* It would be useful to know a parameter's class +* A function's or method's return value is not obvious from its name + +### Guidelines + +**When to add type hints:** +- Complex data structures +- Functions with non-obvious return types +- Public API methods and functions +- Callbacks and higher-order functions + +**Best practices:** +- If you add a type hint to one parameter, add hints to all parameters + and the return value for readability +- Use type aliases (e.g., + `CredentialsType = tuple[UsernameType, PasswordType]`) where they + clarify the purpose of a type +- As with documentation, don't add type hints where the type is obvious.
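Here is a short sketch of these guidelines in practice. The aliases name the parts of a credential pair. Because one parameter is annotated, the return value is annotated too.

A few assumptions to note:

- `UsernameType`, `PasswordType`, `CredentialsType` and `basic_auth_header` are hypothetical names for illustration, not code from `commcare_export`.
- The aliases are written as plain assignments because the `type X = ...` statement only exists from Python 3.12, while the project supports Python 3.9 and later.

```python
import base64

# Hypothetical aliases: they say *what* each string is, not just that it is a
# string. Plain assignments (rather than the 3.12+ `type` statement) keep this
# valid on Python 3.9, where builtin generics such as tuple[...] already work.
UsernameType = str
PasswordType = str
CredentialsType = tuple[UsernameType, PasswordType]


def basic_auth_header(credentials: CredentialsType) -> dict[str, str]:
    """Return an HTTP Basic ``Authorization`` header for ``credentials``."""
    username, password = credentials
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return {"Authorization": f"Basic {token}"}


print(basic_auth_header(("user", "pass")))
# {'Authorization': 'Basic dXNlcjpwYXNz'}
```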
+ +### Running Type Checks + +After making changes to typed modules, ensure type correctness: + +```shell +# Check all modules +mypy --install-types commcare_export/ tests/ migrations/ + +# Check specific file +mypy commcare_export/env.py +``` + + +Making Changes +-------------- + +### Feature Development Workflow + +1. **Create a feature branch** from `master`: + ```shell + git checkout -b my-super-duper-feature + ``` + +2. **Make your changes** following the code style guidelines + +3. **Write tests** for your changes: + - Add tests to the appropriate test file in `tests/` + - Ensure new features have test coverage + - Run tests locally: `pytest` + +4. **Test your changes**: + ```shell + # Run all tests + pytest + + # Run specific test file + pytest tests/test_minilinq.py + ``` + For detailed testing instructions, database setup, and + troubleshooting, see the [Testing Guide](testing.md). + +5. **Check type hints** (if modifying typed modules): + ```shell + mypy --install-types commcare_export/ tests/ migrations/ + ``` + +6. **Commit your changes** with clear messages: + ```shell + git add . + git commit -m "Add feature: clear description of what you did" + ``` + +7. **Push to your fork**: + ```shell + git push -u origin my-super-duper-feature + ``` + +8.
**Submit a pull request**: + - Visit https://github.com/dimagi/commcare-export + - Create a pull request from your branch to `master` + - Fill out the PR description template + - Wait for CI checks and code review + +### Bug Fix Workflow + +Follow the same workflow as features, with these notes: +- Branch name: `fix-issue-123` or `fix-bug-description` +- Commit message: "Fix bug where [description]" +- Include a test that reproduces the bug and verifies the fix +- Reference the issue number in your PR description + +### Best Practices + +- **Keep changes focused**: One feature or bug fix per PR +- **Write good commit messages**: Clear, concise, and descriptive +- **Update documentation**: If behavior changes, update relevant docs +- **Run tests frequently**: Catch issues early +- **Ask for help**: Open a draft PR if you need feedback + + +Building Executables +-------------------- + +CommCare Export can be compiled to standalone executables for Linux and +Windows using PyInstaller. + +See [build_exe/README.md](../build_exe/README.md) for detailed +instructions. + +Quick build: + +```shell +cd build_exe +pip install -r requirements.txt +pyinstaller --clean commcare-export.spec +``` + + +Database Migrations +------------------- + +The project uses Alembic for database schema migrations (for checkpoint +tables). + +See [migrations/README.md](../migrations/README.md) for migration +instructions. 
+ + +CI/CD +----- + +### GitHub Actions + +The project uses GitHub Actions for continuous integration: + +- **test.yml**: Runs tests on Python 3.9-3.13 across multiple platforms +- **release_actions.yml**: Builds executables on release + + +Debugging Tips +-------------- + +### Using pdb + +```python +# Add to code where you want to debug +import pdb; pdb.set_trace() +``` + +### Verbose Output + +```shell +# See detailed API requests and responses +commcare-export --query forms.xlsx --output-format markdown --verbose + +# See compiled MiniLinq query +commcare-export --query forms.xlsx --dump-query +``` + +### Test with Small Data + +```shell +# Limit date range for faster iteration +commcare-export --query forms.xlsx \ + --output-format markdown \ + --since 2023-01-01 --until 2023-01-02 +``` + + +Common Development Tasks +------------------------ + +### Adding a New Built-in Function + +1. Add the function to `env.py` in the `BuiltInEnv` class +2. Add tests in `tests/test_minilinq.py` +3. Document in `docs/minilinq-reference.md` + +### Adding a New Output Format + +1. Create a new writer class in `writers.py` inheriting from `TableWriter` +2. Implement required methods: `write_table()`, etc. +3. Register in CLI (`cli.py`) +4. Add tests in `tests/test_writers.py` +5. Document in `docs/output-formats.md` + +### Adding a New API Resource + +1. Add resource handling in `commcare_minilinq.py` +2. Add pagination support if needed +3. Add integration tests +4. 
Document usage + + +Resources +--------- + +- [SQLAlchemy Documentation](https://docs.sqlalchemy.org/) +- [pytest Documentation](https://docs.pytest.org/) +- [mypy Documentation](https://mypy.readthedocs.io/) +- [CommCare API Documentation](https://confluence.dimagi.com/display/commcarepublic/Data+APIs) + + +See Also +-------- + +- [Testing Guide](testing.md) - Running tests and test infrastructure +- [CONTRIBUTING.md](../CONTRIBUTING.md) - Contribution guidelines +- [Library Usage](library-usage.md) - Using commcare-export as a library diff --git a/docs/index.md b/docs/index.md new file mode 100644 index 0000000..84e55c7 --- /dev/null +++ b/docs/index.md @@ -0,0 +1,74 @@ +CommCare Export Technical Documentation +======================================= + +Welcome to the CommCare Export technical documentation. This +documentation is intended for developers who want to use +commcare-export as a Python library, contribute to the project, or +understand its internals. + +For end-user documentation about installing and using the command-line +tool, please see the +[User Documentation](https://dimagi.atlassian.net/wiki/spaces/commcarepublic/pages/2143955952/CommCare+Data+Export+Tool+DET). 
+ + +Quick Links +----------- + +- [Python Library Usage](library-usage.md) - Get started using + commcare-export as a library +- [MiniLinq Reference](minilinq-reference.md) - Query language + documentation +- [Contributing Guide](../CONTRIBUTING.md) - How to contribute to the + project + + +For Library Users +----------------- + +- [Python Library Usage](library-usage.md) - Using commcare-export as a + Python library +- [MiniLinq Reference](minilinq-reference.md) - Query language syntax + and built-in functions +- [API Client](library-usage.md#commcare-hq-api-client) - CommCare HQ + REST API client usage + + +Query Specifications +-------------------- + +- [Query Formats](query-formats.md) - Excel and JSON query formats +- [Output Formats](output-formats.md) - CSV, Excel, JSON, SQL, and + Markdown outputs + + +Advanced Topics +--------------- + +- [Database Integration](database-integration.md) - SQL database + connections and syncing +- [User and Location Data](user-location-data.md) - Exporting + organization data +- [Scheduling](scheduling.md) - Running DET on a schedule + + +Development +----------- + +- [Development Guide](development.md) - Setting up development + environment +- [Testing Guide](testing.md) - Running tests with multiple databases +- [Building Executables](../build_exe/README.md) - Creating standalone + binaries +- [Database Migrations](../migrations/README.md) - Using Alembic + migrations + + +Contributing +------------ + +See [CONTRIBUTING.md](../CONTRIBUTING.md) for information about: + +- Setting up your development environment +- Running tests +- Submitting pull requests +- Release process diff --git a/docs/library-usage.md b/docs/library-usage.md new file mode 100644 index 0000000..f46bdfc --- /dev/null +++ b/docs/library-usage.md @@ -0,0 +1,209 @@ +Python Library Usage +==================== + +*Part of [Technical Documentation](index.md)* + +As a library, the various `commcare_export` modules make it easy to: + +- Interact with the 
CommCare HQ REST API +- Execute MiniLinq queries against the API (a very simple query + language, described in the + [MiniLinq Reference](minilinq-reference.md)) +- Load and save JSON representations of MiniLinq queries +- Compile Excel configurations to MiniLinq queries + + +CommCare HQ API Client +---------------------- + +To directly access the CommCare HQ REST API: + +```python +from commcare_export.checkpoint import CheckpointManagerWithDetails +from commcare_export.commcare_hq_client import CommCareHqClient, AUTH_MODE_APIKEY +from commcare_export.commcare_minilinq import get_paginator, PaginationMode + +username = 'some@username.com' +domain = 'your-awesome-domain' +hq_host = 'https://commcarehq.org' +api_key = 'your_secret_api_key' + +api_client = CommCareHqClient(hq_host, domain, username, api_key, AUTH_MODE_APIKEY) +case_paginator = get_paginator(resource='case', pagination_mode=PaginationMode.date_modified) +case_paginator.init() +checkpoint_manager = CheckpointManagerWithDetails(None, None, PaginationMode.date_modified) + +cases = api_client.iterate('case', case_paginator, checkpoint_manager=checkpoint_manager) + +for case in cases: + print(case['case_id']) + +``` + +### Authentication Modes + +The `CommCareHqClient` supports two authentication modes: + +- `AUTH_MODE_PASSWORD` - Username and password authentication +- `AUTH_MODE_APIKEY` - API key authentication (recommended) + +### Pagination + +The library provides different pagination strategies through the +`PaginationMode` enum: + +- `PaginationMode.date_modified` - Paginate by date modified + (recommended for cases) +- `PaginationMode.date_indexed` - Paginate by date indexed +- Other modes are available in `commcare_minilinq.py` + + +Executing MiniLinq Queries +-------------------------- + +To build a MiniLinq query, print its JSON serialization, and then +run it against the API: + +```python +import json +import sys +from commcare_export.minilinq import * +from commcare_export.commcare_hq_client
import CommCareHqClient +from commcare_export.commcare_minilinq import CommCareHqEnv +from commcare_export.env import BuiltInEnv, JsonPathEnv +from commcare_export.writers import StreamingMarkdownTableWriter + +api_client = CommCareHqClient( + url="https://www.commcarehq.org", + project='your_project', + username='your_username', + password='password', + version='0.5' +) + +source = Map( + source=Apply( + Reference("api_data"), + Literal("form"), + Literal({"filter": {"term": {"app_id": "whatever"}}}) + ), + body=List([ + Reference("received_on"), + Reference("form.gender"), + ]) +) + +query = Emit( + 'demo-table', + [ + Literal('Received On'), + Literal('Gender') + ], + source +) + +print(json.dumps(query.to_jvalue(), indent=2)) + +env = BuiltInEnv() | CommCareHqEnv(api_client) | JsonPathEnv() +results = query.eval(env) + +if len(list(env.emitted_tables())) > 0: + with StreamingMarkdownTableWriter(sys.stdout) as writer: + for table in env.emitted_tables(): + writer.write_table(table) +``` + +The `print` call outputs JSON equivalent to this: + +```json +{ + "Emit": { + "headings": [ + { + "Lit": "Received On" + }, + { + "Lit": "Gender" + } + ], + "source": { + "Map": { + "body": { + "List": [ + { + "Ref": "received_on" + }, + { + "Ref": "form.gender" + } + ] + }, + "name": null, + "source": { + "Apply": { + "args": [ + { + "Lit": "form" + }, + { + "Lit": { + "filter": { + "term": { + "app_id": "whatever" + } + } + } + } + ], + "fn": { + "Ref": "api_data" + } + } + } + } + }, + "table": "demo-table" + } +} +``` + + +Environment Composition +----------------------- + +The MiniLinq query evaluation relies on composing multiple environments: + +- `BuiltInEnv()` - Provides built-in functions like math, string + operations, etc.
+- `CommCareHqEnv(api_client)` - Provides the `api_data` function for + fetching from CommCare HQ +- `JsonPathEnv()` - Provides JSON path navigation (e.g., `form.gender`) + +These are composed using the `|` operator: + +```python +env = BuiltInEnv() | CommCareHqEnv(api_client) | JsonPathEnv() +results = query.eval(env) +``` + + +Module Overview +--------------- + +The main modules in the `commcare_export` package: + +- `commcare_hq_client` - REST API client for CommCare HQ +- `minilinq` - MiniLinq query language implementation +- `commcare_minilinq` - CommCare-specific MiniLinq extensions +- `env` - Execution environments for MiniLinq queries +- `excel_query` - Excel query parsing and compilation +- `writers` - Output format writers (CSV, Excel, SQL, JSON, Markdown) +- `checkpoint` - Checkpoint management for incremental exports +- `cli` - Command-line interface implementation + + +See Also +-------- + +- [MiniLinq Reference](minilinq-reference.md) - Complete language reference +- [Query Formats](query-formats.md) - Excel and JSON query specifications +- [Output Formats](output-formats.md) - Available output formats diff --git a/docs/minilinq-reference.md b/docs/minilinq-reference.md new file mode 100644 index 0000000..658d267 --- /dev/null +++ b/docs/minilinq-reference.md @@ -0,0 +1,240 @@ +MiniLinq Reference +================== + +*Part of [Technical Documentation](index.md)* + +MiniLinq is a simple query language for extracting and transforming data +from CommCare HQ. It can be expressed in both Python (for library +users) and JSON (for serialization and Excel compilation). + +The abstract syntax can be directly inspected in the +`commcare_export.minilinq` module. Note that the split between +functions and primitives is deliberate: it exposes the structure +of the MiniLinq for possible optimization and restricts the overall +language.
+
+
+Abstract Syntax
+---------------
+
+Here is a description of the abstract syntax and semantics:
+
+| Python | JSON | Evaluates to |
+|-------------------------------|----------------------------------------------------------|------------------------------------------------------------------|
+| `Literal(v)` | `{"Lit": v}` | Just `v` |
+| `Reference(x)` | `{"Ref": x}` | Whatever `x` resolves to in the environment |
+| `List([a, b, c, ...])` | `{"List": [a, b, c, ...]}` | The list of what `a`, `b`, `c` evaluate to |
+| `Map(source, name, body)` | `{"Map": {"source": ..., "name": ..., "body": ...}}` | Evaluates `body` for each element of `source`. If `name` is provided, the element is bound to it; otherwise the element replaces the whole environment. |
+| `FlatMap(source, name, body)` | `{"FlatMap": {"source": ..., "name": ..., "body": ...}}` | Flattens after mapping, like nested list comprehensions |
+| `Filter(source, name, body)` | `{"Filter": {"source": ..., "name": ..., "body": ...}}` | Filters `source`, keeping elements where `body` evaluates to true |
+| `Bind(value, name, body)` | `{"Bind": {"value": ..., "name": ..., "body": ...}}` | Binds the result of `value` to `name` when evaluating `body` |
+| `Emit(table, headings, source)` | `{"Emit": {"table": ..., "headings": ..., "source": ...}}` | Emits `table` with `headings` and the rows produced by `source`. Note that `table` is a string, `headings` is a list of expressions, and `source` evaluates to a list of rows, each a list of values. See [Output Formats](output-formats.md) for emitted output. |
+| `Apply(fn, *args)` | `{"Apply": {"fn": ..., "args": [...]}}` | Evaluates `fn` to a function, and all of `args`, then applies the function to the args. |
+
+
+Examples
+--------
+
+### Basic Reference and Literal
+
+```python
+# Reference a field from the current environment
+Reference("form.name")  # JSON: {"Ref": "form.name"}
+
+# Literal value
+Literal("Hello")  # JSON: {"Lit": "Hello"}
+```
+
+### List Construction
+
+```python
+# Create a list of expressions
+List([
+    Reference("form.name"),
+    Reference("form.age"),
+    Reference("form.gender")
+])
+```
+
+### Map Operation
+
+```python
+# Map over a list of forms, extracting specific fields
+Map(
+    source=Apply(Reference("api_data"), Literal("form")),
+    name="form",
+    body=List([
+        Reference("form.id"),
+        Reference("form.name")
+    ])
+)
+```
+
+### Filter Operation
+
+```python
+# Keep only forms where form.completed is the string "true".
+# Apply takes the function followed by its arguments (not a list).
+Filter(
+    source=Apply(Reference("api_data"), Literal("form")),
+    name="form",
+    body=Apply(
+        Reference("="),
+        Reference("form.completed"),
+        Literal("true")
+    )
+)
+```
+
+### Emit for Output
+
+```python
+# Create a table output
+Emit(
+    'patient_data',
+    [
+        Literal('Patient ID'),
+        Literal('Name'),
+        Literal('Age')
+    ],
+    Map(
+        source=Apply(Reference("api_data"), Literal("form")),
+        name="form",
+        body=List([
+            Reference("form.patient_id"),
+            Reference("form.patient_name"),
+            Reference("form.patient_age")
+        ])
+    )
+)
+```
+
+
+Built-in Functions
+------------------
+
+Built-in functions like `api_data` and basic arithmetic and comparison
+are provided via the environment, referred to by name using `Ref`, and
+invoked via `Apply`.
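+
+To make the `Ref`/`Apply` pattern concrete, here is a toy evaluator for
+four of the JSON node types above. It is a hypothetical sketch for
+illustration only: `evaluate` and the dict-based environment are not
+part of `commcare_export`, and the sketch ignores everything the real
+environments add, such as JSON path lookup, `Map` and `Emit`.
+
+```python
+from typing import Any, Callable, Dict
+
+# Hypothetical stand-in for an environment: a plain mapping of names
+# to values and functions (the real environments are richer objects).
+Env = Dict[str, Any]
+
+
+def evaluate(node: Dict[str, Any], env: Env) -> Any:
+    """Evaluate a small subset of JSON MiniLinq: Lit, Ref, List, Apply."""
+    if "Lit" in node:
+        return node["Lit"]
+    if "Ref" in node:
+        # Look the name up in the environment, as {"Ref": x} does.
+        return env[node["Ref"]]
+    if "List" in node:
+        return [evaluate(item, env) for item in node["List"]]
+    if "Apply" in node:
+        # Resolve the function by reference, evaluate every argument,
+        # then call the function with the evaluated arguments.
+        fn: Callable[..., Any] = evaluate(node["Apply"]["fn"], env)
+        args = [evaluate(arg, env) for arg in node["Apply"]["args"]]
+        return fn(*args)
+    raise ValueError(f"Unsupported node: {node!r}")
+
+
+toy_env: Env = {"len": len, "name": "Ada Lovelace"}
+query = {"Apply": {"fn": {"Ref": "len"}, "args": [{"Ref": "name"}]}}
+print(evaluate(query, toy_env))  # prints 12
+```
+
+In the real library, `BuiltInEnv` plays the role of `toy_env`, which is
+why a function such as `len` is looked up with `Ref` rather than called
+directly.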
+
+### Arithmetic and Comparison
+
+| Function | Description |
+|------------------|--------------------------------|
+| `+, -, *, //, /` | Standard arithmetic operations |
+| `>, <, >=, <=` | Comparison operators |
+
+### Type Conversions
+
+| Function | Description | Example Usage |
+|------------|-----------------------------------|-------------------------------------------------|
+| `len` | Length of a string or list | `Apply(Reference("len"), Reference("field"))` |
+| `bool` | Convert to boolean | |
+| `str2bool` | Convert string to boolean. True values are 'true', 't', '1' (case insensitive) | |
+| `str2date` | Convert string to date | |
+| `bool2int` | Convert boolean to integer (0, 1) | |
+| `str2num` | Parse string as a number | |
+
+### String Operations
+
+| Function | Description | Example Usage |
+|---------------|--------------------------------------------------------------|-------------------------------------|
+| `substr` | Returns substring indexed by [first arg, second arg), zero-indexed | `substr(2, 5)` of 'abcdef' = 'cde' |
+| `template` | Render a string template (not robust) | `template("{} on {}", state, date)` |
+| `format-uuid` | Parse a hex UUID, and format it into hyphen-separated groups | |
+| `json2str` | Convert a JSON object to a string | |
+
+### Multi-select Operations
+
+These functions are useful for working with CommCare multi-select
+questions:
+
+| Function | Description | Example Usage |
+|------------------|-----------------------------------------------------|------------------------------------|
+| `selected-at` | Returns the Nth word in a string. N is zero-indexed | `selected-at(3)` - return 4th word |
+| `selected` | Returns True if the given word is in the value | `selected("fever")` |
+| `count-selected` | Count the number of words | |
+
+### CommCare-specific Functions
+
+| Function | Description |
+|------------------|--------------------------------------------------|
+| `attachment_url` | Convert an attachment name into its download URL |
+| `form_url` | Output the URL to the form view on CommCare HQ |
+| `case_url` | Output the URL to the case view on CommCare HQ |
+
+### List Operations
+
+| Function | Description |
+|----------|-------------------------------------|
+| `unique` | Output only unique values in a list |
+
+
+Environment Concepts
+--------------------
+
+MiniLinq queries are evaluated in an environment that provides:
+
+1. **Built-in Functions**: Math, string operations, type conversions
+   (provided by `BuiltInEnv`)
+
+2. **Data Access**: The `api_data` function for fetching from CommCare
+   HQ (provided by `CommCareHqEnv`)
+
+3. **JSON Path Navigation**: Access to nested data structures
+   (provided by `JsonPathEnv`)
+
+Environments are composed using the `|` operator:
+
+```python
+env = BuiltInEnv() | CommCareHqEnv(api_client) | JsonPathEnv()
+```
+
+
+Converting Excel to JSON
+------------------------
+
+If you have an Excel query and want to see the corresponding MiniLinq
+JSON, use the `--dump-query` option:
+
+```shell
+commcare-export --query my-query.xlsx --dump-query
+```
+
+This will output the compiled MiniLinq query in JSON format without
+executing it.
+
+
+Optimization Tips
+-----------------
+
+1. **Filter Early**: Apply filters as early as possible to reduce the
+   amount of data processed
+
+2. **Use Specific API Filters**: Leverage CommCare HQ's API filters
+   (e.g., `date_modified`) rather than filtering in MiniLinq
+
+3. **Minimize Nested Maps**: Deeply nested Map operations can be slow;
+   consider restructuring if possible
+
+
+Debugging Strategies
+--------------------
+
+1. **Use Markdown Output**: Start with `--output-format markdown` to see
+   query results quickly
+
+2. **Dump Query JSON**: Use `--dump-query` to inspect the compiled
+   query
+
+3. **Test with Small Date Ranges**: Use `--since` and `--until` to limit
+   data while debugging
+
+4. **Check API Responses**: Use the `--verbose` flag to see API requests
+   and responses
+
+
+See Also
+--------
+
+- [Python Library Usage](library-usage.md) - Using MiniLinq from Python
+- [Query Formats](query-formats.md) - Excel and JSON query specifications
+- [Examples](../examples/) - Example queries in both Excel and JSON formats
diff --git a/docs/output-formats.md b/docs/output-formats.md
new file mode 100644
index 0000000..6fb6bfb
--- /dev/null
+++ b/docs/output-formats.md
@@ -0,0 +1,300 @@
+Output Formats
+==============
+
+*Part of [Technical Documentation](index.md)*
+
+CommCare Export supports multiple output formats for your exported data.
+The format is selected via the `--output-format` option, and the
+destination can be specified with `--output`.
+
+
+Format Overview
+---------------
+
+Your MiniLinq query may define multiple tables with headings
+(using `Emit` expressions), or may simply return the results of a
+single query:
+
+- **With `Emit` expressions**: Data will be written in the specified
+  format with multiple tables
+- **Without `Emit` expressions**: Results will be output as
+  pretty-printed JSON to standard output
+
+
+Available Formats
+-----------------
+
+### CSV
+
+Each table will be a CSV file within a Zip archive.
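+
+Once produced (see the usage below), the archive can be read back with
+the Python standard library alone. This is a minimal sketch, assuming
+each archive member is a UTF-8 CSV file whose first row holds the
+headings; `read_csv_zip` is a hypothetical helper, not part of
+CommCare Export.
+
+```python
+import csv
+import io
+import zipfile
+from typing import Dict, List
+
+
+def read_csv_zip(path: str) -> Dict[str, List[Dict[str, str]]]:
+    """Return {archive member name: list of row dicts} for each CSV file."""
+    tables: Dict[str, List[Dict[str, str]]] = {}
+    with zipfile.ZipFile(path) as archive:
+        for name in archive.namelist():
+            with archive.open(name) as raw:
+                # ZipFile.open() yields bytes; decode to text for csv.
+                # "utf-8-sig" also strips a byte-order mark if present.
+                text = io.TextIOWrapper(raw, encoding="utf-8-sig", newline="")
+                tables[name] = list(csv.DictReader(text))
+    return tables
+```
+
+For example, `read_csv_zip("data.zip")` returns one list of rows per
+table, keyed by the file name inside the archive.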
+
+**Usage:**
+```shell
+commcare-export --query my-query.xlsx --output-format csv --output data.zip
+```
+
+**Characteristics:**
+- Multiple tables = multiple CSV files in a single ZIP
+- Compatible with all spreadsheet applications
+- Good for sharing data with non-technical users
+- File size can be smaller than Excel formats
+
+**When to use:**
+- Exporting for analysis in R, Python, or other data tools
+- Sharing data with users who don't have Excel
+- When file size is a concern
+
+### XLS
+
+Each table will be a sheet in an old-format Excel spreadsheet (.xls).
+
+**Usage:**
+```shell
+commcare-export --query my-query.xlsx --output-format xls --output data.xls
+```
+
+**Requires:** `uv pip install "commcare-export[xls]"`
+
+**Characteristics:**
+- Legacy Excel format (Excel 97-2003)
+- Row limit: 65,536 rows per sheet
+- Column limit: 256 columns
+- Smaller file size than XLSX
+
+**When to use:**
+- Compatibility with very old Excel versions
+- When file size is critical
+- **Not recommended** for new projects - use XLSX instead
+
+### XLSX
+
+Each table will be a sheet in a new-format Excel spreadsheet (.xlsx).
+
+**Usage:**
+```shell
+commcare-export --query my-query.xlsx --output-format xlsx --output data.xlsx
+```
+
+**Requires:** `uv pip install "commcare-export[xlsx]"`
+
+**Characteristics:**
+- Modern Excel format (Excel 2007+)
+- Row limit: 1,048,576 rows per sheet
+- Column limit: 16,384 columns
+- Widely compatible
+
+**When to use:**
+- Sharing with Excel users
+- When you need multiple related tables in one file
+- Large datasets (within row limits)
+- **Recommended** Excel format for most use cases
+
+### JSON
+
+The tables will each be a member of a JSON dictionary, printed to
+standard output.
+
+**Usage:**
+```shell
+commcare-export --query my-query.xlsx --output-format json > data.json
+```
+
+**Characteristics:**
+- Machine-readable format
+- Preserves data types precisely
+- Can be piped to other tools
+- No external dependencies required
+
+**Output structure:**
+```json
+{
+  "table1": [
+    {"col1": "value1", "col2": "value2"},
+    {"col1": "value3", "col2": "value4"}
+  ],
+  "table2": [
+    {"col1": "value5", "col2": "value6"}
+  ]
+}
+```
+
+**When to use:**
+- Feeding data to another application
+- Programmatic data processing
+- When you need to preserve data types
+- Integration with web services or APIs
+
+### Markdown
+
+The tables will be streamed to standard output in Markdown format.
+
+**Usage:**
+```shell
+commcare-export --query my-query.xlsx --output-format markdown
+```
+
+**Characteristics:**
+- Human-readable text format
+- Displays nicely in terminals
+- Can be pasted into documentation
+- Very fast (streaming output)
+- No file size limits
+
+**Example output:**
+```markdown
+| Patient ID | Name | Visit Date |
+|------------|------|------------|
+| 001        | John | 2023-01-15 |
+| 002        | Jane | 2023-01-16 |
+```
+
+**When to use:**
+- **Debugging queries** (highly recommended)
+- Quick data inspection
+- Creating documentation
+- Terminal-based workflows
+
+### SQL
+
+All data will be idempotently "upserted" into the SQL database you
+specify, including creating the needed tables and columns.
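+
+"Upserting" means that a row whose primary key already exists is
+updated in place, while a row with a new key is inserted, so re-running
+an export does not create duplicates. The sketch below illustrates the
+idea with SQLite's `INSERT ... ON CONFLICT ... DO UPDATE` (SQLite 3.24
+or later). It is not the SQL that CommCare Export generates, and the
+`patient_data` table is hypothetical.
+
+```python
+import sqlite3
+from typing import Iterable, Tuple
+
+conn = sqlite3.connect(":memory:")
+conn.execute("CREATE TABLE patient_data (id TEXT PRIMARY KEY, name TEXT)")
+
+
+def upsert(rows: Iterable[Tuple[str, str]]) -> None:
+    # Insert rows with new ids; overwrite `name` when the id already exists.
+    conn.executemany(
+        "INSERT INTO patient_data (id, name) VALUES (?, ?) "
+        "ON CONFLICT(id) DO UPDATE SET name = excluded.name",
+        rows,
+    )
+    conn.commit()
+
+
+upsert([("001", "John"), ("002", "Jane")])     # first run
+upsert([("002", "Jane Doe"), ("003", "Ali")])  # re-run: one update, one insert
+print(conn.execute("SELECT id, name FROM patient_data ORDER BY id").fetchall())
+# [('001', 'John'), ('002', 'Jane Doe'), ('003', 'Ali')]
+```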
+
+**Usage:**
+```shell
+# PostgreSQL
+commcare-export --query my-query.xlsx --output-format sql \
+    --output postgresql://user:password@localhost/dbname
+
+# MySQL
+commcare-export --query my-query.xlsx --output-format sql \
+    --output mysql+pymysql://user:password@localhost/dbname
+
+# MS SQL Server
+commcare-export --query my-query.xlsx --output-format sql \
+    --output 'mssql+pyodbc://user:password@localhost/dbname?driver=ODBC+Driver+17+for+SQL+Server'
+```
+
+**Characteristics:**
+- Automatically creates tables and columns
+- Upserts data (updates existing, inserts new)
+- Supports incremental exports via checkpoints
+- Multiple database backends supported
+
+**When to use:**
+- Building a data warehouse
+- Integration with BI tools (Tableau, Power BI, etc.)
+- Scheduled/recurring exports
+- Large datasets requiring database performance
+- When you need SQL query capabilities
+
+For complete details, see
+[Database Integration](database-integration.md).
+
+
+Connection String Formats
+-------------------------
+
+CommCare Export uses SQLAlchemy's
+[create_engine](http://docs.sqlalchemy.org/en/latest/core/engines.html),
+whose database URL format is based on
+[RFC-1738](https://www.ietf.org/rfc/rfc1738.txt).
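+
+Because the URL is split on characters such as `@`, `:` and `/`, a
+username or password containing them must be percent-encoded. A
+minimal sketch using Python's standard `urllib.parse.quote_plus` (the
+credentials and host are placeholders):
+
+```python
+from urllib.parse import quote_plus
+
+user = "export_user"
+password = "p@ss:w/rd"  # placeholder containing reserved characters
+
+# Percent-encode each credential before building the URL.
+url = f"postgresql://{quote_plus(user)}:{quote_plus(password)}@localhost/dbname"
+print(url)  # postgresql://export_user:p%40ss%3Aw%2Frd@localhost/dbname
+```
+
+When passing such a URL on the command line, wrap it in single quotes
+so the shell does not interpret any of its characters.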
+
+### PostgreSQL
+
+```
+postgresql://username:password@localhost/database_name
+postgresql+psycopg2://username:password@localhost/database_name
+```
+
+**Requires:** `uv pip install "commcare-export[postgres]"`
+
+### MySQL
+
+```
+mysql://username:password@localhost/database_name
+mysql+pymysql://username:password@localhost/database_name
+```
+
+A bare `mysql://` URL selects SQLAlchemy's default MySQL driver
+(`mysqlclient`); use the `mysql+pymysql://` form to select PyMySQL
+explicitly.
+
+**Requires:** `uv pip install "commcare-export[mysql]"`
+
+### MS SQL Server
+
+```
+mssql+pyodbc://username:password@localhost/database_name?driver=ODBC+Driver+17+for+SQL+Server
+```
+
+The `driver` parameter must name an ODBC driver that is actually
+installed (`odbcinst -q -d` lists them). For example, if you installed
+`msodbcsql18` as described in the Testing Guide, use
+`driver=ODBC+Driver+18+for+SQL+Server`.
+
+**Requires:**
+- `uv pip install "commcare-export[odbc]"`
+- ODBC Driver for SQL Server (see
+  [Testing Guide](testing.md#odbc-driver-installation))
+
+### Other Databases
+
+For other SQLAlchemy-supported databases:
+
+```shell
+uv pip install "commcare-export[base_sql]"
+# Then install your database's Python driver
+```
+
+
+Choosing an Output Format
+-------------------------
+
+| Use Case | Recommended Format | Alternative |
+|--------------------------|--------------------|--------------|
+| Debugging queries | Markdown | JSON |
+| Ad-hoc analysis | XLSX | CSV |
+| Sharing with Excel users | XLSX | CSV |
+| Data warehouse/BI tools | SQL | CSV + import |
+| Programmatic processing | JSON | CSV |
+| Large recurring exports | SQL | CSV |
+| Web service integration | JSON | - |
+| Documentation/reports | Markdown | XLSX |
+
+
+Multiple Runs and Incremental Updates
+-------------------------------------
+
+### File-based Formats (CSV, Excel, JSON, Markdown)
+
+- Each run completely replaces the previous output
+- No incremental update capability
+- Use the `--since` option to control date ranges
+
+### SQL Format
+
+- Supports incremental updates via checkpoints
+- Automatically tracks last successful export
+- Upserts data (no duplicates)
+- See [Database Integration](database-integration.md#checkpoints) for
+  details
+
+
+Performance Considerations
+--------------------------
+
+### Fastest to Slowest (for large datasets)
+
+1. **SQL** - Direct database write, can handle millions of rows
+2. **JSON** - Minimal processing, but the whole result is held in
+   memory (see Memory Usage below)
+3. **Markdown** - Streaming output, text formatting overhead
+4. **CSV** - Compression overhead
+5. **XLSX** - Excel file format has higher overhead
+6. **XLS** - Legacy format, slowest and most limited
+
+### Memory Usage
+
+- **Streaming formats** (Markdown, SQL): Low memory footprint
+- **Buffered formats** (CSV, Excel, JSON): Entire dataset loaded in
+  memory
+- For very large exports, use SQL format
+
+
+See Also
+--------
+
+- [Database Integration](database-integration.md) - Detailed SQL
+  database documentation
+- [Query Formats](query-formats.md) - Creating queries
+- [Command-Line Usage](https://dimagi.atlassian.net/wiki/spaces/commcarepublic/pages/2143955952/) -
+  Full CLI reference
diff --git a/docs/query-formats.md b/docs/query-formats.md
new file mode 100644
index 0000000..8fe3008
--- /dev/null
+++ b/docs/query-formats.md
@@ -0,0 +1,272 @@
+Query Formats
+=============
+
+*Part of [Technical Documentation](index.md)*
+
+CommCare Export supports two query formats: Excel and JSON. Both formats
+are compiled to [MiniLinq](minilinq-reference.md) for execution.
+
+
+Excel Query Format
+------------------
+
+An Excel query is any `.xlsx` workbook. Each sheet in the workbook
+represents one table you wish to create. This format is recommended
+because it is more user-friendly and more stable across library versions.
+
+### Structure
+
+Each sheet is configured using the following column groups:
+
+| Column Group | Description |
+|------------------------------------|----------------------------------------------------------------------------------------|
+| **Data Source** | Set this to `form` to export form data, or `case` for case data |
+| **Filter Name** / **Filter Value** | These columns are paired up to filter the input cases or forms |
+| **Field** | The destination column name in your output table |
+| **Source Field** | The particular field from the form/case you wish to extract. This can be any JSON path |
+
+### Column Details
+
+#### Data Source Column
+
+- **Values**: `form` or `case`
+- **Purpose**: Specifies whether to query form submissions or cases
+- **Required**: Yes, one per sheet
+
+#### Filter Name / Filter Value Pairs
+
+These columns work together to filter the data retrieved from CommCare HQ:
+
+- **Filter Name**: The name of the filter (e.g., `xmlns`, `app_id`,
+  `case_type`)
+- **Filter Value**: The value to filter by
+- **Multiple Filters**: You can have multiple filter pairs in the same
+  sheet
+
+**Common Filter Examples:**
+
+| Filter Name | Filter Value | Description |
+|-------------|----------------------------------------|-----------------------------|
+| `xmlns` | `http://openrosa.org/formdesigner/...` | Filter forms by XMLNS |
+| `app_id` | Your app ID | Filter forms by application |
+| `case_type` | `patient` | Filter cases by type |
+
+**Finding Form XMLNS:**
+
+To determine the XMLNS for your form, see
+[Finding a Form's XMLNS](https://confluence.dimagi.com/display/commcarepublic/Finding+a+Form%27s+XMLNS).
+
+#### Field Column
+
+- **Purpose**: The name of the column in your output table
+- **Format**: Any valid column name (avoid special characters)
+- **Examples**: `patient_name`, `visit_date`, `form_id`
+
+#### Source Field Column
+
+- **Purpose**: The JSON path to extract data from the form or case
+- **Format**: JSON path notation (dot-separated)
+- **Examples**:
+  - `form.patient_name` - Extract patient_name from form
+  - `received_on` - Extract the received_on timestamp
+  - `form.visit.symptoms.fever` - Extract nested fields
+  - `case_id` - Extract the case ID
+
+**JSON Path Support:**
+
+The Source Field supports full JSON path notation, allowing you to:
+- Access nested objects: `form.patient.name.first`
+- Access array elements: `form.children[0].name`
+- Use wildcards: `form.*.value`
+
+### Example Excel Query
+
+Here's what a simple Excel query sheet might look like:
+
+| Data Source | Filter Name | Filter Value | Field | Source Field |
+|-------------|-------------|--------------------------------------|-------------------|-------------------|
+| form | xmlns | http://openrosa.org/.../registration | | |
+| | | | Patient ID | form.patient_id |
+| | | | Patient Name | form.patient_name |
+| | | | Registration Date | received_on |
+| | | | Visit Type | form.visit_type |
+
+This query would:
+1. Export form data
+2. Filter to forms with the specified XMLNS
+3. Create a table with 4 columns: Patient ID, Patient Name, Registration
+   Date, and Visit Type
+
+### Multiple Sheets
+
+Each sheet in the Excel workbook creates a separate output table. The
+sheet name becomes the table name (for SQL outputs) or sheet name
+(for Excel outputs).
+
+**Example workbook structure:**
+
+- Sheet: `patient_registrations` - Export registration forms
+- Sheet: `patient_visits` - Export visit forms
+- Sheet: `patient_cases` - Export patient cases
+
+### Best Practices
+
+1. **Use Descriptive Sheet Names**: These become your table names
+2. **Keep Column Names Simple**: Avoid spaces and special characters in
+   Field columns
+3. **Filter Appropriately**: Use filters to limit data and improve
+   performance
+4. **Test with Small Data**: Use date filters (via command-line) when
+   testing
+5. **Document Your Queries**: Add comments in unused columns to explain
+   complex logic
+
+
+JSON Query Format
+-----------------
+
+JSON queries provide a more direct representation of
+[MiniLinq](minilinq-reference.md) queries. They offer more flexibility
+but are less user-friendly than Excel.
+
+### Structure
+
+A JSON query is a MiniLinq expression serialized as JSON. See the
+[MiniLinq Reference](minilinq-reference.md) for complete syntax.
+
+### Converting Excel to JSON
+
+The best way to understand JSON queries is to create an Excel query and
+convert it:
+
+```shell
+commcare-export --query my-query.xlsx --dump-query
+```
+
+This will output the compiled MiniLinq JSON without executing the query.
+
+### Example JSON Query
+
+Here's a simplified JSON query corresponding to the Excel example above:
+
+```json
+{
+  "Emit": {
+    "table": "patient_registrations",
+    "headings": [
+      {"Lit": "Patient ID"},
+      {"Lit": "Patient Name"},
+      {"Lit": "Registration Date"},
+      {"Lit": "Visit Type"}
+    ],
+    "source": {
+      "Map": {
+        "source": {
+          "Apply": {
+            "fn": {"Ref": "api_data"},
+            "args": [
+              {"Lit": "form"},
+              {"Lit": {
+                "filter": {
+                  "term": {
+                    "xmlns": "http://openrosa.org/.../registration"
+                  }
+                }
+              }}
+            ]
+          }
+        },
+        "body": {
+          "List": [
+            {"Ref": "form.patient_id"},
+            {"Ref": "form.patient_name"},
+            {"Ref": "received_on"},
+            {"Ref": "form.visit_type"}
+          ]
+        }
+      }
+    }
+  }
+}
+```
+
+### When to Use JSON
+
+Use JSON queries when you need:
+
+- Programmatic query generation
+- Complex transformations not expressible in Excel
+- Custom filtering logic
+- Dynamic queries based on runtime conditions
+- Version control friendly format (though Excel works too)
+
+### When to Use Excel
+
+Use Excel queries when you want:
+
+- User-friendly query creation
+- Visual organization of multiple tables
+- Quick prototyping and iteration
+- Stable format across library versions (recommended)
+- Easy sharing with non-technical users
+
+
+Examples
+--------
+
+The `examples/` directory contains sample queries in both formats:
+
+**Excel Examples:**
+- `examples/demo-registrations.xlsx`
+- `examples/demo-pregnancy-cases.xlsx`
+- `examples/demo-pregnancy-cases-with-forms.xlsx`
+- `examples/demo-deliveries.xlsx`
+- `examples/generic-form-metadata.xlsx`
+
+**JSON Examples:**
+- `examples/demo-registrations.json`
+- `examples/demo-pregnancy-cases.json`
+- `examples/demo-pregnancy-cases-with-forms.json`
+- `examples/demo-deliveries.json`
+
+All examples are based on the CommCare Demo App available on the
+CommCare HQ Exchange.
+
+
+Troubleshooting
+---------------
+
+### Issue: No data returned
+
+**Solutions:**
+- Verify your Filter Value matches exactly (case-sensitive)
+- Check that data exists in the date range you're querying
+- Use `--output-format markdown` to see if any data is being retrieved
+- Test without filters first to ensure the data source is correct
+
+### Issue: Wrong data in columns
+
+**Solutions:**
+- Verify Source Field JSON paths are correct
+- Use CommCare HQ's Export Tool to see raw data structure
+- Check for typos in field names (they're case-sensitive)
+- Test with `--dump-query` to see the compiled query
+
+### Issue: Excel workbook not recognized
+
+**Solutions:**
+- Ensure file has `.xlsx` extension (not `.xls`)
+- Verify file is not corrupted
+- Check that all required columns are present
+- Make sure Data Source is set to `form` or `case`
+
+
+See Also
+--------
+
+- [MiniLinq Reference](minilinq-reference.md) - Query language
+  documentation
+- [Output Formats](output-formats.md) - Available output formats
+- [Python Library Usage](library-usage.md) - Using queries from Python
+  code
+- [Examples Directory](../examples/) - Sample queries
diff --git a/docs/scheduling.md b/docs/scheduling.md
new file mode 100644
index 0000000..73ba531
--- /dev/null
+++ b/docs/scheduling.md
@@ -0,0 +1,118 @@
+Scheduling DET Runs
+===================
+
+*Part of [Technical Documentation](index.md)*
+
+Scheduling the DET (Data Export Tool) to run at regular intervals is a
+useful tactic to keep your database up to date with CommCare HQ.
+
+For detailed instructions and best practices, see the
+[User Documentation on Scheduling](https://dimagi.atlassian.net/wiki/spaces/commcarepublic/pages/2143955952/CommCare+Data+Export+Tool+DET).
+
+
+Quick Reference
+---------------
+
+### Windows
+
+On Windows systems, use the
+[Task Scheduler](https://sqlbackupandftp.com/blog/how-to-schedule-a-script-via-windows-task-scheduler/)
+to run scheduled scripts.
+
+**Example script:** `examples/scheduled_run_windows.bat`
+
+**Setup steps:**
+1. Copy `examples/scheduled_run_windows.bat` to a desired location
+2. Edit the file with your project details and credentials
+3. Follow the
+   [Task Scheduler guide](https://sqlbackupandftp.com/blog/how-to-schedule-a-script-via-windows-task-scheduler/)
+   to create a scheduled task
+
+### Linux/Mac
+
+On Linux and Mac systems, use
+[cron](https://www.techtarget.com/searchdatacenter/definition/crontab)
+to create scheduled jobs.
+
+**Example script:** `examples/scheduled_run_linux.sh`
+
+**Setup steps:**
+
+1. Copy the example script to your home directory:
+   ```shell
+   cp ./examples/scheduled_run_linux.sh ~/scheduled_run_linux.sh
+   ```
+
+2. Edit the file with your details:
+   ```shell
+   nano ~/scheduled_run_linux.sh
+   ```
+
+3. Create a cron job:
+   ```shell
+   crontab -e
+   ```
+
+4. Add an entry (this example runs once a day, at 12:00 noon):
+   ```
+   0 12 * * * bash ~/scheduled_run_linux.sh
+   ```
+
+   To run every 12 hours instead (at midnight and noon), use the
+   schedule `0 */12 * * *`.
+
+**Cron schedule tool:** Use [crontab.guru](https://crontab.guru/) to
+generate and interpret cron schedules.
+
+
+Best Practices
+--------------
+
+1. **Use API keys** instead of passwords in scheduled scripts
+2. **Store credentials securely** - use environment variables or secure
+   credential storage
+3. **Use SQL output format** for scheduled exports to leverage
+   checkpoints
+4. **Monitor logs** - use `--log-dir` to specify a log directory for
+   troubleshooting
+5. **Test manually first** before scheduling
+6. **Start with longer intervals** (e.g., daily) and decrease if needed
+7. **Handle failures gracefully** - checkpoints will resume from the last
+   successful run
+
+
+Checkpoint Benefits
+-------------------
+
+When using SQL output format, checkpoints provide:
+
+- **Automatic incremental updates** - Only new/modified data is exported
+- **Resume after failures** - If an export fails, the next run continues
+  from the last successful point
+- **Faster execution** - Less data to process on each run
+- **Reduced API load** - Fewer requests to CommCare HQ
+
+
+Example Scheduled Command
+-------------------------
+
+```shell
+# Export forms incrementally to PostgreSQL database
+commcare-export \
+    --commcare-hq https://www.commcarehq.org \
+    --username user@example.com \
+    --api-key YOUR_API_KEY \
+    --project myproject \
+    --query /path/to/query.xlsx \
+    --output-format sql \
+    --output postgresql://user:pass@localhost/mydb \
+    --log-dir /path/to/logs
+```
+
+
+See Also
+--------
+
+- [Database Integration](database-integration.md) - SQL output and
+  checkpoints
+- [User Documentation](https://dimagi.atlassian.net/wiki/spaces/commcarepublic/pages/2143955952/) -
+  Complete scheduling guide
+- [Example Scripts](../examples/) - Template scripts for Windows and
+  Linux
diff --git a/docs/testing.md b/docs/testing.md
new file mode 100644
index 0000000..c210630
--- /dev/null
+++ b/docs/testing.md
@@ -0,0 +1,251 @@
+Testing Guide
+=============
+
+*Part of [Technical Documentation](index.md)*
+
+This guide covers running tests for CommCare Export, including setup for
+database tests.
+ + +Running Tests +------------- + +### Full Test Suite + +To run the entire test suite (requires database environment variables to +be set): + +```shell +pytest +``` + +### Individual Tests + +To run an individual test class or method: + +```shell +# Run a specific test class +pytest -k "TestExcelQuery" + +# Run a specific test method +pytest -k "test_get_queries_from_excel" +``` + +### Excluding Database Tests + +To exclude the database tests: + +```shell +pytest -m "not dbtest" +``` + + +Database Tests +-------------- + +CommCare Export supports testing against PostgreSQL, MySQL, and MS SQL +Server. + +### Running Database-Specific Tests + +To run tests against specific databases using test marks: + +```shell +# PostgreSQL tests only +pytest -m postgres + +# MySQL tests only +pytest -m mysql + +# MS SQL Server tests only +pytest -m mssql + +# Multiple databases +pytest -m "postgres or mysql" +``` + + +Database Setup with Docker +-------------------------- + +Use Docker and docker-compose to start database services for tests. + +### Starting Services + +1. Start the database services: + ```shell + docker-compose up -d + ``` + +2. Wait for services to be healthy: + ```shell + docker-compose ps + ``` + + Wait until all services show "healthy" status. + +3. 
Run your tests (default environment variables work automatically): + ```shell + pytest + ``` + +### Database Connection Defaults + +The default environment variables in `tests/conftest.py` work +automatically with Docker Compose: + +- **PostgreSQL**: `postgresql://postgres@localhost/` +- **MySQL**: `mysql+pymysql://travis@/` +- **MS SQL Server**: `mssql+pyodbc://SA:Password-123@localhost/` + +### Custom Database URLs + +If needed, you can override with environment variables: + +```shell +export POSTGRES_URL='postgresql://postgres@localhost/' +export MYSQL_URL='mysql+pymysql://root@localhost/' +export MSSQL_URL='mssql+pyodbc://SA:Password-123@localhost/' +``` + +### Stopping Services + +Stop the services when done: + +```shell +docker-compose down +``` + +To also remove the data volumes: + +```shell +docker-compose down -v +``` + + +ODBC Driver Installation +------------------------ + +For MS SQL Server tests, you'll need the ODBC Driver for SQL Server +installed on your host system for the `pyodbc` connection to work. + +### Debian/Ubuntu + +From [learn.microsoft.com](https://learn.microsoft.com/en-us/sql/connect/odbc/linux-mac/installing-the-microsoft-odbc-driver-for-sql-server): + +```shell +# Download the package to configure the Microsoft repo +curl -sSL -O https://packages.microsoft.com/config/debian/$(grep VERSION_ID /etc/os-release | cut -d '"' -f 2 | cut -d '.' 
-f 1)/packages-microsoft-prod.deb + +# Install the package +sudo dpkg -i packages-microsoft-prod.deb + +# Delete the file +rm packages-microsoft-prod.deb + +# Update and install +sudo apt-get update +sudo ACCEPT_EULA=Y apt-get install -y msodbcsql18 + +# Verify installation +odbcinst -q -d +``` + +### macOS + +```shell +# Install Homebrew if not already installed +/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install.sh)" + +# Add Microsoft tap +brew tap microsoft/mssql-release https://github.com/Microsoft/homebrew-mssql-release + +# Update and install +brew update +HOMEBREW_ACCEPT_EULA=Y brew install msodbcsql18 +``` + + +Writing Tests +------------- + +### Basic Test Structure + +```python +import pytest +from commcare_export.minilinq import Literal, Reference + +def test_my_feature(): + # Arrange + query = Literal("test") + + # Act + result = query.eval({}) + + # Assert + assert result == "test" +``` + +### Database Tests + +Mark tests that require a database: + +```python +@pytest.mark.dbtest +@pytest.mark.postgres +def test_postgres_export(postgres_db): + # Test code here + pass +``` + + +Continuous Integration +---------------------- + +### GitHub Actions + +Tests run automatically on: +- Every push to any branch +- Every pull request +- Multiple Python versions (3.9, 3.10, 3.11, 3.12, 3.13) +- Multiple platforms (Ubuntu, macOS, Windows) + + +Troubleshooting +--------------- + +### Database Connection Failures + +**Problem:** Can't connect to test databases + +**Solutions:** +- Ensure Docker services are running: `docker-compose ps` +- Check database logs: `docker-compose logs postgres` +- Verify ports aren't in use: `lsof -i :5432` (PostgreSQL) +- Restart services: `docker-compose restart` + +### ODBC Driver Issues + +**Problem:** pyodbc can't find SQL Server driver + +**Solutions:** +- Verify driver is installed: `odbcinst -q -d` +- Install correct driver version (see ODBC installation above) +- Check connection string 
+  format matches the driver name
+
+### Database Server Unavailable
+
+**Problem:** A particular database server is not available
+
+**Solutions:**
+- Skip the tests for that server by deselecting its marker:
+  `pytest -m "not mssql"`
+
+
+See Also
+--------
+
+- [Development Guide](development.md) - Setting up development
+  environment
+- [CONTRIBUTING.md](../CONTRIBUTING.md) - Contributing guidelines
+- [Database Integration](database-integration.md) - Database usage
+  documentation
diff --git a/docs/user-location-data.md b/docs/user-location-data.md
new file mode 100644
index 0000000..b6da596
--- /dev/null
+++ b/docs/user-location-data.md
@@ -0,0 +1,352 @@
+User and Location Data
+======================
+
+*Part of [Technical Documentation](index.md)*
+
+CommCare Export can export user and location data from your CommCare
+project, which can be joined with form and case data for organizational
+reporting.
+
+
+Overview
+--------
+
+The `--users` and `--locations` options export data from a CommCare
+project that can be joined with form and case data. The
+`--with-organization` option does both, and also adds a `commcare_userid`
+field to each query in an Excel query specification to join on.
+
+
Exporting Users
---------------

### Basic Usage

```shell
commcare-export --project myproject \
    --users \
    --output-format sql \
    --output postgresql://user:pass@localhost/mydb
```

### User Table Schema

Specifying the `--users` option or `--with-organization` option will
export an additional table named `commcare_users` containing the
following columns:

| Column                           | Type | Note                                |
|----------------------------------|------|-------------------------------------|
| id                               | Text | Primary key                         |
| default_phone_number             | Text |                                     |
| email                            | Text |                                     |
| first_name                       | Text |                                     |
| groups                           | Text |                                     |
| last_name                        | Text |                                     |
| phone_numbers                    | Text |                                     |
| resource_uri                     | Text |                                     |
| commcare_location_id             | Text | Foreign key to `commcare_locations` |
| commcare_location_ids            | Text |                                     |
| commcare_primary_case_sharing_id | Text |                                     |
| commcare_project                 | Text |                                     |
| username                         | Text |                                     |

### Data Source

The data in the `commcare_users` table comes from the
[List Mobile Workers API endpoint](https://confluence.dimagi.com/display/commcarepublic/List+Mobile+Workers).
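The `commcare_location_id` column links each user to a row in `commcare_locations`. The Python snippet below is a minimal sketch of that relationship. It uses the standard-library `sqlite3` module to build tiny in-memory stand-ins for both tables, then resolves each user's location. The column subset, IDs, and names are made up for illustration. A real export creates these tables in your own database.

```python
import sqlite3

# In-memory stand-ins for the exported tables, using only a subset of the
# documented columns. All IDs and names below are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE commcare_users (
        id TEXT PRIMARY KEY,
        username TEXT,
        commcare_location_id TEXT
    );
    CREATE TABLE commcare_locations (
        location_id TEXT PRIMARY KEY,
        name TEXT,
        location_type_name TEXT
    );
    INSERT INTO commcare_users VALUES
        ('u1', 'alice', 'loc-a'),
        ('u2', 'bob', NULL);
    INSERT INTO commcare_locations VALUES
        ('loc-a', 'Clinic A', 'Clinic');
""")

# Follow the commcare_location_id foreign key to each user's location.
# The LEFT JOIN keeps users with no location; their location columns are NULL.
rows = conn.execute("""
    SELECT u.username, l.name, l.location_type_name
    FROM commcare_users u
    LEFT JOIN commcare_locations l ON u.commcare_location_id = l.location_id
    ORDER BY u.username
""").fetchall()

for row in rows:
    print(row)
# ('alice', 'Clinic A', 'Clinic')
# ('bob', None, None)
```

The Joining Data section extends the same join clause to form and case tables.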
+
+
Exporting Locations
-------------------

### Basic Usage

```shell
commcare-export --project myproject \
    --locations \
    --output-format sql \
    --output postgresql://user:pass@localhost/mydb
```

### Location Table Schema

Specifying either the `--locations` or the `--with-organization` option
exports an additional table named `commcare_locations` containing the
following columns:

| Column                       | Type | Note                                          |
|------------------------------|------|-----------------------------------------------|
| id                           | Text |                                               |
| created_at                   | Date |                                               |
| domain                       | Text |                                               |
| external_id                  | Text |                                               |
| last_modified                | Date |                                               |
| latitude                     | Text |                                               |
| location_data                | Text |                                               |
| location_id                  | Text | Primary key                                   |
| location_type                | Text |                                               |
| longitude                    | Text |                                               |
| name                         | Text |                                               |
| parent                       | Text | Resource URI of parent location               |
| resource_uri                 | Text |                                               |
| site_code                    | Text |                                               |
| location_type_administrative | Text |                                               |
| location_type_code           | Text |                                               |
| location_type_name           | Text |                                               |
| location_type_parent         | Text |                                               |
| *location level code*        | Text | Column name depends on project's organization |
| *location level code*        | Text | Column name depends on project's organization |

### Organization Level Columns

The last columns in the table exist only if you have set up organization
levels for your project. One column is created for each organization
level. The column name is derived from the Location Type that you
specified, and the column value is the `location_id` of the containing
location at that level of your organization.

Consider the example organization from the
[CommCare help page](https://confluence.dimagi.com/display/commcarepublic/Setting+up+Organization+Levels+and+Structure).
+A piece of the `commcare_locations` table could look like this:

| location_id | location_type_name | chw    | supervisor | clinic | district |
|-------------|--------------------|--------|------------|--------|----------|
| 939fa8      | District           | NULL   | NULL       | NULL   | 939fa8   |
| c4cbef      | Clinic             | NULL   | NULL       | c4cbef | 939fa8   |
| a9ca40      | Supervisor         | NULL   | a9ca40     | c4cbef | 939fa8   |
| 4545b9      | CHW                | 4545b9 | a9ca40     | c4cbef | 939fa8   |

### Data Source

The data in the `commcare_locations` table comes from the Location API
endpoint, along with some additional columns from the Location Type API
endpoint.


Exporting with Organization Data
--------------------------------

The `--with-organization` option combines user, location, and form/case
exports, automatically adding a `commcare_userid` field for joining.

### Basic Usage

```shell
commcare-export --project myproject \
    --query forms.xlsx \
    --with-organization \
    --output-format sql \
    --output postgresql://user:pass@localhost/mydb
```

### What This Does

1. Exports your form/case data as specified in the query
2. Automatically adds a `commcare_userid` field to each query table
3. Exports the `commcare_users` table
4. Exports the `commcare_locations` table


Joining Data
------------

To join form or case data to `commcare_users` and
`commcare_locations`, the exported forms and cases need to contain a
field identifying which user submitted them. The `--with-organization`
option automatically adds a field called `commcare_userid` to each
query in an Excel specification for this purpose.

### Example: Forms by Clinic

Using that field, you can write SQL joins that report on any level of
your organization.
For example, to count the number of forms submitted by all workers in
each clinic (replace `form_table` with the table name from your query
specification):

```sql
SELECT l.clinic,
       COUNT(*)
FROM form_table t
LEFT JOIN (commcare_users u
           LEFT JOIN commcare_locations l
           ON u.commcare_location_id = l.location_id)
ON t.commcare_userid = u.id
GROUP BY l.clinic;
```

### Example: Forms by Location Type

```sql
SELECT l.location_type_name,
       COUNT(*) as form_count
FROM form_table t
LEFT JOIN commcare_users u ON t.commcare_userid = u.id
LEFT JOIN commcare_locations l ON u.commcare_location_id = l.location_id
GROUP BY l.location_type_name;
```

### Example: User Details with Forms

```sql
SELECT u.username,
       u.first_name,
       u.last_name,
       COUNT(*) as submissions
FROM form_table t
LEFT JOIN commcare_users u ON t.commcare_userid = u.id
GROUP BY u.username, u.first_name, u.last_name
ORDER BY submissions DESC;
```


Reserved Table Names
--------------------

The table names `commcare_users` and `commcare_locations` are reserved.
The export tool will produce an error if given a query specification
that writes to either of them.


Data Refresh Behavior
---------------------

The export tool will write all users to `commcare_users` and all
locations to `commcare_locations`, overwriting existing rows with
current data and adding rows for new users and locations.
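In database terms, this behaves like an upsert keyed on each table's primary key. The Python sketch below models that behavior with SQLite's `INSERT ... ON CONFLICT ... DO UPDATE` (SQLite 3.24 or later). It illustrates the semantics described above and is not the export tool's own code. The `refresh_users` helper and the sample rows are hypothetical. Note that a user who is missing from the second run is not deleted.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE commcare_users (id TEXT PRIMARY KEY, username TEXT)")

def refresh_users(conn, users):
    """Hypothetical helper: write every current user, overwriting rows that
    already exist and adding rows for new users. Nothing is deleted."""
    conn.executemany(
        """
        INSERT INTO commcare_users (id, username) VALUES (?, ?)
        ON CONFLICT(id) DO UPDATE SET username = excluded.username
        """,
        users,
    )
    conn.commit()

# First run: the project has two users.
refresh_users(conn, [("u1", "alice"), ("u2", "bob")])

# Second run: u1 was renamed, u3 is new, and u2 was removed from the project.
refresh_users(conn, [("u1", "alice_renamed"), ("u3", "carol")])

rows = conn.execute("SELECT id, username FROM commcare_users ORDER BY id").fetchall()
print(rows)
# [('u1', 'alice_renamed'), ('u2', 'bob'), ('u3', 'carol')]
```

The leftover `u2` row is the situation that the guidance on handling removed users and locations addresses.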
+
### Handling Removed Users/Locations

If you want to remove obsolete users or locations from your tables, drop
the tables. The next export will then recreate them with only the
current users and locations:

```sql
-- Drop the users table, then run the export again to refresh it
DROP TABLE commcare_users;

-- Drop the locations table, then run the export again to refresh it
DROP TABLE commcare_locations;
```

### Handling Organization Changes

Adding or deleting organization levels changes the columns of the
`commcare_locations` table, so you will most likely want to drop the
table before exporting with the new organization:

```sql
DROP TABLE commcare_locations;
```

Then run your export again to recreate the table with the new structure.


Incremental Updates
-------------------

When using the SQL output format with checkpoints:

- **Form/case data**: Incremental updates based on checkpoints
- **User data**: Full refresh on every run
- **Location data**: Full refresh on every run

This keeps the rows for existing users and locations current on every
run, while form/case exports remain efficient. Rows for removed users
and locations remain until you drop the table, as described above.
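The difference between the two update modes can be shown with a toy model in plain Python, with no database involved. Everything here is a hypothetical simplification: `run_export` and the dictionary record shapes are illustrative only, as is the use of a form's `received_on` value as the checkpoint. The real tool tracks checkpoints in its own way.

```python
from datetime import datetime

def run_export(source_forms, source_users, checkpoint, form_table, users_table):
    """Toy model of one export run (hypothetical, not the tool's code)."""
    # Forms: incremental. Only forms received after the checkpoint are written.
    new_forms = [
        f for f in source_forms
        if checkpoint is None or f["received_on"] > checkpoint
    ]
    form_table.extend(new_forms)

    # Users: full refresh. Every current user is written, overwriting by id.
    for user in source_users:
        users_table[user["id"]] = user

    # Advance the checkpoint to the latest form seen; keep it if none arrived.
    if new_forms:
        checkpoint = max(f["received_on"] for f in new_forms)
    return checkpoint

forms = [
    {"id": "f1", "received_on": datetime(2024, 1, 1)},
    {"id": "f2", "received_on": datetime(2024, 1, 2)},
]
users = [{"id": "u1", "username": "alice"}]
form_table, users_table = [], {}

cp = run_export(forms, users, None, form_table, users_table)

# Before the next run, a new form arrives and a user is renamed.
forms.append({"id": "f3", "received_on": datetime(2024, 1, 3)})
users = [{"id": "u1", "username": "alice_renamed"}]
cp = run_export(forms, users, cp, form_table, users_table)

print([f["id"] for f in form_table])  # ['f1', 'f2', 'f3'] (each form written once)
print(users_table["u1"]["username"])  # alice_renamed
print(cp)                             # 2024-01-03 00:00:00
```

The second run fetches only `f3`, while the user data is rewritten in full and picks up the rename.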
+
+
Use Cases
---------

### Organizational Reporting

Track performance across your organization hierarchy. These queries use
PostgreSQL syntax (`DATE_TRUNC`, `INTERVAL`) and assume your form table
has a `received_on` date/time column:

```sql
-- Forms per district per month
SELECT l.district,
       DATE_TRUNC('month', t.received_on) as month,
       COUNT(*) as forms
FROM form_table t
LEFT JOIN commcare_users u ON t.commcare_userid = u.id
LEFT JOIN commcare_locations l ON u.commcare_location_id = l.location_id
GROUP BY l.district, DATE_TRUNC('month', t.received_on)
ORDER BY month, l.district;
```

### User Performance

Identify top performers and those needing support:

```sql
-- Forms per user with location context
SELECT u.username,
       u.first_name || ' ' || u.last_name as full_name,
       l.location_type_name,
       l.name as location_name,
       COUNT(*) as forms_submitted
FROM form_table t
LEFT JOIN commcare_users u ON t.commcare_userid = u.id
LEFT JOIN commcare_locations l ON u.commcare_location_id = l.location_id
WHERE t.received_on >= CURRENT_DATE - INTERVAL '30 days'
GROUP BY u.username, u.first_name, u.last_name, l.location_type_name, l.name
ORDER BY forms_submitted DESC;
```

### Geographic Analysis

When locations have latitude and longitude set:

```sql
-- Forms by location with coordinates
SELECT l.name,
       l.latitude,
       l.longitude,
       COUNT(*) as forms
FROM form_table t
LEFT JOIN commcare_users u ON t.commcare_userid = u.id
LEFT JOIN commcare_locations l ON u.commcare_location_id = l.location_id
WHERE l.latitude IS NOT NULL
GROUP BY l.name, l.latitude, l.longitude;
```


Troubleshooting
---------------

### Missing commcare_userid Field

**Problem:** `commcare_userid` column doesn't exist in form/case tables

**Solution:** Use the `--with-organization` flag. The `--users` and
`--locations` flags on their own do not add the `commcare_userid` field.

### NULL Values in Joins

**Problem:** Many NULL values when joining to users or locations

**Solutions:**
- Verify that forms were submitted by mobile workers (not admin forms);
  `commcare_users` only lists mobile workers
- Check that user IDs in forms match user IDs in `commcare_users`
- Ensure the users table was exported from
  the same project as the form data

### Location Hierarchy Not Showing

**Problem:** Location level columns are NULL or missing

**Solutions:**
- Verify that the project has organization levels configured in
  CommCare HQ
- Drop and recreate the `commcare_locations` table if the organization
  changed
- Check that users are assigned to locations in CommCare HQ

Note that a location's columns for levels below its own are expected to
be NULL, as in the example table above.


See Also
--------

- [Database Integration](database-integration.md) - SQL database setup
  and connection
- [Query Formats](query-formats.md) - Creating queries that will include
  `commcare_userid`
- [CommCare HQ Documentation](https://confluence.dimagi.com/display/commcarepublic/Setting+up+Organization+Levels+and+Structure) -
  Setting up organization levels