flatteningRequests

A place to communicate about the team flattened data needs.

Use the issues tools to make and manage requests.

The flattening workflow:

Note: This flow diagram shows the general flow of information, but the exact function names and file names may be simplified for the purposes of the diagram.

flowchart LR
    subgraph BQ [BigQuery]
        SRC[[Source Table]]:::table
        QUERY(Scheduled Query):::table
        FLAT[[Flattened Table]]:::table
    end
    class BQ subgraphTitle;

    SCH[/schema.json/]:::file
    DEV[/dev.sql/]:::file
    STG[/stg.sql/]:::file
    PROD[/prod.sql/]:::file

    subgraph LOC ["flatten.R"]
        FETCH("fetch_schema()"):::function
        SCH_CSV[/"schema.csv"/]:::file
        FILT("sort_variables()"):::function
        ARR[/arrays.json/]:::file
        VAR[/variables.csv/]:::file
        GEN("generate_queries()"):::function
    end
    class LOC subgraphTitle;

    SRC --- |API| SCH:::link
    SCH --> FETCH:::link
    FETCH --> SCH_CSV:::link
    SCH_CSV --> FILT:::link
    FILT --> ARR:::link & VAR:::link
    ARR & VAR --> GEN:::link
    GEN --> DEV:::link & STG:::link & PROD:::link
    DEV & STG & PROD --> |API| QUERY:::link
    QUERY --> FLAT:::link

    classDef table link stroke:#000,stroke-width:1.5px;
    classDef schema fill:#ff9,stroke:#333,stroke-width:2px;
    classDef function fill:#ffc,stroke:#333,stroke-width:2px;
    classDef file fill:#cfc,stroke:#333,stroke-width:2px;
    classDef subgraphTitle font-size:24px;

flatten.R is an R function that simplifies the process of flattening the Connect4Cancer source tables:

Retrieve the schema of the source table using the BigQuery API.
Convert the schema from JSON to CSV.
Sort the variables in the source table by their data structure: arrays and non-array variables and stores them in appropriate files.
Generate SQL queries for each environment.
Update the Scheduled Query which generates the flattened table daily.

flatten_batch.R is the main entry point for the user to call flatten() for a given set of tables.

Note: The user can ignore the bash/, python/ and js/ directories which store legacy code that could be referenced later. The core functions of this workflow are in teh r/ directory

Query configurations

The query_configs/ directory stores a metadata for each table to be flattened and is referenced by the flatten.Rmd notebook. To set up a new table for flattening, a new config folder needs to be setup for the new table with meta data about the table.

Here is an example for the PROMIS survey table:

query_configs/promis_v1/
├── dev
│   ├── FlatConnect.promis_v1_JP-dev.sql
│   ├── promis_v1-lists.json
│   ├── promis_v1-variables.csv
│   └── query-config.json
├── prod
│   ├── FlatConnect.promis_v1_JP-prod.sql
│   ├── promis_v1-lists.json
│   ├── promis_v1-variables.csv
│   └── query-config.json
└── stg
    ├── FlatConnect.promis_v1_JP-stg.sql
    ├── promis_v1-lists.json
    ├── promis_v1-variables.csv
    └── query-config.json

Each tier dev, stg and prod has a *.sql, *-lists.json, *-variables.csv, and a query-config.json file.

query-config.json: Specifies key information about the table, including the name of the source table, the name of the resulting SQL query, the name of target table and the
resource_name (containing the transfer_config_id) of the scheduled query to be updated. These bit of information must be updated manually.
*-lists.json: A JSON file key/value pairs are the names of "array variables" and an array of possible responses for these variables, respectively.
*-variables.csv: A CSV file listing the names of the variables for which a single value can be given.
*.sql: The resulting SQL query that is generated by this workflow, which is delivered to BigQuery via the BQ Transfer Config API.

Name		Name	Last commit message	Last commit date
Latest commit History 281 Commits
bash		bash
imgs		imgs
js		js
python		python
query_configs		query_configs
r		r
test		test
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
flatten.R		flatten.R
flatten_batch.R		flatten_batch.R
module1_v2_stage_raw.parquet		module1_v2_stage_raw.parquet

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

flatteningRequests

The flattening workflow:

Query configurations

About

Uh oh!

Releases

Packages

Contributors 3

Uh oh!

Languages

License

Analyticsphere/flatteningRequests

Folders and files

Latest commit

History

Repository files navigation

flatteningRequests

The flattening workflow:

Query configurations

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Contributors 3

Uh oh!

Languages

Packages