36 changes: 36 additions & 0 deletions dags/kids_first/dataservice_studies.py
@@ -0,0 +1,36 @@
from airflow.sdk import Variable

from cosmos import (
    DbtDag,
    ProjectConfig,
    ProfileConfig,
    ExecutionConfig,
    RenderConfig,
)
from cosmos.profiles import PostgresUserPasswordProfileMapping

profile_config = ProfileConfig(
    # make sure target_name and profile_mapping align
    profile_name=Variable.get("DBT_PROFILE_NAME"),
    target_name="prd",
    profile_mapping=PostgresUserPasswordProfileMapping(
        conn_id="postgres_prd_svc",
        profile_args={"schema": "prd"},
    ),
)

example_study_dag = DbtDag(
    project_config=ProjectConfig(
        Variable.get("DBT_PROJECT_DIR"),
        install_dbt_deps=True,
    ),
    profile_config=profile_config,
    execution_config=ExecutionConfig(
        dbt_executable_path=Variable.get("DBT_EXECUTABLE_PATH"),
    ),
    render_config=RenderConfig(select=["config.meta.study:kf_dataservice_study"]),
    # normal dag parameters
    schedule="@daily",
    dag_id="kf_dataservice_studies",
    tags=["POC", "Kids First"],
)
68 changes: 64 additions & 4 deletions dbt_project/models/_metadata_description_files/docs_fields.md
@@ -128,6 +128,32 @@ The dewrangle generated id for a family. This id is a lower-cased version of the
Denotes the type of family using a set of enums, such as proband-only or trio. Not currently populated in the Kids First Dataservice, but calculated by the portal ETL and displayed on the Kids First portal.
{% enddocs %}

### family relationship fields

{% docs participant1_id %}
The kf id of the first person in the family relationship.
{% enddocs %}

{% docs participant2_id %}
The kf id of the second person in the family relationship.
{% enddocs %}

{% docs participant1_to_participant2_relation %}
A descriptor that indicates person 1's genetic relationship to person 2. Typically mother, father, child, or sibling.
{% enddocs %}

{% docs participant2_to_participant1_relation %}
A descriptor that indicates person 2's genetic relationship to person 1. Typically null, mother, father, son/daughter, or brother/sister.
{% enddocs %}

{% docs relationship_id %}
The Kids First assigned kf id that represents a genetic relationship between two participants, in the format "FR_XXXXXXXX".
{% enddocs %}
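
As a rough illustration of the "FR_XXXXXXXX" format described above, a check like the following could be used when validating ids. This is only a sketch: the docs give the shape of the id but not the allowed characters, so the assumption that each X is an uppercase letter or digit, and the helper name `is_relationship_id`, are hypothetical.

```python
import re

# Assumption: the eight X's stand for uppercase letters or digits.
# The docs only state the "FR_XXXXXXXX" shape, not the allowed characters.
RELATIONSHIP_ID_PATTERN = re.compile(r"FR_[A-Z0-9]{8}")


def is_relationship_id(value: str) -> bool:
    """Return True if value is the FR_ prefix followed by exactly eight characters."""
    return RELATIONSHIP_ID_PATTERN.fullmatch(value) is not None
```

Using `fullmatch` rather than `match` rejects ids with trailing characters, which may or may not be the behavior wanted for a given pipeline.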

{% docs source_text_notes %}
Additional text notes from the source describing the relationship. Not typically populated.
{% enddocs %}

### genomic_file fields

{% docs dewrangle_genomic_file_id %}
@@ -204,6 +230,16 @@ The dewrangle generated id for an investigator. This id is a lower-cased version
The name of the investigator's institution.
{% enddocs %}

### outcome fields

{% docs vital_status %}
The patient's reported state of being alive or deceased.
{% enddocs %}

{% docs disease_related %}
A yes/no field indicating whether a deceased patient's death was a result of the disease.
{% enddocs %}

### participant fields

{% docs alias_group_id %}
@@ -260,6 +296,34 @@ Denotes whether a phenotype is negative or positive
The ID of the term from Systematized Nomenclature of Medicine Clinical Terms (SNOMED CT), which encodes clinical terminology. Not actively populated.
{% enddocs %}

### sample fields

{% docs sample_event_key %}
Identifier for the event when the sample was first drawn.
{% enddocs %}

{% docs tissue_type %}
Description of the kind of tissue collected, if the sample is a tissue sample.
{% enddocs %}

{% docs sample_type %}
The kind of material of the sample.
{% enddocs %}

{% docs anatomical_location %}
The anatomical location of collection.
{% enddocs %}

{% docs external_collection_id %}
Identifier for the collection event.
{% enddocs %}

### sequencing center fields

{% docs sequencing_center_name %}
The official name of the sequencing center used to generate source genomic file outputs.
{% enddocs %}

### sequencing experiment fields

{% docs dewrangle_sequencing_experiment_id %}
@@ -634,10 +698,6 @@ Sex of participant
Age of participant when phenotype was asserted
{% enddocs %}

{% docs vital_status %}
Vital status of participant
{% enddocs %}


### Broad Manifest

164 changes: 106 additions & 58 deletions dbt_project/models/_metadata_description_files/docs_tables.md
@@ -2,140 +2,188 @@

## Kids First Dataservice Tables - Source Stage

{% docs src_bsgf %}
{% docs kf_ds_src_bsgf %}
Kids First Dataservice source table for linking specimens to genomic files. One file may be linked to many specimens.
{% enddocs %}
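
The one-file-to-many-specimens shape of this link table can be sketched as a simple grouping. This is a minimal illustration, not the project's code: the function name `specimens_by_file`, the `(biospecimen_id, genomic_file_id)` row layout, and the id values in the usage note are all assumptions.

```python
from collections import defaultdict


def specimens_by_file(link_rows):
    """Group link rows so each genomic file id maps to its specimen ids.

    link_rows is assumed to be an iterable of
    (biospecimen_id, genomic_file_id) pairs; this layout is hypothetical.
    """
    grouped = defaultdict(list)
    for biospecimen_id, genomic_file_id in link_rows:
        grouped[genomic_file_id].append(biospecimen_id)
    return dict(grouped)
```

For example, passing the made-up rows `[("BS_1", "GF_1"), ("BS_2", "GF_1")]` groups both specimens under the single file `GF_1`.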

{% docs src_diagnosis %}
{% docs kf_ds_src_diagnosis %}
Kids First Dataservice source table for harmonized conditions curated to MONDO codes at the patient level. All conditions in this table are implied to be observed in patients. Each row represents one condition per patient.
{% enddocs %}

{% docs src_family %}
Kids First Dataservice source table that holds family ids for each participant. This table can be joined to src_participants to obtain participant to family id mappings.
{% docs kf_ds_src_family %}
Kids First Dataservice source table that holds family ids for each participant. This table can be joined to kf_ds_src_participant to obtain participant to family id mappings.
{% enddocs %}
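
The participant-to-family join described above might look like the sketch below, run here against an in-memory SQLite database so it is self-contained. Only the two table names come from the docs; the column names (`kf_id`, `family_id`, `family_type`), the id values, and the helper `participant_family_rows` are hypothetical.

```python
import sqlite3

# Table names come from the docs; every column name and id value below is
# hypothetical sample data, not the real Dataservice schema.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE kf_ds_src_family (kf_id TEXT PRIMARY KEY, family_type TEXT);
    CREATE TABLE kf_ds_src_participant (kf_id TEXT PRIMARY KEY, family_id TEXT);
    INSERT INTO kf_ds_src_family VALUES ('FM_00000001', 'trio');
    INSERT INTO kf_ds_src_participant VALUES
        ('PT_00000001', 'FM_00000001'),
        ('PT_00000002', 'FM_00000001'),
        ('PT_00000003', NULL);
    """
)


def participant_family_rows(connection):
    """Return (participant id, family id, family type) for every participant."""
    query = """
        SELECT p.kf_id, f.kf_id, f.family_type
        FROM kf_ds_src_participant AS p
        LEFT JOIN kf_ds_src_family AS f ON f.kf_id = p.family_id
        ORDER BY p.kf_id
    """
    return connection.execute(query).fetchall()


rows = participant_family_rows(conn)
```

A `LEFT JOIN` is used so that participants without an assigned family still appear, with empty family columns. An inner join would drop them.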

{% docs src_genomic_files %}
Kids First Dataservice source table that holds raw and harmonized genomic file outputs. This table provides file and bioinformatic workflow metadata for each file. Must be joined to src_bsgf to obtain specimen to file mappings.
{% docs kf_ds_src_family_relationship %}
Kids First Dataservice source table that holds family relationships for each participant. Usually only reports relationships for duos, trios, or trios+.
{% enddocs %}

{% docs src_investigator %}
{% docs kf_ds_src_genomic_file %}
Kids First Dataservice source table that holds raw and harmonized genomic file outputs. This table provides file and bioinformatic workflow metadata for each file. Must be joined to kf_ds_src_bsgf to obtain specimen to file mappings.
{% enddocs %}

{% docs kf_ds_src_investigator %}
Kids First Dataservice source table for investigator information. Only contains minimal contact information for the Principal Investigator of a study. One investigator may be associated with multiple study ids.
{% enddocs %}

{% docs src_participant %}
Kids First Dataservice source table for participant demographic information. Also contains information regarding a participant's affected status. Links each participant to an assigned family id from src_family and an assigned study id from src_study.
{% docs kf_ds_src_outcome %}
Kids First Dataservice source table for outcome information. Reports the vital status of patients and whether or not death was disease related.
{% enddocs %}

{% docs kf_ds_src_participant %}
Kids First Dataservice source table for participant demographic information. Also contains information regarding a participant's affected status. Links each participant to an assigned family id from kf_ds_src_family and an assigned study id from kf_ds_src_study.
{% enddocs %}

{% docs src_phenotype %}
{% docs kf_ds_src_phenotype %}
Kids First Dataservice source table for harmonized conditions curated to HPO codes at the patient level. Conditions can be observed or not observed in a patient. Each row represents one condition and observation status per patient.
{% enddocs %}

{% docs src_segf %}
{% docs kf_ds_src_sample %}
Kids First Dataservice source table for samples.
{% enddocs %}

{% docs kf_ds_src_segf %}
Kids First Dataservice source table for linking sequencing experiments to genomic files. Multiple files can be linked to one sequencing experiment.
{% enddocs %}

{% docs src_sequencing_experiments %}
{% docs kf_ds_src_sequencing_center %}
Kids First Dataservice source table for sequencing center information.
{% enddocs %}

{% docs kf_ds_src_sequencing_experiment %}
Kids First Dataservice source table for sequencing experiments that holds sequencing metadata.
{% enddocs %}

{% docs src_specimens %}
{% docs kf_ds_src_biospecimen %}
Kids First Dataservice source table for biospecimen information. Contains specimen collection information and specimen material information, as well as VBR specific entities to support CBTN VBR fields. Each row represents one aliquot per participant.
{% enddocs %}

{% docs src_study %}
{% docs kf_ds_src_study %}
Kids First Dataservice source table for study metadata. Contains full and short study names, study codes, study program, and dbgap phs numbers.
{% enddocs %}

## Kids First Dataservice Tables - Int Stage

{% docs int_bsgf %}
Intermediate table for src_bsgf. Transforms dataservice entities for better usability and clarity. Excludes certain entities that are not needed.
{% docs kf_ds_int_bsgf %}
Intermediate table for kf_ds_src_bsgf. Transforms dataservice entities for better usability and clarity. Excludes certain entities that are not needed.
{% enddocs %}

{% docs kf_ds_int_diagnosis %}
Intermediate table for kf_ds_src_diagnosis. Transforms dataservice entities for better usability and clarity. Excludes certain entities that are not needed.
{% enddocs %}

{% docs kf_ds_int_family %}
Intermediate table for kf_ds_src_family. Transforms dataservice entities for better usability and clarity. Excludes certain entities that are not needed.
{% enddocs %}

{% docs int_diagnosis %}
Intermediate table for src_diagnosis. Transforms dataservice entities for better usability and clarity. Excludes certain entities that are not needed.
{% docs kf_ds_int_family_relationship %}
Intermediate table for kf_ds_src_family_relationship. Transforms dataservice entities for better usability and clarity. Excludes certain entities that are not needed.
{% enddocs %}

{% docs int_family %}
Intermediate table for src_family. Transforms dataservice entities for better usability and clarity. Excludes certain entities that are not needed.
{% docs kf_ds_int_genomic_file %}
Intermediate table for kf_ds_src_genomic_file. Transforms dataservice entities for better usability and clarity. Excludes certain entities that are not needed.
{% enddocs %}

{% docs int_genomic_files %}
Intermediate table for src_genomic_files. Transforms dataservice entities for better usability and clarity. Excludes certain entities that are not needed.
{% docs kf_ds_int_investigator %}
Intermediate table for kf_ds_src_investigator. Transforms dataservice entities for better usability and clarity. Excludes certain entities that are not needed.
{% enddocs %}

{% docs int_investigator %}
Intermediate table for src_investigator. Transforms dataservice entities for better usability and clarity. Excludes certain entities that are not needed.
{% docs kf_ds_int_outcome %}
Intermediate table for kf_ds_src_outcome. Transforms dataservice entities for better usability and clarity. Excludes certain entities that are not needed.
{% enddocs %}

{% docs int_participant %}
Intermediate table for src_participant. Transforms dataservice entities for better usability and clarity. Excludes certain entities that are not needed.
{% docs kf_ds_int_participant %}
Intermediate table for kf_ds_src_participant. Transforms dataservice entities for better usability and clarity. Excludes certain entities that are not needed.
{% enddocs %}

{% docs int_phenotype %}
Intermediate table for src_phenotype. Transforms dataservice entities for better usability and clarity. Excludes certain entities that are not needed.
{% docs kf_ds_int_phenotype %}
Intermediate table for kf_ds_src_phenotype. Transforms dataservice entities for better usability and clarity. Excludes certain entities that are not needed.
{% enddocs %}

{% docs int_segf %}
Intermediate table for src_segf. Transforms dataservice entities for better usability and clarity. Excludes certain entities that are not needed.
{% docs kf_ds_int_sample %}
Intermediate table for kf_ds_src_sample. Transforms dataservice entities for better usability and clarity. Excludes certain entities that are not needed.
{% enddocs %}

{% docs int_sequencing_experiment %}
Intermediate table for src_sequencing_experiments. Transforms dataservice entities for better usability and clarity. Excludes certain entities that are not needed.
{% docs kf_ds_int_sequencing_center %}
Intermediate table for kf_ds_src_sequencing_center. Transforms dataservice entities for better usability and clarity. Excludes certain entities that are not needed.
{% enddocs %}

{% docs int_specimens %}
Intermediate table for src_specimens. Transforms dataservice entities for better usability and clarity. Excludes certain entities that are not needed.
{% docs kf_ds_int_segf %}
Intermediate table for kf_ds_src_segf. Transforms dataservice entities for better usability and clarity. Excludes certain entities that are not needed.
{% enddocs %}

{% docs int_study %}
Intermediate table for src_study. Transforms dataservice entities for better usability and clarity. Excludes certain entities that are not needed.
{% docs kf_ds_int_sequencing_experiment %}
Intermediate table for kf_ds_src_sequencing_experiment. Transforms dataservice entities for better usability and clarity. Excludes certain entities that are not needed.
{% enddocs %}

{% docs kf_ds_int_biospecimen %}
Intermediate table for kf_ds_src_biospecimen. Transforms dataservice entities for better usability and clarity. Excludes certain entities that are not needed.
{% enddocs %}

{% docs kf_ds_int_study %}
Intermediate table for kf_ds_src_study. Transforms dataservice entities for better usability and clarity. Excludes certain entities that are not needed.
{% enddocs %}

## Kids First Dataservice Tables - Stable Stage

{% docs stable_bsgf %}
Stable table for int_bsgf. Finalized mapping of transformed dataservice entities that are ready to be brought into the access layer.
{% docs kf_ds_stable_bsgf %}
Stable table for kf_ds_int_bsgf. Finalized mapping of transformed dataservice entities that are ready to be brought into the access layer.
{% enddocs %}

{% docs kf_ds_stable_diagnosis %}
Stable table for kf_ds_int_diagnosis. Finalized mapping of transformed dataservice entities that are ready to be brought into the access layer.
{% enddocs %}

{% docs kf_ds_stable_family %}
Stable table for kf_ds_int_family. Finalized mapping of transformed dataservice entities that are ready to be brought into the access layer.
{% enddocs %}

{% docs kf_ds_stable_family_relationship %}
Stable table for kf_ds_int_family_relationship. Finalized mapping of transformed dataservice entities that are ready to be brought into the access layer.
{% enddocs %}

{% docs kf_ds_stable_genomic_file %}
Stable table for kf_ds_int_genomic_file. Finalized mapping of transformed dataservice entities that are ready to be brought into the access layer.
{% enddocs %}

{% docs stable_diagnosis %}
Stable table for int_diagnosis. Finalized mapping of transformed dataservice entities that are ready to be brought into the access layer.
{% docs kf_ds_stable_investigator %}
Stable table for kf_ds_int_investigator. Finalized mapping of transformed dataservice entities that are ready to be brought into the access layer.
{% enddocs %}

{% docs stable_family %}
Stable table for int_family. Finalized mapping of transformed dataservice entities that are ready to be brought into the access layer.
{% docs kf_ds_stable_outcome %}
Stable table for kf_ds_int_outcome. Finalized mapping of transformed dataservice entities that are ready to be brought into the access layer.
{% enddocs %}

{% docs stable_genomic_file %}
Stable table for int_families. Finalized mapping of transformed dataservice entities that are ready to be brought into the access layer.
{% docs kf_ds_stable_participant %}
Stable table for kf_ds_int_participant. Finalized mapping of transformed dataservice entities that are ready to be brought into the access layer.
{% enddocs %}

{% docs stable_investigator %}
Stable table for int_investigator. Finalized mapping of transformed dataservice entities that are ready to be brought into the access layer.
{% docs kf_ds_stable_phenotype %}
Stable table for kf_ds_int_phenotype. Finalized mapping of transformed dataservice entities that are ready to be brought into the access layer.
{% enddocs %}

{% docs stable_participant %}
Stable table for int_participant. Finalized mapping of transformed dataservice entities that are ready to be brought into the access layer.
{% docs kf_ds_stable_sample %}
Stable table for kf_ds_int_sample. Finalized mapping of transformed dataservice entities that are ready to be brought into the access layer.
{% enddocs %}

{% docs stable_phenotype %}
Stable table for int_phenotype. Finalized mapping of transformed dataservice entities that are ready to be brought into the access layer.
{% docs kf_ds_stable_sequencing_center %}
Stable table for kf_ds_int_sequencing_center. Finalized mapping of transformed dataservice entities that are ready to be brought into the access layer.
{% enddocs %}

{% docs stable_segf %}
Stable table for int_segf. Finalized mapping of transformed dataservice entities that are ready to be brought into the access layer.
{% docs kf_ds_stable_segf %}
Stable table for kf_ds_int_segf. Finalized mapping of transformed dataservice entities that are ready to be brought into the access layer.
{% enddocs %}

{% docs stable_sequencing_experiment %}
Stable table for int_sequencing_experiment. Finalized mapping of transformed dataservice entities that are ready to be brought into the access layer.
{% docs kf_ds_stable_sequencing_experiment %}
Stable table for kf_ds_int_sequencing_experiment. Finalized mapping of transformed dataservice entities that are ready to be brought into the access layer.
{% enddocs %}

{% docs stable_specimens %}
Stable table for int_specimens. Finalized mapping of transformed dataservice entities that are ready to be brought into the access layer.
{% docs kf_ds_stable_biospecimen %}
Stable table for kf_ds_int_biospecimen. Finalized mapping of transformed dataservice entities that are ready to be brought into the access layer.
{% enddocs %}

{% docs stable_study %}
Stable table for int_study. Finalized mapping of transformed dataservice entities that are ready to be brought into the access layer.
{% docs kf_ds_stable_study %}
Stable table for kf_ds_int_study. Finalized mapping of transformed dataservice entities that are ready to be brought into the access layer.
{% enddocs %}

