diff --git a/learn/pipelines/index.qmd b/learn/pipelines/index.qmd
new file mode 100644
index 00000000..e887ad81
--- /dev/null
+++ b/learn/pipelines/index.qmd
@@ -0,0 +1,67 @@
+---
+title: "Outbreak analytics Pipelines"
+author:
+  - name: "Andree Valle-Campos"
+    orcid: "0000-0002-7779-481X"
+  - name: "Carmen Tamayo Cuartero"
+    orcid: "0000-0003-4184-2864"
+  - name: "Anna Carnegie"
+    orcid: "0000-0002-6385-7795"
+  - name: "Sebastian Funk"
+    orcid: "0000-0002-2842-3406"
+  - name: "Adam Kucharski"
+    orcid: "0000-0001-8814-9421"
+  - name: "Rosalind M Eggo"
+    orcid: "0000-0002-0362-6717"
+date: last-modified
+categories: [outbreak analytics, pipelines, tasks, packages]
+bibliography: pipelines.bib
+image: "sigmund-4CNNH2KEjhc-unsplash.jpg"
+format:
+  html:
+    toc: true
+---
+
+## The Pipeline approach
+
+We can solve outbreak analytics *tasks* by connecting multiple packages in *pipelines*.
+
+## Outbreak analytics
+
+*Outbreak analytics* is a specialized field within data science that focuses on the technological and methodological aspects of the outbreak data pipeline. This includes the systematic collection, analysis, modeling, and reporting of data to inform outbreak response [@polonsky2019outbreak].
+
+### Tasks
+
+We can view outbreak analytics as a set of related data analysis __Tasks__. In @fig-tasks we represent these as a directed graph, where each *node* is a Task and each *directed edge* represents the flow of input and output data. Tasks are connected in a way similar to the [tidyverse](https://r4ds.hadley.nz/whole-game.html) diagram for exploratory data analysis.
+
+![Tasks for outbreak analytics](task_pipeline-minimal.svg){#fig-tasks fig-alt="Directed graph where tasks are nodes and data flows are directed edges, drawn as arrows. One task can connect to multiple other tasks."}
+
+@fig-tasks-detailed summarizes the data inputs and outputs that connect Tasks. For example, the first task on the left, *Read case data*, takes a data input called *Case data* and produces two data outputs, *Linelist* and *Contact data*.
+
+![Detailed task paths](task_pipeline-detailed.svg){#fig-tasks-detailed}
+
+One Task can cover different methods and packages that work with similar data inputs and outputs.
+
+### Pipelines
+
+We define a __Pipeline__ as a set of connected Tasks required to obtain an informative outcome for decision-making purposes.
+
+For example, to quantify the time-varying reproduction number we can follow the *Transmissibility pipeline* (@fig-pipe-01). First, we *Read case data* to generate a linelist. Then, we *Describe case data*, using the linelist as input to generate delay distributions and epicurves. Finally, we use both outputs as inputs to *Quantify transmission* and generate an estimate of transmission. This output allows us to determine the intensity of interventions needed to achieve epidemic control [@cori2017key].
+
+![Transmissibility pipeline](task_pipeline-pipe_01.svg){#fig-pipe-01}
+
+Similarly, to simulate the final size of an epidemic we can follow the *Scenarios pipeline* (@fig-pipe-02). First, we *Read population data* to obtain its demographic distribution and social contact matrix. Next, we take the estimate of transmission, ideally the output of the *Transmissibility pipeline*. Finally, we use these three data outputs as inputs to *Simulate transmission scenarios* and determine the proportion of the population infected. This output allows us to assess the long-term impact of the outbreak and evaluate intervention choices [@cori2017key].
+
+![Scenarios pipeline](task_pipeline-pipe_02.svg){#fig-pipe-02}
+
+## How do we use the Pipelines?
+
+We use the Pipeline approach to connect multiple packages in the design of:
+
+- Reproducible report templates per Pipeline stored in the [`{episoap}`](https://epiverse-trace.github.io/episoap/) package,
+- Code scripts stored in the [`{howto}`](https://epiverse-trace.github.io/howto/) repository, and
+- [New](https://github.com/orgs/epiverse-trace/discussions/87) packages in relation to other upstream packages and tasks.
+
+## Attributions
+
+- The image of this feed is from [Unsplash](https://unsplash.com/photos/4CNNH2KEjhc), provided by [Sigmund](https://unsplash.com/@sigmund), free to use under the [Unsplash License](https://unsplash.com/license).
diff --git a/learn/pipelines/pipelines.bib b/learn/pipelines/pipelines.bib
new file mode 100644
index 00000000..c4038387
--- /dev/null
+++ b/learn/pipelines/pipelines.bib
@@ -0,0 +1,26 @@
+@article{cori2017key,
+  doi = {10.1098/rstb.2016.0371},
+  url = {https://doi.org/10.1098/rstb.2016.0371},
+  year = {2017},
+  month = apr,
+  publisher = {The Royal Society},
+  volume = {372},
+  number = {1721},
+  pages = {20160371},
+  author = {Anne Cori and Christl A. Donnelly and Ilaria Dorigatti and Neil M. Ferguson and Christophe Fraser and Tini Garske and Thibaut Jombart and Gemma Nedjati-Gilani and Pierre Nouvellet and Steven Riley and Maria D. Van Kerkhove and Harriet L. Mills and Isobel M. Blake},
+  title = {Key data for outbreak evaluation: building on the Ebola experience},
+  journal = {Philosophical Transactions of the Royal Society B: Biological Sciences}
+}
+
+@article{polonsky2019outbreak,
+  doi = {10.1098/rstb.2018.0276},
+  url = {https://doi.org/10.1098/rstb.2018.0276},
+  title={Outbreak analytics: a developing data science for informing the response to emerging pathogens},
+  author={Polonsky, Jonathan A and Baidjoe, Amrish and Kamvar, Zhian N and Cori, Anne and Durski, Kara and Edmunds, W John and Eggo, Rosalind M and Funk, Sebastian and Kaiser, Laurent and Keating, Patrick and others},
+  journal={Philosophical Transactions of the Royal Society B},
+  volume={374},
+  number={1776},
+  pages={20180276},
+  year={2019},
+  publisher={The Royal Society}
+}
diff --git a/learn/pipelines/sigmund-4CNNH2KEjhc-unsplash.jpg b/learn/pipelines/sigmund-4CNNH2KEjhc-unsplash.jpg
new file mode 100644
index 00000000..740ff711
Binary files /dev/null and b/learn/pipelines/sigmund-4CNNH2KEjhc-unsplash.jpg differ
diff --git a/learn/pipelines/task_pipeline-detailed.svg b/learn/pipelines/task_pipeline-detailed.svg
new file mode 100644
index 00000000..08247adf
--- /dev/null
+++ b/learn/pipelines/task_pipeline-detailed.svg
@@ -0,0 +1,4 @@
+
+
+
+
[Text labels of task_pipeline-detailed.svg. Tasks: Read case data, Describe case data, Reconstruct transmission chain, Quantify transmission, Create short-term forecast, Estimate severity, Read population data, Read intervention data, Simulate transmission scenarios, Compare intervention scenarios, Compare economic impacts. Data: Case data, Population data, Intervention data, Linelist, Contact data, Delay distributions, Epicurves, Transmission chains, Estimates of transmission, Cases, Hospital admissions, Deaths, Case fatality ratio, Demographic distribution, Social contact matrix, Cost-effectiveness. Groups: Early tasks, Middle tasks, Late tasks. Legend: Data source, Early task, Middle task, Late task, Data output.]
\ No newline at end of file
diff --git a/learn/pipelines/task_pipeline-minimal.svg b/learn/pipelines/task_pipeline-minimal.svg
new file mode 100644
index 00000000..e71f6156
--- /dev/null
+++ b/learn/pipelines/task_pipeline-minimal.svg
@@ -0,0 +1,4 @@
+
+
+
+
[Text labels of task_pipeline-minimal.svg. Tasks: Read case data, Describe case data, Reconstruct transmission chains, Quantify transmission, Create short-term forecast, Estimate severity, Read population data, Read intervention data, Simulate transmission scenarios, Compare intervention scenarios, Compare economic impacts. Groups: Early tasks, Middle tasks, Late tasks.]
\ No newline at end of file
diff --git a/learn/pipelines/task_pipeline-pipe_01.svg b/learn/pipelines/task_pipeline-pipe_01.svg
new file mode 100644
index 00000000..165293e3
--- /dev/null
+++ b/learn/pipelines/task_pipeline-pipe_01.svg
@@ -0,0 +1,4 @@
+
+
+
+
[Text labels of task_pipeline-pipe_01.svg. Highlighted region: Transmissibility pipeline. Tasks: Read case data, Describe case data, Reconstruct transmission chain, Quantify transmission, Create short-term forecast, Estimate severity, Read population data, Read intervention data, Simulate transmission scenarios, Compare intervention scenarios, Compare economic impacts. Data: Case data, Population data, Intervention data, Linelist, Contact data, Delay distributions, Epicurves, Transmission chains, Estimates of transmission, Cases, Hospital admissions, Deaths, Case fatality ratio, Demographic distribution, Social contact matrix, Cost-effectiveness. Groups: Early tasks, Middle tasks, Late tasks.]
\ No newline at end of file
diff --git a/learn/pipelines/task_pipeline-pipe_02.svg b/learn/pipelines/task_pipeline-pipe_02.svg
new file mode 100644
index 00000000..47eb03da
--- /dev/null
+++ b/learn/pipelines/task_pipeline-pipe_02.svg
@@ -0,0 +1,4 @@
+
+
+
+
[Text labels of task_pipeline-pipe_02.svg. Highlighted region: Scenarios pipeline. Tasks: Read case data, Describe case data, Reconstruct transmission chain, Quantify transmission, Create short-term forecast, Estimate severity, Read population data, Read intervention data, Simulate transmission scenarios, Compare intervention scenarios, Compare economic impacts. Data: Case data, Population data, Intervention data, Linelist, Contact data, Delay distributions, Epicurves, Transmission chains, Estimates of transmission, Cases, Hospital admissions, Deaths, Case fatality ratio, Demographic distribution, Social contact matrix, Cost-effectiveness. Groups: Early tasks, Middle tasks, Late tasks.]
\ No newline at end of file