nih-cfde/icc-eval-core

CFDE ICC Evaluation Core

Overview

This repo supports the activities described in this repo.

Requirements

  • Linux or macOS system
  • Node v22+
  • Bun (for package management only, as a faster, smaller replacement for Yarn)

Commands

Use ./run.sh with a --flag to conveniently run scripts of the same name in /data/package.json and /app/package.json (if they exist) from the root of this repo.

Flag                      Description
--install                 Install packages and dependencies
--install-playwright      Install Playwright
no flag                   Run main pipeline steps in order
--gather                  Run "gather data" pipeline step
--print                   Run "print PDFs" pipeline step
--dev                     Run dashboard in dev mode
--build                   Build dashboard for production
--preview                 Preview production dashboard build
--lint                    Auto-fix linting/formatting
--test:lint               Check linting and formatting
--test:types              Check types
--test:e2e                Run custom tests
--test                    Run all tests above
--clean                   Hard uninstall packages
--script ./some-file.ts   Run arbitrary ts file
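
The "scripts of the same name ... (if they exist)" rule can be illustrated with a small lookup. This is only a sketch of the idea, not the contents of run.sh: the function name dirsWithScript and the PackageJson type are hypothetical, and it assumes the parsed package.json files are passed in keyed by directory.

```typescript
/** Parsed subset of a package.json file (hypothetical helper type). */
type PackageJson = { scripts?: Record<string, string> };

/**
 * Return the directories whose package.json defines a script
 * with the same name as the flag (leading "--" stripped).
 */
function dirsWithScript(
  flag: string,
  packages: Record<string, PackageJson>,
): string[] {
  const name = flag.replace(/^--/, "");
  return Object.entries(packages)
    .filter(([, pkg]) => typeof pkg.scripts?.[name] === "string")
    .map(([dir]) => dir);
}
```

Under this sketch, a flag whose script is defined in only one of the two package.json files would run in that directory alone, and a flag defined in both would run in both.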

Pipeline

The automated steps in this repo are roughly as follows:

  1. Gather
    1. Get raw data from an external resource, e.g. scraping an HTML page, downloading/parsing a PDF/CSV, making a request to an API, etc.
    2. Save raw data exactly as-is for provenance and caching.
    3. Collate the most important information from the raw data into a common, high-level output format suited to making the desired dashboard pages and PDF reports.
    4. Repeat previous steps in order of dependency (e.g. opportunity number -> grant numbers) until all needed info is gathered.
  2. Print
    1. Run dashboard webapp.
    2. Import output data from gather step, and do some minimal final processing (e.g. combine journal info with each publication listing).
    3. Render select dashboard pages (e.g. /core-project/abc123) to PDF reports.
  3. Deploy dashboard and PDFs to private web addresses.
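
The gather step above (fetch raw, save as-is, collate) can be sketched as follows. This is a minimal illustration under stated assumptions, not the code in /data/gather: gatherRaw, collate, the Publication type, the pmid/title field names, and the one-file-per-key layout are all hypothetical.

```typescript
import { existsSync, mkdirSync, readFileSync, writeFileSync } from "node:fs";
import { join } from "node:path";

/** Hypothetical output shape; the real format in /data/output may differ. */
type Publication = { id: string; title: string };

/**
 * Return raw data for `key`, calling `fetchRaw` only if no saved copy exists.
 * The raw text is written exactly as received, for provenance and caching.
 * Assumes `key` is safe to use as a file name.
 */
async function gatherRaw(
  rawDir: string,
  key: string,
  fetchRaw: () => Promise<string>,
): Promise<string> {
  const file = join(rawDir, `${key}.json`);
  if (existsSync(file)) return readFileSync(file, "utf8");
  const raw = await fetchRaw();
  mkdirSync(rawDir, { recursive: true });
  writeFileSync(file, raw, "utf8");
  return raw;
}

/** Collate: keep only the fields the dashboard and reports need. */
function collate(raw: string): Publication[] {
  // Assumed raw record shape, for illustration only.
  const records = JSON.parse(raw) as { pmid: number; title: string }[];
  return records.map(({ pmid, title }) => ({ id: String(pmid), title }));
}
```

Keeping the raw copy separate from the collated output means the collate logic can change and be re-run without contacting the external resource again.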

Repo content

  • /app - Dashboard webapp made with Vue. Also used for generating PDF reports.
    • /public/pdfs - Outputted PDF reports.
  • /data - All other functionality involving data.
    • /api - Types and functions for getting raw data from external APIs.
    • /raw - Raw data gathered from external sources, for provenance.
    • /gather - Functions for gathering data and putting it in a common format.
    • /output - Gathered data in format for making desired reports.
    • /print - Functions specific to making printed reports.
    • /util - Small-scope general purpose functions.

Technology

  • TypeScript - Language used to provide type-safety from beginning to end of pipeline.
  • Playwright - Tool used for scraping public web pages and rendering dashboard pages to PDF reports.
  • Netlify - Service used for privately hosting dashboard webapp (and PR previews).

The pipeline is optimized wherever possible and appropriate. Things like network requests and rendering are parallelized (e.g. PDF reports are printed simultaneously in separate tabs of the same Playwright browser instance). External resources are cached in their raw format to speed up subsequent runs, and to avoid being rate-limited or blocked by those providers.
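
Printing reports in parallel tabs of one browser instance might look like the sketch below. It is an assumption-laden illustration, not the code in /data/print: printAll and the file-naming scheme are hypothetical, and PageLike/BrowserLike are minimal stand-ins written to be structurally compatible with the newPage, goto, pdf, and close methods of Playwright's Browser and Page.

```typescript
/** Minimal subset of a Playwright-style page used by this sketch. */
type PageLike = {
  goto(url: string): Promise<unknown>;
  pdf(options: { path: string }): Promise<unknown>;
  close(): Promise<void>;
};

/** Minimal subset of a Playwright-style browser used by this sketch. */
type BrowserLike = { newPage(): Promise<PageLike> };

/**
 * Render each dashboard route to a PDF, one tab per route, all at once.
 * Returns the output paths in the same order as `routes`.
 */
async function printAll(
  browser: BrowserLike,
  baseUrl: string,
  routes: string[],
  outDir: string,
): Promise<string[]> {
  return Promise.all(
    routes.map(async (route) => {
      const page = await browser.newPage();
      try {
        await page.goto(baseUrl + route);
        // Hypothetical naming: "/core-project/abc123" -> "core-project-abc123.pdf"
        const name = route.replace(/\W+/g, "-").replace(/^-|-$/g, "");
        const path = `${outDir}/${name}.pdf`;
        await page.pdf({ path });
        return path;
      } finally {
        // Close the tab even if navigation or printing fails.
        await page.close();
      }
    }),
  );
}
```

Sharing one browser avoids paying the launch cost per report. For a large number of routes, a concurrency limit on open tabs may be worth adding, which this sketch omits.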

About

Tools for collecting and reporting CFDE metrics
