Merged
1 change: 1 addition & 0 deletions docs/source/workbook/index.rst
@@ -14,6 +14,7 @@ PyStatsV1 Workbook
troubleshooting
track_c
track_d_student_edition
track_d_playbook/index
track_d_chapter_index
track_d
track_d_dataset_map
8 changes: 8 additions & 0 deletions docs/source/workbook/track_d.rst
@@ -28,6 +28,14 @@ What you get
When you run ``pystatsv1 workbook init --track d``, PyStatsV1 creates a local
folder containing:

Big picture map (recommended)
-----------------------------

If you feel like you are learning lots of commands but losing the "why", read:

- :doc:`Track D Playbook: Big Picture <track_d_playbook/index>`


* convenience runner scripts (``d01`` … ``d23``) that map to Track D chapters
* a reproducible, pre-installed dataset under ``data/synthetic/`` (seed=123)
* an ``outputs/track_d/`` folder where results are written
2 changes: 2 additions & 0 deletions docs/source/workbook/track_d_chapter_index.rst
@@ -3,6 +3,8 @@ Track D chapter index (PyPI)

This page is a "table of contents" for running Track D from the PyPI workbook.

Track D is the “big picture” track: you’re learning how to do **statistics on accounting data** (not toy datasets) in a way that is repeatable, testable, and usable on your own books. The workflow loop is the same in every chapter: start from accounting tables (canonical demos or your own exports), normalize them into a consistent GL contract (see :doc:`track_d_byod`), validate the structure, then analyze with scripts that produce tidy CSVs, figures, and short JSON/MD summaries.

In the PyPI workbook you run chapters with ``pystatsv1 workbook run dXX``, and outputs land under ``outputs/track_d/`` (see :doc:`track_d_outputs_guide`). When you bring your own data, you use the BYOD pipeline (export → ``tables/`` → normalize → ``normalized/`` → analyze); start at :doc:`track_d_byod` and :doc:`track_d_playbook/index` for the end-to-end “how it all fits together.”

Keep asking one question as you go: *what does this accounting structure measure, and what statistical summary answers a real decision problem?*

After you've initialized a Track D workbook:

.. code-block:: bash
10 changes: 6 additions & 4 deletions docs/source/workbook/track_d_outputs_guide.rst
@@ -172,15 +172,17 @@ This is a reliable "lab rhythm" that works for almost any Track D chapter:
Optional: changing the output location
--------------------------------------

Most Track D scripts support an ``--outdir`` argument **when you run the script directly**.

The ``pystatsv1 workbook run ...`` command is the simplest way to run Track D, but it does not forward
extra arguments to the underlying script. So if you want a custom outputs folder, run the script with Python:

.. code-block:: console

   # from inside your Track D workbook folder
   python scripts/d01.py --outdir outputs/track_d_groupA

If you are new to command-line tools, ignore this at first and use the default ``outputs/track_d`` folder.

Common gotchas
--------------
49 changes: 49 additions & 0 deletions docs/source/workbook/track_d_playbook/01_orientation.rst
@@ -0,0 +1,49 @@
Orientation: what Track D is and how to use it
==============================================

**Why this exists:** Track D can feel like “a lot of scripts.” This chapter shows the *workflow* that ties everything together.

Learning objectives
-------------------

- Explain Track D in one sentence (statistics on accounting data).
- Describe the Track D workflow: export → normalize → validate → analyze → communicate.
- Know the three kinds of Track D work: case study, labs, and BYOD.
- Know where to look when you forget a command: ``pystatsv1 --help`` or ``pystatsv1 trackd byod --help``.

Outline
-------

The Track D workflow in one page
--------------------------------

- Start from an accounting export (or the NSO case study dataset).
- Get the data into the Track D dataset contract (either already canonical, or via BYOD normalization).
- Run a chapter script to answer a question (and write outputs).
- Use the artifacts (CSV/PNG/JSON) to write a short business interpretation.

What you should have at the end
-------------------------------

- A reproducible folder with inputs + scripts + outputs (so you can rerun later).
- A small set of charts/tables that tell a story about revenue, costs, or risk.
- A written summary that a manager could act on.

Common mental model mistakes (and fixes)
----------------------------------------

- Mistake: treating accounting data as “just categories.” Fix: it’s a time-stamped database with structure.
- Mistake: skipping validation. Fix: always run a quick check before believing results.
- Mistake: staring at raw rows. Fix: aggregate into daily/monthly totals and compare periods.

Where this connects in the workbook
-----------------------------------

- :doc:`index` (the Playbook overview / map)
- :doc:`../track_d_student_edition` (how students actually run chapters)
- :doc:`../track_d_outputs_guide` (how to read what scripts produce)
- :doc:`../track_d_byod` (how to analyze your own exports)

.. note::

This page is intentionally an outline right now. Expand it incrementally as we refine Track D narrative.
@@ -0,0 +1,47 @@
Accounting data as a dataset pipeline
=====================================

**Why this exists:** Students often know debits/credits but not how that becomes an analyzable dataset. This bridges that gap.

Learning objectives
-------------------

- Describe the path from business events to statements and analytics.
- Recognize the difference between a chart of accounts, journal, ledger, and trial balance.
- Explain what a “normalization step” does and why it matters.

Outline
-------

From events to reports
----------------------

- Business event → journal entry (date, accounts, amounts, memo).
- Journal entries are the “source record”; the ledger is the “by-account view” of those entries.
- Trial balance is a snapshot of balances by account.
- Statements are *views* built from the trial balance and classifications.
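The event-to-report path above can be sketched with a toy journal. This is a minimal pandas illustration, not the exact Track D schema: the column names and the credits-negative sign convention are assumptions for the example.

```python
import pandas as pd

# Toy journal: each row is one line of a journal entry.
# Column names and the credits-negative convention are illustrative only.
journal = pd.DataFrame({
    "date": ["2024-01-02", "2024-01-02", "2024-01-05", "2024-01-05"],
    "account": ["Cash", "Sales", "Cash", "Sales"],
    "amount": [100.0, -100.0, 50.0, -50.0],  # each entry nets to zero
})

# The ledger is the "by-account view" of the same entries.
ledger = journal.sort_values(["account", "date"])

# The trial balance is a snapshot of net balances by account;
# it sums to zero exactly when the journal balances.
trial_balance = journal.groupby("account")["amount"].sum()
```

Statements would then be built by classifying these balances, which is why analytics usually works upstream, from the journal itself.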

From reports to analysis
------------------------

- Analytics usually starts from the journal/ledger (not the formatted financial statements).
- We create time series (daily/monthly totals), ratios, and variance explanations.
- We then ask: what changed, why, and what should we do next?

Where BYOD fits
---------------

- Different systems export different CSV shapes.
- Adapters convert exports into the Track D canonical tables.
- After normalization, you typically work from ``normalized/gl_journal.csv`` (plus ``normalized/chart_of_accounts.csv``), and analysis scripts no longer care where the data came from.
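A minimal sketch of an adapter's core step, using hypothetical export headers (the real BYOD adapters also handle typing, sign conventions, and validation):

```python
import pandas as pd

# A hypothetical export whose headers don't match the Track D contract.
export = pd.DataFrame({
    "Txn Date": ["2024-01-02"],
    "Acct": ["4000"],
    "Amt": ["-120.00"],
})

# The adapter's essential move: map source-specific headers to canonical
# names, then coerce types. (Mapping and target names are illustrative.)
COLUMN_MAP = {"Txn Date": "date", "Acct": "account_id", "Amt": "amount"}

normalized = export.rename(columns=COLUMN_MAP)
normalized["amount"] = pd.to_numeric(normalized["amount"])
normalized["date"] = pd.to_datetime(normalized["date"])
```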

Where this connects in the workbook
-----------------------------------

- :doc:`../track_d_dataset_map` (what tables exist and what they mean)
- :doc:`../track_d_byod` (the adapter/normalize/validate workflow)

.. note::

This page is intentionally an outline right now. Expand it incrementally as we refine Track D narrative.
@@ -0,0 +1,45 @@
The Track D dataset contract (what scripts expect)
==================================================

**Why this exists:** Track D works because every chapter agrees on a shared data contract. This chapter explains the contract at a high level.

Learning objectives
-------------------

- Know the minimum tables required for GL-based analysis (``chart_of_accounts`` + ``gl_journal``).
- Explain what ``normalized/`` outputs are and why we prefer them for analysis.
- Understand where synthetic datasets come from (seeded, reproducible).

Outline
-------

Inputs vs normalized outputs
----------------------------

- BYOD projects store raw exports under ``tables/`` (source-specific).
- Normalization produces ``normalized/chart_of_accounts.csv`` and ``normalized/gl_journal.csv`` (canonical).
- Everything after that is “just analysis.”

Column naming and why it matters
--------------------------------

- Stable column headers allow scripts to be reused across systems.
- If headers drift, you want a failure early (during normalize/validate), not silent bad analysis.
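One way to get that early failure, sketched in pandas. The required-column set here is a guess at a minimal contract for illustration, not the real profile definition:

```python
import pandas as pd

REQUIRED = {"date", "account_id", "amount"}  # illustrative minimal contract

def check_headers(df: pd.DataFrame) -> None:
    """Fail loudly if required columns are missing, before any analysis runs."""
    missing = REQUIRED - set(df.columns)
    if missing:
        raise ValueError(f"missing required columns: {sorted(missing)}")

good = pd.DataFrame(columns=["date", "account_id", "amount", "memo"])
check_headers(good)  # passes silently

# Header drift ("Date" instead of "date") is caught immediately.
drifted = pd.DataFrame(columns=["Date", "account_id", "amount"])
try:
    check_headers(drifted)
    failed_early = False
except ValueError:
    failed_early = True
```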

What ``pystatsv1 trackd validate`` does conceptually
----------------------------------------------------

- Uses a profile (for example, ``core_gl``) to decide what tables/columns are required.
- Checks basic schema and required columns.
- Catches common data issues: missing dates, non-numeric amounts, or malformed account identifiers.
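A toy version of those row-level checks (the four-digit account-id pattern is an assumption made for the example, not the actual validator's rule):

```python
import pandas as pd

rows = pd.DataFrame({
    "date": ["2024-01-02", None, "2024-01-05"],
    "account_id": ["4000", "4000", "50A0?"],
    "amount": ["100.0", "oops", "3.5"],
})

# Count each class of issue; a real validator would also report row numbers.
issues = {
    "missing_dates": int(rows["date"].isna().sum()),
    "non_numeric_amounts": int(
        pd.to_numeric(rows["amount"], errors="coerce").isna().sum()
    ),
    # Illustrative rule: account ids are four digits.
    "bad_account_ids": int((~rows["account_id"].str.fullmatch(r"\d{4}")).sum()),
}
```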

Where this connects in the workbook
-----------------------------------

- :doc:`../track_d_dataset_map` (table-by-table map)
- :doc:`../track_d_outputs_guide` (artifacts and how to use them)
- :doc:`../track_d_byod` (normalization and validation commands)

.. note::

This page is intentionally an outline right now. Expand it incrementally as we refine Track D narrative.
45 changes: 45 additions & 0 deletions docs/source/workbook/track_d_playbook/04_nso_case_story.rst
@@ -0,0 +1,45 @@
NSO case study: why these numbers exist
=======================================

**Why this exists:** A narrative is the easiest way to keep students oriented. This chapter frames the NSO dataset as a business story.

Learning objectives
-------------------

- Describe the NSO business in 60 seconds (what it sells, who it serves, why data matters).
- Identify the main questions Track D is trying to answer with NSO.
- Explain why a case study is useful before BYOD.

Outline
-------

The business story we’re modeling
---------------------------------

- What NSO does and what financial drivers matter (sales volume, margins, seasonality).
- What data we have and what we *don’t* have (and how that affects conclusions).

The analysis questions Track D repeats
--------------------------------------

- Performance: what happened this period vs last period?
- Drivers: which categories/accounts explain the change?
- Risk: where are anomalies, volatility, or concentration?
- Decisions: what would you recommend based on evidence?
- Each chapter produces a small set of artifacts (CSV/PNG/JSON/MD) that support one of these questions.

Transfer to your own data
-------------------------

- The same questions apply to any small business ledger.
- BYOD lets students swap in their own exports later (see :doc:`../track_d_byod`).

Where this connects in the workbook
-----------------------------------

- :doc:`../track_d` (Track D overview)
- :doc:`../track_d_chapter_index` (where the story shows up in chapters)

.. note::

This page is intentionally an outline right now. Expand it incrementally as we refine Track D narrative.
49 changes: 49 additions & 0 deletions docs/source/workbook/track_d_playbook/05_core_analysis_recipes.rst
@@ -0,0 +1,49 @@
Core analysis recipes (what students actually do)
=================================================

**Why this exists:** This is the practical chapter: recurring tasks and the patterns behind them.

Learning objectives
-------------------

- Compute daily/monthly totals and compare periods.
- Build a simple sales proxy from ledger data.
- Create a small set of plots/tables that answer one clear question.

Outline
-------

Recipe: daily totals
--------------------

- Start from ``normalized/gl_journal.csv`` (canonical) and choose a revenue (or cash) account group.
- Group by date, sum signed amounts.
- If you’re using BYOD, you can generate daily totals with ``pystatsv1 trackd byod daily-totals --project <BYOD_DIR>``.
- Plot a time series; note spikes and missing days.
- Write one sentence about the pattern you see.
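The core of this recipe, sketched in pandas (the account numbering and the credits-negative revenue convention are assumptions for the example):

```python
import pandas as pd

journal = pd.DataFrame({
    "date": pd.to_datetime(["2024-01-01", "2024-01-01", "2024-01-02", "2024-01-04"]),
    "account_id": ["4000", "4010", "4000", "4000"],
    "amount": [-100.0, -40.0, -80.0, -60.0],
})

# Choose a revenue account group (illustrative ids).
revenue_accounts = {"4000", "4010"}
rev = journal[journal["account_id"].isin(revenue_accounts)].copy()
rev["amount"] = -rev["amount"]  # flip credit-negative revenue to positive

# Group by date, sum signed amounts.
daily = rev.groupby("date")["amount"].sum()

# Reindex to a full calendar so missing days become visible gaps.
full = daily.reindex(pd.date_range(daily.index.min(), daily.index.max(), freq="D"))
```

Plotting ``full`` (rather than ``daily``) is what makes missing days and spikes jump out.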

Recipe: monthly P&L by category
-------------------------------

- Start from ``normalized/gl_journal.csv`` and map accounts to categories (revenue/COGS/opex).
- Aggregate by month and category; compute shares and changes.
- Identify the top 3 drivers of change month-over-month.
- Sales proxy = sum of signed amounts for revenue accounts by day/month.
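Sketched in pandas with a two-month toy journal. The account-to-category mapping here is illustrative; in practice it comes from the chart of accounts:

```python
import pandas as pd

journal = pd.DataFrame({
    "date": pd.to_datetime(["2024-01-10", "2024-01-20", "2024-02-10", "2024-02-15"]),
    "account_id": ["4000", "5000", "4000", "5000"],
    "amount": [-1000.0, 600.0, -1500.0, 700.0],  # signed: revenue is credit-negative
})

# Illustrative mapping; a real chart of accounts drives this step.
category = {"4000": "revenue", "5000": "cogs"}
journal["category"] = journal["account_id"].map(category)
journal["month"] = journal["date"].dt.to_period("M")

# Aggregate by month and category, then rank drivers of the latest change.
monthly = journal.groupby(["month", "category"])["amount"].sum().unstack()
change = monthly.diff().iloc[-1]                     # month-over-month change
drivers = change.abs().sort_values(ascending=False)  # biggest movers first
```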

Recipe: concentration and outliers
----------------------------------

- Start from ``normalized/gl_journal.csv`` and find the largest transactions and their accounts.
- Compute the share of total explained by the top N rows.
- Flag unusual values for follow-up documentation.
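A compact pandas sketch of this recipe (the 10x-median outlier rule is just an illustrative cutoff, not a Track D standard):

```python
import pandas as pd

journal = pd.DataFrame({
    "account_id": ["6000", "6000", "6100", "6100", "6200"],
    "amount": [5000.0, 120.0, 80.0, 60.0, 40.0],
})

mag = journal["amount"].abs()

# Largest transactions and the share of total they explain.
top = journal.loc[mag.nlargest(2).index]
top_share = mag.nlargest(2).sum() / mag.sum()

# Simple outlier flag: far above the typical transaction size.
flagged = journal[mag > 10 * mag.median()]
```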

Where this connects in the workbook
-----------------------------------

- :doc:`../track_d_byod` (Bring Your Own Data hub)
- :doc:`../track_d_outputs_guide` (how to read artifacts)
- :doc:`../track_d_byod_gnucash_demo_analysis` (daily totals helper + example plots)

.. note::

This page is intentionally an outline right now. Expand it incrementally as we refine Track D narrative.
@@ -0,0 +1,45 @@
Time series + forecasting for accounting data
=============================================

**Why this exists:** Forecasting becomes less scary once you’ve built clean daily/monthly series. This chapter outlines the progression.

Learning objectives
-------------------

- Explain trend, seasonality, and noise using accounting time series.
- Build a baseline forecast and evaluate it.
- Understand when forecasting is inappropriate (garbage in / structural breaks).

Outline
-------

Start with baselines
--------------------

- Start from ``normalized/gl_journal.csv`` and build a clean daily/monthly series (revenue proxy, expense totals, or cash).
- Last value, moving average, seasonal naive.
- Always do a simple backtest (train on earlier months, test on later months).
- Compare forecasts with simple error metrics.
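The backtest idea in miniature, with toy numbers (a seasonal-naive baseline would follow the same train/test pattern):

```python
import pandas as pd

# Toy monthly revenue series (values are illustrative).
series = pd.Series(
    [100.0, 110, 105, 120, 115, 125, 122, 128],
    index=pd.period_range("2023-01", periods=8, freq="M"),
)

# Train on earlier months, test on later months.
train, test = series.iloc[:6], series.iloc[6:]

# Baseline 1: last value carried forward.
last_value = pd.Series(train.iloc[-1], index=test.index)
# Baseline 2: moving average of the final 3 training months.
moving_avg = pd.Series(train.iloc[-3:].mean(), index=test.index)

# Compare with a simple error metric (mean absolute error).
mae_last = (test - last_value).abs().mean()
mae_ma = (test - moving_avg).abs().mean()
```

Any fancier model has to beat these numbers to earn its complexity.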

Add explanatory variables
-------------------------

- Promotions, holidays, payroll cycles, or other known drivers.
- Use regression as a driver model (not magic).
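"Regression as a driver model" can be as small as one dummy variable. A NumPy sketch with made-up numbers (with a single 0/1 driver, least squares recovers the two group means):

```python
import numpy as np

# Toy driver model: monthly revenue explained by a promotion flag.
# In practice the drivers come from your business calendar.
promo = np.array([0, 1, 0, 1, 0, 1], dtype=float)
revenue = np.array([100.0, 130, 102, 128, 98, 132])

# Design matrix: intercept + promo dummy.
X = np.column_stack([np.ones_like(promo), promo])
coef, *_ = np.linalg.lstsq(X, revenue, rcond=None)
baseline, promo_lift = coef  # non-promo mean, and the average promo effect
```

Interpreting ``promo_lift`` as "what promos added on average" is the driver-model mindset, not magic.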

Keep it business-grounded
-------------------------

- Always interpret: what would make the forecast wrong?
- Document assumptions and data limitations.
- Examples of structural breaks: pricing changes, a new location, system migrations, one-time events, policy changes.

Where this connects in the workbook
-----------------------------------

- :doc:`../track_d_chapter_index` (chapters that introduce forecasting ideas)
- :doc:`../track_d_my_own_data` (how to apply the same methods to your exports)

.. note::

This page is intentionally an outline right now. Expand it incrementally as we refine Track D narrative.
@@ -0,0 +1,43 @@
Risk, controls, and data quality checks
=======================================

**Why this exists:** Accounting data is only useful if you trust it. Track D teaches a light version of audit/control thinking for analysts.

Learning objectives
-------------------

- Describe why controls and reconciliation matter for analytics.
- Run simple anomaly checks and interpret them carefully.
- Explain the difference between an error and a legitimate outlier.

Outline
-------

Practical checks that scale
---------------------------

- Start from ``normalized/gl_journal.csv`` (and optionally ``normalized/chart_of_accounts.csv``).
- Missing dates, negative amounts where unexpected, duplicated rows or duplicated transaction references (when present).
- Unusual spikes relative to typical ranges.
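A toy pass over those checks in pandas (thresholds and column names are illustrative, and ``txn_ref`` only exists when the export provides one):

```python
import pandas as pd

journal = pd.DataFrame({
    "date": pd.to_datetime(["2024-01-01", "2024-01-02", "2024-01-02", "2024-01-04"]),
    "txn_ref": ["T1", "T2", "T2", "T3"],
    "amount": [100.0, 90.0, 90.0, 5000.0],
})

# Duplicated transaction references (keep=False shows every copy).
dupes = journal[journal.duplicated(subset=["txn_ref"], keep=False)]

# Spikes relative to the typical range (5x median magnitude is illustrative).
mag = journal["amount"].abs()
spikes = journal[mag > 5 * mag.median()]

# Calendar days with no postings between first and last date.
posted = set(journal["date"].dt.date)
calendar = pd.date_range(journal["date"].min(), journal["date"].max(), freq="D")
missing_days = [d.date() for d in calendar if d.date() not in posted]
```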

Sampling mindset
----------------

- You can’t check everything; choose samples based on risk and materiality.
- Document what you checked and what you didn’t.
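One common sampling pattern, sketched with an invented materiality threshold: check everything above materiality, then add a seeded random sample of the rest so the choice is documented and repeatable.

```python
import pandas as pd

journal = pd.DataFrame({
    "txn_ref": [f"T{i}" for i in range(10)],
    "amount": [9000.0, 40, 55, 30, 7500.0, 20, 65, 50, 45, 35],
})

MATERIALITY = 1000.0  # illustrative threshold, set per engagement in practice

# Check every material item...
material = journal[journal["amount"].abs() >= MATERIALITY]
# ...plus a seeded random sample of the remainder (reproducible by design).
rest = journal[journal["amount"].abs() < MATERIALITY]
random_pick = rest.sample(n=3, random_state=123)

sample = pd.concat([material, random_pick])
```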

When to stop and ask for accounting context
-------------------------------------------

- A statistical red flag is not automatically fraud or error.
- Your next step is often: ask for invoices, contracts, or policy notes (e.g., revenue timing, refunds, capitalization).

Where this connects in the workbook
-----------------------------------

- :doc:`../track_d_outputs_guide` (where checks appear in script outputs)
- :doc:`../track_d_byod` (validate step and why it exists)

.. note::

This page is intentionally an outline right now. Expand it incrementally as we refine Track D narrative.