From fce428a72a769be32b3fc81cb4f41cf75a212693 Mon Sep 17 00:00:00 2001 From: Isaiah Akorita Date: Mon, 4 Aug 2025 17:14:39 +0100 Subject: [PATCH 1/5] add how-to and explanation sections --- doc/explanation/index.md | 10 ++++++++++ doc/how_to/index.md | 10 ++++++++++ doc/index.md | 2 ++ 3 files changed, 22 insertions(+) create mode 100644 doc/explanation/index.md create mode 100644 doc/how_to/index.md diff --git a/doc/explanation/index.md b/doc/explanation/index.md new file mode 100644 index 000000000..5af3e7348 --- /dev/null +++ b/doc/explanation/index.md @@ -0,0 +1,10 @@ +# Explanation + +Explanation guides provide in-depth understanding of key concepts in hvPlot. These guides help you understand the reasoning behind design decisions and when to use different approaches. + +```{toctree} +:titlesonly: +:hidden: +:maxdepth: 2 + +``` diff --git a/doc/how_to/index.md b/doc/how_to/index.md new file mode 100644 index 000000000..f1c886fff --- /dev/null +++ b/doc/how_to/index.md @@ -0,0 +1,10 @@ +# How-To Guides + +How-to guides are practical, problem-oriented instructions that help you accomplish specific tasks with hvPlot. These guides assume you're already familiar with the basics and want to solve particular problems or achieve specific goals. 
+ +```{toctree} +:titlesonly: +:hidden: +:maxdepth: 2 + +``` diff --git a/doc/index.md b/doc/index.md index e868f5cec..5340961fd 100644 --- a/doc/index.md +++ b/doc/index.md @@ -434,8 +434,10 @@ align: center Tutorials User Guide +How-To Guides Gallery Reference +Explanation Developer Guide Releases Roadmap From 697c1587d27828c372c745168bb2207d8844f147 Mon Sep 17 00:00:00 2001 From: Isaiah Akorita Date: Thu, 21 Aug 2025 16:53:13 +0100 Subject: [PATCH 2/5] add how-to notebooks --- doc/how_to/index.md | 2 + .../multivariate_statistical_plots.ipynb | 121 +++++++++++++++++ doc/how_to/time_series_lag_plots.ipynb | 123 ++++++++++++++++++ 3 files changed, 246 insertions(+) create mode 100644 doc/how_to/multivariate_statistical_plots.ipynb create mode 100644 doc/how_to/time_series_lag_plots.ipynb diff --git a/doc/how_to/index.md b/doc/how_to/index.md index f1c886fff..265ea5a54 100644 --- a/doc/how_to/index.md +++ b/doc/how_to/index.md @@ -7,4 +7,6 @@ How-to guides are practical, problem-oriented instructions that help you accompl :hidden: :maxdepth: 2 +multivariate_statistical_plots +time_series_lag_plots ``` diff --git a/doc/how_to/multivariate_statistical_plots.ipynb b/doc/how_to/multivariate_statistical_plots.ipynb new file mode 100644 index 000000000..f2b61a10e --- /dev/null +++ b/doc/how_to/multivariate_statistical_plots.ipynb @@ -0,0 +1,121 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "22a52969", + "metadata": {}, + "source": [ + "# How to Visualize Multivariate Data with Statistical Plots\n", + "\n", + "When working with datasets that have multiple variables, hvPlot provides several statistical plotting functions to help you explore relationships and patterns. This guide shows you how to use three key methods: scatter matrices, parallel coordinates, and Andrews curves." 
+ ] + }, + { + "cell_type": "markdown", + "id": "c461984a", + "metadata": {}, + "source": [ + "## Setup\n", + "\n", + "First, import hvplot and load a multivariate dataset:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "19eb19e7", + "metadata": {}, + "outputs": [], + "source": [ + "import hvplot.pandas # noqa\n", + "\n", + "penguins = hvplot.sampledata.penguins(\"pandas\").dropna()\n", + "penguins.head(3)" + ] + }, + { + "cell_type": "markdown", + "id": "ea800985", + "metadata": {}, + "source": [ + "## Scatter Matrix\n", + "\n", + "Use a scatter matrix to visualize all pairwise relationships between numeric variables:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "66bc47a4", + "metadata": {}, + "outputs": [], + "source": [ + "num_penguins = penguins[['species', 'bill_length_mm', 'bill_depth_mm',\n", + " 'flipper_length_mm', 'body_mass_g']]\n", + "hvplot.scatter_matrix(num_penguins, c=\"species\")" + ] + }, + { + "cell_type": "markdown", + "id": "ac6e0be3", + "metadata": {}, + "source": [ + "## Parallel Coordinates\n", + "\n", + "Use parallel coordinates to see patterns across all dimensions simultaneously:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "1cca872a", + "metadata": {}, + "outputs": [], + "source": [ + "hvplot.parallel_coordinates(num_penguins, \"species\")" + ] + }, + { + "cell_type": "markdown", + "id": "926ca828", + "metadata": {}, + "source": [ + "## Andrews Curves\n", + "\n", + "Use Andrews curves to visualize aggregate differences between classes:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "d510d131", + "metadata": {}, + "outputs": [], + "source": [ + "hvplot.andrews_curves(num_penguins, \"species\")" + ] + }, + { + "cell_type": "markdown", + "id": "bc2a7c0e", + "metadata": {}, + "source": [ + ":::{admonition} Next Steps\n", + ":class: seealso\n", + "\n", + "- See the [explanation guide](../explanation/statistical_plot_types.ipynb) to 
understand when to use each plot type\n", + "- Check the [reference documentation](../ref/api/index.md) for complete parameter lists\n", + "- For time series analysis, see [how to analyze time series relationships](time_series_lag_plots.ipynb)\n", + ":::" + ] + } + ], + "metadata": { + "language_info": { + "name": "python", + "pygments_lexer": "ipython3" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/doc/how_to/time_series_lag_plots.ipynb b/doc/how_to/time_series_lag_plots.ipynb new file mode 100644 index 000000000..53f62244e --- /dev/null +++ b/doc/how_to/time_series_lag_plots.ipynb @@ -0,0 +1,123 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "56329d5f", + "metadata": {}, + "source": [ + "# How to Analyze Time Series Relationships with Lag Plots\n", + "\n", + "Lag plots help you analyze temporal relationships in time series data by comparing values at different time intervals. This guide shows you how to use hvPlot's `lag_plot()` function to identify patterns, volatility, and autocorrelation in your time series data." 
+ ] + }, + { + "cell_type": "markdown", + "id": "b8ab7520", + "metadata": {}, + "source": [ + "## Setup\n", + "\n", + "Import hvplot and load time series data:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "bf17f07f", + "metadata": {}, + "outputs": [], + "source": [ + "import hvplot.pandas # noqa\n", + "\n", + "stocks = hvplot.sampledata.stocks(\"pandas\", engine_kwargs={\"index_col\": \"date\"})\n", + "stocks.head(2)" + ] + }, + { + "cell_type": "markdown", + "id": "b409471b", + "metadata": {}, + "source": [ + "## Basic Lag Plot\n", + "\n", + "Create a lag plot to compare stock prices with a 30-day lag:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "764ab191", + "metadata": {}, + "outputs": [], + "source": [ + "hvplot.lag_plot(stocks, lag=30, alpha=0.5)" + ] + }, + { + "cell_type": "markdown", + "id": "b89f838d", + "metadata": {}, + "source": [ + "## Comparing Multiple Series\n", + "\n", + "Compare different stocks to see which shows more volatility:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "aa5409da", + "metadata": {}, + "outputs": [], + "source": [ + "selected_stocks = stocks[['Google', 'Microsoft']]\n", + "\n", + "lag_plot = hvplot.lag_plot(selected_stocks, lag=90, alpha=0.6, frame_width=400)\n", + "lag_plot" + ] + }, + { + "cell_type": "markdown", + "id": "8f6751aa", + "metadata": {}, + "source": [ + "## Interpreting Results\n", + "\n", + "Compare the lag plot with a simple line chart to understand the patterns:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "b852e4cc", + "metadata": {}, + "outputs": [], + "source": [ + "line_plot = selected_stocks.hvplot.line(title=\"Stock Prices Over Time\", frame_width=400, alpha=0.6)\n", + "\n", + "(line_plot + lag_plot).cols(1)" + ] + }, + { + "cell_type": "markdown", + "id": "cf194661", + "metadata": {}, + "source": [ + ":::{admonition} Next Steps\n", + ":class: seealso\n", + "\n", + "- See the [explanation 
guide](../explanation/statistical_plot_types.ipynb) to understand what lag plots reveal about your data\n", + "- Check the [reference documentation](../ref/api/index.md) for complete parameter options\n", + "- For multivariate analysis, see [how to visualize multivariate data](multivariate_statistical_plots.ipynb)" + ] + } + ], + "metadata": { + "language_info": { + "name": "python", + "pygments_lexer": "ipython3" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} From 0172d31622b19b31e603327b3577e14e410739f5 Mon Sep 17 00:00:00 2001 From: Isaiah Akorita Date: Thu, 21 Aug 2025 17:56:31 +0100 Subject: [PATCH 3/5] add explanation notebook --- doc/explanation/index.md | 1 + doc/explanation/statistical_plot_types.ipynb | 214 +++++++++++++++++++ 2 files changed, 215 insertions(+) create mode 100644 doc/explanation/statistical_plot_types.ipynb diff --git a/doc/explanation/index.md b/doc/explanation/index.md index 5af3e7348..d35053c02 100644 --- a/doc/explanation/index.md +++ b/doc/explanation/index.md @@ -7,4 +7,5 @@ Explanation guides provide in-depth understanding of key concepts in hvPlot. The :hidden: :maxdepth: 2 +statistical_plot_types ``` diff --git a/doc/explanation/statistical_plot_types.ipynb b/doc/explanation/statistical_plot_types.ipynb new file mode 100644 index 000000000..202569bba --- /dev/null +++ b/doc/explanation/statistical_plot_types.ipynb @@ -0,0 +1,214 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "1aaf9273", + "metadata": {}, + "source": [ + "# Understanding hvPlot's Statistical Plot Types\n", + "\n", + "hvPlot provides several statistical plotting functions that go beyond basic charts. Each plot type reveals different aspects of your data and has specific strengths and limitations. This guide explains when and why to use each type." 
+ ] + }, + { + "cell_type": "markdown", + "id": "b8b10a4b", + "metadata": {}, + "source": [ + "## Multivariate Data Visualization\n", + "\n", + "When working with datasets containing multiple variables, understanding relationships between all dimensions becomes challenging. hvPlot offers three complementary approaches:" + ] + }, + { + "cell_type": "markdown", + "id": "5f52ca70", + "metadata": {}, + "source": [ + "### Scatter Matrix\n", + "\n", + "**What it shows:** All pairwise relationships between numeric variables\n", + "\n", + "**Strengths:**\n", + "- Provides quantitative insights into correlations\n", + "- Interactive linking allows exploration across all variable pairs\n", + "- Familiar scatter plot format is easy to interpret\n", + "\n", + "**Best for:** Identifying correlations, outliers, and clustering patterns between variable pairs\n", + "\n", + "**Limitations:** Can become cluttered with many variables; doesn't show patterns across all dimensions simultaneously" + ] + }, + { + "cell_type": "markdown", + "id": "2d894a15", + "metadata": {}, + "source": [ + "### Parallel Coordinates\n", + "\n", + "**What it shows:** Patterns and relationships across all variables simultaneously\n", + "\n", + "**Strengths:**\n", + "- Reveals patterns across all dimensions at once\n", + "- Excellent for identifying distinct groups or classes\n", + "- Shows which variables contribute most to group differences\n", + "\n", + "**Best for:** Comparing groups across multiple dimensions, identifying which variables distinguish different classes\n", + "\n", + "**Limitations:** Can be difficult to read with many observations; requires some practice to interpret effectively" + ] + }, + { + "cell_type": "markdown", + "id": "9d1ba8b8", + "metadata": {}, + "source": [ + "### Andrews Curves\n", + "\n", + "**What it shows:** Aggregate differences between classes using Fourier series representation\n", + "\n", + "**Strengths:**\n", + "- Smooth curves make group differences visually 
apparent\n", + "- Good for showing overall class separation\n", + "- Less cluttered than parallel coordinates with many observations\n", + "\n", + "**Best for:** Visualizing overall differences between classes when you care more about separation than specific variable contributions\n", + "\n", + "**Limitations:** Provides less quantitative insight into which specific features drive differences; mathematical transformation makes individual variable contributions less interpretable" + ] + }, + { + "cell_type": "markdown", + "id": "b925b654", + "metadata": {}, + "source": [ + "## Time Series Analysis\n", + "\n", + "### Lag Plots\n", + "\n", + "**What it shows:** Relationship between current values and values at a previous time point\n", + "\n", + "**Strengths:**\n", + "- Reveals autocorrelation patterns in time series\n", + "- Identifies volatility and stability in temporal data\n", + "- Helps detect seasonal or cyclical patterns\n", + "\n", + "**Best for:** Understanding temporal dependencies, comparing volatility between different time series, detecting autocorrelation\n", + "\n", + "**Key insight:** Tight clustering around the diagonal indicates stable, predictable behavior; scattered points indicate high volatility or weak temporal correlation" + ] + }, + { + "cell_type": "markdown", + "id": "ff27fc8e", + "metadata": {}, + "source": [ + "## Distribution Analysis\n", + "\n", + "Understanding the distribution of your data is fundamental to statistical analysis. 
hvPlot provides several plot types that reveal different aspects of data distributions:\n", + "\n", + "### Histograms\n", + "\n", + "**What it shows:** Frequency distribution of values in a single variable\n", + "\n", + "**Strengths:**\n", + "- Clear visualization of data distribution shape\n", + "- Easy to identify skewness, modality, and outliers\n", + "- Familiar and intuitive for most users\n", + "- Customizable bin sizes for different levels of detail\n", + "\n", + "**Best for:** Understanding the overall shape and spread of a single variable, identifying distribution patterns\n", + "\n", + "**Limitations:** Can be sensitive to bin size choices; doesn't show relationships between variables\n", + "\n", + "### Box Plots\n", + "\n", + "**What it shows:** Five-number summary (minimum, Q1, median, Q3, maximum) plus outliers\n", + "\n", + "**Strengths:**\n", + "- Compact summary of distribution characteristics\n", + "- Excellent for comparing distributions across groups\n", + "- Clearly identifies outliers and quartile ranges\n", + "- Robust to extreme values\n", + "\n", + "**Best for:** Comparing distributions between groups, identifying outliers, understanding data spread and central tendency\n", + "\n", + "**Limitations:** Hides detailed distribution shape; can miss bimodal or complex distributions\n", + "\n", + "### Violin Plots\n", + "\n", + "**What it shows:** Combination of box plot information with kernel density estimation\n", + "\n", + "**Strengths:**\n", + "- Shows both summary statistics and distribution shape\n", + "- Reveals multimodal distributions that box plots miss\n", + "- Good for comparing complex distributions across groups\n", + "- More informative than box plots for understanding distribution shape\n", + "\n", + "**Best for:** Comparing detailed distribution shapes across groups, when you need both summary statistics and distribution density\n", + "\n", + "**Limitations:** Can be more complex to interpret; kernel density estimation may smooth 
over important details" + ] + }, + { + "cell_type": "markdown", + "id": "3969834d", + "metadata": {}, + "source": [ + "## Interactive Advantages\n", + "\n", + "All hvPlot statistical plots benefit from Bokeh's interactive features:\n", + "\n", + "- **Linked brushing:** Selections in one part of the plot highlight corresponding points elsewhere\n", + "- **Linked zooming/panning:** Coordinated exploration across multiple plot panels\n", + "- **Hover tooltips:** Detailed information about individual data points\n", + "\n", + "These features make hvPlot's statistical plots significantly more powerful than static alternatives for data exploration." + ] + }, + { + "cell_type": "markdown", + "id": "5dd00ffb", + "metadata": {}, + "source": [ + "## Choosing the Right Plot Type\n", + "\n", + "| Goal | Recommended Plot | Why |\n", + "|------|------------------|-----|\n", + "| Find correlations between variable pairs | Scatter Matrix | Shows quantitative relationships clearly |\n", + "| Compare groups across many variables | Parallel Coordinates | Reveals which variables distinguish groups |\n", + "| Show overall class separation | Andrews Curves | Emphasizes aggregate differences |\n", + "| Analyze temporal dependencies | Lag Plot | Designed specifically for time series patterns |\n", + "| Understand single variable distribution | Histogram | Clear frequency distribution visualization |\n", + "| Compare distributions across groups | Box Plot or Violin Plot | Box plots for simple comparisons, violin plots for detailed shapes |\n", + "| Identify outliers | Box Plot | Explicitly shows outliers beyond quartile ranges |\n", + "| Detect multimodal distributions | Violin Plot or Histogram | Violin plots show density curves, histograms show frequency peaks |\n", + "| Quick distribution summary | Box Plot | Compact five-number summary |\n", + "| Detailed distribution analysis | Violin Plot | Combines summary statistics with full distribution shape |\n", + "| Detect outliers in 
multivariate data | Scatter Matrix + Parallel Coordinates | Combine pairwise and multi-dimensional views |" + ] + }, + { + "cell_type": "markdown", + "id": "41d345c1", + "metadata": {}, + "source": [ + "## Next Steps\n", + "\n", + "- Learn how to create:\n", + " - [multivariate statistical plots](../how_to/multivariate_statistical_plots.ipynb)\n", + " - [time series lag plots](../how_to/time_series_lag_plots.ipynb)\n", + "- See the [reference documentation](../ref/api/index.md) for complete parameter lists\n", + "- Explore more visualization options at [holoviews.org](https://holoviews.org)" + ] + } + ], + "metadata": { + "language_info": { + "name": "python", + "pygments_lexer": "ipython3" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} From 263a501a158a9b992a993dd5fd886ddaca4a057f Mon Sep 17 00:00:00 2001 From: Isaiah Akorita Date: Mon, 25 Aug 2025 19:19:33 +0100 Subject: [PATCH 4/5] review --- doc/explanation/statistical_plot_types.ipynb | 512 ++++++++++++++++-- doc/how_to/index.md | 2 - .../multivariate_statistical_plots.ipynb | 121 ----- doc/how_to/time_series_lag_plots.ipynb | 123 ----- 4 files changed, 455 insertions(+), 303 deletions(-) delete mode 100644 doc/how_to/multivariate_statistical_plots.ipynb delete mode 100644 doc/how_to/time_series_lag_plots.ipynb diff --git a/doc/explanation/statistical_plot_types.ipynb b/doc/explanation/statistical_plot_types.ipynb index 202569bba..19abe1579 100644 --- a/doc/explanation/statistical_plot_types.ipynb +++ b/doc/explanation/statistical_plot_types.ipynb @@ -7,7 +7,286 @@ "source": [ "# Understanding hvPlot's Statistical Plot Types\n", "\n", - "hvPlot provides several statistical plotting functions that go beyond basic charts. Each plot type reveals different aspects of your data and has specific strengths and limitations. This guide explains when and why to use each type." + "hvPlot provides several statistical plotting functions that go beyond basic charts. 
Each plot type reveals different aspects of your data and has specific strengths and limitations. This guide explains when and why to use each type.\n", + "\n", + "## Load sample data for examples" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "b8294f32", + "metadata": {}, + "outputs": [], + "source": [ + "import hvplot.pandas # noqa\n", + "from sklearn.preprocessing import StandardScaler\n", + "import pandas as pd\n", + "\n", + "\n", + "penguins = hvplot.sampledata.penguins(\"pandas\").dropna()\n", + "stocks = hvplot.sampledata.stocks(\"pandas\")\n", + "\n", + "# Prepare data for multivariate examples\n", + "num_cols = ['bill_length_mm', 'bill_depth_mm', 'flipper_length_mm', 'body_mass_g']\n", + "penguins_subset = penguins[['species'] + num_cols]#.sample(100, random_state=42)\n", + "\n", + "# Normalized version for some plots\n", + "scaler = StandardScaler()\n", + "scaled_features = scaler.fit_transform(penguins_subset[num_cols])\n", + "penguins_scaled = pd.DataFrame(scaled_features, columns=num_cols)\n", + "penguins_scaled['species'] = penguins_subset['species'].values" + ] + }, + { + "cell_type": "markdown", + "id": "3f9e7bfb-276c-4746-8ec9-9d58a7f09d91", + "metadata": {}, + "source": [ + "## Distribution Analysis\n", + "\n", + "Understanding the distribution of your data is fundamental to statistical analysis. 
hvPlot provides several plot types that reveal different aspects of data distributions:\n", + "\n", + "### Histograms\n", + "\n", + "**What it shows:** Frequency distribution of values in a single variable\n", + "\n", + "**Strengths:**\n", + "- Clear visualization of data distribution shape\n", + "- Easy to identify skewness, modality, and outliers\n", + "- Familiar and intuitive for most users\n", + "- Customizable bin sizes for different levels of detail\n", + "\n", + "**Best for:** Understanding the overall shape and spread of a single variable, identifying distribution patterns\n", + "\n", + "**Limitations:** Can be sensitive to bin size choices; doesn't show relationships between variables" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "b1fb63ce-738d-49fe-ae3d-e21918769e88", + "metadata": {}, + "outputs": [], + "source": [ + "# Example: Histogram showing distribution shape\n", + "\n", + "penguins.hvplot.hist(y='body_mass_g', by='species', alpha=0.6, bins=20)" + ] + }, + { + "cell_type": "markdown", + "id": "98fdcf50", + "metadata": {}, + "source": [ + "Notice how each species shows a different distribution shape: Adelie penguins have a wider spread and lower average body mass, while Gentoo penguins are clearly heavier with less overlap with the other species." 
+ ] + }, + { + "cell_type": "markdown", + "id": "2ecfb43c", + "metadata": {}, + "source": [ + ":::{seealso}\n", + "See the reference guide for [Histograms](../ref/api/manual/hvplot.hvPlot.hist.ipynb)\n", + ":::" + ] + }, + { + "cell_type": "markdown", + "id": "5189025d-5a08-485e-beaf-f8af6f60b3ef", + "metadata": {}, + "source": [ + "### Box Plots\n", + "\n", + "**What it shows:** Five-number summary (minimum, Q1, median, Q3, maximum) plus outliers\n", + "\n", + "**Strengths:**\n", + "- Compact summary of distribution characteristics\n", + "- Excellent for comparing distributions across groups\n", + "- Clearly identifies outliers and quartile ranges\n", + "- Robust to extreme values\n", + "\n", + "**Best for:** Comparing distributions between groups, identifying outliers, understanding data spread and central tendency\n", + "\n", + "**Limitations:** Hides detailed distribution shape; can miss bimodal or complex distributions" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "ea44af35-33de-4b54-8bee-24e216d2583f", + "metadata": {}, + "outputs": [], + "source": [ + "# Example: Box plot comparing distributions across groups\n", + "penguins.hvplot.box(y='flipper_length_mm', by='species')" + ] + }, + { + "cell_type": "markdown", + "id": "6e3e2afd-48bb-4895-9a9a-ffa1069a6cd6", + "metadata": {}, + "source": [ + "The box plots provide a compact summary showing that Gentoo penguins have notably longer flippers with less variability, while Adelie penguins show the shortest flipper lengths. The boxes show quartiles, and any points beyond the whiskers would indicate outliers." 
+ ] + }, + { + "cell_type": "markdown", + "id": "0b83df85-8346-4536-899c-d81bb17cc314", + "metadata": {}, + "source": [ + ":::{seealso}\n", + "See the reference guide for [Box plots](../ref/api/manual/hvplot.hvPlot.box.ipynb)\n", + ":::" + ] + }, + { + "cell_type": "markdown", + "id": "78b9159f-dcbd-431b-9c36-87084ba83cc9", + "metadata": {}, + "source": [ + "### Violin Plots\n", + "\n", + "**What it shows:** Combination of box plot information with kernel density estimation\n", + "\n", + "**Strengths:**\n", + "- Shows both summary statistics and distribution shape\n", + "- Reveals multimodal distributions that box plots miss\n", + "- Good for comparing complex distributions across groups\n", + "- More informative than box plots for understanding distribution shape\n", + "\n", + "**Best for:** Comparing detailed distribution shapes across groups, when you need both summary statistics and distribution density\n", + "\n", + "**Limitations:** Can be more complex to interpret; kernel density estimation may smooth over important details" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "47816f95", + "metadata": {}, + "outputs": [], + "source": [ + "# Example: Violin plot showing detailed distribution shapes\n", + "penguins.hvplot.violin(y='bill_length_mm', by='species')" + ] + }, + { + "cell_type": "markdown", + "id": "168e1418-cc07-4cd5-ae37-dfcab390f072", + "metadata": {}, + "source": [ + "The violin plots reveal the full distribution shape within each group. Notice how Chinstrap penguins show a slightly bimodal distribution in bill length, while Gentoo and Adelie show more symmetric, unimodal distributions. The white dot shows the median, and the thick black bar represents the interquartile range." 
+ ] + }, + { + "cell_type": "markdown", + "id": "00cf02bf-60d7-44e3-b357-8f4ed21a6967", + "metadata": {}, + "source": [ + ":::{seealso}\n", + "See the reference guide for [Violin plots](../ref/api/manual/hvplot.hvPlot.violin.ipynb)\n", + ":::" + ] + }, + { + "cell_type": "markdown", + "id": "098c30b2-5aff-48b1-be0f-90d32065ee0e", + "metadata": {}, + "source": [ + "### Heatmaps\n", + "\n", + "**What it shows:** Matrix of values represented as colors, often used for correlation matrices or 2D binned data\n", + "\n", + "**Strengths:**\n", + "- Excellent for visualizing correlation matrices\n", + "- Clear representation of patterns in 2D gridded data\n", + "- Good for showing relationships across many variable pairs simultaneously\n", + "- Effective for identifying clusters and patterns in matrix data\n", + "\n", + "**Best for:** Visualizing correlation matrices, 2D binned data, confusion matrices, or any matrix-structured data\n", + "\n", + "**Limitations:** Requires gridded or matrix-structured data; can lose individual data point information" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "e897a9ba", + "metadata": {}, + "outputs": [], + "source": [ + "# Example: Heatmap showing correlation matrix\n", + "correlation_matrix = penguins[num_cols].corr()\n", + "correlation_matrix.hvplot.heatmap(cmap='coolwarm')" + ] + }, + { + "cell_type": "markdown", + "id": "f0940211", + "metadata": {}, + "source": [ + "The heatmap reveals a strong positive correlation (darker red) between flipper length and body mass, while bill depth is negatively correlated (blue) with the other three measurements when all species are pooled. Larger penguins therefore tend to have longer flippers and bills, but not deeper bills." 
+ ] + }, + { + "cell_type": "markdown", + "id": "fa0a0db3", + "metadata": {}, + "source": [ + ":::{seealso}\n", + "See the reference guide for [Heatmaps](../ref/api/manual/hvplot.hvPlot.heatmap.ipynb)\n", + ":::" + ] + }, + { + "cell_type": "markdown", + "id": "acea8c30-54ef-4359-911b-1e637d3c5cf7", + "metadata": {}, + "source": [ + "### KDE (Kernel Density Estimation) Plots\n", + "\n", + "**What it shows:** Smooth density estimation of data distribution using kernel functions\n", + "\n", + "**Strengths:**\n", + "- Provides smooth, continuous representation of data density\n", + "- Good for overlaying multiple distributions for comparison\n", + "- Less sensitive to bin choices than histograms\n", + "- Effective for showing distribution shape and identifying modes\n", + "\n", + "**Best for:** Comparing multiple distributions, showing smooth density estimates, identifying distribution modes\n", + "\n", + "**Limitations:** Bandwidth selection can affect results; may smooth over important details; computationally more expensive than histograms" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "dcf6eb80", + "metadata": {}, + "outputs": [], + "source": [ + "# Example: KDE plot comparing smooth density distributions\n", + "penguins.hvplot.kde(y='body_mass_g', by='species', alpha=0.6)" + ] + }, + { + "cell_type": "markdown", + "id": "82eb2970", + "metadata": {}, + "source": [ + "The smooth KDE curves make it easy to compare distribution shapes across species. Note how Gentoo penguins show a distinct peak at higher body mass values, while Adelie and Chinstrap distributions overlap more significantly." 
+ ] + }, + { + "cell_type": "markdown", + "id": "fdfb9a7e", + "metadata": {}, + "source": [ + ":::{seealso}\n", + "See the reference guide for [KDE plots](../ref/api/manual/hvplot.hvPlot.kde.ipynb)\n", + ":::" ] }, { @@ -39,6 +318,35 @@ "**Limitations:** Can become cluttered with many variables; doesn't show patterns across all dimensions simultaneously" ] }, + { + "cell_type": "code", + "execution_count": null, + "id": "06816511", + "metadata": {}, + "outputs": [], + "source": [ + "# Example: Scatter matrix showing pairwise relationships\n", + "hvplot.scatter_matrix(penguins_subset, c=\"species\", alpha=0.6)" + ] + }, + { + "cell_type": "markdown", + "id": "b5e79450", + "metadata": {}, + "source": [ + "The scatter matrix shows that Gentoo penguins (orange) form distinct clusters in most variable pairs, particularly visible in flipper length vs body mass. The diagonal histograms reveal the distribution of each individual variable." + ] + }, + { + "cell_type": "markdown", + "id": "8f31353b", + "metadata": {}, + "source": [ + ":::{seealso}\n", + "See the reference guide for [Scatter Matrix](../ref/api/manual/hvplot.plotting.scatter_matrix.ipynb)\n", + ":::" + ] + }, { "cell_type": "markdown", "id": "2d894a15", @@ -58,6 +366,35 @@ "**Limitations:** Can be difficult to read with many observations; requires some practice to interpret effectively" ] }, + { + "cell_type": "code", + "execution_count": null, + "id": "de380135", + "metadata": {}, + "outputs": [], + "source": [ + "# Example: Parallel coordinates showing patterns across all dimensions\n", + "hvplot.parallel_coordinates(penguins_scaled, \"species\", alpha=0.7)" + ] + }, + { + "cell_type": "markdown", + "id": "8e9dfda1", + "metadata": {}, + "source": [ + "The parallel coordinates plot reveals that Gentoo penguins consistently have higher values across most features (especially flipper length and body mass), while Adelie and Chinstrap show more similar patterns with some overlap." 
+ ] + }, + { + "cell_type": "markdown", + "id": "29afd03f", + "metadata": {}, + "source": [ + ":::{seealso}\n", + "See the reference guide for [Parallel Coordinates](../ref/api/manual/hvplot.plotting.parallel_coordinates.ipynb)\n", + ":::" + ] + }, { "cell_type": "markdown", "id": "9d1ba8b8", @@ -77,6 +414,88 @@ "**Limitations:** Provides less quantitative insight into which specific features drive differences; mathematical transformation makes individual variable contributions less interpretable" ] }, + { + "cell_type": "code", + "execution_count": null, + "id": "45b8b826", + "metadata": {}, + "outputs": [], + "source": [ + "# Example: Andrews curves showing class separation\n", + "hvplot.andrews_curves(penguins_scaled, \"species\", samples=30)" + ] + }, + { + "cell_type": "markdown", + "id": "959f3215", + "metadata": {}, + "source": [ + "The Andrews curves transform the multi-dimensional data into smooth periodic functions. Notice how Gentoo penguins form a distinct curve pattern that's clearly separated from the other two species, confirming their distinctiveness across multiple dimensions." + ] + }, + { + "cell_type": "markdown", + "id": "6bd0fe5e", + "metadata": {}, + "source": [ + ":::{seealso}\n", + "See the reference guide for [Andrews Curves](../ref/api/manual/hvplot.plotting.andrews_curves.ipynb)\n", + ":::" + ] + }, + { + "cell_type": "markdown", + "id": "be99e06a-88fb-4761-97c6-cc6854cba2bb", + "metadata": {}, + "source": [ + "## Bivariate Analysis\n", + "\n", + "Understanding relationships between pairs of variables requires specialized visualization approaches. 
hvPlot provides several methods for bivariate exploration:\n", + "\n", + "### Bivariate Plots\n", + "\n", + "**What it shows:** Joint distribution and relationship between two continuous variables\n", + "\n", + "**Strengths:**\n", + "- Draws the joint density of two variables as nested contours (a 2D kernel density estimate)\n", + "- Shows where observations concentrate and how the two variables vary together\n", + "- Excellent for understanding correlation patterns and clusters\n", + "- Provides comprehensive view of two-variable relationships\n", + "\n", + "**Best for:** Exploring relationships between two continuous variables, understanding joint distributions\n", + "\n", + "**Limitations:** Limited to two variables at a time; overlapping contours from several groups can be hard to read, and smoothing can hide individual outliers" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "6b1bb8b0", + "metadata": {}, + "outputs": [], + "source": [ + "# Example: Bivariate plot showing joint distribution\n", + "penguins.hvplot.bivariate('bill_length_mm', 'flipper_length_mm', by='species')" + ] + }, + { + "cell_type": "markdown", + "id": "665b752b", + "metadata": {}, + "source": [ + "The bivariate plot draws density contours for each species, showing how bill length and flipper length vary together. The clear clustering by species confirms these measurements are good discriminators."
+ ] + }, + { + "cell_type": "markdown", + "id": "7e34c3cb", + "metadata": {}, + "source": [ + ":::{seealso}\n", + "See the reference guide for [Bivariate plots](../ref/api/manual/hvplot.hvPlot.bivariate.ipynb)\n", + ":::" + ] + }, { "cell_type": "markdown", "id": "b925b654", @@ -98,56 +517,34 @@ "**Key insight:** Tight clustering around the diagonal indicates stable, predictable behavior; scattered points indicate high volatility or weak temporal correlation" ] }, + { + "cell_type": "code", + "execution_count": null, + "id": "f42e0ac3", + "metadata": {}, + "outputs": [], + "source": [ + "# Example: Lag plot comparing stock volatility\n", + "stock_subset = stocks[['Apple', 'Microsoft']].iloc[:200] # Subset for clarity\n", + "hvplot.lag_plot(stock_subset, lag=30, alpha=0.6)" + ] + }, { "cell_type": "markdown", - "id": "ff27fc8e", + "id": "a8b1db2d", "metadata": {}, "source": [ - "## Distribution Analysis\n", - "\n", - "Understanding the distribution of your data is fundamental to statistical analysis. 
hvPlot provides several plot types that reveal different aspects of data distributions:\n", - "\n", - "### Histograms\n", - "\n", - "**What it shows:** Frequency distribution of values in a single variable\n", - "\n", - "**Strengths:**\n", - "- Clear visualization of data distribution shape\n", - "- Easy to identify skewness, modality, and outliers\n", - "- Familiar and intuitive for most users\n", - "- Customizable bin sizes for different levels of detail\n", - "\n", - "**Best for:** Understanding the overall shape and spread of a single variable, identifying distribution patterns\n", - "\n", - "**Limitations:** Can be sensitive to bin size choices; doesn't show relationships between variables\n", - "\n", - "### Box Plots\n", - "\n", - "**What it shows:** Five-number summary (minimum, Q1, median, Q3, maximum) plus outliers\n", - "\n", - "**Strengths:**\n", - "- Compact summary of distribution characteristics\n", - "- Excellent for comparing distributions across groups\n", - "- Clearly identifies outliers and quartile ranges\n", - "- Robust to extreme values\n", - "\n", - "**Best for:** Comparing distributions between groups, identifying outliers, understanding data spread and central tendency\n", - "\n", - "**Limitations:** Hides detailed distribution shape; can miss bimodal or complex distributions\n", - "\n", - "### Violin Plots\n", - "\n", - "**What it shows:** Combination of box plot information with kernel density estimation\n", - "\n", - "**Strengths:**\n", - "- Shows both summary statistics and distribution shape\n", - "- Reveals multimodal distributions that box plots miss\n", - "- Good for comparing complex distributions across groups\n", - "- More informative than box plots for understanding distribution shape\n", - "\n", - "**Best for:** Comparing detailed distribution shapes across groups, when you need both summary statistics and distribution density\n", - "\n", - "**Limitations:** Can be more complex to interpret; kernel density estimation may smooth 
over important details" + "The lag plot shows the relationship between stock prices and their values 30 days earlier. Points scattered widely from the diagonal indicate high volatility, while points close to the diagonal suggest more predictable, stable price movements." + ] + }, + { + "cell_type": "markdown", + "id": "7b82506d", + "metadata": {}, + "source": [ + ":::{seealso}\n", + "See the reference guide for [Lag plots](../ref/api/manual/hvplot.plotting.lag_plot.ipynb)\n", + ":::" ] }, { @@ -159,7 +556,7 @@ "\n", "All hvPlot statistical plots benefit from Bokeh's interactive features:\n", "\n", - "- **Linked brushing:** Selections in one part of the plot highlight corresponding points elsewhere\n", + "- **Shared axes:** Multiple subplots automatically share the same axis ranges, so zooming or panning in one subplot synchronizes across all related plots\n", "- **Linked zooming/panning:** Coordinated exploration across multiple plot panels\n", "- **Hover tooltips:** Detailed information about individual data points\n", "\n", @@ -179,12 +576,16 @@ "| Compare groups across many variables | Parallel Coordinates | Reveals which variables distinguish groups |\n", "| Show overall class separation | Andrews Curves | Emphasizes aggregate differences |\n", "| Analyze temporal dependencies | Lag Plot | Designed specifically for time series patterns |\n", - "| Understand single variable distribution | Histogram | Clear frequency distribution visualization |\n", + "| Understand single variable distribution | Histogram or KDE | Histograms for frequency, KDE for smooth density |\n", "| Compare distributions across groups | Box Plot or Violin Plot | Box plots for simple comparisons, violin plots for detailed shapes |\n", "| Identify outliers | Box Plot | Explicitly shows outliers beyond quartile ranges |\n", - "| Detect multimodal distributions | Violin Plot or Histogram | Violin plots show density curves, histograms show frequency peaks |\n", + "| Detect multimodal 
distributions | Violin Plot, KDE, or Histogram | Multiple approaches reveal different aspects of modes |\n", "| Quick distribution summary | Box Plot | Compact five-number summary |\n", "| Detailed distribution analysis | Violin Plot | Combines summary statistics with full distribution shape |\n", + "| Explore two-variable relationships | Bivariate Plot | Shows the joint density of two variables as contours |\n", + "| Visualize correlation patterns | Heatmap | Clear matrix representation of correlations |\n", + "| Compare multiple distributions smoothly | KDE Plot | Smooth density curves for easy comparison |\n", + "| Analyze matrix or gridded data | Heatmap | Designed specifically for matrix visualization |\n", "| Detect outliers in multivariate data | Scatter Matrix + Parallel Coordinates | Combine pairwise and multi-dimensional views |" ] }, @@ -193,13 +594,10 @@ "id": "41d345c1", "metadata": {}, "source": [ - "## Next Steps\n", - "\n", - "- Learn how to create:\n", - " - [multivariate statistical plots](../how_to/multivariate_statistical_plots.ipynb)\n", - " - [time series lag plots](../how_to/time_series_lag_plots.ipynb)\n", - "- See the [reference documentation](../ref/api/index.md) for complete parameter lists\n", - "- Explore more visualization options at [holoviews.org](https://holoviews.org)" + ":::{admonition} Next Steps\n", + ":class: seealso\n", + "Explore more visualization options at [holoviews.org](https://holoviews.org)\n", + ":::" ] } ], diff --git a/doc/how_to/index.md b/doc/how_to/index.md index 265ea5a54..f1c886fff 100644 --- a/doc/how_to/index.md +++ b/doc/how_to/index.md @@ -7,6 +7,4 @@ How-to guides are practical, problem-oriented instructions that help you accompl :hidden: :maxdepth: 2 -multivariate_statistical_plots -time_series_lag_plots ``` diff --git a/doc/how_to/multivariate_statistical_plots.ipynb b/doc/how_to/multivariate_statistical_plots.ipynb deleted file mode 100644 index f2b61a10e..000000000 ---
a/doc/how_to/multivariate_statistical_plots.ipynb +++ /dev/null @@ -1,121 +0,0 @@ -{ - "cells": [ - { - "cell_type": "markdown", - "id": "22a52969", - "metadata": {}, - "source": [ - "# How to Visualize Multivariate Data with Statistical Plots\n", - "\n", - "When working with datasets that have multiple variables, hvPlot provides several statistical plotting functions to help you explore relationships and patterns. This guide shows you how to use three key methods: scatter matrices, parallel coordinates, and Andrews curves." - ] - }, - { - "cell_type": "markdown", - "id": "c461984a", - "metadata": {}, - "source": [ - "## Setup\n", - "\n", - "First, import hvplot and load a multivariate dataset:" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "19eb19e7", - "metadata": {}, - "outputs": [], - "source": [ - "import hvplot.pandas # noqa\n", - "\n", - "penguins = hvplot.sampledata.penguins(\"pandas\").dropna()\n", - "penguins.head(3)" - ] - }, - { - "cell_type": "markdown", - "id": "ea800985", - "metadata": {}, - "source": [ - "## Scatter Matrix\n", - "\n", - "Use a scatter matrix to visualize all pairwise relationships between numeric variables:" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "66bc47a4", - "metadata": {}, - "outputs": [], - "source": [ - "num_penguins = penguins[['species', 'bill_length_mm', 'bill_depth_mm',\n", - " 'flipper_length_mm', 'body_mass_g']]\n", - "hvplot.scatter_matrix(num_penguins, c=\"species\")" - ] - }, - { - "cell_type": "markdown", - "id": "ac6e0be3", - "metadata": {}, - "source": [ - "## Parallel Coordinates\n", - "\n", - "Use parallel coordinates to see patterns across all dimensions simultaneously:" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "1cca872a", - "metadata": {}, - "outputs": [], - "source": [ - "hvplot.parallel_coordinates(num_penguins, \"species\")" - ] - }, - { - "cell_type": "markdown", - "id": "926ca828", - "metadata": {}, - "source": [ - "## 
Andrews Curves\n", - "\n", - "Use Andrews curves to visualize aggregate differences between classes:" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "d510d131", - "metadata": {}, - "outputs": [], - "source": [ - "hvplot.andrews_curves(num_penguins, \"species\")" - ] - }, - { - "cell_type": "markdown", - "id": "bc2a7c0e", - "metadata": {}, - "source": [ - ":::{admonition} Next Steps\n", - ":class: seealso\n", - "\n", - "- See the [explanation guide](../explanation/statistical_plot_types.ipynb) to understand when to use each plot type\n", - "- Check the [reference documentation](../ref/api/index.md) for complete parameter lists\n", - "- For time series analysis, see [how to analyze time series relationships](time_series_lag_plots.ipynb)\n", - ":::" - ] - } - ], - "metadata": { - "language_info": { - "name": "python", - "pygments_lexer": "ipython3" - } - }, - "nbformat": 4, - "nbformat_minor": 5 -} diff --git a/doc/how_to/time_series_lag_plots.ipynb b/doc/how_to/time_series_lag_plots.ipynb deleted file mode 100644 index 53f62244e..000000000 --- a/doc/how_to/time_series_lag_plots.ipynb +++ /dev/null @@ -1,123 +0,0 @@ -{ - "cells": [ - { - "cell_type": "markdown", - "id": "56329d5f", - "metadata": {}, - "source": [ - "# How to Analyze Time Series Relationships with Lag Plots\n", - "\n", - "Lag plots help you analyze temporal relationships in time series data by comparing values at different time intervals. This guide shows you how to use hvPlot's `lag_plot()` function to identify patterns, volatility, and autocorrelation in your time series data." 
- ] - }, - { - "cell_type": "markdown", - "id": "b8ab7520", - "metadata": {}, - "source": [ - "## Setup\n", - "\n", - "Import hvplot and load time series data:" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "bf17f07f", - "metadata": {}, - "outputs": [], - "source": [ - "import hvplot.pandas # noqa\n", - "\n", - "stocks = hvplot.sampledata.stocks(\"pandas\", engine_kwargs={\"index_col\": \"date\"})\n", - "stocks.head(2)" - ] - }, - { - "cell_type": "markdown", - "id": "b409471b", - "metadata": {}, - "source": [ - "## Basic Lag Plot\n", - "\n", - "Create a lag plot to compare stock prices with a 30-day lag:" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "764ab191", - "metadata": {}, - "outputs": [], - "source": [ - "hvplot.lag_plot(stocks, lag=30, alpha=0.5)" - ] - }, - { - "cell_type": "markdown", - "id": "b89f838d", - "metadata": {}, - "source": [ - "## Comparing Multiple Series\n", - "\n", - "Compare different stocks to see which shows more volatility:" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "aa5409da", - "metadata": {}, - "outputs": [], - "source": [ - "selected_stocks = stocks[['Google', 'Microsoft']]\n", - "\n", - "lag_plot = hvplot.lag_plot(selected_stocks, lag=90, alpha=0.6, frame_width=400)\n", - "lag_plot" - ] - }, - { - "cell_type": "markdown", - "id": "8f6751aa", - "metadata": {}, - "source": [ - "## Interpreting Results\n", - "\n", - "Compare the lag plot with a simple line chart to understand the patterns:" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "b852e4cc", - "metadata": {}, - "outputs": [], - "source": [ - "line_plot = selected_stocks.hvplot.line(title=\"Stock Prices Over Time\", frame_width=400, alpha=0.6)\n", - "\n", - "(line_plot + lag_plot).cols(1)" - ] - }, - { - "cell_type": "markdown", - "id": "cf194661", - "metadata": {}, - "source": [ - ":::{admonition} Next Steps\n", - ":class: seealso\n", - "\n", - "- See the [explanation 
guide](../explanation/statistical_plot_types.ipynb) to understand what lag plots reveal about your data\n", - "- Check the [reference documentation](../ref/api/index.md) for complete parameter options\n", - "- For multivariate analysis, see [how to visualize multivariate data](multivariate_statistical_plots.ipynb)" - ] - } - ], - "metadata": { - "language_info": { - "name": "python", - "pygments_lexer": "ipython3" - } - }, - "nbformat": 4, - "nbformat_minor": 5 -} From 9b1e8769c00644a828d6a938a7779b101ebbedf1 Mon Sep 17 00:00:00 2001 From: Isaiah Akorita Date: Fri, 17 Apr 2026 21:01:15 +0100 Subject: [PATCH 5/5] update notebook --- doc/explanation/statistical_plot_types.ipynb | 14 ++++++++++++-- 1 file changed, 12 insertions(+), 2 deletions(-) diff --git a/doc/explanation/statistical_plot_types.ipynb b/doc/explanation/statistical_plot_types.ipynb index 19abe1579..21b16d448 100644 --- a/doc/explanation/statistical_plot_types.ipynb +++ b/doc/explanation/statistical_plot_types.ipynb @@ -430,7 +430,7 @@ "id": "959f3215", "metadata": {}, "source": [ - "The Andrews curves transform the multi-dimensional data into smooth periodic functions. Notice how Gentoo penguins form a distinct curve pattern that's clearly separated from the other two species, confirming their distinctiveness across multiple dimensions." + "The Andrews curves transform the multi-dimensional data into smooth periodic functions, plotted over the parameter t ranging from -π to +π. Notice how Gentoo penguins form a distinct curve pattern that's clearly separated from the other two species, confirming their distinctiveness across multiple dimensions." ] }, { @@ -443,6 +443,16 @@ ":::" ] }, + { + "cell_type": "markdown", + "id": "13d62c73-988b-4bac-b919-91696d6c97da", + "metadata": {}, + "source": [ + ":::{note}\n", + "Both parallel coordinates and Andrews curves require scaled/normalized data. 
Without scaling, variables with larger numeric ranges (e.g., 3000-6000 g vs 40-60 mm) would dominate the visualization and obscure patterns in the others.\n", + ":::" + ] + }, { "cell_type": "markdown", "id": "be99e06a-88fb-4761-97c6-cc6854cba2bb", @@ -534,7 +544,7 @@ "id": "a8b1db2d", "metadata": {}, "source": [ - "The lag plot shows the relationship between stock prices and their values 30 days earlier. Points scattered widely from the diagonal indicate high volatility, while points close to the diagonal suggest more predictable, stable price movements." + "The lag plot shows the relationship between stock prices and their values 30 days earlier. A theoretical diagonal line represents perfect correlation where value(t) = value(t+lag). Points scattered widely from this diagonal indicate high volatility, while points close to the diagonal suggest more predictable, stable price movements with strong autocorrelation." ] }, {