feat: reading PivotTable (PivotCache) #559

siqpush · 2025-09-18T22:35:43Z

Feature to read all data available to a pivot table.

The data supporting a pivot table is referred to as the pivotCache. I like to think of this feature as "Calamine for your Cache".

An example use case would be auditing filtered content in an externally sourced pivot table.

Pivot tables require both the xl/pivotCache/PivotCacheDefinitions and xl/pivotCache/PivotCacheRecords files. The definitions file holds the relevant metadata as well as the shared items, while the records file holds the values, with rows delimited by the <r> tag. An <x> element gives only the position of a value in the definitions file (sample below from tests/pivots.xlsx).

xl/pivotCache/PivotCacheRecords1.xml

<r>
    <n v="10"/>
    <s v="j"/>

    <!-- use data from PivotCacheDefinitions1.xml's 3rd 'CacheField Tag'  -->
    <x v="1"/>

    <x v="3"/>
    <n v="1.20452"/>
    <x v="5"/>
    <n v="4.1510311161292464"/>
    <b v="0"/>
    <m/>
    <s v="blue"/>
  </r>

xl/pivotCache/PivotCacheDefinitions1.xml

  <cacheFields count="10">
    <cacheField name="Id" numFmtId="0">
      <sharedItems containsSemiMixedTypes="0" containsString="0" containsNumber="1" containsInteger="1" minValue="1" maxValue="10"/>
    </cacheField>
    <cacheField name="Name" numFmtId="0">
      <sharedItems/>
    </cacheField>

    <!-- <x v="1"/> above is a zero-based index into these shared items, so it refers to <s v="yellow"/> -->
    <cacheField name="Category" numFmtId="0">
      <sharedItems count="2">
        <s v="blue"/>
        <s v="yellow"/>
      </sharedItems>
    </cacheField>

    <cacheField name="Value" numFmtId="0">
      <sharedItems containsSemiMixedTypes="0" containsString="0" containsNumber="1" containsInteger="1" minValue="5" maxValue="20" count="4">
        <n v="10"/>
        <n v="20"/>
        <n v="15"/>
        <n v="5"/>
      </sharedItems>
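
The lookup described above can be sketched in a few lines of Rust. This is only an illustration: `Item`, `RecordEntry`, and `resolve` are hypothetical names, not types from this PR, and the sketch assumes that the `v` attribute of `<x>` is a zero-based index into the shared items of the matching cacheField.

```rust
// Hypothetical types for illustration; these are not calamine's actual API.
#[derive(Debug, Clone, PartialEq)]
enum Item {
    Number(f64),
    Text(String),
}

// A record entry is either an inline value (<n>, <s>, ...) or an <x> index
// pointing into the shared items of the corresponding cache field.
#[derive(Debug, Clone, PartialEq)]
enum RecordEntry {
    Inline(Item),
    SharedIndex(usize),
}

// Resolve one entry against the shared items of its cache field.
// Assumption: the index is zero-based. Out-of-range indexes yield None.
fn resolve(entry: &RecordEntry, shared_items: &[Item]) -> Option<Item> {
    match entry {
        RecordEntry::Inline(item) => Some(item.clone()),
        RecordEntry::SharedIndex(i) => shared_items.get(*i).cloned(),
    }
}

fn main() {
    // Shared items of the "Category" field from the sample above.
    let category = vec![
        Item::Text("blue".to_string()),
        Item::Text("yellow".to_string()),
    ];

    let inline = RecordEntry::Inline(Item::Number(10.0));
    let shared = RecordEntry::SharedIndex(1);

    println!("{:?}", resolve(&inline, &category));
    println!("{:?}", resolve(&shared, &category));
}
```

Returning `Option` keeps a malformed index from panicking; a real reader would more likely turn that case into an error.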

Example



    let mut wb: Xlsx<_> = wb("pivots.xlsx");
    for result in wb.get_pivot_data_by_name_ref("PivotTable1").unwrap() {
        println!("{:?}", result);
    }

    /*
     prints the following:
     
    [String("Id"), String("Name"), String("Category"), String("Value"), String("Size"), String("Date"), String("Value / Size"), String("IsBlue"), String("Null"), String("Misc")]
    [Int(1), String("a"), String("blue"), Int(10), Float(1.78), DateTimeIso("2024-11-01T00:00:00"), Float(5.617977528089887), Bool(true), Empty, Empty]
    [Int(2), String("b"), String("blue"), Int(20), Float(2.012), DateTimeIso("2024-01-04T00:00:00"), Float(9.940357852882704), Bool(true), Empty, Float(2.012)]
    ...
    */

This may be determined to fall outside the scope of Calamine, but if it fits then it will need to be applied to other workbook formats (only .xlsx currently) and worked on in a few places, such as error handling.

jmcnamara · 2025-09-18T23:38:36Z

Overall it looks okay. However there are a number of issues to fix before review:

- Rebase to master on tafia/calamine. There are some fixes on master relative to your clone.
- Probably best to move the changes to a branch on your repo for easier PRs.
- Turn on the CI on your branch and fix any issues before resubmitting the PR. This PR fails in the calamine CI.
- Run cargo fmt on the code.
- Fix any cargo clippy issues.
- Fix the warnings from cargo build -F pivot-cache.
- Comment style should be proper sentence case with a period at the end.
- Don't use /// comments for non-public comments. Use //.
- Explain why fn xml_reader() is being made public. Use pub(crate) if necessary or don't make it public if it isn't necessary.
- Probably best to upgrade to Rust v1.89.0 or 1.90.0, if not already using them. This will give you the latest clippy at least.
- Avoid making whitespace changes like removing blank lines unless there is a valid reason.
- Rebase your local changes, to fix the above issues, into a single commit.
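
The comment-style and visibility points above can be illustrated with a small sketch. The module layout and both functions are hypothetical stand-ins, not the code under review; the sketch only shows `///` on a public item, `//` on an internal one, and `pub(crate)` in place of `pub`.

```rust
mod xlsx {
    /// Returns the number of `<r>` records found in `xml`.
    ///
    /// Doc comments (`///`) belong on public items, since they end up in rustdoc.
    pub fn count_records(xml: &str) -> usize {
        // Plain comments are used for internal notes, in sentence case with a period.
        record_starts(xml).count()
    }

    // This helper is visible anywhere inside the crate but not to downstream users.
    pub(crate) fn record_starts(xml: &str) -> impl Iterator<Item = usize> + '_ {
        xml.match_indices("<r>").map(|(i, _)| i)
    }
}

fn main() {
    let xml = "<r><n v=\"1\"/></r><r><n v=\"2\"/></r>";
    println!("{}", xlsx::count_records(xml));
    println!("{:?}", xlsx::record_starts(xml).collect::<Vec<_>>());
}
```

In a library crate, `pub(crate)` keeps a helper such as `xml_reader()` out of the public API while still letting other modules in the crate call it.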

siqpush · 2025-09-20T22:21:38Z

@jmcnamara I may have accidentally resubmitted the PR. I took your advice on moving to a branch on my own repo but didn't realize it would automatically resubmit once I rebased it to my master. If that was the case I still have a little cleanup left removing the unnecessary comments.

Also, do you plan to have this released for just .xlsx workbooks and then have the remaining formats PR'd separately?

jmcnamara · 2025-09-20T23:22:14Z

> but didn't realize it would automatically resubmit once I rebased it to my master. If that was the case I still have a little cleanup left removing the unnecessary comments.

That is fine, and normal. People often submit the PR as "draft" (it is an option in the initial GitHub dialog or you can do it in git) while they are iterating and then move it to full once it is ready to merge.

> Also, do you plan to have this released for just .xlsx workbooks and then have the remaining formats PR'd separately?

I would think it would be too hard to do this for xls. It may be possible to do it for xlsb and I don't know about ods. So I think it is probably ok to work it out for just xlsx in this PR.

Also, I will run the AI code review on this later. Use the usual amount of judgement in relation to the suggestions.

Copilot

Pull Request Overview

This PR adds pivot table data reading functionality to the Calamine library, specifically for XLSX files. The feature allows users to extract the underlying data (pivot cache) that supports pivot tables, which can be useful for auditing filtered content in externally sourced pivot tables.

Key changes:

- Adds a new pivot-cache feature flag to enable pivot table functionality
- Implements pivot table metadata parsing and data extraction from XLSX files
- Provides public API methods for accessing pivot table names and data

Reviewed Changes

Copilot reviewed 9 out of 10 changed files in this pull request and generated 9 comments.

| File | Description |
| --- | --- |
| `Cargo.toml` | Adds the new `pivot-cache` feature flag |
| `src/lib.rs` | Includes the new pivot module when the feature is enabled |
| `src/pivot.rs` | Core pivot table data structures and parsing utilities |
| `src/xlsx/mod.rs` | Main implementation of pivot table reading functionality for XLSX files |
| `src/auto.rs` | Commented placeholder for future pivot table support in auto-detection |
| `src/ods.rs` | Commented placeholder for future ODS pivot table support |
| `src/xls.rs` | Commented placeholder for future XLS pivot table support |
| `src/xlsb.rs` | Commented placeholder for future XLSB pivot table support |
| `tests/test.rs` | Comprehensive test cases for the new pivot table functionality |

siqpush · 2025-09-23T15:49:46Z

@jmcnamara thank you for the quick feedback and shared git knowledge. Since this branch is now a draft I'll keep the same approach. Also, the Copilot suggestions were actually decent for identifying places I intended to come back to (unwraps and such).

@sftse thanks as well for the feedback. I took your approach in many of the suggestions (those with the thumbs up) and addressed comments / questions for the others.

sftse

Think most uses of ref here can be removed.

siqpush · 2025-09-25T11:28:46Z

@jmcnamara I believe the failing check for stable is due to the addition of clippy in 458b8ca. The image installs the toolchain with --profile minimal, which does not include clippy by default. See line 48 in the screenshot, which would conflict.

jmcnamara · 2025-09-25T12:55:45Z

@siqpush Thanks. I'll fix that.

I wonder why it was working previously.

jmcnamara · 2025-09-25T13:43:38Z

@siqpush Thanks it is fixed on master. You will need to do a fetch and rebase to pick up the changes.

siqpush · 2025-09-25T15:44:13Z

> @siqpush Thanks it is fixed on master. You will need to do a fetch and rebase to pick up the changes.

Using GitHub's "sync", your changes were merged, not rebased, onto my branch. I saw your note too late; my local branch had been merge-squashed by then. I tried the rebase follow-up but it was useless. @jmcnamara

jmcnamara · 2025-09-25T16:16:37Z

> Tried the rebase follow up but it was useless.

Don't worry about it, we can sort it out at merge.

jmcnamara · 2025-10-15T10:22:39Z

@siqpush Could you "Resolve" the comments that have been already addressed to make the review cleaner. Thanks.

jmcnamara · 2025-10-15T18:34:48Z

I fixed the failing typo check on master. If you rebase to that it will fix the failing CI check.

jmcnamara · 2025-11-18T08:39:57Z

@siqpush I would like to merge this for the next release but there is a conflict due to recent changes. Could you fix this?

sftse · 2025-11-18T18:02:50Z

@jmcnamara I don't think we should merge this in its current state. The commit history is very confusing (empty rebase commits, merge commits inside this PR) and I think the code could still use some improvements.
I can make another review.

jmcnamara · 2025-11-18T18:44:07Z

> The commit history is very confusing (empty rebase commits, merge commits inside this PR)

@sftse I could probably deal with that but if it still needs reviewing I will pause the merge.

@siqpush Could you try to rebase this down into a single commit that is up to date with main? Or start a clean branch and a new PR.

siqpush · 2025-11-18T18:46:14Z

@sftse @jmcnamara does commit 0b7a884 better present the fix for you? On my side it shows I want to merge both my commits and others' into master (very confusing unless I drill into the commit link).

Feels like a new PR may be the way...

jmcnamara · 2025-11-18T21:13:34Z

> does commit 0b7a884 better present the fix for you

Yes. That looks clean from the point of view of a review, but it seems to be detached from a branch.

sftse · 2025-11-19T09:51:40Z

It may be easier to visualize what is going on with this PR. git log --all --graph may help with orientation for what to merge into what and what a clear history should look like.

Here is a more compact rendering.

As a rule of thumb, do not merge master into other branches, as this makes for a confusing history. GitHub is confused by it as well, as you can see by how it chooses to render the commits belonging to this PR: it treats the commits that were merged from master into your branch (labeled pr-559 in the rendered graph) as belonging to this PR, even though they do not contribute to the final diff or to an understanding of what has changed. At that point, the individual commits are less useful than they could have been, and it may be clearer to just squash all of them into a single one.

Of note as well, it seems you rebased your commits beginning at 175f and then decided to merge the rebased branch into the old one. This duplicates a large number of commits, see 175f and aff6.

jmcnamara · 2025-11-25T13:39:56Z

This eliminates the problem of having to add a feature flag to initialize the pivot tables deep within Xlsx::new and the additional costs for every caller, even the ones that don't need the feature.

I think we should only use feature flags for features that require an additional dependency like "chrono". I think all other features like "pictures" should be initialised when the user calls them. (This isn't 100% simple in some cases since it could require a second parsing of parts of a file but in general it should be achievable).
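
A minimal sketch of the "initialise on first call" approach described above, using only the standard library. `Workbook`, its `parts` map, and `pivot_table_names` are hypothetical stand-ins rather than calamine's actual `Xlsx` internals; the point is only that the constructor does no pivot work and the first call caches the result.

```rust
use std::collections::HashMap;

// Hypothetical workbook type; not calamine's actual `Xlsx` struct.
struct Workbook {
    // Stand-in for the zip archive: part path -> raw XML.
    parts: HashMap<String, String>,
    // Filled in lazily, so callers that never ask for pivot tables pay nothing.
    pivot_names: Option<Vec<String>>,
}

impl Workbook {
    fn new(parts: HashMap<String, String>) -> Self {
        // No pivot table parsing happens here.
        Workbook { parts, pivot_names: None }
    }

    // Collects pivot table part paths on the first call and caches them.
    fn pivot_table_names(&mut self) -> &[String] {
        if self.pivot_names.is_none() {
            let mut names: Vec<String> = self
                .parts
                .keys()
                .filter(|path| path.starts_with("xl/pivotTables/"))
                .cloned()
                .collect();
            names.sort();
            self.pivot_names = Some(names);
        }
        self.pivot_names.as_deref().unwrap_or(&[])
    }
}

fn main() {
    let mut parts = HashMap::new();
    parts.insert("xl/workbook.xml".to_string(), String::new());
    parts.insert("xl/pivotTables/pivotTable1.xml".to_string(), String::new());

    let mut wb = Workbook::new(parts);
    println!("{:?}", wb.pivot_table_names());
}
```

The trade-off mentioned in the comment still applies: a real implementation may have to re-read parts of the file on that first call.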

sftse

Please look proactively for similar code patterns to the ones we highlight and consider how additional commits can help reviewers understand the changes.

Git diff relies on heuristics that are easily trashed by big diffs, so I'd overindex on small diffs and rather too many commits than too few.

sftse · 2025-11-29T12:12:36Z

src/xlsx/mod.rs

+    Ok(pivots_on_sheet)
+}


This is my pattern-matching kicking in, is it correct that multiple pivots per sheet are permitted? Other functions error when more than one candidate item is found in the zip.

If you recheck, can you add a reference to the spec?

The link / screenshot to the reference is below. Originally this was just something I noticed during a few random tests. The test referenced in this comment, #559 (comment), was added to address it.

https://web.mit.edu/~stevenj/www/ECMA-376-new-merged.pdf

See section 12.3.11 (page 78 in the spec, or page 90 of the PDF); otherwise see the screenshot below:

sftse

Thanks for bringing this PR across the finish line, nearly there!

sftse · 2025-11-29T21:59:24Z

src/xlsx/mod.rs

+    pub fn get_pivot_tables_by_name_and_sheet(&self) -> Vec<(String, String)> {
+        self.0
+            .iter()
+            .map(|pt| (pt.sheet().to_string(), pt.name().to_string()))
+            .collect()
+    }


Some of the proposed API is hard to judge as-is. Can we have a complete example of how to use this in practice?

As an alternative, would it make sense to expose fn iter(&self) -> impl Iterator<Item = &'_ PivotTableRef> and let the caller use the API on PivotTableRef to call pt.name() and pt.sheet() as needed? This function just changes the representation of the data, which is not in itself a reason to exclude it, but would need more justification.
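
The alternative suggested above could look roughly like the following. The struct fields and the sample sheet names are assumptions made for illustration; only `iter`, `name()`, and `sheet()` come from the comment itself.

```rust
// Hypothetical stand-ins for the PR's types; the fields are assumptions.
struct PivotTableRef {
    name: String,
    sheet: String,
}

impl PivotTableRef {
    fn name(&self) -> &str {
        &self.name
    }

    fn sheet(&self) -> &str {
        &self.sheet
    }
}

struct PivotTables(Vec<PivotTableRef>);

impl PivotTables {
    // Borrowing iterator: nothing is cloned unless the caller asks for it.
    fn iter(&self) -> impl Iterator<Item = &'_ PivotTableRef> {
        self.0.iter()
    }
}

fn main() {
    let tables = PivotTables(vec![
        PivotTableRef { name: "PivotTable1".to_string(), sheet: "Sheet1".to_string() },
        PivotTableRef { name: "PivotTable2".to_string(), sheet: "Sheet2".to_string() },
    ]);

    // Caller-side equivalent of a "by sheet" lookup.
    let on_sheet1: Vec<&str> = tables
        .iter()
        .filter(|pt| pt.sheet() == "Sheet1")
        .map(|pt| pt.name())
        .collect();
    println!("{:?}", on_sheet1);
}
```

With this shape, helpers that return `(String, String)` pairs become a `map` on the caller's side, at the cost of the caller needing to know how names and sheets combine into a unique key.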

I think it felt strange because I exposed functionality to PivotTables and even PivotTableRef that felt like they should be able to get data on their own. In turn this made functions like the above feel extra.

As for the alternative, my gut feeling is that, because we also have get_pivot_tables_by_sheet, that function would then need to be removed as well. But if we hand the filter/map work to the user, then at best they are left having to do some extra doc reading on some of the uniqueness subtleties.

As for a concrete example, I imagine auditing data in a workbook that might be sensitive to expose to the wrong user, client, etc. Using the relevant workbook from tests, this is how I would avoid leaking the top secret details of Category Blue:

```
let mut workbook: Xlsx<_> = open_workbook(path)?;
let pivot_tables = workbook.read_pivot_table_metadata()?;
for (sheet, pt) in pivot_tables.get_pivot_tables_by_name_and_sheet() {
    let mut check_col = 0;
    for (row_number, row) in workbook.pivot_table_data(&pivot_tables, sheet, pt)?.enumerate() {
        // header is the first row
        if row_number == 0 {
            for (col_number, field) in row.iter().enumerate() {
                if field.eq(&crate::calamine::Data::String("Category".to_string())) {
                    check_col = col_number
                }
            }
        } else if row[check_col].eq(&crate::calamine::Data::String("blue".to_string())) {
            panic!("Blue should not be included in this report.")
        }
    }
}
```

sftse

Haven't forgotten about this PR, sorry about the delays.
I'd have to find time to use the current API to endorse it, but the internal details are getting close to being right.

siqpush · 2025-12-10T23:44:07Z

> Haven't forgotten about this PR, sorry about the delays. I'd have to find time to use the current API to endorse it, but the internal details are getting close to being right.

Don't be. I was happy you got a healthy break from reviewing my mess.

Let me know what you think once you get a moment!

jmcnamara self-assigned this Sep 18, 2025

jmcnamara added the enhancement, awaiting user changes, and xlsx labels Sep 18, 2025

jmcnamara requested a review from Copilot September 20, 2025 23:22

Copilot AI reviewed Sep 20, 2025

sftse reviewed Sep 23, 2025

jmcnamara marked this pull request as draft September 23, 2025 10:51

sftse reviewed Sep 24, 2025

siqpush marked this pull request as ready for review September 27, 2025 20:26

sftse reviewed Oct 14, 2025

siqpush force-pushed the master branch from 4d07248 to aaaa1f2 Compare October 15, 2025 01:23

sftse mentioned this pull request Nov 11, 2025

refactor: dont need ref and ref mut #580

Merged

sftse reviewed Nov 25, 2025

jmcnamara reviewed Nov 25, 2025

jmcnamara reviewed Nov 25, 2025

jmcnamara reviewed Nov 25, 2025

sftse reviewed Nov 25, 2025

siqpush and others added 5 commits November 25, 2025 21:10

Apply suggestions from code review

90cf125

Co-authored-by: sftse <c@farsight.net>

Apply suggestions from code review

f559e87

Co-authored-by: sftse <c@farsight.net>

Implementing suggestions that required more involvement. (#11)

1570575

* adding pivotref vec wrapper for public facing * removing pivot cache mod * tag enumeration * misc design changes --------- Co-authored-by: GitHub Actions <actions@github.com>

removing added spacing (#12)

c092c0e

Co-authored-by: GitHub Actions <actions@github.com>

Error handling for cacheField tag (#13)

c3f4236

Co-authored-by: GitHub Actions <actions@github.com>

sftse reviewed Nov 29, 2025

sftse reviewed Nov 29, 2025

actions-user and others added 9 commits November 30, 2025 06:52

stronger validation when finding attributes of a pivot table

b7e4c20

code block no longer necessary

c390ce5

removing pub where no longer needed

6112834

map_err can be replaced with ?

a33dd97

eliminating is_some / unwrap in match

c241186

default derived

1ae1490

Merge pull request #14 from siqpush/pr559

face224

Review Cleanup

aligning heirarchy to be worksheet name then pivot table name

755bbfd

Merge pull request #15 from siqpush/pr559

98f5758

Aligning worksheet - pivot table hierarchy

sftse reviewed Dec 9, 2025

actions-user and others added 3 commits December 10, 2025 13:19

renaming pivot_table_metadata -> pivot_tables

7cd1ab1

iterator error handling

0931584

Merge pull request #16 from siqpush/pr559

1c08510

Rename + Error Handling

actions-user and others added 2 commits December 13, 2025 08:58

import not needed

a5fee99

Merge pull request #17 from siqpush/pr559

c859eca

import not needed
