Skip to content

Conversation

sethaxen
Copy link
Member

@sethaxen sethaxen commented Aug 20, 2023

This PR makes the following replacements everywhere:

  • hpd -> PosteriorStats.hdi (hpd retains its old behavior and is now deprecated)
  • summarize -> PosteriorStats.summarize
  • ChainDataFrame -> PosteriorStats.SummaryStats

Other changes:

  • overloads and exports PosteriorStats.eti
  • changes default interval probability to 0.89 (PosteriorStats's default)
  • changes plots to plot ETI by default instead of HDI (consistent with summarize's defaults), adding new API for setting interval probability and CI type
  • Julia lower bound bumped to v1.10 (PosteriorStats v3's Julia lower bound)

The replacement of ChainDataFrame and the slight change in API and behavior of the methods makes this a breaking change.

Implements #430

e.g.

julia> val = rand(500, 2, 3);

julia> chn
Chains MCMC chain (4000×4×2 Array{Float64, 3}):

Iterations        = 1:2:7999
Number of chains  = 2
Samples per chain = 4000
parameters        = param_1, param_2, param_3, param_4


Use `describe(chains)` for summary statistics and quantiles.


julia> describe(chn)
Chains MCMC chain (1000×8×4 Array{Float64, 3}):

Iterations        = 1:1:1000
Number of chains  = 4
Samples per chain = 1000
parameters        = a, b
internals         = c, d, e, f, g, h

Summary Statistics
     mean    std  eti89            ess_tail  ess_bulk  rhat  mcse_mean  mcse_std 
 a  0.496  0.285  0.0550 .. 0.946      3970      4088  1.00     0.0045    0.0021
 b  0.504  0.290  0.0551 .. 0.946      3972      4073  1.00     0.0045    0.0021

Quantiles
      2.5%  25.0%  50.0%  75.0%  97.5% 
 a  0.0223  0.258  0.492  0.732  0.976
 b  0.0240  0.253  0.504  0.759  0.976
julia> hdi(chn)
HDI
    hdi89            
 a  0.00178 .. 0.887
 b    0.112 .. 0.997

julia> eti(chn)
ETI
    eti89           
 a  0.0550 .. 0.946
 b  0.0551 .. 0.946

Closes #491

changerates, mvchangerate = changerate(chains)

# Summarize the results in a named tuple.
nt = (; zip(names_of_params, changerates)..., multivariate = mvchangerate)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lacking a parameter column meant the show method was broken. But since there is a changerate for every parameter, it makes more sense to do the same thing as gelmandiag_multivariate and return a SummaryStats for the marginal values and return the multivariate changerate separately.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Genuine question: what is the change-rate in this context?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From inspecting the code, it's, for each parameter and chain, the fraction of draws that are different from the previous draw. I suppose it's similar to "acceptance rate."

Copy link
Contributor

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remaining comments which cannot be posted as a review comment to avoid GitHub Rate Limit

JuliaFormatter v1.0.62

[JuliaFormatter v1.0.62] reported by reviewdog 🐶

function PosteriorStats.eti(chn::Chains; prob::Real=DEFAULT_CI_PROB, kwargs...)


[JuliaFormatter v1.0.62] reported by reviewdog 🐶

function PosteriorStats.hdi(chn::Chains; prob::Real=DEFAULT_CI_PROB, kwargs...)


[JuliaFormatter v1.0.62] reported by reviewdog 🐶

_prob_to_string(prob; digits=2) = replace(string(round(100 * prob; digits)), r"\.0+$" => "")


[JuliaFormatter v1.0.62] reported by reviewdog 🐶

@deprecate hpd(chn::Chains; alpha::Real=0.05, kwargs...) hdi(chn; prob=1 - alpha, kwargs...)


[JuliaFormatter v1.0.62] reported by reviewdog 🐶

kwargs...


[JuliaFormatter v1.0.62] reported by reviewdog 🐶

name="Quantiles",


[JuliaFormatter v1.0.62] reported by reviewdog 🐶

chains::Chains, funs...;


[JuliaFormatter v1.0.62] reported by reviewdog 🐶

var_names=nothing,


[JuliaFormatter v1.0.62] reported by reviewdog 🐶


[JuliaFormatter v1.0.62] reported by reviewdog 🐶

summarize(data, funs...; var_names=names_of_params, name, kwargs...)


[JuliaFormatter v1.0.62] reported by reviewdog 🐶

summarize(z, funs...; var_names=names_of_params, name=name_chain, kwargs...)


[JuliaFormatter v1.0.62] reported by reviewdog 🐶

@test setinfo(chn, NamedTuple{(:A, :B)}((1,2))).info == NamedTuple{(:A, :B)}((1,2))


[JuliaFormatter v1.0.62] reported by reviewdog 🐶

tchain = Chains(rand(niter, nparams, nchains), ["a", "b", "c"], Dict(:internals => ["c"]))


[JuliaFormatter v1.0.62] reported by reviewdog 🐶

@test eltype(discretediag(chn_disc[:,2:2,:])) <: SummaryStats


[JuliaFormatter v1.0.62] reported by reviewdog 🐶

acor = autocor(c; append_chains=append_chains)


[JuliaFormatter v1.0.62] reported by reviewdog 🐶

@test autocor(c) == autocor(c; append_chains=true)


[JuliaFormatter v1.0.62] reported by reviewdog 🐶

@test MCMCChains.changerate(chn; append_chains = false) isa Vector{<:Tuple{SummaryStats,Float64}}


[JuliaFormatter v1.0.62] reported by reviewdog 🐶

@test hpd(chn) == hdi(chn; prob=0.95)


[JuliaFormatter v1.0.62] reported by reviewdog 🐶

for autocov_method in (AutocovMethod(), FFTAutocovMethod(), BDAAutocovMethod()), kind in (:bulk, :basic), f in (ess, ess_rhat, rhat)


[JuliaFormatter v1.0.62] reported by reviewdog 🐶

permutedims(x, (1, 3, 2)); autocov_method = autocov_method, kind = kind,


[JuliaFormatter v1.0.62] reported by reviewdog 🐶


[JuliaFormatter v1.0.62] reported by reviewdog 🐶

PermutedDimsArray(x, (1, 3, 2)); autocov_method = autocov_method, kind = kind,


[JuliaFormatter v1.0.62] reported by reviewdog 🐶

return all(((x, y),) -> isapprox(x, y; atol=1e-2), Iterators.drop(zip(cdf1, cdf2), 1))


[JuliaFormatter v1.0.62] reported by reviewdog 🐶

@testset "diagnostics missing tests" for i in 1:nchains


[JuliaFormatter v1.0.62] reported by reviewdog 🐶


[JuliaFormatter v1.0.62] reported by reviewdog 🐶


[JuliaFormatter v1.0.62] reported by reviewdog 🐶

chns = Chains(
val,
parm_names,
Dict(:internals => ["c", "d", "e", "f", "g", "h"]),
)


[JuliaFormatter v1.0.62] reported by reviewdog 🐶

parm_stats = summarize(chns, sections=[:parameters])


[JuliaFormatter v1.0.62] reported by reviewdog 🐶

parm_array_stats = summarize(
PermutedDimsArray(val[:, 1:2, :], (1, 3, 2));
var_names =[:a, :b],
)


[JuliaFormatter v1.0.62] reported by reviewdog 🐶

all_sections_array_stats = summarize(
PermutedDimsArray(val, (1, 3, 2)); var_names = Symbol.(parm_names),
)


[JuliaFormatter v1.0.62] reported by reviewdog 🐶

three_parms_df = DataFrame(summarize(
chns[[:a, :b, :c]],
mean, std;
sections = [:parameters, :internals],
))


[JuliaFormatter v1.0.62] reported by reviewdog 🐶

three_parms_df_2 = DataFrame(summarize(
chns[[:a, :b, :g]],
:mymean => mean,
:mystd => std;
sections = [:parameters, :internals],
))

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Copy link
Contributor

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remaining comments which cannot be posted as a review comment to avoid GitHub Rate Limit

JuliaFormatter v1.0.62

[JuliaFormatter v1.0.62] reported by reviewdog 🐶

three_parms_df = DataFrame(summarize(
chns[[:a, :b, :c]],
mean, std;
sections = [:parameters, :internals],
))


[JuliaFormatter v1.0.62] reported by reviewdog 🐶

three_parms_df_2 = DataFrame(summarize(
chns[[:a, :b, :g]],
:mymean => mean,
:mystd => std;
sections = [:parameters, :internals],
))

@test chn3.info == chn.info

@test all(MCMCChains.indiscretesupport(chn) .== [false, false, false, true])
@test setinfo(chn, NamedTuple{(:A, :B)}((1,2))).info == NamedTuple{(:A, :B)}((1,2))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[JuliaFormatter v1.0.62] reported by reviewdog 🐶

Suggested change
@test setinfo(chn, NamedTuple{(:A, :B)}((1,2))).info == NamedTuple{(:A, :B)}((1,2))
@test setinfo(chn, NamedTuple{(:A, :B)}((1, 2))).info == NamedTuple{(:A, :B)}((1, 2))

end

@testset "function tests" begin
tchain = Chains(rand(niter, nparams, nchains), ["a", "b", "c"], Dict(:internals => ["c"]))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[JuliaFormatter v1.0.62] reported by reviewdog 🐶

Suggested change
tchain = Chains(rand(niter, nparams, nchains), ["a", "b", "c"], Dict(:internals => ["c"]))
tchain =
Chains(rand(niter, nparams, nchains), ["a", "b", "c"], Dict(:internals => ["c"]))

lags = MCMCChains._default_lags(c, append_chains)
@test lags == filter!(x -> x < n, [1, 5, 10, 50])

acor = autocor(c; append_chains=append_chains)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[JuliaFormatter v1.0.62] reported by reviewdog 🐶

Suggested change
acor = autocor(c; append_chains=append_chains)
acor = autocor(c; append_chains = append_chains)

x = rand(10_000, 40, 10)
chain = Chains(x)

for autocov_method in (AutocovMethod(), FFTAutocovMethod(), BDAAutocovMethod()), kind in (:bulk, :basic), f in (ess, ess_rhat, rhat)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[JuliaFormatter v1.0.62] reported by reviewdog 🐶

Suggested change
for autocov_method in (AutocovMethod(), FFTAutocovMethod(), BDAAutocovMethod()), kind in (:bulk, :basic), f in (ess, ess_rhat, rhat)
for autocov_method in (AutocovMethod(), FFTAutocovMethod(), BDAAutocovMethod()),
kind in (:bulk, :basic),
f in (ess, ess_rhat, rhat)


# analyze array
ess_array, rhat_array = ess_rhat(
permutedims(x, (1, 3, 2)); autocov_method = autocov_method, kind = kind,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[JuliaFormatter v1.0.62] reported by reviewdog 🐶

Suggested change
permutedims(x, (1, 3, 2)); autocov_method = autocov_method, kind = kind,
permutedims(x, (1, 3, 2));
autocov_method = autocov_method,
kind = kind,


# analyze array
mcse_array = mcse(
PermutedDimsArray(x, (1, 3, 2)); autocov_method = autocov_method, kind = kind,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[JuliaFormatter v1.0.62] reported by reviewdog 🐶

Suggested change
PermutedDimsArray(x, (1, 3, 2)); autocov_method = autocov_method, kind = kind,
PermutedDimsArray(x, (1, 3, 2));
autocov_method = autocov_method,
kind = kind,

rf_1 = rafterydiag(chn)
rf_2 = rafterydiag(chn_m)

@testset "diagnostics missing tests" for i in 1:nchains
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[JuliaFormatter v1.0.62] reported by reviewdog 🐶

Suggested change
@testset "diagnostics missing tests" for i in 1:nchains
@testset "diagnostics missing tests" for i = 1:nchains

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
@sethaxen sethaxen marked this pull request as draft September 15, 2025 12:55
- `ci_fun` (default: `eti`): The function used to compute the credible intervals.
(Can be [`eti`](@ref) or [`hdi`](@ref))
- `ci_probs` (default: `[$DEFAULT_CI_PROB, 0.8]`): The probability mass(es) of the credible
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are now quite close; perhaps the 0.8 should be decreased. For reference, ArviZ uses [0.89, 0.5] as the default probs.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That makes sense, happy to change it to 0.5.

@sethaxen sethaxen marked this pull request as ready for review September 22, 2025 22:47
@sethaxen
Copy link
Member Author

This is now up-to-date with PosteriorStats and ready for review. The remaining formatting suggestions seem to all be in lines not touched by this PR but close to lines touched by this PR.

Note that SummaryStats's only current documented interface is its Tables interface. However, because it's both a Table and a Table sink (like DataFrame), instead of the old DataFrames-inspired behavior, one can easily work with a SummaryStats by converting it to a DataFrame, manipulating it as desired, and then converting back to a SummaryStats.

@yebai yebai requested review from penelopeysm and removed request for cpfiffer September 23, 2025 07:43
Copy link
Member

@penelopeysm penelopeysm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this @sethaxen!

I'm probably not going to review line by line, but I can tell that the tests have largely been preserved, so I'm going to trust that all the minutiae have been ironed out as part of making CI pass and focus on broader things.

My first question (not sure if there are more) is to do with the SummaryStats struct. What's the best way to index into it? Is there a way to use the label :a here?

julia> ss = mean(Chains(ones(3, 2), [:a, :b]))
Mean
    mean
 a  1.00
 b  1.00

julia> ss[:mean][1]
1.0

@sethaxen
Copy link
Member Author

My first question (not sure if there are more) is to do with the SummaryStats struct. What's the best way to index into it? Is there a way to use the label :a here?

As documented here it can currently be treated like an OrderedDict, but this is not part of the official API. Only the Tables interface is. That's because I haven't decided on the API I want to support. For that I need user feedback on how they'd most like to use it (e.g. do they most want to select columns first or rows first, or both simultaneously? Will they want to select a subset of labels/columns by name?) I don't want to recreate DataFrame's entire indexing API if I don't need to. In the meantime, there's the OrderedDict-like interface for only interactive use, as well as the ability to interconvert between it and a DataFrame for more complicated sub-selection.

When I eventually add an official indexing API, it won't require a breaking release here or downstream.

@penelopeysm
Copy link
Member

penelopeysm commented Sep 26, 2025

Thanks for the explanation!

Just the disclaimer to begin, I have absolutely no intention of reviewing this unfairly, but I'm also cognisant that there's some level of conflict of interest because I'm also separately working on FlexiChains (and trying to solve all these same tricky decisions with APIs!), and I can't prove that I'm fully disinterested. Feel free to say if you'd rather someone else review, I won't be offended in the least.

Whilst I'm very happy with the general aim of centralising functionality and reusing code, I think as it stands it's a loss in terms of usability if it's not possible to use the parameter name somehow. IMO, I should be able to combine the three things ss, :mean, and :a in order to get the mean of :a stored in ss. ChainDataFrame does this:

julia> using MCMCChains; ss = mean(Chains(rand(3, 2), [:a, :b]))
Mean
  parameters      mean
      Symbol   Float64

           a    0.6386
           b    0.5754


julia> ss[:a, :mean]
0.6385968125591274

For SummaryStats right now I've only come up with this, which is a bit verbose:

julia> using MCMCChains; ss = mean(Chains(rand(3, 2), [:a, :b]))
Mean
     mean
 a  0.460
 b  0.647

julia> ss[:mean][findfirst(isequal(:a), ss[:label])]
0.46036692554959524

Once there is an interface like that, I'd be very happy to approve this PR. I'm not particularly wedded to the interface being getindex or ss[:a, :mean], or row-based indexing in general. For example if you had ss[:mean] return a NamedTuple of (a = 0.460, b = 0.647) then we could do ss[:mean].a or ss[:mean][:a], and I'd be fine with that too (as long as it's documented -- which brings us to the next paragraph).

Regarding documentation in general, I just wanted to explain my general stance. I do indeed see that you're documenting things in PosteriorStats.jl (which is great!) but if we bring PosteriorStats into MCMCChains, we also need to worry about it in MCMCChains and (perhaps more importantly) the main Turing docs because this is very user-facing. Currently the interface of MCMCChains is quite underdocumented (it essentially amounts to 'learn by reading the examples in the docs'). This existing situation is of course not your fault at all, but I'd like to make sure that when we add new functionality, someone also ensures that it's explained in the main Turing docs.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants