Chunking for Bayes: looking for feedback #1950
mike-lawrence started this conversation in General
Replies: 0 comments
When doing Bayesian inference we typically have a model with a number of array-representable variables, for example:
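The original inline example did not survive extraction; as a minimal illustrative sketch (the variable names, sizes, and the use of NumPy here are my assumptions, not the author's actual model), the posterior storage might look like:

```python
import numpy as np

# Hypothetical model variables: a scalar `mu` and a length-10
# coefficient vector `beta`, sampled with 4 chains of 1000 draws each.
# The chain and sample dimensions come last, matching the
# "two last/new dimensions" discussed below.
n_coef, n_chains, n_samples = 10, 4, 1000

mu = np.empty((n_chains, n_samples))            # scalar variable
beta = np.empty((n_coef, n_chains, n_samples))  # vector variable
```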
and inference yields "samples" of the model variables akin to that produced by the following pseudo-code:
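The original pseudo-code block also failed to load; a self-contained sketch of the write pattern described (with a stand-in `sampler_step` of my own invention; a real sampler such as Stan or PyMC would produce correlated draws, but the fill order is the same) might be:

```python
import numpy as np

rng = np.random.default_rng(0)
n_coef, n_chains, n_samples = 10, 4, 100

mu = np.empty((n_chains, n_samples))
beta = np.empty((n_coef, n_chains, n_samples))

def sampler_step():
    # Hypothetical stand-in for one MCMC transition.
    return {"mu": rng.normal(), "beta": rng.normal(size=n_coef)}

# Iteration is slowest over the trailing chain/sample dimensions:
# each inner step writes a single draw into the last axis.
for chain in range(n_chains):
    for sample in range(n_samples):
        draw = sampler_step()
        mu[chain, sample] = draw["mu"]
        beta[:, chain, sample] = draw["beta"]
```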
I'm currently conflicted about how best to specify chunking for this scenario. The data are generated such that iteration is slowest over those two last/new dimensions (chain and sample), which might suggest a small chunk size along those dimensions. Yet the typical computations performed on the results reduce across samples, and possibly across both chains and samples, i.e.:
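The concrete example of those read patterns is also missing from the page; a sketch under the same assumed shapes as above (again my assumption, not the author's code) would be reductions over the trailing axes:

```python
import numpy as np

rng = np.random.default_rng(0)
n_coef, n_chains, n_samples = 10, 4, 1000
beta = rng.normal(size=(n_coef, n_chains, n_samples))

# Typical read patterns: reduce across the sample axis, or across both
# chain and sample axes. Either way, every chunk along those trailing
# dimensions must be read to produce each output element.
per_chain_mean = beta.mean(axis=-1)        # shape (n_coef, n_chains)
posterior_mean = beta.mean(axis=(-2, -1))  # shape (n_coef,)
```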
So chunking designed to accommodate the way the data fill the arrays will yield poor performance for the likely later read patterns. Yet chunking that optimizes for those read patterns means a lot of waiting for enough samples to accrue between write operations, which isn't a huge deal in itself, but it impedes my aspirations for doing compute on intermediate results.
Does this seem like an unavoidable trade-off?