Conversation

@shunt16 shunt16 commented Jan 8, 2025

Description

Review for initial release of UNC Specification website. Find on the site:

  • Initial specification draft (capable of creating web and PDF versions) - main item for technical review!
  • Governance process for updates
  • Description of CoMet toolkit

Review

Based on comments in this PR, we are adding issues to an Issue Milestone - to be completed before we merge to main. Specific topics can be discussed in those issues.

Completion of the issues in this milestone is being managed in a corresponding Project.

@shunt16 shunt16 self-assigned this Jan 8, 2025

shunt16 commented Jan 9, 2025

Zhav Loizeau Initial Comments

I think there is an implicit choice about the word “observation” that is used to name what is called “measured values” in the GUM. Might be worth making explicit?

Goal

“Measurement datasets are becoming larger, more complex, …”

Maybe giving examples of how this complexity is characterised could be interesting (multi-modal, increasing number of “dimensions”, …)

Variables

“Should we allow uncertainty variables to be smaller than observation variables? i.e. that have a subset of the dimensions to save space where there are repeated values? (in practice, compression would reduce this as well…)”

Maybe one needs to distinguish “size in memory” and “conceptual size”. If an error covariance matrix is specified to be of “scalar” type, that is, each measured value is associated with an uncertainty with the same value $u$, and the errors are independent, one only needs to store $u$ in memory. However, knowing that the measured value vector has size $n$, and that the $n \times n$ error covariance matrix $S$ has such a structure, one can still perform the required operations. For instance $S_{ii}$ will return $u^2$ and $S_{ij}$ will return $0$ for $i \neq j$, with $i, j \in \{1, \dots, n\}$.
So, I think one needs to specify what a “complete representation” of an error-covariance matrix is by the set of operations one should be able to perform with it. In doing so, one can define the “conceptual size” of the matrix. Probably the “conceptual size” should be stored as an attribute of the matrix anyway (it is fair to assume that users want to retrieve $S_{ij}$ in $O(1)$ time).
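A minimal sketch of that idea in Python (the class and method names are mine, purely illustrative, not part of the draft specification or the CoMet toolkit): only $u$ and $n$ are stored, yet the conceptually $n \times n$ matrix can still be queried.

```python
import numpy as np

class ScalarCovariance:
    """Sketch of a "scalar"-type error covariance S = u**2 * I_n.

    Only u and n are stored (the "size in memory"), but the conceptually
    n x n matrix (the "conceptual size") can still answer the operations
    a complete representation should support.
    """

    def __init__(self, u: float, n: int):
        self.u = u  # common standard uncertainty of all measured values
        self.n = n  # conceptual size: S is n x n

    def entry(self, i: int, j: int) -> float:
        # S_ij = u**2 on the diagonal, 0 elsewhere -- an O(1) lookup
        return self.u ** 2 if i == j else 0.0

    def trace(self) -> float:
        return self.n * self.u ** 2

    def matvec(self, x: np.ndarray) -> np.ndarray:
        # S @ x without ever forming the n x n matrix
        return self.u ** 2 * x
```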

I hope that addresses the question?

Dimensions

Dimensions may be of any size, including 1.

Maybe “Strictly positive integer size”?

Data types

Can we permit different types?

In interferometric SAR, I think the data should be stored as complex numbers, no? Possibly relevant

Attributes

link observation variables with their associated uncertainty variables

Should there be a constraint that each uncertainty variable should be linked to a single observation variable? Also, should there be a “how” for the way an uncertainty variable is linked to its observation variable? I know we mostly deal with additive errors, but should there be some way to say if some error is multiplicative, for example?
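For illustration, one hypothetical way to express both the link and the “how” through attributes (the attribute names below are made up for this sketch, not taken from the draft):

```python
import numpy as np
import xarray as xr

# Hypothetical layout: the observation variable lists its uncertainty
# variables, and each uncertainty variable states how it combines with
# the observation (attribute names are illustrative only).
ds = xr.Dataset(
    {
        "radiance": (("y", "x"), np.random.rand(10, 10)),
        "u_noise": (("y", "x"), 0.01 * np.ones((10, 10))),
        "u_gain": ((), 0.02),
    }
)
ds["radiance"].attrs["unc_comps"] = "u_noise u_gain"
ds["u_noise"].attrs["applied_as"] = "additive"       # absolute uncertainty
ds["u_gain"].attrs["applied_as"] = "multiplicative"  # fractional uncertainty
```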

Units

uncertainty variables must have the same "units" as the observation variables they are associated with. If "units" is not defined, the uncertainty variable is assumed fractional

I am a bit unsure about this choice. In an error-covariance matrix, the values in the matrix are covariances and variances which have the unit of the measurand squared, while in an error-correlation matrix, the entries are unitless.
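To make the unit bookkeeping explicit (writing $u$ for the uncertainty vector, $R$ for the error-correlation matrix and $S$ for the error-covariance matrix; the symbols are mine, not fixed by the draft):

$$S_{ij} = u_i \, R_{ij} \, u_j, \qquad [u_i] = [\text{measurand}], \quad [R_{ij}] = 1, \quad [S_{ij}] = [\text{measurand}]^2.$$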

Uncertainty PDF shape

Why name the uniform distribution rectangular and not uniform?

What PDF shapes should we allow? Is there a list somewhere else we can refer to?

Wikipedia has a helpful selection (as always). I would say some “must-haves” are:

  • Exponential (for aging-free survival times, and generally positive-valued rv’s)
  • Binomial distribution (for discrete-valued rv’s)
  • Poisson distribution (for point processes)
  • Categorical distribution, and multinomial distribution for classification

I understand that you assume mean 0 and, in the uniform and Gaussian case, all you need is the variance parameter stored in the uncertainty variable to completely characterise the distribution. What happens for families of distributions with more parameters?
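To illustrate (a sketch with scipy; the zero-mean assumption and standard uncertainty $u$ are from the draft as I read it, the gamma example is my own addition):

```python
import numpy as np
from scipy import stats

u = 0.1  # standard uncertainty stored in the uncertainty variable

# One parameter is enough for the zero-mean Gaussian and rectangular cases:
gaussian = stats.norm(loc=0.0, scale=u)
half_width = np.sqrt(3.0) * u  # rectangular (uniform) with standard deviation u
rectangular = stats.uniform(loc=-half_width, scale=2 * half_width)

# A gamma distribution needs two parameters (shape and scale), and cannot be
# centred at zero anyway; the variance alone no longer pins it down, so extra
# attributes would be required to characterise it completely.
shape, scale = 2.0, u / np.sqrt(2.0)  # variance = shape * scale**2 = u**2
gamma = stats.gamma(a=shape, scale=scale)

for name, dist in [("gaussian", gaussian), ("rectangular", rectangular), ("gamma", gamma)]:
    print(name, dist.std())  # all three have standard deviation u
```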

Parameterisating Error-Correlation Matrices

I noticed a likely typo in the title (parameterisating instead of parameterising?)

Maybe a short paragraph describing how one obtains the error-covariance matrix from the error-correlation matrix and the uncertainty vector would be good. Actually, having such a note early on would help disambiguate and make clear what should be stored in memory.
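For example (a sketch; the symbols $R$, $u$, $S$ are illustrative): the error-covariance matrix is $S = \mathrm{diag}(u)\,R\,\mathrm{diag}(u)$, i.e.

```python
import numpy as np

def covariance_from_correlation(R: np.ndarray, u: np.ndarray) -> np.ndarray:
    """Error-covariance matrix S = diag(u) @ R @ diag(u).

    R : (n, n) error-correlation matrix (unitless)
    u : (n,) uncertainty vector, in the units of the observation variable
    """
    return u[:, None] * R * u[None, :]

# Example: two observations with u = [0.1, 0.2] and error correlation 0.5
R = np.array([[1.0, 0.5],
              [0.5, 1.0]])
u = np.array([0.1, 0.2])
S = covariance_from_correlation(R, u)  # [[0.01, 0.01], [0.01, 0.04]]
```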

I cannot not comment on the “random” form, because I really wish it were either “un-correlated”, “independent” (this is not quite true, as one can define un-correlated random variables that are dependent, e.g. $X$ and $X^2$ for $X \sim \mathcal{N}(0, 1)$), or “diagonal” (to describe the shape of the matrix).

Parameterisation based on Matrix structure

I know Zhav was unsuccessful in finding a standard set of matrix structures we can adhere to… but I’m interested in any similar suggestions to simplify this definition…

Still no standard in sight. As mentioned before, there is a Wikipedia page (why not).
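For what it's worth, a minimal sketch of a few structures that do keep coming up, each storable far more compactly than the full matrix (illustrative only, not a proposed list for the specification):

```python
import numpy as np
from scipy.linalg import toeplitz, block_diag

n = 5
u = 0.1

# Diagonal / un-correlated: only u needs to be stored.
S_diag = u**2 * np.eye(n)

# Toeplitz (e.g. correlation decaying with separation): one row suffices.
rho = 0.8 ** np.arange(n)
R_toeplitz = toeplitz(rho)

# Block-diagonal (e.g. correlation only within a scanline): store the blocks.
block = np.full((2, 2), 0.5) + 0.5 * np.eye(2)
R_block = block_diag(block, block, np.eye(1))
```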

@EmmaWoolliams

I've combined comments from me and comments from Peter Harris in this single document - he's the yellow "pmh" and I'm the pink "erh". You'll see I tried to write "Emma" early on, but it didn't default to that and I couldn't be bothered to change them all ("erh" is my user name, as when I joined NPL I was Emma Hobbs).

You've done a great job to start this process - and it's something we need for ARIA. You've thought through a lot and there are no enormous concerns.

There are also things I realise I don't understand, so if some of my comments are unconstructive, treat them as my confusion and perhaps provide a bit more background.
