Conversation

@shunt16 shunt16 commented Jan 8, 2025

Description

Review for initial release of UNC Specification website. Find on the site:

  • Initial specification draft (capable of creating web and PDF versions) - main item for technical review!
  • Governance process for updates
  • Description of CoMet toolkit

Review

Based on comments in this PR, we are adding issues to an Issue Milestone - to be completed before we merge to main. Specific topics can be discussed in those issues.

Completion of the issues in this milestone is being managed in a corresponding Project.

@shunt16 shunt16 self-assigned this Jan 8, 2025

shunt16 commented Jan 9, 2025

Zhav Loizeau Initial Comments

I think there is an implicit choice about the word “observation” that is used to name what is called “measured values” in the GUM. Might be worth making explicit?

Goal

“Measurement datasets are becoming larger, more complex, …”

Maybe giving examples of how this complexity is characterised could be interesting (multi-modal, increasing number of “dimensions”, …)

Variables

“Should we allow uncertainty variables to be smaller than observation variables? i.e. that have a subset of the dimensions to save space where there are repeated values? (in practice, compression would reduce this as well…)”

Maybe one needs to distinguish “size in memory” and “conceptual size”. If an error covariance matrix is specified to be of “scalar” type, that is, each measured value is associated with an uncertainty with the same value $u$, and the errors are independent, one only needs to store $u$ in memory. However, knowing that the measured value vector has size $n$, and that the $n \times n$ error covariance matrix $S$ has such a structure, one can still perform the required operations. For instance $S_{ii}$ will return $u^2$ and $S_{ij}$ will return $0$ for $i \neq j$, with $i, j \in \{1, \dots, n\}$.
So, I think one needs to specify what a “complete representation” of an error-covariance matrix is by the set of operations one should be able to perform with it. In doing so, one can define the “conceptual size” of the matrix. Probably the “conceptual size” should be stored as an attribute of the matrix anyway (it is fair to assume that users want to retrieve $S_{ij}$ in $O(1)$ time).
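A minimal sketch of that idea in Python (the class and method names are mine, purely illustrative, not part of the draft specification or the CoMet toolkit): only $u$ and $n$ are stored, yet the conceptually $n \times n$ matrix can still be queried.

```python
import numpy as np

class ScalarCovariance:
    """Sketch of a "scalar"-type error covariance S = u**2 * I_n.

    Only u and n are stored (the "size in memory"), but the conceptually
    n x n matrix (the "conceptual size") can still answer the operations
    a complete representation should support.
    """

    def __init__(self, u: float, n: int):
        self.u = u  # common standard uncertainty of all measured values
        self.n = n  # conceptual size: S is n x n

    def entry(self, i: int, j: int) -> float:
        # S_ij = u**2 on the diagonal, 0 elsewhere -- an O(1) lookup
        return self.u ** 2 if i == j else 0.0

    def trace(self) -> float:
        return self.n * self.u ** 2

    def matvec(self, x: np.ndarray) -> np.ndarray:
        # S @ x without ever forming the n x n matrix
        return self.u ** 2 * x
```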

I hope that addresses the question?

Dimensions

Dimensions may be of any size, including 1.

Maybe “Strictly positive integer size”?

Data types

Can we permit different types?

In interferometric SAR, I think the data should be stored as complex numbers, no? Possibly relevant

Attributes

link observation variables with their associated uncertainty variables

Should there be a constraint that each uncertainty variable should be linked to a single observation variable? Also, should there be a “how” for the way an uncertainty variable is linked to its observation variable? I know we mostly deal with additive errors, but should there be some way to say if some error is multiplicative, for example?
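For illustration, one hypothetical way to express both the link and the “how” through attributes (the attribute names below are made up for this sketch, not taken from the draft):

```python
import numpy as np
import xarray as xr

# Hypothetical layout: the observation variable lists its uncertainty
# variables, and each uncertainty variable states how it combines with
# the observation (attribute names are illustrative only).
ds = xr.Dataset(
    {
        "radiance": (("y", "x"), np.random.rand(10, 10)),
        "u_noise": (("y", "x"), 0.01 * np.ones((10, 10))),
        "u_gain": ((), 0.02),
    }
)
ds["radiance"].attrs["unc_comps"] = "u_noise u_gain"
ds["u_noise"].attrs["applied_as"] = "additive"       # absolute uncertainty
ds["u_gain"].attrs["applied_as"] = "multiplicative"  # fractional uncertainty
```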

Units

uncertainty variables must have the same "units" as the observation variables they are associated with. If "units" is not defined, the uncertainty variable is assumed fractional

I am a bit unsure about this choice. In an error-covariance matrix, the values in the matrix are covariances and variances which have the unit of the measurand squared, while in an error-correlation matrix, the entries are unitless.
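To make the unit bookkeeping explicit (writing $u$ for the uncertainty vector, $R$ for the error-correlation matrix and $S$ for the error-covariance matrix; the symbols are mine, not fixed by the draft):

$$S_{ij} = u_i \, R_{ij} \, u_j, \qquad [u_i] = [\text{measurand}], \quad [R_{ij}] = 1, \quad [S_{ij}] = [\text{measurand}]^2.$$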

Uncertainty PDF shape

Why name the uniform distribution rectangular and not uniform?

What PDF shapes should we allow? Is there a list somewhere else we can refer to?

Wikipedia has a helpful selection (as always). I would say some “must-haves” are:

  • Exponential (for aging-free survival times, and generally positive-valued rv’s)
  • Binomial distribution (for discrete-valued rv’s)
  • Poisson distribution (for point processes)
  • Categorical distribution, and multinomial distribution for classification

I understand that you assume mean 0 and, in the uniform and Gaussian case, all you need is the variance parameter stored in the uncertainty variable to completely characterise the distribution. What happens for families of distributions with more parameters?
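To illustrate (a sketch with scipy; the zero-mean assumption and standard uncertainty $u$ are from the draft as I read it, the gamma example is my own addition):

```python
import numpy as np
from scipy import stats

u = 0.1  # standard uncertainty stored in the uncertainty variable

# One parameter is enough for the zero-mean Gaussian and rectangular cases:
gaussian = stats.norm(loc=0.0, scale=u)
half_width = np.sqrt(3.0) * u  # rectangular (uniform) with standard deviation u
rectangular = stats.uniform(loc=-half_width, scale=2 * half_width)

# A gamma distribution needs two parameters (shape and scale), and cannot be
# centred at zero anyway; the variance alone no longer pins it down, so extra
# attributes would be required to characterise it completely.
shape, scale = 2.0, u / np.sqrt(2.0)  # variance = shape * scale**2 = u**2
gamma = stats.gamma(a=shape, scale=scale)

for name, dist in [("gaussian", gaussian), ("rectangular", rectangular), ("gamma", gamma)]:
    print(name, dist.std())  # all three have standard deviation u
```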

Parameterisating Error-Correlation Matrices

I noticed a likely typo in the title (parameterisating instead of parameterising?)

Maybe a short paragraph describing how one obtains the error-covariance matrix from the error-correlation matrix and the uncertainty vector would be good. Actually, having such a note early on would help disambiguate and make clear what should be stored in memory.
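For example (a sketch; the symbols $R$, $u$, $S$ are illustrative): the error-covariance matrix is $S = \mathrm{diag}(u)\,R\,\mathrm{diag}(u)$, i.e.

```python
import numpy as np

def covariance_from_correlation(R: np.ndarray, u: np.ndarray) -> np.ndarray:
    """Error-covariance matrix S = diag(u) @ R @ diag(u).

    R : (n, n) error-correlation matrix (unitless)
    u : (n,) uncertainty vector, in the units of the observation variable
    """
    return u[:, None] * R * u[None, :]

# Example: two observations with u = [0.1, 0.2] and error correlation 0.5
R = np.array([[1.0, 0.5],
              [0.5, 1.0]])
u = np.array([0.1, 0.2])
S = covariance_from_correlation(R, u)  # [[0.01, 0.01], [0.01, 0.04]]
```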

I cannot not comment on the “random” form, because I really wish it were either “un-correlated”, “independent” (this is not quite true, as one can define un-correlated random variables that are dependent, e.g. $X$ and $X^2$ for $X \sim \mathcal{N}(0, 1)$), or “diagonal” (to describe the shape of the matrix).

Parameterisation based on Matrix structure

I know Zhav was unsuccessful in finding a standard set of matrix structures we can adhere to… but I’m interested in any similar suggestions to simplify this definition…

Still no standard in sight. As mentioned before, there is a Wikipedia page (why not).
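For what it's worth, a minimal sketch of a few structures that do keep coming up, each storable far more compactly than the full matrix (illustrative only, not a proposed list for the specification):

```python
import numpy as np
from scipy.linalg import toeplitz, block_diag

n = 5
u = 0.1

# Diagonal / un-correlated: only u needs to be stored.
S_diag = u**2 * np.eye(n)

# Toeplitz (e.g. correlation decaying with separation): one row suffices.
rho = 0.8 ** np.arange(n)
R_toeplitz = toeplitz(rho)

# Block-diagonal (e.g. correlation only within a scanline): store the blocks.
block = np.full((2, 2), 0.5) + 0.5 * np.eye(2)
R_block = block_diag(block, block, np.eye(1))
```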

@EmmaWoolliams

I've combined comments from me and comments from Peter Harris in this single document - he's the yellow "pmh" and I'm the pink "erh". You'll see I tried to write "Emma" early on, but it didn't default to that and I couldn't be bothered to change them all ("erh" is my user name, as when I joined NPL I was Emma Hobbs).

You've done a great job to start this process - and it's something we need for ARIA. You've thought through a lot and there are no enormous concerns.

There are also things I realise I don't understand, so if some of my comments are unconstructive, treat them as my confusion and perhaps provide a bit more background.
