Skip to content

Exploration/Discussion: has-one Relationships between Dimensions #64

@aaronsteers

Description

@aaronsteers

Currently the BSL concept of "dimension" appears to be modeled after "attributes" - where a dimension is generally a single-column, but not a rich semantic definition.

My mental model for dimensions is heavily influenced by my experience with MicroStrategy and MicroSoft SSAS cubes. But regardless of the inspiration, in my mental model a the primitive of "dimension" is more complex and robust than what I think BSL currently uses:

1. Display name. A dimension has a "display name" or a way to render to users, and/or in a drop-down selector.
   E.g. User "AJ" has name "AJ Steers" or "AJ Steers (id: 101)" or similar. The name is most-often unique, but not always so, due to edge cases.
2. Internal ID. A dimension has a unique key, which may or may not be the same as its display name. The internal ID _must_ be unique amongst all elements in the dimention, even for edge cases.
   E.g. User "AJ" has id `101`.
4. Dimension Properties. A dimension can have one or more properties or attributes, which can be derived from the key, even if the key is not present in the query.
   E.g. User "AJ" has start date `Jan 3, 2025`, middle name `John`, current department `Engineering`, and role `Software Engineer`.

Proposal for BSL (v2 / v3?): has-one Relationships between Dimensions.

While I'd love to see a robust "dimension" construct in BSL, I have thought on this a lot and I think the same core objectives can be handled more simply, without necessarily rewriting the core "dimension" primitive.

Instead, we could define "has-one" relationships between dimensions. Such that, if you know one attribute/dimension value, you can deterministically get to the dimension's other properties without cross-multiplying the cost of your query complexity.

Knowing these relationships can improve queries and reduce join complexity. Aka, if I only have "user ID" in my fact table, I can aggregate at that layer and then without physical joins, I can layer on attributes of the user without adding query cost.

I can create these two reports at the same query cost:

User ID | Total Number of Logins
102     | 32
User ID | User Name | Cost Center | Total Number of Logins
102     | AJ        | 1032        | 32

Whereas without the additional semantic info, the second version requires 2 additional elements in my GROUP BY. And the more properties I want to display, the more "GROUP BY" operations I require.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions