Should Morel's collection types indicate that a collection is sorted, unique, or has repeatable iteration order? #232
Replies: 2 comments
-
|
Let's extend the proposal to address the remarks:
We propose that there is an
but its value is not constant. Its value changes from one row to the next, and you will get a compilation error if you access it from an invalid location (such as the
returns 6 as you would expect. And a record can have an To implement To implement If Queries on unordered lists (
Zip join can be expressed by assigning The compiler should be smart enough to use an O(n) join algorithm, similar to ListPair.zip. |
Beta Was this translation helpful? Give feedback.
-
|
Most of the ideas in this discussion have been implemented in Morel 0.7:
I also wrote an article Ordered and unordered data. |
Beta Was this translation helpful? Give feedback.
Uh oh!
There was an error while loading. Please reload this page.
Uh oh!
There was an error while loading. Please reload this page.
-
Should Morel's collection types indicate that a collection is sorted, unique, or has repeatable iteration order? The relational model and functional programming languages both have collection types, but there is a mismatch. Relations are unordered (unless a query has an
ORDER BYclause), programming language lists are ordered. Uniqueness is complicated (relations are unique in Datalog but not in SQL, and if you choose a unique collection such as Java'sHashSetorTreeSetit has implications for iteration order).We propose to add a
multisettype to Morel, a new clauseunordered. We do not propose to deal with uniqueness or sort order at this time.Specifically:
multiset, an unordered equivalent oflist.fromexpression and ajoinclause can iterate over bothlistandmultisetcollections.fromexpression is alistif it ends withorderfollowed by zero or moreyieldorwhereclauses, or if it reads from alistfollowed by by zero or moreyieldorwhereclauses;multisetotherwise.unordered.Remarks:
list,multisethas alengthmethod that returns the number of elements. You should not assume that its running time is O(1).multisetdoes not have thenth (l, i),take (l, i)ordrop (l, i)methods oflistbecause theith element is not definedmultiset. (Nullable columns becomeoptional, since Morel has no null value.)orderclause returns an ordered collection even if the sort-keys are not exhaustive. In particular,orderwith no arguments returns a list whose order is undefined but whose order will be the same each time the list is read.orderis stable. We will probably declare that it is not stable.listdoes not make it deterministic. If you run it several times, the order of items in thelistmay change.SetandSortedSet, which do precisely that.) It is important to know if a collection's elements are unique or sorted, the key(s) or sort key of a collection, and if there are functional or referential dependencies, but this should be done using constraints rather than by the type system.UNNEST WITH ORDINAL.)Examples of ordered and unordered queries.
Rationale
Let's review collection types in various languages.
Standard ML
Standard ML's main collection type is
list. A list's iteration order is defined and is deterministic. If you add elements in a certain order, iterations will always appear in the same order. It's as if each item had a zero-based ordinal; it doesn't, but there is more information in a list than an unsorted collection containing the same elements.A list can contain duplicate elements.
Like all collections we will be discussing, lists are finite and immutable. Size can be determined in constant time.
There is also
vector, which is similar tolistbut whosesub (v, i)function allows constant time access to theith element.Java
Java has the following collection types. All are finite and have a constant-time
sizeoperation. Each has mutable and immutable implementations.Listhas a defined order and may contain duplicates.Setdoes not have a defined order and may not contain duplicates.SortedSethas a deterministic order (determined by its comparator) but the creator of theSortedSetcannot dictate the order.Relational
In the academic relational model (and Datalog), relations do not contain duplicates, are finite and immutable, have no deterministic iteration order.
Size is not assumed to be constant-time (but that's an implementation concern, and there may be a materialized view or other data structure that computes size quickly).
Because iteration is not deterministic, there is no operation to get the
ith element.The industrial relational model (used by SQL) is the same as the academic relational model, except that relations may contain duplicates.
Comparing the approaches
In the relational model, not everything we know about the data is encoded as part of the type. For example, given the relation employees (empno, ename, deptno, dname) we may know that
empnois primary key, and thatdeptnofunctionally determinesdname. We know that this relation is unique (because of the primary key) but iteration order is still nondeterministic.In the relational model (including in the SQL standard), sorting is generally performed last, just before outputting to a screen. Iteration order remains unknown until that point. (There is one important exception: the
ARRAYcolumn type has a determined order.)In our proposal, a Morel query returns an ordered collection in similar circumstances to a relational query. A SQL query with an
ORDER BYclause is sorted, and a Morel query ending withorderreturns a list.Relational systems make much use of constraints, and constraints are separate from schema (type system). Likewise, we intend to add constraints to Morel, and these will be used to represent sort-order, uniqueness and functional dependencies, and intra-row constraints (e.g.
job = 'SALES' orelse commission = 0). We intend to allow constraints to be explicitly declared, but will also be deduced (e.g. uniqueness following agroupclause, or intra-row constraints following awhereclause), may be checked at run time (using something likeassert), and the programmer can check that they can be deduced (using aprovekeyword that is analogous toassertbut works at compile time).Beta Was this translation helpful? Give feedback.
All reactions