feat: support DDL statements in SqlToSubstrait#432
feat: support DDL statements in SqlToSubstrait#432vbarua merged 18 commits intosubstrait-io:mainfrom
Conversation
daf7120 to
8f00f9e
Compare
isthmus/src/main/java/io/substrait/isthmus/expression/DdlRelBuilder.java
Outdated
Show resolved
Hide resolved
isthmus/src/main/java/io/substrait/isthmus/expression/DdlRelBuilder.java
Outdated
Show resolved
Hide resolved
isthmus/src/main/java/io/substrait/isthmus/expression/DdlRelBuilder.java
Outdated
Show resolved
Hide resolved
isthmus/src/test/java/io/substrait/isthmus/DdlRoundtripTest.java
Outdated
Show resolved
Hide resolved
8f00f9e to
a586031
Compare
isthmus/src/test/java/io/substrait/isthmus/DdlRoundtripTest.java
Outdated
Show resolved
Hide resolved
a586031 to
32459c4
Compare
46bc270 to
108bb43
Compare
|
@vbarua can you also take a look at this. I want to ensure the further refactoring to isthmus is somewhat aligned with these changes. |
| import org.apache.calcite.sql.util.SqlBasicVisitor; | ||
| import org.apache.calcite.sql2rel.SqlToRelConverter; | ||
|
|
||
| public class DdlRelBuilder extends SqlBasicVisitor<Plan.Root> { |
There was a problem hiding this comment.
The one thing that might be at odds with some of the work I've been doing on the API is that this returns a Plan.Root instead of a RelRoot or RelNode.
What I've been trying to do is modularize the API along two lines, roughly:
SQL -> Calcite RelRootCalcite RelRoot -> Isthmus Plan
The aim of this is to enable the workflows of users who are starting from SQL, and users who are starting directly with Calcite RelNodse, but share that majority of code for them. With the API modularized like this, SqlToSubstrait can effectively become the composition of these 2 conversion APIs. However, because what you're proposing here is effectively going from SQL -> Isthmus POJO directly, it makes that a bit trickier.
Aside from that goal though, a better argument for changing this is that this approach only works when going from SQL to POJO. If we assume that we want to be able to read in Substrait plans that create tables, then we'll need to have a first class entity in Calcite to model this. The built-in LogicalTableModify doesn't work for this, but we can add our own TableCreate relation, along with a LogicalTableCreate. That can be used when going from Sql to Calcit, from Calcite to Isthmus, and also from Isthmus back to Calcite.
Does that make sense?
There was a problem hiding this comment.
I think that the overall approach makes sense. The question is how to design this properly on the calcite's side.
I see the following options:
-
Use separate class for DDL relations.
It means we can- add the DdlRel class
- implement required methods for visitors (not sure what's better here - introduce a new abstract visitor class for ddl or extend the RelVisitor)
- add the DdlPlanRoot class (with the semantic similar to the RelRoot)
Probably the DdlPlanRoot class should have the same base class with the RelRoot class for uniform handling in Sql->Calcite conversion.
-
Extend the RelNode class to support DDL. This simplifies the required changes, but I guess it diverges from Calcite conventions and add unnecessary complexity to RelNode
Please let me know your thoughts.
There was a problem hiding this comment.
1 makes the most sense to me. I think CREATE is the only operation that we have in Substrait that's not supported as a Calcite relation. I would re-use Calcite's LogicalModify relation for INSERT, DELETE and UPDATE like we are already doing, and the create a relation specifically for CREATE.
implement required methods for visitors
We should be able to add a visit method for our new relation in the Isthmu sRelNodeVisitor utility. The Calcite RelShuttle also has a
RelNode visit(RelNode other);method for cases like this if we need that.
Would we need a custom RelRoot class, or could we just use the existing RelRoot with the SqlKind set to CREATE_TABLE or OTHER_DDL
There was a problem hiding this comment.
To further elucidate, this conversion would slot into here roughly: https://github.com/substrait-io/substrait-java/pull/474/files#diff-9977fe9108f68631bd554929d8e12242611c49b22534be2b701494ac133aa874R117-R140
There was a problem hiding this comment.
I guess making changes in calcite will take time, what should we do with this PR?
- Merge now as temporary solution
- Hold it off until those changes will be made approved and released
- Wait until feat(isthmus): introduce SubstraitSqlToCalcite and SubstraitStatementParser #474 will be merged and adjust these changes accordingly as a temporary solution
There was a problem hiding this comment.
I guess making changes in calcite will take time,
Ah, I was not suggesting we make the changes in Calcite. We should be able to define our own RelNode in Isthmus with something like:
package io.substrait.isthmus.calcite.relations;
import org.apache.calcite.rel.AbstractRelNode;
public abstract class TableCreate extends AbstractRelNode {
// ...
}
public class LogicalTableCreate extends TableCreate {
// ...
}There was a problem hiding this comment.
ok, I thought we need to extend a hierarchy that's a part of calcite. If we can do that in the isthmus code it will be much easier.
There was a problem hiding this comment.
Ok yeah we can do something much simpler than extending Calcite. I was pointing out the
RelNode visit(RelNode other);because that's the mechanism by which Calcite let's you visit nodes that you define outside of Calcite core.
There was a problem hiding this comment.
It seems with this approach we still have to make DdlRel (Create) a child of RelNode, but I have a couple of concerns with it:
- RelNode represents a relational expression, that processes data (records), while DdlRel does something else (it modifies the database catalog with a new entry for table, view or mqt)
- RelNode has the
getRowTypemethod which represents the type of the rows returned by the relational expression, which is not applicable for DdlRel
So I'm not sure if it is correct to say that DdlRel is RelNode. In my opinion, it would be better if they have the same parent class. But this probably requires changes in calcite.
There was a problem hiding this comment.
@vbarua I introduced a new class CalciteOperation to represent relational algebra and ddl operations. Please take a look when you get a chance. If the suggested approach makes sense I will update SubstraitToCalcite accordingly.
9e9dce2 to
0847d7d
Compare
c3139c4 to
f8041dd
Compare
# Conflicts: # isthmus/src/main/java/io/substrait/isthmus/SqlToSubstrait.java # Conflicts: # isthmus/src/main/java/io/substrait/isthmus/SqlToSubstrait.java # Conflicts: # isthmus/src/main/java/io/substrait/isthmus/SqlToSubstrait.java # isthmus/src/test/java/io/substrait/isthmus/DdlRoundtripTest.java
d0e1cca to
6c7280d
Compare
Makes sense. I've added a validator to SubstraitRelNodeConverter to ensure incoming relations have a supported configuration. If this approach is acceptable, we can move the validation logic for non-DDL and non-DML relations there as well (probably in a separate PR). |
6c7280d to
e31d8d3
Compare
e31d8d3 to
08cd5c5
Compare
08cd5c5 to
0dc6eb3
Compare
|
@ZorinAnton could you avoid rebasing your PR every time you update it. It makes it harder to review because I can't just look at the new commits, and it messes with Github PR view a bit. The end result will be squash merged as well, so we don't need to keep the branch fresh with main. |
|
sure, will do, I just didn't want to have a separate commit per every singe change. |
vbarua
left a comment
There was a problem hiding this comment.
I just didn't want to have a separate commit per every singe change.
Because it gets squashed at the end this won't matter. When I merge as well, I summarize the individual commits into a coherent message.
I've made some suggestions for this a separate PR targetting your PR
ZorinAnton#1
I've added a validator to SubstraitRelNodeConverter to ensure incoming relations have a supported configuration.
I don't think we need that level of machinary for this work. For one thing, in the long-term we shouldn't need those validations at all because we should be able to convert ALL valid Substrait messages to Calcite and back. In my suggestions, I've inlined very basic restrictions.
I'm not opposed to validations in general, but I don't want to make them part of this PR because if we start fiddling with a public API for validations it will extend how long this PR remains open.
isthmus/src/main/java/io/substrait/isthmus/calcite/rel/CreateTable.java
Outdated
Show resolved
Hide resolved
isthmus/src/main/java/io/substrait/isthmus/calcite/rel/CreateTable.java
Outdated
Show resolved
Hide resolved
isthmus/src/main/java/io/substrait/isthmus/calcite/rel/CreateView.java
Outdated
Show resolved
Hide resolved
8151c4f to
be2e6b2
Compare
Thanks, merged it.
before it happens, a dedicated validator would give a quick view of what is supported and what's not, but I agree that it needs to be addressed in a separate PR. |
As discussed, splitting the PR #428 into dml and ddl parts. This PR is for ddl support.
For the ddl sql statements the following relations are created:
NamedWritefor create table asNamedDdlfor create viewAll other types of sql statements are handled as before.
Since DDL statements have no representation in Calcite's relational algebra, these changes conflict with the proposed split of SqlToSubstrait into SqlToCalcite and CalciteToSubstrait: #362. This is a key architectural decision that requires careful design.