Open
Labels
bug: Something isn't working
Description
@jimdale and I found an issue where viewser raises a deserialization
error even though the response clearly contains at least partial Parquet
data:
DeserializationError: DeserializationError:
Description:
Could not deserialize as parquet: "b'PAR1\x15\x04\
x15\xe0D\x15\xf8?L\x15\xcc\x08\x15\x04\x12\x00\x00\x1f\x8b\x08
\x00\x00\x00\x00\x00\x00\x03-Wi8Vk\x1b5e\x1e\xdei\x8f\xafY*\x9
1\xc21'..."
This only seems to happen with certain querysets. The queryset that led to this error was:
queryset = (Queryset("jim_fatalities_conflict_history_lag_tdecay", "priogrid_month")
    # target variable
    .with_column(Column("ln_ged_sb", from_table="ged2_pgm", from_column="ged_sb_best_sum_nokgi")
        .transform.missing.fill()
        .transform.ops.ln()
    )
    # spatial-tree-lagged d^-2 target variable
    .with_column(Column("ln_ged_sb_treelag_2_th1_0", from_table="ged2_pgm", from_column="ged_sb_best_sum_nokgi")
        .transform.missing.fill()
        .transform.ops.ln()
        .transform.spatial.treelag(thetacrit_tree, 2)
    )
    # 1 tlagged spatial-tree-lagged d^-2 target variable
    .with_column(Column("ln_ged_tlag_1_sb_treelag_2_th1_0", from_table="ged2_pgm", from_column="ged_sb_best_sum_nokgi")
        .transform.missing.fill()
        .transform.ops.ln()
        .transform.spatial.treelag(thetacrit_tree, 2)
        .transform.temporal.tlag(1)
        .transform.missing.fill()
    )
    # spatial-tree-lagged d^-1 target variable
    .with_column(Column("ln_ged_sb_treelag_1_th1_0", from_table="ged2_pgm", from_column="ged_sb_best_sum_nokgi")
        .transform.missing.fill()
        .transform.ops.ln()
        .transform.spatial.treelag(thetacrit_tree, 1)
    )
    # 1 tlagged spatial-tree-lagged d^-1 target variable
    .with_column(Column("ln_ged_tlag_1_sb_treelag_1_th1_0", from_table="ged2_pgm", from_column="ged_sb_best_sum_nokgi")
        .transform.missing.fill()
        .transform.ops.ln()
        .transform.spatial.treelag(thetacrit_tree, 1)
        .transform.temporal.tlag(1)
        .transform.missing.fill()
    )
    # spatial-tree-lagged ln(1+d) target variable
    .with_column(Column("ln_ged_sb_treelag_0_th1_0", from_table="ged2_pgm", from_column="ged_sb_best_sum_nokgi")
        .transform.missing.fill()
        .transform.ops.ln()
        .transform.spatial.treelag(thetacrit_tree, 0)
    )
    # 1 tlagged spatial-tree-lagged ln(1+d) target variable
    .with_column(Column("ln_ged_tlag_1_sb_treelag_0_th1_0", from_table="ged2_pgm", from_column="ged_sb_best_sum_nokgi")
        .transform.missing.fill()
        .transform.ops.ln()
        .transform.spatial.treelag(thetacrit_tree, 0)
        .transform.temporal.tlag(1)
        .transform.missing.fill()
    )
)
To begin diagnosing this, we need tooling that dumps the raw response data
whenever deserialization fails, so that we can see exactly what is being
returned. That should tell us whether the problem originates upstream or in
the deserialization step itself.
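As a starting point, the dumping tooling could look something like the sketch below. It is not existing viewser code: the names `inspect_parquet_bytes` and `dump_response` and the dump file naming are hypothetical, and hooking it into the place where viewser raises the DeserializationError is left open. It relies only on the Parquet file layout, in which a file starts with the magic bytes `PAR1` and ends with a 4-byte little-endian footer length followed by `PAR1` again, so a truncated response should show up as a missing trailing magic.

```python
import hashlib
import struct
import tempfile
from pathlib import Path

PARQUET_MAGIC = b"PAR1"


def inspect_parquet_bytes(data: bytes) -> dict:
    """Return basic structural facts about a byte string that should be Parquet."""
    report = {
        "size": len(data),
        "sha256": hashlib.sha256(data).hexdigest(),
        "starts_with_magic": data[:4] == PARQUET_MAGIC,
        # A complete file has at least: magic (4) + footer length (4) + magic (4).
        "ends_with_magic": len(data) >= 12 and data[-4:] == PARQUET_MAGIC,
        "footer_length": None,
        "footer_fits": False,
    }
    if report["starts_with_magic"] and report["ends_with_magic"]:
        # The 4 bytes before the trailing magic hold the footer length.
        (footer_length,) = struct.unpack("<I", data[-8:-4])
        report["footer_length"] = footer_length
        report["footer_fits"] = footer_length + 12 <= len(data)
    return report


def dump_response(data: bytes, directory=None) -> Path:
    """Write the raw response bytes to disk and return the path (hypothetical naming)."""
    directory = Path(directory) if directory else Path(tempfile.gettempdir())
    directory.mkdir(parents=True, exist_ok=True)
    digest = hashlib.sha256(data).hexdigest()[:12]
    path = directory / f"viewser_response_{digest}.bin"
    path.write_bytes(data)
    return path
```

Calling both helpers from the `except` branch around deserialization would give us a file to inspect offline, plus a quick indication of whether the response was cut short (header magic present, trailing magic missing) or is structurally complete but still unreadable.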
One clue is that no exception is raised upstream, which suggests that the
data is written to Parquet and sent without problems. This points towards
something being wrong in viewser.