Inherit from Dask Dataframe classes #43
This allows us to remove a large amount of boilerplate code. This requires changes upstream both in Dask DataFrame to enable better subclassing and in PyGDF to better match the Pandas API.
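To illustrate the idea in the description, here is a minimal pure-Python sketch of how inheriting from a base collection class removes boilerplate. `BaseFrame` and `GPUFrame` are hypothetical stand-ins, not the actual Dask or dask_gdf classes, and building results through `type(self)` is only one common way a base class can support subclassing; it is not a claim about how Dask does it.

```python
# Hypothetical stand-ins: `BaseFrame` plays the role of the upstream
# collection class, `GPUFrame` the role of the subclass in this package.
class BaseFrame:
    """Generic partitioned collection holding the shared logic."""

    def __init__(self, partitions):
        self.partitions = list(partitions)

    def _wrap(self, partitions):
        # Build results via type(self) so a subclass gets its own type back.
        return type(self)(partitions)

    def map_partitions(self, func):
        # Apply func to each partition and wrap the results.
        return self._wrap(func(part) for part in self.partitions)

    def head(self, n=5):
        # Return the first n items of the first partition.
        return self.partitions[0][:n]


class GPUFrame(BaseFrame):
    """Inherits map_partitions and head; only declares what differs."""

    backend = "gpu"


frame = GPUFrame([[1, 2, 3], [4, 5]])
doubled = frame.map_partitions(lambda part: [2 * x for x in part])
```

In this sketch the subclass body is a single attribute, and `doubled` is still a `GPUFrame` because the base class never hard-codes its own type.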
dask_gdf/tests/test_delayed_io.py
Outdated
raises.match(r"\s+\|\s+".join(['bad', 'float32', 'float64']))
# print("out")
# raises.match(r"^Metadata mismatch found in `from_delayed`.")
# raises.match(r"\s+\|\s+".join(['bad', 'float32', 'float64']))
This isn't dead. I had to comment it out to get tests to pass. I prefer to leave it here for now until we decide what the plan is on this.
OK! Can you please add a comment here to that effect? This is something we do in libgdf etc. for "future code".
I'm happy to actually fix the test. Sorry, I should have mentioned above that this PR is a work-in-progress. I expect that I'll end up doing more changes before it's done. I've pushed up the first round of PRs just so that folks can take a look at what I'm proposing. I don't think that the time has come for scrutiny on the level of style or comments.
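For reference, the pattern used in the commented-out assertions can be checked in isolation. This is a minimal sketch using only the standard-library `re` module; `sample_message` is a made-up stand-in for the mismatch table and is not the actual text raised by `from_delayed`.

```python
import re

# Same pattern as in the commented-out assertions: the three cell values
# separated by a pipe with whitespace on both sides.
pattern = r"\s+\|\s+".join(["bad", "float32", "float64"])

# Hypothetical error text; the real message may be laid out differently.
sample_message = (
    "Metadata mismatch found in `from_delayed`.\n"
    "| Column | Found   | Expected |\n"
    "| bad    | float32 | float64  |\n"
)

# pytest's ExceptionInfo.match uses re.search, so re.search is the analogue here.
row_found = re.search(pattern, sample_message) is not None
header_found = (
    re.search(r"^Metadata mismatch found in `from_delayed`.", sample_message)
    is not None
)
```

Because the pattern requires a literal pipe between the values, it only matches when the message renders the mismatch as a table row; a change in that layout would make the assertion fail even if the error itself is still raised.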
So, some reasons not to do this:
OK. I'll work on cleaning up things on the dask side and I'll add a couple checks to the pygdf side. Would we want to merge this if both of those go in? That would require us to wait on a release until the next time dask-core releases (probably at least a week or two away) and would pin us above that version for the future.
Does dask-core publish nightlies to Conda? Given the "alpha" state of dask-gdf I'd be happy to have a dependency on pre-release Dask versions as we've done with Numba in the past for PyGDF as we identify functionality that should be added or bugs get fixed.
No, but it's pure Python, so we typically
I think relying on master until needed changes / features make it into a release is okay at this point.
OK. I'll add a new line to the CI system in this PR once other PRs finish up.
…On Wed, Sep 26, 2018 at 11:34 PM Keith Kraus ***@***.***> wrote:
I think relying on master until needed changes / features make it into a
release is okay at this point.
Closing in favor of #48