Add Deltalake query support #1023
some notes:
ah and based on the failing test for trajectories, I assumed returning the pymatgen object was correct, should the
@tsmathis think the API was set up to return the Either way yeah I guess it returned the For the AlphaID, to handle both the no prefix/separator ("aaaaaaft") and the prefix/separator ("mp-aaaaaaft") cases, both of these should work, but I can also just save the "padded identifier" as an attr on it to make this cleaner - I'll do that in the PR you linked: or
either way on this works for me, just want to make sure I stick to the intended usage (edit: or that we're at least consistent across the client)
Was going to say we could stick to whatever the frontend was expecting, but looking now the frontend doesn't even use the
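For reference, a minimal sketch of how the two identifier forms mentioned above ("aaaaaaft" and "mp-aaaaaaft") could be compared on equal footing. `split_identifier` and `same_identifier` are hypothetical helpers written only for illustration; they are not the AlphaID API, and the "-" separator is taken from the example strings in the comment.

```python
def split_identifier(identifier: str, separator: str = "-") -> tuple[str, str]:
    """Split an identifier into (prefix, value).

    Hypothetical helper, not part of AlphaID: an identifier without a
    separator is treated as having an empty prefix.
    """
    prefix, _, value = identifier.rpartition(separator)
    return prefix, value


def same_identifier(a: str, b: str) -> bool:
    """Compare two identifiers by their value part, ignoring a missing prefix."""
    prefix_a, value_a = split_identifier(a)
    prefix_b, value_b = split_identifier(b)
    # Only treat prefixes as conflicting when both are present and differ
    prefixes_agree = not prefix_a or not prefix_b or prefix_a == prefix_b
    return prefixes_agree and value_a == value_b
```

Storing the normalized form once as an attribute, as suggested in the comment, would avoid repeating this split at every comparison.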
Codecov Report
❌ Patch coverage is
Additional details and impacted files

@@            Coverage Diff             @@
##             main    #1023      +/-   ##
==========================================
- Coverage   66.85%   65.92%   -0.94%
==========================================
  Files          50       50
  Lines        2767     2870     +103
==========================================
+ Hits         1850     1892      +42
- Misses        917      978      +61
Very nice! Looking forward to rolling this out 😄
    has_gnome_access = bool(
        self._submit_requests(
            url=urljoin(
                "https://api.materialsproject.org/", "materials/summary/"
use self.endpoint
should it be self.base_endpoint? I think this tripped me up when I tried using self.endpoint originally
api/mp_api/client/core/client.py
Lines 134 to 135 in 254c7d0
    if self.suffix:
        self.endpoint = urljoin(self.endpoint, self.suffix)
for the tasks rester -> self.endpoint caused the urljoin here to yield something like {base_url}/materials/tasks/materials/summary
right, self.base_endpoint should work.
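To illustrate the difference, here is a small standalone sketch using only `urllib.parse.urljoin`. The `tasks_endpoint` value is an assumption standing in for what `self.endpoint` looks like on the tasks rester after its suffix has been joined in; the real attribute values may differ.

```python
from urllib.parse import urljoin

BASE_ENDPOINT = "https://api.materialsproject.org/"
SUMMARY_PATH = "materials/summary/"

# Assumed state of a rester's endpoint once its suffix has been applied
tasks_endpoint = urljoin(BASE_ENDPOINT, "materials/tasks/")

# Joining against the suffixed endpoint nests the summary path under tasks
joined_from_endpoint = urljoin(tasks_endpoint, SUMMARY_PATH)

# Joining against the base endpoint gives the intended summary route
joined_from_base = urljoin(BASE_ENDPOINT, SUMMARY_PATH)

print(joined_from_endpoint)
print(joined_from_base)
```

Because the relative path has no leading slash, `urljoin` resolves it against whatever path the base already carries, which is why the suffixed endpoint produces the doubled route.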
    _flush(accumulator, group)
    group += 1
    size = 0
    accumulator = []
accumulator.clear() for better memory management?
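A rough sketch of what that would look like, with one caveat: `clear()` empties the list in place, so anything still holding a reference to it sees the change, whereas rebinding to `[]` leaves the old list untouched. `batch_rows` and its inner `_flush` are hypothetical stand-ins for illustration, not the code under review.

```python
def batch_rows(rows, threshold):
    """Group rows into batches of at most `threshold` items (illustrative only)."""
    flushed = []
    accumulator = []

    def _flush(batch):
        # Copy on flush so a later accumulator.clear() cannot empty
        # a batch that has already been handed off
        flushed.append(list(batch))

    for row in rows:
        accumulator.append(row)
        if len(accumulator) >= threshold:
            _flush(accumulator)
            accumulator.clear()  # reuse the same list object
    if accumulator:
        _flush(accumulator)
    return flushed


# Aliasing difference between clear() and rebinding
held = [1, 2, 3]
alias = held
held.clear()      # alias is emptied too, same object
rebound = [1, 2, 3]
kept = rebound
rebound = []      # kept still holds the original items
```

If `_flush` writes the rows out before returning and keeps no reference, either form behaves the same from the caller's side.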
        description="Threshold number of rows to accumulate in memory before flushing dataset to disk",
    )

    ACCESS_CONTROLLED_BATCH_IDS: list[str] = Field(
settings is the right place for the first two. Access-controlled batch ids should probably be hardcoded and change with client releases.
akin to one of these?
Lines 475 to 488 in 254c7d0
    def get_database_version(self):
        """The Materials Project database is periodically updated and has a
        database version associated with it. When the database is updated,
        consolidated data (information about "a material") may and does
        change, while calculation data about a specific calculation task
        remains unchanged and available for querying via its task_id.

        The database version is set as a date in the format YYYY_MM_DD,
        where "_DD" may be optional. An additional numerical suffix
        might be added if multiple releases happen on the same day.

        Returns: database version as a string
        """
        return get(url=self.endpoint + "heartbeat").json()["db_version"]
Re: the use of self.endpoint? If so, then yes :)
Ah no, I mean for the access controlled batch ids.
Should those be added to the heartbeat so they aren't defined in the client code/settings? And then the client can just call get_access_controlled_batch_ids()
Yes, that's a good idea.
Feel free to start a PR to add it to the heartbeat_meta here.
should just work™️
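A possible shape for the client side of this, sketched under assumptions: the key name `access_controlled_batch_ids` is hypothetical (nothing in this thread fixes how the heartbeat would expose it), and the function below takes an already-decoded heartbeat payload rather than making a request, so it can be exercised without network access.

```python
from typing import Any, Mapping

# Hypothetical key; the actual heartbeat field name is not decided in this thread
BATCH_IDS_KEY = "access_controlled_batch_ids"


def get_access_controlled_batch_ids(heartbeat: Mapping[str, Any]) -> list[str]:
    """Read access-controlled batch ids from a decoded heartbeat payload.

    Returns an empty list when the server does not report the field,
    so older API deployments would not break the client.
    """
    return list(heartbeat.get(BATCH_IDS_KEY, []))


# Placeholder payload for illustration only, not real server output
example_heartbeat = {"db_version": "YYYY_MM_DD", BATCH_IDS_KEY: ["example_batch"]}
batch_ids = get_access_controlled_batch_ids(example_heartbeat)
```

In the client this would sit next to `get_database_version` and read from the same heartbeat response, which keeps the list out of both the settings and the release cycle.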