Fix a date-time bug; drop pointless dependencies; and deprecate utils. #85
dmyersturnbull wants to merge 8 commits into master
piehld left a comment:
Thanks, @dmyersturnbull! I definitely appreciate you reviewing the codebase and identifying and proposing fixes for outdated code. I'm sure your changes are reliable, and I'll try to review them soon. Because I'm out for the rest of the week, though, I want to give you a heads-up that we'll probably wait until next week to merge this: I want to be extra careful testing it before pushing it to production for the weekend. Any change to rcsb.db has a big impact on the weekly update, so I tend to lean toward extra caution when pushing things out.
While I still need to go through each change, one thing stands out: the removal of the mock-data submodule. I'm guessing that wasn't intentional; can you add it back?
Also FYI @brindakv
Sounds good. The SQL removal doesn't depend on this. By the way, do tests run for PRs?
Ah! Git doesn't normally track empty directories, and I had to
Thanks! And be aware that Michael just merged a PR, so you'll probably want to rebase or merge.
Yes, they do. Every time you push a new commit, a new pipeline runs. However, for some reason this repository doesn't show the run results the way other repos do, so you'll need to go to Azure to check the run directly: https://dev.azure.com/rcsb/RCSB%20PDB%20Python%20Projects/_build?definitionId=33 When you push a new commit, a new run will show up at that link.
Conflicts:
- rcsb/db/cockroach/CockroachDbUtil.py
- rcsb/db/cockroach/Connection.py
- rcsb/db/crate/CrateDbUtil.py
- rcsb/db/mysql/MyDbUtil.py
- requirements.txt
- setup.py
Force-pushed from e3bc77c to c6f228d.
@piehld I rebased. I also adjusted the readme (it no longer mentions MySQL) and cleaned up
Force-pushed from 8e67cee to 18dcff2.
piehld left a comment:
Thanks for bringing mock-data back, @dmyersturnbull, and for the other changes! I have a couple of general questions about the removal of TimeUtil.py, which I left as comments on that file. I definitely agree with removing unnecessary dependencies from requirements.txt, but I wonder whether we need to remove TimeUtil.py altogether or could instead just update it, since the class is used elsewhere in code that would also need updating if we remove it. Personally, I'd prefer keeping the class and just updating it to use datetime instead.
This class is used in rcsb.exdb and rcsb.workflow, so we would need appropriate updates to those repos too (and first) if we want to get rid of this file.
Alternatively, instead of getting rid of this file (since it is used in a handful of places), is it possible to just update this file to not use dateutil or pytz? That would be my preference, at least.
> This class is used in `rcsb.exdb` and `rcsb.workflow`
I don't think that's true. rcsb.exdb is importing rcsb.utils.io.TimeUtil (which, incidentally, is identical). The first commit in this PR refactored TimeUtil, but I deleted it when I realized that.
Hmm, I noticed that difference too, but I also see imports of rcsb.db.utils.TimeUtil, e.g.:
Oh, good catch. I added stubs to rcsb/db/utils/__init__.py that provide those with deprecation warnings.
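The stubs themselves aren't shown in this thread. As a hypothetical sketch of the general pattern (the helper `deprecated_alias` and the stand-in class are invented for illustration, and it assumes the replacement is a class whose constructor takes the same arguments), an old name can stay importable while emitting a `DeprecationWarning`:

```python
import warnings


def deprecated_alias(target: type, old_name: str, new_name: str) -> type:
    """Hypothetical helper: return a subclass of `target` that warns when instantiated."""

    def __init__(self, *args, **kwargs):
        # stacklevel=2 attributes the warning to the caller's line, not to this stub.
        warnings.warn(
            f"{old_name} is deprecated; use {new_name} instead.",
            DeprecationWarning,
            stacklevel=2,
        )
        target.__init__(self, *args, **kwargs)

    # Name the subclass after the last component of the old dotted path.
    return type(old_name.rsplit(".", 1)[-1], (target,), {"__init__": __init__})


if __name__ == "__main__":
    class NewTimeUtil:  # stand-in for the real replacement class
        def __init__(self, value=0):
            self.value = value

    TimeUtil = deprecated_alias(NewTimeUtil, "rcsb.db.utils.TimeUtil", "rcsb.utils.io.TimeUtil")
    warnings.simplefilter("always")
    TimeUtil(value=1)  # emits a DeprecationWarning, then behaves like NewTimeUtil
```

Python hides `DeprecationWarning` by default unless it is triggered from `__main__`, so downstream callers usually see these warnings under pytest or with `-W default`.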
```python
from datetime import datetime
from typing import Optional
from zoneinfo import ZoneInfo
```
Likewise, if we keep TimeUtil, we don't have to worry about updating each of these files that currently imports it.
```python
# TODO
# What is this function doing???!
# Its name was castDateTimeToIsoDate, suggesting it outputs YYYY-MM-DD.
# Contradicting that, the docstring said:
# > Cast the input date (optional time) string (yyyy-mm-dd:hh::mm:ss) to a Python DateTime object
# Instead, it did this:
# > tS = dateutil.parser.parse(tv).replace(tzinfo=pytz.UTC).isoformat()
# That's three contradictory claims.
# What was this function supposed to do?
# Why did it fill in 00:00:00, and why did it append +00:00?
# Also, why did the format separate the date and time with `:`?
# END TODO
```
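To see where the `00:00:00` and `+00:00` come from, here is a minimal sketch that reproduces the quoted expression with only the standard library. The function name is hypothetical, and it assumes ISO-like inputs: `datetime.fromisoformat` is stricter than `dateutil.parser.parse`, so some strings the old code accepted would raise `ValueError` here.

```python
from datetime import datetime, timezone


def cast_to_utc_iso(tv: str) -> str:
    """Hypothetical stdlib-only equivalent of the old expression:

        dateutil.parser.parse(tv).replace(tzinfo=pytz.UTC).isoformat()
    """
    # A date-only string parses to midnight, which is where 00:00:00 comes from.
    parsed = datetime.fromisoformat(tv)
    # replace() stamps UTC onto the wall-clock time without converting it;
    # any offset already present in `tv` is silently overwritten.
    stamped = parsed.replace(tzinfo=timezone.utc)
    # isoformat() on an aware datetime appends the offset, hence +00:00.
    return stamped.isoformat()


if __name__ == "__main__":
    print(cast_to_utc_iso("2021-03-04"))           # 2021-03-04T00:00:00+00:00
    print(cast_to_utc_iso("2021-03-04 12:30:00"))  # 2021-03-04T12:30:00+00:00
```

If the method name's promise (a `YYYY-MM-DD` string) was the real intent, `datetime.fromisoformat(tv).date().isoformat()` would be the corresponding one-liner; which behavior callers rely on is exactly the open question in the TODO.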
lol, these are all good questions. I imagine some of the contradictions came from John copying and pasting a docstring, or renaming the method after already writing an [outdated] docstring. In my experience, he wasn't always super careful with docstrings.
So, I think the best way to figure out the intent would be to run an actual load and add some logging. I could look into doing that.
I simply retained the current behavior and ignored the intention (save for those comments).
@piehld Another PR will remove MySQL. This PR fixes a bug and removes some unnecessary dependencies.
Summary: Fixes a date-time calculation bug; drops pointless date-time dependencies; and deprecates TimeUtil and TextUtil.
- `TimeUtil` incorrectly called `datetime.replace(tzinfo=utc)`, which doesn't actually convert between time zones. Code using `TimeUtil` and `useUtc` was affected.
- Drops `dateutil`, `pytz`, and `strict-rfc3339`. The builtin `datetime` works perfectly well. Removes these from `TimeUtil` and some other code.
- Deprecates `TimeUtil`. Its methods just delegate to `datetime` in standard ways. [1]
- Deprecates `TextUtil` and refactors code to avoid using it. It only defined `unescapeXmlCharRef`, which is just `html.unescape`.
- Deletes `unescape.py`, which was an unused duplicate of `TextUtil`. [2]
- Drops `scandir` and `configparser`, which were marked as needed only for `python<3.0`.
- Drops `six` and `future`, removes `from __future__ import generators`, and moves `jsondiff` to `tests_require`. (2nd commit)

[1] The exception is the "week signature", because it's non-obvious/nonstandard (i.e., not an ISO 8601 week date). However, it's just `strftime("%Y_%V")`.

[2] This is technically a breaking change, but adding `import rcsb.db.utils.TextUtil as unescape` to `utils/__init__.py` would make it non-breaking.
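The `replace(tzinfo=...)` bug is easy to reproduce. The sketch below uses a fixed four-hours-behind-UTC offset as an assumed example input (not taken from the codebase) and compares `replace()`, which only relabels the wall-clock time, with `astimezone()`, which converts to the same instant in UTC:

```python
from datetime import datetime, timedelta, timezone

# An aware timestamp at 12:00 in a zone four hours behind UTC (example input).
minus_four = timezone(timedelta(hours=-4))
local = datetime(2021, 6, 1, 12, 0, tzinfo=minus_four)

# replace() keeps 12:00 and just swaps the label: a different instant.
relabeled = local.replace(tzinfo=timezone.utc)

# astimezone() converts: the same instant, expressed in UTC.
converted = local.astimezone(timezone.utc)

print(relabeled.isoformat())  # 2021-06-01T12:00:00+00:00
print(converted.isoformat())  # 2021-06-01T16:00:00+00:00
print(relabeled == local)     # False
print(converted == local)     # True
```

One caveat for naive datetimes: `astimezone()` assumes the system's local time zone, so code that genuinely means "this naive value is already UTC" still needs `replace(tzinfo=timezone.utc)`.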
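The week signature from footnote [1] can be checked quickly. One caveat worth knowing (a general property of `strftime`, not something raised in the PR): `%V` is the ISO 8601 week number, but `%Y` is the calendar year, so around January 1 the two can disagree; `%G` is the matching ISO year. The date below is an arbitrary example.

```python
from datetime import date

d = date(2021, 1, 1)  # a Friday that ISO 8601 assigns to week 53 of 2020

print(d.isocalendar()[:2])  # (2020, 53)
print(d.strftime("%Y_%V"))  # 2021_53  (calendar year + ISO week)
print(d.strftime("%G_%V"))  # 2020_53  (ISO year + ISO week)
```

Whether `%Y` or `%G` is right depends on how the signature is consumed; this sketch doesn't presume which one the code intends.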
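As a quick check of the `TextUtil` bullet, `html.unescape` resolves both decimal and hexadecimal character references. It also resolves HTML named entities such as `&nbsp;`, which XML doesn't predefine, so it is a superset of pure XML character-reference unescaping. The input string is an arbitrary example.

```python
import html

# Decimal (&#60;, &#233;) and hexadecimal (&#x3E;) references plus a named entity (&amp;).
text = "&#60;a&#x3E; &amp; caf&#233;"
print(html.unescape(text))  # <a> & café
```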