update integrative link #31

Merged
j-s-135 merged 7 commits into master from dictionary
May 21, 2025

@j-s-135 j-s-135 commented May 16, 2025

syncing with dictionary link from exdb

@j-s-135 j-s-135 requested a review from piehld May 16, 2025 20:39

piehld commented May 17, 2025

Thanks @j-s-135, the changes look good, but oddly the tests are failing with an error message that doesn't seem to be related, and one that @jeremy-rcsb actually encountered in his k8s workflow:

TypeError: Can't mix strings and bytes in path components

I'm wondering if this PR created a new image on Harbor, which is what the k8s workflow started using?

This error is very strange since my PR just two days ago didn't encounter any issues: #30

One thing, though, @j-s-135: can you increment the version number in HISTORY and __init__.py? Since you are editing actual code, it would be good to increment the version to keep track of that.

FYI @brindakv

@piehld piehld mentioned this pull request May 17, 2025

piehld commented May 17, 2025

Ugh, I just confirmed (in an essentially empty PR) that this appears to be a real issue now, for some reason. Normally, this is what we see (from our Azure pipelines after merging code on Wednesday):

2025-05-14 16:31:16,871 [INFO]-PdbxLoader.load: First locatorObj: ({'locator': 'https://files.wwpdb.org/pub/pdb/data/structures/divided/mmCIF/ds/1dsr.cif.gz', 'fmt': 'mmcif', 'kwargs': {}}, {'locator': 'https://files.wwpdb.org/pub/pdb/validation_reports/ds/1dsr/1dsr_validation.cif.gz', 'fmt': 'mmcif', 'kwargs': {}})
2025-05-14 16:31:16,872 [INFO]-DictionaryApiProvider.__reload: Fetching url https://raw.githubusercontent.com/rcsb/py-rcsb_exdb_assets/master/dictionary_files/reference/mmcif_ma_ext.dic caching in /home/vsts/work/1/s/CACHE/dictionaries/mmcif_ma_ext.dic
2025-05-14 16:31:17,209 [INFO]-DictionaryApiProvider.__reload: Fetching url https://raw.githubusercontent.com/rcsb/py-rcsb_exdb_assets/master/dictionary_files/reference/vrpt_mmcif_ext.dic caching in /home/vsts/work/1/s/CACHE/dictionaries/vrpt_mmcif_ext.dic
2025-05-14 16:31:19,454 [INFO]-DictMethodResourceProvider.cacheResources: Begin CHECKING cache for 33 candidate resources

But now, with no changes to the code (although after merging py-rcsb_exdb_assets), re-running produces this:

2025-05-17 22:14:11,610 [INFO]-PdbxLoader.load: First locatorObj: ({'locator': 'https://files.wwpdb.org/pub/pdb/data/structures/divided/mmCIF/bm/1bmv.cif.gz', 'fmt': 'mmcif', 'kwargs': {}}, {'locator': 'https://files.wwpdb.org/pub/pdb/validation_reports/bm/1bmv/1bmv_validation.cif.gz', 'fmt': 'mmcif', 'kwargs': {}})
2025-05-17 22:14:11,611 [INFO]-DictionaryApiProvider.__reload: Fetching url https://raw.githubusercontent.com/rcsb/py-rcsb_exdb_assets/master/dictionary_files/reference/mmcif_ma_ext.dic caching in /home/vsts/work/1/s/CACHE/dictionaries/mmcif_ma_ext.dic
2025-05-17 22:14:11,628 [ERROR]-PdbxLoader.load: Failing with Can't mix strings and bytes in path components
Traceback (most recent call last):
  File "/home/vsts/work/1/s/.tox/test_coverage-py310/lib/python3.10/site-packages/rcsb/db/mongo/PdbxLoader.py", line 262, in load
    dictApi = dP.getApiByName(databaseName)
  File "/home/vsts/work/1/s/.tox/test_coverage-py310/lib/python3.10/site-packages/rcsb/utils/dictionary/DictionaryApiProviderWrapper.py", line 89, in getApiByName
    return self.__dP.getApi(dictLocators, **kwargs)
  File "/home/vsts/work/1/s/.tox/test_coverage-py310/lib/python3.10/site-packages/rcsb/utils/dictionary/DictionaryApiProvider.py", line 92, in getApi
    dApi = self.__apiMap[dictTup] if dictTup in self.__apiMap else self.__getApi(dictLocators, **kwargs)
  File "/home/vsts/work/1/s/.tox/test_coverage-py310/lib/python3.10/site-packages/rcsb/utils/dictionary/DictionaryApiProvider.py", line 102, in __getApi
    ok = self.__reload(dictLocators, self.__dirPath, useCache=self.__useCache)
  File "/home/vsts/work/1/s/.tox/test_coverage-py310/lib/python3.10/site-packages/rcsb/utils/dictionary/DictionaryApiProvider.py", line 72, in __reload
    cacheFilePath = os.path.join(dirPath, self.__fileU.getFileName(dictLocator))
  File "/opt/hostedtoolcache/Python/3.10.17/x64/lib/python3.10/posixpath.py", line 90, in join
    genericpath._check_arg_types('join', a, *p)
  File "/opt/hostedtoolcache/Python/3.10.17/x64/lib/python3.10/genericpath.py", line 155, in _check_arg_types
    raise TypeError("Can't mix strings and bytes in path components") from None
TypeError: Can't mix strings and bytes in path components

@brindakv, do you have any idea what this might be related to? I'm very confused about how production worked yesterday, given that my new blank PR is failing without any actual code changes.
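
For reference, the TypeError at the bottom of the traceback above is what Python's standard-library os.path.join raises when it is handed a str and a bytes component together. The snippet below is only a minimal sketch of that behavior. The directory and file names are placeholders, join_cache_path is a hypothetical helper that is not part of rcsb.utils.dictionary, and nothing here claims to show what getFileName actually returned in the failing run.

```python
import os


def reproduce_mixed_types_error():
    """Return the message os.path.join raises for a str directory and a bytes file name."""
    try:
        # Placeholder values: a str directory joined with a bytes file name.
        os.path.join("/tmp/CACHE/dictionaries", b"example.dic")
    except TypeError as exc:
        return str(exc)
    return None


def join_cache_path(dir_path, file_name):
    """Hypothetical guard: fail early, naming the offending types, before joining."""
    if not isinstance(dir_path, str) or not isinstance(file_name, str):
        raise TypeError(
            "expected str path components, got %s and %s"
            % (type(dir_path).__name__, type(file_name).__name__)
        )
    return os.path.join(dir_path, file_name)


if __name__ == "__main__":
    print(reproduce_mixed_types_error())
```

A guard like this would not fix the underlying cause, but it would make the log point at the unexpected value rather than at posixpath.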

piehld commented May 18, 2025

@j-s-135 @jeremy-rcsb @brindakv Alright, so after banging my head for an hour, I think I figured out the issue. I believe the PdbxLoader was trying to access the IHM dictionary based on the configuration file in mock-data, to which the IHM dictionary setting hadn't yet been added. I just added it here, and re-ran my PR workflow, and I think it is working now.

@j-s-135, so, for your PR to pass, you will need to update mock-data. I'll do this for you now since I just updated mock-data myself, but for the record, here are the steps (run from within the py-rcsb_workflow directory on your computer):

# to update mock-data in the `py-rcsb_workflow` repo:
git submodule update --recursive --remote

# to check if the update was already added to your next commit
git status

# if it wasn't added yet, please add it:
git add -u

# Then commit and push the updated mock-data directory
git commit
git push

piehld commented May 18, 2025

@j-s-135 So the update of mock-data fixed the original issue this PR was facing, but there seems to be a new issue, possibly due to how dictionaries are read in for your BCIF workflow. It seems like the workflow is reading in the dictionary file many times in a row. Can you please review your code to see if this is the case and adjust it so that it only reads in the dictionary once?

Also, before I forget, can you please update the version in the HISTORY.txt and __init__.py files as well?

test_hashed_storage (testBcifWorkflow.TestBcif) ... 2025-05-18 00:17:02,581 [INFO]-testBcifWorkflow.setUp: making temp dir /tmp/tmp1a23_hpl
2025-05-18 00:17:02,582 [INFO]-testBcifWorkflow.setUp: making temp dir /tmp/tmp004fv_h_
2025-05-18 00:17:02,843 @6219 [INFO]-BcifExec: bcif workflow initialized
2025-05-18 00:17:02,843 @6219 [INFO]-BcifWorkflow: running bcif workflow on /tmp/tmp004fv_h_/pdbx_core_ids-1.txt and /home/vsts/work/1/s/rcsb/mock-data/MOCK_CIF_FILES/pdb
2025-05-18 00:17:02,843 @6219 [INFO]-task_functions: reading list file /tmp/tmp004fv_h_/pdbx_core_ids-1.txt and remote path /home/vsts/work/1/s/rcsb/mock-data/MOCK_CIF_FILES/pdb
2025-05-18 00:17:02,843 @6219 [INFO]-task_functions: distributing 10 files across 1 sublists
2025-05-18 00:17:11,759 @6219 [INFO]-task_functions: Done with CIF to BCIF conversion
2025-05-18 00:17:11,795 @6219 [INFO]-BcifExec: completed in 8.95 s
2025-05-18 00:17:12,104 @6227 [INFO]-BcifExec: bcif workflow initialized
2025-05-18 00:17:12,104 @6227 [INFO]-BcifWorkflow: running bcif workflow on /tmp/tmp004fv_h_/pdbx_comp_model_core_ids-1.txt and /home/vsts/work/1/s/rcsb/mock-data/MOCK_CIF_FILES/csm
2025-05-18 00:17:12,105 @6227 [INFO]-task_functions: reading list file /tmp/tmp004fv_h_/pdbx_comp_model_core_ids-1.txt and remote path /home/vsts/work/1/s/rcsb/mock-data/MOCK_CIF_FILES/csm
2025-05-18 00:17:12,105 @6227 [INFO]-task_functions: distributing 10 files across 1 sublists
2025-05-18 00:17:17,269 @6227 [INFO]-task_functions: Done with CIF to BCIF conversion
2025-05-18 00:17:17,306 @6227 [INFO]-BcifExec: completed in 5.20 s
2025-05-18 00:17:17,607 @6239 [INFO]-BcifExec: bcif workflow initialized
2025-05-18 00:17:17,608 @6239 [INFO]-BcifWorkflow: running bcif workflow on /tmp/tmp004fv_h_/pdbx_ihm_ids-1.txt and /home/vsts/work/1/s/rcsb/mock-data/MOCK_CIF_FILES/ihm
2025-05-18 00:17:17,608 @6239 [INFO]-task_functions: reading list file /tmp/tmp004fv_h_/pdbx_ihm_ids-1.txt and remote path /home/vsts/work/1/s/rcsb/mock-data/MOCK_CIF_FILES/ihm
2025-05-18 00:17:17,608 @6239 [INFO]-task_functions: distributing 10 files across 1 sublists
2025-05-18 00:17:19,178 @6239 [ERROR]-BcifExec: failed to create dictionary api: 429 Client Error: Too Many Requests for url: https://raw.githubusercontent.com/rcsb/py-rcsb_exdb_assets/master/dictionary_files/reference/mmcif_ihm_ext.dic
Traceback (most recent call last):
  File "/home/vsts/work/1/s/rcsb/workflow/bcif/task_functions.py", line 151, in getDictionaryApi
    containers += adapter.readFile(inputFilePath=path)
  File "/home/vsts/work/1/s/.tox/test_coverage-py310/lib/python3.10/site-packages/mmcif/io/IoAdapterPy.py", line 205, in readFile
    raise e
  File "/home/vsts/work/1/s/.tox/test_coverage-py310/lib/python3.10/site-packages/mmcif/io/IoAdapterPy.py", line 144, in readFile
    ifh.raise_for_status()
  File "/home/vsts/work/1/s/.tox/test_coverage-py310/lib/python3.10/site-packages/requests/models.py", line 1024, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 429 Client Error: Too Many Requests for url: https://raw.githubusercontent.com/rcsb/py-rcsb_exdb_assets/master/dictionary_files/reference/mmcif_ihm_ext.dic

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/vsts/work/1/s/rcsb/workflow/cli/BcifExec.py", line 169, in <module>
    main()
  File "/home/vsts/work/1/s/rcsb/workflow/cli/BcifExec.py", line 161, in main
    (BcifWorkflow(args))()
  File "/home/vsts/work/1/s/rcsb/workflow/wuw/BcifWorkflow.py", line 69, in __call__
    convertCifFilesToBcif(
  File "/home/vsts/work/1/s/rcsb/workflow/bcif/task_functions.py", line 91, in convertCifFilesToBcif
    dictionaryApi = getDictionaryApi(pdbxDict, maDict, rcsbDict, ihmDict)
  File "/home/vsts/work/1/s/rcsb/workflow/bcif/task_functions.py", line 154, in getDictionaryApi
    raise FileNotFoundError("failed to create dictionary api: %s" % str(e)) from e
FileNotFoundError: failed to create dictionary api: 429 Client Error: Too Many Requests for url: https://raw.githubusercontent.com/rcsb/py-rcsb_exdb_assets/master/dictionary_files/reference/mmcif_ihm_ext.dic
Traceback (most recent call last):
  File "/home/vsts/work/1/s/rcsb/workflow/bcif/task_functions.py", line 151, in getDictionaryApi
    containers += adapter.readFile(inputFilePath=path)
  File "/home/vsts/work/1/s/.tox/test_coverage-py310/lib/python3.10/site-packages/mmcif/io/IoAdapterPy.py", line 205, in readFile
    raise e
  File "/home/vsts/work/1/s/.tox/test_coverage-py310/lib/python3.10/site-packages/mmcif/io/IoAdapterPy.py", line 144, in readFile
    ifh.raise_for_status()
  File "/home/vsts/work/1/s/.tox/test_coverage-py310/lib/python3.10/site-packages/requests/models.py", line 1024, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 429 Client Error: Too Many Requests for url: https://raw.githubusercontent.com/rcsb/py-rcsb_exdb_assets/master/dictionary_files/reference/mmcif_ihm_ext.dic

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/opt/hostedtoolcache/Python/3.10.17/x64/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/opt/hostedtoolcache/Python/3.10.17/x64/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/home/vsts/work/1/s/rcsb/workflow/cli/BcifExec.py", line 172, in <module>
    raise e
  File "/home/vsts/work/1/s/rcsb/workflow/cli/BcifExec.py", line 169, in <module>
    main()
  File "/home/vsts/work/1/s/rcsb/workflow/cli/BcifExec.py", line 161, in main
    (BcifWorkflow(args))()
  File "/home/vsts/work/1/s/rcsb/workflow/wuw/BcifWorkflow.py", line 69, in __call__
    convertCifFilesToBcif(
  File "/home/vsts/work/1/s/rcsb/workflow/bcif/task_functions.py", line 91, in convertCifFilesToBcif
    dictionaryApi = getDictionaryApi(pdbxDict, maDict, rcsbDict, ihmDict)
  File "/home/vsts/work/1/s/rcsb/workflow/bcif/task_functions.py", line 154, in getDictionaryApi
    raise FileNotFoundError("failed to create dictionary api: %s" % str(e)) from e
FileNotFoundError: failed to create dictionary api: 429 Client Error: Too Many Requests for url: https://raw.githubusercontent.com/rcsb/py-rcsb_exdb_assets/master/dictionary_files/reference/mmcif_ihm_ext.dic
FAIL
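
One way to read each dictionary only once, as requested above, is to download it to a local cache file the first time it is needed and reuse that file afterwards. The following is a rough sketch under stated assumptions, not the change that was actually made in this PR. DictionaryCache, its fetcher argument, and the cache directory are all hypothetical names introduced for illustration.

```python
import os
from typing import Callable, Dict


class DictionaryCache:
    """Hypothetical helper: fetch each dictionary URL at most once."""

    def __init__(self, cache_dir: str, fetcher: Callable[[str], bytes]):
        self._cache_dir = cache_dir
        # fetcher is any callable that returns the file content for a URL,
        # for example a thin wrapper around an HTTP GET.
        self._fetcher = fetcher
        self._paths: Dict[str, str] = {}

    def local_path(self, url: str) -> str:
        """Return a local file path for url, downloading only if not yet cached."""
        if url in self._paths:
            return self._paths[url]
        os.makedirs(self._cache_dir, exist_ok=True)
        path = os.path.join(self._cache_dir, os.path.basename(url))
        if not os.path.exists(path):
            data = self._fetcher(url)
            with open(path, "wb") as handle:
                handle.write(data)
        self._paths[url] = path
        return path
```

With something like this in place, the local path could be passed to the reader (the traceback shows adapter.readFile(inputFilePath=path)) instead of the remote URL, so repeated workflow runs in the same test session would not hit raw.githubusercontent.com again. Whether the real fix took this shape is not shown in this conversation.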

@piehld piehld left a comment

Thanks @j-s-135, please see my last comment for change requests (#31 (comment)). Also, please remember to pull my commits before pushing your new changes.

@j-s-135 j-s-135 requested a review from piehld May 19, 2025 19:46

j-s-135 commented May 19, 2025

I think it's fixed. The other errors seem to relate to fixturePdbxLoader.

@piehld piehld left a comment

Thanks @j-s-135. Your change looks good, but I think you may have accidentally undone the updates I made to mock-data which fixed the fixturePdbxLoader issues.

Can you please run the following on your machine:

# navigate to your repo directory
cd py-rcsb_workflow

# update mock-data:
git submodule update --recursive --remote

# if it wasn't added yet, please add it:
git add rcsb/mock-data

# Then commit and push the updated mock-data directory
git commit
git push

@piehld piehld left a comment

Thanks for the updates and addressing the requested changes, @j-s-135. LGTM!

@j-s-135 j-s-135 merged commit 828dcf8 into master May 21, 2025
14 of 15 checks passed
@j-s-135 j-s-135 deleted the dictionary branch June 6, 2025 15:06

2 participants