Conversation
scripts/publish.py
Outdated
```python
for cid in coreids:
    match = CORE_PATTERN.match(cid)
    if match:
        numbers.append(int(match.group(1)))
```
The current regex pattern has no capture group, so `match.group(1)` will raise an error. We should update the regex to something like: `CORE_PATTERN = re.compile(r"^CORE-(\d{6})$")`
RamilCDISC
left a comment
All looks good to me now. The workflow should work as intended.
There are a few structural things this PR will need to change:
- when assigning the directory name for the published rule, the script puts the rule in root. The planned structure has Unpublished/ and Published/ at the top level, since we need to house unpublished rules separately from published ones as we port from the editor:
```
root/
├── Published/
│   ├── CORE-XXXX/
│   │   ├── rule.yaml
│   │   ├── negative/
│   │   └── positive/
│   └── CORE-XXXX/
├── Unpublished/
│   └── Standard_Name/
└── mappings/
```
- I changed my PR to use rule.yml for rule files, so the logic is good looking for that name
- I have created mappings that map Rule ID to CORE ID. They act as a ledger for the data.
SDTMIG_mapping.csv: when your publishing script has the rule.yml, it will need to find all applicable standards and their Rule IDs, then add/sort them into each applicable CSV (named via Standard + '_mappings'). It will also need to grab the version. There is a one-to-many relationship between CORE ID and Rule ID. I also had to add logic for FDA_Business_Rules, since that won't be listed as a standard: if a rule has Organization FDA and an FB Rule ID, the script adds it to that bucket as well. The status will always be Published; the CORE ID just needs to use the one found by the algorithm in the GitHub Action.
Generate_mappings.py: this is the script I used to generate the initial mappings, in case any of the code helps.
- the scan for CORE IDs will need to recurse through the Published/ directory in root and account for the standards nested inside it. It could be easier to do this in mappings/ and use the CSVs, which are going to be a source of truth, so maybe use them for the CORE ID algorithm? I think the most seamless way for this to work is to have authors work in an Unpublished/StandardX directory and have the publish script grab the standard directory they are working in, move the rule from the Unpublished dir to Published, and take the standard name off the directory path to know which standard directory in Published/ to put it into.
@SFJohnson24, you're referring to this structure, but in your PR you used a "flat" structure for the
@alexfurmenkov
scripts/publish.py
Outdated
```python
def main() -> None:
    parser = argparse.ArgumentParser()
    parser.add_argument("--new-dirs", required=True,
                        help="Space-separated rule folders under Rules/Unpublished/")
```
Rules/ doesn't exist anymore; Unpublished/ and Published/ sit in root/.
scripts/publish.py
Outdated
```python
CORE_PATTERN = re.compile(r"^CORE-(\d{6})$")

PUBLISHED_DIR = Path("Rules/Published")
```
see #2 for initial rules and test data structure PR
SFJohnson24
left a comment
A few changes:
- see #2 for the initial rule/test data structure now that it is complete. Unpublished/ does have standard subdirectories; Published/ does not (I updated the diagram above, apologies for any confusion). We likely do not need to parse the path: since published rules are no longer nested under a Standard_name/ directory, the standard organization in Unpublished/ is simply organizational. Further, the folder names are Rule IDs, not CORE IDs. We want the script to be agnostic to whatever CORE ID authors may have given their rules or assigned as the folder name; the algorithm should function outside of that. It is likely easiest to walk the folders in Published/ (now that it is flat and doesn't have Standard/ directories under it) looking for the next open CORE ID, then create and name the folder and move the rule.yaml and test data into it.
The Rules/ directory has been eliminated, and I see references to it throughout the script.
A/C:
- Remove all Rules/ prefixes: PUBLISHED_DIR = Path("Published"), mappings_dir = Path("mappings")
- main loop: stop trying to parse the incoming path to infer anything; just receive the rule dir and read rule.yaml for everything meaningful.
- can you add generate_mappings.py? It is not in the repo at this time and your script calls it.
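The A/C above might look roughly like this in the script; a sketch only, with the loop body elided and the constants taken from the acceptance criteria (rule.yml vs rule.yaml follows the earlier mappings comment):

```python
import argparse
from pathlib import Path

# Per the A/C: no Rules/ prefix anywhere.
PUBLISHED_DIR = Path("Published")
MAPPINGS_DIR = Path("mappings")

def main(argv=None) -> None:
    parser = argparse.ArgumentParser()
    parser.add_argument("--new-dirs", required=True,
                        help="Space-separated rule folders under Unpublished/")
    args = parser.parse_args(argv)
    for rule_dir in map(Path, args.new_dirs.split()):
        # No path parsing: everything meaningful comes from the rule file itself.
        rule_file = rule_dir / "rule.yml"  # name per the mappings comment above
        ...  # read the rule, assign the next CORE ID, move it, update the ledgers
```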
Workflow that supports ledger CSVs to publish and keep track of the rules.
Refreshes CSVs and rule folder names accordingly.