
ITM 1100: Automated Integration Testing Harness For Dummy ADM Runs #149

Open
NeilDaniel07 wants to merge 25 commits into development from ITM-1100

Conversation

Contributor

@NeilDaniel07 NeilDaniel07 commented Sep 10, 2025

Original Ticket: Here

Description:

This PR introduces a robust automated integration testing harness that captures dummy ADM run information across various server configurations. The idea was first raised in #147, when a large number of configs otherwise had to be tested manually after key changes to server structure. In general, when reviewers were testing new functionality or changes for a new data collection period, the server had to be manually started and stopped for each possible configuration. Additionally, the client itm_minimal_runner had to be launched manually with a terminal command in the itm-evaluation-client repository. Finally, the output of the ADM run was not captured in an outfile but only logged to the terminal, which can be lossy and difficult to compare across runs.

This harness significantly reduces reviewer effort and ensures more consistent checks across server configurations. First, the script validates the test matrix named GROUPS against the active config file (swagger_server/config.ini). Each entry in GROUPS defines which config options to run the dummy ADM on, whether to do so in testing mode, and what phase the exercised scenarios belong to. The validation protocol ensures that the required keys (cfgs, testing, phase) are present and that their values have the correct types. It checks for edge cases such as an empty cfgs list, duplicate entries within a cfgs list, and a phase outside {1, 2}, and it ensures that every cfg corresponds to a section in config.ini (including DEFAULT).
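For illustration, the validation amounts to something like the following sketch (simplified; the name validate_groups is illustrative and the actual code in automated_tester.py differs in its details):

import configparser

REQUIRED_KEYS = {"cfgs", "testing", "phase"}

def validate_groups(groups, config_path="swagger_server/config.ini"):
    # Returns a list of validation errors; empty means the matrix is valid.
    parser = configparser.ConfigParser()
    parser.read(config_path)
    # configparser.sections() excludes DEFAULT, so add it explicitly.
    valid_sections = set(parser.sections()) | {"DEFAULT"}
    errors = []
    for name, entry in groups.items():
        missing = REQUIRED_KEYS - entry.keys()
        extra = entry.keys() - REQUIRED_KEYS
        if missing:
            errors.append(f"group {name}: missing keys {sorted(missing)}")
        if extra:
            errors.append(f"group {name}: unknown keys {sorted(extra)}")
        cfgs = entry.get("cfgs")
        if not isinstance(cfgs, list) or not cfgs:
            errors.append(f"group {name}: cfgs must be a non-empty list")
        else:
            if len(cfgs) != len(set(cfgs)):
                errors.append(f"group {name}: duplicate entries in cfgs")
            for cfg in cfgs:
                if cfg not in valid_sections:
                    errors.append(f"group {name}: unknown config section {cfg!r}")
        if not isinstance(entry.get("testing"), bool):
            errors.append(f"group {name}: testing must be a boolean")
        if entry.get("phase") not in (1, 2):
            errors.append(f"group {name}: phase must be 1 or 2")
    return errors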

The testing harness then runs the TA3 server for each cfg in a chosen group (testing or normal mode), waits for the server to be ready by pinging /ui/ in a loop, and invokes the client dummy ADM runner once the server has spun up. Port selection works as follows: the server defaults to port 8080, and users who do not want to select a port manually can pass --auto-port to pick a free local port (and re-pick if a late conflict arises before an ADM run). As an extra safeguard, port availability is re-validated before each cfg is run.
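The port-picking and readiness-polling logic boils down to something like this sketch (standard library only; pick_free_port and wait_for_server are illustrative names):

import socket
import time
import urllib.request

def pick_free_port():
    # Ask the OS for an ephemeral port, then release it for the server to use.
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.bind(("127.0.0.1", 0))
        return s.getsockname()[1]

def wait_for_server(port, timeout=60.0):
    # Poll /ui/ until it returns HTTP 200 and mentions "Swagger UI"
    # (see the readiness disclaimer below).
    url = f"http://127.0.0.1:{port}/ui/"
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=2) as resp:
                if resp.status == 200 and b"Swagger UI" in resp.read():
                    return True
        except OSError:
            pass  # server not up yet; retry until the deadline
        time.sleep(1)
    return False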

For identifying the correct file paths of the client repository root, client Python executable, and dummy ADM runner, command line arguments may be used. Absolute and repo-relative paths are both accepted, with relative paths resolved from the server repo root. If the CLI arguments are omitted or invalid, the script falls back to a local automated_testing_config.json, which is intended to be reviewer-specific and ignored by Git. A tracked automated_testing_config.template.json is provided as the starting point. In addition, default GROUPS remain in automated_tester.py, but they may be overridden locally via a groups object in automated_testing_config.json.
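For reference, a filled-in automated_testing_config.json might look like the sketch below. The path key names here are assumptions inferred from the CLI flags (the template file is the authoritative source); the client_python value mirrors the one reported working later in this thread, and the optional groups object overrides the checked-in defaults:

{
  "client_root": "../itm-evaluation-client",
  "client_python": "../itm-evaluation-client/venv/Scripts/python.exe",
  "runner_path": "../itm-evaluation-client/itm_minimal_runner.py",
  "groups": {
    "1": {
      "cfgs": ["DEFAULT"],
      "testing": true,
      "phase": 2
    }
  }
}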

Once the file paths have been verified, the script starts the server for each cfg, runs the dummy ADM through its scenarios, and records the combined stdout/stderr into branch-scoped output files under automated_test_results/<branch>/, named <cfg>_GROUP_<group>.txt. This keeps results isolated by branch while still making them easy to compare between runs, helping ensure that the dummy ADM exercised scenarios in the desired way consistently across several environments.
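Roughly, each per-cfg run-and-capture step looks like this sketch (run_cfg is an illustrative name and the signature is simplified from the actual code):

import subprocess
from pathlib import Path

def run_cfg(cfg, group, branch, client_python, runner_path, extra_args=()):
    # Results land in automated_test_results/<branch>/<cfg>_GROUP_<group>.txt.
    out_dir = Path("automated_test_results") / branch
    out_dir.mkdir(parents=True, exist_ok=True)
    outfile = out_dir / f"{cfg}_GROUP_{group}.txt"
    with open(outfile, "w") as f:
        # extra_args carries runner options such as --domain triage for
        # phase 1 groups; stderr is merged into stdout so the outfile
        # captures the full client log.
        subprocess.run(
            [str(client_python), str(runner_path), *extra_args],
            stdout=f, stderr=subprocess.STDOUT, check=False,
        )
    return outfile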

Usage:

GROUP Validation Only:

python automated_tester.py --validate-only

Full Execution Pipeline:

python automated_tester.py --group <group_number> --branch <name>  [--port <port>] [--auto-port] [--client-root PATH] [--client-python PATH] [--runner-path PATH]

CLI Flags:

--group {…}: Which test group to run (required unless --validate-only)
--branch NAME: Branch label used in output directory naming (required unless --validate-only)
--port N: Port the server should use (default 8080; must be 1-65535 and available)
--auto-port: Ask the OS for a free port; overrides --port
--client-root PATH: Path to the evaluation client repo (absolute and relative paths both supported)
--client-python PATH: Path to the client venv Python (absolute and relative paths both supported)
--runner-path PATH: Path to the runner script; defaults to <client_root>/itm_minimal_runner.py
--validate-only: Validate GROUPS against swagger_server/config.ini and exit (no path/port checks or execution)

Disclaimers:

  • There is a race condition in the port selection logic, due to the non-zero time window between checking a port's availability and binding to it. This is handled in the simplest way possible: port availability is re-checked before the dummy ADM is exercised on each configuration in cfgs. With --auto-port, a new port is picked transparently to mitigate the race condition.
  • Currently, the server readiness check assumes /ui/ returns HTTP 200 and contains the words “Swagger UI.” This is how the program determines that the server is ready for the dummy ADM to run. If this endpoint changes or its contents are updated, the readiness logic must be modified to match.

How To Test:

  1. Preparing For Testing

    • Ensure that you have an active config file at swagger_server/config.ini. Make sure that the SCENARIO_DIRECTORY variable points to the location of the relevant scenario YAML files.
    • Note the location of the client repo root, its venv Python executable, and the runner script. Then copy automated_testing_config.template.json to a local automated_testing_config.json and update it to reflect these locations. If needed, groups may also be overridden there for local testing.
  2. Validate Only

    • Run the terminal command python automated_tester.py --validate-only. This checks that the GROUPS object is valid against your active config.ini file. It should log Validation Passed to the terminal and then exit without any errors.
    • For good measure, run python automated_tester.py --validate-only --group 1 --branch test. Ensure that the extra, irrelevant arguments have no adverse effect on script functionality.
  3. Path Precedence and Validity

    • Run the automated tester using the command python automated_tester.py --group 1 --branch ITM-1100. Ensure that the ADM runs were successful and output files were generated under automated_test_results/ITM-1100/. This tests the local config-based path option.
    • If you defined your file paths to be relative, change them to absolute. If they were originally absolute, make them relative. Then rerun the same terminal command. This makes sure that both types of file paths are detected and supported.
    • Keep the local groups override in place if your active swagger_server/config.ini differs from the default checked-in groups. Then temporarily remove the path entries from automated_testing_config.json and pass all paths via command line arguments using the command python automated_tester.py --group 1 --branch ITM-1100 [--client-root PATH] [--client-python PATH] [--runner-path PATH]. Confirm that the dummy ADM runs exercise the relevant scenario files without issues.
    • This time, use the CLI arguments again, but pass an invalid --client-python. A warning should be logged, yet the script should recover by falling back to the local config and proceeding.
    • Break the paths in automated_testing_config.json. Run python automated_tester.py --group 1 --branch ITM-1100. Expect a list of aggregated errors logged to the terminal.
  4. Port Handling Logic

    • Test out-of-range ports such as 0 or 70000 with the command python automated_tester.py --group 1 --branch ITM-1100 --port 70000. Expect an error message logged to the terminal, followed by an exit.
    • Occupy port 8080 using the command python -m http.server 8080. Then run python automated_tester.py --group 1 --branch ITM-1100 --port 8080. You should see an error message logged to the terminal and graceful failure.
    • Leaving port 8080 busy, now test auto port using python automated_tester.py --group 1 --branch ITM-1100 --auto-port. Another available port should be picked and the pipeline should proceed without errors.
  5. GROUPS Schema Changes
    After each of the following changes to either the default GROUPS in automated_tester.py or a local groups override in automated_testing_config.json, run python automated_tester.py --validate-only and ensure that an error is logged to the terminal. This makes sure an invalid schema is caught and the pipeline is killed before the dummy ADM is run on any scenarios.

    • Add an unknown key to one of the entries in GROUPS.
    • Delete a testing or phase parameter from one of the entries.
    • Purposefully use the wrong type for one of the parameters (e.g. "testing": "true").
    • Set cfgs to an empty list [].
    • Include duplicate configuration options in the same cfgs list.
    • Use a non-existent section name in a cfgs array.
  6. Functional Runs

    • Run the dummy ADM through all configuration options in Phase 2 using python automated_tester.py --group 1 --branch ITM-1100. Manually inspect the files under automated_test_results/ITM-1100/ to ensure that all scenario IDs/alignment targets were properly exercised for each configuration option and that no runtime errors occurred.
    • Regenerate your phase 1 models with the terminal command gradlew -Pdomain=triage. Run this command in both your server repo directory and client root directory. Next, run python automated_tester.py --group 3 --branch ITM-1100. Inspect the output file(s) under automated_test_results/ITM-1100/ to ensure scenarios were exercised correctly. This confirms that the runner invocation includes --domain triage when required.

@NeilDaniel07 NeilDaniel07 mentioned this pull request Sep 12, 2025
Contributor

@nextcen-dgemoets nextcen-dgemoets left a comment


Some initial comments:

  • First of all, I'm sorry it took so long for me to look at this!
  • Overall, this is pretty slick! It might even make it easier to set up testing for code reviewers-- I can just configure some groups and tell them to use the automated tester. Maybe that makes it too easy on reviewers? Hmm...
  • Given that automated_testing_paths.json is user-specific, maybe it shouldn't be in GitHub and should be added to .gitignore. Thoughts?
  • I wonder if maybe the GROUPS configuration should be defined in automated_testing_paths.json (which would maybe then be renamed to automated_testing_config.json). GROUPS is something you'll want to change locally and not check in changes, which makes it seem like configuration.
    • On the other hand, it could be that GROUPS stays in automated_tester.py, but can be overridden in automated_testing_config.json. When updating config.ini.template, GROUPS (and the comment that refers to it) would be updated to include current, viable configurations, but users can override that in their config file.
    • Thoughts on any of this?
  • In automated_testing_paths.json, you say the paths may be relative to the server repo root, but then runner_path looks absolute, but relative to the client root. The client_python path you checked in also looks odd. I set mine to "../itm-evaluation-client/venv/Scripts/python.exe", and that worked for me with a default setup.
  • It might keep things cleaner to use the name of the branch under test (via --branch) as a directory name, instead of a file prefix. WDYT?
  • I'm not sure we needed to support phase 1. I didn't test it (yet). I'm not asking you to remove it, though, because it might be useful when there's a phase 3...

I might have more later.

@nextcen-dgemoets
Contributor

Oh, one more thing-- I merged in the latest from development. You should update GROUPS to match the latest configuration.

@NeilDaniel07 NeilDaniel07 requested review from Garycheng92, ktabasco-prog and patmole99 and removed request for kaitlyn-sharo January 23, 2026 19:21
@NeilDaniel07 NeilDaniel07 marked this pull request as draft January 23, 2026 19:21
@nextcen-dgemoets
Contributor

Please remove the .DS_Store files and add it to .gitignore.

@NeilDaniel07
Contributor Author

Please remove the .DS_Store files and add it to .gitignore.

Sorry for the oversight; the existing .DS_Store files have been deleted and will no longer be tracked.

@NeilDaniel07
Contributor Author

  • Given that automated_testing_paths.json is user-specific, maybe it shouldn't be in GitHub and should be added to .gitignore. Thoughts?

I removed the checked-in user-specific config file and replaced that approach with a sample automated_testing_config.template.json file which users can use to create a local automated_testing_config.json that is ignored by Git.

  • I wonder if maybe the GROUPS configuration should be defined in automated_testing_paths.json (which would maybe then be renamed to automated_testing_config.json). GROUPS is something you'll want to change locally and not check in changes, which makes it seem like configuration.

    • On the other hand, it could be that GROUPS stays in automated_tester.py, but can be overridden in automated_testing_config.json. When updating config.ini.template, GROUPS (and the comment that refers to it) would be updated to include current, viable configurations, but users can override that in their config file.
    • Thoughts on any of this?

I took the latter approach here: the tester keeps checked-in default groups in automated_tester.py, but a local groups object in automated_testing_config.json can override or extend them. That lets the repo continue to showcase current, viable defaults while still supporting user-local customization.
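Conceptually the override works like this sketch (load_groups is an illustrative name; the defaults shown are abridged):

import json
from pathlib import Path

# Abridged checked-in defaults; see GROUPS in automated_tester.py for the real matrix.
DEFAULT_GROUPS = {"1": {"cfgs": ["DEFAULT"], "testing": True, "phase": 2}}

def load_groups(config_path="automated_testing_config.json"):
    groups = dict(DEFAULT_GROUPS)
    path = Path(config_path)
    if path.exists():
        local = json.loads(path.read_text())
        # A local "groups" object overrides or extends the defaults per key.
        groups.update(local.get("groups", {}))
    return groups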

  • In automated_testing_paths.json, you say the paths may be relative to the server repo root, but then runner_path looks absolute, but relative to the client root. The client_python path you checked in also looks odd. I set mine to "../itm-evaluation-client/venv/Scripts/python.exe", and that worked for me with a default setup.

I changed the path semantics so all explicit configured paths are either absolute or relative to the server repo root. runner_path is now treated the same way as the others, and if it isn't present it defaults to <client_root>/itm_minimal_runner.py.
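In sketch form (illustrative names, simplified from the actual code):

from pathlib import Path

SERVER_REPO_ROOT = Path(__file__).resolve().parent

def resolve_path(value):
    # Absolute paths are used as-is; relative ones resolve from the server repo root.
    p = Path(value)
    return p if p.is_absolute() else (SERVER_REPO_ROOT / p).resolve()

def default_runner_path(client_root):
    # When runner_path is not configured, fall back to the client repo's runner script.
    return resolve_path(client_root) / "itm_minimal_runner.py"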

  • It might keep things cleaner to use the name of the branch under test (via --branch) as a directory name, instead of a file prefix. WDYT?

I changed output layout to use the branch name as a directory instead of a filename prefix. Results are now generated under automated_test_results/<branch>/, with outfiles named <cfg>_GROUP_<group>.txt.

NEW FEATURES:
I also made a couple of additions while I was working that were not part of the original branch behavior:

  • automated_tester.py no longer executes on import.
  • The tester now performs additional per-config prechecks before launch, so bad local scenario paths, missing local scenario files, and unreachable TA1 endpoints fail immediately instead of surfacing as errors later.
  • The tester also now treats runs that complete without ever exercising any scenarios as failures, which helps the user catch cases where a config runs but does not actually exercise the dummy ADM on any scenario probes.
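The last check conceptually reduces to scanning the captured outfile for evidence that at least one scenario ran; as a sketch, with the marker string as a stand-in (the real tester matches the client's actual log output):

from pathlib import Path

def ran_any_scenarios(outfile):
    # Placeholder marker: the real check would match whatever line the
    # client logs when it begins exercising a scenario.
    text = Path(outfile).read_text(errors="replace")
    return "Scenario" in text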

@NeilDaniel07 NeilDaniel07 marked this pull request as ready for review April 16, 2026 00:27
Contributor

@nextcen-dgemoets nextcen-dgemoets left a comment


I could probably approve it as it is now. Great work.

What would be involved (LOE) in saving the server output (stdout+stderr), in addition to the client output, in a separate file in automated_test_results/<branch>?

If we did that, then maybe the filename format would be client_JUNE_OPENWORLD_4.txt and server_JUNE_OPENWORLD_4.txt.

Finally, I guess that automated_tester.py should be updated in any PR that changes the relevant server configurations. To that end, could you update it for the current state of development so that there are two groups:

  • "FEB_OPENWORLD", "JUNE_OPENWORLD", "APRIL_OPENWORLD", testing True; and
  • "FEB_OPENWORLD", "JUNE_OPENWORLD", "APRIL_OPENWORLD", testing False

We are in a strange case where the DEFAULT configuration isn't currently supported.

Comment thread automated_tester.py
Comment on lines +59 to +72
'1': {
'cfgs': ["DEFAULT", "FEB_OPENWORLD", "JUNE_OPENWORLD"],
'testing': True,
'phase': 2
},
'2': {
'cfgs': ["DEFAULT"],
'testing': False,
'phase': 2
},
'3': {
'cfgs': ["DEFAULT"],
'testing': True,
'phase': 1

Could you use double quotes here instead of single quotes so that it can be pasted into the "groups" section of automated_testing_config.json? We'll still have to change True to true, but it gets us most of the way there.
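For reference, the block above in double-quoted, JSON-ready form (with True lowercased to true) would read:

"1": {
    "cfgs": ["DEFAULT", "FEB_OPENWORLD", "JUNE_OPENWORLD"],
    "testing": true,
    "phase": 2
},
"2": {
    "cfgs": ["DEFAULT"],
    "testing": false,
    "phase": 2
},
"3": {
    "cfgs": ["DEFAULT"],
    "testing": true,
    "phase": 1
}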
