[Tools] Improving integration test system #314
Replies: 8 comments 19 replies
-
I need to metabolize this (unfortunately, my mind is somewhere else right now). However, I have two small comments about the runner. These comments are outside the scope of the configuration file, but are somewhat related to the development of the CI/CD.
-
I haven't actually read @shundroid's new speculation integration test approach yet, but I think this is probably a very nice example for you guys to road-test this proposal?
-
I would suggest a few enhancements to how things are captured in the configuration file. In the following, I assume Dynamatic is invoked like dynamatic %option% (as in the last example below). I would also like to clearly separate, in this discussion, capability from policy: the existence of a feature (a capability) does not imply that we will use it for standard testing (a policy); yet, the absence of one of the fundamental and typical features of such a "language" may mean that tomorrow it will be abandoned when someone needs it or when a new policy is required. This post is only about capability, not policy.
"gcd": {
"option": "--buffer-algorithm fpga20",
},
"binary_search": {
"option": "--timeout 1000",
},
"*": {
"option": "--buffer-algorithm fpl22",
"option": "--timeout 200",
},
"memory/*": {
"option": "--timeout 30",
},
I guess in JSON there is an issue with two keys named "option" in the same object, as in:
"option": "--buffer-algorithm fpga20",
"option": "--timeout 200",
"__DEFAULT": {
"option": "--buffer-algorithm fpl22",
"option": "--timeout 200",
},
"__MEMORY_DEFAULT": {
"option": "--timeout 30",
},
"gcd": {
"option": "--buffer-algorithm fpga20",
},
"binary_search": {
"option": "--timeout 1000",
},
"*": {
"macro": "__DEFAULT",
},
"memory/*": {
"macro": "__DEFAULT",
"macro": "__MEMORY_DEFAULT",
},
Note that entries whose name starts with "__" are not tests themselves but macros meant to be referenced by other entries.
Whether this is a good policy or not is open for discussion. I think it may be handy that a missing include (a reference to a macro that does not exist) is reported as a warning but not considered an error.
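To make the capability concrete, here is a minimal sketch (Python, hypothetical helper names, not part of the current runner) of how such macro expansion could work; it assumes the duplicate-key problem discussed further down is avoided by storing options and macro references as lists:
def expand(entry, config, seen=()):
    # Recursively replace "macros" references with the options they carry,
    # then append the entry's own options.
    options = []
    for name in entry.get("macros", []):
        if name in seen:
            raise ValueError(f"cyclic macro reference: {name}")
        options += expand(config[name], config, seen + (name,))
    return options + entry.get("options", [])

config = {
    "__DEFAULT": {"options": ["--buffer-algorithm fpl22", "--timeout 200"]},
    "__MEMORY_DEFAULT": {"options": ["--timeout 30"]},
    "memory/*": {"macros": ["__DEFAULT", "__MEMORY_DEFAULT"]},
}
print(expand(config["memory/*"], config))
# ['--buffer-algorithm fpl22', '--timeout 200', '--timeout 30']
Whether the later --timeout 30 overrides the earlier --timeout 200, or is flagged as a conflict, is again a policy question rather than a capability one. One could also attach labels to entries, to group tests flexibly: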
"gcd": {
...
"label": "pull",
"label": "performance",
"label": "nightly",
"label": "lsq",
},
"binary_search": {
...
"label": "performance",
"label": "nightly",
},
"memory/*": {
...
"label": "pull",
"label": "nightly",
"label": "lsq",
},
Then one could run only the tests carrying a given label (for instance, all "pull" tests on every pull request).
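As a small illustration (Python, hypothetical names, assuming labels are stored as a list per entry), selecting tests by label is then trivial for the runner:
def tests_with_label(config, label):
    # Return the names of all entries carrying the given label.
    return [name for name, entry in config.items()
            if label in entry.get("labels", [])]

config = {
    "gcd": {"labels": ["pull", "performance", "nightly", "lsq"]},
    "binary_search": {"labels": ["performance", "nightly"]},
    "memory/*": {"labels": ["pull", "nightly", "lsq"]},
}
print(tests_with_label(config, "pull"))  # ['gcd', 'memory/*']
Macros and labels could then be combined, for example: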
"__BUFFER_LANA": {
"option": "--buffer-algorithm fpga20",
},
"__BUFFER_CARMINE": {
"option": "--buffer-algorithm fpl22",
},
"__CONVERSION_LANA": {
"option": "--conversion cfg",
},
"__CONVERSION_AYA": {
"option": "--conversion ftd",
},
"__DEFAULT_PULL": {
"macro": "__BUFFER_LANA",
"macro": "__CONVERSION_LANA",
"label": "pull",
},
"gcd": {
"macro": "__DEFAULT_PULL",
"label": "gcd_basic",
},
"gcd": {
"macro": "__BUFFER_CARMINE",
"macro": "__CONVERSION_AYA",
"label": "gcd_advanced",
"label": "test_ftd",
},
Since these entries have no distinct name and the source files are the same, one needs to rely on the existence of labels to identify them. Is that wise? I guess it is debatable. One could also allow an entry to remove labels or options it would otherwise inherit:
"gcd": {
"macro": "__DEFAULT_PULL",
"macro": "__CONVERSION_AYA",
"label": "gcd_paolo_stuff",
"nolabel": "pull",
"nooption": "--timeout",
},
The semantics should be defined properly, but at worst it could be "if there is a label pull or an option --timeout, remove it".
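A sketch of that removal semantics (Python, hypothetical helper, assuming the accumulated labels and options are lists) could be:
def apply_removals(entry, labels, options):
    # Drop labels listed under "nolabel" and options whose flag matches "nooption".
    labels = [l for l in labels if l not in entry.get("nolabel", [])]
    options = [o for o in options
               if not any(o.startswith(flag) for flag in entry.get("nooption", []))]
    return labels, options

labels, options = apply_removals(
    {"nolabel": ["pull"], "nooption": ["--timeout"]},
    labels=["pull", "gcd_paolo_stuff"],
    options=["--buffer-algorithm fpga20", "--timeout 200"],
)
print(labels, options)
# ['gcd_paolo_stuff'] ['--buffer-algorithm fpga20']
Finally, one could allow a test to override the command used to run it: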
"*": {
"command": "dynamatic %option%",
},
"__SPECULATION": {
"command": "my_script %option%",
"label": "speculation",
},
"single_loop": {
"macro": "__SPECULATION",
"label": "shun_private",
}
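For illustration, the runner-side capability could be as simple as substituting the accumulated options into the per-test command template (Python sketch, hypothetical names):
import shlex
import subprocess

def run_test(entry, options):
    # Expand %option% in the command template and run the resulting command.
    command = entry.get("command", "dynamatic %option%")
    command = command.replace("%option%", " ".join(options))
    return subprocess.run(shlex.split(command)).returncode

# e.g. for the speculation entry above:
# run_test({"command": "my_script %option%"}, ["--timeout 200"])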
-
Hi all! I have a few points; I am splitting them out to make replies easier.
It is my understanding that, currently, this is not the case. The closest thing currently possible to the above is to write all the commands that the Dynamatic frontend should execute into a .dyn script, which the runner then invokes:
# Write .dyn script with appropriate source file name
dyn_file = DYNAMATIC_ROOT / "build" / f"test_{id}.dyn"
write_string_to_file(SCRIPT_CONTENT.format(src_path=c_file), dyn_file)
# ... snip ...
with open(Path(out_dir) / "dynamatic_out.txt", "w") as stdout, \
     open(Path(out_dir) / "dynamatic_err.txt", "w") as stderr:
    # Run test and output result
    exit_code = run_command_with_timeout(
        DYNAMATIC_COMMAND.format(script_path=dyn_file),
        timeout=timeout,
        stdout=stdout,
        stderr=stderr
    )
I agree that this is not optimal.
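This is not the author's code, but one way the snippet above could be loosened without a rewrite is to take the script and command templates from the per-test configuration instead of module-level constants; a sketch reusing the helpers shown above:
from pathlib import Path

def run_one_test(c_file, out_dir, script_template=SCRIPT_CONTENT,
                 command_template=DYNAMATIC_COMMAND, timeout=600):
    # Same flow as above, but each test may supply its own .dyn script
    # template and frontend command. c_file is a Path to the test's C source.
    dyn_file = DYNAMATIC_ROOT / "build" / f"test_{c_file.stem}.dyn"
    write_string_to_file(script_template.format(src_path=c_file), dyn_file)
    with open(Path(out_dir) / "dynamatic_out.txt", "w") as stdout, \
         open(Path(out_dir) / "dynamatic_err.txt", "w") as stderr:
        return run_command_with_timeout(
            command_template.format(script_path=dyn_file),
            timeout=timeout, stdout=stdout, stderr=stderr)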
-
To be fully JSON "compliant", the top level needs to be either an object or an array. Still, many JSON parsers would allow a bare list of key/value pairs and simply add the enclosing braces implicitly. I personally would not rely on it.
Yes, this is also unfortunately not allowed in JSON. It would have to be an array:
"*": {
"options": [
"--buffer-algorithm fpl22",
"--timeout 200"
]
},
For something that primarily needs to be hand-written, I personally prefer YAML, but deciding on a serialization/config format tends to be as divisive as picking between a certain two text editors, so I am sure that others will have other preferences.
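Coming back to the duplicate-key point above: Python's standard json module, like most parsers, silently keeps only the last occurrence, which is a quick way to see why relying on duplicate keys would be fragile:
import json

text = '{"option": "--buffer-algorithm fpl22", "option": "--timeout 200"}'
print(json.loads(text))  # {'option': '--timeout 200'} -- the first option is lost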
-
I am slightly worried that forcing everything into a declarative format (like a JSON file) is not very future-proof: what if we want to add a style of integration test in the future that does not fit into the data model we define for this setup? We would have to re-write the integration setup from scratch. I also worry that it would effectively amount to writing a custom JSON DSL, which makes the setup very hard to learn for new contributors. While I appreciate the simplicity of adding a new integration test by writing a few lines of JSON, I would prefer a setup that makes adding basic integration tests simple while not preventing more complicated setups.
I get the feeling that we are re-inventing the wheel here. There are existing test frameworks out there that have seen a lot of development. Would it be possible to just use one of them? We could simply define each integration test as a small function that calls the relevant tools with the relevant inputs in one such framework. To make this simple, we could define the parameters as classes/types and provide utility functions that abstract away the repeated code. Typical integration test frameworks already provide the benefits discussed below: test selection and filtering, parametric sweeps, and the ability to skip or disable individual tests.
As a rough example, here is what I imagine this could look like. Note that I used GoogleTest because I am familiar with it from XLS, but it is not the only choice. Still, I really would push for a C++ testing framework. Everything that follows is rough pseudo-code to give you an idea; I did not test it.
TEST(BasicIntegrationTests, GCD) {
DynamaticFlags flags = DynamaticFlags(); // Use default flags
RunIntegrationTest("gcd.c", flags);
}
Here is how some of the points mentioned above could map to this framework:
Tests can pull in the default flags and override them at will, or even define their own flags from scratch:
TEST(BasicIntegrationTests, gcd) {
DynamaticFlags flags = DynamaticFlags(); // Use default flags
flags.buffer_algo = "fpga20";
RunIntegrationTest("gcd.c", flags);
}
TEST(BasicIntegrationTests, binary_search) {
DynamaticFlags flags = DynamaticFlags(); // Use default flags
flags.timeout = 1000;
RunIntegrationTest("binary_search.c", flags);
}
TEST(BasicIntegrationTests, schilk_special) {
DynamaticFlags flags = {
.buffer_algo = "schilks_secret_buffer_algo",
.timeout = -1,
...
};
RunIntegrationTest("top_secret.c", flags);
}
The simple approach is a loop:
TEST(BasicIntegrationTests, gcd_sweep) {
for (auto buffer_algo: { "fpga20", "fpl22" }) {
DynamaticFlags flags = DynamaticFlags(); // Use default flags
flags.buffer_algo = buffer_algo;
RunIntegrationTest("gcd.c", flags);
}
}
However, this would run sequentially. Google Test provides a framework for parametric tests that allows a set of parameters to be applied to multiple test suites, and the individual test/parameter pairs can run in parallel. The following sweeps both gcd and binary_search over both buffer algorithms:
class SweepBufferAlgoFixture : public testing::TestWithParam<std::string> {
};
TEST_P(SweepBufferAlgoFixture, gcd) {
std::string buffer_algo = GetParam();
DynamaticFlags flags = DynamaticFlags();
flags.buffer_algo = buffer_algo;
RunIntegrationTest("gcd.c", flags);
}
TEST_P(SweepBufferAlgoFixture, binary_search) {
std::string buffer_algo = GetParam();
DynamaticFlags flags = DynamaticFlags();
flags.buffer_algo = buffer_algo;
RunIntegrationTest("binary_search.c", flags);
}
INSTANTIATE_TEST_SUITE_P(
BufferAlgoSweep,
SweepBufferAlgoFixture,
testing::Values("fpga20", "fpl22")
);
GTest organizes tests into test suites that can be run separately. By default all tests are run, but if we have a known-broken test, it can be excluded (for example by prefixing its name with DISABLED_). It is also possible to skip tests dynamically (with GTEST_SKIP()): for example, a test relying on Gurobi could be skipped if Gurobi is not available.
This maps nicely to C++ inheritance. Imagine the following structure, which roughly mimics the tests you have described:
└── integration_tests
├── binary_test.c
├── gcd.c
├── memory
│ ├── some_mem_related_test.c
│ └── TEST_SUITE.cc
└── TEST_SUITE.cc
(The TEST_SUITE.cc files are where the test definitions live.) In integration_tests/TEST_SUITE.cc:
TEST(BasicIntegrationTests, gcd) {
DynamaticFlags flags = DynamaticFlags(); // Use default flags
flags.buffer_algo = "fpga20";
RunIntegrationTest("gcd.c", flags);
}
In integration_tests/memory/TEST_SUITE.cc:
class MemoryDynamaticFlags : public DynamaticFlags {
public:
MemoryDynamaticFlags() {
// All memory tests use timeout 300
timeout = 300;
}
virtual ~MemoryDynamaticFlags() = default;
};
TEST(MemIntegrationTests, some_mem_related_test) {
DynamaticFlags flags = MemoryDynamaticFlags(); // use memory-specific flags
RunIntegrationTest("some_mem_related_test.c", flags);
}
Google Test provides a test filter flag (--gtest_filter) that selects tests by suite and test name, for example:
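On the CI/runner side, selecting a group is then just an extra argument when invoking the test binary; a minimal Python sketch (the binary name and the group-encoding suite names are hypothetical):
import subprocess

# Run only the tests whose suite name starts with "PullIntegrationTests".
subprocess.run(
    ["./integration_tests", "--gtest_filter=PullIntegrationTests.*"],
    check=True,
)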
See loops/parametrics above. Regarding benchmarking: I don't know the specifics well enough to provide a quick example, but I hope you could imagine how defining such test flows can easily be done in gtest. Let me know what you think :)
-
@schilkp Thank you for your great idea! I have a quick question about tagging/grouping tests in GoogleTest. I know that --gtest_filter can select tests by suite/test name. For example, is there a way for each test to have multiple labels if we use --gtest_filter or other possible methods? Just for reference for other people: people here seem to appreciate flexible tagging/grouping.
-
Thank you for your comments! I am glad to see that a great discussion came out of this proposal. First of all, I would say that the most important point given here is to stop reinventing the wheel and use an existing framework. This is my oversight, since I have not had experience with such tools in the past. Hence, I cannot really comment on the concrete details of moving to such a framework (although some were mentioned here), but I understand at a high level that it is obviously a better solution. My main concern with the previous approach was the potential feature creep making the tool a nightmare to use. Also, JSON configurations are a disaster waiting to happen (apparently it wouldn't be the first time, as @murphe67 mentioned). Another thing that worried me is the fact that @shundroid's testing requires a completely different compilation flow, i.e. overriding the default sequence of commands that the frontend runs.
However, I think that this other point given by @schilkp is much more critical:
I will put it this way: currently I have a Python script (run_integration.py).
Finally, I would like to say something about "groups of tests", or whatever you want to call them. My problem is related to the following point by @schilkp:
So, if I am developing a feature and I make a set of tests for that purpose ("my tests"), it does indeed make sense that I would only want to run these tests (to save time while testing locally). Then, when I am satisfied, I would also need to make sure that all other (let's call them "regular") tests still pass. I believe that we all understand that this is the purpose of testing. And now, when everything works and I merge my feature to main, I would say that it makes sense that "my tests" are added to the set of "regular" tests, since my feature is now a part of main. This is because these tests will need to be "regular" for everyone else, i.e. they will need to be run by everyone else developing other features, to make sure that their feature does not break my feature. For this reason, I do not understand the need to have many different groups of tests. I understand that this is a matter of policy and not capability, as @paolo-ienne said, but I find it hard to justify going out of our way to add capabilities that enable policies that defeat the purpose of (integration) testing.
-
Hello everyone!
We recently introduced an Actions CI workflow and, along with it, a Python script for running integration tests in a nicer way than before. However, this is not enough, because the new integration script only runs integration tests using "vanilla" Dynamatic, i.e. using only the default buffering algorithm when compiling.
This begs the question of how to make the script more flexible without making it complicated and tedious to use. For example, if I am developing some feature and (for whatever reason) I need to run a set of 10 tests with --buffer-algorithm fpga20, I now need to run both the "official" set of tests (to make sure that I did not break anything) and my "custom" set of tests. So, not only does there need to be a way to include/exclude tests (that is better than the one we have right now with --list and --ignore), but it also needs to allow having various options for each test. However, this needs to be done in a way that would not ruin the usability of the integration system by making it horribly convoluted and requiring lots of manual intervention.
Below I will give my suggestion, so please comment on it and propose your own if you have any. My absolute priority is usability, so I would be really happy if you suggested something that better minimizes the amount of manual labor required for using this system. So, after reading this, I would like you to answer the question: would you use this in your day-to-day work, or would you consider it tedious and write your own specific shell script instead?
Proposal
In order to allow running tests with custom settings, there would be a way to define them in a configuration file. For example, having a file integration-test/config.json with the following content:
{
  "gcd": { "buffer-algorithm": "fpga20" },
  "binary_search": { "timeout": 1000 },
  "*": { "buffer-algorithm": "fpl22", "timeout": 200 },
  "memory/*": { "timeout": 300 }
}
I hope that the meaning is self-explanatory; the point is that individual tests could be made to have custom properties, and wildcards would be supported to avoid unnecessary repetition. This would also allow managing tests that are grouped in a subfolder, as shown in the last example. Also, at some point in the future, we plan to add a way to consistently benchmark tests in the Actions workflow (so you would get feedback that your PR ruined/improved the performance), so this would be easily extended to define benchmarking-related settings (e.g. "run test binary_search with all buffering algorithms, but only compare performance using --buffer-algorithm fpga20"; this is just an example that I randomly thought about, please forgive me if it is nonsense).
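To make the intended lookup concrete, here is a rough sketch (hypothetical helper, not committing to a precedence policy) of how the script could resolve the settings for one test, using fnmatch for the wildcards:
import json
from fnmatch import fnmatch

def settings_for(test_name, config):
    # Apply wildcard entries first, then let an exact match override them.
    settings = {}
    for pattern, values in config.items():
        if "*" in pattern and fnmatch(test_name, pattern):
            settings.update(values)
    settings.update(config.get(test_name, {}))
    return settings

config = json.loads("""{
  "gcd": { "buffer-algorithm": "fpga20" },
  "*": { "buffer-algorithm": "fpl22", "timeout": 200 },
  "memory/*": { "timeout": 300 }
}""")
print(settings_for("gcd", config))
# {'buffer-algorithm': 'fpga20', 'timeout': 200}
print(settings_for("memory/test_memory_1", config))
# {'buffer-algorithm': 'fpl22', 'timeout': 300}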
integration-testfolder and the script would read it when running and apply the correct settings. However, one concern is that this file may grow too large, and a proposed solution is to have configuration files per-test, for example, a fileintegration-test/gcd/config.jsonthat only contains settings for thegcdtest. On the other hand, my opinion is that opening different folders and editing different files is more tedious than scrolling through a single file, so I am not a big fan of this.Another goal would be to allow defining different sets of tests, in a more flexible way than the current run/ignore lists. For example, having a file
integration-test/sets.jsonwith the following contents:{ "ignore": ["gcd", "memory/test_memory_1"], "default": ["*", "!ignore"], "paolos_tests": ["kernel_3mm_float", "binary_search"] }With this, sets of tests could be run using:
python run_integration.pywould run thedefaultgrouppython run_integration.py paolos_testswould run thepaolos_testsgrouppython run_integration.py ignorewould run theignoregroup (i.e. the "ignored" tests can also be run as any other set).A problem that arises is how to reconcile the proposed
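A rough sketch of how such a set definition could be resolved (hypothetical helper; it assumes an entry is either a glob pattern, the name of another set, or a "!set" exclusion):
from fnmatch import fnmatch

def resolve_set(name, sets, all_tests):
    # Expand globs, recurse into referenced sets, and apply "!" exclusions in order.
    selected = []
    for item in sets[name]:
        if item.startswith("!"):
            excluded = resolve_set(item[1:], sets, all_tests)
            selected = [t for t in selected if t not in excluded]
        elif item in sets:
            selected += resolve_set(item, sets, all_tests)
        else:
            selected += [t for t in all_tests if fnmatch(t, item)]
    return selected

sets = {
    "ignore": ["gcd", "memory/test_memory_1"],
    "default": ["*", "!ignore"],
    "paolos_tests": ["kernel_3mm_float", "binary_search"],
}
all_tests = ["gcd", "binary_search", "kernel_3mm_float", "memory/test_memory_1"]
print(resolve_set("default", sets, all_tests))
# ['binary_search', 'kernel_3mm_float']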
config.jsonandsets.json, since, for example, what if Paolo wanted to runbinary_searchwithtimeout=80, buffer-algorithm=fpga20when running the setpaolos_tests? One thing that comes to mind is to allow aliases to be defined for tests inconfig.json, e.g.{ "kernel_3mm_float-paolo": { "actual-path": "kernel_3mm_float", "timeout": 80, "buffer-algorithm": "fpga20" } }and then add
kernel_3mm_float-paoloto the list insets.json.Finally, there is a problem of the consistency of these configuration files between main and the current branch. If I change them because I want to run a bunch of tests, what happens when I want to merge? Do I have to just never commit the changes? This is also something that was brought up in a discussion with Paolo: with these integration tests, there should be a policy on what happens when someone adds new tests on their branch. In my opinion, if my feature required additional tests, it would make sense that these tests are added to the "default" set of tests and as such pushed to main. This is because these tests would make sure that no one breaks my feature later.
Conclusion
I understand that the implementation of these wildcards and everything else I mentioned above might be a bit tedious and not pretty, but this is just a suggestion that showcases the intended user experience, since that is my priority, as I said before. So finally, I would like to ask you one more time to be very critical of my idea and of any other proposal given by other people, since the goal is that this system can be used by even the laziest programmer, without him or her getting discouraged due to complexity. Honestly, being a lazy programmer myself, this seems like a bit too much manual labor, so I am looking forward to suggestions and improvements :)