
Enforce sample numbering (controversial?) #30

Open
Matistjati wants to merge 1 commit into Kodsport:master from Matistjati:sample-numbering


@Matistjati
Member

Closes #25.

First, a minor sanity check: verify that every .ans file in sample has a corresponding .in file.
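A minimal Python sketch of that sanity check, for illustration only (the PR itself changes the shell script gen.sh). It assumes the samples are flat `.in`/`.ans` pairs in a single directory; `unmatched_answers` is a hypothetical name.

```python
import tempfile
from pathlib import Path


def unmatched_answers(sample_dir: Path) -> list[str]:
    """Return the names of .ans files in sample_dir that have no matching .in file."""
    missing = []
    for ans in sorted(sample_dir.glob("*.ans")):
        # x.ans must be accompanied by x.in in the same directory.
        if not ans.with_suffix(".in").exists():
            missing.append(ans.name)
    return missing


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        sample = Path(tmp) / "sample"
        sample.mkdir()
        for name in ("1.in", "1.ans", "x.ans"):
            (sample / name).write_text("")
        for name in unmatched_answers(sample):
            print(f"ERROR: {name} has no matching .in file")
```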

For the controversial part: Kattis now exposes more details about test cases in its UI.
[Screenshot: Kattis submission results listing individual test cases]

Here, it looks like the participant was 3 test cases away from AC. However, they were actually only one test case away: the last two test cases are the samples, which end up at the end because I named them 1.in and 2.in. I think this is fairly common, and that the fix is to number the samples.
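A quick illustration of the effect, assuming test cases are run in lexicographic order of their base names (file names without extension). The secret names below are made up, and the idea that their `NNN-` numbering continues after the samples is an assumption.

```python
# Hypothetical base names. The secret cases use "NNN-" prefixes that are assumed
# to continue the numbering after the samples.
secret = ["003-small", "004-large"]
unpadded_samples = ["1", "2"]
padded_samples = ["001-1", "002-2"]


def judge_order(base_names: list[str]) -> list[str]:
    """Assumed judge behavior: run test cases in lexicographic order of base name."""
    return sorted(base_names)


# '0' sorts before '1', so every "0NN-..." name precedes "1": the samples run last.
print(judge_order(secret + unpadded_samples))  # ['003-small', '004-large', '1', '2']
# With the same NNN- prefix scheme, the samples sort first again.
print(judge_order(secret + padded_samples))  # ['001-1', '002-2', '003-small', '004-large']
```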

Because testdata_tools uses the sample directory as both the "input" and the "output" folder for samples, we don't really have any choice but to rename the samples themselves.

Thus, this change is not backwards-compatible: if you use the new gen.sh on old problems, you will be forced to do some cleanup.

Sample run:

    matistjati@DESKTOP-BJV27M6:~/po/swedish-olympiad-2026/online/absolutbio/data$ rm -rf sample && git restore sample && ./generator.sh
    ERROR: Some samples did not start with three digits followed by -
    ERROR: Please change your samples to use the following the sample commands:
    sample 001-1
    sample 002-2
    sample 003-3
    sample 004-4
    matistjati@DESKTOP-BJV27M6:~/po/swedish-olympiad-2026/online/absolutbio/data$
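The check behind that error can be approximated in Python as follows. This is not the PR's shell code: it assumes a sample name is valid when it starts with three digits followed by `-`, the error text is paraphrased, and new names are proposed by prefixing each sample's position (in the order given) with a zero-padded counter, which reproduces the suggestions printed above. `check_sample_names` is a hypothetical name.

```python
import re

# A valid sample name starts with exactly three digits followed by "-".
VALID_SAMPLE = re.compile(r"^[0-9]{3}-")


def check_sample_names(names: list[str]) -> list[str]:
    """Return error lines for invalid sample names (empty list if all are valid)."""
    if all(VALID_SAMPLE.match(name) for name in names):
        return []
    errors = [
        "ERROR: Some samples did not start with three digits followed by -",
        "ERROR: Please change your samples to use the following sample commands:",
    ]
    # Renumber every sample from 001, dropping any existing NNN- prefix first
    # so that the suggested numbers never clash.
    for i, name in enumerate(names, start=1):
        base = VALID_SAMPLE.sub("", name)
        errors.append(f"sample {i:03d}-{base}")
    return errors


if __name__ == "__main__":
    print("\n".join(check_sample_names(["1", "2", "3", "4"])))
```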

@Matistjati
Member Author

What do you think? Pinging some people who actively use problemtools: @hairez @Tagl. Anyone else is welcome to chime in too.

@Matistjati
Member Author

Matistjati commented Dec 22, 2025

For reference, Kattis's current behavior is correct according to the Legacy spec: https://www.kattis.com/problem-package-format/spec/legacy.html#test-data-groups
Test cases and groups will be used in lexicographical order on file base name

But this will be fixed in the 2025-09 version of the format. https://www.kattis.com/problem-package-format/spec/2025-09.html#test-cases
Here, base name is defined to be the relative path from the data directory to the test case input file, without extensions. This is the name of the test case.
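The difference between the two definitions can be made concrete with a small sketch that computes both base names from a path relative to `data/`. It reads the Legacy "file base name" as the file name alone without its extension, which is an interpretation of the quoted text, and it sorts only the two cases mentioned in this thread, ignoring any other ordering rules the judge may apply.

```python
from pathlib import PurePosixPath


def legacy_base_name(rel_path: str) -> str:
    """Legacy, as interpreted here: the file name alone, without its .in extension."""
    return PurePosixPath(rel_path).stem


def base_name_2025_09(rel_path: str) -> str:
    """2025-09: the path relative to data/, without the .in extension."""
    return str(PurePosixPath(rel_path).with_suffix(""))


# Two cases from this thread: a secret case with an NNN- prefix and a sample.
cases = ["secret/test/002-test.in", "sample/1.in"]

for key in (legacy_base_name, base_name_2025_09):
    # Sort just these two cases by each definition of base name.
    print(key.__name__, [key(c) for c in sorted(cases, key=key)])
```

Under the Legacy reading, `002-test` sorts before `1`; with the 2025-09 definition, the directory prefix `sample/` sorts before `secret/`.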

I'm considering closing this.

@simonlindholm
Collaborator

IIRC we use 1.in, 2.in, ... because it is required by some other tool, maybe problem2pdf? We could solve this by using a different name for the symlink than for the actual file.
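A rough sketch of that idea, under assumptions not stated in the thread: the real sample files keep their `1.in`/`1.ans` names in `data/sample/`, and a relative symlink with a zero-padded name is placed in a secret group (`data/secret/sample/` here). The helper `link_sample_into_group` and the group name are hypothetical, and testdata_tools' actual layout may differ.

```python
import os
import tempfile
from pathlib import Path


def link_sample_into_group(data_dir: Path, sample: str, group: str, number: int) -> list[Path]:
    """Hypothetical helper: expose data/sample/<sample>.{in,ans} inside
    data/secret/<group>/ under a zero-padded name, leaving the real files
    (which other tools may expect to be called 1.in, 2.in, ...) untouched."""
    group_dir = data_dir / "secret" / group
    group_dir.mkdir(parents=True, exist_ok=True)
    links = []
    for ext in (".in", ".ans"):
        target = data_dir / "sample" / f"{sample}{ext}"
        link = group_dir / f"{number:03d}-{sample}{ext}"
        # Relative link target, so the package can be moved or archived.
        link.symlink_to(os.path.relpath(target, group_dir))
        links.append(link)
    return links


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        data = Path(tmp)
        (data / "sample").mkdir()
        (data / "sample" / "1.in").write_text("1 2\n")
        (data / "sample" / "1.ans").write_text("3\n")
        for link in link_sample_into_group(data, "1", "sample", 1):
            print(link.relative_to(data), "->", os.readlink(link))
```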

@Matistjati
Member Author

Matistjati commented Dec 22, 2025

IIRC we use 1.in, 2.in, ... because it is required by some other tool, maybe problem2pdf? We could solve this by using a different name for the symlink than for the actual file.

Can't believe I missed that. That's a great solution!

Also, problemtools already reports .ans files without a matching .in file, so the sanity check isn't needed either (ERROR No matching input file for answer '/home/matistjati/po/swedish-olympiad-2026/online/absolutbio/data/sample/x.ans')

@Matistjati
Member Author

I'm not sure that's the solution we want either. If we do this, we lose the property that two files have the same content if and only if they have the same name. For example, this triggers a problemtools warning:
WARNING Identical input files: '['data/sample/1.in', 'data/secret/test/002-test.in']'
I guess this could be fixed by making the check symlink-aware, but I just don't think that this change is worth it, considering that it will automatically be fixed in the new problem format version.

@simonlindholm
Collaborator

I thought that check was already symlink-aware.

            hashes = collections.defaultdict(list)
            for root, dirs, files in os.walk(self._datadir):
                for filename in files:
                    filepath = os.path.join(root, filename)
                    if filepath.endswith('.in') and not os.path.islink(filepath):
                        ...
                        hashes[filehash].append(os.path.relpath(filepath, self._problem.probdir))
            for _, files in hashes.items():
                if len(files) > 1:
                    self.warning(f"Identical input files: '{str(files)}'")
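For illustration, a self-contained version of the same idea. The elided hashing step is filled with SHA-256 purely as an assumption, paths are reported relative to `data/` rather than the problem directory, and `find_identical_inputs` is a hypothetical name. It shows that a symlinked copy is ignored while a real copy with identical content is reported.

```python
import collections
import hashlib
import os
import tempfile
from pathlib import Path


def find_identical_inputs(data_dir: str) -> list[list[str]]:
    """Group regular (non-symlink) .in files under data_dir by content hash."""
    hashes = collections.defaultdict(list)
    for root, _dirs, files in os.walk(data_dir):
        for filename in files:
            filepath = os.path.join(root, filename)
            # Symlinks are skipped, so a linked copy never counts as a duplicate.
            if filepath.endswith(".in") and not os.path.islink(filepath):
                with open(filepath, "rb") as f:
                    filehash = hashlib.sha256(f.read()).hexdigest()
                hashes[filehash].append(os.path.relpath(filepath, data_dir))
    return [sorted(paths) for paths in hashes.values() if len(paths) > 1]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        data = Path(tmp)
        (data / "sample").mkdir()
        (data / "secret" / "test").mkdir(parents=True)
        (data / "sample" / "1.in").write_text("1 2\n")
        # A symlink to the sample: ignored by the check.
        (data / "secret" / "test" / "001-1.in").symlink_to(os.path.join("..", "..", "sample", "1.in"))
        # A real copy with the same content: reported.
        (data / "secret" / "test" / "002-test.in").write_text("1 2\n")
        for group in find_identical_inputs(tmp):
            print(f"WARNING Identical input files: {group}")
```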

@Matistjati
Member Author

...right, I didn't even read my own error message...

What do you think, should we make this change?

@simonlindholm
Collaborator

Sure, why not.

@jsannemo
Member

jsannemo commented Dec 22, 2025 via email


Development

Successfully merging this pull request may close these issues.

zero-pad sample names

3 participants