
feat: added solution#1

Open
In-Saiyan wants to merge 2 commits into iiitl:master from In-Saiyan:master

Conversation

@In-Saiyan

Summary

This is the best solution in language_wars, the contest hosted by the FOSS wing of Axios, the Technical Club of IIITL.

Parallel Partitioning:

  • Files are partitioned in parallel, and large files are processed in parallel chunks.

  • Each chunk is aligned to word boundaries to avoid splitting words.
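The word-boundary alignment described above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: `aligned_chunks` is a hypothetical helper, it works on an in-memory byte slice rather than a file, and it assumes words are separated by ASCII whitespace.

```rust
/// Split `data` into chunks of roughly `chunk_size` bytes.
/// Each cut is moved forward to the next ASCII whitespace byte,
/// so no word is ever split across two chunks.
/// (Hypothetical helper; the PR's actual chunking code may differ.)
fn aligned_chunks(data: &[u8], chunk_size: usize) -> Vec<&[u8]> {
    assert!(chunk_size > 0, "chunk_size must be positive");
    let mut chunks = Vec::new();
    let mut start = 0;
    while start < data.len() {
        // Tentative cut point, clamped to the end of the input.
        let mut end = (start + chunk_size).min(data.len());
        // If the cut landed inside a word, extend it to the end of that word.
        while end < data.len() && !data[end].is_ascii_whitespace() {
            end += 1;
        }
        chunks.push(&data[start..end]);
        start = end;
    }
    chunks
}
```

Because the chunks only borrow from the input, they could be handed to parallel workers without copying.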

Controlled Disk Usage:

  1. Only PARTITIONS files are open at any time.

  2. No temp file explosion, even with many files or huge inputs.

  3. Disk usage and parallelism can be tuned via PARTITIONS and CHUNK_SIZE.
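One way to keep the number of partition files fixed is to route every word to a bucket by hash. The sketch below assumes hash partitioning, which the description does not state explicitly. `partition_of` and `partition_words` are hypothetical names, the buckets are in-memory vectors standing in for the temporary files, and the standard library hasher is used instead of fxhash to stay dependency-free.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Value taken from the example in this PR; tune for your hardware.
const PARTITIONS: usize = 328;

/// Map a word to a partition index in `0..PARTITIONS`.
/// Identical words always land in the same partition, so each
/// partition can later be deduplicated independently.
fn partition_of(word: &str) -> usize {
    let mut hasher = DefaultHasher::new();
    word.hash(&mut hasher);
    (hasher.finish() % PARTITIONS as u64) as usize
}

/// Distribute words into exactly `PARTITIONS` buckets.
/// (Assumption: in-memory buckets stand in for the PR's temp files.)
fn partition_words<'a>(words: impl Iterator<Item = &'a str>) -> Vec<Vec<&'a str>> {
    let mut buckets = vec![Vec::new(); PARTITIONS];
    for word in words {
        buckets[partition_of(word)].push(word);
    }
    buckets
}
```

With this layout the number of output targets never exceeds `PARTITIONS`, regardless of how many input files or chunks are processed.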

Efficient Deduplication & Sorting:

Each partition is deduplicated and sorted in parallel using FxHashSet and sort_unstable.
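A minimal sketch of the per-partition step, assuming the partition's words are already in memory. It uses `std::collections::HashSet` so it compiles without extra crates; the PR uses `FxHashSet`, which should slot into the same pattern. `dedup_sorted` is a hypothetical name.

```rust
use std::collections::HashSet;

/// Deduplicate the words of one partition, then sort them.
/// (Sketch only: std `HashSet` stands in for the PR's `FxHashSet`.)
fn dedup_sorted(words: &[&str]) -> Vec<String> {
    // Collecting into a set drops duplicates in expected O(n) time.
    let unique: HashSet<&str> = words.iter().copied().collect();
    let mut out: Vec<String> = unique.into_iter().map(str::to_owned).collect();
    // All elements are distinct after deduplication, so stability
    // is irrelevant and the unstable sort is sufficient.
    out.sort_unstable();
    out
}
```

Running this over all partitions in parallel would then be a matter of mapping it across the partition list with a parallel iterator such as the one rayon provides.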

Robust Multiway Merge:

Final output is produced via a memory-efficient multiway merge of sorted partitions.
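A multiway merge of this kind is commonly built on a min-heap that holds one pending word per sorted partition. The sketch below is an illustration under that assumption, not the PR's implementation: `multiway_merge` is a hypothetical name, and the partitions are in-memory vectors rather than buffered readers over temp files.

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;

/// Merge already-sorted runs into one sorted output.
/// The heap holds at most one entry per run, so memory use is
/// proportional to the number of runs, not the total number of words.
fn multiway_merge(runs: Vec<Vec<String>>) -> Vec<String> {
    let mut iters: Vec<_> = runs.into_iter().map(|run| run.into_iter()).collect();
    // `Reverse` turns the std max-heap into a min-heap.
    let mut heap = BinaryHeap::new();
    for (idx, iter) in iters.iter_mut().enumerate() {
        if let Some(word) = iter.next() {
            heap.push(Reverse((word, idx)));
        }
    }
    let mut out = Vec::new();
    while let Some(Reverse((word, idx))) = heap.pop() {
        // Refill the heap from the run that just yielded the smallest word.
        if let Some(next) = iters[idx].next() {
            heap.push(Reverse((next, idx)));
        }
        out.push(word);
    }
    out
}
```

This version does not remove duplicates across runs; it relies on each word appearing in only one partition.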

Automatic Cleanup:

All temporary files are removed after processing.

Usage

  1. Tune PARTITIONS and CHUNK_SIZE at the top of the file for your hardware and dataset.
  2. Place the compiled binary in the same location as the test_case folder containing the test files.

Example

const PARTITIONS: usize = 328;
const CHUNK_SIZE: usize = 256 * 1024 * 1024;

Dependencies

  • Uses rayon for parallelism.
  • Uses fxhash for fast hash sets.

Potential Improvements

  1. The implementation could be improved further.
  2. We later realized that tree-based data structures such as BTreeSet would be better for smaller test cases, but overall FxHashSet plus sorting still seems to be the most efficient approach.
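For comparison, the tree-based alternative mentioned in the second point could look roughly like this. It is a sketch with a hypothetical function name, `dedup_sorted_btree`, and makes no performance claim of its own.

```rust
use std::collections::BTreeSet;

/// Deduplicate and sort in a single pass with an ordered set.
/// A `BTreeSet` keeps its elements sorted as they are inserted,
/// so no separate sort step is needed afterwards.
fn dedup_sorted_btree<'a>(words: impl IntoIterator<Item = &'a str>) -> Vec<&'a str> {
    let set: BTreeSet<&str> = words.into_iter().collect();
    // Iterating a BTreeSet yields elements in ascending order.
    set.into_iter().collect()
}
```

The trade-off is that every insertion costs O(log n), whereas the hash-set approach defers all ordering work to one final sort.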

@rootCircle
Member

Can you benchmark it against https://github.com/rootCircle/language_wars? (It's based on your implementation with minor changes)

@Thunder-Blaze

Thunder-Blaze commented May 10, 2025

image

demn, 300ms faster on my machine

@rootCircle
Member

A good way to test is running ./judging.sh as is, (It will be slow for first run tho)

@rootCircle
Member

Also, please check with bigger files as well!

@Thunder-Blaze

Thunder-Blaze commented May 10, 2025

> A good way to test is running ./judging.sh as is, (It will be slow for first run tho)

Sure, I'll have Aryan do that ig, my storage will run out with these test files lol

@rootCircle
Member

Run ./clean.sh if you want to remove temporary files!

@rootCircle
Member

Any updates?


3 participants