Submit final solution for Code Challenge 2025 #7

bahman-farhadian · 2025-12-19T00:04:57Z

Implemented robust parent-child kinship matching using likelihood ratios
Added allele frequencies for all 21 loci
Per-locus classification: identical, partial, mutation, mismatch, missing
CLR based on allele frequencies with penalties for mutations/mismatches
Filter: require ≥3 partial matches to confirm parent-child relationship
Prune unrelated candidates (>2 mismatches)
Score boosted by number of partial matches
Added detailed explanation in Hossein_Hamzehei_Bahman_Farhadian_Explanation.md
Included test results and data batch
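
The per-locus classification and the two filters listed above could look roughly like the sketch below. This is not the submitted code: the genotype format (`"12,14"`), the one-repeat-step rule for "mutation", and the names `parse_genotype`, `classify_locus`, and `passes_filters` are all assumptions made for illustration.

```python
from collections import Counter


def parse_genotype(value):
    """Parse a genotype such as "12,14" into a sorted tuple of floats (assumed format)."""
    if value is None:
        return None
    text = str(value).strip()
    if not text or text == "-":
        return None
    return tuple(sorted(float(a) for a in text.split(",")))


def classify_locus(query, candidate):
    """Return one of: identical, partial, mutation, mismatch, missing."""
    a, b = parse_genotype(query), parse_genotype(candidate)
    if a is None or b is None:
        return "missing"
    if a == b:
        return "identical"
    if set(a) & set(b):
        return "partial"  # genotypes differ but share at least one allele
    # Assumption: no shared allele, but two alleles one repeat unit apart,
    # is treated as a possible single-step mutation.
    if any(abs(abs(x - y) - 1.0) < 1e-9 for x in a for y in b):
        return "mutation"
    return "mismatch"


def passes_filters(query_profile, candidate_profile, min_partial=3, max_mismatch=2):
    """Apply the '>= 3 partial matches' and '> 2 mismatches' rules from the description."""
    counts = Counter(
        classify_locus(query_profile.get(locus), candidate_profile.get(locus))
        for locus in query_profile
    )
    keep = counts["partial"] >= min_partial and counts["mismatch"] <= max_mismatch
    return keep, counts
```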

- Implement caching and an inverted index to pre-filter candidates, reducing runtime from ~60s to ~5s.
- Calculate allele frequencies dynamically from the database instead of using hardcoded values for improved robustness.
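
As a rough illustration of the two ideas in this commit message, the sketch below builds an inverted index from `(locus, allele)` to profile IDs and derives allele frequencies from the same data. The in-memory layout (a dict of profiles with allele tuples) and the function names are assumptions, not the structure used in `participant_solution.py`.

```python
from collections import Counter, defaultdict


def build_allele_index(database):
    """Map (locus, allele) -> set of profile IDs carrying that allele."""
    index = defaultdict(set)
    for person_id, profile in database.items():
        for locus, alleles in profile.items():
            for allele in alleles:
                index[(locus, allele)].add(person_id)
    return index


def candidate_ids(query_profile, index, min_shared_loci):
    """Return IDs that share an allele with the query at >= min_shared_loci loci."""
    shared = Counter()
    for locus, alleles in query_profile.items():
        hits = set()
        for allele in alleles:
            hits |= index.get((locus, allele), set())
        for person_id in hits:
            shared[person_id] += 1  # count each locus at most once per candidate
    return {pid for pid, n in shared.items() if n >= min_shared_loci}


def allele_frequencies(database):
    """Estimate per-locus allele frequencies from the database itself."""
    counts = defaultdict(Counter)
    for profile in database.values():
        for locus, alleles in profile.items():
            counts[locus].update(alleles)
    return {
        locus: {allele: n / sum(c.values()) for allele, n in c.items()}
        for locus, c in counts.items()
    }
```

Only the candidates returned by `candidate_ids` would then need the full per-locus scoring, which is where most of the runtime saving would come from.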

bahman-farhadian · 2025-12-19T13:11:16Z

Hi Ali,

Thanks for reviewing my PR! The CI result was:

Correct matches: 27/35
Accuracy       : 77.1%
Execution time : 3.86 seconds
Final score    : 97.1/120

However, my local testing shows better results:

Metric	Highest	Average	Lowest
Accuracy	100% (35/35)	88.5% (31/35)	71.4% (25/35)
Execution time	8.55 seconds	~9.22 seconds	10.77 seconds
Final score	120/120	108.5/120	91.4/120

Is this variance due to random dataset generation, or could my approach be improved? If there's room for improvement, I'd appreciate any suggestions on how to make the function more robust and reliable with less variance.

Also, would a CI re-run be possible?

Thank you.

tavallaie · 2025-12-19T14:33:38Z

Did you run it on the 500k dataset, as the problem requires?
CI only runs on 5k.

bahman-farhadian · 2025-12-19T14:47:41Z

Yes, I ran it using the make all command, which generates a data directory containing the str_database.csv file. Each time, this file includes 500,000 rows. Did I make a mistake here during execution?

tavallaie · 2025-12-20T13:49:38Z

You made several unrelated changes that introduced errors:

First, you should not have modified the function that calls the match function.
Second, according to the challenge requirements, you should not perform simple pairwise matching. Instead, you must build an index and perform matching using that index.
You should not have changed the README or uploaded any ZIP files. Your pull request should include only the updated match function.

bahman-farhadian · 2025-12-20T14:24:59Z

Hi Ali,
Sorry about that. I thought including my AI interaction logs and a short report would be helpful for transparency, but I understand now that was out of scope.
I've cleaned up my commit and addressed all points:

✓ Removed ZIP files
✓ Only participant_solution.py is changed
✓ No modifications to find_matches()
✓ Uses index-based matching with allele_index for fast candidate lookup

Thanks for the feedback!

bahman-farhadian · 2025-12-20T14:58:35Z

I also noticed my posterior probability calculation wasn't following the Bayesian formula with a 50% prior — I've fixed that now.
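
For reference, the standard Bayesian conversion from a combined likelihood ratio to a posterior probability can be sketched as follows. The function name `posterior_probability` and its signature are illustrative only; the actual fix in the PR may be written differently.

```python
import math


def posterior_probability(clr, prior=0.5):
    """Posterior probability of kinship from a combined likelihood ratio.

    posterior odds = CLR * prior odds; with a 50% prior this reduces
    to CLR / (CLR + 1).
    """
    if clr < 0:
        raise ValueError("likelihood ratio must be non-negative")
    if not 0 < prior < 1:
        raise ValueError("prior must be strictly between 0 and 1")
    posterior_odds = clr * (prior / (1 - prior))
    if math.isinf(posterior_odds):
        return 1.0  # avoid inf / inf for extremely large ratios
    return posterior_odds / (1 + posterior_odds)
```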

I tested the function 100 times against the 50k dataset:

Max Accuracy	Average Accuracy	Min Accuracy	Variance
100.0%	89.5%	77.1%	23.93

The average accuracy looks promising, but I'm still seeing quite a bit of variance across runs (min 77.1%, max 100%). I'm trying to make the matching behavior more consistent and reliable.

I know you're busy, but if you have any suggestions or ideas on how to stabilize the performance and reduce the variance, I would really appreciate your guidance.

Thanks a lot for your time and help!

bahman-farhadian · 2025-12-20T21:12:50Z

Just a quick update, Ali: I've updated the matching algorithm. I added a filter to reject self-matches and identical twins using a ratio-based approach: if a candidate has >90% identical loci AND fewer than 3 partial matches, it's rejected as likely same-person/twin rather than parent-child.

The filter logic:

if identical_ratio > 0.9 and partial_count < 3:
    return None  # Reject self-match/twin

This is based on the observation that true parent-child pairs have many partial matches (the genotype strings differ but share an allele), while self-matches are identical at almost every locus.
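
A self-contained version of that check might look like the sketch below. It assumes `identical_ratio` is the number of identical loci divided by the number of loci actually compared; the helper name and parameters are hypothetical.

```python
def is_probable_self_or_twin(identical_count, partial_count, compared_loci,
                             identical_threshold=0.9, min_partial=3):
    """Return True when a candidate looks like the same person or an identical twin."""
    if compared_loci == 0:
        return False  # nothing to compare, so no basis for rejection
    # Assumption: the ratio is taken over loci that were actually compared.
    identical_ratio = identical_count / compared_loci
    return identical_ratio > identical_threshold and partial_count < min_partial
```

Under that assumption, with 21 compared loci a ratio above 0.9 already implies at most two non-identical loci, so the `partial_count` condition only changes the outcome if the ratio is computed over a different denominator.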

Hossein Hamzehei added 2 commits December 19, 2025 03:32

perf: Rewrite solution with caching and indexing (2nd attempt)

f4f4d48

Hossein Hamzehei added 2 commits December 20, 2025 17:46

fi: remove unnecessary files.

482dbe8

fix: clean up PR, keep only participant_solution.py with index-based matching

96e5d8a

fix: correct posterior probability to use Bayesian formula with 50% prior

49ac310

feat: add self-match/twin filter with ratio-based detection

6521fb7

tavallaie merged commit 6521fb7 into pyday-iran:main Dec 26, 2025
