
[FEATURE]: Duplicate Issue Detection and labeler as Possible Duplicate using semantic analysis #65

@aniket866

Feature and its Use Cases

I explored two possible approaches for implementing a duplicate issue detector that labels new issues as “possible duplicate” with an X% similarity score.

This issue is to compare both approaches before finalizing implementation at the org level.

Problem Statement

  • Currently, contributors must manually search through existing issues before opening a new one.

  • This is time-consuming and still leads to duplicate issues.

  • We need an automated way to:

  1. Detect similar issues when a new issue is opened

  2. Suggest possible duplicates (without auto-closing)

  3. Label them clearly

  • The system should assist, not enforce.

Approach 1: Using CodeRabbit (Issue Enrichment)

  • Org template: Template-Repo/.coderabbit.yaml

How it Works

  • Configure duplicate detection via .coderabbit.yaml
  • CodeRabbit enriches issues automatically
  • Can be rolled out across org via template repo

Pros

  • Centralized configuration (via template repo)

  • Org-wide scalability

  • No need to maintain custom workflows per repo

  • AI-powered enrichment (context-aware analysis)

  • Cleaner long-term maintenance

Cons

  • May not allow fine-grained control over scoring logic

  • Requires org-level alignment before rollout

Approach 2: Using GitHub Action Bot (Custom Workflow)

  • An example implementation is in the PictoPy testing issue.

How it Works

  • Custom GitHub Action triggers on issues: opened

  • Compares issue title + body with existing issues

  • Calculates similarity score

  • Comments with similar issues

  • Labels as possible duplicate
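
The comparison and scoring steps above could be sketched as follows. This is a minimal illustration, not the workflow used in the PictoPy test: it uses plain bag-of-words cosine similarity (lexical, not truly semantic), and the function names `tokenize`, `cosine_similarity`, and `rank_similar` are hypothetical. Issues are assumed to be dicts with `number`, `title`, and `body` keys.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> Counter:
    """Lowercase the text and count its alphanumeric word tokens."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-count vectors, in the range 0 to 1."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty issue is treated as similar to nothing
    return dot / (norm_a * norm_b)


def rank_similar(new_issue: dict, existing: list, top_n: int = 3) -> list:
    """Return (issue number, score) pairs for the most similar existing issues."""
    new_vec = tokenize(new_issue["title"] + " " + new_issue["body"])
    scored = [
        (
            issue["number"],
            cosine_similarity(new_vec, tokenize(issue["title"] + " " + issue["body"])),
        )
        for issue in existing
    ]
    # Highest score first, then keep only the top_n candidates.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]
```

Swapping `tokenize` for an embedding model would make the comparison semantic rather than lexical while leaving the ranking logic unchanged.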

Pros

  • Full control over logic and scoring

  • Customizable similarity threshold

  • Can mark as “X% duplicate”

  • No dependency on third-party enrichment

  • Transparent workflow
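
A customizable threshold and the "X% duplicate" wording could be combined along these lines. This is a sketch under assumptions: the label name, the default threshold of 0.6, and the function `build_duplicate_report` are illustrative choices, not part of any existing workflow. It takes (issue number, score) pairs with scores between 0 and 1.

```python
POSSIBLE_DUPLICATE_LABEL = "possible duplicate"  # assumed label name


def build_duplicate_report(matches, threshold=0.6):
    """Return (labels, comment) for matches scoring at or above the threshold.

    Returns ([], None) when nothing qualifies, so the caller neither
    labels nor comments. The issue is never closed automatically.
    """
    hits = [(number, score) for number, score in matches if score >= threshold]
    if not hits:
        return [], None
    lines = ["Possible duplicates found (this issue was not closed automatically):"]
    for number, score in hits:
        lines.append(f"- #{number}: {round(score * 100)}% similar")
    return [POSSIBLE_DUPLICATE_LABEL], "\n".join(lines)
```

Raising the threshold trades missed duplicates for fewer false alarms, which fits the goal of assisting rather than enforcing.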

Cons

  • Needs per-repo workflow unless added to template repo

  • Maintenance burden on org

  • AI capability depends on implementation quality

  • May require handling of GitHub API rate limits

  • Less intelligent compared to AI-native enrichment (unless enhanced)
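
For the rate-limit concern, one possible approach is to compute a wait time from the response headers before retrying. The sketch below assumes it is called only after a rate-limited response, and relies on the `retry-after`, `x-ratelimit-remaining`, and `x-ratelimit-reset` headers that the GitHub REST API returns; the function name `seconds_to_wait` and the 60-second backoff base are assumptions.

```python
def seconds_to_wait(headers: dict, now: float, attempt: int = 0) -> float:
    """Seconds to sleep before retrying a rate-limited GitHub API request.

    `now` is the current time in epoch seconds; `attempt` counts prior retries.
    """
    # Header names are case-insensitive, so normalize them first.
    lowered = {key.lower(): value for key, value in headers.items()}
    if "retry-after" in lowered:
        return float(lowered["retry-after"])
    if lowered.get("x-ratelimit-remaining") == "0" and "x-ratelimit-reset" in lowered:
        # x-ratelimit-reset is the epoch second at which the quota resets.
        return max(0.0, float(lowered["x-ratelimit-reset"]) - now)
    # No guidance in the headers: fall back to exponential backoff.
    return 60.0 * (2 ** attempt)
```

A caller would pass `time.time()` as `now` and sleep for the returned duration before issuing the request again.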

Additional Context

This is for the duplicate issue detector that labels issues as possible duplicates using the GitHub Action bot.

[Image]

Code of Conduct

  • I have joined the Discord server and will post updates there
  • I have searched existing issues to avoid duplicates

Metadata

Labels: enhancement (New feature or request)