Skip to content

feat: Expose bcf_hdr_remove via remove_header#323

Open
fxwiegand wants to merge 2 commits intobrentp:mainfrom
fxwiegand:expose_bcf_hdr_remove
Open

feat: Expose bcf_hdr_remove via remove_header#323
fxwiegand wants to merge 2 commits intobrentp:mainfrom
fxwiegand:expose_bcf_hdr_remove

Conversation

@fxwiegand
Copy link

We are trying to modify some headers in a project of ours and would love to keep using cyvcf2 for that. I hope this fixes #237 and allows removing and then re-adding a header record.

I have tried to make the new remove_header function as similar to get_header_type and the newly added test runs fine on my machine so i can at least say it works. Still, would appreciate a careful look to make sure I didn't screw up somewhere.

@gemini-code-assist
Copy link

Summary of Changes

Hello @fxwiegand, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces a significant enhancement to the cyvcf2 library by enabling the programmatic removal of header records from VCF/BCF files. This feature addresses a user requirement to modify VCF headers, specifically by allowing the removal and subsequent re-addition of header records, which provides greater flexibility for advanced VCF manipulation workflows.

Highlights

  • New remove_header method: A new remove_header method has been added to the VCF class, allowing users to remove header records (INFO, FMT, FLT) by their ID.
  • Exposing bcf_hdr_remove: This new functionality exposes the underlying bcf_hdr_remove C function from the htslib library, providing direct control over VCF header manipulation.
  • New Test Case: A dedicated test, test_bcf_hdr_remove, has been added to ensure the correct behavior of the new method, verifying that header entries can be removed and that attempting to retrieve a removed entry raises a KeyError.

🧠 New Feature in Public Preview: You can now enable Memory to help Gemini Code Assist learn from your team's feedback. This makes future code reviews more consistent and personalized to your project's style. Click here to enable Memory in your admin console.

Changelog
  • cyvcf2/cyvcf2.pxd
    • Added the bcf_hdr_remove function declaration from htslib/vcf.h to expose the C-level header removal functionality.
    • Corrected minor whitespace inconsistencies in cdef extern from "string.h" and ctypedef struct htsFormat definitions.
  • cyvcf2/cyvcf2.pyx
    • Implemented the cpdef remove_header method within the VCF class, which takes a header key and an optional order of header types (INFO, FMT, FLT) to search for and remove the specified header record.
    • The remove_header method utilizes bcf_hdr_get_hrec to locate the header record and bcf_hdr_remove followed by bcf_hdr_sync to perform the removal and update the header.
    • Added error handling to raise a KeyError if the specified header key is not found.
    • Applied minor whitespace adjustments in comments and property definitions for consistency.
  • cyvcf2/tests/test_reader.py
    • Added a new test function test_bcf_hdr_remove to verify the functionality of the remove_header method. This test adds a custom INFO field, asserts its presence, removes it, and then asserts that it can no longer be retrieved, expecting a KeyError.
Activity
  • The pull request was created by fxwiegand.
  • The author states the intent is to modify headers in a project using cyvcf2 and hopes this fixes issue modify VCF header #237.
  • The author mentions that the new remove_header function was designed to be similar to get_header_type and that the new test runs successfully on their machine.
  • The author requests a careful review of the changes.
  • There are no other recorded comments or reviews on this pull request yet.
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment @gemini-code-assist Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

Copy link

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a new remove_header method to the VCF class, exposing the bcf_hdr_remove function from htslib. This is a useful addition for manipulating VCF headers. The implementation looks correct, and it's great that you've included a test for the new functionality. I have one suggestion to make the test more comprehensive by covering all header types that remove_header is designed to handle (INFO, FORMAT, and FILTER). This will increase confidence in the new feature.

Comment on lines 652 to 662
def test_bcf_hdr_remove():
"""Test that remove_header works to remove INFO and FORMAT fields from header"""
v = VCF(VCF_PATH)
v.add_info_to_header({'ID': 'TESTINFO', 'Description': 'Test INFO field',
'Type': 'String', 'Number': '1'})

assert v.get_header_type("TESTINFO")

v.remove_header('TESTINFO')
with pytest.raises(KeyError):
v.get_header_type('TESTINFO')

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The added test is a great start and covers the removal of an INFO field. To make the test more robust, I suggest expanding it to also cover the removal of FORMAT and FILTER fields, as remove_header supports these types as well. This will ensure all aspects of the new function are tested.

def test_bcf_hdr_remove():
    """Test that remove_header works to remove INFO, FORMAT and FILTER fields from header"""
    # Test INFO
    v = VCF(VCF_PATH)
    v.add_info_to_header({'ID': 'TESTINFO', 'Description': 'Test INFO field',
        'Type': 'String', 'Number': '1'})

    assert v.get_header_type("TESTINFO")

    v.remove_header('TESTINFO')
    with pytest.raises(KeyError):
        v.get_header_type('TESTINFO')
    v.close()

    # Test FORMAT
    v = VCF(VCF_PATH)
    v.add_format_to_header({'ID': 'TESTFORMAT', 'Description': 'Test FORMAT field',
        'Type': 'String', 'Number': '1'})

    assert v.get_header_type("TESTFORMAT")

    v.remove_header('TESTFORMAT')
    with pytest.raises(KeyError):
        v.get_header_type('TESTFORMAT')
    v.close()

    # Test FILTER
    v = VCF(VCF_PATH)
    v.add_filter_to_header({'ID': 'TESTFILTER', 'Description': 'Test FILTER field'})

    def filter_exists(vcf, flt_id):
        for h in vcf.header_iter():
            if h.type == 'FILTER' and h.info().get('ID') == flt_id:
                return True
        return False

    assert filter_exists(v, 'TESTFILTER')
    v.remove_header('TESTFILTER')
    assert not filter_exists(v, 'TESTFILTER')
    v.close()

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

modify VCF header

1 participant