Inquiry Regarding Vocabulary Size in CodonBERT PyTorch Model #6

@jiaqihuang01

Description

I would like to express my gratitude for your excellent work on CodonBERT. I have been thoroughly impressed by your research and the accompanying code.

However, I have encountered a discrepancy that I would like to clarify. In your paper and code, the vocabulary size is given as 5³ + 5 = 130, based on the characters 'A', 'U', 'G', 'C', and 'N'. Yet, in the CodonBERT PyTorch model you provided, the vocabulary size is set to 69.
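For reference, here is the arithmetic behind the two numbers as I understand it. This is only a guess at how 69 might arise: the count of 5 special tokens and the idea that 'N' is excluded from the deployed model's codon alphabet are my assumptions, not something stated in the paper.

```python
from itertools import product

SPECIAL_TOKENS = 5  # assumed count of special tokens (e.g. pad/unk/cls/sep/mask)

# Vocabulary as described in the paper: all codons over the 5 characters
# 'A', 'U', 'G', 'C', 'N', plus the special tokens.
vocab_with_n = len(list(product("AUGCN", repeat=3))) + SPECIAL_TOKENS
print(vocab_with_n)  # 5**3 + 5 = 130

# One plausible source of 69: codons over the 4 standard bases only,
# i.e. dropping 'N' from the codon alphabet, plus the special tokens.
vocab_without_n = len(list(product("AUGC", repeat=3))) + SPECIAL_TOKENS
print(vocab_without_n)  # 4**3 + 5 = 69
```

If this guess is right, the released PyTorch model would simply not reserve vocabulary entries for codons containing 'N', but I would appreciate confirmation.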

Could you please explain the rationale behind this difference in vocabulary size? Understanding this would greatly help me in comprehending and utilizing your model more effectively.

Thank you in advance for your assistance.
