I would like to express my gratitude for your excellent work on CodonBERT. I have been thoroughly impressed by your research and the accompanying code.
However, I have encountered a discrepancy that I would like to clarify. In your paper and code, the vocabulary size is stated as 5³ + 5 = 130, based on the five characters 'A', 'U', 'G', 'C', and 'N' (i.e., 125 possible codons plus 5 special tokens). Yet, in the CodonBERT PyTorch model you provided, the vocabulary size is set to 69.
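For reference, here is the arithmetic I am comparing, sketched in Python. The count of 5 special tokens and the specific token names are my assumptions (standard BERT-style specials), not taken from your code; the guess that 69 corresponds to codons over 'A'/'U'/'G'/'C' only is likewise just a hypothesis on my part.

```python
from itertools import product

# Codons are 3-letter words over a nucleotide alphabet.
# With 'N' included, as the paper describes: 5^3 = 125 codons.
codons_with_n = ["".join(c) for c in product("AUGCN", repeat=3)]

# Assumed number of special tokens (e.g. [PAD], [UNK], [CLS], [SEP], [MASK]).
n_special = 5

vocab_paper = len(codons_with_n) + n_special
print(vocab_paper)  # 125 + 5 = 130, the size stated in the paper

# My guess at the 69: codons over 'A'/'U'/'G'/'C' only (4^3 = 64) plus 5 specials.
codons_no_n = ["".join(c) for c in product("AUGC", repeat=3)]
vocab_model = len(codons_no_n) + n_special
print(vocab_model)  # 64 + 5 = 69, the size set in the PyTorch model
```

If the checkpoint indeed drops 'N'-containing codons (or collapses them to a single [UNK]-like token), that would reconcile the two numbers, but I would appreciate confirmation.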
Could you please explain the rationale behind this difference in vocabulary size? Understanding this would greatly help me in comprehending and utilizing your model more effectively.
Thank you in advance for your assistance.