
[QUESTION] Why is output from the dialect id system different from the ADIDA online interface? #141

@fadhleryani

Description

camel_tools 1.5.2 on macOS 14.1.1

Using one of the preloaded example sentences in the ADIDA interface, for instance:
"بدي دوب قلي قلي بجنون بحبك انا مجنون ما بنسى حبك يوم"
I get a score of 95.9% for Beirut.
When I predict the same sentence using camel_tools, I get a different result. For example, using model26, which I assume is the same model ADIDA uses:

```python
from camel_tools.dialectid import DIDModel26

did = DIDModel26.pretrained()
did.predict(['بدي دوب قلي قلي بجنون بحبك انا مجنون ما بنسى حبك يوم'])
```

I get the following scores:

```
[DIDPred(top='ALE', scores={'ALE': 0.2744463749182225, 'ALG': 0.0019964477414507772, 'ALX': 0.0017124356871910278, 'AMM': 0.04793813798943018, ...
```

Similarly, using model6, I also get different and lower scores than the online interface (though at least the top dialect is correct):

```python
from camel_tools.dialectid import DIDModel6

did = DIDModel6.pretrained()
did.predict(['بدي دوب قلي قلي بجنون بحبك انا مجنون ما بنسى حبك يوم'])
```

I get the following scores:

```
[DIDPred(top='BEI', scores={'BEI': 0.5475092868164938, 'CAI': 0.05423997031019218, 'DOH': 0.018378809169102468, 'MSA': 0.003793013408907513, 'RAB': 0.0018751946461352397, 'TUN': 0.37420372564916876})]
```
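For easier comparison with ADIDA's percentages, here is a small pure-Python sketch (no camel_tools dependency; the `scores` dict is copied from the model6 output above) that ranks a `DIDPred`-style scores dict and prints each dialect as a percentage:

```python
# Rank a DIDPred-style scores dict (values copied from the model6 run above)
# so the full distribution can be compared against ADIDA's reported percentage.
scores = {
    'BEI': 0.5475092868164938,
    'CAI': 0.05423997031019218,
    'DOH': 0.018378809169102468,
    'MSA': 0.003793013408907513,
    'RAB': 0.0018751946461352397,
    'TUN': 0.37420372564916876,
}

# Sort dialects by descending probability.
ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

for label, p in ranked:
    print(f'{label}: {p:.1%}')
```

Note that the six probabilities sum to ~1.0, so model6 is spreading mass between BEI (54.8%) and TUN (37.4%) rather than concentrating 95.9% on Beirut as the online interface does.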
