* Pushing update for MetaCAT
- Addressing the multiple zero-division-error warnings per epoch while training
- Accommodating the variations in category name and class name across NHS sites
* Adding comments
* Pushing requested changes
* Pushing type fix
* Pushing updates to MetaCAT config
+ In case the name specified in the self.general.category_name parameter does not match the data, this ensures that no error is raised and the name is automatically mapped.
+ """
  category_value2id: Dict = {}
  """Map from category values to ID; if empty, it will be autocalculated during training"""
+ alternative_class_names: List[List] = [[]]
+ """List of lists that stores the variations of possible class names for each class mentioned in self.general.category_value2id
+
+ Example: For the Presence task, the class names vary across NHS sites.
+ To accommodate this, alternative_class_names is populated as: [["Hypothetical (N/A)","Hypothetical"],["Not present (False)","False"],["Present (True)","True"]]
+ Each sublist contains the possible variations of the given class.
+ """
  vocab_size: Optional[int] = None
  """Will be set automatically if the tokenizer is provided during meta_cat init"""
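To make the shape of `alternative_class_names` concrete, here is a minimal sketch, assuming each sub-list groups the spellings of one class exactly as the docstring describes. `build_variant_index` and `are_same_class` are hypothetical helpers written for illustration, not MetaCAT functions; the sketch only builds a reverse index from every spelling to the position of its sub-list and uses it to compare two names.

```python
from typing import Dict, List

# Example value copied from the alternative_class_names docstring.
alternative_class_names: List[List[str]] = [
    ["Hypothetical (N/A)", "Hypothetical"],
    ["Not present (False)", "False"],
    ["Present (True)", "True"],
]


def build_variant_index(groups: List[List[str]]) -> Dict[str, int]:
    """Hypothetical helper: map every spelling to the index of its sub-list."""
    index: Dict[str, int] = {}
    for group_id, variants in enumerate(groups):
        for name in variants:
            # A spelling listed under two different classes would be ambiguous.
            if index.get(name, group_id) != group_id:
                raise ValueError(f"'{name}' is listed under more than one class")
            index[name] = group_id
    return index


def are_same_class(a: str, b: str, index: Dict[str, int]) -> bool:
    """Hypothetical helper: True if a and b are identical or variations of one class."""
    return a == b or (a in index and index.get(b) == index[a])


index = build_variant_index(alternative_class_names)
print(are_same_class("Present (True)", "True", index))   # True
print(are_same_class("Present (True)", "False", index))  # False
```

With the default value `[[]]` the index is simply empty, so only exact name matches count as the same class.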
+     logger.info("The category name provided in the config - '%s' is not present in the data. However, the corresponding name - '%s' from the category_name_mapping has been found. Updating the category name...", category_name, *category_matching)
+     g_config['category_name'] = category_matching[0]
+     category_name = g_config['category_name']
+ else:
+     raise Exception(
+         "The category name does not exist in this JSON file. You've provided '{}', while the possible options are: {}. Additionally, make sure to populate the 'alternative_category_names' attribute to accommodate variations.".format(
"The number of classes set in the config is not the same as the one found in the data: %d vs %d",self.config.model['nclasses'], len(category_value2id))
- logger.warning("Auto-setting the nclasses value in config and rebuilding the model.")
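The category-name fallback in the hunk that raises "The category name does not exist in this JSON file" follows a simple rule: keep the configured name if the data contains it, otherwise switch to a listed alternative that the data does contain, otherwise fail. A minimal sketch of that rule is below. `resolve_category_name` is a hypothetical helper, the category names in the usage line are illustrative only, and raising `ValueError` instead of a bare `Exception` is a choice made for the sketch.

```python
from typing import Iterable, List


def resolve_category_name(configured: str,
                          alternatives: List[str],
                          names_in_data: Iterable[str]) -> str:
    """Hypothetical helper: pick the category name to train on.

    Keeps the configured name if the data contains it; otherwise falls back to
    the first alternative spelling that the data does contain.
    """
    available = set(names_in_data)
    if configured in available:
        return configured
    # Like the diff, take the first alternative that is actually present.
    matching = [name for name in alternatives if name in available]
    if not matching:
        raise ValueError(
            f"Category '{configured}' not found; options are {sorted(available)}. "
            "Consider listing its variations in alternative_category_names.")
    return matching[0]


# Illustrative names: the data was annotated under "Subject" rather than "Experiencer".
print(resolve_category_name("Experiencer", ["Experiencer", "Subject"], ["Presence", "Subject"]))
# Subject
```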
+         logger.info("Class name '%s' does not exist in the data; however, a variation of it '%s' is present; updating it...", _class, class_name_matched)
+     else:
+         raise Exception(
+             f"The classes set in the config are not the same as the ones found in the data. The classes present in the config vs the ones found in the data - {set(category_value2id.keys())}, {category_values}. Additionally, make sure to populate the 'alternative_class_names' attribute to accommodate variations.")
+ else:
+     raise Exception(f"The classes set in the config are not the same as the ones found in the data. The classes present in the config vs the ones found in the data - {set(category_value2id.keys())}, {category_values}. Additionally, make sure to populate the 'alternative_class_names' attribute to accommodate variations.")
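The class-name fallback above applies the same idea per class. Below is a minimal sketch, assuming `category_value2id` maps config class names to integer IDs and the labels found in the data are plain strings. `align_class_names` is a hypothetical helper, not the MetaCAT implementation: each configured class missing from the data is renamed to a variation from its `alternative_class_names` sub-list while keeping its ID, and a `ValueError` is raised when no variation is present.

```python
from typing import Dict, List


def align_class_names(category_value2id: Dict[str, int],
                      values_in_data: List[str],
                      alternative_class_names: List[List[str]]) -> Dict[str, int]:
    """Hypothetical helper: rename configured classes to the spelling used in the data.

    IDs are kept unchanged; a ValueError is raised if a configured class has
    no counterpart (exact or via alternative_class_names) in the data.
    """
    present = set(values_in_data)
    aligned: Dict[str, int] = {}
    for name, class_id in category_value2id.items():
        if name in present:
            aligned[name] = class_id
            continue
        # Find the sub-list that lists this class, then a variant the data actually uses.
        group = next((g for g in alternative_class_names if name in g), [])
        matches = [variant for variant in group if variant in present]
        if not matches:
            raise ValueError(
                f"Class '{name}' not found in the data {sorted(present)}; "
                "consider listing its variations in alternative_class_names")
        aligned[matches[0]] = class_id
    return aligned


alternative_class_names = [
    ["Hypothetical (N/A)", "Hypothetical"],
    ["Not present (False)", "False"],
    ["Present (True)", "True"],
]
config_classes = {"Hypothetical (N/A)": 0, "Not present (False)": 1, "Present (True)": 2}
print(align_class_names(config_classes, ["True", "False", "Hypothetical"], alternative_class_names))
# {'Hypothetical': 0, 'False': 1, 'True': 2}
```

Keeping the original IDs means a model trained with the config's numbering still lines up with the renamed labels.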
      # Else throw an exception since the labels don't match
+     else:
+         raise Exception(
+             f"The classes set in the config are not the same as the ones found in the data. The classes present in the config vs the ones found in the data - {set(category_value2id.keys())}, {category_values}. Additionally, make sure to populate the 'alternative_class_names' attribute to accommodate variations.")
+
+ # Else create the mapping from the labels found in the data
+ else:
+     for c in category_values:
+         if c not in category_value2id:
+             category_value2id[c] = len(category_value2id)
+     logger.info("category_value2id mapping created with labels found in the data - %s", category_value2id)
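When no mapping is configured, the last hunk assigns IDs in the order the labels are encountered. The same loop as a standalone function is sketched below; the name `build_category_value2id` is hypothetical, and copying the input dict is a choice made for the sketch so the caller's mapping is not mutated.

```python
from typing import Dict, Iterable, Optional


def build_category_value2id(category_values: Iterable[str],
                            category_value2id: Optional[Dict[str, int]] = None) -> Dict[str, int]:
    """Hypothetical helper: give each unseen label the next free integer ID."""
    mapping = dict(category_value2id or {})  # copy so the caller's dict is untouched
    for c in category_values:
        if c not in mapping:
            mapping[c] = len(mapping)
    return mapping


print(build_category_value2id(["Present", "Absent", "Present", "Hypothetical"]))
# {'Present': 0, 'Absent': 1, 'Hypothetical': 2}
```

Because IDs follow iteration order, passing a Python `set` of labels can yield different IDs across runs (string hashing is randomized per process); passing `sorted(labels)` keeps them stable. The `len(mapping)` step also assumes any pre-existing IDs are already the contiguous range 0..n-1.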