Description
I tested event lUwwjfR.json, where the model's raw output for the administrative areas is "Administrative_Areas": [
"Philippines",
"Taiwan",
"China"
],
However, the parse_events.py output is:
['Tukchenzam', 'Valley County, Nebraska, United States', 'Taiwan', 'Philippines', 'Aksai Chin', 'China']
I added logging to parse_events.py for the steps below:
- In the Admin_Area_Norm step, the output is fine:
parse_events: 2025-09-18 11:49:11 INFO Ensuring that all admin area data in Administrative_Areas is of type <list>
parse_events: 2025-09-18 11:49:11 INFO Normalizing administrative areas...
parse_events: 2025-09-18 11:49:11 INFO Processing area: Philippines
parse_events: 2025-09-18 11:49:11 INFO Processing area: Taiwan
parse_events: 2025-09-18 11:49:11 INFO Processing area: China
- In the step that infers countries from the list of locations, extra areas appear:
parse_events: 2025-09-18 11:49:11 INFO STEP: Infer country from list of locations
parse_events: 2025-09-18 11:49:11 INFO Getting GID from GADM for Administrative Areas
parse_events: 2025-09-18 11:49:12 INFO STEP: Infer country result ['Philippines', 'Pa-li-chia-ssu', 'Taiwan', 'Shaksgam Valley', 'China', 'Aksai Chin']
parse_events: 2025-09-18 11:49:12 INFO Processing GID area: Philippines
parse_events: 2025-09-18 11:49:12 INFO Processing GID area: Pa-li-chia-ssu
parse_events: 2025-09-18 11:49:12 INFO Processing GID area: Taiwan
parse_events: 2025-09-18 11:49:12 INFO Processing GID area: Shaksgam Valley
parse_events: 2025-09-18 11:49:12 INFO Processing GID area: China
parse_events: 2025-09-18 11:49:12 INFO Processing GID area: Aksai Chin
This happens because China has four GIDs: ['Z03', 'CHN', 'Z08', 'Z02']. Apart from CHN (mainland China), the other GIDs cover disputed areas, e.g. along the border with India.
So I updated the get_gid_0 function in normalize_locations.py so that only purely alphabetic GIDs (checked with isalpha) are allowed in the country-inference step.
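The filtering idea can be sketched as follows. This is a minimal illustration, not the actual code in normalize_locations.py: it assumes the GIDs come back as plain GID_0 strings such as 'CHN' or 'Z03', and the helper name `keep_country_gids` is hypothetical.

```python
from typing import Iterable, List


def keep_country_gids(gids: Iterable[str]) -> List[str]:
    """Keep only purely alphabetic GID_0 codes (e.g. 'CHN').

    Hypothetical helper: assumes that GIDs containing digits
    (e.g. 'Z02', 'Z03', 'Z08') are disputed/conflict areas rather
    than countries, so they are dropped before country inference.
    """
    # str.isalpha() is False for any code containing a digit.
    return [gid for gid in gids if gid.isalpha()]


# China's GIDs as reported in this issue; only the mainland code survives.
china_gids = ["Z03", "CHN", "Z08", "Z02"]
print(keep_country_gids(china_gids))  # ['CHN']
```

This check relies on the observation in this issue that every non-country GID contains a digit. If an alphabetic code were ever assigned to a disputed area, it would pass through this filter unchanged.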
Then I tested event OzSN6a4.json, where the model's raw output is "Administrative_Areas": [
"Bangladesh",
"India",
"Sri Lanka",
"Yemen",
"Pakistan",
"Vietnam",
"Thailand",
"Burma",
"Nepal"
],
Before the change, the parsed output was ['San Francisco, California, United States', 'Pakistan', 'Sri Lanka', 'Jammu and Kashmir', 'Kaurik', 'Vietnam', 'Bangladesh', 'Arunachal Pradesh', 'India', 'Azad Kashmir', 'Myanmar', 'Nepal', 'Yemen', 'Thailand', 'Lapthal post'], since India also has several GIDs apart from the mainland.
With the change, the output is now ['Thailand', 'Sri Lanka', 'Bangladesh', 'Yemen', 'Vietnam', 'Myanmar', 'India', 'Nepal', 'Pakistan'].
I checked the GIDs that contain digits: there are 8 in total, none of them are countries, and all of them are conflict areas.
