Fix Multiple Artifacts Processing Errors#1400
Fix Multiple Artifacts Processing Errors#1400michaelgrace72 wants to merge 5 commits intoabrignoni:mainfrom
Conversation
…checking and validation
There was a problem hiding this comment.
Pull request overview
This PR fixes 5 critical errors that caused iOS data extraction artifact processing to fail, improving the robustness of the code through defensive programming techniques.
Summary: The changes add proper error handling, type checking, and bounds checking to prevent runtime errors during iOS artifact data extraction. All fixes address specific crash scenarios encountered with iOS 15 Public Image data.
Key Changes:
- Added comprehensive bounds checking for database record access to prevent IndexError exceptions
- Implemented type validation and safe dictionary access for CloudKit participant data structures
- Fixed duplicate column header issue in Chrome login data processing
- Added table existence verification before querying associations table
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 7 comments.
Show a summary per file
| File | Description |
|---|---|
| scripts/artifacts/health.py | Added bounds checking for all 28 record indices to handle varying database schemas |
| scripts/artifacts/cloudkitSharing.py | Implemented type checking and safe dictionary access for participant data to handle string/dict type variations |
| scripts/artifacts/cloudkitParticipants.py | Added similar type validation and error handling for participant data structures |
| scripts/artifacts/chrome.py | Removed duplicate 'Browser Name' column addition to fix column header error |
| scripts/artifacts/FitnessWorkoutsLocationData.py | Added associations table existence check to prevent query errors |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| data_list.append( | ||
| (start_timestamp, end_timestamp, str(record[2]).title(), record[3], | ||
| record[4], record[5], record[6], record[7], record[8], record[9], | ||
| record[10], record[11], record[12], record[13], record[14], | ||
| record[15], celcius_temp, record[16], record[17], record[18], | ||
| record[19], record[20], record[21], record[22], device_model, | ||
| record[23], record[24], record[25], added_timestamp) | ||
| (start_timestamp, end_timestamp, str(record[2]).title() if len(record) > 2 else '', | ||
| record[3] if len(record) > 3 else '', record[4] if len(record) > 4 else '', | ||
| record[5] if len(record) > 5 else '', record[6] if len(record) > 6 else '', | ||
| record[7] if len(record) > 7 else '', record[8] if len(record) > 8 else '', | ||
| record[9] if len(record) > 9 else '', record[10] if len(record) > 10 else '', | ||
| record[11] if len(record) > 11 else '', record[12] if len(record) > 12 else '', | ||
| record[13] if len(record) > 13 else '', record[14] if len(record) > 14 else '', | ||
| record[15] if len(record) > 15 else '', celcius_temp, | ||
| record[16] if len(record) > 16 else '', record[17] if len(record) > 17 else '', | ||
| record[18] if len(record) > 18 else '', record[19] if len(record) > 19 else '', | ||
| record[20] if len(record) > 20 else '', record[21] if len(record) > 21 else '', | ||
| record[22] if len(record) > 22 else '', device_model, | ||
| record[23] if len(record) > 23 else '', record[24] if len(record) > 24 else '', | ||
| record[25] if len(record) > 25 else '', added_timestamp) | ||
| ) |
There was a problem hiding this comment.
[nitpick] The tuple construction spans multiple lines without proper formatting. The opening parenthesis on line 546 starts the tuple, but line 547 begins the first element without proper indentation alignment. This makes the code harder to read and maintain.
Consider formatting it as:
data_list.append((
start_timestamp, end_timestamp, str(record[2]).title() if len(record) > 2 else '',
record[3] if len(record) > 3 else '', record[4] if len(record) > 4 else '',
# ... rest of elements
record[25] if len(record) > 25 else '', added_timestamp
))| (start_timestamp, end_timestamp, str(record[2]).title() if len(record) > 2 else '', | ||
| record[3] if len(record) > 3 else '', record[4] if len(record) > 4 else '', | ||
| record[5] if len(record) > 5 else '', record[6] if len(record) > 6 else '', | ||
| record[7] if len(record) > 7 else '', record[8] if len(record) > 8 else '', | ||
| record[9] if len(record) > 9 else '', record[10] if len(record) > 10 else '', | ||
| record[11] if len(record) > 11 else '', record[12] if len(record) > 12 else '', | ||
| record[13] if len(record) > 13 else '', record[14] if len(record) > 14 else '', | ||
| record[15] if len(record) > 15 else '', celcius_temp, | ||
| record[16] if len(record) > 16 else '', record[17] if len(record) > 17 else '', | ||
| record[18] if len(record) > 18 else '', record[19] if len(record) > 19 else '', | ||
| record[20] if len(record) > 20 else '', record[21] if len(record) > 21 else '', | ||
| record[22] if len(record) > 22 else '', device_model, | ||
| record[23] if len(record) > 23 else '', record[24] if len(record) > 24 else '', | ||
| record[25] if len(record) > 25 else '', added_timestamp) |
There was a problem hiding this comment.
The bounds checking logic is inconsistent with DRY (Don't Repeat Yourself) principle. Each index is checked individually with len(record) > N, resulting in significant code duplication across 26+ checks. This approach is error-prone and makes the code harder to maintain.
Consider refactoring to use a helper function or list comprehension:
def safe_get(record, index, default=''):
return record[index] if len(record) > index else default
# Then use: safe_get(record, 3), safe_get(record, 4), etc.Or pad the record to expected length:
# Pad record to expected length
padded_record = list(record) + [''] * (28 - len(record))| if record_id: # Only add valid entries | ||
| user_dictionary[record_id] = [record_id, email_address, phone_number, name_prefix, first_name, middle_name, last_name, name_suffix, nickname] | ||
| except (KeyError, TypeError, AttributeError) as e: | ||
| logfunc(f'Error processing participant data: {str(e)}') |
There was a problem hiding this comment.
The error logging uses an f-string with str(e) which is redundant. When an exception is formatted in an f-string, it's automatically converted to a string.
Change:
logfunc(f'Error processing participant data: {str(e)}')To:
logfunc(f'Error processing participant data: {e}')| logfunc(f'Error processing participant data: {str(e)}') | |
| logfunc(f'Error processing participant data: {e}') |
| if record_id: | ||
| user_dictionary[record_id] = [record_id, email_address, phone_number, name_prefix, first_name, middle_name, last_name, name_suffix, nickname] | ||
| except (KeyError, TypeError, AttributeError) as e: | ||
| logfunc(f'Error processing participant data: {str(e)}') |
There was a problem hiding this comment.
The error logging uses an f-string with str(e) which is redundant. When an exception is formatted in an f-string, it's automatically converted to a string.
Change:
logfunc(f'Error processing participant data: {str(e)}')To:
logfunc(f'Error processing participant data: {e}')| logfunc(f'Error processing participant data: {str(e)}') | |
| logfunc(f'Error processing participant data: {e}') |
| if not isinstance(participant, dict) or 'UserIdentity' not in participant: | ||
| continue | ||
|
|
||
| # must be dict and have UserIdentity | ||
| user_identity = participant.get('UserIdentity', {}) | ||
| if not isinstance(user_identity, dict): | ||
| continue | ||
|
|
||
| user_record_id = user_identity.get('UserRecordID', {}) | ||
| if not isinstance(user_record_id, dict): | ||
| continue | ||
|
|
||
| record_id = user_record_id.get('RecordName', '') | ||
| lookup_info = user_identity.get('LookupInfo', {}) | ||
| email_address = lookup_info.get('EmailAddress', '') if isinstance(lookup_info, dict) else '' | ||
| phone_number = lookup_info.get('PhoneNumber', '') if isinstance(lookup_info, dict) else '' | ||
|
|
||
| name_components = user_identity.get('NameComponents', {}) | ||
| name_private = name_components.get('NS.nameComponentsPrivate', {}) if isinstance(name_components, dict) else {} | ||
| first_name = name_private.get('NS.givenName', '') if isinstance(name_private, dict) else '' | ||
| middle_name = name_private.get('NS.middleName', '') if isinstance(name_private, dict) else '' | ||
| last_name = name_private.get('NS.familyName', '') if isinstance(name_private, dict) else '' | ||
| name_prefix = name_private.get('NS.namePrefix', '') if isinstance(name_private, dict) else '' | ||
| name_suffix = name_private.get('NS.nameSuffix', '') if isinstance(name_private, dict) else '' | ||
| nickname = name_private.get('NS.nickname', '') if isinstance(name_private, dict) else '' |
There was a problem hiding this comment.
[nitpick] The nested type checks are repetitive and could be simplified. Each .get() call is followed by an isinstance() check with early return, which adds unnecessary verbosity.
Consider using a more Pythonic approach with try-except or simplifying the nested checks:
try:
user_identity = participant.get('UserIdentity', {})
record_id = user_identity.get('UserRecordID', {}).get('RecordName', '')
lookup_info = user_identity.get('LookupInfo', {})
# ... continue with safe gets
except (AttributeError, TypeError):
continueThis is more maintainable and follows Python's "easier to ask for forgiveness than permission" (EAFP) principle.
| if not isinstance(participant, dict) or 'UserIdentity' not in participant: | |
| continue | |
| # must be dict and have UserIdentity | |
| user_identity = participant.get('UserIdentity', {}) | |
| if not isinstance(user_identity, dict): | |
| continue | |
| user_record_id = user_identity.get('UserRecordID', {}) | |
| if not isinstance(user_record_id, dict): | |
| continue | |
| record_id = user_record_id.get('RecordName', '') | |
| lookup_info = user_identity.get('LookupInfo', {}) | |
| email_address = lookup_info.get('EmailAddress', '') if isinstance(lookup_info, dict) else '' | |
| phone_number = lookup_info.get('PhoneNumber', '') if isinstance(lookup_info, dict) else '' | |
| name_components = user_identity.get('NameComponents', {}) | |
| name_private = name_components.get('NS.nameComponentsPrivate', {}) if isinstance(name_components, dict) else {} | |
| first_name = name_private.get('NS.givenName', '') if isinstance(name_private, dict) else '' | |
| middle_name = name_private.get('NS.middleName', '') if isinstance(name_private, dict) else '' | |
| last_name = name_private.get('NS.familyName', '') if isinstance(name_private, dict) else '' | |
| name_prefix = name_private.get('NS.namePrefix', '') if isinstance(name_private, dict) else '' | |
| name_suffix = name_private.get('NS.nameSuffix', '') if isinstance(name_private, dict) else '' | |
| nickname = name_private.get('NS.nickname', '') if isinstance(name_private, dict) else '' | |
| user_identity = participant.get('UserIdentity', {}) | |
| user_record_id = user_identity.get('UserRecordID', {}) | |
| record_id = user_record_id.get('RecordName', '') | |
| lookup_info = user_identity.get('LookupInfo', {}) | |
| email_address = lookup_info.get('EmailAddress', '') | |
| phone_number = lookup_info.get('PhoneNumber', '') | |
| name_components = user_identity.get('NameComponents', {}) | |
| name_private = name_components.get('NS.nameComponentsPrivate', {}) | |
| first_name = name_private.get('NS.givenName', '') | |
| middle_name = name_private.get('NS.middleName', '') | |
| last_name = name_private.get('NS.familyName', '') | |
| name_prefix = name_private.get('NS.namePrefix', '') | |
| name_suffix = name_private.get('NS.nameSuffix', '') | |
| nickname = name_private.get('NS.nickname', '') |
| cursor = db.cursor() | ||
|
|
||
| if does_table_exist_in_db(healthdb_secure, 'location_series_data') == True: | ||
| if does_table_exist_in_db(healthdb_secure, 'location_series_data') == True and does_table_exist_in_db(healthdb_secure, 'associations') == True: |
There was a problem hiding this comment.
The boolean comparison == True is redundant. In Python, boolean values can be used directly in conditional statements.
Change:
if does_table_exist_in_db(healthdb_secure, 'location_series_data') == True and does_table_exist_in_db(healthdb_secure, 'associations') == True:To:
if does_table_exist_in_db(healthdb_secure, 'location_series_data') and does_table_exist_in_db(healthdb_secure, 'associations'):| if does_table_exist_in_db(healthdb_secure, 'location_series_data') == True and does_table_exist_in_db(healthdb_secure, 'associations') == True: | |
| if does_table_exist_in_db(healthdb_secure, 'location_series_data') and does_table_exist_in_db(healthdb_secure, 'associations'): |
| if does_table_exist_in_db(healthdb_secure, 'location_series_data') == True: | ||
| if does_table_exist_in_db(healthdb_secure, 'location_series_data') == True and does_table_exist_in_db(healthdb_secure, 'associations') == True: | ||
|
|
||
| # Fitness Workouts Location Data Analysis |
There was a problem hiding this comment.
The comment on line 33 is not properly indented. It should be indented to match the block it belongs to (inside the if statement).
Current:
if does_table_exist_in_db(...):
# Fitness Workouts Location Data AnalysisShould be:
if does_table_exist_in_db(...):
# Fitness Workouts Location Data Analysis| # Fitness Workouts Location Data Analysis | |
| # Fitness Workouts Location Data Analysis |
Summary
This PR fixes 5 errors that caused artifact processing to fail during iOS data extraction. All fixes have been tested successfully with iOS 15 Public Image
Artifacts Fixed
chromeLoginData - Duplicate Column Error
The all_data_headers was adding 'Browser Name' even though it already exist in data_header.
Before:
all_data_headers = lava_data_headers + ['Browser Name']After:
all_data_headers = lava_data_headerscloudkitParticipants - Type Error
Code assumed participant['UserIdentity'] was always a dictionary, but it could be a string
Fix: Added type checking and safe dictionary access
cloudkitSharing - Type Error
Same issue as cloudkitParticipants - participant data structure varied.
FitnessWorkoutsLocationData - Missing Table
Code queried associations table without checking if it exists.
Fix: Added table existence check
health - Index Out of Range
Query returns varying column counts depending on database schema, but code assumed all indices exist.
Fix: Added bound checking