-
Notifications
You must be signed in to change notification settings - Fork 230
Description
A few months ago we got issue #1306 that called our attention to a couple of artifacts that were doing a big upfront memory allocation (cursor.fetchall() over a query) and in their specific use case that resulted in an out-of-memory crash. The proposed solution was to change the code style to avoid calling cursor.fetchall(), as cursors can be iterated upon, and used as generators instead, which is memory efficient as values are produced only as needed.
Digging a bit deeper into the issue, rather than just changing the specific artifacts that were identified as problematic, I've been thinking to instead change the ilapfuncs.get_sqlite_db_records() function to be a generator instead. Doing so, every artifact that uses that function will reduce upfront memory allocation (maybe they'll still generate a large list of values instead, but it'll be only once instead of twice as it's happening right now).
I've been reviewing all the instances where ilapfuncs.get_sqlite_db_records() is called, and the majority of uses either do:
- An immediate iteration over the results, in which case the generator will work as-is
- Return the values as the data_list return value that the
ilapfuncs.artifact_processordecorator expects, in which case we'd have to convert the generator to a list in-place
There are a few artifacts that will need further work though:
- iLEAPP/scripts/artifacts/filesApp.py is problematic on several cases because it checks data_list (the result of calling
get_sqlite_db_records()) for truthiness, and generators are always True (even if depleted or would give empty results) - iLEAPP/scripts/artifacts/foursquareSwarm.py calls
len()over db_records (where the results of calling the function are stored), but generators don't have a length. This is done to initialize the HTML report in the case of having data, so we could change the logic to first iterate over db_records, and then if we have successfully built a data_list, we can then initialize the HTML report. - iLEAPP/scripts/artifacts/potatoChat.py in several instances (lines 130-136) gets the first element returned by the function via bracket indexing
[0], which can be replaced withnext(generator). If the generator is empty, that's raise a StopIteration exception, but as it is now, the query returns empty that raises an IndexError, so we'd be in a similar situation as we are now. Similar issues happen later on in the file in two more places later on. - In the same artifact,
media_records(assigned in line 193) is consumed inside a for-loop (line 351) which would require it to be converted to a list, because the generator would be exhausted the first time it's looped over. Something similar happens withconv_records(assigned line 195, consumed in line 220) andu_cid_records(assigned in line 196, consumed in line 226). Similar issues happen later on in the file in two more places later on. - iLEAPP/scripts/artifacts/viber.py uses the same
if db_records:idiom discussed for filesApp on line 439. We would have to think a way to rearrange the logic in this artifact. - iLEAPP/scripts/artifacts/whatsApp.py on line 243 does boolean testing over the results of the function, and then accesses the first element by index
[0].
Those are all the issues that would block us from changing ilapfuncs.get_sqlite_db_records() into a generator. I'm open to discussion, but in general I think it'd be a move in the right direction.