memory optimization
Given the memory problems we have detected, one possible optimization is to drop each document from memory as soon as it has been processed, in the loop that walks over the documents:

    for index in range(num_docs):
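
A minimal sketch of the idea, not the actual formatter code: `format_and_release` and `format_one` are hypothetical names, and it assumes the caller no longer needs the `docs` list afterwards. The list is reversed once so that each document can be taken off the end with `pop()`, which is cheap and removes the list's reference to it.

```python
def format_and_release(docs, format_one):
    """Format every document, releasing each one right after it is processed.

    `docs` is consumed (emptied) in place; `format_one` is a hypothetical
    callable that turns one document into a string.
    """
    docs.reverse()              # so pop() from the end yields the original order
    parts = []
    while docs:
        doc = docs.pop()        # O(1): the list no longer references this doc
        parts.append(format_one(doc))
        # once `doc` is rebound on the next iteration, the document can be
        # garbage-collected (provided nothing else still references it)
    return "".join(parts)


if __name__ == "__main__":
    documents = [{"bibcode": "2020A"}, {"bibcode": "2021B"}]
    print(format_and_release(documents, lambda d: d["bibcode"] + "\n"))
```

This only helps if no other reference to the documents is kept alive elsewhere (for example, the original response object).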
other optimizations
This part is quadratic. It also makes the Python list work extra hard by calling list.pop(): every pop from anywhere but the end of a list forces Python to shift the remaining elements.
https://github.com/adsabs/export_service/blob/master/exportsrv/utils.py#L92
For better results:
1. Turn the docs into a dict d, keyed by bibcode.
2. Then do:

    new_docs = []
    for bibcode in bibcodes:
        if bibcode in d:
            new_docs.append(d.pop(bibcode))
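
Put together, the two steps might look like the sketch below. It assumes each doc is a dict with a "bibcode" key; `order_docs` is a hypothetical name, not the function in utils.py. Building the dict is one pass over the docs, and each lookup and `dict.pop` is O(1) on average, so the whole thing is linear instead of quadratic.

```python
def order_docs(docs, bibcodes):
    """Return the docs in the order given by `bibcodes`.

    Assumes every doc is a dict carrying a "bibcode" key (an assumption
    made for this sketch). Bibcodes with no matching doc are skipped.
    """
    # step 1: index the docs by bibcode (single pass)
    d = {doc["bibcode"]: doc for doc in docs}

    # step 2: pull the docs out in the requested order
    new_docs = []
    for bibcode in bibcodes:
        if bibcode in d:
            # dict.pop is O(1) on average, unlike list.pop(i)
            new_docs.append(d.pop(bibcode))
    return new_docs


if __name__ == "__main__":
    docs = [{"bibcode": "B"}, {"bibcode": "A"}, {"bibcode": "C"}]
    print(order_docs(docs, ["A", "X", "B"]))
```

Because each doc is popped from the dict, a bibcode that appears twice in `bibcodes` is only returned once.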
This is another quadratic issue (as are all the similar cases).
In Python, strings are immutable, so a string is generally copied every time += is used. That is a problem here because export builds large textual output, so every added string gets more expensive than the last.
https://github.com/adsabs/export_service/blob/master/exportsrv/formatter/bibTexFormat.py#L262
https://github.com/adsabs/export_service/blob/master/exportsrv/formatter/bibTexFormat.py#L522
It is better to keep appending to a list and then return ''.join(list).
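
A small sketch of the list-and-join pattern. `build_output` and its `entries` argument are hypothetical and stand in for whatever the formatter loops over.

```python
def build_output(entries):
    """Concatenate many strings with one final join instead of repeated +=."""
    parts = []
    for entry in entries:
        parts.append(entry)     # appending to a list is amortized O(1)
        parts.append("\n\n")    # blank line between entries
    # a single join copies each piece once into the final string
    return "".join(parts)


if __name__ == "__main__":
    print(build_output(["@ARTICLE{one}", "@ARTICLE{two}"]))
```

The same change applies to every place that builds the result with `result += ...` inside a loop.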
Code reference for the memory optimization above: export_service/exportsrv/formatter/fieldedFormat.py, line 621 in 9eab937