Commit bbfe5ef

committed
next 2 sections, its only 2am and i'm on a roll
1 parent 834d45c commit bbfe5ef

1 file changed

+263
-1
lines changed

_posts/2019-12-23-fixing-opcodes.md

Lines changed: 263 additions & 1 deletion
@@ -805,4 +805,266 @@ Unlikely to be the same thing but decided to humour myself and go check, and it'
805805

806806
#### Building Handler Trees
807807

808-
Coming soon™
808+
This shit is actually pretty cursed, so what I'm going to do is paste a heap of code and then make it slightly more digestible. I also don't want to spend much more time on this because it hurts the soul, and with what we have we might already be able to get somewhere somewhat reliably.
809+
810+
```python
import idaapi
import idautils
import idc
import ida_bytes
import ida_funcs
import ida_name

# EA of the jumptable in the top-level handler; set in run()
main_jumptable = None

def ea_to_rva(ea):
    return ea - idaapi.get_imagebase()

def get_bytes_str(start_ea, end_ea):
    # hex dump of [start_ea, end_ea) as space-separated bytes
    hex_bytes = []
    for ea in range(start_ea, end_ea):
        b = '{:02x}'.format(ida_bytes.get_byte(ea))
        hex_bytes.append(b)

    return ' '.join(hex_bytes)

def get_func_name(ea):
    name = ida_funcs.get_func_name(ea)
    demangled = ida_name.demangle_name(name, idc.get_inf_attr(idc.INF_LONG_DN))

    # demangle_name gives nothing back for names that aren't mangled
    return demangled or name

def postprocess_func(fn, depth = 0):
    func = {
        'ea': fn.startEA,
        'rva': ea_to_rva(fn.startEA),
        'body': get_bytes_str(fn.startEA, fn.endEA)
    }

    # look for a switch inside the callee (find_switch is from the previous section)
    switch_ea, switch = find_switch(fn.startEA)

    # skip the main jumptable, we already dump that one
    if switch and switch_ea != main_jumptable:
        sw = func['switch'] = {}

        res = idaapi.calc_switch_cases(switch_ea, switch)

        case_ids = []
        for case in res.cases:
            for i in case:
                case_ids.append(int(i))

        sw['cases'] = [i for i in set(case_ids)]

    else:
        func['switch'] = None

    return func

def process_func(func, start_ea, end_ea):
    for head in idautils.Heads(start_ea, end_ea):
        flags = idaapi.getFlags(head)
        if idaapi.isCode(flags):

            mnem = idc.GetMnem(head)

            if mnem == 'call' or mnem == 'jmp':
                op_ea = idc.GetOperandValue(head, 0)
                fn = ida_funcs.get_func(op_ea)

                if fn:
                    fn_info = postprocess_func(fn)

                    if fn_info:
                        func['calls'][get_func_name(op_ea)] = fn_info

def process_case(case, id):
    func = case['func'] = {}
    body = func['body'] = get_bytes_str(case['start_ea'], case['end_ea'])
    func['calls'] = {}

    process_func(func, case['start_ea'], case['end_ea'])

def run():
    global main_jumptable

    # [same as before]

    # find switch
    head, switch = find_switch(func_ea)

    # remember the main jumptable so postprocess_func can skip it
    main_jumptable = head

    # [also same as before]

    for k, v in enumerate(case_infos):
        process_case(v, k)
```
897+
898+
Don't say I didn't warn you. Anyway, `run()` is basically the same thing with a few minor changes.
899+
900+
* We store the EA of the jumptable inside `ZoneDownHandler` in the `main_jumptable` global so we don't dump it a second time when a case ends up referring back to the handler itself. Mainly because it's just more junk in the output that we really don't need
901+
* We loop over each `case_info` dictionary that we created before and do things...
902+
903+
... so we'll start from `process_case(...)` and go from there:
904+
905+
```python
def process_case(case, id):
    func = case['func'] = {}
    body = func['body'] = get_bytes_str(case['start_ea'], case['end_ea'])
    func['calls'] = {}

    process_func(func, case['start_ea'], case['end_ea'])
```
913+
914+
`process_case(...)` is pretty self-explanatory: it pretty much just sets up a dictionary and passes the ref through with the start and end EA of the segment of code we'll look at. We also get all the bytes of the case segment as a string, meaning this disassembly:
915+
916+
```
loc_140F6ED28:                          ; CODE XREF: Client__Network__ZoneDownHandler+46↑j
                                        ; DATA XREF: Client__Network__ZoneDownHandler:jpt_140F6ED26↓o
    xor     r8d, r8d        ; jumptable 0000000140F6ED26 case 125
    mov     rdx, rdi
    lea     ecx, [r8+8]
    mov     rbx, [rsp+58h+arg_0]
    mov     rsi, [rsp+58h+arg_10]
    add     rsp, 50h
    pop     rdi
    jmp     net__somegenericweirdshit
```
928+
929+
Becomes this in the output:
930+
931+
```json
"body":"45 33 c0 48 8b d7 41 8d 48 08 48 8b 5c 24 60 48 8b 74 24 70 48 83 c4 50 5f e9 7a 40 00 00"
```
934+
935+
Nothing too complex, but there's a possible 'improvement' with this. Currently all references to data and code are preserved as is, so if the executable is rebuilt it's very likely that some of the bytes in here will change. It's probably a good idea to replace references to data and code with wildcards, so the processing step can ignore them completely; if any of the remaining bytes then change, there's either a code change or it's not the same thing. But we can cross that bridge later.
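
For what it's worth, here's a rough sketch of what the wildcard idea could look like on the exported strings. It's plain Python with no IDA involved, the helper names (`mask_body`, `bodies_match`) are made up for this example, and the offsets to mask are hand-picked here; actually working out which bytes are references is the hard part and isn't covered.

```python
WILDCARD = '??'

def mask_body(body, ranges):
    """Replace the bytes covered by (offset, length) ranges with wildcards."""
    tokens = body.split()
    for offset, length in ranges:
        for i in range(offset, min(offset + length, len(tokens))):
            tokens[i] = WILDCARD
    return ' '.join(tokens)

def bodies_match(a, b):
    """Compare two byte strings, treating '??' on either side as 'anything'."""
    ta, tb = a.split(), b.split()
    if len(ta) != len(tb):
        return False
    return all(x == y or WILDCARD in (x, y) for x, y in zip(ta, tb))

# Tail of the case 125 body from above: add rsp, pop rdi, then jmp rel32
old_tail = '48 83 c4 50 5f e9 7a 40 00 00'
# Same instructions but with a different jump displacement
new_tail = '48 83 c4 50 5f e9 39 47 00 00'

# Hand-picked assumption: the 4 bytes after the e9 opcode are the displacement
masked = mask_body(old_tail, [(6, 4)])
print(masked)
print(bodies_match(masked, new_tail))
```

The raw strings don't compare equal, but once the displacement is masked they do, which is the whole point: whatever still differs after masking is an actual code change.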
936+
937+
Moving on...
938+
939+
```python
def process_func(func, start_ea, end_ea):
    for head in idautils.Heads(start_ea, end_ea):
        flags = idaapi.getFlags(head)
        if idaapi.isCode(flags):

            mnem = idc.GetMnem(head)

            if mnem == 'call' or mnem == 'jmp':
                op_ea = idc.GetOperandValue(head, 0)
                fn = ida_funcs.get_func(op_ea)

                if fn:
                    fn_info = postprocess_func(fn)

                    if fn_info:
                        func['calls'][get_func_name(op_ea)] = fn_info
```
957+
958+
This is where it starts getting fucked. So, again, this is how it goes:
959+
960+
1. Loop over every instruction in the range `start_ea ... end_ea`
961+
2. Check if it's actually code, though the check is probably redundant in this case and I think it's something I left in from before, it's all a blur now
962+
3. Get the mnemonic by name and check if it's a `call` or `jmp` instruction
963+
4. If it is, we get the first operand value, or the instruction's parameter -- in this case it should be the EA of a function
964+
5. Call `get_func` on it and check if it actually is a function -- it returns `None` if it's not
965+
6. Do more shit with that function (see below)
966+
7. Store the result in the dictionary keyed by the function name
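
If you want to poke at that logic without firing up IDA, the steps above boil down to something like the sketch below. Everything in it is a stand-in: the listing is a hand-written list of tuples instead of real disassembly, the addresses and names are made up, and the function lookup is an exact-match dictionary (the real `get_func` also resolves addresses in the middle of a function).

```python
# Stand-in for the disassembly: (ea, kind, mnemonic, first operand value).
# All addresses and names here are made up for illustration.
LISTING = [
    (0x1000, 'code', 'mov', 0),
    (0x1003, 'code', 'call', 0x2000),
    (0x1008, 'data', '', 0),
    (0x100c, 'code', 'jmp', 0x3000),
    (0x1011, 'code', 'jmp', 0x1234),  # target isn't a known function
]
FUNCS = {0x2000: 'sub_2000', 0x3000: 'sub_3000'}

def collect_calls(listing, funcs):
    calls = {}
    for ea, kind, mnem, operand in listing:    # 1. walk every item
        if kind != 'code':                     # 2. skip anything that isn't code
            continue
        if mnem not in ('call', 'jmp'):        # 3. only calls and jumps
            continue
        name = funcs.get(operand)              # 4 + 5. operand -> function, if any
        if name is None:
            continue
        calls[name] = {'ea': operand}          # 6 + 7. build info, key by name
    return calls

print(collect_calls(LISTING, FUNCS))
```

Note that keying by name means two calls to the same function collapse into one entry, which is also how the real thing behaves.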
967+
968+
Not totally indigestible, but it's pretty gnarly. So let's make it even worse and check out `postprocess_func`!
969+
970+
```python
def postprocess_func(fn, depth = 0):
    func = {
        'ea': fn.startEA,
        'rva': ea_to_rva(fn.startEA),
        'body': get_bytes_str(fn.startEA, fn.endEA)
    }

    # look for a switch inside the callee
    switch_ea, switch = find_switch(fn.startEA)

    # skip the main jumptable, we already dump that one
    if switch and switch_ea != main_jumptable:
        sw = func['switch'] = {}

        res = idaapi.calc_switch_cases(switch_ea, switch)

        case_ids = []
        for case in res.cases:
            for i in case:
                case_ids.append(int(i))

        sw['cases'] = [i for i in set(case_ids)]

    else:
        func['switch'] = None

    return func
```
998+
999+
There's not anything 'new' here but it's pretty gross nonetheless. For the most part though, this is simply an isolated function where we can do everything later without being trapped in 60 layers of indentation. Check if we have a switch in the function, if we do, grab some info about it and then attach it to the `func` dictionary.
1000+
1001+
Something we could do here is grab the bytes of each case in the nested switches so we can tell nested switches apart as well, but we'll come back to this later. I don't want to be battling this stupid shit without the easier stuff working properly first.
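
As a first step in that direction, here's a small sketch that keeps track of which case IDs share a target instead of flattening everything into one list. It runs on plain lists and assumes the result of `calc_switch_cases` exposes `cases` and `targets` as parallel sequences, which is my understanding of it but worth checking against your IDA version. The helper name and the sample addresses are made up.

```python
def group_cases_by_target(cases, targets):
    """Map each jump target to the sorted case IDs that land on it."""
    grouped = {}
    for case_ids, target in zip(cases, targets):
        grouped.setdefault(target, set()).update(int(i) for i in case_ids)
    return {target: sorted(ids) for target, ids in grouped.items()}

# Made-up data shaped like res.cases / res.targets
cases = [[1, 2], [7], [3]]
targets = [0x140001000, 0x140001020, 0x140001000]

print(group_cases_by_target(cases, targets))
```

With the targets in hand you'd still need to decide where each case ends before you can dump its bytes, so this only gets you halfway.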
1002+
1003+
#### I Can't Believe That Writing JSON to the Clipboard Deserves Its Own Section
1004+
1005+
Now we'll export all this garbage and throw it into the clipboard so you can do things with it. Luckily this is actually pretty easy:
1006+
1007+
```python
import json

from PyQt5.Qt import QApplication

def set_clipboard(data):
    QApplication.clipboard().setText(data)

def set_clipboard_json(data):
    set_clipboard(json.dumps(data, indent=2, separators=(',', ':')))
    # log() is assumed to be defined earlier in the script
    log('copied parsed data to clipboard')
```
1017+
1018+
[Wow](https://www.youtube.com/watch?v=TRIwAHX3aHM). At the end of `run()`, just insert `set_clipboard_json(output)` and away you go. You'll get something like this, or maybe better if you're smarter than I am:
1019+
1020+
```json
{
  "rva":16182568,
  "func":{
    "body":"45 33 c0 48 8b d7 41 8d 48 08 48 8b 5c 24 60 48 8b 74 24 70 48 83 c4 50 5f e9 7a 40 00 00",
    "calls":{
      "net::somegenericweirdshit":{
        "body":"48 89 5c 24 08 48 89 74 24 10 57 48 83 ec 20 8b f1 41 8b d8 48 8b 0d 4d 87 b9 00 48 8b fa e8 8d ce 11 ff 48 85 c0 74 15 4c 8b 10 4c 8b cf 44 8b c3 8b d6 48 8b c8 41 ff 92 90 02 00 00 48 8b 5c 24 30 48 8b 74 24 38 48 83 c4 20 5f c3",
        "rva":16199104,
        "ea":5384908224,
        "switch":null
      }
    }
  },
  "rel_ea":72,
  "opcodes":[
    125
  ],
  "start_ea":5384891688,
  "end_ea":5384891718,
  "size":30
}
```
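
One thing that might be worth doing if you plan on diffing these dumps: the key order in the output above is whatever the dictionary happened to give us. A minimal sketch of a more diff-friendly dump, assuming nothing beyond the standard `json` module (`to_stable_json` is a made-up name):

```python
import json

def to_stable_json(data):
    # sort_keys makes the key order deterministic, so two dumps of the
    # same data come out byte-for-byte identical
    return json.dumps(data, indent=2, separators=(',', ':'), sort_keys=True)

# Same content, built in a different order
a = {'rva': 16182568, 'size': 30, 'opcodes': [125]}
b = {'opcodes': [125], 'size': 30, 'rva': 16182568}

print(to_stable_json(a) == to_stable_json(b))
```

Swapping this in for the `json.dumps(...)` call in `set_clipboard_json` should be all it takes.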
1043+
1044+
And just to compare, here's the 5.15 equivalent:
1045+
1046+
```json
{
  "rva":16818872,
  "func":{
    "body":"b9 08 00 00 00 48 8b d7 45 33 c0 48 8b 5c 24 60 48 8b 74 24 70 48 83 c4 50 5f e9 39 47 00 00",
    "calls":{
      "sub_14100EA10":{
        "body":"48 89 5c 24 08 48 89 74 24 10 57 48 83 ec 20 8b f1 41 8b d8 48 8b 0d 7d 3c bd 00 48 8b fa e8 ed 16 08 ff 48 85 c0 74 15 4c 8b 10 4c 8b cf 44 8b c3 8b d6 48 8b c8 41 ff 92 a0 02 00 00 48 8b 5c 24 30 48 8b 74 24 38 48 83 c4 20 5f c3",
        "rva":16837136,
        "ea":5385546256,
        "switch":null
      }
    }
  },
  "rel_ea":72,
  "opcodes":[
    893
  ],
  "start_ea":5385527992,
  "end_ea":5385528023,
  "size":31
}
```
1069+
1070+
As mentioned already, there are a few ways this can be improved, but for now this should work as a proof of concept.
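
To get a feel for how close the two dumps really are, here's a quick and dirty comparison of the bodies from the two outputs above. It's only a sketch: `diff_offsets` and `similarity` are names I made up, and using `difflib` as a similarity score is just one option, not a claim that it's the right metric for this.

```python
import difflib

def diff_offsets(a, b):
    """Offsets at which two equal-length byte strings differ."""
    ta, tb = a.split(), b.split()
    if len(ta) != len(tb):
        raise ValueError('bodies have different lengths')
    return [i for i, (x, y) in enumerate(zip(ta, tb)) if x != y]

def similarity(a, b):
    """Rough 0..1 score that also copes with bodies of different lengths."""
    return difflib.SequenceMatcher(None, a.split(), b.split()).ratio()

# Callee bodies from the two dumps above (same length in both versions)
callee_old = ("48 89 5c 24 08 48 89 74 24 10 57 48 83 ec 20 8b f1 41 8b d8 "
              "48 8b 0d 4d 87 b9 00 48 8b fa e8 8d ce 11 ff 48 85 c0 74 15 "
              "4c 8b 10 4c 8b cf 44 8b c3 8b d6 48 8b c8 41 ff 92 90 02 00 00 "
              "48 8b 5c 24 30 48 8b 74 24 38 48 83 c4 20 5f c3")
callee_new = ("48 89 5c 24 08 48 89 74 24 10 57 48 83 ec 20 8b f1 41 8b d8 "
              "48 8b 0d 7d 3c bd 00 48 8b fa e8 ed 16 08 ff 48 85 c0 74 15 "
              "4c 8b 10 4c 8b cf 44 8b c3 8b d6 48 8b c8 41 ff 92 a0 02 00 00 "
              "48 8b 5c 24 30 48 8b 74 24 38 48 83 c4 20 5f c3")

# Case bodies from the two dumps above (30 vs 31 bytes)
case_old = ("45 33 c0 48 8b d7 41 8d 48 08 48 8b 5c 24 60 48 8b 74 24 70 "
            "48 83 c4 50 5f e9 7a 40 00 00")
case_new = ("b9 08 00 00 00 48 8b d7 45 33 c0 48 8b 5c 24 60 48 8b 74 24 70 "
            "48 83 c4 50 5f e9 39 47 00 00")

print(diff_offsets(callee_old, callee_new))
print(similarity(case_old, case_new))
```

For the callee, the differences land in what look like two 4-byte relative operands plus a single byte inside the `41 ff 92 ...` indirect call, so a wildcard pass would deal with most of it but not all. The case bodies aren't even the same length, so anything offset-based is out for those and you need something fuzzier.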
