Commit e8aa985

committed
apparently I can write actually comprehensible shit at 3am
1 parent bbfe5ef commit e8aa985

_posts/2019-12-23-fixing-opcodes.md

Lines changed: 255 additions & 1 deletion
@@ -1067,4 +1067,258 @@ And just to compare, here's the 5.15 equivalent:
}
```

As mentioned already, there's a few ways this can be improved but for now this should work as a proof of concept.

#### Automating Client Opcode Correction

This is the part where it gets spicy. But first, a quick refresher on a couple of things we learnt a while back:

* The order of switch cases is mostly preserved
* Switch case code doesn't change much (or at all)

For a quick proof of concept, we'll only lean on the first one for now and see if we can get something usable. Make sure you've got the JSON that the script spits out saved somewhere. I'm gonna do the thing where I post a bunch of ~~surprisingly~~ working code and then go through it.

```python
import json
import sys

import requests
import CppHeaderParser

#### config/settings/garbage

fucked_distance = 0xffffffff
max_size_diff = 10

#### end config shit

if len(sys.argv) != 3:
    print('missing args: [old exe schema] [new exe schema]')
    sys.exit(1)

with open(sys.argv[1]) as f:
    old_schema = json.load(f)

with open(sys.argv[2]) as f:
    new_schema = json.load(f)

# print revs
print('old client rev: %s' % old_schema['clean_rev'])
print('new client rev: %s' % new_schema['clean_rev'])

# parsed symbol header, stays None if there's no ipcs_file or the download fails
header = None

# fetch name hinting file for old rev
if old_schema.get('ipcs_file'):
    print('have ipcs_file in client schema, downloading symbols: %s' % old_schema['ipcs_file'])
    ipcs_data = requests.get(old_schema['ipcs_file'])

    if ipcs_data.status_code == 200:
        header = CppHeaderParser.CppHeader(ipcs_data.text, argType="string")

opcodes_found = []

# blank line for readability
print()

def get_opcode_by_val(enum_name, opcode):
    # no symbols loaded, nothing to look up
    if header is None:
        return "Unknown"

    for enum in header.enums:
        if enum['name'] == enum_name:
            # find enum value
            for val in enum['values']:
                if val['value'] == opcode:
                    return val['name']

    return "Unknown"

def find_close_numeric(objs, dest, getter):
    closest = fucked_distance
    closest_obj = None

    for obj in objs:
        val = getter(obj)

        num = abs(val - dest)

        if num < closest:
            closest = num
            closest_obj = obj

    return (closest, closest_obj)

def find_in(objs, expr):
    for obj in objs:
        if expr(obj):
            return obj

    return None

def get_opcodes_str(opcodes):
    return ', '.join([hex(o) for o in opcodes])

def add_match_case(cases, case):
    # check if case already exists

    for c in cases:
        if c['rel_ea'] == case['rel_ea']:
            return

    cases.append(case)

for k, case in enumerate(old_schema['cases']):
    old_opcodes = case['opcodes']
    print('finding opcode(s): %s' % get_opcodes_str(old_opcodes))

    matched_handlers = []

    # see if we can get a match for the relative ea first
    dist, dist_match = find_close_numeric(new_schema['cases'], case['rel_ea'], lambda obj: obj['rel_ea'])

    # bail before touching dist_match, it's None when nothing was found
    if dist == fucked_distance:
        print(' got fucked distance, what?')
        continue

    size_diff = abs(dist_match['size'] - case['size'])
    #print(' os: %d ns: %d d: %d' % (dist_match['size'], case['size'], size_diff))

    # NOTE: raises IndexError if the new schema has fewer cases than the old one
    order_match = new_schema['cases'][k]

    # see if the rva matches for the cases found by the distance and order
    if dist_match['rel_ea'] == order_match['rel_ea'] and size_diff < max_size_diff:

        print(' got order match, size diff: %d < %d' % (size_diff, max_size_diff))

        # check if calls count match in found match & og code (informational only for now)
        if len(case['func']['calls']) == len(order_match['func']['calls']):
            print(' has nested callcount match')

        add_match_case(matched_handlers, order_match)

    matched = len(matched_handlers)
    if matched == 1:
        # holy shit
        opcodes_found.append((old_opcodes, matched_handlers[0]['opcodes']))
    elif matched > 1:
        print(' found %d matching handlers' % matched)

# dump found shit
print()

for k, v in enumerate(opcodes_found):
    old, new = v
    print('branch %d' % k)

    old = ', '.join(['%s (%s)' % (hex(o), get_opcode_by_val('ServerZoneIpcType', o)) for o in old])

    print(' - old: %s' % old)
    print(' - new: %s' % get_opcodes_str(new))

print('found %d/%d opcode branches!' % (len(opcodes_found), len(old_schema['cases'])))
```

Something to note quickly: if you set the `ipcs_file` key in the old client schema, the script will fetch the header file, parse it and give you symbols.

```json
"ipcs_file": "https://raw.githubusercontent.com/SapphireServer/Sapphire/v5.08/src/common/Network/PacketDef/Ipcs.h"
```
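
Scanning every enum value for every opcode is fine at this scale, but if the lookups ever pile up, a dict built once does the same job. A minimal sketch, assuming enums shaped like the `header.enums` entries the script reads (a `name` plus a list of `values`, each with a `name` and a `value`); `build_opcode_names` and the sample data are made up for illustration and aren't part of the script:

```python
def build_opcode_names(enums, enum_name):
    """Map opcode value -> symbol name for one enum (hypothetical helper)."""
    for enum in enums:
        if enum['name'] == enum_name:
            return {val['value']: val['name'] for val in enum['values']}
    return {}


# stand-in for header.enums, hand-written so this runs without downloading anything
sample_enums = [
    {
        'name': 'ServerZoneIpcType',
        'values': [
            {'name': 'Logout', 'value': 0x77},
            {'name': 'Playtime', 'value': 0x100},
        ],
    },
]

names = build_opcode_names(sample_enums, 'ServerZoneIpcType')
print(names.get(0x77, 'Unknown'))   # Logout
print(names.get(0x999, 'Unknown'))  # Unknown
```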

With that out of the way, we can talk about cool shit now. Does it work? Well, yes and no. What it gives you is already pretty accurate: I think only a couple of the ones it 'finds' are wrong, though I haven't actually gone and checked those opcodes myself.

Anyway, let's go through how it works.

```python
dist, dist_match = find_close_numeric(new_schema['cases'], case['rel_ea'], lambda obj: obj['rel_ea'])

size_diff = abs(dist_match['size'] - case['size'])
```

This uses patent pending technology to find the case in the new executable schema with the smallest delta between RVAs. One of the differences between the 5.0 and 5.15 executables is that some cases grew by 1 to 5 bytes, so you can't look a case up by its RVA alone and we need to find the closest one instead. We also calculate the size difference between the distance match and the original case from the old executable. It's _probably_ unlikely that something that grew by more than a few bytes is the same handler.
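
To make the closest-RVA lookup concrete, here's a small worked example. It's a sketch only: the function is a condensed copy of `find_close_numeric` (with the sentinel passed in as a `limit` argument, which is my change), and the cases and sizes are made-up numbers, not values from a real schema.

```python
def find_close_numeric(objs, dest, getter, limit=0xffffffff):
    """Return (distance, obj) for the obj whose getter() value is nearest to dest."""
    closest, closest_obj = limit, None
    for obj in objs:
        num = abs(getter(obj) - dest)
        if num < closest:
            closest, closest_obj = num, obj
    return closest, closest_obj


# made-up cases: the second handler has shifted by a few bytes in the new exe
new_cases = [
    {'rel_ea': 0x00, 'size': 40},
    {'rel_ea': 0x2b, 'size': 66},
    {'rel_ea': 0x70, 'size': 18},
]
old_case = {'rel_ea': 0x28, 'size': 64}

dist, match = find_close_numeric(new_cases, old_case['rel_ea'], lambda c: c['rel_ea'])
size_diff = abs(match['size'] - old_case['size'])

print(dist, hex(match['rel_ea']), size_diff)  # 3 0x2b 2
```

An exact lookup on `0x28` would find nothing here, while the nearest match lands on `0x2b` with a size difference of 2, comfortably under the limit of 10.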

```python
order_match = new_schema['cases'][k]
```

This is pretty dumb but it's in there so why not: we grab the case that sits at the same index in the new executable's schema, so order still matters but nothing else does. This probably needs fixing though, because I'm not sure if python will reliably spit out arrays in the same order. Seems to work though. Probably crashes if you have fewer elements in the new schema, but I like to live on the edge.
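
On the ordering worry: `json.load` turns a JSON array into a Python list in document order, so the order is stable as long as whatever wrote the schema emitted the cases consistently. The indexing is the real hazard. A minimal sketch of a bounds-checked lookup, where `get_order_match` is a hypothetical helper and the schema fragment is made up:

```python
import json


def get_order_match(new_cases, k):
    """Return the case at index k in the new schema, or None if k is out of range."""
    if 0 <= k < len(new_cases):
        return new_cases[k]
    return None


# made-up schema fragment; the array comes back as a list in the same order
new_schema = json.loads('{"cases": [{"rel_ea": 0}, {"rel_ea": 43}]}')

print(get_order_match(new_schema['cases'], 1))  # {'rel_ea': 43}
print(get_order_match(new_schema['cases'], 5))  # None
```

The caller would then have to treat `None` as "no order match" and skip the case rather than crash.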

```python
# see if the rva matches for the cases found by the distance and order
if dist_match['rel_ea'] == order_match['rel_ea'] and size_diff < max_size_diff:

    print(' got order match, size diff: %d < %d' % (size_diff, max_size_diff))

    # check if calls count match in found match & og code
    if len(case['func']['calls']) == len(order_match['func']['calls']):
        print(' has nested callcount match')

    add_match_case(matched_handlers, order_match)
```

This is the first _real_ check we do and it actually gives pretty decent results. First we check that the distance match is also the order match (discarding any others, for now) and that the size has changed by less than 10 bytes. The reason is that I was getting slightly more inconsistent results when checking the distance alone, so I figured it'd be a decent idea to check the order as well. `size_diff` also filters out a couple of bad cases that looked obviously wrong.

Following that there's a quick check to see if the nested call count matches, though it doesn't serve any purpose at the moment other than as a quick test. Currently, everything this finds has an exact nested call count match, which is pretty nice. The idea in the back of my mind is to have a bunch of isolated checks that each find the 'best' matching candidates with a confidence score; then you loop over the candidates, pick the best scoring one and print those opcodes out.
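
That confidence score idea could look something like the sketch below. None of this exists in the script: the helper names, the weights and the 8 byte RVA threshold are all assumptions picked for illustration, and the cases are made-up data in the same shape the schema uses.

```python
def score_candidate(old_case, candidate, checks):
    """Sum the weights of every check that passes for this candidate."""
    return sum(weight for check, weight in checks if check(old_case, candidate))


def best_candidate(old_case, candidates, checks):
    """Return (score, candidate) for the highest scoring candidate, or (0, None)."""
    best_score, best = 0, None
    for candidate in candidates:
        score = score_candidate(old_case, candidate, checks)
        if score > best_score:
            best_score, best = score, candidate
    return best_score, best


# hypothetical (check, weight) pairs, loosely based on what the script already tests
checks = [
    (lambda old, new: abs(old['rel_ea'] - new['rel_ea']) <= 8, 2),
    (lambda old, new: abs(old['size'] - new['size']) < 10, 1),
    (lambda old, new: len(old['func']['calls']) == len(new['func']['calls']), 1),
]

# made-up cases; the contents of 'calls' are placeholders, only the count matters
old_case = {'rel_ea': 0x28, 'size': 64, 'func': {'calls': [1, 2]}}
candidates = [
    {'rel_ea': 0x2b, 'size': 66, 'func': {'calls': [1, 2]}, 'opcodes': [0x12d]},
    {'rel_ea': 0x70, 'size': 60, 'func': {'calls': [1]}, 'opcodes': [0x2fa]},
]

score, match = best_candidate(old_case, candidates, checks)
print(score, [hex(o) for o in match['opcodes']])  # 4 ['0x12d']
```

Keeping each check as its own function means new heuristics can be bolted on without touching the selection loop, and the weights can be tuned once there's a set of manually verified opcodes to compare against.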

Example:

```
branch 2
 - old: 0x77 (Logout)
 - new: 0x12d
branch 3
 - old: 0x100 (Playtime)
 - new: 0x2fa
branch 4
 - old: 0x104 (Chat)
 - new: 0x1d0
...
branch 10
 - old: 0x17f (PlayerSpawn)
 - new: 0xdc
branch 11
 - old: 0x180 (NpcSpawn)
 - new: 0x219
branch 12
 - old: 0x181 (NpcSpawn2)
 - new: 0x304
branch 13
 - old: 0x191 (ActorFreeSpawn)
 - new: 0x32b
branch 14
 - old: 0x165 (PersistantEffect)
 - new: 0x339
branch 15
 - old: 0x184 (ActorSetPos)
 - new: 0x296
branch 16
 - old: 0x182 (ActorMove)
 - new: 0x1ad
...
branch 27
 - old: 0x15e (Effect)
 - new: 0x2aa
branch 28
 - old: 0x161 (AoeEffect8)
 - new: 0xb3
branch 29
 - old: 0x162 (AoeEffect16)
 - new: 0xe6
branch 30
 - old: 0x163 (AoeEffect24)
 - new: 0x10a
branch 31
 - old: 0x164 (AoeEffect32)
 - new: 0x1c8
branch 32
 - old: 0x142 (ActorControl)
 - new: 0x12f
branch 33
 - old: 0x144 (ActorControlTarget)
 - new: 0x1b3
branch 34
 - old: 0x143 (ActorControlSelf)
 - new: 0x201
...
found 73/401 opcode branches!
```

I've hand-picked quite a few here, but these ones are actually correct, which is honestly pretty impressive for such a naive implementation. The ones I think are 'wrong' still need to be checked manually, but I'd say that 90% of the ones it finds, so ~67 or so opcode cases, are correct and point at the right handler, which is honestly impressive for how awful this garbage is.
