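What you learn on paper always feels shallow, so here I try removing OLLVM obfuscation with an IDA script. The sample comes from Kanxue (看雪).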

Preface

First, the classic diagram:

[image]

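  1. The function's start address is the address of the prologue.
  2. The successor of the prologue is the main dispatcher.
  3. The block whose successor is the main dispatcher is the predispatcher.
  4. The blocks whose successor is the predispatcher are the relevant (real) blocks.
  5. Blocks with no successor are retn blocks.
  6. Everything left over is useless blocks and sub-dispatchers.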
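
The six rules above can be sketched on a toy graph. This is only an illustration under simplifying assumptions, not the script used later: the CFG is a plain dict mapping block names to successor lists, and `classify_blocks` and every block name are hypothetical.

```python
def classify_blocks(succs, entry):
    """Classify the blocks of a flattened CFG using the six rules above.

    succs maps each block name to the list of its successors (toy model).
    """
    # Build the predecessor map from the successor map.
    preds = {block: [] for block in succs}
    for block, targets in succs.items():
        for target in targets:
            preds[target].append(block)

    prologue = entry                                    # rule 1
    main_dispatcher = succs[prologue][0]                # rule 2
    # Rule 3: the block, other than the prologue, that jumps to the main dispatcher.
    predispatcher = next(b for b in preds[main_dispatcher] if b != prologue)
    relevant = sorted(preds[predispatcher])             # rule 4
    ret_blocks = sorted(b for b, t in succs.items() if not t)  # rule 5
    known = {prologue, main_dispatcher, predispatcher, *relevant, *ret_blocks}
    others = sorted(b for b in succs if b not in known)        # rule 6
    return {
        "prologue": prologue,
        "main_dispatcher": main_dispatcher,
        "predispatcher": predispatcher,
        "relevant": relevant,
        "ret": ret_blocks,
        "other": others,
    }


# Hypothetical flattened function with two real blocks, A and B.
toy_cfg = {
    "prologue": ["main"],
    "main": ["sub1"],
    "sub1": ["A", "sub2"],
    "sub2": ["B", "ret"],
    "A": ["pre"],
    "B": ["pre"],
    "pre": ["main"],
    "ret": [],
}
roles = classify_blocks(toy_cfg, "prologue")
print(roles)
```

Real samples often deviate from this textbook layout. The sample analyzed below, for instance, has no separate predispatcher: its real blocks branch straight back to the main dispatcher.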

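As an example, before OLLVM obfuscation the program's execution flow looks like this:

image-20250724192638218

After obfuscation it becomes:

image-20250724193722543

What we need to do is restore it to: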

image-20250724194004380

Analysis

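Now for the sample. The prologue block is shown below. Besides initializing W8, it also initializes W20-W28, which are used for dispatching.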

image-20250717172536322
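
Each of these 32-bit constants is built from a MOV of the low half followed by a MOVK of the high half with LSL#16. The sketch below models how the two instructions combine. The helper names `mov` and `movk` are hypothetical, and the values are the ones shown for W20 and W8 in the script's comments further down.

```python
def mov(imm16):
    """MOV Wd, #imm16: load a 16-bit immediate and clear the remaining bits."""
    return imm16 & 0xFFFF


def movk(value, imm16, shift):
    """MOVK Wd, #imm16, LSL #shift: replace one 16-bit field, keep the rest."""
    mask = 0xFFFF << shift
    return ((value & ~mask) | ((imm16 & 0xFFFF) << shift)) & 0xFFFFFFFF


# MOV W20, #0xF110 followed by MOVK W20, #0x3455, LSL#16
w20 = movk(mov(0xF110), 0x3455, 16)
# MOV W8, #0xA6FB followed by MOVK W8, #0x7986, LSL#16
w8 = movk(mov(0xA6FB), 0x7986, 16)
print(hex(w20), hex(w8))
```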

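The first step in removing OLLVM obfuscation is to find the main dispatcher. Here we can treat any block whose in-degree exceeds some threshold as the main dispatcher.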
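
The in-degree heuristic can be sketched without IDA. This is a minimal illustration assuming the CFG is given as a dict of successor lists; `find_dispatchers` and the block names are hypothetical.

```python
from collections import Counter


def find_dispatchers(succs, threshold=10):
    """Return the blocks whose in-degree is greater than threshold."""
    # Count how many edges point at each block.
    in_degree = Counter(t for targets in succs.values() for t in targets)
    return sorted(b for b, n in in_degree.items() if n > threshold)


# Twelve hypothetical real blocks that all branch back to one dispatcher.
toy = {f"real_{i}": ["dispatcher"] for i in range(12)}
toy["dispatcher"] = ["real_0"]
found = find_dispatchers(toy)
print(found)
```

The threshold is a per-sample tuning knob: too low and ordinary loop headers are flagged, too high and the dispatcher of a small function is missed.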

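In this sample, the successor of every real block is the main dispatcher, reached through an unconditional branch. So collecting the predecessors of the main dispatcher is enough to find all the real blocks.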

image-20250717173019826
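
A small sketch of that filter, under the assumption that each block is summarized as a (name, last mnemonic, branch target) tuple. The tuple layout, `find_real_blocks` and the block names are hypothetical. The point is that only an unconditional B into the dispatcher qualifies, so conditional branches from sub-dispatchers are ignored.

```python
def find_real_blocks(blocks, dispatcher):
    """Keep blocks whose last instruction is an unconditional B to the dispatcher."""
    return [
        name
        for name, last_mnem, target in blocks
        if last_mnem == "B" and target == dispatcher and name != dispatcher
    ]


toy_blocks = [
    ("real_a", "B", "dispatcher"),
    ("sub_1", "B.NE", "dispatcher"),   # sub-dispatcher: conditional, skipped
    ("real_b", "B", "dispatcher"),
    ("ret", "RET", None),              # return block: no branch target
]
real = find_real_blocks(toy_blocks, "dispatcher")
print(real)
```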

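Each real block updates W8 at its end, and that update is the basis for recovering the control flow: the real block whose index equals the new W8 value is the true successor.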
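
Once every real block has an index and the W8 value(s) it writes, resolving successors is a dictionary lookup. A minimal sketch: the block addresses and `resolve_successors` are hypothetical, and the state constants are borrowed from the register table in the script below purely as sample data.

```python
def resolve_successors(blocks):
    """blocks: list of (address, index, next_values).

    next_values holds the W8 value(s) written by the block; a conditional
    block also carries its condition flag as a trailing string.
    """
    by_index = {index: addr for addr, index, _ in blocks if index is not None}
    edges = {}
    for addr, _, next_values in blocks:
        # Map each state value to a block address, leave the flag untouched.
        edges[addr] = [
            by_index.get(v) if isinstance(v, int) else v for v in next_values
        ]
    return edges


toy = [
    (0x1000, None, [0x665797A5]),                        # prologue
    (0x1100, 0x665797A5, [0x3455F110]),                  # one successor
    (0x1200, 0x3455F110, [0xA9D4543A, 0x665797A5, "NE"]),  # two successors
    (0x1300, 0xA9D4543A, []),                            # return block
]
edges = resolve_successors(toy)
print(edges)
```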

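A real block's index can be read from the dispatcher block that precedes it. This sample mainly uses two patterns:

One assigns a temporary value to W9 and compares it with W8.

image-20250717174529751

The other compares W8 with one of W20-W28, which were initialized in the prologue block.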

image-20250717174624145

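For the first pattern, tracking the value of W9 yields the index directly. For the second, the index comes from the values saved in advance from the prologue block.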
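
Both patterns can be sketched on a toy instruction list. This is an illustration only: instructions are modeled as (mnemonic, operand 1, operand 2) tuples, `dispatcher_index` is a hypothetical helper, and the constants are the ones that appear in the script's docstrings.

```python
# Value saved from the prologue for the second pattern.
PROLOGUE_REGS = {"W24": 0xE4DBC33F}


def dispatcher_index(insns, prologue_regs):
    """Return the constant that W8 is compared against at the end of a dispatcher."""
    mnem, _, compared = insns[-2]          # the CMP sits just before the B.cond
    if mnem != "CMP":
        return None
    if compared in prologue_regs:          # pattern 2: CMP W8, W2x
        return prologue_regs[compared]
    for mnem, dst, src in reversed(insns[:-2]):
        if mnem == "MOV" and dst == compared:   # pattern 1: MOV W9, #imm
            return src
    return None


pattern1 = [("MOV", "W9", 0xC7AC1F5F), ("CMP", "W8", "W9"), ("B.EQ", "loc_433F8", None)]
pattern2 = [("CMP", "W8", "W24"), ("B.EQ", "loc_4340C", None)]
idx1 = dispatcher_index(pattern1, PROLOGUE_REGS)
idx2 = dispatcher_index(pattern2, PROLOGUE_REGS)
print(hex(idx1), hex(idx2))
```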

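For this sample, we can crudely treat any block containing a ret instruction as a ret block. Finally, the jumps are rebuilt from the indices. My approach is to patch the trailing branch of each real block so that it jumps straight to the successor real block; for real blocks that contain a branch, the csel is replaced with a conditional branch.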
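
The script below lets Keystone assemble the patch instructions. To show what those patch bytes look like, here is a hand-rolled sketch of the A64 `B` and `B.cond` encodings. `encode_b` and `encode_b_cond` are hypothetical helpers and the addresses are made up. Because every A64 instruction is 4 bytes, overwriting the CSEL with a `B.cond` and the following `B` with a new `B` fits in place.

```python
import struct

# A64 condition codes used by B.cond.
COND_CODES = {
    "EQ": 0, "NE": 1, "CS": 2, "HS": 2, "CC": 3, "LO": 3, "MI": 4, "PL": 5,
    "VS": 6, "VC": 7, "HI": 8, "LS": 9, "GE": 10, "LT": 11, "GT": 12, "LE": 13,
}


def encode_b(src, dst):
    """Encode `B dst` placed at address src (range: +/-128 MiB)."""
    offset = dst - src
    if offset % 4 or not -(1 << 27) <= offset < (1 << 27):
        raise ValueError("target not encodable")
    word = 0x14000000 | ((offset >> 2) & 0x3FFFFFF)   # imm26 = offset / 4
    return struct.pack("<I", word)


def encode_b_cond(src, dst, cond):
    """Encode `B.<cond> dst` placed at address src (range: +/-1 MiB)."""
    offset = dst - src
    if offset % 4 or not -(1 << 20) <= offset < (1 << 20):
        raise ValueError("target not encodable")
    word = 0x54000000 | (((offset >> 2) & 0x7FFFF) << 5) | COND_CODES[cond.upper()]
    return struct.pack("<I", word)


forward = encode_b(0x1000, 0x1008)
backward = encode_b(0x1008, 0x1000)
conditional = encode_b_cond(0x1000, 0x1010, "ne")
print(forward.hex(), backward.hex(), conditional.hex())
```

The ±1 MiB limit of `B.cond` is worth keeping in mind: inside a single function it is rarely a problem, but a patch that targets a distant block would need a different sequence.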

Script

The final script is as follows:

import idaapi
import idautils
import idc
import keystone
import logging

# Configure logging to print to the console
logging.basicConfig(level=logging.DEBUG, format='%(message)s')

# Initialize the Keystone assembler for AArch64
ks = keystone.Ks(keystone.KS_ARCH_ARM64, keystone.KS_MODE_LITTLE_ENDIAN)

# Register-to-constant map (taken from the prologue's assembly)
REGISTER_ADDRESS_MAP = {
    'W20': 0x3455F110,  # MOV W20, #0xF110 + MOVK W20, #0x3455,LSL#16
    'W21': 0xA9D4543A,  # MOV W21, #0x543A + MOVK W21, #0xA9D4,LSL#16
    'W22': 0xBD9FBB9,   # MOV W22, #0xFBB9 + MOVK W22, #0xBD9,LSL#16
    'W23': 0xC7AC1F5E,  # MOV W23, #0x1F5E + MOVK W23, #0xC7AC,LSL#16
    'W24': 0xE4DBC33F,  # MOV W24, #0xC33F + MOVK W24, #0xE4DB,LSL#16
    'W25': 0x4E30550C,  # MOV W25, #0x550C + MOVK W25, #0x4E30,LSL#16
    'W26': 0xA9D4543B,  # MOV W26, #0x543B + MOVK W26, #0xA9D4,LSL#16
    'W27': 0x665797A4,  # MOV W27, #0x97A4 + MOVK W27, #0x6657,LSL#16
    'W28': 0xE4DBC33E,  # MOV W28, #0xC33E + MOVK W28, #0xE4DB,LSL#16
    'W8': 0x665797A5    # initial state value loaded by the prologue
}


def findDispatchers(func_start, num=10):
    """
    Find the main dispatcher blocks: any block whose in-degree exceeds num.
    """
    func = idaapi.get_func(func_start)
    blocks = idaapi.FlowChart(func)

    dispatchers = []
    for block in blocks:
        preds = block.preds()
        preds_list = list(preds)
        if len(preds_list) > num:
            dispatchers.append(block)
    return dispatchers

def findRealBlocks(dispatchers):
    """
    Find the real blocks: blocks whose successor is a main dispatcher.
    """
    real_blocks = []
    for dispatcher in dispatchers:
        # All predecessors of the dispatcher
        preds = dispatcher.preds()
        for pred in preds:
            # Address of the predecessor's last instruction
            last_insn = idc.prev_head(pred.end_ea)

            # Is it a B instruction (unconditional branch)?
            mnem = idc.print_insn_mnem(last_insn)
            if mnem == "B":
                # Branch target address
                target = idc.get_operand_value(last_insn, 0)

                # Does the branch go to this dispatcher?
                if target == dispatcher.start_ea:
                    # Make sure the block is not a dispatcher itself
                    if pred not in dispatchers:
                        real_blocks.append(pred)

    # Deduplicate
    real_blocks = list(set(real_blocks))
    return real_blocks

def findRetBlock(func_start):
    """
    Find the return blocks: basic blocks that end with a RET instruction.
    """
    func = idaapi.get_func(func_start)
    blocks = idaapi.FlowChart(func)

    ret_blocks = []
    for block in blocks:
        # Last instruction of the block
        last_insn = idc.prev_head(block.end_ea)

        # Is it a RET instruction?
        mnem = idc.print_insn_mnem(last_insn)
        if mnem == "RET":
            ret_blocks.append(block)

    return ret_blocks

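def getRealOrderOfBlocks(realBlocks):
    """
    Find the index of each block, taken from the MOV instruction in the
    dispatcher block that precedes it.
    For example:
        MOV W9, #0xF5C370CA
        CMP W8, W9
        B.NE loc_43120

    The real block's index is obtained by tracking the value of W9.

    result = [
        (BB1, index1, [succ]),
        (BB2, index2, [succ1, succ2, "CC"]),
        ...
    ]

    """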

    def getIndex(block):
        """
        Takes the predecessor block, i.e. the sub-dispatcher,
        and returns the index.
        There are two main patterns:
            0x43138: MOV W9, #0xC7AC1F5F
            0x43140: CMP W8, W9
            0x43144: B.EQ loc_433F8


            0x43288: CMP W8, W24
            0x4328c: B.EQ loc_4340C
        """
        index = 0
        eaBCC = idc.prev_head(block.end_ea)
        eaCMP = idc.prev_head(eaBCC)
        CMPreg = idc.print_operand(eaCMP, 1)  # register name as a string
        if CMPreg == "W9":
            # 0x43138: MOV W9, #0xC7AC1F5F
            eaMOV = idc.prev_head(eaCMP)
            index = idc.get_operand_value(eaMOV, 1)
        else:
            # 0x43288: CMP W8, W24
            index = REGISTER_ADDRESS_MAP.get(CMPreg, None)
        return index



    def getSuccIndex(block):
        """
        Takes the current block
        and returns the index (or indices) of its successor block(s).
        There are three main patterns:

            0x43278: LDR X8, [X19,#0x40]
            0x4327c: MOV W8, #0x5338AB80
            0x43284: B loc_43120

            0x434cc: LDRB W8, [X19,#0x1F]
            0x434d0: MOV W9, #0x1B166FED
            0x434d8: CMP W8, #0
            0x434dc: MOV W8, #0x146E0C87
            0x434e4: CSEL W8, W8, W9, NE
            0x434e8: B loc_43120

            0x4340c: BL sub_AC28
            0x43410: MOV W8, #0xA6FB
            0x43414: STR W0, [X19,#0x24]
            0x43418: MOVK W8, #0x7986,LSL#16
            0x4341c: B loc_43120
        The key difference is the second-to-last instruction.
        """

        result = []
        eaAssign = idc.prev_head(idc.prev_head(block.end_ea))
        asmAssign = idc.print_insn_mnem(eaAssign)
        if asmAssign == "MOV":
            succ = idc.get_operand_value(eaAssign, 1)
            result.append(succ)

        elif asmAssign == "MOVK":
            succHigh = idc.get_operand_value(eaAssign, 1)
            eaCurrent = idc.prev_head(eaAssign)

            # Search backwards for the matching MOV W8, #0xA6FB
            # (>= so that the block's first instruction is also checked)
            while eaCurrent >= block.start_ea:
                asmCurrent = idc.print_insn_mnem(eaCurrent)
                if asmCurrent == "MOV":
                    reg = idc.print_operand(eaCurrent, 0)
                    if reg == "W8":
                        succLow = idc.get_operand_value(eaCurrent, 1)
                        succ = (succHigh << 16) | succLow
                        result.append(succ)
                        break
                eaCurrent = idc.prev_head(eaCurrent)

        elif asmAssign == "CSEL":
            flag = idc.print_operand(eaAssign, 3)
            trueReg = idc.print_operand(eaAssign, 1)
            falseReg = idc.print_operand(eaAssign, 2)
            trueRegIndex = 0
            falseRegIndex = 0
            if trueReg in REGISTER_ADDRESS_MAP:
                trueRegIndex = REGISTER_ADDRESS_MAP[trueReg]
            if falseReg in REGISTER_ADDRESS_MAP:
                falseRegIndex = REGISTER_ADDRESS_MAP[falseReg]

            eaCurrent = idc.prev_head(eaAssign)
            while eaCurrent >= block.start_ea:
                asmCurrent = idc.print_insn_mnem(eaCurrent)

                if asmCurrent == "MOV":
                    reg = idc.print_operand(eaCurrent, 0)
                    if reg == trueReg:
                        trueRegIndex = idc.get_operand_value(eaCurrent, 1)
                    elif reg == falseReg:
                        falseRegIndex = idc.get_operand_value(eaCurrent, 1)
                elif asmCurrent == "MOVK":
                    # Hard-coded for the one block in this sample that builds
                    # its constants with a split MOV/MOVK pair
                    trueRegIndex = 0x4E30550D
                    falseRegIndex = 0xBEE4A4C9
                    break

                if trueRegIndex != 0 and falseRegIndex != 0:
                    break
                eaCurrent = idc.prev_head(eaCurrent)

            result.append(trueRegIndex)
            result.append(falseRegIndex)
            result.append(flag)
        return result

    result = []
    for realBlock in realBlocks:
        realBlockPred = realBlock.preds()
        predList = list(realBlockPred)
        predCount = len(predList)
        succCount = len(list(realBlock.succs()))
        if succCount == 0:
            # RET block (index hard-coded for this sample)
            result.append((realBlock, 0x146E0C87, []))

        elif predCount == 0 and succCount != 0:
            # Prologue block
            result.append((realBlock, None, [REGISTER_ADDRESS_MAP['W8']]))

        elif predCount == 1 and succCount != 0:
            # Exactly one predecessor
            pred = predList[0]  # reuse the list built above

            Index = getIndex(pred)
            SuccIndex = getSuccIndex(realBlock)
            result.append((realBlock, Index, SuccIndex))

        else:
            print("Real block with multiple predecessors: {}".format(hex(realBlock.start_ea)))

    return result

def showAllBlocks(blocks):
    """
    For debugging.
    Print the start address, end address and every instruction of each block.
    """
    for i, block in enumerate(blocks):
        print(f"\n=== Block {i+1}: {hex(block.start_ea)} - {hex(block.end_ea)} ===")

        # Walk every instruction in the block
        ea = block.start_ea
        while ea < block.end_ea:
            # Disassembly text of the instruction
            disasm = idc.GetDisasm(ea)
            print(f"{hex(ea)}: {disasm}")

            # Move on to the next instruction
            ea = idc.next_head(ea)
        print(f"=== End of Block {i+1} ===")

def printPatchInfo(patchInfo):
    """
    Print the patch information.
    """
    print("\n=== Patch info ===")
    for block, index, succ in patchInfo:
        if index is not None:
            print(f"Block {hex(block.start_ea)}: Index = {hex(index)}, Successors = {succ}")
        else:
            print(f"Block {hex(block.start_ea)}: No Index, Successors = {succ}")
    print("=== End of patch info ===")

def patch(patchInfo):
    """
    Rebuild the control flow.
    """

    def findBlockByIndex(patchInfo, targetIndex):
        """
        Look up the start address of the block with the given index.
        """
        for block, index, succ in patchInfo:
            if index == targetIndex:
                return block.start_ea
        print(f"[-] No block found for index {hex(targetIndex) if isinstance(targetIndex, int) else targetIndex}")
        return None


    for block, index, succ in patchInfo:
        if succ is not None:
            print("succ is:" + str(succ) + " len: " + str(len(succ)))
            if len(succ) == 1:
                # Single successor
                succIndex = succ[0]
                succAddress = findBlockByIndex(patchInfo, succIndex)
                if succAddress:
                    # Patch the last instruction, using B rather than BR
                    try:
                        patch_code = f"b #{hex(succAddress)}"
                        print(f"[+] Patching {hex(idc.prev_head(block.end_ea))} with: {patch_code}")
                        encoding, count = ks.asm(patch_code, idc.prev_head(block.end_ea))
                        # Write the assembled bytes with idc.patch_byte
                        for i, byte_val in enumerate(encoding):
                            idc.patch_byte(idc.prev_head(block.end_ea) + i, byte_val)
                    except Exception as e:
                        print(f"[-] Error patching block {hex(block.start_ea)}: {e}")

            elif len(succ) == 3:
                # Two successors: the conditional case
                succTrue, succFalse, flag = succ
                succTrueAddress = findBlockByIndex(patchInfo, succTrue)
                succFalseAddress = findBlockByIndex(patchInfo, succFalse)

                if succTrueAddress and succFalseAddress:
                    Baddr = idc.prev_head(block.end_ea)  # trailing B
                    BCCaddr = idc.prev_head(Baddr)       # CSEL, becomes B.cond
                    try:
                        # Conditional branch
                        patch_code1 = f"b.{flag.lower()} #{hex(succTrueAddress)}"
                        print(f"[+] Patching {hex(BCCaddr)} with: {patch_code1}")
                        encoding1, count1 = ks.asm(patch_code1, BCCaddr)
                        for i, byte_val in enumerate(encoding1):
                            idc.patch_byte(BCCaddr + i, byte_val)

                        # Unconditional branch
                        patch_code2 = f"b #{hex(succFalseAddress)}"
                        print(f"[+] Patching {hex(Baddr)} with: {patch_code2}")
                        encoding2, count2 = ks.asm(patch_code2, Baddr)
                        for i, byte_val in enumerate(encoding2):
                            idc.patch_byte(Baddr + i, byte_val)
                    except Exception as e:
                        print(f"[-] Error patching block {hex(block.start_ea)}: {e}")
                else:
                    print(f"[-] Cannot find target addresses for conditional jump in block {hex(block.start_ea)}")
                    print(f" succTrue: {hex(succTrue) if isinstance(succTrue, int) else succTrue}")
                    print(f" succFalse: {hex(succFalse) if isinstance(succFalse, int) else succFalse}")

def deOllvm(addr):
    realBlocks = []
    dispatchers = findDispatchers(addr)
    retBlocks = findRetBlock(addr)
    realBlocks = findRealBlocks(dispatchers)
    for retBlock in retBlocks:
        if retBlock not in realBlocks:
            realBlocks.append(retBlock)
    patchInfo = getRealOrderOfBlocks(realBlocks)
    printPatchInfo(patchInfo)
    patch(patchInfo)


deOllvm(0x43058)

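Summary

Actually writing it out did deepen my understanding. But this script is far too case-by-case: a script that works directly on assembly easily falls into rigid matching of specific instruction patterns. When I have time, I should analyze how d810 does this at the microcode level.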