Commit 8fbaf97

Initial version
1 parent 970715e commit 8fbaf97

29 files changed: +52145 -1 lines changed

README.md

+101 -1
@@ -1 +1,101 @@
# go_moduledata_parser

Personal project to parse and extract function and type metadata from Go binaries
as JSON. Because the parser output is JSON, it can be integrated with different tools.

Currently, only generation of an IDC script for annotating an IDA disassembly is supported. This IDC
script can be applied to the disassembly to rename functions and types.

[[_TOC_]]

# Example Usage

```bash
pip3 install pyelftools pefile
python3 parser.py win64s.exe > win64s.json
```

A sample JSON can be viewed [here](integrations/ida/sample.json).

## IDA integration
```
python3 integrations/ida/generate_go_idc.py win64s.json annotate_win64s.idc
```

To use the IDC (tested on IDA Freeware only!):
* **File > Load File > Parse C header file ...** *(Ctrl+F9)* and select `go_32.h` or `go_64.h` depending on the bitness of the binary to import the necessary structs
* **View > Open subviews > Local types** *(Shift+F1)*, select all types (Ctrl+A) and right click **Synchronize to IDB**
* **File > Script File ...** *(Alt+F7)* and select the IDC file

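As a rough illustration of the JSON-to-IDC step, here is a minimal sketch that turns a function table into an IDC script. It is not the code in `generate_go_idc.py`: the helper name `render_idc` and the sample addresses are hypothetical, and the input shape (hex address mapped to a `{"name": ...}` record) is assumed from the dictionary that `parse_ftabs` builds. `set_name` is the IDC call for naming an address; real Go names such as `main.(*custom).UpdateQty` would probably need sanitizing before IDA accepts them.

```python
import json


def render_idc(funcs):
    """Render a minimal IDC script that renames functions.

    `funcs` maps hex address strings to {"name": ...} records (assumed shape).
    """
    lines = ["#include <idc.idc>", "", "static main() {"]
    # Sort by numeric address so the output is stable.
    for addr, info in sorted(funcs.items(), key=lambda kv: int(kv[0], 16)):
        # json.dumps is used only as a convenient way to quote ASCII names.
        lines.append("    set_name({}, {});".format(addr, json.dumps(info["name"])))
    lines.append("}")
    return "\n".join(lines) + "\n"


# Hypothetical sample data, not taken from a real binary.
sample = {"0x401000": {"name": "main.main"}, "0x400f00": {"name": "fmt.Printf"}}
script = render_idc(sample)
print(script)
```

Keeping the rendering separate from file I/O makes it easy to check the generated text before loading it into IDA.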
### Before

![before](imgs/before.png)

### After

Types and functions are annotated.

![after](imgs/after.png)

# Limitations

* Assumes that only one moduledata struct is in use; Go binaries can contain more than one
* Assumes that there is only one text section
* Assumes that the architecture is little-endian
* Assumes that the binary is built with a recent Go version (currently 1.15); the moduledata struct differs in binaries built with earlier versions
* Only Windows and Linux (both x86/x64) are supported (the same architectures and OSes supported by IDA Freeware)
* Code is not very Pythonic. :sweat_smile:

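The little-endian and pointer-size assumptions above could be checked up front for Linux binaries. The sketch below is not part of this project: the helpers `elf_layout` and `ptr_format` are hypothetical, and it reads only the `EI_CLASS` and `EI_DATA` bytes of the ELF identification header (offsets 4 and 5). The `L`/`Q` format characters are an assumption about what `ptr_type` holds in the parser.

```python
import struct

ELF_MAGIC = b"\x7fELF"


def elf_layout(header):
    """Return (ptr_size, is_little_endian) from the first bytes of an ELF file."""
    if len(header) < 6 or header[:4] != ELF_MAGIC:
        raise ValueError("not an ELF file")
    ei_class, ei_data = header[4], header[5]
    if ei_class not in (1, 2) or ei_data not in (1, 2):
        raise ValueError("unexpected EI_CLASS or EI_DATA value")
    ptr_size = 4 if ei_class == 1 else 8  # 1 = ELFCLASS32, 2 = ELFCLASS64
    return ptr_size, ei_data == 1         # 1 = little-endian, 2 = big-endian


def ptr_format(ptr_size, little_endian):
    """Build a struct format string for one pointer-sized word."""
    return ("<" if little_endian else ">") + ("L" if ptr_size == 4 else "Q")


# Synthetic header: 64-bit, little-endian.
size, little = elf_layout(b"\x7fELF\x02\x01")
print(size, little, ptr_format(size, little))
```

A parser could refuse to continue, or switch the byte-order prefix, when the second value is `False`.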
# Project Organization

## root

All the Python code needed for parsing the moduledata and related structures
within a Golang binary. Start from `parser.py`.

## go_files

* [custom.go](go_files/custom.go) contains the sample Go code to test the parsing and annotation
* [build.sh](go_files/build.sh) builds the Go code into stripped and unstripped versions for x86/x64 Windows and Linux
* Prebuilt binaries. Stripped binaries are suffixed with **s**.

## integrations

Currently only IDA is supported.

### ida

* [generate_go_idc.py](integrations/ida/generate_go_idc.py) generates an IDC script from JSON for types and functions (useful for stripped binaries)
* [generate_go_idc_types.py](integrations/ida/generate_go_idc_types.py) generates an IDC script from JSON for types only (useful for unstripped binaries so that you don't rename what IDA already generated for you)
* [go_32.h](integrations/ida/go_32.h) contains the 32-bit version of the Golang structs
* [go_64.h](integrations/ida/go_64.h) contains the 64-bit version of the Golang structs
* [go_structs.h](integrations/ida/go_structs.h) contains the bitness-independent structs for selected Golang types. Primitive data types are defined in `go_32.h` and `go_64.h`
* [sample](integrations/ida/sample) contains the generated JSON and IDC for a stripped Windows x64 binary

# Background

This started out as an attempt to convert the structs in Golang's source code (mainly in runtime/type.go, runtime/symtab.go, runtime/runtime2.go, reflect/type.go and reflect/value.go) into a header file usable by IDA and then manually identify the types from the disassembly and apply each struct by hand.

A good way to learn this stuff was to write a simple Go program using the different features of the language (see [custom.go](go_files/custom.go)) and then build it with and without stripping the symbols (see [build.sh](go_files/build.sh)).

By comparing the disassembly of both binaries side-by-side, it is possible to manually identify the main functions, recognize types, etc.

After spending enough time manually annotating a stripped binary, I decided to automate most of these tasks and hence this project :smiley:

Some of the excellent articles and tools that I referred to during this project are listed below.

# References

## Reverse Engineering Articles

* https://lekstu.ga/tags/go/
* https://x0r19x91.gitlab.io/categories/golang/

## Other Good :thumbsup: Tools

* https://go-re.tk/redress/ (Windows and Linux, r2 integration)
* https://github.com/alexander-hanel/gopep (Windows only, good notes and references as well)
* https://github.com/getCUJO/ThreatIntel/tree/master/Scripts/Ghidra (Ghidra integration)

go_files/build.sh

+10
@@ -0,0 +1,10 @@
#!/bin/bash
GOOS=windows GOARCH=amd64 go build -ldflags="-s -w" -o "$1win64s.exe"
GOOS=windows GOARCH=amd64 go build -o "$1win64.exe"
GOOS=windows GOARCH=386 go build -ldflags="-s -w" -o "$1win32s.exe"
GOOS=windows GOARCH=386 go build -o "$1win32.exe"
GOOS=linux GOARCH=amd64 go build -ldflags="-s -w" -o "$1lnx64s"
GOOS=linux GOARCH=amd64 go build -o "$1lnx64"
GOOS=linux GOARCH=386 go build -ldflags="-s -w" -o "$1lnx32s"
GOOS=linux GOARCH=386 go build -o "$1lnx32"

go_files/custom.go

+50
@@ -0,0 +1,50 @@
package main

import "fmt"

type Updater interface {
	UpdateQty(qty int, rounds int) int
}

type custom struct {
	name  string
	qty   int
	legit bool
	toys  map[int]string
}

func (c *custom) UpdateQty(qty int, rounds int) int {
	for i := 0; i < rounds; i++ {
		c.qty = c.qty + qty
	}
	return c.qty
}

func (c custom) PrintName(greeting string) {
	if len(c.name) != 0 {
		fmt.Printf("%s %s ", greeting, c.name)
	}
}

func main() {
	c := &custom{
		name:  "Wheee",
		qty:   -33,
		legit: false,
		toys:  map[int]string{1: "thiss", 2: "thatt"},
	}

	fmt.Printf("%s\n", c.name)
	fmt.Printf("%d\n", c.qty)
	fmt.Printf("%t\n", c.legit)
	fmt.Printf("%#v\n", c.toys)

	c.PrintName("Hello World!")
	c.UpdateQty(111, 3)

	var u Updater
	u = c
	u.UpdateQty(11, 8)

	fmt.Printf("%v\n", c)
}

go_files/lnx32

1.8 MB
Binary file not shown.

go_files/lnx32s

1.24 MB
Binary file not shown.

go_files/lnx64

1.99 MB
Binary file not shown.

go_files/lnx64s

1.41 MB
Binary file not shown.

go_files/win32.exe

1.87 MB
Binary file not shown.

go_files/win32s.exe

1.29 MB
Binary file not shown.

go_files/win64.exe

2.07 MB
Binary file not shown.

go_files/win64s.exe

1.48 MB
Binary file not shown.

go_func.py

+70
@@ -0,0 +1,70 @@
from struct import unpack


class Parser:

    def __init__(self, md_parser):
        self.md_parser = md_parser

    def parse_ftabs(self):
        mdp = self.md_parser
        mdp.f.seek(mdp.moduledata_raw, 0)

        # Three pointer-sized words describe the ftab array: its address,
        # its length, and one more word that is ignored here.
        FTABS_OFFSET = 3 * mdp.ptr_size
        FTABS_SIZE = 3 * mdp.ptr_size

        mdp.f.seek(FTABS_OFFSET, 1)
        ftabs = mdp.f.read(FTABS_SIZE)

        ftabs_va, ftabs_len, _ = unpack("<" + 3 * mdp.ptr_type, ftabs)
        ftabs_raw = mdp.va2raw(ftabs_va)

        # Each ftab entry is a (function address, _func offset) pair.
        FTAB_SIZE = mdp.ptr_size * 2

        funcs = {}

        # The last ftab entry is skipped.
        for i in range(ftabs_len - 1):
            mdp.f.seek(ftabs_raw + (i * FTAB_SIZE), 0)
            ftab = mdp.f.read(FTAB_SIZE)
            func_addr, _func_offset = unpack("<" + 2 * mdp.ptr_type, ftab)
            _func_raw = mdp.pclntab_off(_func_offset)
            func = self._parse_func(_func_raw)
            funcs[hex(func_addr)] = func
        return funcs

    def _parse_func(self, _func_raw):
        # https://golang.org/src/runtime/runtime2.go
        # type _func struct {
        #     entry   uintptr // start pc
        #     nameoff int32   // function name
        #
        #     args        int32  // in/out args size
        #     deferreturn uint32 // offset of start of a deferreturn call instruction from entry, if any.
        #
        #     pcsp      int32
        #     pcfile    int32
        #     pcln      int32
        #     npcdata   int32
        #     funcID    funcID  // set for certain special runtime functions
        #     _         [2]int8 // unused
        #     nfuncdata uint8   // must be last
        # }

        mdp = self.md_parser
        mdp.f.seek(_func_raw)

        # _func is bigger than this, but we only want to read the offset to the
        # function name for now
        FUNC_SIZE = mdp.ptr_size + 4
        _func = mdp.f.read(FUNC_SIZE)
        _func_va, _func_name_offset = unpack("<" + mdp.ptr_type + "L", _func)
        _func_name_raw = mdp.pclntab_off(_func_name_offset)

        mdp.f.seek(_func_name_raw)

        # TODO: Not sure if there's a max length for function names
        data = mdp.f.read(512)
        # Keep everything before the first NUL byte; if there is none,
        # keep the whole buffer.
        _func_name = data.split(b"\x00", 1)[0].decode("utf-8")
        return {"name": _func_name}
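The table walk in `parse_ftabs` and `_parse_func` can be tried without a real binary. The following is a minimal sketch under simplifying assumptions: it works on an in-memory buffer, every offset is relative to the start of that buffer (the real code translates addresses with `va2raw` and `pclntab_off`), and the helper names `read_cstring` and `walk_ftab` are hypothetical.

```python
import struct


def read_cstring(buf, offset):
    """Return the NUL-terminated UTF-8 string starting at `offset`."""
    end = buf.find(b"\x00", offset)
    if end == -1:  # no terminator: fall back to the end of the buffer
        end = len(buf)
    return buf[offset:end].decode("utf-8")


def walk_ftab(buf, ftab_off, nfunc, ptr_fmt="Q"):
    """Map each function entry address to its name."""
    entry_fmt = "<" + 2 * ptr_fmt   # (entry address, offset of _func record)
    func_fmt = "<" + ptr_fmt + "L"  # (entry address, name offset)
    entry_size = struct.calcsize(entry_fmt)
    funcs = {}
    for i in range(nfunc):
        entry, func_off = struct.unpack_from(entry_fmt, buf, ftab_off + i * entry_size)
        _, name_off = struct.unpack_from(func_fmt, buf, func_off)
        funcs[hex(entry)] = {"name": read_cstring(buf, name_off)}
    return funcs


# Synthetic data: names first, then two truncated _func records, then the ftab.
names = b"main.main\x00main.init\x00"
func_a_off = len(names)
func_a = struct.pack("<QL", 0x401000, 0)
func_b_off = func_a_off + len(func_a)
func_b = struct.pack("<QL", 0x401100, names.index(b"main.init"))
ftab_off = func_b_off + len(func_b)
ftab = struct.pack("<QQ", 0x401000, func_a_off) + struct.pack("<QQ", 0x401100, func_b_off)
buf = names + func_a + func_b + ftab

funcs = walk_ftab(buf, ftab_off, 2)
print(funcs)
```

Using `struct.unpack_from` with an explicit offset avoids the repeated `seek` and `read` calls, at the cost of holding the data in memory.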

go_itab.py

+62
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,62 @@
from struct import unpack
from go_type import Kind


class Parser:

    def __init__(self, md_parser):
        self.md_parser = md_parser

    def parse_itablinks(self, types):
        mdp = self.md_parser
        mdp.f.seek(mdp.moduledata_raw, 0)

        # Three pointer-sized words describe the itablinks array: its address,
        # its length, and one more word that is ignored here.
        ITABLINKS_OFFSET = 33 * mdp.ptr_size
        ITABLINKS_SIZE = 3 * mdp.ptr_size

        mdp.f.seek(ITABLINKS_OFFSET, 1)
        itablinks_data = mdp.f.read(ITABLINKS_SIZE)

        itablinks_va, itablinks_len, _ = unpack("<" + 3 * mdp.ptr_type, itablinks_data)
        itablinks_raw = mdp.va2raw(itablinks_va)

        # Each itablink is one pointer to an itab. Only the first
        # 3 pointers + 8 bytes of each itab are read.
        ITABLINK_SIZE = mdp.ptr_size
        ITAB_SIZE = 3 * mdp.ptr_size + 8

        itablinks = {}
        itabs = {}

        for i in range(itablinks_len):
            itl_raw = itablinks_raw + (i * ITABLINK_SIZE)
            itl_va = itablinks_va + (i * ITABLINK_SIZE)
            itablinks[hex(itl_va)] = {"name": "moduledata_itablink." + str(i + 1)}

            mdp.f.seek(itl_raw, 0)

            itl_data = mdp.f.read(ITABLINK_SIZE)
            it_va = unpack("<" + mdp.ptr_type, itl_data)[0]
            it_raw = mdp.va2raw(it_va)

            mdp.f.seek(it_raw)
            itab_data = mdp.f.read(ITAB_SIZE)
            # Fields: interface type pointer, concrete type pointer, two
            # 4-byte fields (ignored), and the first function pointer.
            itab_inter_va, itab_type_va, _, _, itab_func_ptrs = unpack(
                "<" + 2 * mdp.ptr_type + "LL" + mdp.ptr_type, itab_data)

            itab_inter = types.get(hex(itab_inter_va))
            itab_type = types.get(hex(itab_type_va))

            if itab_inter is None:
                itab_inter_name = "undefined_interface_" + hex(itab_inter_va)
            else:
                itab_inter_name = itab_inter["name"]

            if itab_type is None:
                itab_type_name = "undefined_type_" + hex(itab_type_va)
            else:
                itab_type_name = itab_type["name"]

            itabs[hex(it_va)] = {
                "interface_name": itab_inter_name,
                "type_name": itab_type_name
            }

        return itablinks, itabs
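The itab decoding in `parse_itablinks` can be exercised on synthetic bytes. This is a minimal sketch, not the project's code: the helper `parse_itab`, the addresses, and the `first_method` key are hypothetical. It assumes a 64-bit little-endian layout and reuses the same field order and the same fallback names for types missing from the `types` table.

```python
import struct


def parse_itab(buf, offset, types, ptr_fmt="Q"):
    """Decode one itab record and resolve its interface and type names."""
    # Two pointers, two 4-byte fields, then the first function pointer.
    itab_fmt = "<" + 2 * ptr_fmt + "LL" + ptr_fmt
    inter_va, type_va, _hash, _pad, first_fun = struct.unpack_from(itab_fmt, buf, offset)

    def name_of(va, prefix):
        # Fall back to a synthetic name when the type was not parsed.
        record = types.get(hex(va))
        return record["name"] if record is not None else prefix + hex(va)

    return {
        "interface_name": name_of(inter_va, "undefined_interface_"),
        "type_name": name_of(type_va, "undefined_type_"),
        "first_method": hex(first_fun),
    }


# Hypothetical type table with only the interface known.
types = {"0x4a0000": {"name": "main.Updater"}}
raw = struct.pack("<QQLLQ", 0x4A0000, 0x4A1000, 0xDEADBEEF, 0, 0x401000)
itab = parse_itab(raw, 0, types)
print(itab)
```

Returning the first function pointer as well would let a later step label the method that the interface call dispatches to.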
