Like busybox and toybox, abox is a single binary. It can be
built statically or dynamically, and can be run either directly as a
multi-call binary (abox $command) or via a symlink
(assuming the name of the symlink is a valid abox command).
This project does not aim for the smallest possible binary or the fastest possible implementation.
The main goal of this project is didactic: this is designed to be an educational project, both for the authors and the readers of the source. It also places a strong emphasis on correctness.
Learning assembly language programming is hard, particularly if you only have cryptically terse uncommented source to read!
Hence, the primary concrete objective is to create a functional and correct tool with a clear, easily understandable, and maintainable codebase.
Initially, the code will make use of libc calls for convenience and development speed.
Once the codebase is functionally complete and all tests are in place, the intention is to remove the need for libc by switching to using only system calls and assembly language implementations of libc functions.
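As a rough illustration of what dropping libc involves, the sketch below writes a message to stdout and exits using only Linux system calls. It is not taken from the abox source: the label names and the message are made up for this example, and the system call numbers (1 for write, 60 for exit) are the Linux x86_64 values.

```nasm
; Minimal sketch (not abox code): print a message and exit without libc.
        global  _start

        section .rodata
message:        db      "hello from a raw system call", 10
message_len     equ     $ - message

        section .text
_start:
        ; write(1, message, message_len)
        mov     eax, 1                  ; system call number: write
        mov     edi, 1                  ; file descriptor: stdout
        lea     rsi, [rel message]      ; address of the buffer
        mov     edx, message_len        ; number of bytes to write
        syscall

        ; exit(0)
        mov     eax, 60                 ; system call number: exit
        xor     edi, edi                ; exit status 0
        syscall
```

Assuming NASM and GNU ld are installed and the file is saved as hello.asm (a hypothetical name), it can be built with `nasm -f elf64 hello.asm && ld -o hello hello.o`.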
Although the code is 100% 64-bit Intel x86_64 assembly code, the
source is structured so that additional architectures can be
added.
-
No crazy register shuffling (aka keeping all values in registers).
Yes, it's incredibly efficient, but only a compiler can keep track of the code and it makes debugging very hard.
-
No crazy stack manipulations.
Overuse of push and pop is just as bad as register shuffling in terms of making the code difficult to understand.
However, a single push and then referencing the value on the stack (via a macro / equ for clarity) is fine.
-
No huge static buffers.
Either allocate storage dynamically, or use the BSS segment with named variables (labels).
-
Comments should be used as much as possible to explain the code.
Unlike higher-level languages, all assembly is cryptic, so explain what you're doing!
-
Use the stack (rather than the BSS section) for local variables.
-
Ensure the stack is 16-byte aligned.
-
All commands must have a unit test.
-
Labels should have meaningful names.
No cryptic "compiler-generated" names!
-
Commands should make all labels and variables private by using a dot prefix for all labels.
-
Commands should use the rodata section where possible for defining constant strings.
-
Commands should use an assembler directive (such as equ, %define, %assign, etc.) for defining constants.
-
Rather than calling a function using the call instruction, use the dcall macro. This detects stack misalignment issues for non-release builds and can be used to set a breakpoint on every call instruction.
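Several of the guidelines above can be seen together in a small function. The following is only a sketch under assumptions: count_char is a hypothetical helper, not part of abox, and it uses a plain stack frame rather than any project macro. It shows a commented header, dot-prefixed (private) labels with meaningful names, a local variable kept on the stack and referenced through an equ offset, and a stack pointer that stays 16-byte aligned.

```nasm
        global  count_char

        section .text

;---------------------------------------------------------------------
; Description: Count occurrences of a byte in a null-terminated string.
;
; C prototype equivalent:
;
;     size_t count_char(const char *str, char wanted);
;
; Parameters:
;
; - Input:  RDI (address) - str.
; - Input:  SIL (byte)    - wanted.
; - Output: RAX (integer) - number of matching bytes.
;---------------------------------------------------------------------
count_char:
        ; Create stack frame
        push    rbp
        mov     rbp, rsp

        ; Reserve 16 bytes for locals: one 8-byte variable plus padding
        ; so that rsp remains 16-byte aligned.
        sub     rsp, 16

        ; Offset of the local variable, relative to rbp.
.count  equ     -8                      ; size_t

        mov     qword [rbp + .count], 0

.next_char:
        mov     al, [rdi]               ; load the current byte
        test    al, al                  ; end of string?
        jz      .out
        cmp     al, sil                 ; is it the byte we want?
        jne     .advance
        inc     qword [rbp + .count]

.advance:
        inc     rdi                     ; move to the next byte
        jmp     .next_char

.out:
        mov     rax, [rbp + .count]     ; return the count

        ; Destroy stack frame (also releases the local storage)
        leave
        ret
```

Keeping the counter on the stack is slower than keeping it in a register, but it gives the value a name and a fixed location that is easy to inspect in a debugger, which matches the project's preference for clarity over speed.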
-
Create a new .asm file (arch/x86_64/src/cmds/${command}.asm).
-
Create a global symbol for a function called command_${command}.
-
Ensure the symbol/function creates and destroys a stack frame.
-
The return code (in the rax register) should be:
- 0 on success.
- -1 if the command failed.
- -2 if an option or argument was invalid.
Notes:
-
See command.inc for the symbolic names for standard error return values.
-
If an error occurs, the command should generally display an error message to stderr in addition to returning a negative value.
-
The command will be passed its arguments in the normal SysV ABI manner:
- rdi contains the argument count (argc).
- rsi contains the address of the argument array (argv).
-
argv[0] will be set to the name of the command, not the name of the multi-call binary.
-
Create a second global symbol called command_help_${cmd}.
This should be a null-terminated string defined in the .rodata section that describes the command briefly and lists all available options (aka a usage statement).
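The argument-passing and return-code conventions above can be sketched as follows. This is an illustration only: command_demo is a hypothetical command, and the literal values 0 and -2 are used directly because the symbolic names defined in command.inc are not shown here. A real command would also print an error message to stderr before returning a negative value.

```nasm
        global  command_demo

        section .text

; C prototype equivalent:
;
;     int command_demo(int argc, char *argv[]);
;
; Returns 0 if a non-empty first argument was supplied, otherwise -2.
command_demo:
        ; Create stack frame
        push    rbp
        mov     rbp, rsp

        ; Preserve register value (callee saved)
        push    rbx

        ; argc is in edi: argv[0] plus at least one argument is required.
        cmp     edi, 2
        jl      .invalid_args

        ; argv is in rsi: each element is an 8-byte pointer, so argv[1]
        ; lives at offset 8.
        mov     rbx, [rsi + 8]
        cmp     byte [rbx], 0           ; reject an empty string
        je      .invalid_args

        xor     eax, eax                ; 0: success
        jmp     .out

.invalid_args:
        mov     rax, -2                 ; -2: invalid option or argument

.out:
        ; Restore callee saved register
        pop     rbx

        ; Destroy stack frame
        leave
        ret
```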
For example, to add a new foo command create
arch/x86_64/src/cmds/foo.asm containing:
global command_foo
global command_help_foo
section .rodata
command_help_foo: db "This command creates unicorns, ",10
db "fairies and pixies.",10
db 10
db "Options:",10
db "-a : ...",10
db "-b : ...",10
db "-z : ...",10
db "See echo(1)",0
section .text
;---------------------------------------------------------------------
; Description: Implement the standard `foo` command.
;
; C prototype equivalent:
;
; int command_foo(int argc, char *argv[]);
;
; Parameters:
;
; - Input: RDI (integer) - argc.
; - Input: RSI (address) - argv.
; - Output: RAX (integer) - 0 on success, -1 on failure, -2 on invalid option or argument.
;
; Notes:
;
; Limitations:
;
; See: `foo(1)`.
;
;---------------------------------------------------------------------
command_foo:
; Create stack frame
push rbp
mov rbp, rsp
; Preserve register value (callee saved)
; XXX: Also preserve r12, r13, r14, r15!
push rbx
;-----------------------------------
; FIXME: function body goes here.
;-----------------------------------
.out:
; Restore callee saved register
pop rbx
; Destroy stack frame
leave
ret

To simplify stack handling and local variables, the convention is for all functions to:
- Always create a stack frame.
- Always push and pop rbx.
- Adjust the stack pointer using a multiplication expression that represents the number of variables.
- Assume all fundamental type variables (at least int, pointer, char, even bool) occupy 8 bytes. This isn't efficient but it makes the code clearer.
Use the prologue_with_vars and epilogue_with_vars macros to simplify
the code. Calls to these macros must be paired and called with the same
numeric values.
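To make the convention concrete, here is a sketch of how a pair of macros of this kind could be written. These are not the abox definitions: sketch_prologue, sketch_epilogue and add_pair are hypothetical names, and rounding the slot count up to an odd number (to keep the stack 16-byte aligned after rbx is pushed) is an assumption of this sketch, not something the project documents.

```nasm
; Hypothetical macros for illustration; the real abox macros may differ.

; Create a stack frame, save rbx and reserve %1 8-byte variable slots.
%macro sketch_prologue 1
        push    rbp
        mov     rbp, rsp
        push    rbx
        ; After the two pushes rsp is 8 bytes off a 16-byte boundary.
        ; Reserving an odd number of 8-byte slots restores alignment,
        ; so an even count is rounded up by one slot.
        sub     rsp, ((%1 | 1) * 8)
%endmacro

; Undo sketch_prologue. Must be given the same slot count.
%macro sketch_epilogue 1
        add     rsp, ((%1 | 1) * 8)
        pop     rbx
        leave
%endmacro

        global  add_pair

        section .text

; C prototype equivalent:
;
;     long add_pair(long a, long b);
add_pair:
        sketch_prologue 2

        ;--------------------
        ; Stack offsets (relative to rsp).
.a      equ     0                       ; long
.b      equ     8                       ; long
        ;--------------------

        mov     [rsp + .a], rdi         ; store both arguments as locals
        mov     [rsp + .b], rsi

        mov     rax, [rsp + .a]         ; rax = a + b
        add     rax, [rsp + .b]

.out:
        sketch_epilogue 2
        ret
```

Passing the variable count rather than a byte count keeps the call sites readable, and the multiplication by 8 reflects the rule that every fundamental type occupies one 8-byte slot.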
Example:
; Create stack frame, preserves rbx on the stack and "allocates" space
; on the stack for the specified number of 64-bit variables.
prologue_with_vars 3
;--------------------
; Stack offsets.
.var1 equ 0 ; int
.var2 equ 8 ; "char *"
.var3 equ 16 ; ssize_t
;--------------------
; ...
.out:
; "deallocates" stack space for specified number of variables (which
; obviously needs to be the same value passed to `prologue_with_vars`),
; destroys the stack frame, and restores rbx.
epilogue_with_vars 3
; Return from function.
ret

The assemblers do not create debug symbols for structures (DWARF DW_TAG_structure_type).
This means that you cannot cast to compound (C/C++ struct) types in gdb(1). However, a
workaround is to define a C type with the same layout as the ASM
struc (macro!) type but with a different name. You can then cast the ASM type
to the C type in gdb(1).
For example, to build with the C definition of the head command's struc Block:
$ make EXTRA_C_SOURCES="extra/c_block.c"

Then, assuming rax contains a pointer to a struc Block, you can run:
$ gdb abox
(gdb) p (CBlock *)$rax