|
| 1 | + |
| 2 | + A brief overview of JWasm's source code |
| 3 | + |
| 4 | + 1. Source files |
| 5 | + |
| 6 | + file          comment |
| 7 | + ----------------------------------------------------------------------- |
| 8 | + main.c        contains main() |
| 9 | + cmdline.c     parse command line |
| 10 | + assemble.c    assembler (generic) |
| 11 | + input.c       read source file, |
| 12 | +               preprocessor (generic) |
| 13 | +               calling preprocessor directives |
| 14 | + expans.c      (text) macro expansion |
| 15 | + tokenize.c    tokenizer, COMMENT directive |
| 16 | + condasm.c     preprocessor conditional directives (IFx, ELSEx, ENDIF ) |
| 17 | + loop.c        preprocessor loop directives (FOR, FORC, REPT, WHILE, ...) |
| 18 | + equate.c      (preprocessor) EQU and '=' directives |
| 19 | + string.c      preprocessor string directives (TEXTEQU, CATSTR, SUBSTR, ...) |
| 20 | + macro.c       preprocessor MACRO and PURGE directives |
| 21 | + parser.c      parser (generic) |
| 22 | + branch.c      parsing of branch instructions (JMP, Jcc, CALL, JxxxZ, LOOPx ) |
| 23 | + expreval.c    expression evaluator |
| 24 | + |
| 25 | + assume.c      parsing of ASSUME directive |
| 26 | + context.c     parsing of directives PUSHCONTEXT, POPCONTEXT |
| 27 | + cpumodel.c    parsing of .MODEL and cpu (.8086, .80186, ...) directives |
| 28 | + data.c        parsing of data directives (DB, DW, ... ), data labels |
| 29 | +               handles data generation (+fixups) |
| 30 | + directiv.c    parsing of various directives which have no other home |
| 31 | + end.c         parsing of END, .STARTUP and .EXIT directives |
| 32 | + extern.c      parsing of EXTERN, EXTERNDEF, COMM, PUBLIC, PROTO |
| 33 | + hll.c         parsing of hll directives (.IF, .ELSE, .WHILE, .REPEAT, ...) |
| 34 | + invoke.c      parsing of INVOKE directive |
| 35 | + labels.c      parsing of LABEL directive, code labels |
| 36 | + listing.c     parsing of listing directives (.LIST, .CREF, ...) |
| 37 | +               writing of listing file |
| 38 | + option.c      parsing of OPTION directive |
| 39 | + posndir.c     parsing of ORG, ALIGN, EVEN directives |
| 40 | + proc.c        parsing of PROC, ENDP, LOCAL directives |
| 41 | +               generates procedure prologues and epilogues |
| 42 | + safeseh.c     parsing of .SAFESEH directive |
| 43 | + segment.c     parsing of SEGMENT (+ENDS) and GROUP directives |
| 44 | + simsegm.c     parsing of simplified segment directives (.CODE, .DATA, ...) |
| 45 | + types.c       parsing of STRUCT (+ENDS), UNION, TYPEDEF, RECORD directives |
| 46 | + |
| 47 | + omf.c         handles OMF output format |
| 48 | + omffixup.c    handles OMF fixup generation |
| 49 | + omfint.c      handles OMF I/O |
| 50 | + coff.c        handles COFF output format (32- and 64-bit) |
| 51 | + elf.c         handles ELF output format (32- and 64-bit) |
| 52 | + bin.c         handles binary and DOS MZ output format |
| 53 | + dbgcv.c       handles output of CodeView symbolic debugging info |
| 54 | + |
| 55 | + reswords.c    handles access to table of reserved words |
| 56 | + symbol.c      handles access to - global and local - symbol (hash) table |
| 57 | + backptch.c    handles backpatching (jump distance optimization) |
| 58 | + codegen.c     handles instruction code generation (+fixups) |
| 59 | + fixup.c       fixup creation |
| 60 | + fpfixup.c     16-bit floating-point fixup creation |
| 61 | + errmsg.c      handles assembler error messages (non-fatal) |
| 62 | + fatal.c       handles fatal assembler errors |
| 63 | + memalloc.c    handles dynamic memory allocations |
| 64 | + tbyte.c       handles TBYTE data format (10-byte floating-point format) |
| 65 | + queue.c       handles internal queues |
| 66 | + mangle.c      handles symbol name mangling (name decoration) |
| 67 | + apiemu.c      handles C compiler peculiarities and bugs |
| 68 | + |
| 69 | + |
| 70 | + 2. Calling hierarchy |
| 71 | + |
| 72 | + main |
| 73 | +  - main_init |
| 74 | +  - main_fini |
| 75 | +  - AssembleModule |
| 76 | +      - AssembleInit |
| 77 | +      - AssembleFini |
| 78 | +      - OnePass |
| 79 | +          - GetPreprocessedLine ( if pass == 1 ) |
| 80 | +              - GetTextLine |
| 81 | +          - Tokenize ( if pass > 1 ) |
| 82 | +          - ParseLine |
| 83 | +              - directive |
| 84 | +              - data_init |
| 85 | +                  - EvalOperand |
| 86 | +              - EvalOperand |
| 87 | +              - codegen |
| 88 | + |
| 89 | + 1. main() |
| 90 | + |
| 91 | + Parses the command line (including wildcards) and |
| 92 | + calls AssembleModule() for each source module. |
| 93 | + |
| 94 | + 2. AssembleModule() |
| 95 | + |
| 96 | + Assembles one module in at least 2 passes. |
| 97 | + |
| 98 | + 3. OnePass() |
| 99 | + |
| 100 | + Executes one pass for a module. Pass one is handled |
| 101 | + differently from the others, because the preprocessed |
| 102 | + lines are saved in this pass and then read back in the |
| 103 | + subsequent passes. |
| 104 | + |
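The save-then-reread scheme described for OnePass() can be pictured with the following minimal sketch. It is not JWasm's actual code: the names line_store, first_pass, later_pass and free_store, the fixed capacity, and the reduction of "preprocessing" to dropping empty lines are all assumptions made for illustration.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define MAX_LINES 64   /* hypothetical fixed capacity, for this sketch only */

/* Hypothetical store for the lines that survive preprocessing in pass 1. */
struct line_store {
    char *lines[MAX_LINES];
    int   count;
};

/* Pass 1: "preprocess" the raw source lines and save the result.
 * Preprocessing is reduced here to dropping empty lines.
 * Returns the number of lines handed on to the parser, or -1 on error. */
static int first_pass(struct line_store *ls, const char *const *src, int n)
{
    int processed = 0;
    for (int i = 0; i < n; i++) {
        if (src[i][0] == '\0')
            continue;                     /* nothing left for the parser */
        assert(ls->count < MAX_LINES);
        size_t len = strlen(src[i]) + 1;
        char *copy = malloc(len);
        if (copy == NULL)
            return -1;
        memcpy(copy, src[i], len);
        ls->lines[ls->count++] = copy;    /* saved for the later passes */
        processed++;                      /* the parser would run here */
    }
    return processed;
}

/* Pass 2 and later: no preprocessing, just replay the stored lines. */
static int later_pass(const struct line_store *ls)
{
    int processed = 0;
    for (int i = 0; i < ls->count; i++) {
        /* a real assembler would tokenize and parse ls->lines[i] here */
        processed++;
    }
    return processed;
}

/* Release all stored lines. */
static void free_store(struct line_store *ls)
{
    for (int i = 0; i < ls->count; i++)
        free(ls->lines[i]);
    ls->count = 0;
}
```

In this sketch every later pass sees exactly the lines that pass one produced, so macro expansion and conditional assembly would only have to be done once.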
| 105 | + 4. Tokenize() |
| 106 | + |
| 107 | + The tokenizer. Scans a source line and detects reserved words, |
| 108 | + numbers, IDs, operators, literals. Converts the items to tokens |
| 109 | + stored in array tokenarray[]. |
| 110 | + |
| 111 | + 5. GetPreprocessedLine() |
| 112 | + |
| 113 | + This is the preprocessor. It |
| 114 | + - reads a line from the current source, |
| 115 | + - converts it into tokens ( function Tokenize() ), |
| 116 | + - calls macro expansion, |
| 117 | + - checks if the line contains a preprocessor directive |
| 118 | + ( preprocessor directives are IF, WHILE, REPEAT, INCLUDE, |
| 119 | + TEXTEQU, CATSTR, INSTR, ... ), |
| 120 | + - if yes, handles the directive and returns 0, |
| 121 | + - if no, returns the number of tokens found in the line. |
| 122 | + |
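The return convention of the preprocessor (0 when a directive consumed the line, otherwise the token count) might look roughly like the sketch below. Everything in it is an assumption for illustration: the function name preprocess_line, the short directive list, whitespace-only tokenizing, and the simplification that only the first token is checked (real directives such as TEXTEQU appear after a name).

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Hypothetical, shortened list of preprocessor directives. */
static const char *const pp_directives[] = { "IF", "WHILE", "REPEAT", "INCLUDE" };

/* Splits a line on whitespace and returns
 *   0        if the first token is a preprocessor directive
 *            (the line is considered handled), or
 *   n >= 0   the number of tokens left for the parser. */
static int preprocess_line(const char *line)
{
    char first[16] = "";
    int ntok = 0;
    const char *p = line;

    assert(line != NULL);
    while (*p) {
        while (*p && isspace((unsigned char)*p))
            p++;                              /* skip leading blanks */
        if (!*p)
            break;
        const char *start = p;
        while (*p && !isspace((unsigned char)*p))
            p++;                              /* scan one token */
        if (ntok == 0) {
            /* keep an upper-cased copy of the first token */
            size_t len = (size_t)(p - start);
            if (len >= sizeof first)
                len = sizeof first - 1;
            for (size_t i = 0; i < len; i++)
                first[i] = (char)toupper((unsigned char)start[i]);
            first[len] = '\0';
        }
        ntok++;
    }

    for (size_t i = 0; i < sizeof pp_directives / sizeof pp_directives[0]; i++) {
        if (strcmp(first, pp_directives[i]) == 0)
            return 0;                         /* directive handled here */
    }
    return ntok;
}
```

A caller in the style of OnePass() would skip the parser whenever the result is 0 and pass the line on otherwise.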
| 123 | + 6. ParseLine() |
| 124 | + |
| 125 | + The parser. It |
| 126 | + - checks if first item is a code label (ID followed by ':'). If yes, |
| 127 | + a label is created ( function LabelCreate() ). |
| 128 | + - checks if current item is a directive. If yes, calls function |
| 129 | + directive() or - if directive is a "data definition directive" - |
| 130 | + function data_init(). |
| 131 | + - checks if current item is a predefined type or an arbitrary type. |
| 132 | + If yes, calls function data_init(). |
| 133 | + - if current item is an instruction, it calls the expression |
| 134 | + evaluator ( function EvalOperand() ), up to 3 times, to get |
| 135 | + the operands. |
| 136 | + - if more than 1 operand has been read, function check_sizes() |
| 137 | + is called, which verifies that the sizes of the operands will |
| 138 | + "match". |
| 139 | + - the code generator ( function codegen() ) is called. |
| 140 | + |
| 141 | + 7. codegen() |
| 142 | + |
| 143 | + The code generator. This part |
| 144 | + - scans the instruction table to find an entry which matches |
| 145 | + the number of operands and their types. |
| 146 | + - if an entry is found, the code bytes and fixups are generated |
| 147 | + and written into a buffer. |
| 148 | + |
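The table scan performed by the code generator can be illustrated with a small sketch. The table layout, the type names and the function find_entry are hypothetical and far simpler than JWasm's real instruction table; the opcode bytes are only illustrative base values.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical operand classes. */
enum op_type { OP_NONE, OP_REG, OP_IMM, OP_MEM };

/* Hypothetical instruction table entry. */
struct instr_entry {
    const char   *mnemonic;
    int           num_ops;
    enum op_type  ops[2];
    unsigned char opcode;      /* illustrative base opcode only */
};

static const struct instr_entry instr_table[] = {
    { "mov", 2, { OP_REG, OP_IMM  }, 0xB8 },
    { "mov", 2, { OP_MEM, OP_REG  }, 0x89 },
    { "ret", 0, { OP_NONE, OP_NONE }, 0xC3 },
};

/* Scans the table for an entry whose mnemonic, operand count and
 * operand types all match. Returns NULL if no entry matches. */
static const struct instr_entry *find_entry(const char *mnemonic, int num_ops,
                                            const enum op_type *ops)
{
    assert(num_ops >= 0 && num_ops <= 2);
    for (size_t i = 0; i < sizeof instr_table / sizeof instr_table[0]; i++) {
        const struct instr_entry *e = &instr_table[i];
        if (e->num_ops != num_ops || strcmp(e->mnemonic, mnemonic) != 0)
            continue;
        int match = 1;
        for (int k = 0; k < num_ops; k++) {
            if (e->ops[k] != ops[k]) {
                match = 0;
                break;
            }
        }
        if (match)
            return e;          /* code bytes would be generated from e */
    }
    return NULL;               /* no encoding for this operand combination */
}
```

When find_entry() returns NULL in this sketch, the caller would report an invalid operand combination instead of writing bytes into the output buffer.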
| 149 | + 8. data_init() |
| 150 | + |
| 151 | + Handles data lines. This is either a data directive ( DB, DW, DD, ... ), |
| 152 | + a predefined type ( BYTE, WORD, DWORD, ... ) or an arbitrary type, defined |
| 153 | + with TYPEDEF, STRUCT, UNION, ... |
| 154 | + |
| 155 | + 9. GetTextLine() |
| 156 | + |
| 157 | + Reads a line from the current source. This is either a macro, a source |
| 158 | + file or the global line queue, which is used to store generated code. |