This repository was archived by the owner on Sep 27, 2023. It is now read-only.
log08.txt
Well, I think the next thing to do is make the
interpreter take input from STDIN. With that, I'll be
able to not only play with it interactively, but also
redirect and pipe instructions into it, which will mean
being able to have some regression tests as well.
[ ] Get input string from 'read' syscall
[ ] Add some testing (shell script?)
Testing is one of those things that a lot of people have
strong opinions about. But I absolutely love having a
reasonable number of tests in place to let me know that
I haven't broken something. Tests let me be _more_
creative and _more_ brave with the code because I can
try something and know right away whether or not it
works.
I have come to _loathe_ manual testing because I've been
in the Web dev world forever and testing on the web
SUCKS. In a lot of cases, the state of the art is still
refreshing a browser and clicking through a bunch of
pages. When you're used to that, it is a _delight_ to be
able to easily set up some STDIN/STDOUT tests on a
command line!!!
Okay, that's more than enough about that. It's time to
get real input!
Right off the bat, I know I'm going to need to handle
this in three places:
* get_token
* eat_spaces
* quote
On the plus side, I'm happy with my input functionality
(especially the string literals) and I don't regret
how they've turned out. But now's when I have to pay the
price for three separate methods that read input.
This is definitely going to test my resolve to stick
with the "inline all the things" redundant code. But I
don't think "get_input" or whatever I end up calling it
will be very long. Might even be under 100 bytes. So
three copies shouldn't be too bad. :-)
(When I wrote the above paragraphs, I thought I was
going to have up to six copies, but I kept realizing
that most of those places weren't actually reading a
whole stream of input - they were relying on one of
these three to do it.)
Okay, stripped of comments, this is get_input:
mov ebx, [input_file]
mov ecx, input_buffer
mov edx, INPUT_SIZE
mov eax, SYS_READ
int 0x80
cmp eax, INPUT_SIZE
jge %%done
mov byte [input_buffer + eax], 0
%%done:
mov dword [input_buffer_pos], input_buffer
It's tiny - just the Linux 'read' syscall to get more
input into the input_buffer. The only interesting thing
is that if we read less than an entire buffer's worth,
it null-terminates the string.
Now I gotta use it in (at least?) three places. Two of
them also needed to be updated now that I understand
what the esi and edi registers are for, ha ha. Anyway,
here's a typical example from 'eat_spaces':
cmp esi, input_buffer_end ; need to get more input?
jl .continue ; no, keep going
GET_INPUT_CODE ; yes, get some
jmp .reset ; got more input, reset and continue
I kept simplifying until I got down to those four lines.
But does it work? Just a couple dumb mistakes and
then...
$ mr
hello
Could not find word "hello" while looking in IMMEDIATE mode.
Exit status: 1
Wow! I don't know why I typed "hello" as my first live
input into this thing. But it totally worked. Ha ha, I
probably don't need an unfound word to be a fatal error
anymore. :-)
How about something that *will* work:
$ mr
"Hello world!\n" print
Hello world!
Goodbye.
Exit status: 0
Yay! My first live "Hello world" in this interpreter!
I'm still exiting after one line of input. I'll have to
figure that out. Do I read until an actual EOF character
is encountered? I can't remember. I'm just super excited
this works!
So after all that hand-wringing about having a couple
copies of this code, how much impact has that actually
had?
Here's the relevant bits from 'inspect_all':
get_input: 45 bytes IMMEDIATE COMPILE
get_token: 108 bytes IMMEDIATE
eat_spaces: 80 bytes IMMEDIATE COMPILE
quote: 348 bytes IMMEDIATE COMPILE
Let's compare with the previous log07.txt results:
get_token: 55 bytes IMMEDIATE
eat_spaces: 38 bytes IMMEDIATE COMPILE
quote: 247 bytes IMMEDIATE COMPILE
Since I cleaned up some of the words, there wasn't an
across-the-board increase of 45 * 3 bytes.
I'd like to see what the grand total has become. And
I'll probably want to do that often. So I'll make a new
option in my build.sh script:
if [[ $1 == 'bytes' ]]
then
AWK='/^.*: [0-9]+/ {t=t+$2} END{print "Total bytes:", t}'
echo 'inspect_all' | ./$F | awk -e "$AWK"
exit
fi
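A quick way to sanity-check that awk program is to feed it a few known lines instead of the live interpreter. The sketch below uses the four 'inspect_all' lines quoted earlier in this log as sample input. The total_bytes helper name is made up for the example, and it calls plain awk without the -e flag, which POSIX awk does not specify.

```shell
#!/bin/sh
# Hypothetical helper: sum the byte counts from inspect_all-style lines.
# Same awk program as the build.sh option; $2 is the number after "name:".
total_bytes() {
    awk '/^.*: [0-9]+/ {t=t+$2} END{print "Total bytes:", t}'
}

# Sample lines copied from the inspect_all output quoted in this log.
sample='get_input: 45 bytes IMMEDIATE COMPILE
get_token: 108 bytes IMMEDIATE
eat_spaces: 80 bytes IMMEDIATE COMPILE
quote: 348 bytes IMMEDIATE COMPILE'

printf '%s\n' "$sample" | total_bytes   # prints "Total bytes: 581"
```

Lines that don't look like "name: number ..." (such as "Goodbye.") are skipped by the pattern, so the interpreter's other output doesn't disturb the sum.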
Okay, let's see the damage:
$ ./build.sh bytes
Total bytes: 2816
The last run in log07.txt was 2655 bytes, so the
difference is:
2816 (current)
- 2655 (previous)
------
161
Ha ha, only 161 bytes difference, and since one of the
three copies is needed, I only gained 116 bytes of
"bloat". I think I can live with that on the x86
platform. :-)
Now I gotta figure out how to continue reading after one
line of input.
Oh, wait! One last thing. I had also set the input
buffer to an artificially tiny size so I could make sure
it was being refilled as needed. I'll add a DEBUG
statement to see where that's happening.
The buffer size is 16 bytes.
$ mr
GET_INPUT00000000
"This is a jolly long string to make sure we read plenty into input buffer a couple times.\n" print
GET_INPUT00000000
GET_INPUT00000000
GET_INPUT00000000
GET_INPUT00000000
GET_INPUT00000000
GET_INPUT00000000
This is a jolly long string to make sure we read plenty into input buffer a couple times.
Goodbye.
Exit status: 0
Okay, perfect, that long line of input required 7 calls
to 'get_input' to refill the input_buffer. Now I'll set
it to a reasonable size. I've seen some conflicting
stuff online, so I'll just take the coward's way out:
%assign INPUT_SIZE 1024 ; size of input buffer
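The seven refills in the 16-byte experiment line up with simple arithmetic: the typed line plus its newline is 100 bytes, and each 'read' hands back at most one buffer's worth. Here is a rough sketch of that calculation. The refills helper is just for illustration, and it assumes every 'read' fills the buffer completely until the input runs out.

```shell
#!/bin/sh
# Hypothetical helper: how many reads of at most SIZE bytes cover N bytes?
# Ceiling division; assumes each read returns a full buffer until the end.
refills() {
    n=$1; sz=$2
    echo $(( (n + sz - 1) / sz ))
}

# The long test line from above (the \n stays as two literal characters).
line='"This is a jolly long string to make sure we read plenty into input buffer a couple times.\n" print'
len=$(( ${#line} + 1 ))     # +1 for the ENTER keypress

echo "bytes: $len"                                      # bytes: 100
echo "16-byte buffer: $(refills "$len" 16) reads"       # 7 reads
echo "1024-byte buffer: $(refills "$len" 1024) reads"   # 1 read
```

With the 1024-byte buffer, an ordinary typed line should fit in a single 'read'.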
Now to figure out how to keep reading after the first
line (or token?) of input.
Okay, so I do need to check the return value from 'read'
because that's the only way I can know if I've really
got an EOF instead of just "no more input at this
moment" - as would be the case between when the user
hits enter and types the next line of input.
I also added a new eof global that I can trip as soon as
any of the 'get_input' instances hits the end of input:
cmp eax, 0 ; 0=EOF, -1=error
jge %%normal
mov dword [input_eof], 1 ; set EOF reached
%%normal:
dave@cygnus~/meow5$ mr
"Hello world!\n" print
Hello world!
: loud_meow "MEOW!\n" print ;
loud_meow
MEOW!
exit
Exit status: 12
Heh, that's so cool. I can finally interact with this
thing for real. But CTRL+D doesn't exit. I had to type
'exit' to make that happen.
I'll add a debug to 'get_input' to see what 'read' is
returning...
dave@cygnus~/meow5$ mr
"goodbye cruel world" print
read bytes: 0000001c
goodbye cruel world <---- I typed ENTER here
read bytes: 00000001
<---- ENTER again here
read bytes: 00000001
read bytes: 00000000 <---- CTRL+D
read bytes: 00000001 <---- ENTER again
exit
read bytes: 00000005
Exit status: 1
Okay, so I guess I'm not checking the input_eof flag
correctly in my interpreter loop?
No! Ha, perhaps you spotted it before I did in the
assembly snippet? Here it is again:
cmp eax, 0 ; 0=EOF, -1=error
jge %%normal
mov dword [input_eof], 1 ; set EOF reached
%%normal:
Silly mistake:
jge %%normal
should be
jg %%normal
so that 0 will trigger EOF!
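The off-by-one is easy to see when both comparisons are written out side by side. This sketch mirrors the two branches in shell arithmetic rather than assembly. The function names are invented for the example, and the argument stands in for the value 'read' leaves in eax.

```shell
#!/bin/sh
# Each function prints the value input_eof would end up with (1 = EOF set).

# Original check: 'jge' skips the flag whenever eax >= 0, so 0 (EOF) is missed.
eof_flag_jge() {
    if [ "$1" -ge 0 ]; then echo 0; else echo 1; fi
}

# Fixed check: 'jg' skips the flag only when eax > 0 (real data was read).
eof_flag_jg() {
    if [ "$1" -gt 0 ]; then echo 0; else echo 1; fi
}

# 28 and 1 are byte counts seen in the debug output, 0 is CTRL+D,
# and -1 stands in for an error return.
for n in 28 1 0 -1; do
    echo "read returned $n: jge -> $(eof_flag_jge "$n"), jg -> $(eof_flag_jg "$n")"
done
```

Only the row for 0 differs between the two versions, which is exactly the CTRL+D case that was being ignored.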
Okay, that pretty much worked. But there's still some
inelegant code in the interpreter where I feel like I'm
checking for input too many times and it's somehow still
not enough.
I was null-terminating the input and I think I would be
better off keeping track of an upper bound (where the
data from 'read' ends) instead.
Two nights later: Okay, just about have the kinks worked
out. I've got two new global variables to keep track of
the input buffer:
input_buffer: resb INPUT_SIZE
input_buffer_pos: resb 4
input_buffer_end: resb 4 <--- new
input_eof: resb 4 <--- new
Now I can check input_eof in any input words and in the
outer interpreter.
Okay, I'm stuck in 'eat_spaces'. I'm peppering it with
DEBUG macro calls to see what's up. esi points to the
current character in the input buffer (if it's a space,
we want to advance past it). ebx contains the last
position filled in the buffer by 'read'.
$ mr
eat_spaces pos: 0804c774
eat_spaces RESET, pos: 0804c774
ES more input! esi: 0804c774
ES more input! ebx: 0804c774
45 234 "hello!" meow <----- I typed this
read bytes: 00000015
eat_spaces RESET, pos: 0804c774
ES more input! esi: 0804c774
ES more input! ebx: 0804c774
read bytes: 00000000 <----- I typed CTRL+D here
get_input EOF! 00000001
eat_spaces RESET, pos: 0804c774
get_next_token checking for EOF 0804ace2
Goodbye.
Exit status: 0
Well, that would be a problem. Looks like esi and ebx
are always the same value. Oops!
LOL, that's exactly it. I forgot to save the new end of
buffer pointer in 'get_input'. Here we are:
mov dword [input_buffer_end], ebx ; save it
Do you like super verbose logging? You'll love this.
Here I am printing "hello" and then quitting with
CTRL+D. It's hard to even find the interaction amidst
all the noise:
eat_spaces pos: 0804c7d5
eat_spaces RESET, pos: 0804c7d5
eat_spaces looking at char... 0000000a
ES more input! esi: 0804c7d6
ES more input! ebx: 0804c7d6
"hello" print
read bytes: 0000000e
eat_spaces RESET, pos: 0804c7cc
eat_spaces looking at char... 00000022
get_next_token checking for EOF 0804ad12
get_next_token looking at chars. 0804ad12
quote0804c7cc
eat_spaces pos: 0804c7d3
eat_spaces RESET, pos: 0804c7d3
eat_spaces looking at char... 0804c320
eat_spaces looking at char... 0804c370
eat_spaces pos: 0804c7d4
eat_spaces RESET, pos: 0804c7d4
eat_spaces looking at char... 00000070
get_next_token checking for EOF 0804ad12
get_next_token looking at chars. 0804ad12
get_token0804c7d4
helloeat_spaces pos: 0804c7d9
eat_spaces RESET, pos: 0804c7d9
eat_spaces looking at char... 0000000a
ES more input! esi: 0804c7da
ES more input! ebx: 0804c7da
read bytes: 00000000
get_input EOF! 00000001
eat_spaces RESET, pos: 0804c7cc
get_next_token checking for EOF 0804ad12
Goodbye.
Exit status: 0
But it works. I'll clean this up tomorrow night and see
if I can add a simple test script.
Next night: The DEBUGs are cleaned up. Now a couple
housekeeping things. First, I want to complete that TODO
item from the last log, a word to print all defined
words (just the names, not the entire 'inspect' output).
I think I'll call it 'all'.
[ ] New word: 'all' to list all current word names
Well, that was easy:
$ mr
all
all inspect_all inspect ps printmode printnum number decimal bin oct hex radix str2num quote num2str ; return : copystr get_token eat_spaces get_input find is_runcomp get_flags inline print newline strlen exit
Goodbye.
Exit status: 0
I also added a non-destructive stack printing word last
log and I never actually got it working. So I'd like to
fix that.
[ ] Finish 'ps' (non-destructive stack print)
And since I have string escape sequences for
runtime newline printing and NASM can include newlines
in string literals with backticks, I'd like to remove
the 'newline' word. I'm only using it in a couple places
anyway.
[ ] Remove word 'newline' (replace with `\n`)
That one was super-easy too. I didn't really need a TODO
item for it. But it'll feel good to show that checked
box at the end of the log, so why not?
Now for that print stack:
$ mr
42 ps
1 4290881940 0 4290881948 4290881964 4290881982
4290882002 4290882040 4290882048 4290882106 ...
It just keeps going on and on. And then ends in a
Segmentation fault. So clearly I've got something wrong.
When the interpreter starts, I save the stack pointer to
a variable.
mov dword [stack_start], esp
I want to do a sanity check, so I'll push two values:
push dword 555
push dword 42
Let's see this in action to confirm how x86 stacks work:
$ mb
Reading symbols from meow5...
(gdb) break 877
Breakpoint 1 at 0x8049f92: file meow5.asm, line 877.
(gdb) r
Starting program: /home/dave/meow5/meow5
Breakpoint 1, _start () at meow5.asm:877
Okay, let's see what the stack register currently points
to (and by using GDB's 'display', this will always print
after every command):
(gdb) disp $esp
1: $esp = (void *) 0xffffd780
(gdb) disp *(int)$esp
2: *(int)$esp = 1
I've noticed that 1 (one) when I was trying to debug the
stack before. I have no idea why that's there. That's
something else to figure out.
Anyway, we can see the "first" stack address:
0xffffd780
And as I push values onto the stack, esp should
decrement by 4 since the x86 stack writes to memory
backward. (By the way, I feel a rant about how we
describe this coming on, stay tuned for that in a
moment.)
-------------------------------------------------------
NOTE
-------------------------------------------------------
By the way, I often manually manipulate these GDB
sessions here in my logs so that the instruction I'm
executing shows up right before I start examining
memory. Sorry if that confuses people who are
well-versed in GDB and are wondering what the heck is
going on.
-------------------------------------------------------
Now I'll just verify that my stack_start variable indeed
holds the same value as esp and it points to that '1' at
the beginning of the stack:
877 mov dword [stack_start], esp
(gdb) s
1: $esp = (void *) 0xffffd780
2: *(int)$esp = 1
(gdb) x/a (int)stack_start
0xffffd780: 0x1
Yup. No surprises so far.
Now when I push, we should see esp decrement and point
to the newly pushed value:
879 push dword 555
(gdb) s
1: $esp = (void *) 0xffffd77c
2: *(int)$esp = 555
880 push dword 42
(gdb) s
1: $esp = (void *) 0xffffd778
2: *(int)$esp = 42
Looks good so far!
0xffffd780 1
0xffffd77c 555
0xffffd778 42
...I think. I'm really no good at hex calculations in my
head. Even easy ones. Let's confirm with 'dc', the old
RPN desk calculator on UNIX systems since forever:
$ dc
16 i 10 o <--- set input and output base to 16 (get it?)
1A 5 + p
1F <--- just making sure it's set up okay
D780 p
D780 <--- 0xffffd780
4 - p
D77C <--- 0xffffd77c
4 - p
D778 <--- 0xffffd778
dc is crazy. Anyway, those addresses are right. Every
push subtracts 4 from esp and writes the pushed value to
that address.
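The same check can also be done with shell arithmetic, which understands 0x-prefixed hex directly. This is only an alternative sketch using the addresses from the GDB session above, and it assumes a shell with 64-bit arithmetic (the usual case on current Linux systems).

```shell
#!/bin/sh
# Each 32-bit push moves esp down by 4 bytes.
esp=$(( 0xffffd780 ))              # esp before any push (from the GDB session)

printf '%x\n' "$esp"               # ffffd780
esp=$(( esp - 4 ))                 # push dword 555
printf '%x\n' "$esp"               # ffffd77c
esp=$(( esp - 4 ))                 # push dword 42
printf '%x\n' "$esp"               # ffffd778
```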
So when I examine the stack area of memory, I should be able to
subtract 4 from my stack_start variable and see each
value. When I hit the current value of esp, that's the
last value on the stack and I'm done:
(gdb) x/d (int)stack_start
0xffffd780: 1
(gdb) x/d (int)stack_start -4
0xffffd77c: 555
(gdb) x/d (int)stack_start -8
0xffffd778: 42
Great! So the computer is doing what I think it's doing.
Always a good sign. :-)
*****************************************************
* RANT ALERT * RANT ALERT * RANT ALERT * RANT ALERT *
*****************************************************
Okay, so my issue with how we talk about stacks is the
use of terms like "top" and "bottom".
If we start with the stack of plates analogy, it's
perfectly fine to talk about the top of the stack
because it makes physical sense:
===== <--- top plate
=====
=====
=====
But where's the "top" of this memory?
+-----+
| | 0x0000
+-----+
| | ...
+-----+
| | 0xFFFF
+-----+
Okay, now where's the "top" of this memory?
+-----+
| | 0xFFFF
+-----+
| | ...
+-----+
| | 0x0000
+-----+
Where's the "top" of the stack in this memory?
+-----+
| === | 0xFFFF } stack start
+-===-+ } stack
| === | ... } stack
+-----+
| | 0x0000
+-----+
And the "top" of the stack in this memory?
+-----+
| === | 0x0000 } stack start
+-===-+ } stack
| === | ... } stack
+-----+
| | 0xFFFF
+-----+
Or this?
+-----+
| | 0xFFFF
+-----+
| === | ... } stack
+-===-+ } stack
| === | 0x0000 } stack start
+-----+
Or this?
+-----+
| | 0x0000
+-----+
| === | ... } stack
+-===-+ } stack
| === | 0xFFFF } stack start
+-----+
I've seen ALL of these representations over the years
and the person making the diagram just passes it off
like their own personal mental model is completely
obvious.
This situation is nuts.
And I know Intel's official docs for x86 use the "top"
and "bottom" terms. But guess what? Intel's "word" size
on 64-bit processors is 16 bits, so I think we can
safely ignore their advice on terminology.
Personally, I don't picture ANY of the diagrams above.
Instead, I imagine the stack as horizontal memory and
the stack grows to the right:
+--------------------
| A | B | C | D | E --->
+--------------------
^ ^
oldest current
But you'll notice that I don't say "rightmost" or
"leftmost". That would be ridiculous. Especially since
x86 has a stack that grows from a high-numbered address
to a lower-numbered address. So it's really more like
this:
--------------------+
<--- E | D | C | B | A |
--------------------+
^ ^
0xE4 0xFF
(current) (oldest)
Anyway, the point is that using directional descriptions
as if we were all looking at the same physical object is
super confusing.
I prefer stack descriptions such as:
* current / newest / recent
* older / previous
* oldest
* hot vs cold
* surfaced / buried
And so on. I'm sure you can think of some better ones.
Actually, please do.
*****************************************************
* RANT ALERT * RANT ALERT * RANT ALERT * RANT ALERT *
*****************************************************
Sorry about that. I do feel better now. So, I've made
some changes in how I do the stack printing (I needed to
basically reverse everything I was doing, ha ha) and
let's see if it works now:
$ mr
ps
1
42 555 97 33
ps
1 42 555 97 33
"Hello $ $ $" print
Hello 33 97 555
ps
1 42
"I put $ on there, but where does the $ come from???\n" print
I put 42 on there, but where does the 1 come from???
Goodbye.
Exit status: 0
I don't know if that's hard to follow or not? It's
tempting to make some sort of prompt in the interpreter
just so it's easier to see the commands I type versus
the responses.
Anyway, it works great. I just don't understand why
there's a 1 on the stack when I start?
I guess it doesn't really matter. It occurs to me that I
should consider the start of the stack to be the *next*
available position. I'll update that now.
From:
mov dword [stack_start], esp
To:
lea eax, [esp - 4]
mov [stack_start], eax
Did that fix it?
ps
42 16 ps
42 16
8 ps
42 16 8
Yup! Now we start with nothing on the stack and adding
items to the stack only shows those items.
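Here is a small model of why moving stack_start down by one cell empties the initial printout. It only simulates the address walk in shell arithmetic (no real memory is read), and stack_cells is a made-up name. The actual 'ps' word may well be structured differently.

```shell
#!/bin/sh
# Print every cell address from START down to ESP (inclusive), 4 bytes apart.
# Lower address = newer item, because the x86 stack grows downward.
stack_cells() {
    addr=$1; esp=$2
    while [ "$addr" -ge "$esp" ]; do
        printf '%x\n' "$addr"
        addr=$(( addr - 4 ))
    done
}

initial_esp=$(( 0xffffd780 ))        # esp at startup (points at the mystery 1)

# Old: stack_start = esp, so the walk always includes the cell holding 1.
stack_cells "$initial_esp" "$initial_esp"                  # ffffd780

# New: stack_start = esp - 4, so an untouched stack prints nothing...
stack_cells $(( initial_esp - 4 )) "$initial_esp"          # (no output)

# ...and after two pushes only the two pushed cells are listed.
stack_cells $(( initial_esp - 4 )) $(( initial_esp - 8 ))  # ffffd77c, ffffd778
```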
Now how about a test script? I'm a big fan of simple
tests that are just enough to give me the peace-of-mind
that I haven't broken anything that used to work.
One thing that works just fine now that I take input on
STDIN is piping input:
$ echo "42 13 ps" | ./meow5
42 13
Goodbye.
And I can grep/ag the results to make sure they contain
what I want.
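A pipe-and-grep check like that can be wrapped in a tiny helper. This is a minimal sketch, not the test setup used in this log: the check function and its messages are invented for illustration, and it only looks for a fixed string somewhere in the output.

```shell
#!/bin/sh
# check INPUT PATTERN CMD [ARGS...]
# Pipe INPUT into CMD and report whether its output contains PATTERN.
check() {
    input=$1; pattern=$2; shift 2
    if printf '%s\n' "$input" | "$@" | grep -qF -- "$pattern"; then
        echo "ok: $input"
    else
        echo "FAIL: $input (wanted: $pattern)"
        return 1
    fi
}

# Only run against the interpreter if it has been built in this directory.
if [ -x ./meow5 ]; then
    check '42 13 ps' '42 13' ./meow5
    check '"Meow\n" print' 'Meow' ./meow5
fi
```

Because grep only sees what the program printed, a fixed-string match keeps characters like "." and "$" from being treated as regex syntax.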
But I remember 'expect' from back when I was heavy into
Tcl. I think I'll give that a shot to interactively
drive the interpreter and test it.
Expect is so cool. Here's my whole test script so far:
#!/usr/bin/expect
spawn ./meow5
# Print a string
send -- "\"Meow\\n\" print\r"
expect "Meow"
# Construct meow and test it
send -- ": meow \"Meow. \" print ;\r"
send -- "meow\r"
expect "Meow. "
# Construct meow5 and test it
send -- ": meow5 meow meow
meow meow meow \"\\n\" print ;\r"
send -- "meow5\r"
expect "Meow. Meow. Meow. Meow. Meow."
# Exit (send CTRL+D EOF)
send -- "\x04"
expect eof
The long meow5 definition line has been broken onto the
next line for this log.
Here it is running!
$ ./test.exp
spawn ./meow5
"Meow\n" print
Meow
: meow "Meow. " print ;
meow
Meow. : meow5 meow meow meow meow meow "\n" print ;
meow5
Meow. Meow. Meow. Meow. Meow.
Goodbye.
I'll add a new alias for it now. (Defined by my "meow"
function in .bashrc):
alias mt="./build.sh ; ./test.exp"
Sweet! That wraps up this log and the goals I had for
it. I'll add more to the test script as I go. This was
just to get it started.
[x] Get input string from 'read' syscall
[x] Finish 'ps' (non-destructive stack print)
[x] New word: 'all' to list all current word names
[x] Remove word 'newline' (replace with `\n`)
[x] Add some testing (expect!)
I think I might make some math words next so I can use
the language to do basic stuff like add and subtract!