xv6 Extended

Description

This project covers compiling, modifying and using the xv6 operating system. Its main goal is to get familiar with operating systems and their structure.

It also includes the implementation of custom system calls, environment variable support (including PATH) and some shell improvements.

Compilation and running

Requirements

GCC compiler
QEMU emulator (including qemu-system-i386)
GNU Make build system

Note

You may need to install some additional packages or modify build settings depending on your OS and hardware.

On non-ELF systems, you will need to install a cross-compiler toolchain that produces x86 ELF binaries, or use an x86 Linux virtual machine to build the project. Remember to modify the Makefile accordingly.

Compilation

Note

This repository contains fixes for GCC compilers newer than 12.0.0. It is tested on the following configurations:

GCC 10.5.0, 11.4.0, 12.3.0, 13.2.1

QEMU 8.0.4, 8.1.0

and Ubuntu 22.04 LTS (GCC 11.4.0, QEMU 6.2.0).

If you are working on a similar configuration, most likely no additional actions will be required.

To build the xv6 operating system, you need to run the following command:

make

Running

In order to run the operating system, do the following:

make qemu

or, to run without an X environment:

make qemu-nox

Note

For debugging purposes, see qemu-gdb and qemu-nox-gdb rules in the Makefile.

Usage and features

The xv6 operating system is a simple UNIX-like operating system and is very similar in usage to its UNIX counterparts. It features a simple shell, which is used to run programs and interact with the operating system. Let's walk through the basic usage of the programs, commands and shell features I've implemented.

Hello world program

xv6> hello

This is the simplest program which prints "Hello world!" to the console.

cp

This program copies the contents of one file to another. It takes two arguments: the source file and the destination file.

xv6> cp file1 file2

It works by iteratively reading the source file into a buffer and writing it to the destination file.
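The read-then-write loop described above can be sketched roughly as follows. This is an illustration rather than the repository's cp.c: the helper name copy_fd and the 512-byte buffer size are assumptions, and the code uses the POSIX read() and write() calls, which xv6 exposes under the same names.

```c
#include <assert.h>
#include <unistd.h>

/* Hypothetical helper: copy everything from src_fd to dst_fd.
 * Returns the number of bytes copied, or -1 on a read/write error. */
long copy_fd(int src_fd, int dst_fd)
{
  char buf[512];   /* buffer size is an arbitrary choice for this sketch */
  long total = 0;
  ssize_t n;

  assert(src_fd >= 0 && dst_fd >= 0);

  /* read() returns 0 at end of file and a negative value on error. */
  while ((n = read(src_fd, buf, sizeof(buf))) > 0) {
    ssize_t off = 0;
    /* write() may accept fewer bytes than requested, so loop until done. */
    while (off < n) {
      ssize_t w = write(dst_fd, buf + off, (size_t)(n - off));
      if (w <= 0)
        return -1;
      off += w;
    }
    total += n;
  }
  return n < 0 ? -1 : total;
}
```

A cp-like program would open the source for reading, create the destination, call this helper once and close both descriptors.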

mv

This program moves a file from one location to another. It takes two arguments: the source file and the destination file.

xv6> mv file1 file2

mv is a little bit more interesting, but much simpler than the cp program. It works just by creating a hard link to the source file in the destination path and then removing the source file.
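A minimal sketch of the link-then-unlink idea is shown below. The helper name move_file and the rollback on failure are assumptions made for illustration, not necessarily what mv.c does; link() and unlink() are the POSIX calls, which xv6 also provides.

```c
#include <assert.h>
#include <unistd.h>

/* Hypothetical helper: rename src to dst by hard-linking, then unlinking.
 * Returns 0 on success, -1 on failure. */
int move_file(const char *src, const char *dst)
{
  assert(src != 0 && dst != 0);

  /* Create a second directory entry that refers to the same inode. */
  if (link(src, dst) < 0)
    return -1;

  /* Drop the old name; the data stays reachable through dst. */
  if (unlink(src) < 0) {
    unlink(dst);   /* best-effort rollback so we don't leave two names */
    return -1;
  }
  return 0;
}
```

Because no data is copied, this only works when source and destination live on the same file system, which is always the case on xv6's single disk.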

testenv

xv6> testenv

This program was created for testing and demonstration purposes. It prints all the environment variables of the current process to the console by reading the environ array. For more information about the implementation of environment variables, see the Extended functionality section.
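Walking a NULL-terminated array such as environ takes only a short loop. The sketch below assumes a hypothetical helper print_env that receives the array as a parameter (so it can be tried with any array) and uses the standard printf() instead of xv6's two-argument variant.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: print every "NAME=value" entry of a
 * NULL-terminated array and return how many entries were printed. */
int print_env(char **env)
{
  int n = 0;

  assert(env != 0);

  for (; env[n]; n++)
    printf("%s\n", env[n]);
  return n;
}
```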

export and unset

xv6> export VAR=value # Sets the environment variable VAR to value
or
xv6> export # Prints all the environment variables

xv6> unset VAR

export and unset are built-in shell commands which allow you to set and unset environment variables. They work by modifying the environ array of the current process through the setenv() and unsetenv() ulib functions, respectively.
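Before export can call setenv(), it has to split its VAR=value argument at the first = sign. One way to do that is sketched here; the helper name split_assignment and its error conventions are assumptions for illustration, not code taken from sh.c.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: split "NAME=value" at the first '='.
 * Copies NAME into `name` and points *value at the text after '='.
 * Returns 0 on success, -1 if there is no '=', the name is empty,
 * or the name does not fit in `namesz` bytes. */
int split_assignment(const char *arg, char *name, size_t namesz,
                     const char **value)
{
  assert(arg != 0 && name != 0 && value != 0);

  const char *eq = strchr(arg, '=');
  if (eq == 0 || eq == arg)        /* no '=' or empty name */
    return -1;

  size_t n = (size_t)(eq - arg);
  if (n >= namesz)                 /* name would not fit */
    return -1;

  memcpy(name, arg, n);
  name[n] = '\0';
  *value = eq + 1;                 /* may be the empty string */
  return 0;
}
```

On success the shell could pass the two parts on as setenv(name, value, 1).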

Variable expansion

xv6> <command> $VAR

My implementation of the shell supports variable expansion. It works by replacing every occurrence of a variable in the command line with its value, which is obtained through the getenv() ulib function.

For example, you can try:

xv6> export GREP_PATTERN=Copyright
xv6> cat README | grep $GREP_PATTERN
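The substitution step could look roughly like the sketch below. The helper name expand_vars, the 64-byte limit on variable names and the rule that unset variables expand to nothing are assumptions for this example; the shell's actual parsing may differ.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: copy `in` to `out`, replacing each $NAME with
 * getenv("NAME"). Unset variables expand to nothing. Returns 0 on
 * success, -1 if the result does not fit in `outsz` bytes. */
int expand_vars(const char *in, char *out, size_t outsz)
{
  size_t o = 0;

  assert(in != 0 && out != 0 && outsz > 0);

  while (*in) {
    if (*in == '$' && (isalpha((unsigned char)in[1]) || in[1] == '_')) {
      char name[64];
      size_t n = 0;

      in++;  /* skip '$' */
      while ((isalnum((unsigned char)*in) || *in == '_') &&
             n < sizeof(name) - 1)
        name[n++] = *in++;
      name[n] = '\0';

      const char *val = getenv(name);
      size_t len = val ? strlen(val) : 0;
      if (o + len >= outsz)        /* keep room for the terminator */
        return -1;
      if (len)
        memcpy(out + o, val, len);
      o += len;
    } else {
      if (o + 1 >= outsz)
        return -1;
      out[o++] = *in++;            /* ordinary character, copy as-is */
    }
  }
  out[o] = '\0';
  return 0;
}
```

With GREP_PATTERN set as in the example above, expanding "grep $GREP_PATTERN" would give "grep Copyright".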

PATH environment variable

xv6> export PATH=/

This is a special environment variable which specifies the directories in which the shell searches for executable files. It is used by the execvpe() function and is explained in more detail in the Extended functionality section.

Extended functionality

PATH environment variable support

In this section, I want to dive deeper into the implementation of the environment support and the PATH variable.

Possible approaches

The original xv6 operating system does not support environment variables. It simply does not have any API for setting, getting, modifying or passing environment variables to the processes. I had to come up with a way to implement this functionality.

The possible approaches to this problem are:

Simply creating some global PATH variable and using it in the exec() function to search for the executable files.

This approach is very simple and straightforward, but it has a lot of drawbacks. First of all, it is not very flexible: there is no way to modify the PATH variable for a specific process. Besides, it is not very convenient to use, because there is no 'canonical' way for the user to modify the PATH variable. This could be done by implementing auxiliary system calls and corresponding user programs like setpath and getpath, but there would still be no way to pass the PATH variable to user programs and allow them to modify and use it in their own way.

Adding a new entry to the process structure which will contain the pointer to the environment variables array.

Although it seems like a slightly better approach, it's not. Now we can modify the environment variables for a specific process, but there is still no way to pass the environment variables to the user programs. On top of that, we must store additional information in the process structure, which breaks the OS structure.

Moreover, these approaches do not implement environment variables support as such. They only implement the 'isolated' PATH variable, which is, well, not very useful and interesting.

I wanted to fully implement the environment variables support, so I had to come up with a better solution.

My implementation

The 'right' way of passing environment variables to processes is the execve() system call. It takes three arguments: the path to the executable file, the array of arguments and the array of environment variables.

int execve(const char *path, char *const argv[], char *const envp[]);

Here's how it's executed on the UNIX systems:

My implementation works very similarly. I had to add the execve() system call as a replacement for the existing exec(). To do this, I had to modify the syscall.c file, which contains the array of function pointers to the system calls:

static int (*syscalls[])(void) = {
        ...
        [SYS_execve]    sys_execve,
        ...
}

We should also add the function prototype to this file, as we are going to implement it in the sysfile.c file:

extern int sys_execve(void);

To properly index the system calls array, we should also add the SYS_execve constant to the syscall.h file:

#define SYS_execve  7

sys_execve(), in turn, reads the arguments using the fetchint() and fetchstr() functions and calls the exec() function, which is responsible for loading the executable file into memory and executing it:

int
sys_execve(void)
{
  char *path, *argv[MAXARG], *envp[MAXENV];
  
  // read the arguments

  return exec(path, argv, envp);
}

For user programs to be able to use the execve() system call, we should add the proper interface to the usys.S file:

...
SYSCALL(execve)
...

and add the user function prototype to the user.h file:

...
int execve(const char*, char* const*, char* const*);

Continuing with envp, it is often passed as a third argument to the program's main() function:

int main(int argc, char *argv[], char *envp[]);

This is achieved by first placing the environment variables array on the stack, right after the argument array. Depending on the system's ABI (let's use the System V ABI), the stack looks like this:

   * sp     :    argc
   * argv   :    argv[0]
   *             argv[1]
   *             ...
   *             NULL
   * envp   :    envp[0]
   *             envp[1]
   *             ...
   *             NULL

Then it is the responsibility of the exec() function to prepare the stack. Let's take a closer look at the exec() function to understand how my modification works:

  ...
  for (envc = 0; envp[envc]; envc++) {
    if (envc >= MAXENV)
      goto bad;
    sp = (sp - (strlen(envp[envc]) + 1)) & ~3;
    if(copyout(pgdir, sp, envp[envc], strlen(envp[envc]) + 1) < 0)
      goto bad;
    ustack[argc + 2 + envc] = sp;
  }
  ustack[argc + 2 + envc] = 0;
  ...
  // Other insignificant modifications

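This modified version of the exec() function iterates over the envp array, copies each environment string onto the new user stack and stores its user-space address in ustack, right after the argv pointers. The final assignment terminates the pointer array with a null pointer.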
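To see the address arithmetic of that loop in isolation, here is a user-space simulation. It is only a sketch: the helper name push_strings is hypothetical, a plain byte buffer stands in for the user stack, memcpy() stands in for copyout(), and offsets into the buffer stand in for user virtual addresses.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: push each string of `strs` onto a downward-growing
 * buffer, keeping 4-byte alignment, and record where each one landed.
 * `*sp` is an offset into `stack` and is updated in place.
 * Returns the number of strings pushed, or -1 if there are more than
 * `maxslots` strings or the buffer is too small. */
int push_strings(char *stack, uint32_t *sp, char *const strs[],
                 uint32_t *slots, int maxslots)
{
  int i;

  assert(stack != 0 && sp != 0 && strs != 0 && slots != 0);

  for (i = 0; strs[i]; i++) {
    uint32_t len = (uint32_t)strlen(strs[i]) + 1;  /* include the '\0' */
    if (i >= maxslots || *sp < len)
      return -1;
    *sp = (*sp - len) & ~(uint32_t)3;   /* move down, keep 4-byte alignment */
    memcpy(stack + *sp, strs[i], len);
    slots[i] = *sp;                     /* remember where the string landed */
  }
  return i;
}
```

Starting from the top of a 64-byte buffer, pushing "PATH=/" (7 bytes with its terminator) moves the offset to 57 and then down to the aligned value 56; pushing "A=b" (4 bytes) afterwards lands at 52.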

Now the envp pointer should be accessible from the user program. Well, almost. There is still one major problem. The xv6 operating system lacks the C runtime, especially crt0. crt0 is basically the set of startup routines linked into a C program that perform initialization tasks before calling the main function. Indeed, let's take a look at how the main() function is called on UNIX systems:

As we see, the entry point of the program is not the main() function, but the _start() function, which calls the __libc_start_main() function, which, in turn, calls the main() function.

We need these functions to properly initialize the environment and call the main() function with the correct arguments. And if we take a look at what POSIX says about environment variables, there is one more problem that we aim to solve. It says that the use of a third argument to the main function is not specified in POSIX. According to POSIX, the environment should be accessed via the external variable environ. Setting this variable is the responsibility of the __libc_start_main() function.

Let's take a look at my implementation of crt0:

.text
.globl _start

_start:
    xorl %ebp, %ebp

    popl %esi
    movl %esp, %ecx

    // We don't need to align the stack to 16 bytes,
    // since SSE is disabled by -mgeneral-regs-only.
    pushl %ecx
    pushl %esi

    pushl $main

    call __libc_start_main
    call exit

The xorl %ebp, %ebp instruction sets the ebp register to zero, which marks the outermost stack frame. This is an ABI suggestion and not our point of interest.

The popl %esi instruction pops the first argument, argc, from the stack into the esi register. Now esp points to the argv array. We save the esp value in the ecx register, because we will need it later. At this point we would normally align the stack to 16 bytes, which the ABI expects so that SSE (Streaming SIMD Extensions) code can use aligned memory accesses. However, here we can drop it for simplicity, especially because we disabled SSE by using the -mgeneral-regs-only compiler flag.

Now we should prepare the stack for the __libc_start_main() function:

int __libc_start_main(int (*main) (int, char**, char**), int argc, char** argv);

Note that my implementation of this function is simplified due to the incompleteness of the xv6 operating system. Implementing the full version with all functionality, including __libc_csu_init(), __libc_csu_fini(), auxiliary vectors, etc., requires a lot of work and is simply unreasonable for this project.

The first argument is the main() function, the second argument is the argc and the third argument is the argv array. We push these to the stack and call the __libc_start_main() function:

pushl %ecx  // argv
pushl %esi  // argc
pushl $main // main
call __libc_start_main

Now, let's take a look at the ulibc_start.c file, which contains the implementation of the __libc_start_main() function and declares the environ variable:

char **environ;

int __libc_start_main(int (*main) (int, char**, char**), int argc, char** argv) {
  environ = &argv[argc + 1];

  return main(argc, argv, environ);
}

As we see, it takes care of setting the environ variable to point to the environment variables array. Its address is calculated by adding the argc to the argv array pointer and then adding one to the result (to skip the NULL terminator of the argv array). It then just calls the main() function with the correct arguments.
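The pointer arithmetic can be checked on an ordinary array that mimics the stack layout. The sketch below uses a hypothetical helper, envp_from_argv, that performs the same computation as the assignment to environ.

```c
#include <assert.h>

/* Hypothetical helper mirroring the pointer arithmetic in
 * __libc_start_main(): given argc and the start of the argv vector,
 * return the start of the envp vector that follows it. */
char **envp_from_argv(int argc, char **argv)
{
  assert(argc >= 0 && argv != 0);
  assert(argv[argc] == 0);   /* argv must be NULL-terminated */
  return &argv[argc + 1];    /* skip argc pointers plus the NULL */
}
```

For a vector laid out as { "grep", "Copyright", NULL, "PATH=/", "VAR=value", NULL } with argc equal to 2, the result points at the slot holding "PATH=/".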

Well, to be completely precise here, setting the environ variable is the responsibility of the __libc_init_first() function. However, to keep things simple, I've decided to omit it.

As you may have noticed, there is one more function call in the _start function: exit(). It terminates the process after the main() function returns. Previously, user programs had to call exit() manually; now we can just return from main() and _start will take care of the rest. By the way, this also eliminates the need to push the weird fake return address onto the stack (ustack[0] = 0xffffffff), which is used in the original xv6 implementation of the exec() function.

Now we need to tell the linker we want to use _start as the entry point of programs. Let's take a look at the Makefile, namely the rule for building the user programs (starting with _):

ULIB = ulib.o usys.o printf.o umalloc.o

_%: %.o $(ULIB)
	objcopy --remove-section .note.gnu.property ulib.o
	$(LD) $(LDFLAGS) -N -e main -Ttext 0 -o $@ $^
	$(OBJDUMP) -S $@ > $*.asm
	$(OBJDUMP) -t $@ | sed '1,/SYMBOL TABLE/d; s/ .* / /; /^$$/d' > $*.sym

The -e main option tells the linker to use main as the entry point of the program. We need to change it to _start:

$(LD) $(LDFLAGS) -N -e _start -Ttext 0 -o $@ $^

and link the programs against the ulibc_start.o along with crt0.o:

ULIB = ulib.o usys.o printf.o umalloc.o ulibc_start.o crt0.o

However, there is still one more step to take. We need to provide the API for setting, getting and modifying the environment variables. I've implemented the following functions in the ulib.c file:

int setenv(const char *name, const char *value, int overwrite);
int unsetenv(const char *name);
char *getenv(const char *name);

They are very similar to the POSIX functions with the same names. I won't include the implementation here, because it's again not our point of interest. However, feel free to take a look at the ulib.c file.
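For readers who want a rough idea of what such a lookup involves, here is a sketch of a getenv()-style search. It is not the code from ulib.c: the helper name env_lookup is hypothetical, and it takes the array as a parameter instead of reading the global environ.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: look NAME up in a NULL-terminated array of
 * "NAME=value" strings (the layout `environ` uses). Returns a pointer
 * to the value, or NULL when NAME is absent. */
char *env_lookup(char **env, const char *name)
{
  size_t len;

  assert(env != 0 && name != 0);
  len = strlen(name);

  for (; *env; env++) {
    /* Match "NAME" followed immediately by '=', so that looking up
     * "GREP" does not match "GREP_PATTERN=...". */
    if (strncmp(*env, name, len) == 0 && (*env)[len] == '=')
      return *env + len + 1;
  }
  return 0;
}
```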

Now that's it! We have implemented the environment variables support in the xv6 operating system, and we can finally continue with the PATH variable.

The PATH variable, as stated above, is special. It is used by execvpe(), which, in contrast to execve(), takes care of searching for the executable file in the directories specified by the PATH variable. Its implementation is pretty straightforward, but it involves a lot of parsing and string manipulation, so let's only take a glance at the most important parts:

int execvpe(const char *file, char *const argv[], char *const envp[]) {
  if (strchr(file, '/') != 0) {
    return execve(file, argv, envp);
  }

  char *path = getenv("PATH");
  ...
  
  // Search for delimiter

  while (token != 0) {
    ...

    int exec_result = execve(full_path, argv, envp); // Try executing
    if (exec_result == 0) {
      return exec_result; // Return if succeeded
    }
    ...
    // Move on to the next entry
  }
  return -1;
}

As we see, it first checks if the file argument contains the / character. If it does, the user specified a path to the executable file, so we can just call the execve() function. Otherwise, we search for the executable file in the directories specified by the PATH variable. This is done by iterating over the entries of PATH and trying to execute the file in each directory in turn. On success, execve() replaces the process image and does not return to the caller, so the exec_result == 0 check is only a safeguard: if control comes back from execve(), that attempt failed and we move on to the next entry in the PATH variable. If no entry works, execvpe() returns -1.
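The string handling elided from the listing boils down to building one candidate path per PATH entry. A possible sketch is shown below; the helper name path_candidate is hypothetical, and the choice of : as the delimiter and of treating an empty entry as the current directory follows the usual UNIX convention, which is an assumption about this implementation.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: write the idx-th candidate "dir/file" built from
 * the colon-separated `path` into `out`. Returns 0 on success, -1 when
 * idx is past the last entry or the result does not fit in `outsz`. */
int path_candidate(const char *path, const char *file, int idx,
                   char *out, size_t outsz)
{
  assert(path != 0 && file != 0 && out != 0 && idx >= 0);

  const char *start = path;
  for (;;) {
    const char *end = strchr(start, ':');
    size_t dlen = end ? (size_t)(end - start) : strlen(start);

    if (idx-- == 0) {
      /* An empty entry stands for the current directory. */
      const char *dir = dlen ? start : ".";
      if (dlen == 0)
        dlen = 1;
      /* Avoid a doubled slash when the directory already ends in '/'. */
      int need_slash = dir[dlen - 1] != '/';
      size_t flen = strlen(file);
      if (dlen + need_slash + flen + 1 > outsz)
        return -1;
      memcpy(out, dir, dlen);
      if (need_slash)
        out[dlen] = '/';
      memcpy(out + dlen + need_slash, file, flen + 1);
      return 0;
    }
    if (!end)
      return -1;   /* ran out of PATH entries */
    start = end + 1;
  }
}
```

A search loop would call this with idx = 0, 1, 2, ... and pass each resulting path to execve() until the helper returns -1.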

Note that execvpe() is a libc function, rather than a system call, used as a wrapper for the execve() system call.

Now we can finally use the PATH variable. As I've said before, there are export and unset built-in commands which allow us to set and unset the environment variables directly from the shell.

Feel free to try it!

Demo

You can find the demo video in the demo folder: OS_xv6_demo.mp4

License

The MIT License (MIT)
