Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Using Alive2 as a library #1075

Open
can-leh-emmtrix opened this issue Jul 24, 2024 · 1 comment
Open

Using Alive2 as a library #1075

can-leh-emmtrix opened this issue Jul 24, 2024 · 1 comment

Comments

@can-leh-emmtrix
Copy link
Contributor

This issue collects some ideas on how to improve on using alive2 as a library. I think that there is lots of potential in having a well tested symbolic execution library for LLVM translation validation tasks. I have been working on a translation validation tool based on alive2 for a couple of months now. This issue collects some ideas on how to improve on some of the rough edges I encountered.

API

A more extensive public API:

  1. A more fully featured API for manipulating the IR. While most IR manipulation could (and depending on the application should) be done directly in LLVM, alive2 could still benefit from a few more IR inspection and manipulation functions. Some of the functions I added in our fork include:
  • Getters for fields of instructions (BinOp::getOp, BinOp::getFlags, FpBinOp::getOp, FpBinOp::getFastMathFlags, FpBinOp::getRoundingMode, FpBinOp::getExceptionMode, Select::getCond, Phi::getFastMathFlags, ...)
  • Getting a mutable reference to an instruction at a given index in a basic block (BasicBlock::at)
  • Getting the number of instructions in a basic block (BasicBlock::size)
  • Moving an instruction out of a basic block
  • Removing the block terminator in O(1) (BasicBlock::popInstr)
  • Getting the number of basic blocks in a function (Function::getNumBBs)
  • Checking if an instruction is a terminator (Instr::isTerminator)
  • A common API for constructing alive2 types without going through LLVM or alive_parser (not yet implemented)
  1. An API for accessing expressions from State. Some of the functions I added in our fork are:
  • State::getPath(BasicBlock&) for getting the path expression based on predecessor_data
  • State::getUB(BasicBlock&) for getting the UB expression based on predecessor_data

Architectural Improvements

Currently alive2 often relies on global variables to store state. This makes building and maintaining tools built on top of alive2 more difficult. Often manually setting and resetting this state at the correct positions is required (e.g. via reset_state or State::resetGlobals). Global state is especially problematic when the tool using alive2 needs to call into alive2 multiple times in series, as changes to globals are retained between calls.

While I know that this is probably quite a difficult refactoring, ideally I would like to move all globals inside config.h into a Config struct. The main advantage here is that executing a program symbolically with different settings is easier from a library user perspective. Additionally all globals inside globals.h could be stored inside an AliveContext. However, this seems like a big undertaking to me, as these global variables are used pretty extensively throughout the source code.

I think that some of this work could be done incrementally such as in #1074.

@nunoplopes
Copy link
Member

Improving the API is no-brainer. That's ok.

Removing globals is where I don't see a benefit.
Right now we use a global Z3 context. We can remove all other globals, but you are always going to have to stick with that one. That means, no multi-threading, and no construction of multiple things at once. Removing the global Z3 context is possible, but that requires some engineering to make sure that the default build has no perf degradation (i.e., the context remains global).

So, we either start with the hard things, or just shifting around the easy ones is useless, because you'll never get what you want.
The contract of when things need to be reset can't be changed. You'll still have to reset state and globals nevertheless. You are just shifting the function you call for that? The SMT solver still needs to be reset once in a while. And we want to set some global variables in the right way to optimize the SMT encoding (see, e.g., alive-exec where we don't optimize the encoding and don't reset things; it just executes the whole program).

So my question stands: what's so bad about globals (they are mostly static, so they are not leaking into your binary anyway)? What's the initialization/reset contract that doesn't suit your application?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants