Skip to content

Assignment 1 Python

bjjwwang edited this page May 7, 2025 · 6 revisions
Assignment-1
|-- CPP
|   |-- Assignment-1.cpp
|   |-- Assignment-1.h
|   |-- CMakeLists.txt
|   `-- test.cpp
|-- Python
|   |-- AndersenPTA.py
|   |-- Assignment1.py
|   |-- Main.py
|   |-- Test.py
`-- Tests
    |-- SrcSnk.txt
    `-- testcases
        |-- icfg
        |   |-- test1.c
        |   |-- test1.ll
        |   |-- test2.c
        |   `-- test2.ll
        |-- pta
        |   |-- test1.c
        |   |-- test1.ll
        |   |-- test2.c
        |   |-- test2.ll
        |   |-- test3.c
        |   |-- test3.ll
        |   |-- test4.c
        |   `-- test4.ll
        `-- taint
            |-- test1.c
            |-- test1.ll
            |-- test2.c
            `-- test2.ll


1. Get the latest Assignment-1 code template

* Before coding, please type cd $HOME/Software-Security-Analysis and git pull in your terminal to make sure you always have the latest version of the code template before each assignment.

If git pull fails due to the conflict with your local changes, type git stash to store your current code in a temporal branch and type git pull again. If you want to retrieve your code back, type git stash pop.

2. Assignment 1 coding task

- Implement the following methods of class ICFGTraversal and AndersenPTA in Assignment1.py by using some SVF APIs here.

Function Description Marks
read_srcsnk_from_file Identify sources and sinks by parsing APIs in SrcSnk.txt for reachability analysis 20%
reachability Context-sensitive reachability analysis on the ICFG 30%
solve_worklist Field-sensitive inclusion-based points-to analysis (Andersen's analysis) 30%
alias_check Check aliases of the two variables at source and sink. Two variables
are aliases if their points-to sets have at least one overlapping element.
20%

- Tainted Information Flow:

Given a tainted source v1@src (variable v1 at program point src), we say that a sink v2@snk is also tainted if both the following conditions satisfy: (1) src reaches snk on the ICFG via context-sensitive reachability analysis, and (2) pts(v1) ∩ pts(v2) ≠ ∅ inferred by Andersen's field-sensitive analysis. Note that in this assignment, v1 is the return value when calling a source function, and v2 is any argument of the sink function.

- Tips for implementing reachability and solve_worklist.

The implementation of reachability differs from the one in Lab-Exercise-1 in that the paths collected need to be feasible in terms of context sensitivity (calls and returns ICFGNodes must match on each program path). The implementation of solve_worklist also differs from the one in Lab-Exercise-1 by following an additional field-sensitive rule, which distinguishes fields of each struct but is array-insensitive (treating all elements of an array as one object). Please refer to this API to obtain a field object (get_gep_obj_var) given a struct object and a field index. The constraint solving stops until a fixed point is reached (i.e., no new COPY edges are added and the points-to sets are unchanged). No particular order when resolving edges is needed when performing the constraint solving.

C-like form Constraint form Solving rule Explaination
p = &o p <--ADDR-- o pts(p) = pts(p) ∪ {o} add o into p's points-to set
q = p q <--COPY-- p pts(q) = pts(q) ∪ pts(p) union p's points-to set into q's one
q = *p q <--LOAD-- p for each o ∈ pts(p) : add q <--COPY-- o for each o in p's points-to set, add a COPY edge from o to q (if it is not on the graph)
*p = q p <--STORE-- q for each o ∈ pts(p) : add o <--COPY-- q for each o in p's points-to set, add a COPY edge from q to o (if it is not on the graph)
q = &p->fld q <--GEP, fld-- p for each o ∈ pts(p) : pts(q) = pts(q) ∪ {o.fld} for each o in p's points-to set, add o's field object o.fld into q's points-to set

- To test your implementation (sanity checks)

You can run Test.py under Python folder to check your implementation. The test cases are located in the testcases under Tests folder. You can add your own test cases in the testcases folder and run the test script to validate your implementation. If you have any wrong code, the output will be like this(change URL):

If your code is correct, the output will be like this(change URL):

🔧 Workflow: Convert C++(C) to SVG for ICFG Visualization

This guide walks you through converting a C++(C) source file to an interactive SVG representation of its Interprocedural Control Flow Graph (ICFG) using SVF tools.

🧾 Step 1. Convert .cpp to .ll (LLVM IR)

clang++ -S -emit-llvm source.cpp(c) -o output.ll
  • -S: Generate assembly (LLVM IR).
  • -emit-llvm: Emit LLVM IR instead of machine code.
  • -o output.ll: Output filename for the IR.

🛠️ Step 2. Generate a .dot file from .ll using SVF

wpa -ander -dump-icfg output.ll
  • wpa: Whole Program Analysis tool from SVF.
  • -ander: Enables Andersen’s pointer analysis.
  • -dump-icfg: Dumps the Interprocedural Control Flow Graph (ICFG) into a DOT file.

✏️ Step 3. Rename the ICFG dot file

mv svfir_initial.dot output.dot
  • Rename the auto-generated dot file to something meaningful.

🎨 Step 4. Convert the .dot file to .svg for visualization

dot -Tsvg output.dot -o output.svg
  • dot: Graphviz command-line tool.
  • -Tsvg: Output format as SVG.
  • -o output.svg: Specifies the output file name.

🌐 Step 5. Open the .svg file in Chrome

google-chrome output.svg

✅ Alternatively, you can open it directly in VSCode with the Graphviz Preview extension.

✅ Example: Full Command Pipeline

clang++ -S -emit-llvm ass1.cpp -o ass1.ll
wpa -ander -dump-icfg ass1.ll
mv svfir_initial.dot ass1.dot
dot -Tsvg ass1.dot -o ass1.svg
google-chrome ass1.svg

🐞 Debugging Tips

🔍 Debugging tips for ICFG

Add -dump-icfg as an extra option of -icfg for your ass1 executable when debugging your reachability implementation.
This will dump ICFG into a dot file to view in VSCode.


🧠 Debugging tips for ConstraintGraph and Points-to Sets

Add -print-pts as an extra option of -pta for your ass1 executable when debugging your solveWorklist implementation.

  • Use -print-pts to print the final points-to set of each node to validate MAYALIAS and NOALIAS.
  • Use -print-constraint-graph to print edges and nodes of the constraint graph.
  • Use -dump-constraint-graph to export the graph to a dot file for visualization.

Retrieve or manipulate a variable's points-to set using the SVF APIs shown here.

Constraint Edge Corresponding Color in Dot graphs
(PAG and ConstraintGraph)
ADDR Green
COPY Black or (dashed arrow for interprocedural COPY edges)
LOAD Red
STORE Blue
GEP Purple

- Upload Assignment1.py to UNSW WebCMS for your submission.

Your implementation will be evaluated against our 10 internal tests. You will get the full marks if your code can pass them all. Our internal tests are private. Here, we only provided a handful test cases under Assignment-1/Tests/testcases. You are encouraged to add more test cases by yourself to validate the correctness of your implementation.

- You will be working on Assignment1.py only and there is NO need to modify other files under the Assignment-1 folder

3. Configuration && debugging

To debug in PyCharm with parameters, first set the parameters in Run > Edit Configurations, like below

After you set parameters, then set a breakpoint in your code, and finally right-click the script and select Debug to start debugging.

Clone this wiki locally