Record & Pointer Types by JTrenerry · Pull Request #78 · agle/bincaml

JTrenerry · 2026-03-05T00:32:30Z

Adds two new types Records and Pointers. Primarily to support #58.

The value for Pointer is just a Bitvec.t and Records are [offset : Bitvec.t], I think records should probably be any const from AllOps.const, or something but this adds lots of complexities and importing issues

Types

Records

Records consist of a list of fields, fields are defined to be a offset, size and a type.
Syntax {(offset): type, ...}

PoInters

Pointers consist of a lower and upper type.

Syntax: ptr(lower_type, upper_type)

Operations

Records

FSET FACCESS - Field set and Field access

Pointers

PADD - Pointer addition

Changes made to other works

~~- lib/lang/expr_smt - Added Record to logic type, extremely unsure about this~~

lib/analysis/defuse_bool - Added basic logic about records and pointers, field information isn't taken into consideration and would require a significant rework I believe.
Type check changed to make sure the new operations are type correct, load and stores are not currently restricted by pointers

lib/util/types.ml

lib/util/record.ml

JTrenerry · 2026-03-06T06:15:32Z

note: warnings due to eval_smt.ml

agle · 2026-03-06T07:49:26Z

#77 contains the smt generator for datatypes but has no evaluation representation or semantics, it probably makes sense to make the records in this pr a specialisation of ADTs.
However the ADT type does not support subtyping, maybe thats a reason to keep the distinct record type and have the type checker allow subtyping if we make the algebraic subtyping the main type system of the IR; we will need an updated type checker to support this though, and I'm not sure what the value is. At the final generation phase we will have to flatten the types down as the type system of boogie/smt doesn't support subtyping.

For the purposes of generating proofs:

having the record field set/get intrinsics is good as it means we can generate whatever appropriate implementation for them; either in the SMTLib datatype theory or our own axiomatisation.
To simplify generation we disallow anonymous (unnamed) records and require a type declaration be present to define a Sort the values of the record type belong to; the inference pass will have to generate a unique name and declaration for each different record type.
I understand there is no ordering or relative offset semantics to the records here so we may as well make the field keys strings so we can eventually make them a human readable name? However if we don't encode a layout explicitly this makes it impossible to store records to the memory byte array---maybe an operation we need to maintain?
I think it would be preferable to make pointers be integers rather than bitvectors so that the pointer type encodes the assumption of well behaved pointer arithmetic; absence of overflow etc. (this will require a stronger analysis than type inference to translate the operations soundly though---the naive transform could just do pointer calculations as bitvectors then coerce the bitvector to an integer at the last minute (this would require adding a unary coercion operator). Maybe its not worth the complexity though and we could just make pointers and bitvectors interchangeable as far as the operations are concerned.
Bitvectors can be negative so having both poiner addition and subtraction is redundant (assuming it is signed addition)
If the pointer type is based on the SVA should pointer values include the region/global variable it is pointing to?

JTrenerry · 2026-03-06T09:40:26Z

I think it would be preferable to make pointers be integers rather than bitvectors so that the pointer type encodes the assumption of well behaved pointer arithmetic; absence of overflow etc. (this will require a stronger analysis than type inference to translate the operations soundly though---the naive transform could just do pointer calculations as bitvectors then coerce the bitvector to an integer at the last minute (this would require adding a unary coercion operator). Maybe its not worth the complexity though and we could just make pointers and bitvectors interchangeable as far as the operations are concerned.

I did think about this, but thinking about it more, it is a lot of effort and at least in my head would require infinite passes (not realistically but theoretically).

Bitvectors can be negative so having both poiner addition and subtraction is redundant (assuming it is signed addition)

True

If the pointer type is based on the SVA should pointer values include the region/global variable it is pointing to?

I would have to double check, but I believe load and store operations are the only thing that can make pointers so yeah they could

I understand there is no ordering or relative offset semantics to the records here so we may as well make the field keys strings so we can eventually make them a human readable name?

The current key is based on the bit offset, it can show gaps in the record, though this would be only the gaps inbetween fields or at the start and cannot show gaps at the end.

To simplify generation we disallow anonymous (unnamed) records and require a type declaration be present to define a Sort the values of the record type belong to; the inference pass will have to generate a unique name and declaration for each different record type.

Could / Should this be extracted out into it's own shared generator? i.e. all type declarations are named by the same function which uses the ID.gen setup?

having the record field set/get intrinsics is good as it means we can generate whatever appropriate implementation for them; either in the SMTLib datatype theory or our own axiomatisation.

Meaning I should add / make the record stuff intrinsic operations instead?

However the ADT type does not support subtyping

What do you mean by subtyping here? As in records cannot have a field that has a type? I don't think records really have subtypes but are composed of types instead.

JTrenerry · 2026-03-09T00:25:51Z

If the pointer type is based on the SVA should pointer values include the region/global variable it is pointing to?

The pointer values could store this information if it is needed. Although it may be possible for pointers to be joined when they are pointing at different regions.

agle · 2026-03-09T01:15:34Z

I did think about this, but thinking about it more, it is a lot of effort and at least in my head would require infinite passes (not realistically but theoretically).

It shouldn't be intractable; we could use a modified taint analysis and similarly to any other analysis we can push the precise pointer semantics through as far as possible and in the worst case when the safety of pointer operations degrades to a too-hard termination proof we can overapproximate/widen by inserting a trapping coercion. Like I said we can separate concerns by just always insert a trapping coercion initially and later write analyses that clean it up and improve the precision. That's not to say its neccessarily the best idea, currently all our pointers are represented by bitvectors anyway so we get some benefit from just tagging variables/values as storing pointers.

The current key is based on the bit offset, it can show gaps in the record, though this would be only the gaps inbetween [sic] fields or at the start and cannot show gaps at the end.

If you allow integer values to be record fields you are making it impossible define a finite layout for the record I am saying you have to choose one well-defined sematics for your records: either

(1) a record is an algebraic product type with disjoint fields that are accessed by a denotational accessor that is purely a name matching the field name, record values are incomparable with any other type. In this case numeric offsets are just misleading.
(2) a record is a description of a memory layout with precisely defined field offsets and sizes, fields correspond to offsets into the datatype, you define the operation of accessing multiple fields simultaneously, and allow the record to be memcpy'd as a bitvector/byte array representation. A record is a subtype of bitvector of the same size.

I believe the binsub paper is probably going to infer something more similar to (2) and we can abstract that to a type like (1) by using nested records when multiple-field accesses have been observed. (1) however puts more burden on the transform as we need a new memory representation to allow storing record values to memory without converting them to a bitvector representation. My point is just that we make a deliberate choice and don't end up with something messy in between these two.

Could / Should this be extracted out into it's own shared generator? i.e. all type declarations are named by the same function which uses the ID.gen setup?

Yeah the question is more whether we keep this attached to the program to allow generating distinct types later, it might be useful. It would probably make sense to add a field to program storing a typing context for non-local type information, but I want to ensuring that as much as possible type information is stored locally (i.e. attached to expressions not in some external datatstructure) so analyses don't have to pass through a type context object in order to reason about types.

JTrenerry · 2026-03-09T05:03:45Z

@katrinafyi looks like we use different formatters for nix, what one do you use?

katrinafyi · 2026-03-09T23:38:54Z

@katrinafyi looks like we use different formatters for nix, what one do you use?

I actually do the formatting myself, because I don't like the formatter styles (they all prioritise smaller diff over anything else). But if you want to use a formatter, that's fine too. I can suck it up (or make my own formatter).

JTrenerry · 2026-03-10T01:31:32Z

@katrinafyi looks like we use different formatters for nix, what one do you use?

I actually do the formatting myself, because I don't like the formatter styles (they all prioritise smaller diff over anything else). But if you want to use a formatter, that's fine too. I can suck it up (or make my own formatter).

Okay, I shouldn't be touching those files often anyway, and probably only flake.nix, but you may have to suck it up.

JTrenerry self-assigned this Mar 5, 2026

JTrenerry added the enhancement New feature or request label Mar 5, 2026

agle reviewed Mar 5, 2026

View reviewed changes

lib/util/types.ml Outdated Show resolved Hide resolved

JTrenerry marked this pull request as ready for review March 5, 2026 06:17

JTrenerry requested a review from agle March 5, 2026 06:17

JTrenerry marked this pull request as draft March 5, 2026 06:30

JTrenerry marked this pull request as ready for review March 6, 2026 04:55

agle reviewed Mar 6, 2026

View reviewed changes

lib/util/record.ml Outdated Show resolved Hide resolved

JTrenerry force-pushed the type-refactor branch from a41dfcd to 9037079 Compare March 9, 2026 04:18

JTrenerry added 16 commits March 9, 2026 14:40

Initial types and operations defined

de94058

New operations should be done

baae938

remove size field from records

1599508

types added and analysis fixed

7d139dd

fmt

0b069a8

Change sep from f -> ,

0db9681

Fux eval_expr_alg and some parsing issues

8529978

Change record to be map, and record and pointer contain there type

5fe62e7

Finishes of eval functions

3cf406f

test code and correct prints

d8b2eb5

add test file

3b3d674

fix fset printer and add final test

3fbd0af

move record and remove smt

d3d70f5

fix failing test

3193ce1

change both prints

cfb040e

forgor

e439f3a

JTrenerry added 7 commits March 9, 2026 14:53

remove ptrsub

e58847a

fmt

22d18e7

remove ptrsub

3e00e54

change ZMap to be StringMap

41a72a1

fix test

741ec63

fix from rebase

e5baf8b

remove field2

5b5a974

JTrenerry force-pushed the type-refactor branch from bad9a97 to 5b5a974 Compare March 9, 2026 04:59

expr -> expr2

00094e7

adds some comments for adt record functions

890dfcf

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Record & Pointer Types#78

Record & Pointer Types#78
JTrenerry wants to merge 25 commits intomainfrom
type-refactor

JTrenerry commented Mar 5, 2026 •

edited

Loading

Uh oh!

Uh oh!

Uh oh!

JTrenerry commented Mar 6, 2026

Uh oh!

agle commented Mar 6, 2026 •

edited

Loading

Uh oh!

JTrenerry commented Mar 6, 2026 •

edited

Loading

Uh oh!

JTrenerry commented Mar 9, 2026

Uh oh!

agle commented Mar 9, 2026

Uh oh!

JTrenerry commented Mar 9, 2026

Uh oh!

katrinafyi commented Mar 9, 2026

Uh oh!

JTrenerry commented Mar 10, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

JTrenerry commented Mar 5, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Types

Records

PoInters

Operations

Records

Pointers

Changes made to other works

Uh oh!

Uh oh!

Uh oh!

JTrenerry commented Mar 6, 2026

Uh oh!

agle commented Mar 6, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

JTrenerry commented Mar 6, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

JTrenerry commented Mar 9, 2026

Uh oh!

agle commented Mar 9, 2026

Uh oh!

JTrenerry commented Mar 9, 2026

Uh oh!

katrinafyi commented Mar 9, 2026

Uh oh!

JTrenerry commented Mar 10, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

JTrenerry commented Mar 5, 2026 •

edited

Loading

agle commented Mar 6, 2026 •

edited

Loading

JTrenerry commented Mar 6, 2026 •

edited

Loading