Replacing bincode to postcard in concolic #3714
base: main
Changes from 2 commits
| Original file line number | Diff line number | Diff line change | ||||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
|
|
@@ -11,21 +11,21 @@ | |||||||||||||||||
| //! Specifically: | ||||||||||||||||||
| //! * it requires only constant memory space for serialization, which allows for tracing complex and/or | ||||||||||||||||||
| //! long-running programs. | ||||||||||||||||||
| //! * the trace itself requires little space. A typical binary operation (such as an add) typically takes just 3 bytes. | ||||||||||||||||||
| //! * the trace itself requires little space. A typical binary operation (such as an add) typically takes just a 3 bytes. | ||||||||||||||||||
| //! * it easy to encode. There is no translation between the interface of the runtime itself and the trace it generates. | ||||||||||||||||||
| //! * it is similarly easy to decode and can be easily translated into an in-memory AST without overhead, because | ||||||||||||||||||
| //! expressions are decoded from leaf to root instead of root to leaf. | ||||||||||||||||||
| //! expressions are decoded from leaf to root instead of root to leaf. | ||||||||||||||||||
| //! * At its core, it is just [`SymExpr`]s, which can be added to, modified and removed from with ease. The | ||||||||||||||||||
| //! definitions are automatically shared between the runtime and the consuming program, since both depend on the same | ||||||||||||||||||
| //! `LibAFL`. | ||||||||||||||||||
| //! | ||||||||||||||||||
| //! ## Techniques | ||||||||||||||||||
| //! The serialization format applies multiple techniques to achieve its goals. | ||||||||||||||||||
| //! * It uses bincode for efficient binary serialization. Crucially, bincode uses variable length integer encoding, | ||||||||||||||||||
| //! * It uses postcard for efficient binary serialization. Crucially, postcard uses variable length integer encoding, | ||||||||||||||||||
| //! allowing it encode small integers use fewer bytes. | ||||||||||||||||||
| //! * References to previous expressions are stored relative to the current expressions id. The vast majority of | ||||||||||||||||||
| //! expressions refer to other expressions that were defined close to their use. Therefore, encoding relative references | ||||||||||||||||||
| //! keeps references small. Therefore, they make optimal use of bincodes variable length integer encoding. | ||||||||||||||||||
| //! keeps references small. Therefore, they make optimal use of postcard's variable length integer encoding. | ||||||||||||||||||
| //! * Ids of expressions ([`SymExprRef`]s) are implicitly derived by their position in the message stream. Effectively, | ||||||||||||||||||
| //! a counter is used to identify expressions. | ||||||||||||||||||
| //! * The current length of the trace in bytes in serialized in a fixed format at the beginning of the trace. | ||||||||||||||||||
|
|
@@ -43,64 +43,58 @@ | |||||||||||||||||
| //! ... making for a total of 5 bytes. | ||||||||||||||||||
|
|
||||||||||||||||||
| use core::fmt::{self, Debug, Formatter}; | ||||||||||||||||||
| use std::io::{self, Cursor, Read, Seek, SeekFrom, Write}; | ||||||||||||||||||
| use std::io::{self, Seek, SeekFrom, Write}; | ||||||||||||||||||
|
|
||||||||||||||||||
| use bincode::{ | ||||||||||||||||||
| config::{self, Configuration}, | ||||||||||||||||||
| decode_from_std_read, encode_into_std_write, | ||||||||||||||||||
| error::{DecodeError, EncodeError}, | ||||||||||||||||||
| }; | ||||||||||||||||||
| use postcard::Error as PostcardError; | ||||||||||||||||||
|
|
||||||||||||||||||
| use super::{SymExpr, SymExprRef}; | ||||||||||||||||||
|
|
||||||||||||||||||
| fn serialization_options() -> Configuration { | ||||||||||||||||||
| config::standard() | ||||||||||||||||||
| } | ||||||||||||||||||
|
|
||||||||||||||||||
| /// A `MessageFileReader` reads a stream of [`SymExpr`] and their corresponding [`SymExprRef`]s from any [`Read`]. | ||||||||||||||||||
| pub struct MessageFileReader<R: Read> { | ||||||||||||||||||
| reader: R, | ||||||||||||||||||
| deserializer_config: Configuration, | ||||||||||||||||||
| /// A `MessageFileReader` reads a stream of [`SymExpr`] and their corresponding [`SymExprRef`]s from a byte buffer. | ||||||||||||||||||
| pub struct MessageFileReader<'a> { | ||||||||||||||||||
| buffer: &'a [u8], | ||||||||||||||||||
| current_id: usize, | ||||||||||||||||||
| } | ||||||||||||||||||
|
|
||||||||||||||||||
| impl<R: Read> Debug for MessageFileReader<R> { | ||||||||||||||||||
| impl Debug for MessageFileReader<'_> { | ||||||||||||||||||
| fn fmt(&self, f: &mut Formatter) -> fmt::Result { | ||||||||||||||||||
| write!(f, "MessageFileReader {{ current_id: {} }}", self.current_id) | ||||||||||||||||||
| write!( | ||||||||||||||||||
| f, | ||||||||||||||||||
| "MessageFileReader {{ current_id: {}, remaining: {} bytes }}", | ||||||||||||||||||
| self.current_id, | ||||||||||||||||||
| self.buffer.len() | ||||||||||||||||||
| ) | ||||||||||||||||||
| } | ||||||||||||||||||
| } | ||||||||||||||||||
|
|
||||||||||||||||||
| impl<R: Read> MessageFileReader<R> { | ||||||||||||||||||
| /// Construct from the given reader. | ||||||||||||||||||
| pub fn from_reader(reader: R) -> Self { | ||||||||||||||||||
| impl<'a> MessageFileReader<'a> { | ||||||||||||||||||
| /// Construct from the given buffer. | ||||||||||||||||||
| #[must_use] | ||||||||||||||||||
| pub fn from_buffer(buffer: &'a [u8]) -> Self { | ||||||||||||||||||
| Self { | ||||||||||||||||||
| reader, | ||||||||||||||||||
| deserializer_config: serialization_options(), | ||||||||||||||||||
| buffer, | ||||||||||||||||||
| current_id: 1, | ||||||||||||||||||
| } | ||||||||||||||||||
| } | ||||||||||||||||||
|
|
||||||||||||||||||
| /// Parse the next message out of the stream. | ||||||||||||||||||
| /// [`Option::None`] is returned once the stream is depleted. | ||||||||||||||||||
| /// IO and serialization errors are passed to the caller as [`DecodeError`]. | ||||||||||||||||||
| /// Parse the next message out of the buffer. | ||||||||||||||||||
| /// [`Option::None`] is returned once the buffer is depleted. | ||||||||||||||||||
| /// Serialization errors are passed to the caller as [`PostcardError`]. | ||||||||||||||||||
| /// Finally, the returned tuple contains the message itself as a [`SymExpr`] and the [`SymExprRef`] associated | ||||||||||||||||||
| /// with this message. | ||||||||||||||||||
| /// The `SymExprRef` may be used by following messages to refer back to this message. | ||||||||||||||||||
| pub fn next_message(&mut self) -> Option<Result<(SymExprRef, SymExpr), DecodeError>> { | ||||||||||||||||||
| match decode_from_std_read(&mut self.reader, self.deserializer_config) { | ||||||||||||||||||
| Ok(mut message) => { | ||||||||||||||||||
| pub fn next_message(&mut self) -> Option<Result<(SymExprRef, SymExpr), PostcardError>> { | ||||||||||||||||||
| if self.buffer.is_empty() { | ||||||||||||||||||
| return None; | ||||||||||||||||||
| } | ||||||||||||||||||
|
|
||||||||||||||||||
| // Use postcard's take_from_bytes which deserializes and returns remaining bytes | ||||||||||||||||||
| match postcard::take_from_bytes::<SymExpr>(self.buffer) { | ||||||||||||||||||
| Ok((mut message, remaining)) => { | ||||||||||||||||||
| self.buffer = remaining; | ||||||||||||||||||
| let message_id = self.transform_message(&mut message); | ||||||||||||||||||
| Some(Ok((message_id, message))) | ||||||||||||||||||
| } | ||||||||||||||||||
| Err(e) => match e { | ||||||||||||||||||
| DecodeError::Io { | ||||||||||||||||||
| inner: ref io_err, .. | ||||||||||||||||||
| } => match io_err.kind() { | ||||||||||||||||||
| io::ErrorKind::UnexpectedEof => None, | ||||||||||||||||||
| _ => Some(Err(e)), | ||||||||||||||||||
| }, | ||||||||||||||||||
| _ => Some(Err(e)), | ||||||||||||||||||
| }, | ||||||||||||||||||
| Err(e) => Some(Err(e)), | ||||||||||||||||||
|
||||||||||||||||||
| Err(e) => Some(Err(e)), | |
| Err(e) => { | |
| // On error, mark the buffer as depleted so callers iterating until `None` | |
| // do not repeatedly hit the same error without making progress. | |
| self.buffer = &[]; | |
| Some(Err(e)) | |
| } |
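The effect of the suggested change can be seen in a small standalone sketch. It does not use postcard: `take_varint` is a hypothetical stand-in for `postcard::take_from_bytes`, and `Reader` is a simplified stand-in for `MessageFileReader`. The point is only that emptying the buffer on error lets a caller that loops until `None` terminate after a single error.

```rust
/// Hypothetical stand-in for a decoder such as `postcard::take_from_bytes`:
/// reads one LEB128-style varint and returns it with the remaining bytes.
fn take_varint(buf: &[u8]) -> Result<(u64, &[u8]), &'static str> {
    let mut value = 0u64;
    for (i, &byte) in buf.iter().enumerate() {
        if i >= 10 {
            // A u64 never needs more than 10 varint bytes.
            return Err("varint too long");
        }
        value |= u64::from(byte & 0x7f) << (7 * i);
        if byte & 0x80 == 0 {
            // No continuation bit: the value is complete.
            return Ok((value, &buf[i + 1..]));
        }
    }
    Err("unexpected end of buffer")
}

/// Simplified reader over a borrowed byte buffer (illustrative only).
struct Reader<'a> {
    buffer: &'a [u8],
}

impl<'a> Reader<'a> {
    fn next_message(&mut self) -> Option<Result<u64, &'static str>> {
        if self.buffer.is_empty() {
            return None;
        }
        match take_varint(self.buffer) {
            Ok((value, remaining)) => {
                self.buffer = remaining;
                Some(Ok(value))
            }
            Err(e) => {
                // Mark the buffer as depleted so the next call returns `None`
                // instead of reporting the same error forever.
                self.buffer = &[];
                Some(Err(e))
            }
        }
    }
}
```

Without the `self.buffer = &[]` line, every later call would decode the same bad bytes and return the same error, so a `while let Some(..)` loop would never finish.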
Outdated
this still allocates. you clear the old buffer then throw it away and replace it(?)
Yes, I clear the buffer and then reuse it for the new message, so the maximum size of the buffer will be the size of the largest message received. Earlier I was storing all the messages sequentially.
No, you still use allocvec, which allocates a vector. The comments say nothing is being allocated for each message, but that's not what actually happens.
Copilot
AI
Feb 7, 2026
`trace_len` is read as `u64` and then cast with `as usize`. On 32-bit targets this can truncate large values and make the length check/slicing incorrect. Use `usize::try_from(trace_len_u64)` and return an `io::Error` on overflow instead of truncating.
| let trace_len = u64::from_le_bytes(len_bytes) as usize; | |
| let trace_len_u64 = u64::from_le_bytes(len_bytes); | |
| let trace_len = usize::try_from(trace_len_u64).map_err(|_| { | |
| io::Error::new( | |
| io::ErrorKind::InvalidData, | |
| "Trace length too large for this architecture", | |
| ) | |
| })?; |
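A minimal standalone sketch of the checked conversion follows. It assumes the trace starts with an 8-byte little-endian `u64` length prefix, as the `u64::from_le_bytes(len_bytes)` call above suggests. The helper names `parse_trace_len` and `trace_body` are hypothetical and not part of the PR.

```rust
use std::convert::TryFrom;
use std::io;

/// Hypothetical helper: converts the 8-byte little-endian length prefix into
/// a `usize`, failing instead of truncating on targets where `usize` is
/// narrower than 64 bits.
fn parse_trace_len(len_bytes: [u8; 8]) -> io::Result<usize> {
    let trace_len_u64 = u64::from_le_bytes(len_bytes);
    usize::try_from(trace_len_u64).map_err(|_| {
        io::Error::new(
            io::ErrorKind::InvalidData,
            "Trace length too large for this architecture",
        )
    })
}

/// Hypothetical helper: returns the trace body that follows the length prefix,
/// checking that the buffer really holds as many bytes as the prefix declares.
fn trace_body(file: &[u8]) -> io::Result<&[u8]> {
    if file.len() < 8 {
        return Err(io::Error::new(
            io::ErrorKind::UnexpectedEof,
            "missing length prefix",
        ));
    }
    let (header, body) = file.split_at(8);
    let mut len_bytes = [0u8; 8];
    len_bytes.copy_from_slice(header);
    let trace_len = parse_trace_len(len_bytes)?;
    body.get(..trace_len).ok_or_else(|| {
        io::Error::new(
            io::ErrorKind::UnexpectedEof,
            "trace shorter than its declared length",
        )
    })
}
```

On a 64-bit target the conversion always succeeds. The error branch only triggers where `usize` is smaller than `u64`, which is exactly the case an `as usize` cast would silently get wrong.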
The module docs claim serialization uses “only constant memory space”, but `write_message` now allocates a fresh `Vec` via `postcard::to_allocvec` for every message. This contradicts the stated design goal and can add per-message allocation overhead. Consider serializing directly into the target `Write` (or reusing a scratch buffer across calls) to keep memory usage bounded and avoid repeated allocations.
Seems reasonable
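One way to picture the "reuse a scratch buffer" option is the standalone sketch below. It does not use postcard: `encode_varint` is a hypothetical stand-in for the real serializer, and `ScratchWriter` with its `write_message(&[u64])` signature is invented for illustration. The assumption is that each message can be encoded by appending to a caller-owned `Vec<u8>`.

```rust
use std::io::{self, Write};

/// Stand-in encoder: appends `value` to `out` as an LEB128-style varint.
fn encode_varint(mut value: u64, out: &mut Vec<u8>) {
    loop {
        let byte = (value & 0x7f) as u8;
        value >>= 7;
        if value == 0 {
            out.push(byte);
            return;
        }
        // More bytes follow: set the continuation bit.
        out.push(byte | 0x80);
    }
}

/// Illustrative writer that owns one scratch buffer for all messages.
struct ScratchWriter<W: Write> {
    writer: W,
    scratch: Vec<u8>,
}

impl<W: Write> ScratchWriter<W> {
    fn new(writer: W) -> Self {
        Self {
            writer,
            scratch: Vec::new(),
        }
    }

    /// Encodes one message into the scratch buffer and writes it out.
    /// Returns the number of bytes written.
    fn write_message(&mut self, operands: &[u64]) -> io::Result<usize> {
        // `clear` drops the contents but keeps the allocated capacity,
        // so later messages reuse the same allocation.
        self.scratch.clear();
        for &operand in operands {
            encode_varint(operand, &mut self.scratch);
        }
        self.writer.write_all(&self.scratch)?;
        Ok(self.scratch.len())
    }
}
```

With this shape the scratch buffer grows only when a message is larger than any seen before, so memory use is bounded by the largest single message rather than by the number of messages.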