-
Notifications
You must be signed in to change notification settings - Fork 11.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[indexer-alt] add coin jsonrpc impl #20931
base: main
Are you sure you want to change the base?
Conversation
The latest updates on your projects. Learn more about Vercel for Git ↗︎
2 Skipped Deployments
|
d100b09
to
dbb6916
Compare
// let cursor_coin_query = cb::table.select(StoredCoinBalanceBucket::as_select()).filter(cb::object_id.eq(cursor.to_vec())); | ||
let mut cursor_coins: Vec<StoredCoinBalanceBucket> = | ||
conn.results_with_raw_query(cursor_coin_query).await?; | ||
if cursor_coins.len() != 1 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure what you are trying to achieve here, but there can and will be multiple entries with the same object_id in this table, since it is an append table, we add entries for new mutations.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm trying to get the balance bucket of the cursor coin here so that we can use the balance in the filter, because results are sorted by balance. And you are right I should account for the append-only-ness here. I've updated the query in the latest commit to select the row with the latest cp sequence number.
|
||
// First load the coins' object ids. | ||
// TODO: optimize this query according to the indexes in coin_balance_buckets table. | ||
let mut query = query!(" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you would have to rewrite the entire function with a full raw string as the query.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated in the latest commit.
cb1f663
to
e6c0d42
Compare
Revamped and rebased this PR based on the latest changes in configs, pagination, errors and data loaders. |
e6c0d42
to
feffac2
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is looking in pretty good shape, thanks @emmazzz. Here's a summary of the comments I left, from most to least important:
- Query performance -- I think the implementation of
getAllCoins
may have an unnecessary sort in it that we can avoid, and I'm a little bit worried about the DB's ability to efficiently plan the comparison on bucket, checkpoint, ID (so worth double checking that).- There's a tangent here about the size of the cursor and potentially using BCS encoded cursors which we can discuss.
- Transactional test runner change -- this is a bit of a larger change and needs some more design -- I've proposed something in the inline comments. The issue with the current approach is that the feature being introduced (interpolating object IDs into cursors) bridges two places where we use similar syntax for different things, which is likely to cause confusion.
- Errors -- I think are mostly there, but I made some suggestions to bring them fully in line (the biggest thing to address here is the conversion function not using
unwrap
s, the second biggest thing is the behaviour when we fail to fetch contents which we need to align on).
crates/sui-indexer-alt-e2e-tests/tests/jsonrpc/coin/deleted.move
Outdated
Show resolved
Hide resolved
"result": { | ||
"data": [ | ||
{ | ||
"coinType": "0x0000000000000000000000000000000000000000000000000000000000000002::sui::SUI", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This output makes me think that we are not using the coin balance buckets index as efficiently as we could, because we are merging together different coin types, but IIRC coin type is the second field in the index after owner (so we should be returning all the results for one coin type in one go).
This also implies that the cursor should include the coin type. The cursor is also already quite large, so this may indicate that we need to introduce a BCS-encoded cursor (similar to what we did for GraphQL), but we can do that as a follow-up.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This test is for getAllCoins
, which is different from getCoins
in that it will return coins of all types in one go. getCoins
on the other hand will only return coins of the same type each time so it shouldn't have any issues. getAllCoins
is actually not used by Stashed. I only added this method because, well, it was convenient to. I'm happy to remove it or we can tackle this as a follow up later on after Stashed migration.
crates/sui-indexer-alt-e2e-tests/tests/jsonrpc/coin/inconsistency.move
Outdated
Show resolved
Hide resolved
crates/sui-indexer-alt-e2e-tests/tests/jsonrpc/coin/pagination.move
Outdated
Show resolved
Hide resolved
feffac2
to
77055b4
Compare
2a79114
to
75d0069
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Recording our conversation from today!
query = query | ||
.filter(candidates!(cp_sequence_number).le(cp_sequence_number as i64)) | ||
.filter( | ||
// Since the result is ordered by coin_balance_bucket followed by object_id, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The Filter
on the first Index Only Scan means that we are performing a scan through the results that we may be able to avoid by utilising the index better (I think the piece we are missing for that is bounding by cp_sequence_number
in the right order w.r.t. coin_balance_bucket
and object_id
-- i.e. in between).
@@ -1352,7 +1350,28 @@ impl<'a> SuiTestAdapter { | |||
|
|||
variables.insert(format!("cursor_{idx}"), base64d); | |||
} else { | |||
variables.insert(format!("cursor_{idx}"), Base64::encode(s)); | |||
// Handle cursor strings containing @{obj_x_y} pattern |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Recording our conversation from the pair programming session: It's okay (beneficial, even) to only interpolate other variables when dealing with interpolation for cursors (i.e. pass in an empty list of cursors), and I think it's a good idea to factor this change out into its own commit, at least, but that commit should come first (because the test adapter is being actively touched by multiple people at the moment, so I would rather not increase the confusion factor there).
query = query | ||
.filter(candidates!(cp_sequence_number).le(cp_sequence_number as i64)) | ||
.filter( | ||
// Since the result is ordered by coin_balance_bucket followed by object_id, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We talked about how we would probably need to invert the direction of object IDs in the index to make this as efficient as possible.
75d0069
to
3b834a1
Compare
})) = page.cursor | ||
{ | ||
query = query.filter(sql::<Bool>(&format!( | ||
r#"ROW(candidates.coin_balance_bucket, candidates.cp_sequence_number, candidates.object_id) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed to using ROW
here and ordering by coin_balance_bucket, cp_sequence_number, object_id
all DESC a few lines later as we've discussed @amnn
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rather than using &format!(...)
, use diesel's bind
construct, so that we don't have to worry about re-implementing escaping/encoding. See below from #21119 for an example:
sui/crates/sui-indexer-alt-jsonrpc/src/api/objects/filter.rs
Lines 155 to 163 in bc3b4c6
if let Some(c) = page.cursor { | |
query = query.filter( | |
sql::<Bool>(r#"("candidates"."cp_sequence_number", "candidates"."object_id") < ("#) | |
.bind::<BigInt, _>(c.cp_sequence_number as i64) | |
.sql(", ") | |
.bind::<Bytea, _>(c.object_id.clone()) | |
.sql(")"), | |
); | |
} |
(I only learned about this recently, while working on this PR). It would look something like this for coin_balance_buckets
:
if let Some(Cursor(c)) = page.cursor {
query = query.filter(
sql::<Bool>(r#"("candidates"."coin_balance_bucket", "candidates"."cp_sequence_number", "candidates"."object_id") < ("#)
.bind::<BigInt, _>(c.coin_balance_bucket as i64)
.sql(", ")
.bind::<BigInt, _>(c.cp_sequence_number as i64)
.sql(", ")
.bind::<Bytea, _>(c.object_id.clone())
.sql(")"),
);
}
(the ROW
keyword is optional).
Updated the PR to address the second round of comments, include the index change on |
3b834a1
to
7d18700
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Almost there -- main comments:
- I suspect
getAllCoins
is still not going to perform well because of its ordering. We need to order by coin type first, then balance bucket, etc - The index changes need to be split into their own PR to unblock landing this one (also some suggestions for making the migration idempotent and easy to run proactively).
- Various patterns/suggestions based on rpc-alt: getOwnedObjects #21119. Main ones are to use
SqlLiteral::bind
, represent the ObjectID using its DB form in the cursor, and to usecontext(...)
instead of.ok_or_else(|| anyhow!(...))
.
...indexer-alt-schema/migrations/2025-02-05-232254_coin_balance_buckets_object_id_desc/down.sql
Outdated
Show resolved
Hide resolved
...indexer-alt-schema/migrations/2025-02-05-232254_coin_balance_buckets_object_id_desc/down.sql
Outdated
Show resolved
Hide resolved
} | ||
} | ||
|
||
impl Coins { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: it would be nice if we could structure this endpoint like the rest of them (with a response
module to hold these functions). The rest of the methods are structured that way to promote sharing (it's not possible to share response functions when they are tied to the type implementing the method), and it would be good to maintain consistency so that people can follow that pattern when they are navigating the codebase.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought about it but decided not to do it that way because:
- I don't think the code here is long or complex enough to need to be split up into different modules. In my opinion being in different files and modules can hurt readability (personally I don't like jumping between files), unless the code is long (
transactions
's case) or shareable (objects'
's case), in which case the benefits outweigh the drawbacks, which bring me to the second point. - I don't see the response function in this case being shared. The only methods that return coin related responses are the methods in this file,
getCoins
,getAllCoins
andgetAllBalances
. And we certainly don't plan to expand jsonrpc interface so most likely no future methods will be using it.
Or do you think this needs to be done for consistency reasons, i.e. any method that has filters and pagination should follow that structure?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Recording our offline conversation on this:
- Indeed, it's mostly to do with consistency (to make it easier for people to connect the dots going from the implementation of one RpcModule to the next).
- But it's true that splitting things across multiple files can hurt readability too.
Where we ended up was that we would follow the same patterns, but not split it up across files unless the file got large. In this case, with the absence of getAllCoins
, this translates into:
- inlining
get_coins_impl
back intoget_coins
, - renaming
get_coin_id_page
asfilter_coins
and making it a free function (in the same module), - making
coin_response
a free function (but in the same module).
crates/sui-indexer-alt-e2e-tests/tests/jsonrpc/coin/get_all_coins.snap
Outdated
Show resolved
Hide resolved
7d18700
to
2466063
Compare
Hey @amnn thanks for the review! I have addressed most of the comments in the latest commit. Specifically,
I understand there are changes in other PRs that can further improve this one (bcs cursor and your improvements to |
03b548d
to
b9e30f1
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some small nits, but this is good to go! Agreed we can incorporate BCS cursors in a follow-up.
BadType(String, anyhow::Error), | ||
} | ||
|
||
// TODO: use bcs cursor once we have it. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is now available, after a rebase, I believe.
#[derive(Serialize, Deserialize, Debug, Clone)] | ||
struct BalanceCursor { | ||
#[serde(rename = "o")] | ||
object_id: ObjectID, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
object_id: ObjectID, | |
object_id: Vec<u8>, |
} | ||
} | ||
|
||
impl Coins { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Recording our offline conversation on this:
- Indeed, it's mostly to do with consistency (to make it easier for people to connect the dots going from the implementation of one RpcModule to the next).
- But it's true that splitting things across multiple files can hurt readability too.
Where we ended up was that we would follow the same patterns, but not split it up across files unless the file got large. In this case, with the absence of getAllCoins
, this translates into:
- inlining
get_coins_impl
back intoget_coins
, - renaming
get_coin_id_page
asfilter_coins
and making it a free function (in the same module), - making
coin_response
a free function (but in the same module).
// We get all the qualified coin ids first. | ||
let coin_id_page = self.get_coin_id_page(owner, coin_type_tag, page).await?; | ||
|
||
let coin_futures = coin_id_page.data.iter().map(|id| self.coin_response(*id)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: Is it possible to call into_iter
here instead?
candidates.field(cb::object_id), | ||
candidates.field(cb::cp_sequence_number), | ||
candidates.field(cb::coin_balance_bucket).assume_not_null(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would the following work?
candidates.field(cb::object_id), | |
candidates.field(cb::cp_sequence_number), | |
candidates.field(cb::coin_balance_bucket).assume_not_null(), | |
candidates!(object_id), | |
candidates!(cp_sequence_number), | |
candidates!(coin_balance_bucket).assume_not_null(), |
})) = page.cursor | ||
{ | ||
query = query.filter( | ||
sql::<Bool>(r#"(candidates.coin_balance_bucket, candidates.cp_sequence_number, candidates.object_id) < ("#) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can drop the raw string if the string contents does not include quotes:
sql::<Bool>(r#"(candidates.coin_balance_bucket, candidates.cp_sequence_number, candidates.object_id) < ("#) | |
sql::<Bool>("(candidates.coin_balance_bucket, candidates.cp_sequence_number, candidates.object_id) < (") |
// This test is to verify that deleted coins are not included in the result of suix_getCoins. | ||
// We create two coins, of balances 12 and 34, call the rpc method to see bot of them in the results. | ||
// Then we merge the coins and call the rpc method again to see that only the merged coin with | ||
// balance 46 is in the results. | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: comment copypasta
Description
This PR added indexer alt jsonrpc reader implementation for
getCoins
andgetAllCoins
. I ported overraw_query
from graphql crate for usage here.Test plan
Added e2e tests.
Release notes
Check each box that your changes affect. If none of the boxes relate to your changes, release notes aren't required.
For each box you select, include information after the relevant heading that describes the impact of your changes that a user might notice and any actions they must take to implement updates.