[Enhancement] Fix MoW table size fault #40879
Comments
dataroaring pushed a commit that referenced this issue on Sep 19, 2024
dataroaring pushed a commit that referenced this issue on Sep 19, 2024
gavinchou pushed a commit that referenced this issue on Oct 31, 2024
    Issue #40879 step 2 make rowset total size = rowset data size + rowset index size
Yukang-Lian added a commit to Yukang-Lian/doris that referenced this issue on Nov 7, 2024
    Issue apache#40879 step 2 make rowset total size = rowset data size + rowset index size
Yukang-Lian added a commit to Yukang-Lian/doris that referenced this issue on Nov 11, 2024
    Issue apache#40879 step 2 make rowset total size = rowset data size + rowset index size
Description
When merging rowsets during partial column updates in MoW tables, `total_disk_size` was not updated. As a result, the data size seen by FE (as observed through MySQL's `SHOW DATA`) and by BE (as observed through `curl compaction_status`) diverged. This discrepancy leads to incorrect data reporting, with FE showing more data than it should and BE showing less. The issue affects both compute-storage separation and integrated compute-storage systems that use MoW tables with partial column updates. It is particularly significant in cloud environments where user billing is involved.

Solution
1. Update `total_disk_size` during partial column updates in the merge rowset logic so that the data seen by BE and FE is consistent. Although the data may not be entirely correct, this allows the testing team to start validating the changes. The relevant commit can be found here.
2. Clarify the relationships between `total_disk_size`, `data_disk_size`, and `index_disk_size`. Review all code logic involving these sizes to ensure data accuracy.
3. Modify the compaction logic so that the rowset output size after each compaction is accurate. Update this data on both the FE and BE sides, gradually correcting the previously erroneous data until the two converge.
4. Add a tool to verify table size data. This tool should read the rowset meta PB (ensuring accuracy) and update the data on both the FE and BE sides.
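
The size relationship and the verification idea described above can be illustrated with a small Python sketch. It is not Doris code: the `RowsetSize` class and the `merge_rowsets` and `find_inconsistent` helpers are hypothetical names invented for this example, and only the three field names come from the issue. It also assumes that sizes simply add up when rowsets are merged, which may not match the real merge or compaction logic.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class RowsetSize:
    """Hypothetical stand-in for the size fields of a rowset meta."""
    rowset_id: str
    data_disk_size: int   # bytes used by data
    index_disk_size: int  # bytes used by indexes
    total_disk_size: int  # intended invariant: data + index


def merge_rowsets(rowset_id: str, inputs: Iterable[RowsetSize]) -> RowsetSize:
    """Merge rowsets and recompute every size field, including the total.

    Assumption: sizes are additive across the merged rowsets.
    """
    inputs = list(inputs)
    data = sum(r.data_disk_size for r in inputs)
    index = sum(r.index_disk_size for r in inputs)
    # Derive the total from its parts instead of carrying a stale value over.
    return RowsetSize(rowset_id, data, index, data + index)


def find_inconsistent(rowsets: Iterable[RowsetSize]) -> List[str]:
    """Return ids of rowsets whose total is not data size plus index size."""
    return [
        r.rowset_id
        for r in rowsets
        if r.total_disk_size != r.data_disk_size + r.index_disk_size
    ]


if __name__ == "__main__":
    a = RowsetSize("a", data_disk_size=100, index_disk_size=20, total_disk_size=120)
    b = RowsetSize("b", data_disk_size=50, index_disk_size=5, total_disk_size=55)
    merged = merge_rowsets("merged", [a, b])
    # A rowset whose total was never refreshed, as in the reported fault.
    stale = RowsetSize("stale", data_disk_size=150, index_disk_size=25, total_disk_size=100)
    print(merged)
    print(find_inconsistent([merged, stale]))
```

Deriving the total from its two parts in one place means a merge cannot leave it stale, and the same check can flag existing rowsets whose recorded sizes need correcting.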