[Enhancement] Fix MoW table size fault #40879

Closed
3 tasks done
Yukang-Lian opened this issue Sep 14, 2024 · 0 comments

Comments

@Yukang-Lian
Collaborator

Search before asking

  • I had searched in the issues and found no similar issues.

Description

When rowsets are merged during partial column updates on MoW (merge-on-write) tables, total_disk_size is not updated. As a result, the size reported by FE (as observed through SHOW DATA in a MySQL client) disagrees with the size reported by BE (as observed by querying compaction_status with curl): FE shows more data than it should, and BE shows less. The issue affects both compute-storage separated and compute-storage integrated deployments that use MoW tables with partial column updates. It is particularly significant in cloud environments, where user billing is involved.
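The fault can be pictured with a toy model. This is not Doris code: the `RowsetMeta` class and both merge functions are hypothetical, only the three field names come from this issue, and it is assumed purely for illustration that the component sizes are summed during the merge while the total is left alone.

```python
from dataclasses import dataclass


@dataclass
class RowsetMeta:
    """Hypothetical stand-in for a rowset meta; all sizes are in bytes."""
    data_disk_size: int
    index_disk_size: int
    total_disk_size: int


def merge_stale(dst: RowsetMeta, src: RowsetMeta) -> RowsetMeta:
    """Merge that reproduces the fault: total_disk_size is never updated."""
    return RowsetMeta(
        data_disk_size=dst.data_disk_size + src.data_disk_size,
        index_disk_size=dst.index_disk_size + src.index_disk_size,
        total_disk_size=dst.total_disk_size,  # stale: still the pre-merge value
    )


def merge_fixed(dst: RowsetMeta, src: RowsetMeta) -> RowsetMeta:
    """Merge that recomputes the total from the merged components."""
    data = dst.data_disk_size + src.data_disk_size
    index = dst.index_disk_size + src.index_disk_size
    return RowsetMeta(data, index, data + index)
```

With the stale merge, any consumer that reads `total_disk_size` and any consumer that sums the components will report different sizes for the same rowset.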

Solution

  1. Update total_disk_size in the merge rowset logic during partial column updates so that BE and FE see consistent data. The resulting sizes may still not be entirely correct, but this lets the testing team start validating the changes. The relevant commit can be found here.

  2. Clarify the relationships between total_disk_size, data_disk_size, and index_disk_size. Review all code logic involving these sizes to ensure data accuracy.

  3. Modify the compaction logic so that the output rowset size after each compaction is accurate. Update this data on both the FE and BE sides, so that the previously erroneous data is gradually corrected and the sizes converge.

  4. Add a tool to verify table size data. This tool should read the rowset meta PB (ensuring accuracy) and update the data on both the FE and BE sides.
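Steps 2 and 4 could be sketched roughly as follows. This is a minimal illustration, not the actual tool: `RowsetSizes`, `find_inconsistent_rowsets`, and `check_tablet_size` are hypothetical names, the rowset metas are passed in as plain objects rather than read from the rowset meta PB, and the FE and BE figures are supplied by the caller. The only relationship taken from this issue is that total size equals data size plus index size.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class RowsetSizes:
    """Hypothetical view of the size fields of one rowset meta (bytes)."""
    rowset_id: str
    data_disk_size: int
    index_disk_size: int
    total_disk_size: int


def expected_total(rs: RowsetSizes) -> int:
    # Step 2 invariant: total size = data size + index size.
    return rs.data_disk_size + rs.index_disk_size


def find_inconsistent_rowsets(rowsets: List[RowsetSizes]) -> List[str]:
    """Return ids of rowsets whose stored total violates the invariant."""
    return [rs.rowset_id for rs in rowsets
            if rs.total_disk_size != expected_total(rs)]


def check_tablet_size(rowsets: List[RowsetSizes],
                      fe_reported: int,
                      be_reported: int) -> Dict[str, int]:
    """Compare the FE and BE figures against the size derived from rowset metas.

    A positive delta means that side over-reports; a negative delta
    means it under-reports.
    """
    truth = sum(expected_total(rs) for rs in rowsets)
    return {
        "expected": truth,
        "fe_delta": fe_reported - truth,
        "be_delta": be_reported - truth,
    }
```

A real tool would additionally write the corrected value back to FE and BE; that part depends on internal interfaces and is left out here.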

Are you willing to submit PR?

  • Yes I am willing to submit a PR!

Code of Conduct

gavinchou pushed a commit that referenced this issue Oct 31, 2024
Issue #40879 step 2
make rowset total size = rowset data size + rowset index size
Yukang-Lian added a commit to Yukang-Lian/doris that referenced this issue Nov 7, 2024
Issue apache#40879 step 2
make rowset total size = rowset data size + rowset index size
Yukang-Lian added a commit to Yukang-Lian/doris that referenced this issue Nov 11, 2024
Issue apache#40879 step 2
make rowset total size = rowset data size + rowset index size