Skip to content

Commit

Permalink
[Spark] Throw an exception on write for DV size mismatch (#4203)
Browse files Browse the repository at this point in the history
<!--
Thanks for sending a pull request!  Here are some tips for you:
1. If this is your first time, please read our contributor guidelines:
https://github.com/delta-io/delta/blob/master/CONTRIBUTING.md
2. If the PR is unfinished, add '[WIP]' in your PR title, e.g., '[WIP]
Your PR title ...'.
  3. Be sure to keep the PR description updated to reflect all changes.
  4. Please write your PR title to summarize what this PR proposes.
5. If possible, provide a concise example to reproduce the issue for a
faster review.
6. If applicable, include the corresponding issue number in the PR title
and link it in the body.
-->

#### Which Delta project/connector is this regarding?
<!--
Please add the component selected below to the beginning of the pull
request title
For example: [Spark] Title of my pull request
-->

- [x] Spark
- [ ] Standalone
- [ ] Flink
- [ ] Kernel
- [ ] Other (fill in here)

## Description

Right now, we are only logging for
delta.deletionVector.write.offsetMismatch. In this PR, we trigger an
exception at the time of writing the DV, so before we make a commit to
the table. It'll fail the specific query, but won't put the table in a
broken and unreadable state. We also change the log from
delta.deletionVector.write.offsetMismatch into a delta assertion.

## How was this patch tested?

Existing tests pass.

## Does this PR introduce _any_ user-facing changes?

Yes. If we detect an issue related to the file size while writing
Deletion Vector files, we will now throw an exception at the time of
writing the DV, before we make a commit to the table. This will fail the
specific query, but will prevent the table from being in an unreadable
state.
  • Loading branch information
c27kwan authored Feb 28, 2025
1 parent 64ca858 commit 55adec5
Showing 1 changed file with 7 additions and 4 deletions.
Original file line number Diff line number Diff line change
Expand Up @@ -223,13 +223,16 @@ class HadoopFileSystemDVStore(hadoopConf: Configuration)
checksum = DeletionVectorStore.calculateChecksum(data))

if (writtenBytes != dvRange.offset) {
recordDeltaEvent(
deltaLog = null,
opType = "delta.deletionVector.write.offsetMismatch",
deltaAssert(
writtenBytes == dvRange.offset,
name = "dv.write.offsetMismatch",
msg = s"Offset mismatch while writing deletion vector to file",
data = Map(
"path" -> path.path.toString,
"reportedOffset" -> dvRange.offset,
"calculatedOffset" -> writtenBytes))
"calculatedOffset" -> writtenBytes)
)
throw DeltaErrors.deletionVectorSizeMismatch()
}

log.debug(s"Writing DV range to file: Path=${path.path}, Range=${dvRange}")
Expand Down

0 comments on commit 55adec5

Please sign in to comment.