chore: fix typos #17135

Open
wants to merge 19 commits into
base: main

@waynexia (Member) commented Aug 12, 2025

Signed-off-by: Ruihang Xia [email protected]

Which issue does this PR close?

Rationale for this change

Following #16859 (comment), I thought: why not add an automatic checker?

What changes are included in this PR?

A typo checker, plus fixes for all existing typos (or allowlist entries for them). An example run: https://github.com/waynexia/arrow-datafusion/actions/runs/16897471061/job/47869907355

Are these changes tested?

no

Are there any user-facing changes?

API change:

SplitMetrics::batches_splitted field is renamed to SplitMetrics::batches_split (grammar typo #17135 (comment))

waynexia added 10 commits August 8, 2025 08:18
Signed-off-by: Ruihang Xia <[email protected]>
@github-actions bot added the labels documentation, sql, development-process, logical-expr, physical-expr, optimizer, core, substrait, catalog, common, proto, functions, datasource, ffi, physical-plan on Aug 12, 2025
typos.toml Outdated
alph = "alph"
wih = "wih"
Ded = "Ded"
Serializeable = "Serializeable"
Member Author

This is a public interface, so I added it to the allowlist.

Member

Does the checker have a way to exclude a given word at a given place of use, or are only global excludes available?

Member Author

We can specify the scope: https://github.com/crate-ci/typos#false-positives. This might be useful for the entries that come from test text.
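
To make the scoping question concrete, here is a minimal sketch of how exceptions could be narrowed in typos.toml instead of being allowlisted globally. The table names follow the crate-ci/typos configuration reference; the globs, the directory, and the choice of which words go where are assumptions made for illustration, not what this PR contains.

```toml
# typos.toml: a sketch of scoped exceptions. Paths and globs are hypothetical.

# Global allowlist: applies to every checked file.
# Mapping a word to itself marks it as correctly spelled.
[default.extend-words]
typ = "typ" # short for "type", which is a reserved word in Rust

# Identifier-level exception: accepts only this exact identifier,
# so the same misspelling is still reported in prose and other names.
# A longer identifier would have to be listed with its full spelling.
[default.extend-identifiers]
Serializeable = "Serializeable"

# Per-file-type scope: a custom file type selected by glob (hypothetical glob).
[type.testdata]
extend-glob = ["*.slt"]

# Words accepted only inside files of the custom type above.
[type.testdata.extend-words]
hte = "hte"

# Skip whole paths entirely (hypothetical directory).
[files]
extend-exclude = ["benchmarks/queries/"]
```

Keeping test-only strings under a file-type scope or an exclude means the same misspelling is still flagged if it shows up in library code or documentation.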

typos.toml Outdated
Comment on lines 2 to 27
Pn = "Pn"
fo = "fo"
flate = "flate"
nd = "nd"
Nd = "Nd"
YOUY = "YOUY"
typ = "typ"
ba = "ba"
lits = "lits"
ECT = "ECT"
Ue = "Ue"
Iy = "Iy"
hte = "hte"
numer = "numer"
abd = "abd"
aroun = "aroun"
carefull = "carefull"
abov = "abov"
Ois = "Ois"
alo = "alo"
precentage = "precentage"
datas = "datas"
hom = "hom"
alph = "alph"
wih = "wih"
Ded = "Ded"
Member Author

Most of them are from tests or benchmarks

Signed-off-by: Ruihang Xia <[email protected]>
name: Spell Check with Typos
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683
Member Author

pin to a fixed revision to prevent it from breaking unexpectedly because of a dictionary update. And use an unversioned new commit.

Member

if "unversioned new commit." is something not released yet, then let's please use a tagged version

See also #17046 (comment)
I believe there is no good reason to pin precise commits for GitHub's own internal actions such as actions/checkout.

@zhuqi-lucas (Contributor) left a comment

LGTM, thank you @waynexia, very good improvement to me!

@findepi (Member) left a comment

Love it! ❤️

This PR is quite a lot of changes.
Could you please send the typo fixes in a PR separate from the spellchecker workflow?
Typo fixes require low scrutiny (mostly checks for API components), which is different from GH actions.

- uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683
with:
persist-credentials: false
- uses: crate-ci/typos@master
Member

Is this a 3rd party action maintained at https://github.com/crate-ci/typos?
Is it already ASF approved?
If yes, it must be pinned to a particular commit hash.

Member Author

Member

Cool file! Is there an ASF-maintained workflow or action to validate our workflows using the approved_patterns file?

Member

Member Author

I'm not very sure about this part... actually I wonder how this is enforced 🙈

@waynexia (Member Author)

Thank you!

I removed the job from this PR. I'll check whether it's permitted while waiting for this PR to merge, and file another one dedicated to the checker.

@findepi changed the title from "ci: add typo checker" to "chore: fix typos" on Aug 13, 2025
@findepi (Member) left a comment

LGTM modulo a few places.

@@ -403,7 +403,7 @@ async fn run_aggregate_test(input1: Vec<RecordBatch>, group_by_columns: Vec<&str
Left Plan:\n{}\n\
Right Plan:\n{}\n\
schema:\n{schema}\n\
Left Ouptut:\n{}\n\
Member

❤️

@@ -192,13 +192,14 @@ impl AsyncFuncExpr {
);
}

- let datas = ColumnarValue::values_to_arrays(&result_batches)?
+ let data_vec = ColumnarValue::values_to_arrays(&result_batches)?
Member

There is no such word as "datas", but it's likely as meaningful as data_vec.
It could be allowed (no need to change anything).

Member Author

Yes, I then found there are many datas so I added it to the allowlist... will revert this as well

@@ -173,15 +173,15 @@ impl SpillMetrics {
#[derive(Debug, Clone)]
pub struct SplitMetrics {
/// Number of times an input [`RecordBatch`] was split
pub batches_splitted: Count,
Member

This looks like an api change

Member Author

ahh yes. Do we need to extract a dedicated PR for breaking changes? (this one and the ignored Serializeable)

Contributor

I don't think we need a dedicated PR. We could add a note in the upgrade guide.

Given that these are such small changes, maybe just adding the api-change label so they are highlighted in the release notes would be good enough

Member Author

Sounds good, I also added a note to PR description 👍
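
For downstream users, the rename discussed above amounts to a one-word change at each call site. The following is a minimal sketch of that migration, not code from this PR: `Count` here is a simplified stand-in for DataFusion's metrics counter, and the deprecated accessor is a hypothetical shim that only illustrates one way an old name could be kept alive during a transition.

```rust
/// Simplified stand-in for the metrics counter type (an assumption;
/// the real `Count` lives in DataFusion and is not reproduced here).
#[derive(Debug, Clone, Default)]
pub struct Count(usize);

impl Count {
    /// Add `n` to the counter.
    pub fn add(&mut self, n: usize) {
        self.0 += n;
    }

    /// Current value of the counter.
    pub fn value(&self) -> usize {
        self.0
    }
}

#[derive(Debug, Clone, Default)]
pub struct SplitMetrics {
    /// Number of times an input batch was split
    /// (this field was previously named `batches_splitted`).
    pub batches_split: Count,
}

impl SplitMetrics {
    /// Hypothetical compatibility shim, NOT part of the PR: exposes the
    /// counter under the old name and warns callers at compile time.
    #[deprecated(note = "use the `batches_split` field instead")]
    pub fn batches_splitted(&self) -> &Count {
        &self.batches_split
    }
}

/// Example downstream reader, already updated to the new field name.
pub fn split_count(metrics: &SplitMetrics) -> usize {
    metrics.batches_split.value()
}
```

A shim like this trades a little extra surface area for a softer upgrade path; for a rename this small, the thread's suggestion of an upgrade-guide note plus the api-change label is the lighter option.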

Comment on lines +741 to +742
#[case::missing_assignment_target("UPDATE person SET doesnotexist = true")]
#[case::missing_assignment_expression("UPDATE person SET age = doesnotexist + 42")]
Member

This was clearly a typo. In rstest? In our code? Did it work, or should we just delete these lines?

I would feel better if this change is backed out from this PR, because I don't understand its consequences

Suggested change
#[case::missing_assignment_target("UPDATE person SET doesnotexist = true")]
#[case::missing_assignment_expression("UPDATE person SET age = doesnotexist + 42")]
#[case::missing_assignement_target("UPDATE person SET doesnotexist = true")]
#[case::missing_assignement_expression("UPDATE person SET age = doesnotexist + 42")]

Member Author

This is on our side. These strings become the test case names.

typos.toml Outdated
nd = "nd"
Nd = "Nd"
YOUY = "YOUY"
typ = "typ"
Member

This is perhaps fine, as type is reserved. Worth adding a comment line that typ stands for type.

typos.toml Outdated
numer = "numer"
abd = "abd"
aroun = "aroun"
carefull = "carefull"
Member

most of these words are either typos, or some arbitrary abbreviations.

Let's not add this file in this PR.
Let's add it in a PR that adds the checking workflow.

@findepi added the "api change" label on Aug 13, 2025
@alamb (Contributor) commented Aug 15, 2025

Is this one ready to merge? It looks like there are some conflicts to resolve and some unresolved comments

@waynexia (Member Author)

Yes it's close. I'll address conflicts and comments tomorrow. Got some issues connecting to my server now 🥲

waynexia and others added 6 commits August 19, 2025 02:32