Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Avoid re-resolving column names after analysis #25240

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

martint
Copy link
Member

@martint martint commented Mar 6, 2025

Record the fields directly and extract the column name from them when needed.

Follow up to https://github.com/trinodb/trino/pull/24055/files#r1952735457

cc @kokosing @ksobolew

Release notes

(x) This is not user-visible or is docs only, and no release notes are required.

@cla-bot cla-bot bot added the cla-signed label Mar 6, 2025
@martint martint force-pushed the masks branch 2 times, most recently from 898190f to ef453bb Compare March 7, 2025 00:49
Record the fields directly and extract the column name from them when needed.
Copy link
Contributor

@ksobolew ksobolew left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This definitely looks like a better solution and one small step towards #17, thanks :) In terms of performance overhead, though, it looks like a wash - one linear algorithm gets replaced by some other linear processing paths. But I guess we can't avoid that.

Comment on lines +889 to +899
Set<Field> fields = metadata.getTableSchema(session, tableHandle)
.columns().stream()
.map(column -> Field.newQualified(
node.getTableName(),
Optional.of(column.getName()),
column.getType(),
column.isHidden(),
Optional.of(tableName),
Optional.of(column.getName()),
false))
.collect(Collectors.toSet());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general, that PR is what I started doing, but this is the place where I got into trouble, because I didn't know where to get a Field instance that I would put into the collection of referenced columns.

.map(Expression::toString)))
.distinct()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why distinct was moved?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Previously .distinct() worked on String, so it made sense to do it early. Now we're processing Field instances, which are not comparable, so .distinct() is moved to a later stage, where we convert them to ColumnInfo, which are comparable.

columns.stream()
.map(Field::getOriginColumnName)
.map(Optional::get)
.collect(Collectors.toSet()))));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

static import, immutableSet?

@@ -46,4 +47,19 @@ public Optional<String> getMask()
{
return mask;
}

@Override
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we migrate this class to record?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is an SPI class, although marked as @Unstable, so we can migrate it to record, but still we should do it carefully

Optional.of(tableName),
Optional.of(column.getName()),
false))
.collect(Collectors.toSet());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

immutable set and static import?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems this is a convention is this class, not sure we should be changing that (at this time)

@kokosing
Copy link
Member

kokosing commented Mar 7, 2025

In terms of performance overhead, though, it looks like a wash - one linear algorithm gets replaced by some other linear processing paths.

@ksobolew what do you mean? I see resolveColumnMask is gone now so there is no linear search anymore. What am I missing?

@ksobolew
Copy link
Contributor

ksobolew commented Mar 7, 2025

@ksobolew what do you mean? I see resolveColumnMask is gone now so there is no linear search anymore. What am I missing?

Now we linearly converts columns to Fieldin Analyzer and StatementAnalyzer instead.

@kokosing
Copy link
Member

kokosing commented Mar 7, 2025

Yes, but now you are doing this loop only once, and then any lookup in the analysis is constant.

@ksobolew
Copy link
Contributor

ksobolew commented Mar 7, 2025

Yes, but now you are doing this loop only once, and then any lookup in the analysis is constant.

I guess you're right

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

Successfully merging this pull request may close these issues.

3 participants