Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[pkg/ottl] Add merge_maps function #16461

Merged
merged 6 commits into from
Nov 29, 2022

Conversation

TylerHelmuth
Copy link
Member

Description:
Adds a new function that is able to gracefully merge a source map into a target map. If target already contains a key from source, the source's value is used.

For efficiency, I opted for a Function for this capability instead of a Factory Function combined with set for efficiency.

Link to tracking Issue:
Related to #9410
Related to #14938

Testing:
added unit tests

Documentation:
updated function doc

Copy link
Contributor

@evan-bradley evan-bradley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that making this a function instead of a factory function is the right move in terms of efficiency for probably the majority of use cases. However, I can also see more complex uses arising for transforming maps, like KeepKeys(MergeMaps(ParseJSON(body), resource.attributes), "http_.*"). One way to do both could be to have factory function versions of some functions, like merge_maps and MergeMaps, but that risks being confusing to users. Alternatively, we could sacrifice some efficiency and standardize on using factory functions with set, or just forgo the factory function versions for now until there are requests for more complex functionality. What do you think?

if err != nil {
return nil, err
}
if targetMap, ok := targetVal.(pcommon.Map); ok {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not necessarily an issue that needs to be solved in this PR, but I think we should do something to notify the user that they've provided an invalid value. Should we emit a debug log if either of the values aren't maps?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ya this is a problem with lots of our functions. Need to figure out a good way to guarantee access to the telemetrysettings.Logger in all functions. We could enforce telemetrysettings to always be the first param as a start.

@@ -238,6 +239,27 @@ Examples:

- `limit(resource.attributes, 50, ["http.host", "http.method"])`

### merge_maps

`merge_maps(target, source)`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not just "merge" and say for the moment only works with maps?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can try this out. Is slice the only other type that we'd support in a merge function?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bogdandrutu updated the name to merge, but with that addition of the strategy param, is there another type that you think we'll be supporting in a merge function in the future?

@TylerHelmuth
Copy link
Member Author

@evan-bradley I think in the situation of KeepKeys(MergeMaps(ParseJSON(body), resource.attributes), "http_.*"), to take advantage of the result of keep keys you still have to then call set, so the final result is

set(resource.attributes, KeepKeys(MergeMaps(ParseJSON(body), resource.attributes), "http_.*"))

If I am counting correctly thats 1 loop inside MergeMaps and a new map created, 1 loop inside KeepKeys and a new map created, and then 1 loop inside set to copy the new value. Since Factory Functions are pure (since they don't know the source of the input) both KeepKeys and MergeMaps have to create new maps and can't modify their inputs.

If we go with merge_map then the same result, spread over 2 statements, looks like

merge_maps(resource.attributes, ParseJSON(body))
set(resource.attributes, KeepKeys(resource.attributes, "http_.*"))

I think that results in 1 loop within merge_maps, without having to create a new map, 1 loop instead KeepKeys and a new map created, and then 1 loop inside set to copy the new value. We may also be able to hold off on a function like KeepKeys if delete_matching_keys can suffice. So I think this ends but with a pretty good performance gain. Downside is that it doesn't let you keep resource.attributes as the default values so I think merge_maps will need an extra param that specifies who wins in a tie.

Overall going the Factory Function route would mean more flexibility, but I kinda like the simplicity of merge_maps and its performance gains. What do you think?

@TylerHelmuth
Copy link
Member Author

@djaglowski @evan-bradley @bogdandrutu @kentquirk ready for re-review.

@evan-bradley
Copy link
Contributor

Overall going the Factory Function route would mean more flexibility, but I kinda like the simplicity of merge_maps and its performance gains.

I definitely agree here, just wanted to bring up a few options. I think this implementation is the best route for now, we can consider additional paths forward if we need more flexibility.

@TylerHelmuth
Copy link
Member Author

@evan-bradley thanks for the great discussion. I think the more log transformations we enable the more times we're gonna have to decide whether we want to chain factory functions or if multiple statements can handle the use case.

Once conditions get involved its nice to be able to do things in a single statement so that the condition doesn't have to be spread between multiple statements. Could be a reason to allow OTTL's grammar to support multiple Invocations per condition.

Copy link
Contributor

@evan-bradley evan-bradley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me. I think we should probably keep the name as merge_maps considering the merge strategies don't make sense for slices, and having disjoint sets of strategies in a single function signature seems like a likely source of confusion.

@github-actions github-actions bot requested a review from bogdandrutu November 28, 2022 23:32
@github-actions github-actions bot requested a review from bogdandrutu November 28, 2022 23:40
@djaglowski
Copy link
Member

I like the strategy parameter and associated values. An additional aspect of merging to consider is recursion. Should we have a way to support the following?

target:
  a: 1
  b: 2
  details:
    important: foo

source:
  b: 3
  c: 4
  details:
    not_important: bar

result:
  a: 1
  b: 3
  c: 4
  details:
    important: foo
    not_important: bar  

@TylerHelmuth
Copy link
Member Author

@djaglowski that's a good point. How does Stanza handle this?

@djaglowski
Copy link
Member

It doesn't actually, just works like upsert at the immediate level only.

I don't feel strongly that we should support it. It introduces a lot of complexity and it's not clearly needed.

@TylerHelmuth
Copy link
Member Author

@djaglowski In that case I'd vote for moving this forward as is for now and supporting that capability once we have a specific request/need.

@djaglowski djaglowski merged commit 8890693 into open-telemetry:main Nov 29, 2022
@TylerHelmuth TylerHelmuth deleted the ottl-merge-maps branch November 29, 2022 02:23
shalper2 pushed a commit to shalper2/opentelemetry-collector-contrib that referenced this pull request Dec 6, 2022
@plantfansam plantfansam mentioned this pull request Jul 21, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants