-
Notifications
You must be signed in to change notification settings - Fork 17
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Event funnel queries are slow #282
Comments
The performance is even worse when no event matches the criteria specified in step 1 and step 2. No idea why. |
I had a try at speeding this up with a JavaScript UDF, I figured I might be able to reduce the number of time spent evaluating different regexes that way. Slot time seems to go down from ~28 minutes to 7 minutes which is a nice improvement, though the numbers seem to be a bit off. We'd need to rewrite our regex generation if we took this approach, I'm doing some hacky stuff below but it's not generating quite the right results: CREATE TEMP FUNCTION
udf_js_regex_match (regex STRING, input STRING)
RETURNS int64
LANGUAGE js AS """
if (input == null || !input.length) {
return 0;
}
re = new RegExp(regex.replace(/\\Q/g, '').replace(/\\E/g, '').replace('(?:.*?)', '.*').replace(/\\?:/g, '').slice(1,-2) + "?)");
let match = input.match(re)
return match ? match.filter(m=>m !== undefined).length : 0;
""";
WITH
step_1 AS (
SELECT
mozfun.event_analysis.aggregate_match_strings( ARRAY_AGG( CONCAT(property_value.value, mozfun.event_analysis.event_index_to_match_string(et.INDEX)))) AS match_string
FROM
`mozdata.fenix.event_types` AS et
CROSS JOIN
UNNEST(event_properties) AS properties
CROSS JOIN
UNNEST(properties.value) AS property_value
WHERE
(category = 'events')
AND (event = 'app_opened')
AND 1=1 -- no filter on 'step_1.property_value'
),
step_2 AS (
SELECT
mozfun.event_analysis.aggregate_match_strings( ARRAY_AGG( CONCAT(property_value.value,mozfun.event_analysis.event_index_to_match_string(et.INDEX)))) AS match_string
FROM
`mozdata.fenix.event_types` AS et
CROSS JOIN
UNNEST(event_properties) AS properties
CROSS JOIN
UNNEST(properties.value) AS property_value
WHERE
(category = 'context_menu')
AND (event = 'item_tapped')
AND (properties.key = 'named')
AND (property_value.key = 'share_link')),
results as (
SELECT
udf_js_regex_match(mozfun.event_analysis.create_funnel_regex([step_1.match_string, step_2.match_string], TRUE), funnel_analysis.events) AS funnel_analysis_count_completed_step_2
FROM
`mozdata.fenix.events_daily` AS funnel_analysis
CROSS JOIN
step_1
CROSS JOIN
step_2
WHERE
funnel_analysis.submission_date = DATE('2021-10-18')
)
select countif(funnel_analysis_count_completed_step_2 > 1) as step1, countif(funnel_analysis_count_completed_step_2 > 2) as step2 from results |
Update: we have opened a support request with GCP to diagnose and hopefully provide some recommendations |
Google looked into it, but basically reproduced the same set of findings that are already in this issue: repeated use of regular expressions in BigQuery is (relatively) slow. |
Feedback from Google suggests that the query as I'm testing it is only using a small number of slots (87) which may be partly responsible for the slowness. I think we're using on-demand pricing with GCP + Looker, which might limit the number of slots allocated to this query. Something to bear in mind if this comes up as a big issue for people. |
The following query (generated by the event funnel explore) is slower than expected:
According to BigQuery, it's only processing 2.4gb worth of data but when I run it in the console (in the mozdata project), it takes a full 21.4 seconds to run. Looking at the query profile, this segment sticks out:
It appears this corresponds to the REGEXP_CONTAINS part in the latter part of the query. It seems most likely that it's the repeated calls to REGEXP_CONTAINS that are slow.
The text was updated successfully, but these errors were encountered: