@pooknull pooknull commented Nov 21, 2025

https://perconadev.atlassian.net/browse/K8SPXC-1748

DESCRIPTION

Problem:
When PITR is enabled, the collector runs the following DDL statements each time it starts:

  • CREATE FUNCTION IF NOT EXISTS get_last_record_timestamp_by_binlog RETURNS INTEGER SONAME 'binlog_utils_udf.so'
  • CREATE FUNCTION IF NOT EXISTS get_gtid_set_by_binlog RETURNS STRING SONAME 'binlog_utils_udf.so'
  • CREATE FUNCTION IF NOT EXISTS get_first_record_timestamp_by_binlog RETURNS INTEGER SONAME 'binlog_utils_udf.so'

Galera treats each of these statements as a replicated DDL operation and runs it under Total Order Isolation (TOI).

Solution:
These statements should be executed in the pxc-entrypoint. The collector should verify that these functions exist and create them if they do not (as a safeguard).

CHECKLIST

Jira

  • Is the Jira ticket created and referenced properly?
  • Does the Jira ticket have the proper statuses for documentation (Needs Doc) and QA (Needs QA)?
  • Does the Jira ticket link to the proper milestone (Fix Version field)?

Tests

  • Is an E2E test/test case added for the new feature/change?
  • Are unit tests added where appropriate?
  • Are OpenShift compare files changed for E2E tests (compare/*-oc.yml)?

Config/Logging/Testability

  • Are all needed new/changed options added to default YAML files?
  • Are all needed new/changed options added to the Helm Chart?
  • Did we add proper logging messages for operator actions?
  • Did we ensure compatibility with the previous version or cluster upgrade process?
  • Does the change support oldest and newest supported PXC version?
  • Does the change support oldest and newest supported Kubernetes version?

Contributor

@egegunes egegunes left a comment

@pooknull we need to create these functions in entrypoint too so there won't be any DDL when you enable PiTR on a running cluster.

Handling these in the collector covers an edge case: users who have an old cluster with PITR disabled and want to enable it after upgrading to v1.19.0. If we create these functions only in the entrypoint, that would create problems for them.

Comment on lines +362 to +369
err := p.db.QueryRowContext(ctx, `SELECT 1 FROM mysql.func WHERE name = ? LIMIT 1`, functionName).Scan(&x)
if err != nil && !errors.Is(err, sql.ErrNoRows) {
	return errors.Wrapf(err, "check if function %s exists", functionName)
}
if err == nil {
	log.Printf("function %s already exists", functionName)
	continue
}
Contributor

Given that we are using the IF NOT EXISTS on the subsequent query, is this check regarding the existence of the function necessary?

Contributor

Yes, that's the whole point. CREATE FUNCTION IF NOT EXISTS still counts as DDL, so Galera replicates it under TOI even when the function already exists.

@pooknull pooknull requested review from egegunes and gkech November 24, 2025 11:04
_, err = p.db.ExecContext(ctx, "CREATE FUNCTION IF NOT EXISTS get_first_record_timestamp_by_binlog RETURNS INTEGER SONAME 'binlog_utils_udf.so'")
if err != nil {
	return errors.Wrap(err, "create function get_first_record_timestamp_by_binlog")
createQ := fmt.Sprintf("CREATE FUNCTION IF NOT EXISTS %s RETURNS %s SONAME 'binlog_utils_udf.so'", functionName, returnType)
Contributor

We need to wrap this between SET SESSION wsrep_on = OFF; and SET SESSION wsrep_on = ON; so the statement runs on the local node only and is not replicated.

Comment on lines 474 to 478
echo "CREATE FUNCTION IF NOT EXISTS get_last_record_timestamp_by_binlog RETURNS INTEGER SONAME 'binlog_utils_udf.so'" | "${mysql[@]}"

echo "CREATE FUNCTION IF NOT EXISTS get_gtid_set_by_binlog RETURNS STRING SONAME 'binlog_utils_udf.so'" | "${mysql[@]}"

echo "CREATE FUNCTION IF NOT EXISTS get_first_record_timestamp_by_binlog RETURNS INTEGER SONAME 'binlog_utils_udf.so'" | "${mysql[@]}"
Contributor

I see a lot of test failures with the error "full cluster crash detected". These statements are probably the reason. Also, do we have these functions in MySQL 5.7?

Contributor Author

@egegunes
Contributor

@pooknull please add description

@pooknull pooknull requested a review from egegunes November 25, 2025 12:36
fi
set -x

if [ "$MYSQL_VERSION" == '8.0' ]; then
Contributor

let's check for 8.4 as well

@pooknull pooknull requested a review from egegunes November 27, 2025 12:10
@JNKPercona
Collaborator

Test Name                             Result   Time
auto-tuning-8-0                       passed   00:00:00
allocator-8-0                         passed   00:00:00
backup-storage-tls-8-0                passed   00:00:00
cross-site-8-0                        passed   00:00:00
custom-users-8-0                      passed   00:00:00
demand-backup-cloud-8-0               passed   00:00:00
demand-backup-encrypted-with-tls-8-0  passed   00:00:00
demand-backup-8-0                     passed   00:00:00
demand-backup-flow-control-8-0        passed   00:00:00
demand-backup-parallel-8-0            passed   00:00:00
demand-backup-without-passwords-8-0   passed   00:00:00
haproxy-5-7                           passed   00:00:00
haproxy-8-0                           passed   00:00:00
init-deploy-5-7                       passed   00:00:00
init-deploy-8-0                       passed   00:00:00
limits-8-0                            passed   00:00:00
monitoring-2-0-8-0                    passed   00:00:00
monitoring-pmm3-8-0                   passed   00:00:00
one-pod-5-7                           passed   00:00:00
one-pod-8-0                           passed   00:00:00
pitr-8-0                              failure  00:27:28
pitr-gap-errors-8-0                   passed   00:00:00
proxy-protocol-8-0                    passed   00:00:00
proxy-switch-8-0                      passed   00:00:00
proxysql-sidecar-res-limits-8-0       passed   00:00:00
proxysql-scheduler-8-0                passed   00:00:00
pvc-resize-5-7                        passed   00:00:00
pvc-resize-8-0                        passed   00:00:00
recreate-8-0                          passed   00:00:00
restore-to-encrypted-cluster-8-0      passed   00:00:00
scaling-proxysql-8-0                  passed   00:00:00
scaling-8-0                           passed   00:00:00
scheduled-backup-5-7                  passed   00:00:00
scheduled-backup-8-0                  passed   00:00:00
security-context-8-0                  passed   00:00:00
smart-update1-8-0                     failure  00:03:58
smart-update2-8-0                     failure  00:05:01
storage-8-0                           passed   00:00:00
tls-issue-cert-manager-ref-8-0        passed   00:00:00
tls-issue-cert-manager-8-0            passed   00:00:00
tls-issue-self-8-0                    passed   00:00:00
upgrade-consistency-8-0               passed   00:00:00
upgrade-haproxy-5-7                   passed   00:00:00
upgrade-haproxy-8-0                   passed   00:00:00
upgrade-proxysql-5-7                  passed   00:00:00
upgrade-proxysql-8-0                  passed   00:00:00
users-5-7                             passed   00:00:00
users-8-0                             passed   00:00:00
validation-hook-8-0                   passed   00:00:00

Summary          Value
Tests Run        49/49
Job Duration     01:14:03
Total Test Time  00:36:29

commit: 5484528
image: perconalab/percona-xtradb-cluster-operator:PR-2264-54845288

Collaborator

@hors hors left a comment

@pooknull @egegunes We have a problem with the smart-update tests: "full cluster crash detected".

-----------------------------------------------------------------------------------
wait for running cluster
-----------------------------------------------------------------------------------

++ seq 0 2
+ for i in '$(seq 0 $last_pod)'
+ wait_pod smart-update-pxc-0 480
+ local pod=smart-update-pxc-0
+ local max_retry=480
+ local ns=
++ echo smart-update-pxc-0
++ /usr/bin/sed -E 's/.*-(pxc|proxysql)-[0-9]/\1/'
++ grep -E '^(pxc|proxysql)$'
+ local container=pxc
+ set +o xtrace
pod/smart-update-pxc-0 condition met
waiting for pod/smart-update-pxc-0 to become Ready.full cluster crash detected

Labels

size/M 30-99 lines
