When wrongly configuring s3-integrator Postgresql charm will get stuck in error state and needs to be redeployed #661

Open
gustavosr98 opened this issue Oct 25, 2024 · 4 comments
Labels
bug Something isn't working

Comments

@gustavosr98

Steps to reproduce

Deploy the postgresql charm
Follow the backup and restore tutorial
Leave the tls-ca-chain config unset on s3-integrator (I forgot it, and it was required in this case)
The postgresql charm now gets stuck in error state and cannot get out of it without hacking inside charm hooks or redeploying

Expected behavior

If the user makes a configuration mistake, the charm should go into blocked state rather than get stuck in error state.
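
A minimal sketch of the expected behavior, assuming the charm validated the S3 parameters before using them. The function name `check_s3_parameters` and the list of required options are hypothetical and not taken from the charm's code:

```python
from typing import Mapping, Optional, Tuple

# Hypothetical set of options this sketch treats as mandatory; the real
# charm may require a different set (for example tls-ca-chain in some setups).
REQUIRED_S3_OPTIONS = ("bucket", "access-key", "secret-key", "region")


def check_s3_parameters(params: Mapping[str, Optional[str]]) -> Tuple[str, str]:
    """Return a (status, message) pair instead of raising on bad config."""
    # Treat both absent keys and empty/None values as missing.
    missing = [name for name in REQUIRED_S3_OPTIONS if not params.get(name)]
    if missing:
        return "blocked", "missing S3 parameters: " + ", ".join(missing)
    return "active", ""


if __name__ == "__main__":
    incomplete = {"bucket": "backups", "access-key": "k", "secret-key": "s"}
    print(check_s3_parameters(incomplete))
```

In a charm built on the ops framework, a "blocked" result would typically translate to setting the unit status to `ops.BlockedStatus(message)` and returning from the handler, so that a later config change can trigger the handler again instead of leaving the hook failed.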

Actual behavior

# juju status
[..]
postgresql/0                 active    idle   0        192.168.30.106  5432/tcp
postgresql/1*                error     idle   1        192.168.30.138  5432/tcp  hook failed: "s3-parameters-relation-changed"
postgresql/2                 active    idle   2        192.168.30.152  5432/tcp

You can NOT get out of this state, even:

  • By re-running the hook
  • By updating the config
  • By updating the config and re-running the hook
  • By removing the relation
  • By removing the relation and re-running the hook

Versions

Operating system: Ubuntu 22.04
Juju: 3.5.4
Charm revision: 14/stable rev468

Log output

Not collected at the time

Additional context

You can hackily get out of this situation without redeploying by skipping the hook and then removing the relation

  1. On terminal "A"
juju debug-hooks postgresql/1
# You are now in a tmux session waiting for a hook to be triggered
  2. On terminal "B"
juju resolved postgresql/1
# Triggers the hook to re-run
  3. On terminal "A"
# The tmux session is now waiting for you to manually handle the s3-parameters-relation-changed hook
exit 0
# makes Juju believe the hook ran successfully
exit
# exits the juju debug-hooks session

Now the hook has been skipped and you can:
Remove the relation
Update the config
Re-add the relation with the proper configuration

@gustavosr98 gustavosr98 added the bug Something isn't working label Oct 25, 2024

Thank you for reporting us your feedback!

The internal ticket has been created: https://warthogs.atlassian.net/browse/DPE-5713.

This message was autogenerated

@hloeung
Contributor

hloeung commented Dec 2, 2024

Any updates on this? We're seeing it here too:

is-managed-database-prod-marketing-airbyte-marketo@is-bastion-ps6:~$ jsft
Model                                               Controller                         Cloud/Region           Version  SLA          Timestamp
is-managed-database-prod-marketing-airbyte-marketo  juju-controller-35-production-ps6  prodstack6/prodstack6  3.5.3    unsupported  21:18:19Z

App                    Version  Status  Scale  Charm                      Channel        Rev  Exposed  Message
backups-s3-integrator           active      1  s3-integrator              latest/stable   32  no
data-integrator                 active      1  data-integrator            latest/stable   41  no
landscape-client                active      6  landscape-client           latest/stable   69  no       Client registered!
postgresql             14.12    active      3  postgresql                 14/stable      468  yes
telegraf                        active      6  telegraf                   latest/stable   75  yes      Monitoring tls-certificates/0 (source version/commit 23.10)
tls-certificates                active      1  tls-certificates-operator  latest/stable   22  no
ubuntu-advantage                active      6  ubuntu-advantage           latest/stable   95  no       Attached (esm-apps,esm-infra,livepatch)

Unit                      Workload  Agent  Machine  Public address  Ports     Message
backups-s3-integrator/0*  active    idle   0        10.146.64.41
  landscape-client/2      active    idle            10.146.64.41              Client registered!
  telegraf/2              active    idle            10.146.64.41    9103/tcp  Monitoring backups-s3-integrator/0 (source version/commit 23.10)
  ubuntu-advantage/2      active    idle            10.146.64.41              Attached (esm-apps,esm-infra,livepatch)
data-integrator/1*        active    idle   2        10.146.64.46
  landscape-client/1      active    idle            10.146.64.46              Client registered!
  telegraf/1              active    idle            10.146.64.46    9103/tcp  Monitoring data-integrator/1 (source version/commit 23.10)
  ubuntu-advantage/1      active    idle            10.146.64.46              Attached (esm-apps,esm-infra,livepatch)
postgresql/0*             error     idle   3        10.146.64.39    5432/tcp  hook failed: "s3-parameters-relation-changed"
  landscape-client/3      active    idle            10.146.64.39              Client registered!
  telegraf/3              active    idle            10.146.64.39    9103/tcp  Monitoring postgresql/0 (source version/commit 23.10)
  ubuntu-advantage/3      active    idle            10.146.64.39              Attached (esm-apps,esm-infra,livepatch)
postgresql/1              waiting   idle   4        10.146.64.37    5432/tcp  Awaiting restart operation
  landscape-client/5      active    idle            10.146.64.37              Client registered!
  telegraf/5              active    idle            10.146.64.37    9103/tcp  Monitoring postgresql/1 (source version/commit 23.10)
  ubuntu-advantage/5      active    idle            10.146.64.37              Attached (esm-apps,esm-infra,livepatch)
postgresql/2              waiting   idle   5        10.146.64.43    5432/tcp  awaiting for cluster to start
  landscape-client/4      active    idle            10.146.64.43              Client registered!
  telegraf/4              active    idle            10.146.64.43    9103/tcp  Monitoring postgresql/2 (source version/commit 23.10)
  ubuntu-advantage/4      active    idle            10.146.64.43              Attached (esm-apps,esm-infra,livepatch)
tls-certificates/0*       active    idle   1        10.146.64.42
  landscape-client/0*     active    idle            10.146.64.42              Client registered!
  telegraf/0*             active    idle            10.146.64.42    9103/tcp  Monitoring tls-certificates/0 (source version/commit 23.10)
  ubuntu-advantage/0*     active    idle            10.146.64.42              Attached (esm-apps,esm-infra,livepatch)

With the hook failure logs:

unit-postgresql-0: 21:12:06 DEBUG unit.postgresql/0.juju-log s3-parameters:19: Endpoint provider result: https://radosgw.ps6.canonical.com/is-managed-database-prod-marketing-airbyte-marketo-backup
unit-postgresql-0: 21:12:06 DEBUG unit.postgresql/0.juju-log s3-parameters:19: Selecting from endpoint provider's list of auth schemes: "sigv4". User selected auth scheme is: "None"
unit-postgresql-0: 21:12:06 DEBUG unit.postgresql/0.juju-log s3-parameters:19: Selected auth type "v4" as "v4" with signing context params: {'region': 'us-east-1', 'signing_name': 's3', 'disableDoubleEncoding': True}
unit-postgresql-0: 21:12:06 ERROR unit.postgresql/0.juju-log s3-parameters:19: Uncaught exception while in charm code:
Traceback (most recent call last):
  File "/var/lib/juju/agents/unit-postgresql-0/charm/./src/charm.py", line 1866, in <module>
    main(PostgresqlOperatorCharm)
  File "/var/lib/juju/agents/unit-postgresql-0/charm/venv/ops/main.py", line 553, in main
    manager.run()
  File "/var/lib/juju/agents/unit-postgresql-0/charm/venv/ops/main.py", line 529, in run
    self._emit()
  File "/var/lib/juju/agents/unit-postgresql-0/charm/venv/ops/main.py", line 518, in _emit
    _emit_charm_event(self.charm, self.dispatcher.event_name, self._juju_context)
  File "/var/lib/juju/agents/unit-postgresql-0/charm/venv/ops/main.py", line 139, in _emit_charm_event
    event_to_emit.emit(*args, **kwargs)
  File "/var/lib/juju/agents/unit-postgresql-0/charm/venv/ops/framework.py", line 347, in emit
    framework._emit(event)
  File "/var/lib/juju/agents/unit-postgresql-0/charm/venv/ops/framework.py", line 853, in _emit
    self._reemit(event_path)
  File "/var/lib/juju/agents/unit-postgresql-0/charm/venv/ops/framework.py", line 943, in _reemit
    custom_handler(event)
  File "/var/lib/juju/agents/unit-postgresql-0/charm/lib/charms/data_platform_libs/v0/s3.py", line 768, in _on_relation_changed
    getattr(self.on, "credentials_changed").emit(
  File "/var/lib/juju/agents/unit-postgresql-0/charm/venv/ops/framework.py", line 347, in emit
    framework._emit(event)
  File "/var/lib/juju/agents/unit-postgresql-0/charm/venv/ops/framework.py", line 853, in _emit
    self._reemit(event_path)
  File "/var/lib/juju/agents/unit-postgresql-0/charm/venv/ops/framework.py", line 943, in _reemit
    custom_handler(event)
  File "/var/lib/juju/agents/unit-postgresql-0/charm/lib/charms/tempo_k8s/v1/charm_tracing.py", line 735, in wrapped_function
    return callable(*args, **kwargs)  # type: ignore
  File "/var/lib/juju/agents/unit-postgresql-0/charm/src/backups.py", line 657, in _on_s3_credential_changed
    self._create_bucket_if_not_exists()
  File "/var/lib/juju/agents/unit-postgresql-0/charm/lib/charms/tempo_k8s/v1/charm_tracing.py", line 735, in wrapped_function
    return callable(*args, **kwargs)  # type: ignore
  File "/var/lib/juju/agents/unit-postgresql-0/charm/src/backups.py", line 294, in _create_bucket_if_not_exists
    bucket.create(CreateBucketConfiguration={"LocationConstraint": region})
  File "/var/lib/juju/agents/unit-postgresql-0/charm/venv/boto3/resources/factory.py", line 581, in do_action
    response = action(self, *args, **kwargs)
  File "/var/lib/juju/agents/unit-postgresql-0/charm/venv/boto3/resources/action.py", line 88, in __call__
    response = getattr(parent.meta.client, operation_name)(*args, **params)
  File "/var/lib/juju/agents/unit-postgresql-0/charm/venv/botocore/client.py", line 569, in _api_call
    return self._make_api_call(operation_name, kwargs)
  File "/var/lib/juju/agents/unit-postgresql-0/charm/venv/botocore/client.py", line 980, in _make_api_call
    request_dict = self._convert_to_request_dict(
  File "/var/lib/juju/agents/unit-postgresql-0/charm/venv/botocore/client.py", line 1047, in _convert_to_request_dict
    request_dict = self._serializer.serialize_to_request(
  File "/var/lib/juju/agents/unit-postgresql-0/charm/venv/botocore/validate.py", line 381, in serialize_to_request
    raise ParamValidationError(report=report.generate_report())
botocore.exceptions.ParamValidationError: Parameter validation failed:
Invalid type for parameter CreateBucketConfiguration.LocationConstraint, value: None, type: <class 'NoneType'>, valid types: <class 'str'>
unit-postgresql-0: 21:12:06 ERROR juju.worker.uniter.operation hook "s3-parameters-relation-changed" (via hook dispatching script: dispatch) failed: exit status 1
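
The traceback shows `bucket.create(CreateBucketConfiguration={"LocationConstraint": region})` being called with `region` set to `None`, which botocore rejects during parameter validation. One way to avoid that specific failure, sketched here under the assumption that an unset region should simply omit the location constraint, is to build the keyword arguments conditionally. The helper name `create_bucket_kwargs` is hypothetical, and this is not necessarily what the charm does or will do:

```python
from typing import Any, Dict, Optional


def create_bucket_kwargs(region: Optional[str]) -> Dict[str, Any]:
    """Build keyword arguments for a boto3 bucket.create() call.

    Passing LocationConstraint=None fails botocore validation, so the
    CreateBucketConfiguration key is left out entirely when no region is set.
    """
    if not region:
        return {}
    return {"CreateBucketConfiguration": {"LocationConstraint": region}}


if __name__ == "__main__":
    print(create_bucket_kwargs(None))
    print(create_bucket_kwargs("eu-west-1"))
```

The call site would then read `bucket.create(**create_bucket_kwargs(region))`. Whether the S3 endpoint accepts a bucket creation request without a location constraint depends on the provider, so the call should probably still be wrapped in error handling that ends in a blocked status rather than an uncaught exception.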

@hloeung
Contributor

hloeung commented Dec 4, 2024

Turns out my issue was caused by the region not being set. I've set it, but then ran into #689

It would be nice if the charm could handle this better and allow changing or setting the region, rather than requiring the workaround of removing the relation, running resolve --no-retry, then re-adding the relation.
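
The workaround described above can be sketched as a sequence of Juju 3.x commands. The small Python helper below only builds and prints them (a dry run); the function name is hypothetical, the application and unit names are taken from the status output earlier in this thread, and the region value is a placeholder that must be replaced with the one your S3 endpoint expects:

```python
import shlex
from typing import List


def workaround_commands(
    db_app: str, s3_app: str, failed_unit: str, region: str
) -> List[List[str]]:
    """Return the juju commands for the remove/resolve/re-add workaround."""
    return [
        # Drop the relation whose hook keeps failing.
        ["juju", "remove-relation", db_app, s3_app],
        # Mark the failed hook as resolved without re-running it.
        ["juju", "resolved", "--no-retry", failed_unit],
        # Fix the configuration on the S3 integrator.
        ["juju", "config", s3_app, f"region={region}"],
        # Re-create the relation with the corrected configuration.
        ["juju", "integrate", db_app, s3_app],
    ]


if __name__ == "__main__":
    # "us-east-1" is only an example value.
    for cmd in workaround_commands(
        "postgresql", "backups-s3-integrator", "postgresql/0", "us-east-1"
    ):
        print(shlex.join(cmd))
```

Printing instead of executing keeps the sketch safe to run anywhere; each printed line can be reviewed and then run by hand against the affected model.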

@lucasgameiroborges
Member

Hey @hloeung! Thank you for opening a bug report!

There is a possible fix under review at #701
