Skip to content

Conversation

@alexxa
Copy link
Collaborator

@alexxa alexxa commented Sep 30, 2025

  • commit "check if the resource is present ..." is to address this error:

Transaction test succeeded.\nRunning transaction\n Preparing : 1/1 \n Erasing : bind-utils-32:9.16.23-1.el9_0.10.x86_64 ...e 0 -a 0 -ne 1 ']'\n+ pcs resource cleanup aws-vpc-move-ip\nCleaned up aws-vpc-move-ip on localhost\nConfiguration specifies 'aws-vpc-move-ip' should remain stopped\nWaiting for 1 reply from the controller\n... got reply (done)\n+ '[' 0 -ne 0 -a 0 -ne 1 ']'\n+ pcs resource delete aws-vpc-move-ip\nDeleting Resource - aws-vpc-move-ip\n+ '[' 0 -ne 0 ']'\n+ '[' 9 -lt 8 ']'\n+ pcs resource config aws-vpc-move-ip\nError: unable to find resource 'aws-vpc-move-ip'\n+ '[' 1 -eq 0 ']'\n+ for R in ${RAS}\n++ basename /usr/lib/ocf/resource.d/heartbeat/aws-vpc-route53\n+ RA=aws-vpc-route53\n+ pcs resource disable --wait=5 aws-vpc-route53\nWaiting for the cluster to apply configuration changes (timeout: 5 seconds)...\nresource 'aws-vpc-route53' is not running on any node\n+ '[' 0 -ne 0 -a 0 -ne 1 ']'\n+ pcs resource cleanup aws-vpc-route53\nCleaned up aws-vpc-route53 on localhost\nConfiguration specifies 'aws-vpc-route53' should remain stopped\nWaiting for 1 reply from the controller\n... got reply (done)\n+ '[' 0 -ne 0 -a 0 -ne 1 ']'\n+ pcs resource delete aws-vpc-route53\nError: unable to get cib\n+ '[' 1 -ne 0 ']'\n+ echo 'Cannot delete resource agent aws-vpc-route53 resource.'\n+ exit 1\n").rc

  • commit "add error handling ..." is to address this error:

yum --setopt=clean_requirements_on_remove=True -y remove bind-utils\n+ '[' 0 -ne 0 ']'\n+ yum -y install bind-utils\n+ '[' 0 -ne 0 ']'\n+ rpm -q bind-utils\n+ '[' 0 -ne 0 ']'\n+ for HAPKG in ${HAPKGS}\n+ rpm -q fence-agents-aws\n+ '[' 1 -eq 0 ']'\n+ yum -y install fence-agents-aws\nRPM: error: can't create transaction lock on /var/lib/rpm/.rpm.lock (Resource temporarily unavailable)\nError: Could not run transaction.\n+ '[' 1 -ne 0 ']'\n+ echo 'Install of fence-agents-aws failed.'\n+ exit 1\n").rc

Summary by Sourcery

Improve error handling and idempotency in the RHEL HA AWS check script by adding installation retries, robust resource cleanup logic, and clearer messaging.

Bug Fixes:

  • Add retry logic (up to 5 attempts) for yum package installs to handle transient lock/unavailability errors.
  • Prevent script failures when disabling, cleaning up, or deleting non-existent HA resources by accepting expected exit codes and checking resource existence before deletion.

Enhancements:

  • Clarify the behavior of the --force flag on resource creation via comments and improve the removal failure message for resource config checks.

    checking for the existence of the lock file
    /var/lib/rpm/.rpm.lock and waiting for it to be released
    before proceeding
@sourcery-ai
Copy link

sourcery-ai bot commented Sep 30, 2025

Reviewer's Guide

This PR bolsters the RHEL HA AWS check script by introducing retry logic for package installations to work around transient yum lock errors and hardens the resource cleanup workflow with improved exit-code handling, existence checks, and explanatory comments.

Flow diagram for improved yum install retry logic

flowchart TD
    A["Start package install loop"] --> B["Attempt yum install ${HAPKG}"]
    B --> C{Install succeeded?}
    C -- Yes --> D["Set SUCCESS=1"]
    C -- No --> E["Print attempt failed message"]
    E --> F["Sleep 5 seconds"]
    F --> G["Increment RETRY_COUNT"]
    G --> B
    D --> H{SUCCESS == 1?}
    H -- Yes --> I["rpm -q ${HAPKG}"]
    H -- No --> J["Print install failed after retries"]
    J --> K["Exit 1"]
    I --> L["Continue script"]
Loading

Flow diagram for improved resource deletion logic

flowchart TD
    A["For each resource agent"] --> B["Disable resource"]
    B --> C{Exit code 0 or 1?}
    C -- No --> D["Print error and exit"]
    C -- Yes --> E["Cleanup resource"]
    E --> F{Exit code 0 or 1?}
    F -- No --> D
    F -- Yes --> G["Check if resource exists (pcs resource config)"]
    G --> H{Resource exists?}
    H -- Yes --> I["Delete resource"]
    I --> J{Delete succeeded?}
    J -- No --> D
    J -- Yes --> K["Show/config resource"]
    H -- No --> K
    K --> L{Resource still present?}
    L -- Yes --> D["Print removal failed and exit"]
    L -- No --> M["Continue"]
Loading

File-Level Changes

Change Details Files
Add retry logic for yum package installation to handle lock errors
  • Introduce MAX_RETRIES, RETRY_COUNT and SUCCESS flags
  • Wrap yum install in a while loop with sleep and retry messages
  • Exit with an error after multiple unsuccessful attempts
scripts/rhel-ha-aws-check.sh
Enhance HA resource management error handling and deletion checks
  • Allow pcs disable and cleanup to succeed on exit codes 0 or 1
  • Pre-check resource existence via pcs resource config before delete
  • Document force-creation warnings with inline comments
scripts/rhel-ha-aws-check.sh

Tips and commands

Interacting with Sourcery

  • Trigger a new review: Comment @sourcery-ai review on the pull request.
  • Continue discussions: Reply directly to Sourcery's review comments.
  • Generate a GitHub issue from a review comment: Ask Sourcery to create an
    issue from a review comment by replying to it. You can also reply to a
    review comment with @sourcery-ai issue to create an issue from it.
  • Generate a pull request title: Write @sourcery-ai anywhere in the pull
    request title to generate a title at any time. You can also comment
    @sourcery-ai title on the pull request to (re-)generate the title at any time.
  • Generate a pull request summary: Write @sourcery-ai summary anywhere in
    the pull request body to generate a PR summary at any time exactly where you
    want it. You can also comment @sourcery-ai summary on the pull request to
    (re-)generate the summary at any time.
  • Generate reviewer's guide: Comment @sourcery-ai guide on the pull
    request to (re-)generate the reviewer's guide at any time.
  • Resolve all Sourcery comments: Comment @sourcery-ai resolve on the
    pull request to resolve all Sourcery comments. Useful if you've already
    addressed all the comments and don't want to see them anymore.
  • Dismiss all Sourcery reviews: Comment @sourcery-ai dismiss on the pull
    request to dismiss all existing Sourcery reviews. Especially useful if you
    want to start fresh with a new review - don't forget to comment
    @sourcery-ai review to trigger a new review!

Customizing Your Experience

Access your dashboard to:

  • Enable or disable review features such as the Sourcery-generated pull request
    summary, the reviewer's guide, and others.
  • Change the review language.
  • Add, remove or edit custom review instructions.
  • Adjust other review settings.

Getting Help

Copy link

@sourcery-ai sourcery-ai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey there - I've reviewed your changes - here's some feedback:

  • Avoid calling $? multiple times in the same conditional—capture the exit code into a variable before testing to prevent mismatched results.
  • The final check after deletion prints “Removal of ${RA} failed.” on a successful pcs resource config (i.e. the resource still exists), which is confusing; revise the conditional and message to clearly reflect a deletion failure.
  • Consider refactoring the yum retry loop into a reusable function to reduce duplication and improve readability.
Prompt for AI Agents
Please address the comments from this code review:

## Overall Comments
- Avoid calling `$?` multiple times in the same conditional—capture the exit code into a variable before testing to prevent mismatched results.
- The final check after deletion prints “Removal of ${RA} failed.” on a successful `pcs resource config` (i.e. the resource still exists), which is confusing; revise the conditional and message to clearly reflect a deletion failure.
- Consider refactoring the yum retry loop into a reusable function to reduce duplication and improve readability.

## Individual Comments

### Comment 1
<location> `scripts/rhel-ha-aws-check.sh:266-267` </location>
<code_context>
+        fi
+    done
+
+    if [ ${SUCCESS} -ne 1 ]; then
+        echo "Install of ${HAPKG} failed after multiple retries."
         exit 1
     else
         rpm -q ${HAPKG}
</code_context>

<issue_to_address>
**suggestion:** Consider logging the specific error encountered after retries fail.

Including the last error message will make it easier to diagnose why the installation failed.

```suggestion
        INSTALL_OUTPUT=$(yum -y install ${HAPKG} 2>&1)
        if [ $? -eq 0 ]; then
            SUCCESS=1
        else
            LAST_ERROR="${INSTALL_OUTPUT}"
            echo "Attempt $((RETRY_COUNT+1)) to install ${HAPKG} failed. Retrying in 5 seconds..."
            sleep 5
            RETRY_COUNT=$((RETRY_COUNT+1))
        fi
    done

    if [ ${SUCCESS} -ne 1 ]; then
        echo "Install of ${HAPKG} failed after multiple retries."
        echo "Last error encountered:"
        echo "${LAST_ERROR}"
        exit 1
    else
        rpm -q ${HAPKG}
        exit 1
    fi
```
</issue_to_address>

### Comment 2
<location> `scripts/rhel-ha-aws-check.sh:346` </location>
<code_context>
-        exit 1
+    # Check if the resource exists before attempting to delete it
+    # This prevents the script from failing if the resource is already gone.
+    pcs resource config ${RA} &> /dev/null
+    if [ $? -eq 0 ]; then
+        pcs resource delete ${RA}
+        if [ $? -ne 0 ]; then
</code_context>

<issue_to_address>
**suggestion:** Suppressing all output may hide useful error information.

Suppressing stderr may hinder troubleshooting. Consider allowing error messages to be logged or only redirecting stdout.

```suggestion
    pcs resource config ${RA} > /dev/null
```
</issue_to_address>

### Comment 3
<location> `scripts/rhel-ha-aws-check.sh:360-362` </location>
<code_context>
     fi
     if [ $? -eq 0 ]; then
-        echo "Got resource configuration for ${RA}."
+        echo "Removal of ${RA} failed."
         exit 1
     fi
 done
</code_context>

<issue_to_address>
**suggestion:** Message may be misleading if resource was never present.

Clarify the message to distinguish between a resource not found and a removal failure.

```suggestion
    if [ $? -eq 0 ]; then
        echo "Removal of ${RA} failed: resource still present."
        exit 1
    else
        echo "Resource ${RA} not found. Nothing to remove."
        exit 1
```
</issue_to_address>

Sourcery is free for open source - if you like our reviews please consider sharing them ✨
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.

Comment on lines 266 to 267
exit 1
fi
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggestion: Consider logging the specific error encountered after retries fail.

Including the last error message will make it easier to diagnose why the installation failed.

Suggested change
exit 1
fi
INSTALL_OUTPUT=$(yum -y install ${HAPKG} 2>&1)
if [ $? -eq 0 ]; then
SUCCESS=1
else
LAST_ERROR="${INSTALL_OUTPUT}"
echo "Attempt $((RETRY_COUNT+1)) to install ${HAPKG} failed. Retrying in 5 seconds..."
sleep 5
RETRY_COUNT=$((RETRY_COUNT+1))
fi
done
if [ ${SUCCESS} -ne 1 ]; then
echo "Install of ${HAPKG} failed after multiple retries."
echo "Last error encountered:"
echo "${LAST_ERROR}"
exit 1
else
rpm -q ${HAPKG}
exit 1
fi

@alexxa alexxa requested a review from F-X64 September 30, 2025 10:21
Copy link
Collaborator

@F-X64 F-X64 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are too many quick fixes in place and I'm afraid we will make the code harder to maintain long term. Added a suggestion for using functions instead (that's just a quick example).

exit 1
# Check if the resource exists before attempting to delete it
# This prevents the script from failing if the resource is already gone.
pcs resource config ${RA} &> /dev/null
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would prefer if we moved forward reusable functions and more readable error logging.
A simple encapsulation into a delete resource function would increase readability and maintainability.
It would also avoid having to add explaining comments to our code which in itself is a bad code smell to me.
As an example:

Suggested change
pcs resource config ${RA} &> /dev/null
delete_resource() {
local RA="$1"
if pcs resource show "$RA" &> /dev/null; then
if pcs resource delete "$RA"; then
echo "Resource '$RA' deleted successfully."
else
echo "failed to delete resource '$RA'."
exit 1
fi
else
echo "resource '$RA' not found. Skipping deletion."
fi
}
delete_resource "${RA}"

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants