Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Better healthcheck correctness #1056

Merged
merged 5 commits into from
Jan 25, 2022

Conversation

lamont-granquist
Copy link
Contributor

@lamont-granquist lamont-granquist commented Jan 21, 2022

closes #1051

Fixes healthchecks on AIX, Solaris and FreeBSD.

Extracts the solaris healthcheck routine out to its own method

This reworks the filtering introduced in 9551f72 to do it in pure ruby but deliberately drops the complexity of the regexps. This was responsible for breaking health checks on FreeBSD.

There are now some internal consistency checks which ensure that the output of the find command actually produces some output and that there is at least one file in the install_path, it also ensures that the "ldd" (or "otool", etc) routine finds at least one affirmatively good library. These should at least catch breakages that result in silently failing NOPs.

@lamont-granquist lamont-granquist requested a review from a team as a code owner January 21, 2022 23:50
@lamont-granquist
Copy link
Contributor Author

I'm fairly certain this will have test failures, but I'm having difficulty running them locally for some reason.

@lamont-granquist lamont-granquist force-pushed the lcg/better-healthcheck-correctness-checking branch from 0b247b8 to 09ef897 Compare January 22, 2022 01:35
@lamont-granquist
Copy link
Contributor Author

@jeremiahsnapp jeremiahsnapp added the Expeditor: Bump Version Minor Used by github.minor_bump_labels to bump the Minor version number. label Jan 24, 2022
@lamont-granquist lamont-granquist force-pushed the lcg/better-healthcheck-correctness-checking branch from d0a26be to 4fc0e40 Compare January 25, 2022 03:28
1. ensure the find command succeeds
2. ensure the find command produces output
3. ensure the find command produces output that matches the project_dir
4. ensure the ldd-like command finds at least one affirmatively good library

Signed-off-by: Lamont Granquist <[email protected]>
The linuxism of -regextype has broken healthchecks on Solaris and FreeBSD
since 2016

This applies the filters to all the distros and simplifies the code.
Support for regexps on filename suffixes has been dropped in favor of
simplicity and verbosity.

Signed-off-by: Lamont Granquist <[email protected]>
Result of a lot of hacking on CI to make it turn green

Signed-off-by: Lamont Granquist <[email protected]>
Signed-off-by: Lamont Granquist <[email protected]>
Signed-off-by: Lamont Granquist <[email protected]>
@lamont-granquist lamont-granquist force-pushed the lcg/better-healthcheck-correctness-checking branch from 4fc0e40 to 41e598a Compare January 25, 2022 18:55
@jeremiahsnapp jeremiahsnapp merged commit 22689c9 into main Jan 25, 2022
@jeremiahsnapp jeremiahsnapp deleted the lcg/better-healthcheck-correctness-checking branch January 25, 2022 18:59
# feed the list of files to the "ldd" command
#

# this command will typically fail if the last file isn't a valid lib/binary which happens often

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe I may have spotted a regression; or I'm just hitting a non-deterministic issue more frequently with these code changes

The scenario is - I've just upgraded to a newer version of Omnibus, and it looks like the ldd command here can break halfway through with this current pattern. It took a while to track down things as the status code here is getting ignored, so the error was being swallowed.

Example, reading 3 files and the 2nd ELF file causes an error:

[9] pry(#<Omnibus::HealthCheck>)> puts ldd_output = shellout(ldd_command, input: "/opt/project/file1\n/opt/project/file2\n/opt/project/file3").stdout
/opt/project/file1:
	not a dynamic executable
/opt/project/file2:
=> nil

We get a partial result in the current implementation, but if you extract the status code you can see it has failed an exitstatus of 135:

[11] pry(#<Omnibus::HealthCheck>)> puts ldd_output = shellout(ldd_command, input: "/opt/project/file1\n/opt/project/file2\n/opt/project/file3").result
NoMethodError: undefined method `result' for <Mixlib::ShellOut#1230: command: 'xargs ldd' process_status: #<Process::Status: pid 46708 exit 123> stdout: '/opt/project/file1:
	not a dynamic executable
/opt/project/file2:' stderr: 'ldd: exited with unknown exit code (135)' child_pid: 46708 environment: {} timeout: 7200 user:  group:  working_dir:  >:Mixlib::ShellOut
from (pry):11:in `block in read_shared_libs'

I actually thought there was a bug in shellout(...).stdout since it looked like partial reads of stdout.

For my current setup, this new code path skips multiple healthchecks as a result of exiting earlier than expected

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Initial pull request: #1137

adfoster-r7 added a commit to adfoster-r7/omnibus that referenced this pull request Nov 22, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Expeditor: Bump Version Minor Used by github.minor_bump_labels to bump the Minor version number.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

HealthCheck is busted on FreeBSD, Solaris and AIX
3 participants