SRE-3704 ci: adapt NLT pipeline to cover Fault Injection testing procedure. #516

Open
grom72 wants to merge 15 commits into master from grom72/SRE-3704

@grom72 grom72 commented May 8, 2026

This PR introduces logic that simplifies Fault Injection testing stage setup in the Jenkinsfile.
Requires:

NLT Fault Injection testing is fully implemented using the unitTest/unitTestPost Groovy
procedures from pipeline-lib:

    steps {
        job_step_update(
            unitTest(timeout_time: 600,
                     inst_repos: daosRepos(),
                     test_script: 'ci/unit/test_nlt.sh --memcheck no' +
                                  ' --system-ram-reserved 4 --server-debug WARN' +
                                  ' --log-usage-import nltr.json' +
                                  ' --log-usage-save nltr.xml' +
                                  ' --class-name fault-injection fi',
                     unstash_opt: true,
                     unstash_tests: false,
                     inst_rpms: unitPackages(target: 'el9') + ' daos-client-tests',
                     image_version: 'el9.7'))
    }
    post {
        always {
            unitTestPost artifacts: ['nlt_logs/'],
                         testResults: 'nlt-junit.xml',
                         always_script: 'ci/unit/test_nlt_post.sh',
                         valgrind_stash: 'fault-inject-valgrind',
                         nlt_name: 'Fault injection issues'
            ...
        }
    }

The stage is given a new name (NLT Fault Injection testing) so it is not confused with
the existing Fault injection testing stage.

parseStageInfo.groovy:

  • Detect the NLT Fault Injection stage by name (case-insensitive) and set the FI flag.
  • When FI is set, skip valgrind configuration — NLT FI runs with --memcheck no and
    does not produce memcheck XML files.
  • When FI is not set (plain NLT), keep the existing valgrind_pattern/with_valgrind
    setup unchanged.
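
The flag logic described above can be sketched in plain Groovy. This is a minimal illustration, not the pipeline-lib code: `parseNltFlags` is a hypothetical helper, and the `with_valgrind` and `valgrind_pattern` values are placeholders chosen for the example.

```groovy
// Hypothetical sketch of the stage-name detection; not the actual parseStageInfo.groovy.
Map parseNltFlags(String stageName) {
    String name = stageName.toLowerCase()   // case-insensitive match
    Map info = [:]
    if (name.contains('nlt')) {
        info['NLT'] = true
        if (name.contains('fault injection')) {
            // FI runs with --memcheck no, so no valgrind settings are added.
            info['FI'] = true
        } else {
            // Plain NLT keeps its valgrind setup (placeholder values).
            info['with_valgrind'] = 'memcheck'
            info['valgrind_pattern'] = '*memcheck.xml'
        }
    }
    return info
}

assert parseNltFlags('NLT Fault Injection testing') == [NLT: true, FI: true]
assert parseNltFlags('NLT').containsKey('with_valgrind')
```

Both stage names set `NLT`, which is why a separate `FI` flag is needed to tell them apart.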

unitTest.groovy:

  • Pass environment: "VM_CPUS=20" to provisionNodes for all NLT stages; the NLT test
    suite requires at least 20 CPU cores to run reliably.
  • Move the results stash into a finally block so ignore_failure is always written to
    the stash even if afterTest() throws. Without this, unitTestPost() reads the earlier
    stash from runTest() which lacks the ignore_failure key, causing
    allowEmptyArchive: null and a NullPointerException in ArtifactArchiver.
  • Remove config['NLT'] from the Valgrind check condition — NLT FI does not run with
    memcheck, so the Valgrind path should only be taken when with_valgrind is set.
  • Guard Valgrind tarball creation with a fileExists() check to avoid a shell error when
    the copy step produced no output directory.
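
The effect of moving the stash into a finally block can be shown with a small stand-alone sketch. Everything here is an assumption made for illustration: a plain `Map` stands in for the Jenkins stash, `runStage` is a hypothetical helper, and `result_code` is an invented key.

```groovy
// Hypothetical sketch; a Map stands in for the Jenkins stash.
Map runStage(Map stashStore, Closure afterTest) {
    Map results = [result_code: 0]
    // Early stash, as written by runTest(): no ignore_failure key yet.
    stashStore['results_map'] = new LinkedHashMap(results)
    results['ignore_failure'] = false
    try {
        afterTest()
    } finally {
        // Always overwrite the stash with the complete map, even on failure.
        stashStore['results_map'] = new LinkedHashMap(results)
    }
    return results
}

Map store = [:]
try {
    runStage(store) { throw new IllegalStateException('afterTest failed') }
} catch (IllegalStateException ignored) {
    // The stage still fails, but the stash is complete.
}
assert store['results_map'].containsKey('ignore_failure')
```

The exception still propagates to the caller; the finally block only guarantees that the post step reads a map containing `ignore_failure`.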

unitTestPost.groovy:

  • Replace all direct results['ignore_failure'] map accesses with
    .get('ignore_failure', false) to prevent a NullPointerException when the key is
    absent from the stash.
  • Replace the single tool: with a tools: list (nltTools) so the Fault Injection
    stage can report both nlt-errors.json and nlt-client-leaks.json as separate issue
    sources under recordIssues.
  • Set recordIssues name: dynamically to 'Fault injection' or 'NLT' based on the
    FI flag, replacing the hard-coded 'Node local testing' label.
  • Archive the memcheck tarball directly via archiveArtifacts rather than appending it
    to the deferred artifact list, consistent with how other binary artifacts are handled.
  • Remove the NLT condition from the with_valgrind block — same rationale as unitTest.
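
The tool list and display name selection described above might look roughly like this. It is only a sketch: plain maps stand in for the real recordIssues tool objects, and `nltToolPatterns` and `issuesName` are hypothetical names.

```groovy
// Hypothetical sketch; maps stand in for the recordIssues tool definitions.
List<Map> nltToolPatterns(boolean fi) {
    List<Map> tools = [[pattern: 'nlt-errors.json']]
    if (fi) {
        // Fault Injection additionally reports client leaks as a separate source.
        tools << [pattern: 'nlt-client-leaks.json']
    }
    return tools
}

String issuesName(boolean fi) {
    return fi ? 'Fault injection' : 'NLT'
}

assert nltToolPatterns(true)*.pattern == ['nlt-errors.json', 'nlt-client-leaks.json']
assert issuesName(false) == 'NLT'
```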

skipStage.groovy:

  • Add 'NLT Fault injection testing' as a recognized case so the new stage name is
    handled identically to 'Fault injection testing' when evaluating skip conditions.

grom72 added 12 commits May 8, 2026 13:32
Fix NullPointerException when allowEmptyArchive receives null
from missing ignore_failure key

When unitTest() throws inside afterTest(), the results_map stash written
by runTest() (which lacks the ignore_failure key) is the only stash
available to unitTestPost(). Reading results['ignore_failure'] on a Map
with no such key returns null in Groovy. Passing null to
archiveArtifacts(allowEmptyArchive: null) causes a NullPointerException
in Java reflection (unboxBoolean) because ArtifactArchiver.allowEmptyArchive
is a primitive boolean.
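
The null lookup can be reproduced in plain Groovy without Jenkins. In this minimal sketch `ignoreFailure` is a hypothetical helper and `result_code` is an invented key.

```groovy
// Hypothetical helper showing the defensive lookup.
boolean ignoreFailure(Map results) {
    return results.get('ignore_failure', false)
}

Map incomplete = [result_code: 0]              // stash written before ignore_failure was set
assert incomplete['ignore_failure'] == null    // missing key yields null, not false
assert ignoreFailure(incomplete) == false
// Groovy's two-argument get() also stores the default in the map.
assert incomplete.containsKey('ignore_failure')
```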

Root cause fix (unitTest.groovy):
- Wrap afterTest() in a try/finally block so that the updated stash
  containing ignore_failure is always written, even if afterTest() throws.

Defensive fix (unitTestPost.groovy):
- Replace all results['ignore_failure'] accesses with
  results.get('ignore_failure', false) to guard against any future
  code path where the stash is incomplete.

Both 'NLT' and 'NLT Fault injection testing' stages have their
stage_info['NLT'] set to true by parseStageInfo() because both stage
names contain the string 'NLT'. As a result, unitTestPost() calls
recordIssues with the hardcoded id: 'VM_test' for both stages in the
same build, causing:

  IllegalStateException: ID VM_test is already used by another action

Fix by replacing the hardcoded 'VM_test' with
sanitizedStageName() + '_VM_test', which produces a unique ID per
stage (e.g. 'NLT_VM_test' and 'NLT_Fault_injection_testing_VM_test').

When a stage runs with memcheck disabled (e.g. NLT Fault injection
testing uses '--memcheck no'), no *memcheck.xml files are created.
The fileOperations copy step copies 0 files so the target directory
is never created, and the unconditional tar command fails with:

  tar: <dir>: Cannot stat: No such file or directory

Guard the tar with fileExists() so it is only executed when the
memcheck directory was actually populated.
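
A stand-alone sketch of the guard, under stated assumptions: `File.isDirectory()` stands in for the Jenkins fileExists() step, `memcheckTarCommand` is a hypothetical helper, and the tar flags are assumed from the .tar.bz2 suffix.

```groovy
import java.nio.file.Files

// Hypothetical helper: returns the tar command to run, or null when the
// memcheck directory was never created.
String memcheckTarCommand(String memcheckDir, String tarball) {
    if (!new File(memcheckDir).isDirectory()) {
        return null   // nothing was copied, so skip tar entirely
    }
    return "tar -cjf ${tarball} ${memcheckDir}"
}

File tmpDir = Files.createTempDirectory('memcheck').toFile()
assert memcheckTarCommand(tmpDir.path, 'results.tar.bz2') != null
assert memcheckTarCommand(tmpDir.path + '-missing', 'results.tar.bz2') == null
tmpDir.delete()
```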

Signed-off-by: Tomasz Gromadzki <tomasz.gromadzki@hpe.com>
Priority: 2
Cancel-prev-build: false
Skip-python-bandit: true
Skip-unit-test-memcheck: true
Skip-func-vm-all: true
Skip-test-el-9-rpms: true
Skip-test-leap-15-rpms: true
Skip-func-hw-test: true
Skip-build-el8-gcc: true
Skip-build-leap15-gcc: true

Add a new config['nlt_name'] parameter to unitTestPost() to allow
callers to override the display name used for the NLT recordIssues
section in the Jenkins UI. Defaults to 'Node local testing' to keep
existing behaviour for the NLT stage.

The NLT Fault injection testing stage passes nlt_name: 'Fault injection
issues' so its warnings section is clearly distinguished from the
plain NLT stage in the Jenkins UI.

The memcheck tarball (${stage}_memcheck_results.tar.bz2) is only created
by unitTest.groovy when memcheck files actually exist (guarded by
fileExists(memcheck_dir)). However, unitTestPost.groovy added the tarball
to artifact_list unconditionally, and artifact_list is archived with
allowEmptyArchive tied to ignore_failure. For the NLT Fault injection
testing stage ignore_failure=false, so archiveArtifacts throws:

  No artifacts found that match the file pattern
  "NLT_Fault_injection_testing_memcheck_results.tar.bz2".
  Configuration error?

Archive the memcheck tarball directly with allowEmptyArchive: true instead
of adding it to artifact_list, so it is silently skipped when no memcheck
files were produced.

parseStageInfo: detect 'NLT fault injection' stage separately and set
FI=true in addition to NLT=true, leaving the regular NLT path unchanged
(with valgrind enabled).

unitTest/unitTestPost: remove NLT from the valgrind check condition since
fault injection runs with --memcheck no and produces no memcheck files.

unitTestPost: when FI=true, add nlt-client-leaks.json as a second tool to
the recordIssues call alongside the existing vm_test/nlt-errors.json.

skipStage: add 'NLT Fault injection testing' case to match the stage name
used in the Jenkinsfile.

Signed-off-by: Tomasz Gromadzki <tomasz.gromadzki@hpe.com>
@grom72 grom72 requested a review from janekmi May 8, 2026 11:35
@grom72 grom72 marked this pull request as ready for review May 8, 2026 16:21
@grom72 grom72 requested review from JohnMalmberg, daltonbohning, phender and ryon-jensen and removed request for daltonbohning and phender May 8, 2026 16:22
Doc-only: true

Signed-off-by: Tomasz Gromadzki <tomasz.gromadzki@hpe.com>
@grom72 grom72 force-pushed the grom72/SRE-3704 branch from 3b4a3ee to 6fe3ebe Compare May 11, 2026 06:39
grom72 added 2 commits May 11, 2026 08:58
Doc-only: true

Signed-off-by: Tomasz Gromadzki <tomasz.gromadzki@hpe.com>
Signed-off-by: Tomasz Gromadzki <tomasz.gromadzki@hpe.com>

@daltonbohning daltonbohning left a comment

It appears this goes along with daos-stack/daos#17953.

So we are effectively going to have different handling for this stage based on the release branch? Since the stage name is different, pipeline-lib will treat it differently.
