Add E2E test workflow for kind cluster #50
Closed
Conversation
- Add user workflow test simulating real application usage
- Deploy full RAG stack in kind for CI testing
- Optimize Helm values for a CPU-only environment
- Run on PRs, pushes, and manual dispatch
yashoza19 approved these changes on Oct 13, 2025
sauagarwa approved these changes on Oct 13, 2025
yashoza19 left a comment:
/lgtm
Force-pushed from b46db26 to ca8f3f7
- Install OpenShift Route CRD in Kind cluster for compatibility
- Update workflow to support OpenShift-specific resources
- Add fallback CRD definition if upstream Route CRD unavailable
- Update documentation to reflect MicroShift compatibility testing
- Ensure helm install works with OpenShift Route resources

This enables testing the RAG application in an environment that mirrors MicroShift/OpenShift deployments while using Kind for CI.
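A workflow step in this spirit would implement the fallback: try the upstream Route CRD first, then apply a bundled copy. The URL variable and the local path below are placeholders, not necessarily the PR's actual ones.

```yaml
- name: Install OpenShift Route CRD (with fallback)
  run: |
    # ROUTE_CRD_URL is a placeholder for the upstream Route CRD manifest location.
    if ! kubectl apply -f "${ROUTE_CRD_URL}"; then
      echo "Upstream Route CRD unavailable; applying bundled fallback definition"
      kubectl apply -f e2e/crds/route-crd.yaml
    fi
```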
The kind-action was failing because the inline config YAML wasn't being parsed correctly. Creating the config file explicitly before passing it to kind-action resolves the issue.
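A sketch of the fix: write the config to disk first, then point the action at the file. The file name, node layout, and action version tag are illustrative.

```yaml
- name: Write kind config
  run: |
    cat > kind-config.yaml <<'EOF'
    kind: Cluster
    apiVersion: kind.x-k8s.io/v1alpha4
    nodes:
      - role: control-plane
    EOF

- name: Create kind cluster
  uses: helm/kind-action@v1
  with:
    config: kind-config.yaml
```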
This step is required to fetch chart dependencies (pgvector, minio, llm-service, configure-pipeline, ingestion-pipeline, llama-stack) before helm install. Without it, the installation fails with a missing-dependencies error.
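For example, a step roughly like this (the chart path is an assumption):

```yaml
- name: Build Helm chart dependencies
  run: |
    # Fetches pgvector, minio, llm-service, configure-pipeline,
    # ingestion-pipeline, and llama-stack into charts/ before install.
    helm dependency build ./deploy/helm/rag
```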
Disable the llm-service and configure-pipeline components that require:
- InferenceService (serving.kserve.io/v1beta1)
- ServingRuntime (serving.kserve.io/v1alpha1)
- DataSciencePipelinesApplication (datasciencepipelinesapplications.opendatahub.io/v1)
- Notebook (kubeflow.org/v1)

These CRDs are not available in Kind clusters. The llama-stack component provides the inference capabilities we need for basic e2e testing without requiring KServe.
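In values form this amounts to roughly the following; the subchart keys are inferred from the component names and may not match the chart exactly.

```yaml
# e2e values override (sketch)
llm-service:
  enabled: false        # needs KServe InferenceService/ServingRuntime CRDs
configure-pipeline:
  enabled: false        # needs DSPA and Notebook CRDs
llama-stack:
  enabled: true         # supplies inference for basic e2e tests
```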
Install minimal CRD definitions to satisfy Helm chart validation even though the actual components (llm-service, configure-pipeline, ingestion-pipeline) are disabled in e2e tests.

CRDs installed:
- routes.route.openshift.io (OpenShift)
- inferenceservices.serving.kserve.io (KServe)
- servingruntimes.serving.kserve.io (KServe)
- datasciencepipelinesapplications.datasciencepipelinesapplications.opendatahub.io (OpenDataHub)
- notebooks.kubeflow.org (Kubeflow)

This approach allows Kind-based e2e tests to work with Helm charts that reference these CRDs without requiring a full MicroShift/OpenShift setup.
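A minimal stub for one of these CRDs might look like the following; the schema is left open so the chart's resources pass validation without any controller behind them (routes.route.openshift.io shown as an example).

```yaml
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: routes.route.openshift.io
spec:
  group: route.openshift.io
  scope: Namespaced
  names:
    kind: Route
    plural: routes
    singular: route
  versions:
    - name: v1
      served: true
      storage: true
      schema:
        openAPIV3Schema:
          type: object
          # Accept any fields so chart-rendered Routes validate without a controller.
          x-kubernetes-preserve-unknown-fields: true
```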
Even with enabled: false, the configure-pipeline subchart was trying to create a PVC. Explicitly disable persistence and PVC creation to prevent the PersistentVolumeClaim pipeline-vol from blocking deployment.
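In values terms, something along these lines; the persistence keys are assumed from the commit message, not verified against the chart.

```yaml
configure-pipeline:
  enabled: false
  persistence:
    enabled: false   # prevent the pipeline-vol PVC from being templated
```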
Disabled subcharts (configure-pipeline, llm-service, ingestion-pipeline) still create resources, including PVCs that may never bind. Remove --wait from helm install and instead explicitly wait only for the core deployments we need (rag UI and llamastack). This prevents the 20-minute timeout waiting for unused resources.
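Roughly, the install and wait steps become something like this; the release name, namespace, values file, and deployment names are assumptions based on the components mentioned.

```yaml
- name: Install chart (no --wait)
  run: |
    helm install rag ./deploy/helm/rag -n rag --create-namespace -f e2e-values.yaml

- name: Wait only for the core deployments
  run: |
    # Wait for the UI and llama-stack rather than every templated resource.
    kubectl rollout status deployment/rag -n rag --timeout=10m
    kubectl rollout status deployment/llamastack -n rag --timeout=10m
```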
Added detailed logging throughout the wait process:
- List all resources before waiting
- Show deployment and pod status
- Describe deployments to see configuration
- Show events to catch scheduling/image pull issues
- Add failure handlers with detailed diagnostics
- Show logs on failure
- Exit with error on timeout for faster feedback

This will help identify why deployments get stuck (image pull, resource constraints, scheduling issues, etc.).
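A diagnostics step in this spirit (namespace and label selector are placeholders):

```yaml
- name: Diagnostics on failure
  if: failure()
  run: |
    kubectl get all -n rag
    kubectl describe deployments -n rag
    kubectl get events -n rag --sort-by=.lastTimestamp | tail -n 50
    kubectl logs -n rag -l app.kubernetes.io/instance=rag --all-containers --tail=100 || true
```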
Disabled in e2e tests:
- minio.sampleFileUpload: Job was failing with ImagePullBackOff
- mcp-servers: Not needed for basic e2e tests
- ingestion-pipeline: Add top-level enabled: false

These components were creating pods with image pull issues that blocked deployment. We only need the core stack (rag UI + llamastack + pgvector + minio) for basic e2e testing.
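In values form, roughly (keys follow the names used in the commit message; exact nesting may differ):

```yaml
minio:
  sampleFileUpload:
    enabled: false    # the upload Job hit ImagePullBackOff
mcp-servers:
  enabled: false      # not needed for basic e2e tests
ingestion-pipeline:
  enabled: false
```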
The llamastack init container was waiting for a model service endpoint created by llm-service (which we disabled). For basic e2e tests:
- Removed global.models configuration
- Disabled llamastack init containers
- Focus on testing UI/backend connectivity without full model inference

This allows the e2e tests to validate the application stack without requiring KServe/llm-service infrastructure.
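Roughly, in the e2e values; the llama-stack initContainers knob is an assumed name, not verified against the chart.

```yaml
global:
  models: {}            # no model services configured for basic e2e
llama-stack:
  initContainers: []    # assumed knob: skip waiting for the llm-service endpoint
```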
Modified test_user_workflow.py to focus on connectivity and health checks:
- Skip model inference tests when SKIP_MODEL_TESTS=true (default)
- Test UI accessibility
- Test backend connectivity
- Test API endpoint availability
- Test health endpoints

This allows e2e tests to validate application deployment without requiring full model serving infrastructure, significantly reducing resource requirements and startup time.
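On the workflow side, the tests might be invoked like this; the env var name and default come from the commit messages, while the paths and service URLs are placeholders.

```yaml
- name: Run e2e user workflow tests
  env:
    SKIP_MODEL_TESTS: "true"                 # connectivity/health checks only
    RAG_UI_URL: http://localhost:8080        # placeholder port-forwarded endpoints
    LLAMA_STACK_URL: http://localhost:8321
  run: |
    pip install -r tests/e2e/requirements.txt   # assumed requirements location
    pytest tests/e2e/test_user_workflow.py -v
```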
- Fixed NameError by removing the INFERENCE_MODEL print statement
- Set ingestion-pipeline replicaCount: 0 to prevent pod creation
- Restored INFERENCE_MODEL variable from environment
- Added intelligent model detection (SKIP_MODEL_TESTS=auto by default)
- Tests will automatically skip inference if no models are configured
- Tests will run inference if models are available (future-proof)
- Gracefully handles both scenarios without errors
The Llama Stack API returns 404 on the root endpoint (/), which is valid behavior for API-only services. Allow both 200 and 404 status codes to pass the connectivity test.
sauagarwa approved these changes on Oct 24, 2025
Author
Closing in favour of #65.
Overview
Consolidates and fixes the e2e tests so they run successfully in a Kind-based GitHub Actions workflow with OpenShift/MicroShift compatibility.
Key Changes
Tests & Workflows
CRDs & Dependencies
Configuration
What Gets Tested
✅ RAG UI accessibility
✅ Llama Stack connectivity
✅ API endpoints & health checks
⏭️ Model inference (auto-skipped if no models configured)
Results
Note
This is a lightweight deployment validation test. For full functionality testing with models, enable llm-service and set SKIP_MODEL_TESTS=false.
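For instance, a full-functionality run might look roughly like this; the release name, chart path, and value key are assumptions.

```yaml
- name: Run full e2e tests with model inference
  env:
    SKIP_MODEL_TESTS: "false"
  run: |
    # Re-enable the KServe-backed model serving component before testing.
    helm upgrade rag ./deploy/helm/rag -n rag --reuse-values --set llm-service.enabled=true
    pytest tests/e2e/test_user_workflow.py -v
```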