add e2e reconcile test for databricks source #2145
Conversation
Codecov Report
❌ Patch coverage is […]
Additional details and impacted files:
@@            Coverage Diff             @@
##             main    #2145      +/-   ##
==========================================
- Coverage   65.23%   64.95%   -0.29%
==========================================
  Files         100      101       +1
  Lines        8504     8544      +40
  Branches      875      879       +4
==========================================
+ Hits         5548     5550       +2
- Misses       2769     2802      +33
- Partials      187      192       +5
☔ View full report in Codecov by Sentry.
✅ 51/51 passed, 10 flaky, 4m10s total. Running from acceptance #3058.
- recon_runner.run(operation_name=RECONCILE_OPERATION_NAME)
+ _, job_run_url = recon_runner.run(operation_name=RECONCILE_OPERATION_NAME)
+ if ctx.prompts.confirm(f"Would you like to open the job run URL `{job_run_url}` in the browser?"):
I see where you are going with this, but let's not do it, given our overall goal of moving away from user prompting towards the UI.
Check the implementation in `llm_transpile`.
Then my refactoring makes sense: the prompting is where it should be, in the CLI, and not in the runner, which should not know anything about it.
This looks like a move: hoisting it from inside the .run() call to outside. I'm neutral on that; even though I agree with @sundarshankar89 about the original code, that's not the point of this PR.
Overall I'd prefer to just drop the prompt and leave the existing logging of the URL: every terminal allows the job to be opened by just clicking on it.
That's not the point or focus of this PR, however, which is why I'm neutral on leaving the UX as-is for now.
I assume this was moved to make some of the tests easier in some way?
Prompts are part of the CLI: they block until a user interacts. Other entry points, like this integration test, require neither user interaction nor the UI.
Removing it made the tests easier because I don't have to mock it; the real implementation suspends execution while waiting for user input. I moved it to the CLI because that's the only place it belongs.
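A minimal sketch of the split being described — only the names visible in the diff hunks are from this PR; the stand-in types, the `reconcile` signature, and the `webbrowser` call are assumptions:

```python
import webbrowser

class ReconcileRunner:
    """Sketch: the runner submits the job and reports back; it never prompts."""

    def run(self, operation_name: str) -> tuple[object, str]:
        wait = object()  # stand-in for the SDK's Wait[Run] handle
        job_run_url = "https://example.cloud.databricks.com/jobs/1/runs/1"  # placeholder
        return wait, job_run_url

def reconcile(recon_runner: ReconcileRunner, prompts) -> None:
    # Sketch of the CLI entry point: the prompt lives here, at the user boundary.
    _, job_run_url = recon_runner.run(operation_name="reconcile")
    if prompts.confirm(f"Would you like to open the job run URL `{job_run_url}` in the browser?"):
        webbrowser.open(job_run_url)
```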
- recon_runner.run(operation_name=AGG_RECONCILE_OPERATION_NAME)
+ _, job_run_url = recon_runner.run(operation_name=AGG_RECONCILE_OPERATION_NAME)
+ if ctx.prompts.confirm(f"Would you like to open the job run URL `{job_run_url}` in the browser?"):
Same as above.
asnare left a comment:
I like that we've now got an integration test covering reconciliation, although normally I'd also like something to cover the CLI entry point. That's an omission from the original code; fixing it would be an improvement, but its absence is not a problem with this PR.
That said, we need to change back to the existing structure with two sets of tests:
- unit: fast tests, typically focused on a narrow area of the code, using mocks and the like for convenience. No dependencies on testing infrastructure: anyone who clones the repo should be able to run these.
- integration: slower tests, typically covering larger interactions between components in our code and aligned with user workflows. These typically use actual components rather than mocks or stubs. Although they may rely on testing infrastructure and external resources, the absence of that infrastructure should result in skipped tests rather than failures (see the sketch after this list). Although for convenience these can be run directly via pytest, the reference runner is our `labs test` command: this is also what CI/CD does.
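A minimal illustration of the skip-rather-than-fail convention for integration tests — the environment variable check and fixture here are assumptions for the sketch, not this repo's actual wiring:

```python
import os

import pytest

@pytest.fixture
def ws():
    # Integration convention: missing infrastructure skips the test, never fails it.
    if "DATABRICKS_HOST" not in os.environ:  # assumed marker for a configured workspace
        pytest.skip("no test workspace configured")
    from databricks.sdk import WorkspaceClient
    return WorkspaceClient()

def test_against_workspace(ws) -> None:
    assert ws.config.host  # touches the real client, so it only runs when configured
```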
I know there's an existing `test-install` entry point, but that's a mistake and we should be trying to eliminate it as a special category. (I understand that its presence is confusing and potentially misleading.)
test-reconcile: setup_spark_remote
    hatch run test-reconcile
Let's not do this. I'll provide more context in the review summary.
I need this convenience in my day-to-day work. `labs test` errors out on my machine, and I didn't really get how it works.
asnare left a comment:
I've had another look at this, and beyond the changes to the testing/project setup I've highlighted some other areas that need attention.
Getting back to the broader testing/project setup: we need to consolidate on two suites of tests, as described in my first review: unit and integration.
It's worth mentioning that I see integration currently only runs the reconcile tests. This is also a mistake (one I don't fully understand). The goal we should be working towards is that the integration suite runs everything under `test/integration`. Fragmenting this by introducing additional suites just makes maintenance more brittle over the long term, because it's harder to run/test everything locally.
I understand your concerns around workflow, and we definitely need to address these. You should absolutely have access to the sandbox environment, and this not working is something we need to solve. Similarly, it's easy to run specific sets of tests:
- The `labs` tool can run a single test by name. (Often a last resort; the ways below are easier.)
- From the IDE, running a single test, or all tests in a file/package, is normally a case of right-click + "Run".
- From the command line, `hatch run pytest […]` works as well if necessary.
(If these aren't working for you that's also something we need to look at.)
if self._is_testing():
    self._test_env = TestEnvGetter()
I think we can do better than this. It's a pity the existing code mixes test-specific logic into the production code… that's an indicator that we've got a design flaw.
Let's not make it more pervasive, though. Instead, I think we can pass the cluster configuration (optionally) as an argument when initialising the instance. Tests can then pass in the cluster they want used, and it can default to the reconciliation cluster; a sketch follows below. (Ideally the cluster would come from the ApplicationContext, but that's for another day.)
I think that would allow us to get rid of this optional `_test_env` attribute?
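A minimal sketch of that injection — `ClusterConfig`, `ReconcileDeployment`, and the default value are stand-ins, not this repo's real types:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ClusterConfig:  # stand-in for the real cluster configuration type
    cluster_id: str

# Assumed production default: the reconciliation cluster.
RECONCILE_CLUSTER = ClusterConfig(cluster_id="reconcile-cluster")

class ReconcileDeployment:  # stand-in for the class under discussion
    def __init__(self, cluster: ClusterConfig | None = None) -> None:
        # Production callers pass nothing; tests inject the cluster they provisioned.
        self._cluster = cluster if cluster is not None else RECONCILE_CLUSTER

# In a test:
deployment = ReconcileDeployment(cluster=ClusterConfig(cluster_id="test-cluster-123"))
```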
TestEnvGetter is intended only for use during integration tests; I'd really prefer to avoid including it in our production code. (As noted: we should be able to use dependency injection from the tests rather than have production code consult this directly.)
I'm a bit curious about why this unit test was removed? (I'm not sure I see an obvious replacement, so it's not clear to me if it was moved?)
def test_recon_databricks(ws):
    ctx = ApplicationContext(ws)
Don't forget to type-hint, and I think we need a better name for this test: what we're testing is that the job can be submitted.
Although we wait for the outcome of the job, we don't:
- Check whether the job succeeded or not.
- Check the reconciliation results.
A sketch covering both points follows below.
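A minimal sketch of the suggested shape, assuming the names from the diff hunks above and the Databricks SDK's `Wait`/`Run` types; the installation/setup in between is elided, so this is a fragment rather than a runnable test:

```python
from databricks.sdk import WorkspaceClient
from databricks.sdk.service.jobs import RunResultState

def test_recon_job_runs_to_success(ws: WorkspaceClient) -> None:
    ctx = ApplicationContext(ws)  # as in the diff above
    ...  # installation/setup elided
    wait, _job_run_url = recon_runner.run(operation_name=RECONCILE_OPERATION_NAME)
    run = wait.result()  # block until the run finishes
    assert run.state is not None
    assert run.state.result_state == RunResultState.SUCCESS
```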
ctx.replace(product_info=ProductInfo.for_testing(LakebridgeConfiguration))
ctx.installation.save(recon_config)
ctx.installation.upload(filename, TABLE_RECON_JSON.encode())
ctx.workspace_installation.install(config)
Where do we clean up the resources that this test creates?
In general integration tests need to clean up after themselves, but I must confess the codebase doesn't really seem to be set up to make this easy to do.
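One common pattern, sketched here with hypothetical names — the `ctx` fixture, `make_recon_config`, and the teardown call are assumptions, not this repo's API:

```python
import pytest

@pytest.fixture
def installed_recon_config(ctx):
    # Set up: install the configuration the test needs.
    recon_config = make_recon_config()  # hypothetical factory
    ctx.installation.save(recon_config)
    yield recon_config
    # Teardown runs even when the test fails, keeping the workspace clean.
    ctx.installation.remove()  # hypothetical cleanup call; adjust to the real API
```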
  def test_cli_reconcile(mock_workspace_client):
-     with patch("databricks.labs.lakebridge.reconcile.runner.ReconcileRunner.run", return_value=True):
+     with patch("databricks.labs.lakebridge.reconcile.runner.ReconcileRunner.run", return_value=(MagicMock(), True)):
          cli.reconcile(w=mock_workspace_client)
The mocked return value here doesn't match the type hint of that function (`tuple[Wait[Run], str]`).
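A sketch of a return value that would match the hint, assuming the `cli` import and `mock_workspace_client` fixture from the test above; the URL is a placeholder:

```python
from unittest.mock import MagicMock, patch

def test_cli_reconcile(mock_workspace_client) -> None:
    with patch(
        "databricks.labs.lakebridge.reconcile.runner.ReconcileRunner.run",
        # First element stands in for Wait[Run]; the second must be a str, not a bool.
        return_value=(MagicMock(), "https://example.cloud.databricks.com/jobs/1/runs/1"),
    ):
        cli.reconcile(w=mock_workspace_client)
```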
  def test_cli_aggregates_reconcile(mock_workspace_client):
-     with patch("databricks.labs.lakebridge.reconcile.runner.ReconcileRunner.run", return_value=True):
+     with patch("databricks.labs.lakebridge.reconcile.runner.ReconcileRunner.run", return_value=(MagicMock(), True)):
          cli.aggregates_reconcile(w=mock_workspace_client)
This tuple returned by the mock doesn't match the type hint of that function (`tuple[Wait[Run], str]`) either.
Ahh… here they are. It looks like we've accidentally moved these unit tests into the integration suite?
Changes
What does this PR do?
Adds an end-to-end (e2e) test for reconcile, validating that reconcile can be exercised end to end.
Relevant implementation details
Caveats/things to watch out for when reviewing:
Tests