
Conversation

Contributor

@m-abulazm m-abulazm commented Nov 13, 2025

Changes

What does this PR do?

adds an end-to-end (e2e) test for reconcile, to validate the setup for running reconcile e2e tests

Relevant implementation details

  • adds first e2e test
  • moves older reconcile specs that were ignored because they needed sandbox infra access

Caveats/things to watch out for when reviewing:

  • this does not e2e-test all sources; more tests will be added in follow-up PRs

Tests

  • added acceptance tests


codecov bot commented Nov 13, 2025

Codecov Report

❌ Patch coverage is 72.72727% with 6 lines in your changes missing coverage. Please review.
✅ Project coverage is 64.95%. Comparing base (db65960) to head (552f0ba).
⚠️ Report is 3 commits behind head on main.

Files with missing lines                            Patch %   Lines
src/databricks/labs/lakebridge/cli.py               66.66%    2 Missing and 2 partials ⚠️
src/databricks/labs/lakebridge/deployment/job.py    66.66%    2 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2145      +/-   ##
==========================================
- Coverage   65.23%   64.95%   -0.29%     
==========================================
  Files         100      101       +1     
  Lines        8504     8544      +40     
  Branches      875      879       +4     
==========================================
+ Hits         5548     5550       +2     
- Misses       2769     2802      +33     
- Partials      187      192       +5     

☔ View full report in Codecov by Sentry.

github-actions bot commented Nov 13, 2025

✅ 51/51 passed, 10 flaky, 4m10s total

Flaky tests:

  • 🤪 test_validate_mixed_checks (171ms)
  • 🤪 test_validate_non_empty_tables (7ms)
  • 🤪 test_validate_invalid_schema_path (1ms)
  • 🤪 test_validate_invalid_source_tech (177ms)
  • 🤪 test_validate_table_not_found (1ms)
  • 🤪 test_transpiles_informatica_to_sparksql_non_interactive[False] (22.744s)
  • 🤪 test_transpiles_informatica_to_sparksql (24.211s)
  • 🤪 test_transpile_teradata_sql (24.567s)
  • 🤪 test_transpiles_informatica_to_sparksql_non_interactive[True] (4.269s)
  • 🤪 test_transpile_teradata_sql_non_interactive[True] (6.486s)

Running from acceptance #3058

@m-abulazm m-abulazm changed the title from "add e2e reconcile test" to "add e2e reconcile test for databricks source" Nov 14, 2025
@m-abulazm m-abulazm marked this pull request as ready for review November 14, 2025 16:35
@m-abulazm m-abulazm requested a review from a team as a code owner November 14, 2025 16:35
recon_runner.run(operation_name=RECONCILE_OPERATION_NAME)

_, job_run_url = recon_runner.run(operation_name=RECONCILE_OPERATION_NAME)
if ctx.prompts.confirm(f"Would you like to open the job run URL `{job_run_url}` in the browser?"):
Collaborator

I see where you are going with this, but let's not do it, given our overall goal of moving away from user prompting toward the UI.

Collaborator

Check the implementation in llm_transpile

Contributor Author

Then my refactoring makes sense: the prompting is where it should be, in the CLI, and not in the runner, which should not know anything about it.

Contributor

This looks like a move: hoisting it from inside the .run() call to outside. I'm neutral on that; even though I agree with @sundarshankar89 about the original code, that's not the point of this PR.

Overall I'd prefer to just drop the prompt and leave the existing logging of the URL: every terminal allows the job to be opened by just clicking on it.

That's not the point or focus of this PR, however, which is why I'm neutral on leaving the UX as-is for now.

I assume this was moved to make some of the tests easier in some way?

Contributor Author

Prompts are part of the CLI: they wait until a user interacts. Other entry points, like the one here (an integration test), require neither user interaction nor the UI.

Removing it made the tests easier because I don't have to mock it; the real implementation suspends execution while waiting for user input. I moved it to the CLI because that is the only place it belongs.
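
For context, a minimal sketch of the split being described here, using hypothetical names rather than the real lakebridge classes:

```python
# Illustrative sketch only: hypothetical names, not the actual lakebridge code.
import webbrowser
from dataclasses import dataclass


@dataclass
class FakeWaiter:
    run_id: int


class ReconcileJobRunner:
    """Submits the reconcile job; knows nothing about user prompting."""

    def __init__(self, workspace_url: str) -> None:
        self._workspace_url = workspace_url

    def run(self, operation_name: str) -> tuple[FakeWaiter, str]:
        waiter = FakeWaiter(run_id=42)  # stand-in for the real job submission
        return waiter, f"{self._workspace_url}/jobs/runs/{waiter.run_id}"


def reconcile_command(runner: ReconcileJobRunner, prompts) -> None:
    """CLI layer: the only place that talks to the user."""
    _, job_run_url = runner.run(operation_name="reconcile")
    if prompts.confirm(f"Would you like to open the job run URL `{job_run_url}` in the browser?"):
        webbrowser.open(job_run_url)
```

With this split, an integration test can exercise the runner directly, with no prompt object in the picture.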


recon_runner.run(operation_name=AGG_RECONCILE_OPERATION_NAME)
_, job_run_url = recon_runner.run(operation_name=AGG_RECONCILE_OPERATION_NAME)
if ctx.prompts.confirm(f"Would you like to open the job run URL `{job_run_url}` in the browser?"):
Collaborator

Same as above

Contributor

@asnare asnare left a comment

I like that we've now got an integration test covering reconciliation, although normally I'd also like something to cover the CLI entry point. That's an omission from the original code; adding it would be an improvement, but its absence is not a problem with this PR.

That said, we need to change back to the existing structure with two sets of tests:

  • unit: fast tests, typically focused on a narrow area of the code, using mocks and so forth for convenience. No dependencies on testing infrastructure; anyone who clones the repo should be able to run these.
  • integration: slower tests, typically covering larger interactions between components in our code and aligned with user workflows. These typically use actual components rather than mocks or stubs. Although they may rely on testing infrastructure and external resources, the absence of that infrastructure should result in skipped tests rather than failures (see the sketch after this comment). Although these can be run directly via pytest for convenience, the reference runner is our labs test command, which is also what CI/CD uses.

I know there's an existing test-install entry-point but that's a mistake and we should be trying to eliminate it as a special category. (I understand that its presence is confusing and potentially misleading.)
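
Regarding the "skipped rather than failures" point above, a minimal sketch of how that can look at the fixture level (the environment variable name is illustrative):

```python
# Sketch: skip integration tests cleanly when the sandbox isn't configured.
import os

import pytest


@pytest.fixture
def workspace_host() -> str:
    host = os.environ.get("DATABRICKS_HOST")
    if not host:
        pytest.skip("sandbox environment not configured; skipping integration test")
    return host
```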


Comment on lines +39 to +40
test-reconcile: setup_spark_remote
hatch run test-reconcile
Contributor

Let's not do this. I'll provide more context in the review summary.

Contributor Author

I need this convenience in my day-to-day work. labs test errors out on my machine and I didn't really get how it works.

@asnare asnare added the tech debt (design flaws and other cascading effects) and internal (technical pr's not end user facing) labels Dec 1, 2025
Contributor

@asnare asnare left a comment

I've had another look at this, and beyond the changes to the testing/project setup I've highlighted some other areas that need attention.

Getting back to the broad testing/project setup, we need to consolidate on 2 suites of tests, as described in my first review: unit and integration.

It's worth mentioning that I see that integration currently only runs the reconcile tests. This is also a mistake (that I don't fully understand). The goal we should be working towards is that the integration suite runs everything under test/integration. Fragmenting this by introducing additional suites just makes maintenance more brittle over the long term because it's harder to run/test everything locally.

I understand your concerns around workflow, and we definitely need to address these. You should absolutely have access to the sandbox environment, and this not working is something we need to solve. Similarly, it's easy to run specific sets of tests:

  • The labs tool can run a single test by name. (Often a last resort; the ways below are easier.)
  • From the IDE, running a single test or all in a file/package is normally a case of right-click + "Run".
  • From the command-line: hatch run pytest […] works as well if necessary.

(If these aren't working for you that's also something we need to look at.)

Comment on lines +39 to +40
if self._is_testing():
self._test_env = TestEnvGetter()
Contributor

I think we can do better than this. It's a pity the existing code mixes test-specific logic into the production code… that's an indicator that we've got a design flaw.

Let's not make it more pervasive though. Instead I think we can pass the cluster configuration (optionally) as an argument when initialising the instance. Tests can then pass in the cluster they want used, and it can default to the reconciliation cluster. (Ideally the cluster would come from the ApplicationContext, but that's for another day.)

I think that would allow us to get rid of this optional _test_env attribute?
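
A rough sketch of the injection idea, with hypothetical class and attribute names standing in for the real ones:

```python
# Hypothetical sketch: inject the cluster configuration instead of having the
# production code consult TestEnvGetter itself.
from dataclasses import dataclass


@dataclass
class ClusterConfig:
    cluster_id: str


class JobDeployer:
    def __init__(self, cluster: ClusterConfig | None = None) -> None:
        # Defaults to the reconciliation cluster; tests inject their own.
        self._cluster = cluster or ClusterConfig(cluster_id="reconcile-cluster")

    def describe(self) -> str:
        return f"jobs will run on cluster {self._cluster.cluster_id}"


# In an integration test (illustrative):
# deployer = JobDeployer(cluster=ClusterConfig(cluster_id=test_env.get("TEST_CLUSTER_ID")))
```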

Contributor

TestEnvGetter is intended only for use during integration tests, I'd really prefer to avoid including it in our production code. (As noted: we should be able to use dependency injection from the tests rather than have production code consult this directly.)

Contributor

I'm a bit curious about why this unit test was removed? (I'm not sure I see an obvious replacement, so it's not clear to me if it was moved?)

Comment on lines +45 to +46
def test_recon_databricks(ws):
ctx = ApplicationContext(ws)
Contributor

Don't forget to type-hint, and I think we need a better name for this test: what we're testing is that the job can be submitted.

Although we wait for the outcome of the job, we don't:

  • Check whether the job succeeded or not.
  • Check the reconciliation results.
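
For illustration, a hedged sketch of the kind of assertion meant here, assuming the runner hands back the SDK's Wait[Run] (the fixture and test names are hypothetical):

```python
# Sketch only: block on the run and assert its terminal state.
from databricks.sdk.service.jobs import RunResultState


def test_recon_databricks_job_succeeds(ws, recon_runner) -> None:
    waiter, job_run_url = recon_runner.run(operation_name="reconcile")
    run = waiter.result()  # wait until the run reaches a terminal state
    assert run.state is not None and run.state.result_state == RunResultState.SUCCESS, job_run_url
    # A follow-up step could also read and assert on the reconciliation results here.
```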

ctx.replace(product_info=ProductInfo.for_testing(LakebridgeConfiguration))
ctx.installation.save(recon_config)
ctx.installation.upload(filename, TABLE_RECON_JSON.encode())
ctx.workspace_installation.install(config)
Contributor

Where do we clean up the resources that this test creates?

In general integration tests need to clean up after themselves, but I must confess the codebase doesn't really seem to be set up to make this easy to do.
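
One pattern that could help, sketched under the assumption that the installation exposes its workspace folder: a fixture whose teardown removes what the test installed (application_context is a hypothetical fixture).

```python
# Hedged sketch of fixture-based cleanup; the teardown call is an assumption,
# not a confirmed lakebridge API.
import pytest


@pytest.fixture
def installed_reconcile(ws, application_context):
    ctx = application_context  # hypothetical fixture wrapping ApplicationContext(ws)
    # ... installation.save / installation.upload / workspace_installation.install ...
    yield ctx
    # Teardown runs even if the test body fails.
    ws.workspace.delete(ctx.installation.install_folder(), recursive=True)
```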

def test_cli_reconcile(mock_workspace_client):
with patch("databricks.labs.lakebridge.reconcile.runner.ReconcileRunner.run", return_value=True):
cli.reconcile(w=mock_workspace_client)
with patch("databricks.labs.lakebridge.reconcile.runner.ReconcileRunner.run", return_value=(MagicMock(), True)):
Contributor

The mocked return value here doesn't match the type hint of that function (tuple[Wait[Run], str]).
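
A hedged sketch of a matching return value, reusing the patch target from the test above; the MagicMock stands in for the Wait[Run] and the URL is illustrative:

```python
from unittest.mock import MagicMock, patch

from databricks.labs.lakebridge import cli


def test_cli_reconcile(mock_workspace_client):
    fake_wait = MagicMock(name="Wait[Run]")  # stand-in for the SDK waiter
    fake_url = "https://example.cloud.databricks.com/jobs/runs/42"  # illustrative
    with patch(
        "databricks.labs.lakebridge.reconcile.runner.ReconcileRunner.run",
        return_value=(fake_wait, fake_url),
    ):
        cli.reconcile(w=mock_workspace_client)
```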

def test_cli_aggregates_reconcile(mock_workspace_client):
with patch("databricks.labs.lakebridge.reconcile.runner.ReconcileRunner.run", return_value=True):
cli.aggregates_reconcile(w=mock_workspace_client)
with patch("databricks.labs.lakebridge.reconcile.runner.ReconcileRunner.run", return_value=(MagicMock(), True)):
Contributor

This tuple returned by the mock doesn't match the type hint of that function (tuple[Wait[Run], str]).

Contributor

Ahh… here they are. It looks like we've accidentally moved these unit tests into the integration suite?
