Improve handling of output from LSP servers #2163

asnare · 2025-11-24T15:16:55Z

Changes

LSP servers use stdin/stdout for the LSP calls, and stderr as a general logging facility. Currently we mirror the stderr output as log entries, but because of how this is implemented it fails when an individual line received from stderr is longer than 64KiB.
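This is a minimal sketch of readline()-based mirroring that shows the failure. It assumes the stderr pipe is an asyncio.StreamReader, which is what you get for a subprocess started with asyncio.create_subprocess_exec and stderr=PIPE. The function and logger names are hypothetical and this is not the lakebridge code. A small `limit` stands in for the 64KiB default so the failure is easy to trigger. Once a line is longer than the limit, readline() raises ValueError and the mirroring loop stops.

```python
import asyncio
import logging

# Hypothetical logger name; stands in for wherever stderr output is mirrored.
logger = logging.getLogger("lsp_server.stderr")


async def mirror_stderr_naive(stream: asyncio.StreamReader) -> None:
    """Forward each stderr line to the log using StreamReader.readline()."""
    while True:
        line = await stream.readline()  # raises ValueError if a line exceeds the limit
        if not line:  # b"" signals EOF
            break
        logger.debug("%s", line.decode("utf-8", errors="replace").rstrip("\n"))


async def demo(limit: int, line_length: int) -> str:
    # A hand-fed StreamReader stands in for the subprocess's stderr pipe.
    reader = asyncio.StreamReader(limit=limit)
    reader.feed_data(b"short line\n" + b"x" * line_length + b"\n")
    reader.feed_eof()
    try:
        await mirror_stderr_naive(reader)
    except ValueError as e:
        return f"failed: {e}"
    return "ok"


print(asyncio.run(demo(limit=64, line_length=64)))  # fits within the limit
print(asyncio.run(demo(limit=64, line_length=65)))  # one byte too long
```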

Although #2160 contains a hotfix for this problem, this PR handles it more robustly:

Long lines from the LSP server are now detected and broken up into chunks, each logged as a separate entry. (This is preferable to a large limit because it puts an upper bound on memory usage, rather than potentially buffering forever and running out of memory.)
The cause of the error triggered by a long line has been fixed. In addition, if some other error occurs during stderr processing we now log it as a (critical!) error, so that it's at least clear something has gone wrong. (In that situation the CLI might still hang, but it will be noisy about it.)
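The chunking idea can be sketched as a pure function over incoming byte chunks. This is an illustration under assumptions, not the PR's implementation. The name `split_lines_bounded`, the `max_len` parameter and the `(fragment, complete)` return shape are all hypothetical. The key property is the memory bound: the buffer never holds more than `max_len` bytes plus one incoming chunk, however long the server's line is.

```python
from collections.abc import Iterable, Iterator


def split_lines_bounded(chunks: Iterable[bytes], max_len: int) -> Iterator[tuple[bytes, bool]]:
    """Split raw chunks into log entries of at most max_len bytes.

    Yields (fragment, complete) pairs: complete is False when a line was cut
    early because no newline arrived within max_len bytes.
    """
    buffer = bytearray()
    for chunk in chunks:
        buffer += chunk
        while True:
            newline = buffer.find(b"\n")
            if newline != -1 and newline <= max_len:
                # A whole line fits: emit it without its terminator.
                yield bytes(buffer[:newline]), True
                del buffer[: newline + 1]
            elif len(buffer) > max_len:
                # No newline within the limit: flush a bounded fragment now.
                yield bytes(buffer[:max_len]), False
                del buffer[:max_len]
            else:
                break  # wait for more data
    if buffer:
        yield bytes(buffer), True  # trailing text at EOF without a newline


for fragment, complete in split_lines_bounded([b"ab", b"c\nxyz", b"uvw\n"], max_len=4):
    print(fragment, complete)
```

Each pair would become one log entry. An incomplete fragment is followed by further entries for the rest of the line.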

Relevant implementation details

The primary change here is to handle line-breaking and decoding ourselves rather than relying on the StreamReader.readline() implementation. (The behaviour of the latter cannot be reliably controlled when a line exceeds the configured limit, which defaults to 64KiB.)
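Handling the decoding ourselves introduces one subtlety. Reading fixed-size byte chunks can split a multi-byte UTF-8 character across two reads. One way to handle this is the standard library's incremental decoder (`codecs.getincrementaldecoder`), shown in the sketch below. `read_decoded` and the chunk sizes are illustrative assumptions, and line-splitting is left out to keep the example short.

```python
import asyncio
import codecs


async def read_decoded(stream: asyncio.StreamReader, chunk_size: int = 8192) -> list[str]:
    """Read raw chunks and decode them as UTF-8, tolerating code points split across reads."""
    decoder = codecs.getincrementaldecoder("utf-8")(errors="replace")
    pieces: list[str] = []
    while True:
        data = await stream.read(chunk_size)  # up to chunk_size bytes; b"" only at EOF
        if not data:
            break
        # Trailing bytes of an incomplete multi-byte character are held back
        # by the decoder until the next chunk arrives.
        pieces.append(decoder.decode(data))
    pieces.append(decoder.decode(b"", final=True))  # flush; leftover bytes become U+FFFD
    return pieces


async def demo() -> list[str]:
    reader = asyncio.StreamReader()
    # "é" is two bytes in UTF-8; with 4-byte reads it straddles the first boundary.
    reader.feed_data("café ok\n".encode("utf-8"))
    reader.feed_eof()
    return await read_decoded(reader, chunk_size=4)


print(asyncio.run(demo()))
```

Decoding each chunk independently would turn the split "é" into a replacement character. The incremental decoder avoids that without any line-length limit being involved.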

Caveats/things to watch out for when reviewing:

Further changes are needed to the LSP server used for testing (lsp_server.py), but they are out of scope for this PR.

Linked issues

Supersedes #2160, closes #2164.

Functionality

modified existing command: databricks labs lakebridge transpile

Tests

manually tested
added unit tests


codecov · 2025-11-24T15:19:46Z

Codecov Report

❌ Patch coverage is 89.79592% with 5 lines in your changes missing coverage. Please review.
✅ Project coverage is 65.32%. Comparing base (338e93c) to head (d2e8aed).

Files with missing lines	Patch %	Lines
...ricks/labs/lakebridge/transpiler/lsp/lsp_engine.py	89.79%	2 Missing and 3 partials ⚠️
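As a cross-check on these figures: Codecov computes coverage as hits divided by all tracked lines, and counts partially covered lines as not covered. The report lists 5 uncovered lines (2 missing plus 3 partials) and a patch coverage of 89.79592%. Together these imply 49 tracked lines in the patch, 44 of them hit. The 49 and 44 are derived here and are not quoted from the report.

```latex
\text{coverage} = \frac{\text{hits}}{\text{hits} + \text{misses} + \text{partials}},
\qquad
\frac{44}{44 + 2 + 3} = \frac{44}{49} \approx 89.79592\%
```

The project totals in the coverage diff follow the same rule: 5574 / (5574 + 2770 + 189) = 5574 / 8533 ≈ 65.32%.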

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #2163      +/-   ##
==========================================
+ Coverage   65.23%   65.32%   +0.08%     
==========================================
  Files         100      100              
  Lines        8504     8533      +29     
  Branches      875      883       +8     
==========================================
+ Hits         5548     5574      +26     
- Misses       2769     2770       +1     
- Partials      187      189       +2


github-actions · 2025-11-24T15:21:56Z

✅ 51/51 passed, 11 flaky, 3m55s total

Flaky tests:

🤪 test_validate_mixed_checks (179ms)
🤪 test_validate_invalid_schema_path (1ms)
🤪 test_validate_successful_schema_check (176ms)
🤪 test_validate_non_empty_tables (11ms)
🤪 test_validate_invalid_schema_check (1ms)
🤪 test_transpiles_informatica_to_sparksql_non_interactive[False] (21.022s)
🤪 test_transpiles_informatica_to_sparksql (21.659s)
🤪 test_transpile_teradata_sql_non_interactive[True] (23.684s)
🤪 test_transpile_teradata_sql_non_interactive[False] (23.015s)
🤪 test_transpiles_informatica_to_sparksql_non_interactive[True] (4.441s)
🤪 test_transpile_teradata_sql (8.675s)

Running from acceptance #3055



asnare added 2 commits November 24, 2025 15:03

Improve handling of large lines written to stderr by LSP servers.

8cd7d38

Changes include:
- Logging the (critical!) error if something goes wrong during the handling.
- Dealing with lines longer than the chunking limit (64KiB).
- Tests to exercise this.
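This is a minimal sketch of the "log the (critical!) error" part. It assumes the stderr handler runs as its own coroutine. `pipe_stderr_guarded` and `_failing_handler` are hypothetical names. The sketch catches `Exception`, so cancellation during shutdown still propagates, and logs at CRITICAL with the traceback instead of letting the task die silently.

```python
import asyncio
import logging
from collections.abc import Awaitable, Callable

# Hypothetical logger name.
logger = logging.getLogger("lsp_server.stderr")


async def pipe_stderr_guarded(
    stream: asyncio.StreamReader,
    handler: Callable[[asyncio.StreamReader], Awaitable[None]],
) -> bool:
    """Run a stderr handler; return False (after logging critically) if it fails."""
    try:
        await handler(stream)
    except Exception:  # CancelledError derives from BaseException, so shutdown still propagates
        # Once this task stops, nothing drains stderr: the server may block on a full
        # pipe and the CLI may hang. Make that situation visible in the logs.
        logger.critical("Processing of LSP server stderr failed; further output will not be logged.", exc_info=True)
        return False
    return True


async def _failing_handler(stream: asyncio.StreamReader) -> None:
    raise RuntimeError("simulated failure while processing stderr")


async def demo() -> bool:
    return await pipe_stderr_guarded(asyncio.StreamReader(), _failing_handler)


logging.basicConfig(level=logging.INFO)
print(asyncio.run(demo()))
```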

Split pipe_stderr() to allow for easier unit-testing.

9c7cbf0
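The sketch below shows one way such a split can make unit-testing easier. It assumes the logic moves into a synchronous object and the async part becomes a thin read loop. `StderrLineSplitter` and `pump_stderr` are hypothetical names, and the memory bound is omitted for brevity. You can test the synchronous core by calling `feed()` directly, and the async shell with a hand-fed `asyncio.StreamReader`. Neither test needs an LSP server.

```python
import asyncio
from collections.abc import Callable


class StderrLineSplitter:
    """Hypothetical synchronous core: accepts raw chunks, emits decoded lines."""

    def __init__(self, emit: Callable[[str], None]) -> None:
        self._emit = emit
        self._pending = bytearray()

    def feed(self, chunk: bytes) -> None:
        self._pending += chunk
        *lines, rest = self._pending.split(b"\n")
        # Splitting on b"\n" never cuts a UTF-8 sequence, so each line decodes safely.
        for line in lines:
            self._emit(line.decode("utf-8", errors="replace"))
        self._pending = bytearray(rest)

    def close(self) -> None:
        if self._pending:
            self._emit(self._pending.decode("utf-8", errors="replace"))
            self._pending.clear()


async def pump_stderr(stream: asyncio.StreamReader, splitter: StderrLineSplitter, chunk_size: int = 8192) -> None:
    """Thin async shell: the logic worth testing lives in the splitter."""
    while chunk := await stream.read(chunk_size):
        splitter.feed(chunk)
    splitter.close()


# Synchronous test of the core, no event loop needed.
seen: list[str] = []
splitter = StderrLineSplitter(seen.append)
for chunk in (b"first\nsec", b"ond\n", b"tail"):
    splitter.feed(chunk)
splitter.close()
print(seen)


async def demo() -> list[str]:
    # Async test of the shell, driven by a hand-fed reader.
    reader = asyncio.StreamReader()
    reader.feed_data(b"a\nb\n")
    reader.feed_eof()
    out: list[str] = []
    await pump_stderr(reader, StderrLineSplitter(out.append), chunk_size=3)
    return out
```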

asnare self-assigned this Nov 24, 2025

asnare added the bug Something isn't working label Nov 24, 2025

asnare temporarily deployed to tool November 24, 2025 15:17 with GitHub Actions

asnare added 2 commits November 24, 2025 20:26

Reorder imports.

a2bce28

Unit tests for the stream-logger, and performance improvements.

025c64b

Improvements include:
- Properly enforce the upper memory limit.
- Use [..?] instead of [...] to indicate a prematurely flushed chunk, since we don't know whether another chunk will arrive.
- Zero-copy buffer management.
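The `[..?]` marker and zero-copy slicing might fit together as in the sketch below. It assumes a `bytearray` receive buffer, and `drain`, `format_fragment` and `max_len` are hypothetical names. Fragments are decoded straight from `memoryview` slices, so no intermediate `bytes` copies are made. The buffer is compacted once per call. A fragment cut mid-character decodes with a U+FFFD replacement, which is tolerable for log output.

```python
from collections.abc import Callable


def format_fragment(fragment: memoryview, complete: bool) -> str:
    """Decode straight from the view; mark fragments that were flushed early."""
    text = str(fragment, "utf-8", errors="replace")
    # "[..?]" rather than "[...]": when flushing we cannot know whether more follows.
    return text if complete else text + "[..?]"


def drain(buffer: bytearray, max_len: int, emit: Callable[[str], None]) -> None:
    """Emit lines/fragments from buffer via memoryview slices, then compact once."""
    start = 0
    with memoryview(buffer) as view:
        while True:
            # Only look max_len bytes ahead, so an emitted line never exceeds max_len.
            newline = buffer.find(b"\n", start, start + max_len + 1)
            if newline != -1:
                emit(format_fragment(view[start:newline], complete=True))
                start = newline + 1
            elif len(buffer) - start > max_len:
                emit(format_fragment(view[start:start + max_len], complete=False))
                start += max_len
            else:
                break  # the remainder may be completed by the next read
    # The temporary slices were dropped as each format_fragment call returned
    # (CPython refcounting) and the with-block released the view, so resizing is allowed.
    del buffer[:start]


out: list[str] = []
buf = bytearray(b"abc\nxyzuvw\npartial")
drain(buf, max_len=4, emit=out.append)
print(out, bytes(buf))
```

Compaction has to wait until the views are gone, because CPython refuses to resize a bytearray while a memoryview of it is alive.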

asnare temporarily deployed to tool November 24, 2025 19:28 with GitHub Actions

asnare marked this pull request as ready for review November 24, 2025 19:32

asnare requested a review from a team as a code owner November 24, 2025 19:32

asnare requested review from gueniai, jimidle, m-abulazm and sundarshankar89 November 24, 2025 19:46

asnare mentioned this pull request Nov 25, 2025

Hotfix: increase the maximum line-size that can be handled when process stderr from LSP servers #2160

Merged

asnare added 2 commits November 25, 2025 10:28

Merge branch 'main' into fix/stderr-long-lines

fcf9bee

Remove earlier workaround for large lines from stderr.

e73b9bf

A large buffer is no longer needed now that we handle such lines properly.

asnare temporarily deployed to tool November 25, 2025 09:30 with GitHub Actions

asnare added 2 commits November 25, 2025 10:47

Improve type annotations.

b43238d

Move tests into the file dedicated to tests of stderr processing.

eb84751

asnare temporarily deployed to tool November 25, 2025 10:13 with GitHub Actions

Merge branch 'main' into fix/stderr-long-lines

d2e8aed

asnare temporarily deployed to tool November 27, 2025 14:53 with GitHub Actions
