Skip to content

Fix SQL Server 2017 CI #2599

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
Apr 23, 2025
Merged

Fix SQL Server 2017 CI #2599

merged 3 commits into from
Apr 23, 2025

Conversation

Torch3333
Copy link
Contributor

@Torch3333 Torch3333 commented Apr 17, 2025

Description

The Ubuntu 20.04 Github action runner was used to run SQL Server 2017 CI job, but this image is no longer available.
Also, the docker container for SQL Server 2017 is incompatible with other Ubuntu action runners (versions 22.04 and 24.04), so we switched to using a Windows Server runner instead.

Related issues and/or PRs

N/A

Changes made

Update the CI and make some changes to some integration tests. See the review comments for more details.

Checklist

The following is a best-effort checklist. If any items in this checklist are not applicable to this PR or are dependent on other, unmerged PRs, please still mark the checkboxes after you have read and understood each item.

  • I have commented my code, particularly in hard-to-understand areas.
  • I have updated the documentation to reflect the changes.
  • I have considered whether similar issues could occur in other products, components, or modules if this PR is for bug fixes.
  • Any remaining open issues linked to this PR are documented and up-to-date (Jira, GitHub, etc.).
  • Tests (unit, integration, etc.) have been added for the changes.
  • My changes generate no new warnings.
  • Any dependent changes in other PRs have been merged and published.

Additional notes (optional)

N/A

Release notes

N/A

@Torch3333 Torch3333 self-assigned this Apr 17, 2025
@Torch3333 Torch3333 force-pushed the fix_sqlserver_2017_CI branch 6 times, most recently from a96c545 to a6a8788 Compare April 22, 2025 07:53
@Torch3333 Torch3333 force-pushed the fix_sqlserver_2017_CI branch from a6a8788 to cb170fd Compare April 22, 2025 07:54
@Torch3333 Torch3333 changed the title Fix SQLServer 2017 CI Fix SQL Server 2017 CI Apr 22, 2025
@Torch3333 Torch3333 marked this pull request as ready for review April 22, 2025 09:17
Copy link

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR fixes the SQL Server 2017 CI by switching the runner from Ubuntu to Windows and adjusting integration tests to account for a different timing behavior in transaction recovery. Key changes include:

  • Updating the prepared timestamp in integration tests to subtract an additional millisecond.
  • Transitioning the CI workflow from Ubuntu 20.04 to Windows by updating the runner, service configuration, and related environment variables.
  • Adapting the Oracle JDK setup steps for the Windows environment and updating container registry login details.

Reviewed Changes

Copilot reviewed 4 out of 5 changed files in this pull request and generated no comments.

File Description
integration-test/src/main/java/com/scalar/db/transaction/consensuscommit/TwoPhaseConsensusCommitSpecificIntegrationTestBase.java Adjusts the prepared timestamp for the PREPARED and DELETED tests by subtracting an extra 1ms.
integration-test/src/main/java/com/scalar/db/transaction/consensuscommit/ConsensusCommitSpecificIntegrationTestBase.java Similar adjustment to the prepared timestamp in PREPARED and DELETED tests.
integration-test/src/main/java/com/scalar/db/transaction/consensuscommit/ConsensusCommitNullMetadataIntegrationTestBase.java Updates the prepared timestamp in tests that handle null metadata.
.github/workflows/ci.yaml Switches from an Ubuntu runner (with Docker-based SQL Server 2017) to a Windows runner and updates various setup steps to suit the new environment.
Files not reviewed (1)
  • ci/no-superuser/create-no-superuser-sqlserver.sh: Language not supported
Comments suppressed due to low confidence (4)

integration-test/src/main/java/com/scalar/db/transaction/consensuscommit/TwoPhaseConsensusCommitSpecificIntegrationTestBase.java:421

  • Consider extracting the hard-coded '- 1' offset into a clearly named constant (e.g., EXTRA_OFFSET_MS) to document its purpose and improve maintainability in these test cases.
long prepared_at = System.currentTimeMillis() - RecoveryHandler.TRANSACTION_LIFETIME_MILLIS - 1;

integration-test/src/main/java/com/scalar/db/transaction/consensuscommit/ConsensusCommitSpecificIntegrationTestBase.java:515

  • Consider extracting the extra '- 1' into a named constant to clarify why an additional millisecond is subtracted in these tests.
long prepared_at = System.currentTimeMillis() - RecoveryHandler.TRANSACTION_LIFETIME_MILLIS - 1;

integration-test/src/main/java/com/scalar/db/transaction/consensuscommit/ConsensusCommitNullMetadataIntegrationTestBase.java:641

  • Consider extracting the '- 1' offset into a constant to document its intent and ensure consistency across integration tests.
long prepared_at = System.currentTimeMillis() - RecoveryHandler.TRANSACTION_LIFETIME_MILLIS - 1;

.github/workflows/ci.yaml:1174

  • It might be beneficial to add error handling or logging after the Oracle JDK installation step to capture potential installation failures in the CI run.
Write-Host "Install Oracle JDK"

ports:
- 1433:1433
options: --name sqlserver17
runs-on: windows-latest
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Switch to a windows runner for SQL Server 2017

Comment on lines +1140 to +1145
- name: Setup Sql Server 2017
uses: rails-sqlserver/setup-mssql@v1
with:
components: sqlcmd,sqlengine
sa-password: "SqlServer17"
version: 2017
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SQL Server 2017 is now installed locally instead of using a Docker container on WSL 2, the latter performing worse, as the CI lasted twice as long compared to a local SQL Server installation.
The CI takes about 30 minutes now, compared to less than 15 minutes on Ubuntu.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

30 mins for SQL Server 2017?

Anyway, we need to think about running only one version (probably the latest version) for each database in CI of a PR and running with other versions regularly, like weekly.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the wrong reporting, the SQL Server 2017 took 30 minutes with some modifications I added for debugging.
In reality, it takes 18 minutes for the regular commit job and 20 minutes for the group commit job to complete. Which is a couple of minutes longer compared to the previous duration observed on the Ubuntu runner.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Anyway, we need to think about running only one version (probably the latest version) for each database in CI of a PR and running with other versions regularly, like weekly.

I agree.

Comment on lines -1160 to +1182
- name: Login to Oracle container registry
- name: Login to GitHub Container Registry
uses: docker/login-action@v3
if: ${{ env.INT_TEST_JAVA_RUNTIME_VENDOR == 'oracle' }}
if: ${{ env.SET_UP_INT_TEST_RUNTIME_ORACLE_JDK_8_OR_11 == 'true'}}
with:
registry: container-registry.oracle.com
username: ${{ secrets.OCR_USERNAME }}
password: ${{ secrets.OCR_TOKEN }}
registry: ghcr.io
username: ${{ github.repository_owner }}
password: ${{ secrets.CR_PAT }}

- name: Set up JDK ${{ env.INT_TEST_JAVA_RUNTIME_VERSION }} (oracle) to run the integration test
if: ${{ env.INT_TEST_JAVA_RUNTIME_VENDOR == 'oracle' }}
if: ${{ env.SET_UP_INT_TEST_RUNTIME_ORACLE_JDK_8_OR_11 == 'true'}}
run: |
container_id=$(docker create "container-registry.oracle.com/java/jdk:${{ env.INT_TEST_JAVA_RUNTIME_VERSION }}")
docker cp -L "$container_id:/usr/java/default" /usr/lib/jvm/oracle-jdk && docker rm "$container_id"
$container_id=$(docker create "ghcr.io/scalar-labs/oracle/jdk:${{ env.INT_TEST_JAVA_RUNTIME_VERSION }}-windows")
docker cp "${container_id}:oracle-jdk.exe" .
docker rm "$container_id"
Write-Host "Install Oracle JDK"
Start-Process "oracle-jdk.exe" -NoNewWindow -Wait -ArgumentList "/s"
Write-Host "Oracle JDK installation successful"
if ( ${env:INT_TEST_JAVA_RUNTIME_VERSION} -eq '8' ) {
$jdk_root_dir = "jdk-1.8"
} else {
$jdk_root_dir = "jdk-11"
}
echo "JAVA_HOME=C:\Program Files\Java\${jdk_root_dir}" >> ${env:GITHUB_ENV}
Copy link
Contributor Author

@Torch3333 Torch3333 Apr 22, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I copied the JDK setup from the Cosmos DB job, which runs on Windows too.

Comment on lines +5 to +9
SQL_SERVER_PASSWORD=$1
MAX_RETRY_COUNT=$2
RETRY_INTERVAL=$3
# If set, use `sqlcmd` of the SQL Server docker container. If unset, use `sqlcmd` installed on the host.
SQL_SERVER_CONTAINER_NAME=${4:-''}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed the order of the script input parameters to set the $SQL_SERVER_CONTAINER_NAME last, since it is now optional.

@@ -638,7 +638,7 @@ public void scan_ScanGivenForPreparedWhenCoordinatorStateAborted_ShouldRollback(
selection_SelectionGivenForPreparedWhenCoordinatorStateNotExistAndExpired_ShouldAbortTransaction(
Selection s) throws ExecutionException, CoordinatorException, TransactionException {
// Arrange
long prepared_at = System.currentTimeMillis() - RecoveryHandler.TRANSACTION_LIFETIME_MILLIS;
long prepared_at = System.currentTimeMillis() - RecoveryHandler.TRANSACTION_LIFETIME_MILLIS - 1;
Copy link
Contributor Author

@Torch3333 Torch3333 Apr 22, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Windows runner, failures occurred for integration tests that verify the behavior of reading a prepared record without a corresponding transaction status in the coordinator table, which should roll back the transaction if it is expired.

Without the fix above, the transaction was considered not to have expired in some cases by the RecoveryHandler triggered after the prepared record was read.

private void abortIfExpired(Selection selection, TransactionResult result) {
long current = System.currentTimeMillis();
if (current <= result.getPreparedAt() + TRANSACTION_LIFETIME_MILLIS) {
return;
}
try {
coordinator.putStateForLazyRecoveryRollback(result.getId());
rollbackRecord(selection, result);
} catch (CoordinatorException e) {
logger.warn("Coordinator tries to abort {}, but failed", result.getId(), e);
}
}

This happened because System.currentTimeInMillis() sometimes returns the same timestamp as the previous call on this line, which set the prepared_at value for the record.

long prepared_at = System.currentTimeMillis() - RecoveryHandler.TRANSACTION_LIFETIME_MILLIS - 1;

So, in RecoveryHandler.abortIfExpired(), the if (current <= result.getPreparedAt() + TRANSACTION_LIFETIME_MILLIS) { condition was true and the transaction was considered as not expired, so not a target for abort.

The same timestamp value is returned because of a relative lack of system clock precision that we don't observe on Ubuntu. Increasing the system clock precision may be possible( see this documentation), but it is not trivial, so we updated the integration test code instead.

runs-on: windows-latest
env:
# This variable evaluates to: if {!(Temurin JDK 8) && !(Oracle JDK 8 or 11)} then {true} else {false}
SET_UP_INT_TEST_RUNTIME_JDK_WHEN_NOT_ORACLE_8_OR_11: "${{ (github.event_name == 'workflow_dispatch' && !(inputs.INT_TEST_JAVA_RUNTIME_VERSION == '8' && inputs.INT_TEST_JAVA_RUNTIME_VENDOR == 'temurin') && !(inputs.INT_TEST_JAVA_RUNTIME_VENDOR == 'oracle' && (inputs.INT_TEST_JAVA_RUNTIME_VERSION == '8' || inputs.INT_TEST_JAVA_RUNTIME_VERSION == '11'))) && 'true' || 'false' }}"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[very trivial] The order of the first condition is VERSION and then VENDOR, but the order of the second condition is VENDER and then VERSION. It might be better if they're consistent.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Torch3333 Could you add a comment to explain why this condition is needed instead of SET_UP_INT_TEST_RUNTIME_NON_ORACLE_JDK ?

Copy link
Contributor Author

@Torch3333 Torch3333 Apr 23, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I fixed both, thank you.
407c806

Please refer to this PR #1978 description for more details about the special handling for Oracle JDK 8 and 11 on Windows.

Triggering the CI with Oracle JDK 8 or 11 to run the integration test currently fails because these versions are unavailable with the Setup Java Github action and more generally through public direct download. To retrieve these JDK versions, you must create a free Oracle account and accept the end-user license agreement.

For these reasons, we need to store these versions internally so that our CI can install them when required.
To store the JDK installer, we decided to package it as Docker image and publish it on the Scalar Github registry. The advantage is that it is cost-effective and we don't need to manage additional credentials compared to a storage option on AWS or Azure.

Copy link
Collaborator

@brfrn169 brfrn169 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Thank you!

Copy link
Contributor

@feeblefakie feeblefakie left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Thank you!

@Torch3333 Torch3333 force-pushed the fix_sqlserver_2017_CI branch from 45ac60d to 407c806 Compare April 23, 2025 00:25
@Torch3333 Torch3333 requested a review from komamitsu April 23, 2025 00:28
Copy link
Contributor

@komamitsu komamitsu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thank you!!

@brfrn169 brfrn169 merged commit 41a86d7 into master Apr 23, 2025
95 of 96 checks passed
@brfrn169 brfrn169 deleted the fix_sqlserver_2017_CI branch April 23, 2025 01:32
feeblefakie pushed a commit that referenced this pull request Apr 23, 2025
feeblefakie pushed a commit that referenced this pull request Apr 23, 2025
feeblefakie pushed a commit that referenced this pull request Apr 23, 2025
feeblefakie pushed a commit that referenced this pull request Apr 23, 2025
Torch3333 added a commit that referenced this pull request Apr 23, 2025
# Conflicts:
#	.github/workflows/ci.yaml
#	integration-test/src/main/java/com/scalar/db/transaction/consensuscommit/ConsensusCommitSpecificIntegrationTestBase.java
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants