-
Notifications
You must be signed in to change notification settings - Fork 40
Fix SQL Server 2017 CI #2599
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix SQL Server 2017 CI #2599
Conversation
a96c545
to
a6a8788
Compare
a6a8788
to
cb170fd
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR fixes the SQL Server 2017 CI by switching the runner from Ubuntu to Windows and adjusting integration tests to account for a different timing behavior in transaction recovery. Key changes include:
- Updating the prepared timestamp in integration tests to subtract an additional millisecond.
- Transitioning the CI workflow from Ubuntu 20.04 to Windows by updating the runner, service configuration, and related environment variables.
- Adapting the Oracle JDK setup steps for the Windows environment and updating container registry login details.
Reviewed Changes
Copilot reviewed 4 out of 5 changed files in this pull request and generated no comments.
File | Description |
---|---|
integration-test/src/main/java/com/scalar/db/transaction/consensuscommit/TwoPhaseConsensusCommitSpecificIntegrationTestBase.java | Adjusts the prepared timestamp for the PREPARED and DELETED tests by subtracting an extra 1ms. |
integration-test/src/main/java/com/scalar/db/transaction/consensuscommit/ConsensusCommitSpecificIntegrationTestBase.java | Similar adjustment to the prepared timestamp in PREPARED and DELETED tests. |
integration-test/src/main/java/com/scalar/db/transaction/consensuscommit/ConsensusCommitNullMetadataIntegrationTestBase.java | Updates the prepared timestamp in tests that handle null metadata. |
.github/workflows/ci.yaml | Switches from an Ubuntu runner (with Docker-based SQL Server 2017) to a Windows runner and updates various setup steps to suit the new environment. |
Files not reviewed (1)
- ci/no-superuser/create-no-superuser-sqlserver.sh: Language not supported
Comments suppressed due to low confidence (4)
integration-test/src/main/java/com/scalar/db/transaction/consensuscommit/TwoPhaseConsensusCommitSpecificIntegrationTestBase.java:421
- Consider extracting the hard-coded '- 1' offset into a clearly named constant (e.g., EXTRA_OFFSET_MS) to document its purpose and improve maintainability in these test cases.
long prepared_at = System.currentTimeMillis() - RecoveryHandler.TRANSACTION_LIFETIME_MILLIS - 1;
integration-test/src/main/java/com/scalar/db/transaction/consensuscommit/ConsensusCommitSpecificIntegrationTestBase.java:515
- Consider extracting the extra '- 1' into a named constant to clarify why an additional millisecond is subtracted in these tests.
long prepared_at = System.currentTimeMillis() - RecoveryHandler.TRANSACTION_LIFETIME_MILLIS - 1;
integration-test/src/main/java/com/scalar/db/transaction/consensuscommit/ConsensusCommitNullMetadataIntegrationTestBase.java:641
- Consider extracting the '- 1' offset into a constant to document its intent and ensure consistency across integration tests.
long prepared_at = System.currentTimeMillis() - RecoveryHandler.TRANSACTION_LIFETIME_MILLIS - 1;
.github/workflows/ci.yaml:1174
- It might be beneficial to add error handling or logging after the Oracle JDK installation step to capture potential installation failures in the CI run.
Write-Host "Install Oracle JDK"
ports: | ||
- 1433:1433 | ||
options: --name sqlserver17 | ||
runs-on: windows-latest |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Switch to a windows runner for SQL Server 2017
- name: Setup Sql Server 2017 | ||
uses: rails-sqlserver/setup-mssql@v1 | ||
with: | ||
components: sqlcmd,sqlengine | ||
sa-password: "SqlServer17" | ||
version: 2017 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SQL Server 2017 is now installed locally instead of using a Docker container on WSL 2, the latter performing worse, as the CI lasted twice as long compared to a local SQL Server installation.
The CI takes about 30 minutes now, compared to less than 15 minutes on Ubuntu.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
30 mins for SQL Server 2017?
Anyway, we need to think about running only one version (probably the latest version) for each database in CI of a PR and running with other versions regularly, like weekly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for the wrong reporting, the SQL Server 2017 took 30 minutes with some modifications I added for debugging.
In reality, it takes 18 minutes for the regular commit job and 20 minutes for the group commit job to complete. Which is a couple of minutes longer compared to the previous duration observed on the Ubuntu runner.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Anyway, we need to think about running only one version (probably the latest version) for each database in CI of a PR and running with other versions regularly, like weekly.
I agree.
- name: Login to Oracle container registry | ||
- name: Login to GitHub Container Registry | ||
uses: docker/login-action@v3 | ||
if: ${{ env.INT_TEST_JAVA_RUNTIME_VENDOR == 'oracle' }} | ||
if: ${{ env.SET_UP_INT_TEST_RUNTIME_ORACLE_JDK_8_OR_11 == 'true'}} | ||
with: | ||
registry: container-registry.oracle.com | ||
username: ${{ secrets.OCR_USERNAME }} | ||
password: ${{ secrets.OCR_TOKEN }} | ||
registry: ghcr.io | ||
username: ${{ github.repository_owner }} | ||
password: ${{ secrets.CR_PAT }} | ||
|
||
- name: Set up JDK ${{ env.INT_TEST_JAVA_RUNTIME_VERSION }} (oracle) to run the integration test | ||
if: ${{ env.INT_TEST_JAVA_RUNTIME_VENDOR == 'oracle' }} | ||
if: ${{ env.SET_UP_INT_TEST_RUNTIME_ORACLE_JDK_8_OR_11 == 'true'}} | ||
run: | | ||
container_id=$(docker create "container-registry.oracle.com/java/jdk:${{ env.INT_TEST_JAVA_RUNTIME_VERSION }}") | ||
docker cp -L "$container_id:/usr/java/default" /usr/lib/jvm/oracle-jdk && docker rm "$container_id" | ||
$container_id=$(docker create "ghcr.io/scalar-labs/oracle/jdk:${{ env.INT_TEST_JAVA_RUNTIME_VERSION }}-windows") | ||
docker cp "${container_id}:oracle-jdk.exe" . | ||
docker rm "$container_id" | ||
Write-Host "Install Oracle JDK" | ||
Start-Process "oracle-jdk.exe" -NoNewWindow -Wait -ArgumentList "/s" | ||
Write-Host "Oracle JDK installation successful" | ||
if ( ${env:INT_TEST_JAVA_RUNTIME_VERSION} -eq '8' ) { | ||
$jdk_root_dir = "jdk-1.8" | ||
} else { | ||
$jdk_root_dir = "jdk-11" | ||
} | ||
echo "JAVA_HOME=C:\Program Files\Java\${jdk_root_dir}" >> ${env:GITHUB_ENV} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I copied the JDK setup from the Cosmos DB job, which runs on Windows too.
SQL_SERVER_PASSWORD=$1 | ||
MAX_RETRY_COUNT=$2 | ||
RETRY_INTERVAL=$3 | ||
# If set, use `sqlcmd` of the SQL Server docker container. If unset, use `sqlcmd` installed on the host. | ||
SQL_SERVER_CONTAINER_NAME=${4:-''} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed the order of the script input parameters to set the $SQL_SERVER_CONTAINER_NAME
last, since it is now optional.
@@ -638,7 +638,7 @@ public void scan_ScanGivenForPreparedWhenCoordinatorStateAborted_ShouldRollback( | |||
selection_SelectionGivenForPreparedWhenCoordinatorStateNotExistAndExpired_ShouldAbortTransaction( | |||
Selection s) throws ExecutionException, CoordinatorException, TransactionException { | |||
// Arrange | |||
long prepared_at = System.currentTimeMillis() - RecoveryHandler.TRANSACTION_LIFETIME_MILLIS; | |||
long prepared_at = System.currentTimeMillis() - RecoveryHandler.TRANSACTION_LIFETIME_MILLIS - 1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the Windows runner, failures occurred for integration tests that verify the behavior of reading a prepared record without a corresponding transaction status in the coordinator table, which should roll back the transaction if it is expired.
Without the fix above, the transaction was considered not to have expired in some cases by the RecoveryHandler
triggered after the prepared record was read.
scalardb/core/src/main/java/com/scalar/db/transaction/consensuscommit/RecoveryHandler.java
Lines 95 to 107 in a98b694
private void abortIfExpired(Selection selection, TransactionResult result) { | |
long current = System.currentTimeMillis(); | |
if (current <= result.getPreparedAt() + TRANSACTION_LIFETIME_MILLIS) { | |
return; | |
} | |
try { | |
coordinator.putStateForLazyRecoveryRollback(result.getId()); | |
rollbackRecord(selection, result); | |
} catch (CoordinatorException e) { | |
logger.warn("Coordinator tries to abort {}, but failed", result.getId(), e); | |
} | |
} |
This happened because System.currentTimeInMillis()
sometimes returns the same timestamp as the previous call on this line, which set the prepared_at
value for the record.
Line 641 in cb170fd
long prepared_at = System.currentTimeMillis() - RecoveryHandler.TRANSACTION_LIFETIME_MILLIS - 1; |
So, in RecoveryHandler.abortIfExpired()
, the if (current <= result.getPreparedAt() + TRANSACTION_LIFETIME_MILLIS) {
condition was true and the transaction was considered as not expired, so not a target for abort.
The same timestamp value is returned because of a relative lack of system clock precision that we don't observe on Ubuntu. Increasing the system clock precision may be possible( see this documentation), but it is not trivial, so we updated the integration test code instead.
runs-on: windows-latest | ||
env: | ||
# This variable evaluates to: if {!(Temurin JDK 8) && !(Oracle JDK 8 or 11)} then {true} else {false} | ||
SET_UP_INT_TEST_RUNTIME_JDK_WHEN_NOT_ORACLE_8_OR_11: "${{ (github.event_name == 'workflow_dispatch' && !(inputs.INT_TEST_JAVA_RUNTIME_VERSION == '8' && inputs.INT_TEST_JAVA_RUNTIME_VENDOR == 'temurin') && !(inputs.INT_TEST_JAVA_RUNTIME_VENDOR == 'oracle' && (inputs.INT_TEST_JAVA_RUNTIME_VERSION == '8' || inputs.INT_TEST_JAVA_RUNTIME_VERSION == '11'))) && 'true' || 'false' }}" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[very trivial] The order of the first condition is VERSION
and then VENDOR
, but the order of the second condition is VENDER
and then VERSION
. It might be better if they're consistent.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Torch3333 Could you add a comment to explain why this condition is needed instead of SET_UP_INT_TEST_RUNTIME_NON_ORACLE_JDK
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I fixed both, thank you.
407c806
Please refer to this PR #1978 description for more details about the special handling for Oracle JDK 8 and 11 on Windows.
Triggering the CI with Oracle JDK 8 or 11 to run the integration test currently fails because these versions are unavailable with the Setup Java Github action and more generally through public direct download. To retrieve these JDK versions, you must create a free Oracle account and accept the end-user license agreement.
For these reasons, we need to store these versions internally so that our CI can install them when required.
To store the JDK installer, we decided to package it as Docker image and publish it on the Scalar Github registry. The advantage is that it is cost-effective and we don't need to manage additional credentials compared to a storage option on AWS or Azure.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! Thank you!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! Thank you!
45ac60d
to
407c806
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thank you!!
# Conflicts: # .github/workflows/ci.yaml # integration-test/src/main/java/com/scalar/db/transaction/consensuscommit/ConsensusCommitSpecificIntegrationTestBase.java
Description
The Ubuntu 20.04 Github action runner was used to run SQL Server 2017 CI job, but this image is no longer available.
Also, the docker container for SQL Server 2017 is incompatible with other Ubuntu action runners (versions 22.04 and 24.04), so we switched to using a Windows Server runner instead.
Related issues and/or PRs
N/A
Changes made
Update the CI and make some changes to some integration tests. See the review comments for more details.
Checklist
Additional notes (optional)
N/A
Release notes
N/A