Features/hpcdatamgm 2088 - DON'T MERGE #142

Open
eranrosenberg wants to merge 18 commits into releases/3.23.0 from features/HPCDATAMGM-2088

Conversation

@eranrosenberg
Collaborator

No description provided.

Copilot AI review requested due to automatic review settings February 20, 2026 14:40
Contributor

Copilot AI left a comment

Pull request overview

This PR introduces initial support for “auto-tiering” migrations, adding new API/scheduler entry points, schema + DB changes, and a DAO intended to query external archives for inactive files.

Changes:

  • Add auto-tiering REST endpoint and scheduled tasks to create/process bulk auto-tiering migration tasks.
  • Extend domain schemas/DB migrations and DAO logic for new auto-tiering-related fields/status/type.
  • Update S3 path-style access provider list to include VAST_NCIVS.

Reviewed changes

Copilot reviewed 23 out of 28 changed files in this pull request and generated 10 comments.

File Description
utils/hpc-client/lib/hpc-common-3.23.0.jar Adds a compiled client JAR artifact.
src/hpc-server/hpc-ws-rs-impl/src/main/resources/WEB-INF/spring/hpc-server.properties Adds VAST_NCIVS to path-style enabled S3 providers.
src/hpc-server/hpc-ws-rs-impl/src/main/java/gov/nih/nci/hpc/ws/rs/impl/HpcDataMigrationRestServiceImpl.java Implements the new processAutoTiering REST method.
src/hpc-server/hpc-ws-rs-api/src/main/java/gov/nih/nci/hpc/ws/rs/HpcDataMigrationRestService.java Adds /autoTiering/{configurationId} endpoint definition.
src/hpc-server/hpc-scheduler/src/main/resources/logback.xml Changes scheduler log path property from catalina.base to catalina.home.
src/hpc-server/hpc-scheduler/src/main/resources/WEB-INF/spring/hpc-scheduler.properties Adds VAST_NCIVS to path-style enabled S3 providers.
src/hpc-server/hpc-scheduler-migration/src/main/resources/logback.xml Changes migration-scheduler log path property from catalina.base to catalina.home.
src/hpc-server/hpc-scheduler-migration/src/main/resources/WEB-INF/spring/hpc-scheduler-migration.properties Sets migration serverIds/serverId, adds cron entries, updates S3 providers list.
src/hpc-server/hpc-scheduler-migration/src/main/java/gov/nih/nci/hpc/scheduler/migration/impl/HpcScheduledTasksImpl.java Adds scheduled executions for auto-tiering and bulk auto-tiering processing.
src/hpc-server/hpc-domain-types/src/main/resources/schema/HpcDataMigrationTypes.xsd Adds new migration status/type enum values.
src/hpc-server/hpc-domain-model/src/main/resources/schema/HpcDataMigration.xsd Removes stray whitespace.
src/hpc-server/hpc-domain-model/src/main/resources/schema/HpcDataManagement.xsd Adds new auto-tiering-related configuration fields to the schema.
src/hpc-server/hpc-dao-impl/src/main/scripts/migration/release-3.23.0/hpc_auto_tiering.sql Adds DB columns for auto-tiering configuration + migration task source-location fields.
src/hpc-server/hpc-dao-impl/src/main/resources/META-INF/spring/hpc-dao-impl-beans-configuration.xml Registers hpcExternalArchiveDAO bean.
src/hpc-server/hpc-dao-impl/src/main/java/gov/nih/nci/hpc/dao/trino/impl/HpcExternalArchiveDAOImpl.java Introduces a Trino-backed DAO stub for external archive queries (currently dummy).
src/hpc-server/hpc-dao-impl/src/main/java/gov/nih/nci/hpc/dao/oracle/impl/HpcDataMigrationDAOImpl.java Extends migration task/task-result SQL and task mapping for new columns/statuses.
src/hpc-server/hpc-dao-impl/src/main/java/gov/nih/nci/hpc/dao/oracle/impl/HpcDataManagementConfigurationDAOImpl.java Maps new S3 auto-tiering DB columns onto transfer configuration objects.
src/hpc-server/hpc-dao-api/src/main/java/gov/nih/nci/hpc/dao/HpcExternalArchiveDAO.java Adds external archive DAO interface.
src/hpc-server/hpc-bus-service-impl/src/main/java/gov/nih/nci/hpc/bus/impl/HpcDataMigrationBusServiceImpl.java Adds bus-service logic to create and process bulk auto-tiering tasks.
src/hpc-server/hpc-bus-service-api/src/main/java/gov/nih/nci/hpc/bus/HpcDataMigrationBusService.java Adds new bus-service API methods for auto-tiering.
src/hpc-server/hpc-app-service-impl/src/main/java/gov/nih/nci/hpc/service/impl/HpcDataMigrationServiceImpl.java Adds auto-tiering task creation + “files not accessed” lookup via external archive DAO.
src/hpc-server/hpc-app-service-impl/src/main/java/gov/nih/nci/hpc/service/impl/HpcDataManagementServiceImpl.java Adds helper to list auto-tiering-enabled configurations.
src/hpc-server/hpc-app-service-api/src/main/java/gov/nih/nci/hpc/service/HpcDataMigrationService.java Extends service API for auto-tiering and new task-creation signature.
src/hpc-server/hpc-app-service-api/src/main/java/gov/nih/nci/hpc/service/HpcDataManagementService.java Adds API to list auto-tiering-enabled data management configurations.

Comment on lines +59 to +65
// TODO: Implement actual query logic using Trino to scan external archive
// For now, return dummy paths for testing
List<String> filePaths = new ArrayList<>();
filePaths.add("/Eran/eran-file-1.txt");
filePaths.add("/Eran/eran-file-2.txt");
filePaths.add("/Eran/eran-file-3.txt");

Copilot AI Feb 20, 2026

This DAO currently returns hard-coded dummy file paths and contains a TODO for the real Trino query. Shipping placeholder data here will trigger auto-tiering migrations for non-existent paths and makes the feature behavior incorrect in production. Please implement the real query logic (or gate this behind a feature flag) and remove the hard-coded test paths before merging.

Suggested change
// TODO: Implement actual query logic using Trino to scan external archive
// For now, return dummy paths for testing
List<String> filePaths = new ArrayList<>();
filePaths.add("/Eran/eran-file-1.txt");
filePaths.add("/Eran/eran-file-2.txt");
filePaths.add("/Eran/eran-file-3.txt");
// TODO: Implement actual query logic using Trino to scan the external archive and
// populate filePaths with files not accessed within the given time period.
List<String> filePaths = new ArrayList<>();
// Optional: allow dummy data for testing when explicitly enabled via a feature flag.
// This flag must NOT be enabled in production.
boolean useDummyData = Boolean.getBoolean("hpc.externalArchive.dummyData.enabled");
if (useDummyData) {
  logger.warn("Using dummy external archive file paths; feature flag 'hpc.externalArchive.dummyData.enabled' is enabled. "
      + "Ensure this is disabled in production.");
  filePaths.add("/Eran/eran-file-1.txt");
  filePaths.add("/Eran/eran-file-2.txt");
  filePaths.add("/Eran/eran-file-3.txt");
} else {
  logger.warn("Trino-backed external archive query is not yet implemented; returning empty file list.");
}

Comment on lines 451 to +455
dataMigrationTask.getSize(), dataMigrationTask.getRetryTaskId(), dataMigrationTask.getRetryUserId(),
dataMigrationTask.getMetadataFromArchiveFileContainerId(),
dataMigrationTask.getMetadataToArchiveFileContainerId(),
dataMigrationTask.getMetadataArchiveFileIdPattern(), dataMigrationTask.getId(),
dataMigrationTask.getMetadataArchiveFileIdPattern(),
dataMigrationTask.getId(),

Copilot AI Feb 20, 2026

UPSERT_DATA_MIGRATION_TASK_RESULT_SQL now includes AUTO_TIERING_* placeholders, but this jdbcTemplate.update(...) call does not supply corresponding parameters for those new columns (in either the UPDATE or INSERT parameter sets). This will cause a runtime SQL parameter count mismatch; add the missing AUTO_TIERING_* arguments (or remove the placeholders from the SQL if not needed).

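One way to catch this class of mismatch early is a unit-test sanity check that compares the number of JDBC `?` placeholders in the SQL against the number of supplied arguments. A minimal sketch (the helper name is illustrative, not code from this PR):

```java
public class SqlPlaceholderCheck {

  // Naive count of JDBC '?' placeholders. It does not skip string literals or
  // comments, which is acceptable for simple hand-built upsert SQL like this.
  static int countPlaceholders(String sql) {
    int count = 0;
    for (char c : sql.toCharArray()) {
      if (c == '?') {
        count++;
      }
    }
    return count;
  }
}
```

A test could then assert that the argument array handed to `jdbcTemplate.update(...)` has exactly `countPlaceholders(UPSERT_DATA_MIGRATION_TASK_RESULT_SQL)` elements, covering both the UPDATE and INSERT parameter sets.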
Comment on lines +173 to +174
migrationTask.setStatus(autoTieringRequest ? HpcDataMigrationStatus.AUTO_TIERING_RECEIVED :
HpcDataMigrationStatus.RECEIVED);

Copilot AI Feb 20, 2026

Auto-tiering tasks are being created with status AUTO_TIERING_RECEIVED, but the migration processing flows shown in this PR only poll/process status RECEIVED (no code paths query for AUTO_TIERING_RECEIVED). As a result, these tasks can be assigned to a server but never picked up for processing. Either keep the status as RECEIVED for data-object migration tasks, or add corresponding processing/queries for AUTO_TIERING_RECEIVED.

Suggested change
migrationTask.setStatus(autoTieringRequest ? HpcDataMigrationStatus.AUTO_TIERING_RECEIVED :
HpcDataMigrationStatus.RECEIVED);
migrationTask.setStatus(HpcDataMigrationStatus.RECEIVED);

type="hpc-domain-datatransfer:HpcDataTransferType" />
<xsd:element name="s3UploadConfigurationId" type="xsd:string" />
<xsd:element name="s3DefaultDownloadConfigurationId" type="xsd:string" />
<xsd:element name="s3AutoTieringConfigurationId" type="xsd:string" />

Copilot AI Feb 20, 2026

s3AutoTieringConfigurationId is introduced as a required element (no minOccurs=0), but the code enables auto-tiering only when this ID is present/non-empty. Making it required breaks existing configurations that don't use auto-tiering. Mark it optional (minOccurs=0) or version the schema/DTOs to preserve backward compatibility.

Suggested change
<xsd:element name="s3AutoTieringConfigurationId" type="xsd:string" />
<xsd:element name="s3AutoTieringConfigurationId" type="xsd:string" minOccurs="0" />

<configuration>

<property name="SCHEDULER_SERVER_LOG" value="${catalina.base}/logs/hpc-server/hpc-scheduler.log"/>
<property name="SCHEDULER_SERVER_LOG" value="${catalina.home}/logs/hpc-server/hpc-scheduler.log"/>

Copilot AI Feb 20, 2026

The active log file path uses ${catalina.home} while the rolling policy still uses ${catalina.base} in fileNamePattern, which will split current vs rolled logs across different directories. Use the same base variable for both (typically ${catalina.base}) to keep logs/rotation consistent per Tomcat instance.

Suggested change
<property name="SCHEDULER_SERVER_LOG" value="${catalina.home}/logs/hpc-server/hpc-scheduler.log"/>
<property name="SCHEDULER_SERVER_LOG" value="${catalina.base}/logs/hpc-server/hpc-scheduler.log"/>

<configuration>

<property name="SCHEDULER_MIGRATION_SERVER_LOG" value="${catalina.base}/logs/hpc-server/hpc-scheduler-migration.log"/>
<property name="SCHEDULER_MIGRATION_SERVER_LOG" value="${catalina.home}/logs/hpc-server/hpc-scheduler-migration.log"/>

Copilot AI Feb 20, 2026

The active log file path uses ${catalina.home} while the rolling policy still uses ${catalina.base} in fileNamePattern, which will split current vs rolled logs across different directories. Use the same base variable for both (typically ${catalina.base}) to keep logs/rotation consistent per Tomcat instance.

Suggested change
<property name="SCHEDULER_MIGRATION_SERVER_LOG" value="${catalina.home}/logs/hpc-server/hpc-scheduler-migration.log"/>
<property name="SCHEDULER_MIGRATION_SERVER_LOG" value="${catalina.base}/logs/hpc-server/hpc-scheduler-migration.log"/>

Comment on lines 69 to 72
+ "TO_S3_ARCHIVE_LOCATION_FILE_CONTAINER_ID = ?, TO_S3_ARCHIVE_LOCATION_FILE_ID = ?, SERVER_ID = ?, ALIGN_ARCHIVE_PATH = ?, DATA_SIZE = ?, "
+ "RETRY_TASK_ID = ?, RETRY_USER_ID = ?, METADATA_FROM_ARCHIVE_FILE_CONTAINER_ID = ?, "
+ "METADATA_TO_ARCHIVE_FILE_CONTAINER_ID = ?, METADATA_ARCHIVE_FILE_ID_PATTERN = ? "
+ "METADATA_TO_ARCHIVE_FILE_CONTAINER_ID = ?, METADATA_ARCHIVE_FILE_ID_PATTERN = ?, AUTO_TIERING_FILE_CONTAINER_ID = ?, AUTO_TIERING_FILE_ID = ? "
+ "when not matched then insert (ID, PARENT_ID, USER_ID, PATH, CONFIGURATION_ID, FROM_S3_ARCHIVE_CONFIGURATION_ID, "

Copilot AI Feb 20, 2026

UPSERT_DATA_MIGRATION_TASK_RESULT_SQL references AUTO_TIERING_FILE_CONTAINER_ID and AUTO_TIERING_FILE_ID columns. No migration in this PR adds these columns to HPC_DATA_MIGRATION_TASK_RESULT (and release-3.23.0/hpc_auto_tiering.sql only alters HPC_DATA_MIGRATION_TASK). Align the SQL with actual schema changes, or add the missing columns/migration.

s3Configuration.setExternalStorage(rs.getBoolean("EXTERNAL_STORAGE"));
s3Configuration.setPosixPath(rs.getString("POSIX_PATH"));
s3Configuration.setAutoTieringSearchPath(rs.getString("AUTO_TIERING_SEARCH_PATH"));
s3Configuration.setAutoTieringInactivityMonths(rs.getInt("AUTO_TIERING_INACTIVITY_MONTHS"));

Copilot AI Feb 20, 2026

AUTO_TIERING_INACTIVITY_MONTHS is nullable (added without NOT NULL), but this mapper uses rs.getInt(...), which returns 0 when the DB value is NULL. That makes it impossible to distinguish “unset” from an actual 0-month value and can cause auto-tiering to proceed with an unintended default. Use rs.getObject(..., Integer.class) or rs.getInt + rs.wasNull() to preserve nulls.

Suggested change
s3Configuration.setAutoTieringInactivityMonths(rs.getInt("AUTO_TIERING_INACTIVITY_MONTHS"));
s3Configuration.setAutoTieringInactivityMonths(
rs.getObject("AUTO_TIERING_INACTIVITY_MONTHS", Integer.class));

<xsd:element name="externalStorage" type="xsd:boolean" />
<xsd:element name="posixPath" type="xsd:string" minOccurs="0"/>
<xsd:element name="posixPath" type="xsd:string" />
<xsd:element name="autoTieringSearchPath" type="xsd:string" />

Copilot AI Feb 20, 2026

autoTieringSearchPath is added as a required element (no minOccurs=0), but the DB migration adds AUTO_TIERING_SEARCH_PATH as nullable and the service treats auto-tiering as an optional feature. Keeping this required is a breaking schema change for existing configurations/clients; make it optional (minOccurs=0) or version the schema.

Suggested change
<xsd:element name="autoTieringSearchPath" type="xsd:string" />
<xsd:element name="autoTieringSearchPath" type="xsd:string" minOccurs="0" />

Comment on lines +189 to 190
hpc.service.dataMigration.serverIds=server-id
hpc.service.serverId=server-id

Copilot AI Feb 20, 2026

This properties file now hard-codes hpc.service.dataMigration.serverIds=server-id, while the other shipped properties (e.g., hpc-server.properties / hpc-scheduler.properties) leave serverIds empty for environment-specific configuration. Hard-coding a placeholder here can cause migrations to be assigned incorrectly in non-dev deployments; consider keeping it empty and documenting/overriding via ${catalina.base}/conf instead.

Suggested change
hpc.service.dataMigration.serverIds=server-id
hpc.service.serverId=server-id
# Server IDs are intentionally left empty for environment-specific configuration.
# Override these values via ${catalina.base}/conf in each deployment.
hpc.service.dataMigration.serverIds=
hpc.service.serverId=

dinhys and others added 10 commits February 27, 2026 09:04
…hpc/integration/irods/impl/HpcDataManagementProxyImpl.java

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
HPCDATAMGM-2166: Collections gets registered sometimes without system metadata
# Conflicts:
#	utils/hpc-client/hpc-cli-3.24.0.jar
#	utils/hpc-client/lib/hpc-common-3.24.0.jar
#	utils/hpc-client/lib/hpc-domain-types-3.24.0.jar
#	utils/hpc-client/lib/hpc-dto-3.24.0.jar
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 69 out of 74 changed files in this pull request and generated 9 comments.



<xsd:restriction base="xsd:string">
<xsd:enumeration value="RECEIVED" />
<xsd:enumeration value="IN_PROGRESS" />
<xsd:enumeration value="AUTO_TIERING_RECEIVED" />

Copilot AI Mar 8, 2026

The AUTO_TIERING_RECEIVED status is added as a new migration status in HpcDataMigrationTypes.xsd and is used to set the status for auto-tiering data object tasks in createDataObjectMigrationTask. However, the getDataMigrationTasks method in HpcDataMigrationServiceImpl uses a SQL query that fetches tasks by STATUS = ? with a single status value. The processBulkAutoTieringMigrationReceived scheduler method queries with HpcDataMigrationStatus.RECEIVED and type BULK_AUTO_TIERING, which would not include AUTO_TIERING_RECEIVED tasks. It is unclear whether AUTO_TIERING_RECEIVED individual data-object tasks are ever picked up and processed — there is no handler for AUTO_TIERING_RECEIVED data-object tasks in any of the processXxx methods.

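If the separate AUTO_TIERING_RECEIVED status is kept, the polling query would need to match a set of statuses rather than a single value. A hedged sketch of building such a predicate (this helper is an assumption for illustration, not code from this PR; the status strings are then bound as ordinary query parameters):

```java
import java.util.List;
import java.util.stream.Collectors;

public class MigrationStatusQuery {

  // Build a "STATUS IN (?, ?, ...)" clause so a single polling query can fetch
  // both RECEIVED and AUTO_TIERING_RECEIVED tasks; the caller binds the status
  // values as parameters in the same order.
  static String statusInClause(List<String> statuses) {
    String placeholders =
        statuses.stream().map(s -> "?").collect(Collectors.joining(", "));
    return "STATUS IN (" + placeholders + ")";
  }
}
```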
Comment on lines +101 to +109
private void dbConnect() throws HpcException {
try {
jdbcTemplate.getDataSource().getConnection();

} catch (Exception e) {
throw new HpcException("Failed to connect to Trino DB. Check connection & credentials config",
HpcErrorType.DATABASE_ERROR, HpcIntegratedSystem.VAST, e);
}
}

Copilot AI Mar 8, 2026

The dbConnect() method calls jdbcTemplate.getDataSource().getConnection() without closing the returned connection. This causes a connection leak every time the bean is initialized. The connection should be closed in a finally block or use a try-with-resources statement. Note that the existing Oracle DAO HpcDataManagementConfigurationDAOImpl.dbConnect() has the same pattern, but since this is new code it should be written correctly.

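A minimal leak-free version of the probe, assuming the same connectivity-check intent (exception handling is simplified here; the PR's code wraps failures in HpcException):

```java
import java.sql.Connection;
import javax.sql.DataSource;

public class TrinoConnectivityCheck {

  // Probe the Trino data source once at bean initialization. try-with-resources
  // guarantees the test connection is closed (returned to the pool), avoiding
  // the connection leak flagged in the review.
  static void dbConnect(DataSource dataSource) {
    try (Connection connection = dataSource.getConnection()) {
      // Connection acquired successfully; it is auto-closed on exiting this block.
    } catch (Exception e) {
      throw new IllegalStateException(
          "Failed to connect to Trino DB. Check connection & credentials config", e);
    }
  }
}
```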
Comment on lines +281 to +284
hpc.dao.trino.host=trino.fairway.nci.nih.gov
hpc.dao.trino.port=443
hpc.dao.trino.username=ncif-hpcdm-svc
hpc.dao.trino.password=< Configure Me >

Copilot AI Mar 8, 2026

The property file contains a literal placeholder password < Configure Me > committed to the repository. Credentials should never be stored in property files in version control, not even as placeholders. This file should reference an environment variable, a vault secret, or an external configuration store. Additionally, this is a template-like properties file checked in for a specific environment, which is a broader concern.

Comment on lines +57 to +59
<xsd:element name="posixPath" type="xsd:string" />
<xsd:element name="autoTieringSearchPath" type="xsd:string" />
<xsd:element name="autoTieringInactivityMonths" type="xsd:int" minOccurs="0" />

Copilot AI Mar 8, 2026

The posixPath field in the HpcS3ArchiveConfiguration complex type was changed from minOccurs="0" (optional) to required (no minOccurs attribute, which defaults to 1). This is a breaking schema change: any existing XML documents that omit posixPath will now fail validation, and any existing configurations in the database that do not have a posixPath set will cause deserialization failures at runtime. If this field is truly now required for all S3 archive configurations, existing database records and any client-side DTOs that build these objects must be updated accordingly.

Comment on lines +564 to +579
public void processAutoTiering() throws HpcException {
// obtain a list of HPC data management configurations which are to be included in auto-tiering scan and
// create a task for each.
dataManagementService.getAutoTieringDataManagementConfigurations().forEach(dataManagementConfiguration -> {
try {
logger.info("Create bulk auto-tiering task for data management configuration: {}",
dataManagementConfiguration.getId());
dataMigrationService.createBulkAutoTieringTask(dataManagementConfiguration.getId(),
dataManagementConfiguration.getS3UploadConfigurationId(),
dataManagementConfiguration.getS3AutoTieringConfigurationId(), null);

} catch (HpcException e) {
logger.error("Failed to process auto-tiering for data management configuration: {}",
dataManagementConfiguration.getId(), e);
}
});

Copilot AI Mar 8, 2026

The processAutoTiering() (no-arg, called by the scheduler) uses dataManagementConfiguration.getS3UploadConfigurationId() as the fromS3ArchiveConfigurationId when calling createBulkAutoTieringTask. However, for auto-tiering the source is the external VAST archive, not the S3 upload configuration. The upload configuration ID represents where files are uploaded to, not the external source being scanned. The fromS3ArchiveConfigurationId argument in createBulkAutoTieringTask should probably be the same configuration that has autoTieringSearchPath and autoTieringInactivityMonths defined, which is the auto-tiering source configuration, not the upload configuration.

// Get data objects eligible for auto-tiering and create individual migration tasks
processBulkAutoTieringMigration(bulkAutoTieringTask);

// Mark the bulk metadata update task - in-progress

Copilot AI Mar 8, 2026

The comment on line 635 says "Mark the bulk metadata update task - in-progress" but the code is actually processing a bulk auto-tiering task, not a bulk metadata update task. This is a copy-paste error from the processBulkMetadataUpdateMigrationReceived method.

Suggested change
// Mark the bulk metadata update task - in-progress
// Mark the bulk auto-tiering task - in-progress

bulkAutoTieringTask.getFromS3ArchiveConfigurationId(),
bulkAutoTieringTask.getToS3ArchiveConfigurationId(),
bulkAutoTieringTask.getId(), bulkAutoTieringTask.getAlignArchivePath(),
null, null, bulkAutoTieringTask.getRetryUserId(),

Copilot AI Mar 8, 2026

In processBulkAutoTieringMigration, bulkAutoTieringTask.getRetryUserId() is passed as the retryUserId parameter to createDataObjectMigrationTask, but for a new auto-tiering task (not a retry), this value would typically be null. However, retryTaskId is passed as null while retryUserId is taken from the bulk task's retryUserId. This is inconsistent — if retryTaskId is null, retryUserId should also be null. This looks like a copy-paste mistake.

Suggested change
null, null, bulkAutoTieringTask.getRetryUserId(),
null, null, null,

Comment on lines +79 to +81
return jdbcTemplate.queryForList(
GET_FILES_NOT_ACCESSED_SQL.replace("{months}", monthsNotAccessed.toString()),
String.class, searchPath, searchPath);

Copilot AI Mar 8, 2026

The SQL query substitutes {months} using plain string replacement (GET_FILES_NOT_ACCESSED_SQL.replace("{months}", monthsNotAccessed.toString())). Even though monthsNotAccessed is an Integer (so there is no direct SQL injection risk from that field), the substitution bypasses JDBC parameterized query handling. Trino's JDBC driver may support parameterized INTERVAL expressions; if so, this should use a proper bind parameter instead of string concatenation. At minimum, a comment should explain why string substitution is used here rather than a bind parameter.

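One way to avoid the string substitution entirely is to compute the cutoff timestamp in Java and bind it as a plain JDBC parameter. A sketch under the assumption that the external archive table exposes a last-access timestamp column (the table and column names below are illustrative, not the PR's actual SQL):

```java
import java.sql.Timestamp;
import java.time.LocalDateTime;

public class AutoTieringCutoff {

  // Compute the "not accessed since" cutoff in Java instead of splicing
  // "{months}" into the SQL text, so the query can use an ordinary bind parameter.
  static Timestamp cutoff(int monthsNotAccessed, LocalDateTime now) {
    return Timestamp.valueOf(now.minusMonths(monthsNotAccessed));
  }

  // Illustrative usage with a fully parameterized query:
  //   String sql = "SELECT path FROM archive_files "
  //       + "WHERE path LIKE ? AND last_access_time < ?";
  //   jdbcTemplate.queryForList(sql, String.class, searchPath + "%",
  //       cutoff(monthsNotAccessed, LocalDateTime.now()));
}
```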
Comment on lines +1 to +42
--
-- hpc_auto_tiering.sql
--
-- Copyright SVG, Inc.
-- Copyright Leidos Biomedical Research, Inc
--
-- Distributed under the OSI-approved BSD 3-Clause License.
-- See http://ncip.github.com/HPC/LICENSE.txt for details.
--
--
-- @author <a href="mailto:rosenbergea@nih.gov">Eran Rosenberg</a>
--

-- HPC_S3_ARCHIVE_CONFIGURATION
ALTER TABLE HPC_S3_ARCHIVE_CONFIGURATION add (
AUTO_TIERING_SEARCH_PATH VARCHAR2(2700)
);
COMMENT ON COLUMN HPC_S3_ARCHIVE_CONFIGURATION.AUTO_TIERING_SEARCH_PATH IS 'The search path for auto-tiering external archive files';

ALTER TABLE HPC_S3_ARCHIVE_CONFIGURATION add (
AUTO_TIERING_INACTIVITY_MONTHS NUMBER(10)
);
COMMENT ON COLUMN HPC_S3_ARCHIVE_CONFIGURATION.AUTO_TIERING_INACTIVITY_MONTHS IS 'The inactivity period in months before files are auto-tiered to Glacier Deep Archive';

-- HPC_DATA_MANAGEMENT_CONFIGURATION
ALTER TABLE HPC_DATA_MANAGEMENT_CONFIGURATION add (
S3_AUTO_TIERING_ARCHIVE_CONFIGURATION_ID VARCHAR2(50)
);
COMMENT ON COLUMN HPC_DATA_MANAGEMENT_CONFIGURATION.S3_AUTO_TIERING_ARCHIVE_CONFIGURATION_ID IS 'The S3 archive configuration ID for auto-tiering external archive into';

-- HPC_DATA_MIGRATION_TASK
ALTER TABLE HPC_DATA_MIGRATION_TASK add (
FROM_S3_ARCHIVE_LOCATION_FILE_CONTAINER_ID VARCHAR2(50)
);
COMMENT ON COLUMN HPC_DATA_MIGRATION_TASK.FROM_S3_ARCHIVE_LOCATION_FILE_CONTAINER_ID IS 'The archive file container ID (bucket) migrating from';

ALTER TABLE HPC_DATA_MIGRATION_TASK add (
FROM_S3_ARCHIVE_LOCATION_FILE_ID VARCHAR2(2700)
);
COMMENT ON COLUMN HPC_DATA_MIGRATION_TASK.FROM_S3_ARCHIVE_LOCATION_FILE_ID IS 'The archive file ID (object ID) migrating from';


Copilot AI Mar 8, 2026

The identical SQL migration script hpc_auto_tiering.sql has been added to both release-3.23.0/ and release-3.24.0/. Only release-3.24.0/ should contain this script, since the feature is part of version 3.24.0. Including it under release-3.23.0/ is likely a mistake and would cause the DDL to be applied twice if both release directories are executed during an upgrade from 3.22.x to 3.24.0, leading to "column already exists" errors.
