Features/hpcdatamgm 2088 - DON'T MERGE #142

Open
eranrosenberg wants to merge 18 commits into releases/3.23.0 from features/HPCDATAMGM-2088

Conversation

@eranrosenberg
Collaborator

No description provided.

Copilot AI review requested due to automatic review settings February 20, 2026 14:40
Contributor

Copilot AI left a comment

Pull request overview

This PR introduces initial support for “auto-tiering” migrations, adding new API/scheduler entry points, schema + DB changes, and a DAO intended to query external archives for inactive files.

Changes:

  • Add auto-tiering REST endpoint and scheduled tasks to create/process bulk auto-tiering migration tasks.
  • Extend domain schemas/DB migrations and DAO logic for new auto-tiering-related fields/status/type.
  • Update S3 path-style access provider list to include VAST_NCIVS.

Reviewed changes

Copilot reviewed 23 out of 28 changed files in this pull request and generated 10 comments.

File Description
utils/hpc-client/lib/hpc-common-3.23.0.jar Adds a compiled client JAR artifact.
src/hpc-server/hpc-ws-rs-impl/src/main/resources/WEB-INF/spring/hpc-server.properties Adds VAST_NCIVS to path-style enabled S3 providers.
src/hpc-server/hpc-ws-rs-impl/src/main/java/gov/nih/nci/hpc/ws/rs/impl/HpcDataMigrationRestServiceImpl.java Implements the new processAutoTiering REST method.
src/hpc-server/hpc-ws-rs-api/src/main/java/gov/nih/nci/hpc/ws/rs/HpcDataMigrationRestService.java Adds /autoTiering/{configurationId} endpoint definition.
src/hpc-server/hpc-scheduler/src/main/resources/logback.xml Changes scheduler log path property from catalina.base to catalina.home.
src/hpc-server/hpc-scheduler/src/main/resources/WEB-INF/spring/hpc-scheduler.properties Adds VAST_NCIVS to path-style enabled S3 providers.
src/hpc-server/hpc-scheduler-migration/src/main/resources/logback.xml Changes migration-scheduler log path property from catalina.base to catalina.home.
src/hpc-server/hpc-scheduler-migration/src/main/resources/WEB-INF/spring/hpc-scheduler-migration.properties Sets migration serverIds/serverId, adds cron entries, updates S3 providers list.
src/hpc-server/hpc-scheduler-migration/src/main/java/gov/nih/nci/hpc/scheduler/migration/impl/HpcScheduledTasksImpl.java Adds scheduled executions for auto-tiering and bulk auto-tiering processing.
src/hpc-server/hpc-domain-types/src/main/resources/schema/HpcDataMigrationTypes.xsd Adds new migration status/type enum values.
src/hpc-server/hpc-domain-model/src/main/resources/schema/HpcDataMigration.xsd Removes stray whitespace.
src/hpc-server/hpc-domain-model/src/main/resources/schema/HpcDataManagement.xsd Adds new auto-tiering-related configuration fields to the schema.
src/hpc-server/hpc-dao-impl/src/main/scripts/migration/release-3.23.0/hpc_auto_tiering.sql Adds DB columns for auto-tiering configuration + migration task source-location fields.
src/hpc-server/hpc-dao-impl/src/main/resources/META-INF/spring/hpc-dao-impl-beans-configuration.xml Registers hpcExternalArchiveDAO bean.
src/hpc-server/hpc-dao-impl/src/main/java/gov/nih/nci/hpc/dao/trino/impl/HpcExternalArchiveDAOImpl.java Introduces a Trino-backed DAO stub for external archive queries (currently dummy).
src/hpc-server/hpc-dao-impl/src/main/java/gov/nih/nci/hpc/dao/oracle/impl/HpcDataMigrationDAOImpl.java Extends migration task/task-result SQL and task mapping for new columns/statuses.
src/hpc-server/hpc-dao-impl/src/main/java/gov/nih/nci/hpc/dao/oracle/impl/HpcDataManagementConfigurationDAOImpl.java Maps new S3 auto-tiering DB columns onto transfer configuration objects.
src/hpc-server/hpc-dao-api/src/main/java/gov/nih/nci/hpc/dao/HpcExternalArchiveDAO.java Adds external archive DAO interface.
src/hpc-server/hpc-bus-service-impl/src/main/java/gov/nih/nci/hpc/bus/impl/HpcDataMigrationBusServiceImpl.java Adds bus-service logic to create and process bulk auto-tiering tasks.
src/hpc-server/hpc-bus-service-api/src/main/java/gov/nih/nci/hpc/bus/HpcDataMigrationBusService.java Adds new bus-service API methods for auto-tiering.
src/hpc-server/hpc-app-service-impl/src/main/java/gov/nih/nci/hpc/service/impl/HpcDataMigrationServiceImpl.java Adds auto-tiering task creation + “files not accessed” lookup via external archive DAO.
src/hpc-server/hpc-app-service-impl/src/main/java/gov/nih/nci/hpc/service/impl/HpcDataManagementServiceImpl.java Adds helper to list auto-tiering-enabled configurations.
src/hpc-server/hpc-app-service-api/src/main/java/gov/nih/nci/hpc/service/HpcDataMigrationService.java Extends service API for auto-tiering and new task-creation signature.
src/hpc-server/hpc-app-service-api/src/main/java/gov/nih/nci/hpc/service/HpcDataManagementService.java Adds API to list auto-tiering-enabled data management configurations.

Comment on lines +59 to +65
// TODO: Implement actual query logic using Trino to scan external archive
// For now, return dummy paths for testing
List<String> filePaths = new ArrayList<>();
filePaths.add("/Eran/eran-file-1.txt");
filePaths.add("/Eran/eran-file-2.txt");
filePaths.add("/Eran/eran-file-3.txt");

Copilot AI Feb 20, 2026

This DAO currently returns hard-coded dummy file paths and contains a TODO for the real Trino query. Shipping placeholder data here will trigger auto-tiering migrations for non-existent paths and makes the feature behavior incorrect in production. Please implement the real query logic (or gate this behind a feature flag) and remove the hard-coded test paths before merging.

Suggested change
// TODO: Implement actual query logic using Trino to scan external archive
// For now, return dummy paths for testing
List<String> filePaths = new ArrayList<>();
filePaths.add("/Eran/eran-file-1.txt");
filePaths.add("/Eran/eran-file-2.txt");
filePaths.add("/Eran/eran-file-3.txt");
// TODO: Implement actual query logic using Trino to scan the external archive and
// populate filePaths with files not accessed within the given time period.
List<String> filePaths = new ArrayList<>();
// Optional: allow dummy data for testing when explicitly enabled via a feature flag.
// This flag must NOT be enabled in production.
boolean useDummyData = Boolean.getBoolean("hpc.externalArchive.dummyData.enabled");
if (useDummyData) {
  logger.warn("Using dummy external archive file paths; feature flag 'hpc.externalArchive.dummyData.enabled' is enabled. "
      + "Ensure this is disabled in production.");
  filePaths.add("/Eran/eran-file-1.txt");
  filePaths.add("/Eran/eran-file-2.txt");
  filePaths.add("/Eran/eran-file-3.txt");
} else {
  logger.warn("Trino-backed external archive query is not yet implemented; returning empty file list.");
}

Comment on lines 451 to +455
dataMigrationTask.getSize(), dataMigrationTask.getRetryTaskId(), dataMigrationTask.getRetryUserId(),
dataMigrationTask.getMetadataFromArchiveFileContainerId(),
dataMigrationTask.getMetadataToArchiveFileContainerId(),
dataMigrationTask.getMetadataArchiveFileIdPattern(), dataMigrationTask.getId(),
dataMigrationTask.getMetadataArchiveFileIdPattern(),
dataMigrationTask.getId(),

Copilot AI Feb 20, 2026

UPSERT_DATA_MIGRATION_TASK_RESULT_SQL now includes AUTO_TIERING_* placeholders, but this jdbcTemplate.update(...) call does not supply corresponding parameters for those new columns (in either the UPDATE or INSERT parameter sets). This will cause a runtime SQL parameter count mismatch; add the missing AUTO_TIERING_* arguments (or remove the placeholders from the SQL if not needed).

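One way to catch this class of mismatch early is a unit-test sanity check that compares the number of JDBC `?` placeholders in the SQL against the number of supplied arguments. A minimal sketch (the helper name is illustrative, not code from this PR):

```java
public class SqlPlaceholderCheck {

  // Naive count of JDBC '?' placeholders. It does not skip string literals or
  // comments, which is acceptable for simple hand-built upsert SQL like this.
  static int countPlaceholders(String sql) {
    int count = 0;
    for (char c : sql.toCharArray()) {
      if (c == '?') {
        count++;
      }
    }
    return count;
  }
}
```

A test could then assert that the argument array handed to `jdbcTemplate.update(...)` has exactly `countPlaceholders(UPSERT_DATA_MIGRATION_TASK_RESULT_SQL)` elements, covering both the UPDATE and INSERT parameter sets.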
Comment on lines +173 to +174
migrationTask.setStatus(autoTieringRequest ? HpcDataMigrationStatus.AUTO_TIERING_RECEIVED :
HpcDataMigrationStatus.RECEIVED);

Copilot AI Feb 20, 2026

Auto-tiering tasks are being created with status AUTO_TIERING_RECEIVED, but the migration processing flows shown in this PR only poll/process status RECEIVED (no code paths query for AUTO_TIERING_RECEIVED). As a result, these tasks can be assigned to a server but never picked up for processing. Either keep the status as RECEIVED for data-object migration tasks, or add corresponding processing/queries for AUTO_TIERING_RECEIVED.

Suggested change
migrationTask.setStatus(autoTieringRequest ? HpcDataMigrationStatus.AUTO_TIERING_RECEIVED :
HpcDataMigrationStatus.RECEIVED);
migrationTask.setStatus(HpcDataMigrationStatus.RECEIVED);

type="hpc-domain-datatransfer:HpcDataTransferType" />
<xsd:element name="s3UploadConfigurationId" type="xsd:string" />
<xsd:element name="s3DefaultDownloadConfigurationId" type="xsd:string" />
<xsd:element name="s3AutoTieringConfigurationId" type="xsd:string" />

Copilot AI Feb 20, 2026

s3AutoTieringConfigurationId is introduced as a required element (no minOccurs=0), but the code enables auto-tiering only when this ID is present/non-empty. Making it required breaks existing configurations that don't use auto-tiering. Mark it optional (minOccurs=0) or version the schema/DTOs to preserve backward compatibility.

Suggested change
<xsd:element name="s3AutoTieringConfigurationId" type="xsd:string" />
<xsd:element name="s3AutoTieringConfigurationId" type="xsd:string" minOccurs="0" />

<configuration>

<property name="SCHEDULER_SERVER_LOG" value="${catalina.base}/logs/hpc-server/hpc-scheduler.log"/>
<property name="SCHEDULER_SERVER_LOG" value="${catalina.home}/logs/hpc-server/hpc-scheduler.log"/>

Copilot AI Feb 20, 2026

The active log file path uses ${catalina.home} while the rolling policy still uses ${catalina.base} in fileNamePattern, which will split current vs rolled logs across different directories. Use the same base variable for both (typically ${catalina.base}) to keep logs/rotation consistent per Tomcat instance.

Suggested change
<property name="SCHEDULER_SERVER_LOG" value="${catalina.home}/logs/hpc-server/hpc-scheduler.log"/>
<property name="SCHEDULER_SERVER_LOG" value="${catalina.base}/logs/hpc-server/hpc-scheduler.log"/>

<configuration>

<property name="SCHEDULER_MIGRATION_SERVER_LOG" value="${catalina.base}/logs/hpc-server/hpc-scheduler-migration.log"/>
<property name="SCHEDULER_MIGRATION_SERVER_LOG" value="${catalina.home}/logs/hpc-server/hpc-scheduler-migration.log"/>

Copilot AI Feb 20, 2026

The active log file path uses ${catalina.home} while the rolling policy still uses ${catalina.base} in fileNamePattern, which will split current vs rolled logs across different directories. Use the same base variable for both (typically ${catalina.base}) to keep logs/rotation consistent per Tomcat instance.

Suggested change
<property name="SCHEDULER_MIGRATION_SERVER_LOG" value="${catalina.home}/logs/hpc-server/hpc-scheduler-migration.log"/>
<property name="SCHEDULER_MIGRATION_SERVER_LOG" value="${catalina.base}/logs/hpc-server/hpc-scheduler-migration.log"/>

Comment on lines 69 to 72
+ "TO_S3_ARCHIVE_LOCATION_FILE_CONTAINER_ID = ?, TO_S3_ARCHIVE_LOCATION_FILE_ID = ?, SERVER_ID = ?, ALIGN_ARCHIVE_PATH = ?, DATA_SIZE = ?, "
+ "RETRY_TASK_ID = ?, RETRY_USER_ID = ?, METADATA_FROM_ARCHIVE_FILE_CONTAINER_ID = ?, "
+ "METADATA_TO_ARCHIVE_FILE_CONTAINER_ID = ?, METADATA_ARCHIVE_FILE_ID_PATTERN = ? "
+ "METADATA_TO_ARCHIVE_FILE_CONTAINER_ID = ?, METADATA_ARCHIVE_FILE_ID_PATTERN = ?, AUTO_TIERING_FILE_CONTAINER_ID = ?, AUTO_TIERING_FILE_ID = ? "
+ "when not matched then insert (ID, PARENT_ID, USER_ID, PATH, CONFIGURATION_ID, FROM_S3_ARCHIVE_CONFIGURATION_ID, "

Copilot AI Feb 20, 2026

UPSERT_DATA_MIGRATION_TASK_RESULT_SQL references AUTO_TIERING_FILE_CONTAINER_ID and AUTO_TIERING_FILE_ID columns. No migration in this PR adds these columns to HPC_DATA_MIGRATION_TASK_RESULT (and release-3.23.0/hpc_auto_tiering.sql only alters HPC_DATA_MIGRATION_TASK). Align the SQL with actual schema changes, or add the missing columns/migration.

s3Configuration.setExternalStorage(rs.getBoolean("EXTERNAL_STORAGE"));
s3Configuration.setPosixPath(rs.getString("POSIX_PATH"));
s3Configuration.setAutoTieringSearchPath(rs.getString("AUTO_TIERING_SEARCH_PATH"));
s3Configuration.setAutoTieringInactivityMonths(rs.getInt("AUTO_TIERING_INACTIVITY_MONTHS"));

Copilot AI Feb 20, 2026

AUTO_TIERING_INACTIVITY_MONTHS is nullable (added without NOT NULL), but this mapper uses rs.getInt(...), which returns 0 when the DB value is NULL. That makes it impossible to distinguish “unset” from an actual 0-month value and can cause auto-tiering to proceed with an unintended default. Use rs.getObject(..., Integer.class) or rs.getInt + rs.wasNull() to preserve nulls.

Suggested change
s3Configuration.setAutoTieringInactivityMonths(rs.getInt("AUTO_TIERING_INACTIVITY_MONTHS"));
s3Configuration.setAutoTieringInactivityMonths(
rs.getObject("AUTO_TIERING_INACTIVITY_MONTHS", Integer.class));

<xsd:element name="externalStorage" type="xsd:boolean" />
<xsd:element name="posixPath" type="xsd:string" minOccurs="0"/>
<xsd:element name="posixPath" type="xsd:string" />
<xsd:element name="autoTieringSearchPath" type="xsd:string" />

Copilot AI Feb 20, 2026

autoTieringSearchPath is added as a required element (no minOccurs=0), but the DB migration adds AUTO_TIERING_SEARCH_PATH as nullable and the service treats auto-tiering as an optional feature. Keeping this required is a breaking schema change for existing configurations/clients; make it optional (minOccurs=0) or version the schema.

Suggested change
<xsd:element name="autoTieringSearchPath" type="xsd:string" />
<xsd:element name="autoTieringSearchPath" type="xsd:string" minOccurs="0" />

Comment on lines +189 to 190
hpc.service.dataMigration.serverIds=server-id
hpc.service.serverId=server-id

Copilot AI Feb 20, 2026

This properties file now hard-codes hpc.service.dataMigration.serverIds=server-id, while the other shipped properties (e.g., hpc-server.properties / hpc-scheduler.properties) leave serverIds empty for environment-specific configuration. Hard-coding a placeholder here can cause migrations to be assigned incorrectly in non-dev deployments; consider keeping it empty and documenting/overriding via ${catalina.base}/conf instead.

Suggested change
hpc.service.dataMigration.serverIds=server-id
hpc.service.serverId=server-id
# Server IDs are intentionally left empty for environment-specific configuration.
# Override these values via ${catalina.base}/conf in each deployment.
hpc.service.dataMigration.serverIds=
hpc.service.serverId=

dinhys and others added 10 commits February 27, 2026 09:04
…hpc/integration/irods/impl/HpcDataManagementProxyImpl.java

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
HPCDATAMGM-2166: Collections gets registered sometimes without system metadata
# Conflicts:
#	utils/hpc-client/hpc-cli-3.24.0.jar
#	utils/hpc-client/lib/hpc-common-3.24.0.jar
#	utils/hpc-client/lib/hpc-domain-types-3.24.0.jar
#	utils/hpc-client/lib/hpc-dto-3.24.0.jar
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 69 out of 74 changed files in this pull request and generated 9 comments.



<xsd:restriction base="xsd:string">
<xsd:enumeration value="RECEIVED" />
<xsd:enumeration value="IN_PROGRESS" />
<xsd:enumeration value="AUTO_TIERING_RECEIVED" />

Copilot AI Mar 8, 2026

The AUTO_TIERING_RECEIVED status is added as a new migration status in HpcDataMigrationTypes.xsd and is used to set the status for auto-tiering data object tasks in createDataObjectMigrationTask. However, the getDataMigrationTasks method in HpcDataMigrationServiceImpl uses a SQL query that fetches tasks by STATUS = ? with a single status value. The processBulkAutoTieringMigrationReceived scheduler method queries with HpcDataMigrationStatus.RECEIVED and type BULK_AUTO_TIERING, which would not include AUTO_TIERING_RECEIVED tasks. It is unclear whether AUTO_TIERING_RECEIVED individual data-object tasks are ever picked up and processed — there is no handler for AUTO_TIERING_RECEIVED data-object tasks in any of the processXxx methods.

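If the separate AUTO_TIERING_RECEIVED status is kept, the polling query would need to match a set of statuses rather than a single value. A hedged sketch of building such a predicate (this helper is an assumption for illustration, not code from this PR; the status strings are then bound as ordinary query parameters):

```java
import java.util.List;
import java.util.stream.Collectors;

public class MigrationStatusQuery {

  // Build a "STATUS IN (?, ?, ...)" clause so a single polling query can fetch
  // both RECEIVED and AUTO_TIERING_RECEIVED tasks; the caller binds the status
  // values as parameters in the same order.
  static String statusInClause(List<String> statuses) {
    String placeholders =
        statuses.stream().map(s -> "?").collect(Collectors.joining(", "));
    return "STATUS IN (" + placeholders + ")";
  }
}
```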
Comment on lines +101 to +109
private void dbConnect() throws HpcException {
try {
jdbcTemplate.getDataSource().getConnection();

} catch (Exception e) {
throw new HpcException("Failed to connect to Trino DB. Check connection & credentials config",
HpcErrorType.DATABASE_ERROR, HpcIntegratedSystem.VAST, e);
}
}

Copilot AI Mar 8, 2026

The dbConnect() method calls jdbcTemplate.getDataSource().getConnection() without closing the returned connection. This causes a connection leak every time the bean is initialized. The connection should be closed in a finally block or use a try-with-resources statement. Note that the existing Oracle DAO HpcDataManagementConfigurationDAOImpl.dbConnect() has the same pattern, but since this is new code it should be written correctly.

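A minimal leak-free version of the probe, assuming the same connectivity-check intent (exception handling is simplified here; the PR's code wraps failures in HpcException):

```java
import java.sql.Connection;
import javax.sql.DataSource;

public class TrinoConnectivityCheck {

  // Probe the Trino data source once at bean initialization. try-with-resources
  // guarantees the test connection is closed (returned to the pool), avoiding
  // the connection leak flagged in the review.
  static void dbConnect(DataSource dataSource) {
    try (Connection connection = dataSource.getConnection()) {
      // Connection acquired successfully; it is auto-closed on exiting this block.
    } catch (Exception e) {
      throw new IllegalStateException(
          "Failed to connect to Trino DB. Check connection & credentials config", e);
    }
  }
}
```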
Comment on lines +281 to +284
hpc.dao.trino.host=trino.fairway.nci.nih.gov
hpc.dao.trino.port=443
hpc.dao.trino.username=ncif-hpcdm-svc
hpc.dao.trino.password=< Configure Me >

Copilot AI Mar 8, 2026

The property file contains a literal placeholder password < Configure Me > committed to the repository. Credentials should never be stored in property files in version control, not even as placeholders. This file should reference an environment variable, a vault secret, or an external configuration store. Additionally, this is a template-like properties file checked in for a specific environment, which is a broader concern.

Comment on lines +57 to +59
<xsd:element name="posixPath" type="xsd:string" />
<xsd:element name="autoTieringSearchPath" type="xsd:string" />
<xsd:element name="autoTieringInactivityMonths" type="xsd:int" minOccurs="0" />

Copilot AI Mar 8, 2026

The posixPath field in the HpcS3ArchiveConfiguration complex type was changed from minOccurs="0" (optional) to required (no minOccurs attribute, which defaults to 1). This is a breaking schema change: any existing XML documents that omit posixPath will now fail validation, and any existing configurations in the database that do not have a posixPath set will cause deserialization failures at runtime. If this field is truly now required for all S3 archive configurations, existing database records and any client-side DTOs that build these objects must be updated accordingly.

Comment on lines +564 to +579
public void processAutoTiering() throws HpcException {
// obtain a list of HPC data management configurations which are to be included in auto-tiering scan and
// create a task for each.
dataManagementService.getAutoTieringDataManagementConfigurations().forEach(dataManagementConfiguration -> {
try {
logger.info("Create bulk auto-tiering task for data management configuration: {}",
dataManagementConfiguration.getId());
dataMigrationService.createBulkAutoTieringTask(dataManagementConfiguration.getId(),
dataManagementConfiguration.getS3UploadConfigurationId(),
dataManagementConfiguration.getS3AutoTieringConfigurationId(), null);

} catch (HpcException e) {
logger.error("Failed to process auto-tiering for data management configuration: {}",
dataManagementConfiguration.getId(), e);
}
});

Copilot AI Mar 8, 2026

The processAutoTiering() (no-arg, called by the scheduler) uses dataManagementConfiguration.getS3UploadConfigurationId() as the fromS3ArchiveConfigurationId when calling createBulkAutoTieringTask. However, for auto-tiering the source is the external VAST archive, not the S3 upload configuration. The upload configuration ID represents where files are uploaded to, not the external source being scanned. The fromS3ArchiveConfigurationId argument in createBulkAutoTieringTask should probably be the same configuration that has autoTieringSearchPath and autoTieringInactivityMonths defined, which is the auto-tiering source configuration, not the upload configuration.

// Get data objects eligible for auto-tiering and create individual migration tasks
processBulkAutoTieringMigration(bulkAutoTieringTask);

// Mark the bulk metadata update task - in-progress

Copilot AI Mar 8, 2026

The comment on line 635 says "Mark the bulk metadata update task - in-progress" but the code is actually processing a bulk auto-tiering task, not a bulk metadata update task. This is a copy-paste error from the processBulkMetadataUpdateMigrationReceived method.

Suggested change
// Mark the bulk metadata update task - in-progress
// Mark the bulk auto-tiering task - in-progress

bulkAutoTieringTask.getFromS3ArchiveConfigurationId(),
bulkAutoTieringTask.getToS3ArchiveConfigurationId(),
bulkAutoTieringTask.getId(), bulkAutoTieringTask.getAlignArchivePath(),
null, null, bulkAutoTieringTask.getRetryUserId(),

Copilot AI Mar 8, 2026

In processBulkAutoTieringMigration, bulkAutoTieringTask.getRetryUserId() is passed as the retryUserId parameter to createDataObjectMigrationTask, but for a new auto-tiering task (not a retry), this value would typically be null. However, retryTaskId is passed as null while retryUserId is taken from the bulk task's retryUserId. This is inconsistent — if retryTaskId is null, retryUserId should also be null. This looks like a copy-paste mistake.

Suggested change
null, null, bulkAutoTieringTask.getRetryUserId(),
null, null, null,

Comment on lines +79 to +81
return jdbcTemplate.queryForList(
GET_FILES_NOT_ACCESSED_SQL.replace("{months}", monthsNotAccessed.toString()),
String.class, searchPath, searchPath);

Copilot AI Mar 8, 2026

The SQL query substitutes {months} using plain string replacement (GET_FILES_NOT_ACCESSED_SQL.replace("{months}", monthsNotAccessed.toString())). Even though monthsNotAccessed is an Integer (so there is no direct SQL injection risk from that field), the substitution bypasses JDBC parameterized query handling. Trino's JDBC driver may support parameterized INTERVAL expressions; if so, this should use a proper bind parameter instead of string concatenation. At minimum, a comment should explain why string substitution is used here rather than a bind parameter.

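One way to avoid the string substitution entirely is to compute the cutoff timestamp in Java and bind it as a plain JDBC parameter. A sketch under the assumption that the external archive table exposes a last-access timestamp column (the table and column names below are illustrative, not the PR's actual SQL):

```java
import java.sql.Timestamp;
import java.time.LocalDateTime;

public class AutoTieringCutoff {

  // Compute the "not accessed since" cutoff in Java instead of splicing
  // "{months}" into the SQL text, so the query can use an ordinary bind parameter.
  static Timestamp cutoff(int monthsNotAccessed, LocalDateTime now) {
    return Timestamp.valueOf(now.minusMonths(monthsNotAccessed));
  }

  // Illustrative usage with a fully parameterized query:
  //   String sql = "SELECT path FROM archive_files "
  //       + "WHERE path LIKE ? AND last_access_time < ?";
  //   jdbcTemplate.queryForList(sql, String.class, searchPath + "%",
  //       cutoff(monthsNotAccessed, LocalDateTime.now()));
}
```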
Comment on lines +1 to +42
--
-- hpc_auto_tiering.sql
--
-- Copyright SVG, Inc.
-- Copyright Leidos Biomedical Research, Inc
--
-- Distributed under the OSI-approved BSD 3-Clause License.
-- See http://ncip.github.com/HPC/LICENSE.txt for details.
--
--
-- @author <a href="mailto:rosenbergea@nih.gov">Eran Rosenberg</a>
--

-- HPC_S3_ARCHIVE_CONFIGURATION
ALTER TABLE HPC_S3_ARCHIVE_CONFIGURATION add (
AUTO_TIERING_SEARCH_PATH VARCHAR2(2700)
);
COMMENT ON COLUMN HPC_S3_ARCHIVE_CONFIGURATION.AUTO_TIERING_SEARCH_PATH IS 'The search path for auto-tiering external archive files';

ALTER TABLE HPC_S3_ARCHIVE_CONFIGURATION add (
AUTO_TIERING_INACTIVITY_MONTHS NUMBER(10)
);
COMMENT ON COLUMN HPC_S3_ARCHIVE_CONFIGURATION.AUTO_TIERING_INACTIVITY_MONTHS IS 'The inactivity period in months before files are auto-tiered to Glacier Deep Archive';

-- HPC_DATA_MANAGEMENT_CONFIGURATION
ALTER TABLE HPC_DATA_MANAGEMENT_CONFIGURATION add (
S3_AUTO_TIERING_ARCHIVE_CONFIGURATION_ID VARCHAR2(50)
);
COMMENT ON COLUMN HPC_DATA_MANAGEMENT_CONFIGURATION.S3_AUTO_TIERING_ARCHIVE_CONFIGURATION_ID IS 'The S3 archive configuration ID for auto-tiering external archive into';

-- HPC_DATA_MIGRATION_TASK
ALTER TABLE HPC_DATA_MIGRATION_TASK add (
FROM_S3_ARCHIVE_LOCATION_FILE_CONTAINER_ID VARCHAR2(50)
);
COMMENT ON COLUMN HPC_DATA_MIGRATION_TASK.FROM_S3_ARCHIVE_LOCATION_FILE_CONTAINER_ID IS 'The archive file container ID (bucket) migrating from';

ALTER TABLE HPC_DATA_MIGRATION_TASK add (
FROM_S3_ARCHIVE_LOCATION_FILE_ID VARCHAR2(2700)
);
COMMENT ON COLUMN HPC_DATA_MIGRATION_TASK.FROM_S3_ARCHIVE_LOCATION_FILE_ID IS 'The archive file ID (object ID) migrating from';


Copilot AI Mar 8, 2026

The identical SQL migration script hpc_auto_tiering.sql has been added to both release-3.23.0/ and release-3.24.0/. Only release-3.24.0/ should contain this script, since the feature is part of version 3.24.0. Including it under release-3.23.0/ is likely a mistake and would cause the DDL to be applied twice if both release directories are executed during an upgrade from 3.22.x to 3.24.0, leading to "column already exists" errors.
