Features/hpcdatamgm 2097 version2#155
Features/hpcdatamgm 2097 version2#155saradachintala wants to merge 29 commits intoreleases/3.24.0from
Conversation
…tion&download from an external archive
…ODS file when deleting the Archive Link
… insted of App Layer
…ode between download of dataObjects and Collections
…taManagementConfigurationId
There was a problem hiding this comment.
Pull request overview
This PR adds support for downloading data objects and collections from an external archive into a Globus destination by introducing new “external source” download endpoints and propagating an externalArchiveFlag/RECEIVED_EXTERNAL state through the transfer/task pipeline.
Changes:
- Added REST API endpoints for external-source downloads (collection, data object, and bulk).
- Extended schemas/types to include
externalArchiveFlagand added a newRECEIVED_EXTERNALstatus for download task workflows. - Updated scheduler + bus/app/DAO layers to persist and process the new external-archive download flow.
Reviewed changes
Copilot reviewed 18 out of 18 changed files in this pull request and generated 14 comments.
Show a summary per file
| File | Description |
|---|---|
| src/hpc-server/hpc-ws-rs-impl/src/main/java/gov/nih/nci/hpc/ws/rs/impl/HpcDataManagementRestServiceImpl.java | Wires new REST methods to bus-service methods for external-source downloads. |
| src/hpc-server/hpc-ws-rs-api/src/main/java/gov/nih/nci/hpc/ws/rs/HpcDataManagementRestService.java | Adds new /ext/.../download REST endpoints. |
| src/hpc-server/hpc-scheduler/src/main/resources/WEB-INF/spring/hpc-scheduler.properties | Adds cron property for archive-link download task starter. |
| src/hpc-server/hpc-scheduler/src/main/java/gov/nih/nci/hpc/scheduler/impl/HpcScheduledTasksImpl.java | Adds scheduled task to start processing RECEIVED_EXTERNAL data object downloads. |
| src/hpc-server/hpc-dto/src/main/resources/schema/HpcDataManagement.v2.xsd | Adds optional externalArchiveFlag to download request DTOs. |
| src/hpc-server/hpc-domain-types/src/main/resources/schema/HpcDataTransferTypes.xsd | Adds RECEIVED_EXTERNAL statuses and externalArchiveFlag elements to transfer domain types. |
| src/hpc-server/hpc-dao-impl/src/main/scripts/migration/release-3.20.0/hpc_data_object_download_task_update.sql | Adds DB column for external archive flag (currently only for data object tasks). |
| src/hpc-server/hpc-dao-impl/src/main/java/gov/nih/nci/hpc/dao/oracle/impl/HpcDataDownloadDAOImpl.java | Persists/loads EXTERNAL_ARCHIVE_FLAG and updates SQL for tasks. |
| src/hpc-server/hpc-bus-service-impl/src/main/java/gov/nih/nci/hpc/bus/impl/HpcSystemBusServiceImpl.java | Processes external collection tasks and adds RECEIVED_EXTERNAL handling for data object tasks. |
| src/hpc-server/hpc-bus-service-impl/src/main/java/gov/nih/nci/hpc/bus/impl/HpcDataManagementBusServiceImpl.java | Implements external-source download methods and propagates the external flag into transfer service calls. |
| src/hpc-server/hpc-bus-service-api/src/main/java/gov/nih/nci/hpc/bus/HpcSystemBusService.java | Exposes startArchiveLinkDataObjectDownloadTasks() in the system bus API. |
| src/hpc-server/hpc-bus-service-api/src/main/java/gov/nih/nci/hpc/bus/HpcDataManagementBusService.java | Exposes new external-source download methods in the bus API. |
| src/hpc-server/hpc-app-service-impl/src/test/java/gov/nih/nci/hpc/service/impl/HpcDataTransferServiceImplTest.java | Updates tests for changed downloadDataObject signature. |
| src/hpc-server/hpc-app-service-impl/src/main/java/gov/nih/nci/hpc/service/impl/HpcDataTransferServiceImpl.java | Adds externalArchiveFlag plumbing, new external-status transition method, and status handling in resets. |
| src/hpc-server/hpc-app-service-impl/src/main/java/gov/nih/nci/hpc/service/impl/HpcDataManagementServiceImpl.java | Adds lookup for S3 archive configuration by external POSIX path. |
| src/hpc-server/hpc-app-service-impl/src/main/java/gov/nih/nci/hpc/service/impl/HpcDataManagementConfigurationLocator.java | Exposes S3 archive configurations for the external-path lookup. |
| src/hpc-server/hpc-app-service-api/src/main/java/gov/nih/nci/hpc/service/HpcDataTransferService.java | Extends service signatures and adds new external-status transition method. |
| src/hpc-server/hpc-app-service-api/src/main/java/gov/nih/nci/hpc/service/HpcDataManagementService.java | Adds API to find transfer configuration for an external path. |
Comments suppressed due to low confidence (1)
src/hpc-server/hpc-app-service-impl/src/main/java/gov/nih/nci/hpc/service/impl/HpcDataManagementServiceImpl.java:79
- There are duplicate imports for
gov.nih.nci.hpc.domain.model.HpcDataTransferConfiguration(imported twice). Please remove the duplicate to keep the imports clean and avoid potential style/lint failures.
import gov.nih.nci.hpc.domain.model.HpcDataManagementConfiguration;
import gov.nih.nci.hpc.domain.model.HpcDataTransferConfiguration;
import gov.nih.nci.hpc.domain.model.HpcDataObjectRegistrationRequest;
import gov.nih.nci.hpc.domain.model.HpcDataObjectRegistrationResult;
import gov.nih.nci.hpc.domain.model.HpcDataTransferConfiguration;
import gov.nih.nci.hpc.domain.model.HpcStorageRecoveryConfiguration;
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
You can also share your feedback on Copilot code review. Take the survey.
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
…as removed by Copilot
…t accidentally changed the string logic
dinhys
left a comment
There was a problem hiding this comment.
I've left a few comments, we can review it in our Dev meeting today.
| ALTER TABLE HPC_COLLECTION_DOWNLOAD_TASK add ( | ||
| EXTERNAL_ARCHIVE_FLAG CHAR default '0' | ||
| ); | ||
| COMMENT ON COLUMN HPC_COLLECTION_DOWNLOAD_TASK.EXTERNAL_ARCHIVE_FLAG IS 'Indicates if the data is stored externally'; |
There was a problem hiding this comment.
Do we want to retain this EXTERNAL_ARCHIVE_FLAG in HPC_DOWNLOAD_TASK_RESULT?
| globusDownloadDestination, s3DownloadDestination, googleDriveDownloadDestination, | ||
| googleCloudStorageDownloadDestination, asperaDownloadDestination, boxDownloadDestination, false, | ||
| false, userId, retryItem.getDestinationLocation(), collectionDownloadTaskId); | ||
| false, userId, retryItem.getDestinationLocation(), collectionDownloadTaskId, false); |
There was a problem hiding this comment.
Just confirming whether collection retries should be disabled for download from external archive.
| downloadTask.setExternalArchiveFlag(true); | ||
| dataTransferService.updateCollectionDownloadTask(downloadTask); | ||
| } | ||
|
|
There was a problem hiding this comment.
Process Collection Download task will now process all RECEIVED_EXTERNAL records first before processing RECEIVED. Is this ok if we have a big backlog for RECEIVED_EXTERNAL?
| * {@code "pathWithPosixPathRemoved"}. | ||
| * @throws HpcException on service failure or invalid configuration. | ||
| */ | ||
| private Map<String, String> externalDownloadDetailsMapping(HpcDataTransferConfiguration s3ArchiveConfiguration, String path) throws HpcException { |
There was a problem hiding this comment.
Should we name this with a verb getExternalDownloadDetailsMapping or Config instead of Detail? Just a thought.
| directoryScanRegistrationItem.setS3ArchiveConfigurationId(s3ArchiveConfiguration.getId()); | ||
| HpcBulkDataObjectRegistrationRequestDTO registrationBulkRequestDTO = new HpcBulkDataObjectRegistrationRequestDTO(); | ||
| registrationBulkRequestDTO.getDirectoryScanRegistrationItems().add(directoryScanRegistrationItem); | ||
| registerDataObjects(registrationBulkRequestDTO); |
There was a problem hiding this comment.
Do we need anything from the reponseDTO like task ID?
…is true in the S3Configuration
…nForExternalPath to look through dataManagementConfigurations rather than S3 configurations
This Pull Request contains the code for downloading a single DataObject and a single Collection from an External Archive to a Globus Destination.
The tickets associated with this PR are
HPCDATAMGM-2097: Implement External Download API for downloading a dataObject to Globus
HPCDATAMGM-2168: Implement Download API to download a dataObject from an External Archive to Globus
HPCDATAMGM-2172: Implement Download API to download a Collection from an External Archive to Globus