@edmundmiller

This commit adds a new fromDataset() operator to the nf-tower plugin
that allows downloading datasets from Seqera Platform.

Key Features:

  • Downloads dataset files from Seqera Platform via the API
  • Supports version specification (defaults to version 1)
  • Supports custom file names (defaults to data.csv)
  • Returns dataset content as a String for further processing
  • Works with nf-schema: the returned content can be passed to samplesheetToList (see Usage Examples)
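
The defaults listed above could be resolved along these lines. This is a minimal sketch, not the plugin's actual code: `resolveDatasetOptions` is a hypothetical helper name, and only the option names (`datasetId`, `version`, `fileName`) and the two default values come from this description.

```groovy
// Hypothetical helper, for illustration only (not part of nf-tower).
Map resolveDatasetOptions(Map opts) {
    // In this sketch datasetId is the only required option
    def datasetId = opts?.datasetId?.toString()?.trim()
    if (!datasetId)
        throw new IllegalArgumentException('datasetId is required')
    return [
        datasetId: datasetId,
        version  : (opts.version ?: '1').toString(),        // default: version 1
        fileName : (opts.fileName ?: 'data.csv').toString() // default: data.csv
    ]
}

// Only the dataset id is given, so both defaults apply
assert resolveDatasetOptions(datasetId: 'my-dataset-id') ==
    [datasetId: 'my-dataset-id', version: '1', fileName: 'data.csv']
```

Normalizing `version` to a String means a caller can pass either `2` or `'2'`.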

Usage Examples:

// Basic usage - download default file from dataset
def content = Channel.fromDataset('my-dataset-id')

// With nf-schema integration
ch_input = Channel.fromList(
    samplesheetToList(Channel.fromDataset(params.input), "assets/schema_input.json")
)

// Specify version and filename
def dataset = Channel.fromDataset(
    datasetId: 'my-dataset-id',
    version: '2',
    fileName: 'samples.csv'
)
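
Because the content comes back as a String, it can also be processed without nf-schema. The sketch below assumes a simple comma-separated file with a header row and no quoted fields; `parseCsv` is a hypothetical helper, not something this PR provides.

```groovy
// Hypothetical helper: naive CSV parsing (no quoted fields, no embedded commas).
List<Map<String, String>> parseCsv(String content) {
    def lines = content.readLines().findAll { it.trim() }   // drop blank lines
    if (!lines) return []
    def header = lines.head().split(',')*.trim()
    return lines.tail().collect { line ->
        def cells = line.split(',', -1)*.trim()              // -1 keeps trailing empty cells
        [header, cells].transpose().collectEntries { pair -> [(pair[0]): pair[1]] }
    }
}

// Stand-in for the String returned by Channel.fromDataset(...)
def content = "sample,fastq_1\nA,a.fq.gz\nB,b.fq.gz\n"
def rows = parseCsv(content)
assert rows.size() == 2
assert rows[0] == [sample: 'A', fastq_1: 'a.fq.gz']
```

For real samplesheets, the `samplesheetToList` route shown above is preferable, since it also validates the rows against the schema.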

Implementation Details:

  • DatasetHelper: Handles API communication with Seqera Platform
  • TowerChannelExtension: Provides the Channel extension method
  • Uses the Groovy extension module mechanism to add the method to Channel
  • Authenticates with the TOWER_ACCESS_TOKEN environment variable
  • Handles HTTP error responses (403, 404, 500, etc.)
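
The request-building and error-handling steps above can be sketched as follows. All three function names are hypothetical and do not reflect the real `DatasetHelper` API. The URL layout and the Bearer header are assumptions about the Seqera Platform API, and the error messages are illustrative.

```groovy
// Assumed endpoint layout; verify against the Seqera Platform API reference.
String datasetDownloadUrl(String endpoint, String workspaceId, String datasetId,
                          String version, String fileName) {
    def base = endpoint.replaceAll('/+$', '')   // strip trailing slashes
    return "${base}/workspaces/${workspaceId}/datasets/${datasetId}/v/${version}/n/${fileName}"
}

// Builds the auth header from the environment (the map is injectable for testing)
Map<String, String> authHeaders(Map<String, String> env = System.getenv()) {
    def token = env.get('TOWER_ACCESS_TOKEN')
    if (!token)
        throw new IllegalStateException('TOWER_ACCESS_TOKEN is not set')
    return [Authorization: "Bearer ${token}".toString()]
}

// Maps an HTTP status code to a readable message (wording is illustrative)
String describeHttpError(int status, String datasetId) {
    switch (status) {
        case 403: return "Access denied to dataset '${datasetId}' (HTTP 403)"
        case 404: return "Dataset '${datasetId}' not found (HTTP 404)"
        default:
            return status >= 500 ?
                "Seqera Platform server error (HTTP ${status})" :
                "Unexpected response (HTTP ${status})"
    }
}

assert authHeaders([TOWER_ACCESS_TOKEN: 'abc']) == [Authorization: 'Bearer abc']
assert describeHttpError(404, 'my-dataset-id') == "Dataset 'my-dataset-id' not found (HTTP 404)"
```

Keeping URL construction and status mapping as pure functions lets them be unit tested without network calls.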

TODOs for future enhancements:

  • Add support for listing datasets (using /datasets API endpoint)
  • Auto-detect latest version when not specified
  • Query dataset metadata to determine actual filename
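
The last two TODOs could be approached as sketched below, assuming version metadata has already been fetched and parsed into a list of maps. The `version` and `fileName` keys are a hypothetical response shape, not the documented API, and `latestVersionEntry` is an illustrative name.

```groovy
// Hypothetical: picks the metadata entry with the highest version number.
Map latestVersionEntry(List<Map> versions) {
    if (!versions)
        throw new IllegalArgumentException('dataset has no versions')
    // toString().toInteger() accepts both numeric and String version values
    return versions.max { it.version.toString().toInteger() }
}

def versions = [
    [version: 1, fileName: 'samples.csv'],
    [version: 3, fileName: 'samples_v3.csv'],
    [version: '2', fileName: 'samples_v2.csv']
]
def latest = latestVersionEntry(versions)
assert latest.version == 3
assert latest.fileName == 'samples_v3.csv'
```

With an entry like this available, the hard-coded defaults (version 1, data.csv) would only be needed as a fallback.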

Related to PR nextflow-io#6515 (dataset upload functionality)

Signed-off-by: Edmund Miller [email protected]

The previous tests were not actually testing anything meaningful:
- They didn't properly mock DatasetHelper construction
- They would try to make real network calls
- Assertions were weak (just checking != null or noExceptionThrown)

Kept only the meaningful tests:
- Parameter validation tests (which actually work)
- One test that verifies parameter conversion without mocking hell

The core logic is properly tested in DatasetHelperTest which has
real mocks and assertions.