@sosoihd sosoihd commented Nov 27, 2025

Add support for latest-generation Google Cloud machine families and boot disk type configuration

Problem

Two critical limitations prevented full utilization of Google Cloud Batch capabilities:

1. Missing support for latest-generation machine families

Google Cloud has introduced several new general-purpose machine families that are not currently supported by Nextflow: C4, C4A, C4D, N4, N4A, and N4D.

These families offer significant improvements in:

  • Price-performance ratio (up to 10% better than previous generations)
  • Memory bandwidth
  • Network throughput
  • Energy efficiency

Without this support, users cannot leverage:

  • Recent Intel Xeon processors (C4, N4)
  • Recent AMD EPYC processors (C4D, N4D)
  • Google Axion Arm-based processors (C4A, N4A)
  • Improved performance characteristics of these newer families

2. Inability to specify boot disk type

Currently, Nextflow only allows configuring the boot disk size via google.batch.bootDiskSize, but not the disk type. This creates several issues:

Compatibility problems:

  • The new C4, C4A, C4D, N4, N4A, and N4D families do not support pd-balanced disks (the Google Cloud default)
  • These families require a Hyperdisk type such as hyperdisk-balanced
  • Without a way to set the disk type, these new machine families cannot be used at all

Performance optimization:

  • High-I/O workloads may benefit from pd-ssd (higher IOPS)
  • Cost-sensitive workflows may prefer pd-standard (lower cost)
  • Users cannot optimize disk performance for their specific workloads

Reference: Google Cloud Disk Types Documentation

Solution

This PR addresses both issues:

1. Add support for latest-generation machine families

Machine type recognition:

  • Added C4, C4A, C4D, N4, N4A, N4D families to GENERAL_PURPOSE_FAMILIES
  • Updated local SSD handling for C4/C4A/C4D families with -lssd suffix
  • Disabled local SSD for N4, N4A, N4D families (not supported by hardware)
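
As a rough illustration of the per-family local SSD rules listed above, here is a minimal Groovy sketch. The class, constant, and method names are hypothetical and are not taken from the PR; the only facts it encodes are the ones stated in the bullets (C4/C4A/C4D use the -lssd suffix, N4/N4A/N4D have no local SSD). Returning null for any other family is an assumption meaning "defer to the existing handling".

```groovy
// Illustrative sketch only: names are hypothetical, not the PR's actual code.
class LocalSsdSupport {
    // Families that expose local SSD through an "-lssd" machine type suffix
    static final List<String> LSSD_SUFFIX_FAMILIES = ['c4', 'c4a', 'c4d']
    // Families without local SSD support
    static final List<String> NO_LOCAL_SSD_FAMILIES = ['n4', 'n4a', 'n4d']

    // Returns true/false for the new families, or null for families
    // this sketch does not cover (assumption: handled elsewhere).
    static Boolean supportsLocalSsd(String machineType) {
        final family = machineType.tokenize('-')[0]
        if( family in NO_LOCAL_SSD_FAMILIES )
            return false
        if( family in LSSD_SUFFIX_FAMILIES )
            return machineType.endsWith('-lssd')
        return null
    }
}
```

With this sketch, a machine type such as c4-standard-8-lssd is treated as having local SSD, while c4-standard-8 and any n4-* type are not.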

Testing:

  • Added comprehensive tests for all new machine families
  • Verified local SSD behavior (supported/not supported) per family

2. Add bootDiskType configuration option

New configuration parameter:

google {
    project = 'your-project-id'
    location = 'us-central1'
    batch {
        bootDiskSize = '50 GB'
        bootDiskType = 'hyperdisk-balanced'  // NEW: Specify disk type
    }
}

Supported disk types (Google Cloud documentation):

  • pd-standard - Standard persistent disk (HDD, lowest cost)
  • pd-balanced - Balanced persistent disk (SSD, default for most instances)
  • pd-ssd - SSD persistent disk (highest performance)
  • hyperdisk-balanced - Hyperdisk balanced (required for C4/N4 families)

Key features:

  • Optional configuration (backward compatible)
  • Works with bootDiskImage when both are specified
  • Ignored when using instance templates (with warning)
  • Enables use of new machine families that require specific disk types
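
The behaviour described in these bullets could be sketched roughly as follows. This is not the PR's implementation: BootDiskResolver and resolve are hypothetical names, plain maps stand in for the real config and job request objects, and the sketch assumes that an instance template governs all boot disk settings.

```groovy
// Illustrative sketch only: hypothetical names, plain maps instead of the real Batch API types.
class BootDiskResolver {
    // opts: values read from google.batch (bootDiskType, bootDiskImage)
    // warnings: collects messages a real implementation would log
    static Map<String,String> resolve(Map<String,String> opts, boolean usesInstanceTemplate, List<String> warnings) {
        final result = [:] as Map<String,String>
        if( usesInstanceTemplate ) {
            // Assumption: the template defines the boot disk, so the option is ignored
            if( opts.bootDiskType )
                warnings << 'bootDiskType is ignored when an instance template is used'
            return result
        }
        if( opts.bootDiskImage )
            result.image = opts.bootDiskImage
        if( opts.bootDiskType )
            result.type = opts.bootDiskType
        return result
    }
}
```

When neither option is set the result is empty, which corresponds to the backward-compatible default.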

Changes

Core Implementation

GoogleBatchMachineTypeSelector.groovy

  • Added GENERAL_PURPOSE_FAMILIES constant for C4/N4 family detection
  • Implemented isHyperdiskOnly() method to identify families requiring Hyperdisk
  • Updated findValidLocalSSDSize() to handle C4/C4D local SSD variants
  • Added logic to disable local SSD for N4/N4A/N4D families

BatchConfig.groovy

  • Added bootDiskType field with @ConfigOption annotation
  • Comprehensive documentation including machine family compatibility notes
  • Constructor initialization for the new parameter

GoogleBatchTaskHandler.groovy

  • Updated boot disk configuration logic to apply bootDiskType when specified
  • Added warning when bootDiskType is used with instance templates
  • Consolidated boot disk builder logic for clarity

Tests

GoogleBatchMachineTypeSelectorTest.groovy

  • Added tests for C4, C4A, C4D families with local SSD support
  • Added tests for N4, N4A, N4D families (no local SSD support)
  • Verified isHyperdiskOnly() behavior for all new families

BatchConfigTest.groovy

  • Added test for bootDiskType parsing from configuration
  • Added test for bootDiskType combined with other boot disk options
  • Verified default behavior (null when not specified)

GoogleBatchTaskHandlerTest.groovy

  • Added test for boot disk type configuration alone
  • Added test for boot disk type combined with boot disk image
  • Verified proper protobuf generation in job requests

Documentation

docs/reference/config.md

  • Added google.batch.bootDiskType to configuration reference
  • Documented all supported disk types with links to Google Cloud docs

docs/google.md

  • Updated disk directive documentation
  • Removed outdated limitation about disk type configuration

Compatibility

  • Fully backward compatible - All changes are additive
  • No breaking changes - Existing configurations work unchanged
  • Optional parameters - New features are opt-in
  • Instance template support - Proper handling with warnings

Testing

All tests pass:

  • ✅ 31 new tests for new machine families (local SSD handling)
  • ✅ 3 new tests for bootDiskType configuration
  • ✅ 110+ existing tests continue to pass
  • ✅ Integration with existing features verified

Test coverage:

  • Machine family detection and validation
  • Local SSD size validation per family
  • Boot disk type configuration parsing
  • Boot disk type in job request generation
  • Combined boot disk image + type scenarios
  • Instance template compatibility

Use Cases Enabled

1. Using latest-generation machines:

process myTask {
    machineType 'c4-standard-4'  // Now works!
    memory '16 GB'
    
    script:
    """
    # High-performance workload on a latest-generation Intel Xeon machine
    """
}

2. Optimizing for cost:

google.batch.bootDiskType = 'pd-standard'  // Lower cost HDD boot disk

3. Optimizing for performance:

google.batch.bootDiskType = 'pd-ssd'  // High-performance SSD boot disk

4. Using new machine families:

process highPerf {
    machineType 'c4a-standard-8'  // Google Axion (Arm-based)
    
    script:
    """
    # Requires a Hyperdisk boot disk such as hyperdisk-balanced
    """
}

google.batch.bootDiskType = 'hyperdisk-balanced'  // Compatible with C4A

References

docs: update Google Batch documentation to include bootDiskType option

fix: ensure compatibility with machine families requiring Hyperdisk

test(nf-google): add unit tests for boot disk type configurations

chore: update .gitignore to exclude 'mise.toml'

build: update build-info properties for nextflow module
Signed-off-by: Sofiane Ihaddadene <[email protected]>
@sosoihd sosoihd requested a review from a team as a code owner November 27, 2025 14:57

netlify bot commented Nov 27, 2025

Deploy Preview for nextflow-docs-staging ready!

Name Link
🔨 Latest commit 590ab80
🔍 Latest deploy log https://app.netlify.com/projects/nextflow-docs-staging/deploys/695ce08194c2c700085e29ff
😎 Deploy Preview https://deploy-preview-6616--nextflow-docs-staging.netlify.app

Collaborator

@christopher-hakkaart christopher-hakkaart left a comment

I can't comment on the code, but the docs are well written and I have no objections.

Contributor

@jorgee jorgee left a comment

I have done some tests and I suggest not allowing the boot disk type to be set. It requires strict validation of the boot disk type against the instance type, as each instance type supports a different set of disk types.

A user is likely to set a boot disk type without setting the instance type. In that case, jobs get stuck in the scheduled state because the disk type is not supported by the instance selected in GoogleBatchTaskHandler.findBestMachineType.

To have proper boot disk type support, the MachineTypeSelector must be aware of which disk types each instance type supports, and this is not the case in the current PR. If the goal of the PR is just to allow the use of the new instance types, it is better to just set Hyperdisk for those instances that do not support the default disk type.
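
For illustration, the narrower approach suggested here (pick Hyperdisk automatically for the families that need it) might look roughly like the Groovy sketch below. The family patterns are copied from the constant under review and isHyperdiskOnly is named in the PR description, but the class name, the defaultBootDiskType helper, and the prefix-matching logic are assumptions rather than the PR's actual code.

```groovy
// Illustrative sketch only: matching logic and helper names are assumptions.
class HyperdiskDefaults {
    // Patterns taken from the constant in the PR, under the name proposed in review
    static final List<String> HYPERDISK_ONLY_FAMILIES = ['c4-*', 'c4a-*', 'c4d-*', 'n4-*', 'n4d-*', 'n4a-*']

    static boolean isHyperdiskOnly(String machineType) {
        if( !machineType )
            return false
        return HYPERDISK_ONLY_FAMILIES.any { String pattern ->
            // Each pattern has the form "<family>-*", so match on the "<family>-" prefix
            machineType.startsWith(pattern.substring(0, pattern.length() - 1))
        }
    }

    // Returns the boot disk type to force, or null to keep the platform default
    static String defaultBootDiskType(String machineType) {
        return isHyperdiskOnly(machineType) ? 'hyperdisk-balanced' : null
    }
}
```

Because the decision depends only on the selected machine type, no user-facing disk type option is needed and the validation concern above does not arise.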

/**
 * Families that only support Hyperdisk (no standard PD)
 * LAST UPDATE 2024-05-22
 */
private static final List<String> GENERAL_PURPOSE_FAMILIES = ['c4-*', 'c4a-*', 'c4d-*', 'n4-*', 'n4d-*', 'n4a-*']
Contributor

Maybe rename as HYPERDISK_ONLY_FAMILIES.

Author

Thanks @jorgee for the feedback and testing. I agree—adding the boot disk type configuration adds too much complexity right now regarding validation.

I will remove the bootDiskType feature from this PR and stick to just enabling the new instance types (automatically setting Hyperdisk where needed). I'll handle the generic boot disk configuration in a separate, dedicated PR later to ensure the MachineTypeSelector logic is robust.

Author

I will also rename the constant to HYPERDISK_ONLY_FAMILIES as suggested, as it is much more descriptive.

Development

Successfully merging this pull request may close these issues.

Support for Google Cloud new generation machine families (C3, C4, N4) and Hyperdisk
