
Conversation

@ehigham ehigham commented Jul 23, 2025

This change removes uses of HailContext.backend as part of an effort to remove the HailContext singleton. Instead, the current Backend is accessed by threading an ExecuteContext.

This change cannot impact the Hail Batch instance as deployed by the Broad Institute in GCP.
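The pattern this PR applies, replacing a global singleton lookup with an explicitly threaded context, can be sketched in miniature as follows. This is a hypothetical Python stand-in for illustration, not Hail's actual Scala classes:

```python
# Hypothetical sketch of the refactoring pattern (illustrative names, not Hail's API).

class Backend:
    def default_parallelism(self):
        return 8

class ExecuteContext:
    """Carries per-execution services, so callers inject rather than look up."""
    def __init__(self, backend):
        self.backend = backend

# Before: library code reaches out to a global singleton to find the backend.
HAIL_CONTEXT_BACKEND = Backend()

def num_partitions_global(n_partitions=None):
    return n_partitions or HAIL_CONTEXT_BACKEND.default_parallelism()

# After: the backend travels with an explicit execution-context argument.
def num_partitions(ctx, n_partitions=None):
    return n_partitions or ctx.backend.default_parallelism()

ctx = ExecuteContext(Backend())
print(num_partitions(ctx))     # 8
print(num_partitions(ctx, 4))  # 4
```

The benefit is that the backend becomes an ordinary argument rather than ambient mutable state, which makes call sites testable and avoids hidden ordering dependencies on singleton initialisation.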

@ehigham ehigham force-pushed the ehigham/remove-hail-context-references branch from 76131f2 to 20d92fc Compare July 23, 2025 03:41
@ehigham ehigham force-pushed the ehigham/branching-factor-flags branch from 3adc65e to 11403ed Compare July 23, 2025 03:41
@ehigham ehigham force-pushed the ehigham/remove-hail-context-references branch from 20d92fc to 219a9e3 Compare July 23, 2025 03:49
@ehigham ehigham force-pushed the ehigham/branching-factor-flags branch from 11403ed to 2e67e7b Compare July 23, 2025 03:49
@ehigham ehigham force-pushed the ehigham/remove-hail-context-references branch from 219a9e3 to c5ac897 Compare July 23, 2025 15:32
@ehigham ehigham force-pushed the ehigham/branching-factor-flags branch from 2e67e7b to 417e8a3 Compare July 23, 2025 15:32
@ehigham ehigham force-pushed the ehigham/branching-factor-flags branch from 417e8a3 to e739136 Compare July 30, 2025 20:07
@ehigham ehigham force-pushed the ehigham/remove-hail-context-references branch 2 times, most recently from d69c346 to c870e2a Compare July 31, 2025 16:12
@ehigham ehigham force-pushed the ehigham/branching-factor-flags branch 2 times, most recently from 2d38548 to c3bb8a1 Compare July 31, 2025 16:38
@ehigham ehigham force-pushed the ehigham/remove-hail-context-references branch from c870e2a to d32eac1 Compare July 31, 2025 16:38
@ehigham ehigham force-pushed the ehigham/branching-factor-flags branch from c3bb8a1 to 24a200f Compare July 31, 2025 17:41
@ehigham ehigham force-pushed the ehigham/remove-hail-context-references branch from d32eac1 to 214074f Compare July 31, 2025 17:41
@ehigham ehigham force-pushed the ehigham/branching-factor-flags branch from 24a200f to ae470ba Compare July 31, 2025 17:41
@ehigham ehigham force-pushed the ehigham/remove-hail-context-references branch from 214074f to 4d26e71 Compare July 31, 2025 17:41
@ehigham ehigham force-pushed the ehigham/branching-factor-flags branch from ae470ba to c164f2f Compare August 1, 2025 14:58
@ehigham ehigham force-pushed the ehigham/remove-hail-context-references branch from 4d26e71 to 32b7556 Compare August 1, 2025 14:58
@ehigham ehigham force-pushed the ehigham/branching-factor-flags branch from c164f2f to 4538bfd Compare August 1, 2025 22:11
@ehigham ehigham force-pushed the ehigham/remove-hail-context-references branch 2 times, most recently from dfa72f0 to aaf34a2 Compare August 1, 2025 22:16
@ehigham ehigham force-pushed the ehigham/branching-factor-flags branch 2 times, most recently from 9d81d62 to 4f0cd2b Compare August 2, 2025 04:21
@ehigham ehigham force-pushed the ehigham/remove-hail-context-references branch from aaf34a2 to c75ea75 Compare August 2, 2025 04:21
@ehigham ehigham force-pushed the ehigham/branching-factor-flags branch from 4f0cd2b to c070764 Compare August 2, 2025 04:59
@ehigham ehigham force-pushed the ehigham/remove-hail-context-references branch from c75ea75 to 3cc18ef Compare August 2, 2025 04:59
@ehigham ehigham force-pushed the ehigham/branching-factor-flags branch from c070764 to 8834837 Compare August 2, 2025 05:02
@ehigham ehigham force-pushed the ehigham/move-check-rvd-keys branch 2 times, most recently from 33d0b37 to 840e82a Compare September 17, 2025 18:01
@ehigham ehigham force-pushed the ehigham/remove-hail-context-references branch from 8760f47 to 2d586d3 Compare September 17, 2025 18:01
Comment on lines -195 to -202
  def parseVCFMetadata(fs: FS, file: String): Map[String, Map[String, Map[String, String]]] =
    LoadVCF.parseHeaderMetadata(fs, Set.empty, TFloat64, file)

  def pyParseVCFMetadataJSON(fs: FS, file: String): String = {
    val metadata = LoadVCF.parseHeaderMetadata(fs, Set.empty, TFloat64, file)
    implicit val formats = defaultJSONFormats
    JsonMethods.compact(Extraction.decompose(metadata))
  }
@ehigham (Member, Author) commented:
Unused/implemented in BackendRpc

@chrisvittal (Collaborator) left a comment:

I had a thought to avoid the 'hack' you put in. Fantastic work though.

  def pathsUsed: Seq[String] = FastSeq(params.path)

-  val getNumPartitions: Int = params.nPartitions.getOrElse(HailContext.backend.defaultParallelism)
+  val getNumPartitions: Int = params.nPartitions.getOrElse(4)
Collaborator commented:

Thought: make this parameter required, and then supply it with a default from Python.

This is where we construct this node in Python. What if we were to get the parallelism from the branching_factor flag here?

    @typecheck_method(n_partitions=nullable(int), maximum_cache_memory_in_bytes=nullable(int))
    def to_table_row_major(self, n_partitions=None, maximum_cache_memory_in_bytes=None):
        """Returns a table where each row represents a row in the block matrix.

        The resulting table has the following fields:
            - **row_idx** (:py:data:`.tint64`, key field) -- Row index
            - **entries** (:py:class:`.tarray` of :py:data:`.tfloat64`) -- Entries for the row

        Examples
        --------
        >>> import numpy as np
        >>> block_matrix = BlockMatrix.from_numpy(np.array([[1, 2], [3, 4], [5, 6]]), 2)
        >>> t = block_matrix.to_table_row_major()
        >>> t.show()
        +---------+---------------------+
        | row_idx | entries             |
        +---------+---------------------+
        |   int64 | array<float64>      |
        +---------+---------------------+
        |       0 | [1.00e+00,2.00e+00] |
        |       1 | [3.00e+00,4.00e+00] |
        |       2 | [5.00e+00,6.00e+00] |
        +---------+---------------------+

        Parameters
        ----------
        n_partitions : int or None
            Number of partitions of the table.
        maximum_cache_memory_in_bytes : int or None
            The amount of memory to reserve, per partition, to cache rows of the
            matrix in memory. This value must be at least large enough to hold
            one row of the matrix in memory. If this value is exactly the size of
            one row, then a partition makes a network request for every row of
            every block. Larger values reduce the number of network requests. If
            memory permits, setting this value to the size of one output
            partition permits one network request per block per partition.

        Notes
        -----
        Does not support block-sparse matrices.

        Returns
        -------
        :class:`.Table`
            Table where each row corresponds to a row in the block matrix.
        """
        path = new_temp_file()
        if maximum_cache_memory_in_bytes and maximum_cache_memory_in_bytes > (1 << 31) - 1:
            raise ValueError(
                f'maximum_cache_memory_in_bytes must be less than 2^31 - 1, was: {maximum_cache_memory_in_bytes}'
            )
        self.write(path, overwrite=True, force_row_major=True)
        reader = TableFromBlockMatrixNativeReader(path, n_partitions, maximum_cache_memory_in_bytes)
        return Table(TableRead(reader))
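The suggestion above could look roughly like the following on the Python side. This is a hypothetical sketch: `resolve_n_partitions`, the flags dict, and `DEFAULT_BRANCHING_FACTOR` are illustrative names, not Hail's actual API.

```python
# Hypothetical sketch: make n_partitions required on the Scala reader and
# resolve a concrete default in Python from the branching_factor flag.

DEFAULT_BRANCHING_FACTOR = 50  # assumed fallback, for illustration only

def resolve_n_partitions(n_partitions, flags):
    """Return an explicit partition count to pass to the reader constructor."""
    if n_partitions is not None:
        return n_partitions
    branching_factor = flags.get('branching_factor')
    if branching_factor is not None:
        return int(branching_factor)
    return DEFAULT_BRANCHING_FACTOR

print(resolve_n_partitions(16, {}))                            # 16
print(resolve_n_partitions(None, {'branching_factor': '32'}))  # 32
```

With this shape, the reader would always receive a concrete partition count, and the hard-coded `getOrElse(4)` fallback on the Scala side could be dropped.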

    val remainingPartitions =
      contexts.indices.filterNot(k => cachedResults.containsOrdered[Int](k, _ < _, _._2))

    val backend = HailContext.backend
Member commented:

I see you get rid of this upstack, but I'm curious if there's a simple rule for when we get the backend via an ExecuteContext, vs when we still need to get a HailContext (for now, later using a different mechanism). Is it just a compile-time vs runtime distinction?

@ehigham (Member, Author) commented Sep 24, 2025:

I've been proceeding on the basis of avoiding "global" mutable fields entirely, instead favouring dependency injection in code that we maintain (via ExecuteContext in this case). For generated code, using a constant pool or some such is probably the right thing to do, so long as it doesn't depend on non-generated code.

For Backend specifically, my intention is that the ref in the upcoming change should only be used by BackendUtils.collectDArray. My hope is to remove the mutable ref eventually, either by

  • code-generating parallelizeAndComputeWithIndex, or
  • initialising a "constant" field in the generated code with either
    • the backend, in a similar way to reference genomes etc., or
    • the BackendContext that's passed to collectDArray

I'm not sure if that answers your question properly...
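The last option described, initialising a "constant" field in generated code from the context passed to collectDArray, might be sketched like this. It is an illustrative Python stand-in for code-generated classes, not Hail's actual implementation:

```python
# Hypothetical sketch: generated code holds a slot initialised exactly once
# from an injected context, rather than reading a global mutable ref.

class GeneratedFunction:
    """Stands in for a code-generated class with a constant-pool-like field."""

    def __init__(self):
        self._backend_context = None  # filled in exactly once before use

    def set_backend_context(self, backend_context):
        if self._backend_context is not None:
            raise RuntimeError('backend context already initialised')
        self._backend_context = backend_context

    def collect_d_array(self, contexts):
        # Uses the injected context instead of a HailContext.backend lookup.
        backend = self._backend_context['backend']
        return [backend(c) for c in contexts]

f = GeneratedFunction()
f.set_backend_context({'backend': lambda c: c * 2})
print(f.collect_d_array([1, 2, 3]))  # [2, 4, 6]
```

The write-once slot keeps the generated code free of any dependency on non-generated singleton state, which is the constraint the comment above raises.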

@ehigham (Member, Author) commented:

I might just do that last one now...

Member commented:

I'm not sure if that answers your question properly...

It does, thanks!

@ehigham ehigham force-pushed the ehigham/remove-hail-context-references branch 3 times, most recently from c84b0e3 to 1fb57c6 Compare September 25, 2025 16:30
@ehigham ehigham force-pushed the ehigham/remove-hail-context-references branch 3 times, most recently from f8dccee to 7bd6dc4 Compare September 25, 2025 17:09
@patrick-schultz (Member) left a comment:

Great change!

@ehigham ehigham force-pushed the ehigham/remove-hail-context-references branch from 7bd6dc4 to 52f5a2d Compare September 25, 2025 18:37
@hail-ci-robot hail-ci-robot merged commit 69d8951 into main Sep 25, 2025
2 checks passed
@hail-ci-robot hail-ci-robot deleted the ehigham/remove-hail-context-references branch September 25, 2025 20:36