[query] remove uses of HailContext.backend
#14964
Conversation
def parseVCFMetadata(fs: FS, file: String): Map[String, Map[String, Map[String, String]]] =
  LoadVCF.parseHeaderMetadata(fs, Set.empty, TFloat64, file)

def pyParseVCFMetadataJSON(fs: FS, file: String): String = {
  val metadata = LoadVCF.parseHeaderMetadata(fs, Set.empty, TFloat64, file)
  implicit val formats = defaultJSONFormats
  JsonMethods.compact(Extraction.decompose(metadata))
}
Unused/implemented in BackendRpc
chrisvittal left a comment:
I had a thought to avoid the 'hack' you put in. Fantastic work though.
def pathsUsed: Seq[String] = FastSeq(params.path)

- val getNumPartitions: Int = params.nPartitions.getOrElse(HailContext.backend.defaultParallelism)
+ val getNumPartitions: Int = params.nPartitions.getOrElse(4)
Thought: make this parameter required, and then supply it with a default from Python.
This is where we construct this node in python. What if we were to get the parallelism from the branching_factor flag here?
hail/hail/python/hail/linalg/blockmatrix.py, lines 1773 to 1827 in e899546:
@typecheck_method(n_partitions=nullable(int), maximum_cache_memory_in_bytes=nullable(int))
def to_table_row_major(self, n_partitions=None, maximum_cache_memory_in_bytes=None):
    """Returns a table where each row represents a row in the block matrix.

    The resulting table has the following fields:

    - **row_idx** (:py:data:`.tint64`, key field) -- Row index
    - **entries** (:py:class:`.tarray` of :py:data:`.tfloat64`) -- Entries for the row

    Examples
    --------
    >>> import numpy as np
    >>> block_matrix = BlockMatrix.from_numpy(np.array([[1, 2], [3, 4], [5, 6]]), 2)
    >>> t = block_matrix.to_table_row_major()
    >>> t.show()
    +---------+---------------------+
    | row_idx | entries             |
    +---------+---------------------+
    |   int64 | array<float64>      |
    +---------+---------------------+
    |       0 | [1.00e+00,2.00e+00] |
    |       1 | [3.00e+00,4.00e+00] |
    |       2 | [5.00e+00,6.00e+00] |
    +---------+---------------------+

    Parameters
    ----------
    n_partitions : int or None
        Number of partitions of the table.
    maximum_cache_memory_in_bytes : int or None
        The amount of memory to reserve, per partition, to cache rows of the
        matrix in memory. This value must be at least large enough to hold
        one row of the matrix in memory. If this value is exactly the size of
        one row, then a partition makes a network request for every row of
        every block. Larger values reduce the number of network requests. If
        memory permits, setting this value to the size of one output
        partition permits one network request per block per partition.

    Notes
    -----
    Does not support block-sparse matrices.

    Returns
    -------
    :class:`.Table`
        Table where each row corresponds to a row in the block matrix.
    """
    path = new_temp_file()
    if maximum_cache_memory_in_bytes and maximum_cache_memory_in_bytes > (1 << 31) - 1:
        raise ValueError(
            f'maximum_cache_memory_in_bytes must be less than 2^31 -1, was: {maximum_cache_memory_in_bytes}'
        )
    self.write(path, overwrite=True, force_row_major=True)
    reader = TableFromBlockMatrixNativeReader(path, n_partitions, maximum_cache_memory_in_bytes)
    return Table(TableRead(reader))
val remainingPartitions =
  contexts.indices.filterNot(k => cachedResults.containsOrdered[Int](k, _ < _, _._2))

val backend = HailContext.backend
I see you get rid of this upstack, but I'm curious if there's a simple rule for when we get the backend via an ExecuteContext, vs when we still need to get a HailContext (for now, later using a different mechanism). Is it just a compile-time vs runtime distinction?
I've been proceeding on the basis of avoiding "global" mutable fields entirely, instead favouring dependency injection in code that we maintain (via ExecuteContext in this case). For generated code, using a constant pool or some such is probably the right thing to do, so long as it doesn't depend on non-generated code.
For Backend specifically, my intention is that the ref in the upcoming change should only be used by BackendUtils.collectDArray. My hope is to remove the mutable ref eventually, either by
- code-generating parallelizeAndComputeWithIndex, or
- initialising a "constant" field in the generated code with either
  - the backend, in a similar way to reference genomes etc., or
  - the BackendContext that's passed to collectDArray.
I'm not sure if that answers your question properly...
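The dependency-injection approach described here can be illustrated with a minimal sketch. The `ExecuteContext` and `Backend` names mirror the discussion, but the shapes below are invented for illustration, not Hail's real classes.

```python
# Minimal sketch of the two styles discussed: a global mutable backend
# reference versus threading the backend through an execution context.

class Backend:
    def __init__(self, parallelism):
        self.default_parallelism = parallelism

# Global-singleton style (what the PR is removing): any function can
# reach for this mutable module-level reference.
GLOBAL_BACKEND = None

def num_partitions_global(requested=None):
    return requested if requested is not None else GLOBAL_BACKEND.default_parallelism

# Dependency-injection style (what the PR moves toward): the backend
# arrives as part of an explicitly passed context.
class ExecuteContext:
    def __init__(self, backend):
        self.backend = backend

def num_partitions_injected(ctx, requested=None):
    # No global state is consulted; the dependency is visible in the signature.
    return requested if requested is not None else ctx.backend.default_parallelism
```

The injected version makes the dependency explicit and testable: two contexts with different backends can coexist in one process, which is impossible with the singleton.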
I might just do that last one now...
> I'm not sure if that answers your question properly...
It does, thanks!
patrick-schultz left a comment:
Great change!

This change removes uses of HailContext.backend as part of an effort to remove the HailContext singleton. Instead, the current Backend is accessed by threading ExecuteContext. This change cannot impact the Hail Batch instance as deployed by the Broad Institute in GCP.
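As a rough illustration of what "threading" ExecuteContext means (all names below are invented for the sketch): the context is constructed once at the entry point and passed explicitly through each layer that previously reached for the singleton.

```python
# Invented sketch of threading a context through a call chain instead of
# each layer reaching for a global HailContext.backend.

class ExecuteContext:
    def __init__(self, backend_name):
        self.backend_name = backend_name

def read_table(ctx, path):
    # Previously this layer would consult the HailContext singleton;
    # now the context is an explicit parameter it passes along.
    return plan_partitions(ctx, path)

def plan_partitions(ctx, path):
    # The leaf of the chain still has access to the backend, without
    # any global lookup in between.
    return f'{ctx.backend_name}:{path}'

# Entry point constructs the context once and threads it down:
ctx = ExecuteContext('spark')
result = read_table(ctx, '/data/table.ht')  # -> 'spark:/data/table.ht'
```

The cost is a wider parameter list at every level; the benefit is that each function's dependencies are visible in its signature and no hidden mutable global is required.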
HailContext.backend as part of an effort to remove theHailContext singleton. Instead, the currentBackend is accessed by threadingExecuteContext.This change cannot impact the Hail Batch instance as deployed by Broad Institute in GCP