Skip to content

Latest commit

 

History

History
113 lines (96 loc) · 6.53 KB

30.abstracting_simulation_types.md

File metadata and controls

113 lines (96 loc) · 6.53 KB

Abstracting Simulation Types

Chunking and Decomposition Strategies

Reading data, particularly data that will not be utilized in a computation, can incur susbtantial overhead, particularly if the data is spread over multiple files on a networked filesystem, where metadata queries can dominate the cost of IO. yt takes the approach of building a coarse-grained index based on the discretization method of the data (particle, grid, octree, unstructured mesh), combining this with datapoint-level indexing for selection processes.

To supplement this, methods in yt that process data utilize a system of data "chunking," whereby segments of data identified during coarse-grained indexing are subdivided by one of a few different schemes and yielded to the iterating function; these schemes can include a limited number of tuning parameters or arguments. These three chunking methods are all, spatial and io. The all method simply returns a single, one-dimensional array, and the number of chunks is always exactly one; this enables both non-parallel algorithms and simple access to small datasets. spatial chunking yields three-dimensional arrays. For grid-based datasets, these are the grids, while for particle and octree datasets they are leaf-by-leaf collections of particles or mesh values. Optionally, the spatial chunking method can return "ghost zones" around regions, for computation of stencils. The final type of chunking, io, is designed to iterate over sets of data in a manner that is most conducive to pipelined IO. These will not always be load-balanced in size of the returned chunks, however. In some cases, io chunking may return one file at a time (in the case of spreading items across many different files), while in others it may be returning sub-components of a single file. This chunking type is the most common strategy for parallel-decomposition.

Necessarily, both indexing and selection methods must be implemented to expose these different chunking interfaces; yt utilizes specific methods for each of the primary data types that it can access. We detail these below, specifically describing how they are implemented and how they can be improved in future iterations.

Grid Analysis

yt was originally written to support the Enzo code, which is a patch-based Adaptive Mesh Refinement (AMR) simulation platform. Analysis of grid-based data is the most frequent application of yt. While we discuss much of the techniques implemented for datasets consisting of multiple, potentially overlapping grids, yt also supports single-grid datasets (such as FITS cubes) and is able to decompose them for parallel analysis.

yt also supports other grid patch codes insert list here

yt supports several different "features" of patch-based codes. These include grids that span multiple parent objects, grids that overlap with coarser data (i.e., AMR), grids that overlap with other grids that provide the same level of resolution of data (i.e., grids at the same AMR level), refinement factors that vary based on level, and edge- and vertex-centered data. For the cases of overlapping grids (either on the same or higher refinement levels) masks are generated that indicate which data is considered authoritative.

As noted in Data Objects, the process of selecting points is multi-step, starting at coarse selection that may be at the file level, and proceeding to selection of specific data points that are included in a selector. For grid-based data, the coarse selection stage proceeds in an extremely simple fashion, by iterating over flat arrays of left and right grid edges and creating a bitmap of the selected grids. Because this method -- while not taking advantage of any data structures of even mild sophistication -- is able to take advantage of pipelining and cache-optimization, we have found that it is sufficiently performant in most geometries up to approximately $10^6$ grid objects. In those cases, the distinction between "wide and shallow" grid structures (where refinement occurs essentially everywhere, but not to a great degree) and "thin and deep" grid structures (where refinement occurs in essentially one location but to very high levels), as well as the specific selection process, impact the overall performance. The second-stage selection occurs within individual grids, where points are selected based on the data point center. In the case of cell-centered data, this returns an array of size $N$ where $N$ is the number of points selected; in the case of 3D vertex-centered data, this would be $(N,8)$. Andrew Myers: check this?

Indexing grid data in yt is optimized for systems of grids that tend to have larger grid patches, rather than smaller; specifically, in yt each grid patch consists of a Python object, which adds a bit of overhead. In the limit of many more cells than grid objects, this overhead is small, but in cases where the number of grids is $O(10^7)$ this can become prohibitive. These cases are becoming more common even for medium-scale simulations.

To address both the memory overhead and the python overhead, as well as more generally address potential scalability issues with grid selection, we have begun implementation of a more sophisticated "grid visitors" indexing and selection method. This draws on the approach used by the oct-visitors (described below). A spatial tree is constructed, wherein parent/child relationships are established between grids. Each process -- selection, copying of data, generation of coordinates -- is represented by an instance of a GridVisitor object. The tree is recursively traversed, and for all selected points the object is called. This allows grids, their relationships, and the data masks to be stored in structures and forms that are both optimized and compressed. This method is essential for scaling to a large number of grid patches; the storage requirements of a single grid patch Python object are around 1K per object (about one gigabyte per million grids), whereas the optimized storage reduces this to approximately 140 bytes (about one gigabyte per eight million grids), with further reductions possible; for selection operations, we are also able to reduce the number of temporary arrays and utilize compressed mask representation, bringing peak memory usage down further. The spatial-tree optimization substantially increases performance for "wide and shallow" dataset selection.

Octree Analysis {#sec:octree_analysis}

SPH Analysis

Unstructured Mesh Analysis

Non-Cartesian Coordinates {#sec:noncartesian}