Explore modified communication routines for the improved staggered Dslash operator

At the moment, each application of the improved staggered Dslash operator requires communicating a quark-field boundary layer three lattice sites wide between GPUs operating on neighboring lattice domains. To date, linear solves that use half precision outside of the preconditioner have proven unstable, while solvers that use mixed single-double precision work. However, perhaps we could use half precision instead of single precision in the Dslash communication routines only. We could even use a mixed-precision communication routine, where one third of the boundary-layer data is communicated in single precision and the remainder (the data needed only for the three-hop term) is communicated in half precision. This is motivated by the fact that the three-hop coefficient is numerically very small, and omitting this term in the preconditioner has a negligible effect on solver convergence. Similarly, we might consider using reconstruct-9 for the long links in the single-precision exterior Dslash kernels.
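To get a rough sense of the potential savings, here is a back-of-the-envelope sketch of the halo message size per face site under the three schemes discussed above. It assumes a staggered quark field stores one color vector per site (3 complex components, so 6 real numbers), that the boundary is three layers deep as stated above, and that half precision costs 2 bytes per real. It ignores any per-site normalization factors or padding that an actual half-precision storage format might add, and the function name is purely illustrative.

```python
# Rough halo-size estimate per face site; not taken from any existing code base.
REALS_PER_SITE = 6  # assumption: one color vector = 3 complex numbers
BYTES_PER_REAL = {"double": 8, "single": 4, "half": 2}


def halo_bytes_per_face_site(layer_precisions):
    """Bytes sent per face site, given one precision label per boundary layer."""
    return sum(REALS_PER_SITE * BYTES_PER_REAL[p] for p in layer_precisions)


# Current scheme in a single-precision Dslash: all three layers in single.
uniform_single = halo_bytes_per_face_site(["single", "single", "single"])

# Proposed mixed scheme: first layer in single, the two layers needed
# only by the three-hop term in half.
mixed = halo_bytes_per_face_site(["single", "half", "half"])

# Alternative: all three layers in half.
uniform_half = halo_bytes_per_face_site(["half", "half", "half"])

print(f"single: {uniform_single} B, mixed: {mixed} B, half: {uniform_half} B")
print(f"mixed / single = {mixed / uniform_single:.3f}")
print(f"half  / single = {uniform_half / uniform_single:.3f}")
```

Under these assumptions the mixed scheme sends two thirds of the data of the all-single scheme, and the all-half scheme sends one half. Whether that translates into a comparable speedup depends on how latency-bound the exchange is, which this estimate does not capture.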


Explore modified communication routines for the improved staggered Dslash operator #119
