STORMM Source Documentation
|
An object to manage the spatial decomposition of a system of particles. The general strategy is to arrange particles in a grid of cells, each at least half the direct-space cutoff in width in all directions. A work unit entails computing all interactions assigned to one of the cells. This is done using a neutral-territory decomposition the lines of the "tower-plate" arrangement, which consists of seventeen cells: More...
#include <cellgrid.h>
Public Member Functions | |
int | getSystemCount () const |
Get the number of systems in the object. | |
int | getTotalCellCount () const |
Get the total number of cells allocated in the object. | |
uint | getCellBaseCapacity () const |
Get the base capacity of any particular cell in the grid. All systems' cells share the same base capacity. | |
double | getCutoff () const |
Get the particle-particle cutoff defining the neighbor list. | |
double | getPadding () const |
Get the padding incorporated into each neighbor list cell. | |
double | getEffectiveCutoff () const |
Get the effective cutoff. | |
NonbondedTheme | getTheme () const |
Get the non-bonded theme of this cell grid–does it hold atoms with only electrostatic or van-der Waals properties? | |
TinyBoxPresence | getTinyBoxPresence () const |
Get the presence of a tiny unit cell anywhere in the synthesis. | |
CoordinateCycle | getCyclePosition () const |
Get the position in the time cycle. | |
int | getPositionScalingBits () const |
Get the number of bits in the fixed-precision format. | |
int | getMeshSubdivisions () const |
Get the number of mesh elements spanning each spatial decomposition cell. The "mesh" refers to a particle-mesh interaction grid, e.g. the PME reciprocal space lattice. | |
const PhaseSpaceSynthesis * | getCoordinateSynthesisPointer () const |
Get a const pointer to the coordinate synthesis served by this object. | |
const AtomGraphSynthesis * | getTopologySynthesisPointer () const |
Get a const pointer to the topology synthesis served by this object. | |
const CellGrid< T, Tacc, Tcalc, T4 > * | getSelfPointer () const |
Get a pointer to the object itself, useful if the object is only available by const reference. | |
void | initializeForces (HybridTargetLevel tier=HybridTargetLevel::HOST, const GpuDetails &gpu=null_gpu) |
Initialize forces for the cell grid, on the CPU host or GPU device. This standalone feature provides a means for performing this activity at will, although in the most optimized code the process it performs will be fused with some other kernel. This will initialize forces for the current cell layout and should only be called once the contents of image_cell_limits or image_cell_limits_alt have been settled, whichever is appropriate to the current time cycle. There is only one array of forces in the CellGrid, as once they are accumulated in this object they are added to one of the two time-cycle dependent arrays in the accompanying PhaseSpaceSynthesis. | |
void | markImageRelevance (HybridTargetLevel tier=HybridTargetLevel::HOST, const GpuDetails &gpu=null_gpu) |
Compute whether various atoms may be within range of the tower when they are in the plate of various neutral territory decompositions. Bitwise information (1 for relevant, 0 for irrelevant) is recorded in a 16-bit unsigned integer for each atom in the current image, as indicated by its stage in the coordinate cycle. | |
CellGrid (const PhaseSpaceSynthesis *poly_ps_ptr_in, const AtomGraphSynthesis *poly_ag_ptr_in, double cutoff_in, double padding_in, int mesh_subdivisions_in, NonbondedTheme theme_in, uint cell_base_capacity_in=default_cellgrid_base_capacity, ExceptionResponse policy_in=ExceptionResponse::WARN) | |
The constructor accepts an associated PhaseSpaceSynthesis object and a precision model. The system count and scaling factors will be set by the coordinate synthesis. The total cell count and cell layout can be modulated by specifying a cutoff and a padding related to how much larger each cell should be sized. | |
CellGrid (const PhaseSpaceSynthesis *poly_ps_ptr_in, const AtomGraphSynthesis &poly_ag_ptr_in, double cutoff_in, double padding_in, int mesh_subdivisions_in, NonbondedTheme theme_in, uint cell_base_capacity_in=default_cellgrid_base_capacity, ExceptionResponse policy_in=ExceptionResponse::WARN) | |
CellGrid (const PhaseSpaceSynthesis &poly_ps_ptr_in, const AtomGraphSynthesis &poly_ag_ptr_in, double cutoff_in, double padding_in, int mesh_subdivisions_in, NonbondedTheme theme_in, uint cell_base_capacity_in=default_cellgrid_base_capacity, ExceptionResponse policy_in=ExceptionResponse::WARN) | |
CellGrid (const CellGrid &original) | |
The presence of POINTER-kind Hybrid objects in the cell origin rulers invalidates the default copy and move constructors as well as assignment operators. Manual implementations are needed. | |
CellGrid (CellGrid &&original) | |
CellGrid & | operator= (const CellGrid &original) |
CellGrid & | operator= (CellGrid &&original) |
int | getCellCount (int index) const |
Get the number of cells allotted to any one system. | |
int | getCellCount (int index, UnitCellAxis axis) const |
int | getCellCount (int index, CartesianDimension axis) const |
uint | getImageIndex (int system_index, int atom_index, CoordinateCycle orientation) const |
Find the image location of an atom based on its system and topological atom number. | |
uint | getImageIndex (int system_index, int atom_index) const |
int4 | getCellLocation (int system_index, int atom_index, CoordinateCycle orientation) const |
Find the cell grid location of an atom based on its system and topological atom number. The result will be returned as the index of the cell within the grid belonging to the system of interest, with the indices along the A, B, and C axes placed in the "x", "y", and "z" members of the tuple. The overall index of the cell in the entire CellGrid object (up to 2^28 cells) will be returned in the "w" member of the tuple. All of this information is available in bit-packed form in the system_cell_limits array. Overloading and descriptions of parameters follow from getImageIndex() above. | |
int4 | getCellLocation (int system_index, int atom_index) const |
int2 | getChainLocation (int system_index, int atom_index, CoordinateCycle orientation) const |
Get the chain index of an atom based on its system and topological atom index. The chain index is returned in the "x" member of the tuple while the offset within the chain is returned in the "y" member. Overloading and descriptions of parameters follow from getImageIndex() above. | |
int2 | getChainLocation (int system_index, int atom_index) const |
const PsSynthesisBorders | getUnitCellTransforms (CoordinateCycle orientation, HybridTargetLevel tier=HybridTargetLevel::HOST) const |
Get a miniature abstract of the underlying coordinate synthesis containing pointers to the arays of unit cell transformation matrices for each system. | |
const PsSynthesisBorders | getUnitCellTransforms (HybridTargetLevel tier=HybridTargetLevel::HOST) const |
CellGridWriter< T, Tacc, Tcalc, T4 > | data (CoordinateCycle orientation, HybridTargetLevel tier=HybridTargetLevel::HOST) |
Obtain the object's abstract in order to access its members in C-programming fashion, whether on the CPU host or GPU device. | |
CellGridWriter< T, Tacc, Tcalc, T4 > | data (HybridTargetLevel tier=HybridTargetLevel::HOST) |
const CellGridReader< T, Tacc, Tcalc, T4 > | data (CoordinateCycle orientation, HybridTargetLevel tier=HybridTargetLevel::HOST) const |
const CellGridReader< T, Tacc, Tcalc, T4 > | data (HybridTargetLevel tier=HybridTargetLevel::HOST) const |
CellGridWriter< void, void, void, void > | templateFreeData (CoordinateCycle orientation, HybridTargetLevel tier=HybridTargetLevel::HOST) |
Obtain the object's abstract with templating removed. Overloading and descriptions of parameters follow from data() above. | |
CellGridWriter< void, void, void, void > | templateFreeData (HybridTargetLevel tier=HybridTargetLevel::HOST) |
const CellGridReader< void, void, void, void > | templateFreeData (CoordinateCycle orientation, HybridTargetLevel tier=HybridTargetLevel::HOST) const |
const CellGridReader< void, void, void, void > | templateFreeData (HybridTargetLevel tier=HybridTargetLevel::HOST) const |
const CellOriginsReader | getRulers (CoordinateCycle orientation, HybridTargetLevel tier=HybridTargetLevel::HOST) const |
Obtain the object's rulers for determining the origins of each neighbor list cell in terms of the fixed-precision coordinate system of the accompanying PhaseSpaceSynthesis. | |
const CellOriginsReader | getRulers (HybridTargetLevel tier=HybridTargetLevel::HOST) const |
CellOriginsWriter | getRulers (CoordinateCycle orientation, HybridTargetLevel tier=HybridTargetLevel::HOST) |
CellOriginsWriter | getRulers (HybridTargetLevel tier=HybridTargetLevel::HOST) |
void | contributeForces (PhaseSpaceSynthesis *dest, HybridTargetLevel tier, const GpuDetails &gpu) const |
Contribute forces accumulated in the cell grid back to a coordinate synthesis with its own force accumulators and forces from other sources. | |
void | contributeForces (HybridTargetLevel tier=HybridTargetLevel::HOST, const GpuDetails &gpu=null_gpu) const |
void | updatePositions (PhaseSpaceSynthesis *dest, HybridTargetLevel tier, const GpuDetails &gpu) |
Update the positions and cells of residence for all particles based on the current positions in the associated synthesis. Overloading and descriptions of input parameters follow from contributeForces(), above. | |
void | updatePositions (HybridTargetLevel tier=HybridTargetLevel::HOST, const GpuDetails &gpu=null_gpu) |
void | updateCyclePosition () |
Update the object's cycle position as it participates in the WHITE:BLACK tick-tock mechanics of other coordinate objects. | |
void | updateCyclePosition (CoordinateCycle time_point) |
An object to manage the spatial decomposition of a system of particles. The general strategy is to arrange particles in a grid of cells, each at least half the direct-space cutoff in width in all directions. A work unit entails computing all interactions assigned to one of the cells. This is done using a neutral-territory decomposition the lines of the "tower-plate" arrangement, which consists of seventeen cells:
---X--- ^ ---X--- ^ ------- ^ ---X--- | ---X--- | ------- |
View along A: —XXX- c View along B: -XXXXX- c View along C: —XXX- b —X— —X— -XXXXX- —X— b-> —X— a-> -XXXXX- a->
As with other grids in STORMM, the cell grid index increments most rapidly along the A axis. Furthermore, the atoms of all cells in a stack along A are arranged in contiguous memory, with no spacing between them. Each row of cells along a mesh's A axis is arranged with a degree of padding so that the population of any particular row may fluctuate within some anticipated bounds. Memory access is optimized by arranging the tower along the C axis, providing continuous atom flow for 3, 5, and 5 cells when loading the larger plate. Atoms in the tower will interact with the most other atoms and require one global write (atomicAdd) to commit their forces back to the totals at the end of the work unit. Therefore, it is most efficient to have these atoms be the least contiguous of the whole.
In all cells, cache behavior is optimized by storing coordinates and properties as four-tuples such that only eight atoms in single-precision mode (or four in double-precision mode) are required to fill a cache line.
The work cycle is as follows: 1 .) A KERNEL COMPUTING VALENCE INTERACTIONS will accumulate all forces, including non-bonded contributions originating in this kernel on a previous cycle, and move particles. Following moves, this VALENCE INTERACTION KERNEL will compute new cell addresses for each atom and populate the vestibules of particles entering and leaving each cell. The VALENCE INTERACTION KERNEL will also deposit the new, scaled coordinates of each cell in the previous cycle's atoms array, which is no longer in use. 2.) A NEIGHBOR LIST UPDATE KERNEL will fire off ahead of the non-bonded kernels. This will handle re-imaging of coordinates and updates to cells. Taking the (now complete) vestibules of entering and exiting particles, the kernel begins by performing an out-of-place sort to adjust each cell in the mesh. Each warp performs a prefix sum over the adjustments in the particular cell's column in order to arrive at the limits within which to transfer particles, first from the cell's contents in the old atoms array (so long as they are to be retained) and then by referencing the prior indices of particles that are moving in and transferring the particles from their prior cells, but will likely be hidden by the memory accesses happening at the same time. Particles exit cells, causing contractions of the list of bitmasks and then of the bitmasks themselves. Particles leaving the plate cell prunes the array of bitmasks while particles leaving the tower prunes the bitmasks themselves–see notes in this library's template implementation file. Likewise, particles entering the tower cells will add bits to the back of each mask, and particles entering cells of the plate will expand the arrays of bitmasks in various places. The exact list of particles in each cell will be in flux, but with the identities of particles entering and the indices (positions within the cell grid image) of particles exiting already established in the preceding kernel, the non-bonded lists can be updated without risk of any race conditions. One warp will be tasked with updating each cell's atom coordinates while a second will be tasked with updating its exclusion masks. Both the atom coordinate update and the neighbor list update are performed out-of-place. 3 .) Taking the newly updated neighbor lists, the RECIPROCAL-SPACE KERNEL will proceed to assemble the the FFT pencils and use strong barriers in order to complete the convolution. 4 .) The DIRECT-SPACE KERNEL will launch to complete the work, and perform interpolation of long-ranged particle-particle interactions if that is required.
stormm::energy::CellGrid< T, Tacc, Tcalc, T4 >::CellGrid | ( | const PhaseSpaceSynthesis * | poly_ps_ptr_in, |
const AtomGraphSynthesis * | poly_ag_ptr_in, | ||
double | cutoff_in, | ||
double | padding_in, | ||
int | mesh_subdivisions_in, | ||
NonbondedTheme | theme_in, | ||
uint | cell_base_capacity_in = default_cellgrid_base_capacity, | ||
ExceptionResponse | policy_in = ExceptionResponse::WARN ) |
The constructor accepts an associated PhaseSpaceSynthesis object and a precision model. The system count and scaling factors will be set by the coordinate synthesis. The total cell count and cell layout can be modulated by specifying a cutoff and a padding related to how much larger each cell should be sized.
mesh_subdivisions_in | The number of mesh elements along each axis of one of the spatial decomposition cell. This value, times the cell grid dimensions themselves, defines the size of a particle-mesh interaction grid associated with the neighbor list, i.e. the PME reciprocal space grid. |
theme_in | The non-bonded potential that will be computed based on the neighbor list represented in the cell grid |
gpu | Details of the GPU that will utilize and manipulate the cell grid |
cell_base_capacity_in | The atom capacity of an individual decomposition cell. This is set to be larger than the atom content is likely to go, and in all cases where it might be relevant multiple cells will pool their capacities to ensure that overflow is extremely unlikely. |
policy_in | Course of action in the case of bad input |
stormm::energy::CellGrid< T, Tacc, Tcalc, T4 >::CellGrid | ( | const CellGrid< T, Tacc, Tcalc, T4 > & | original | ) |
The presence of POINTER-kind Hybrid objects in the cell origin rulers invalidates the default copy and move constructors as well as assignment operators. Manual implementations are needed.
original | The original object to copy or move |
other | Another object placed on the right hand side of the assignment statement |
void stormm::energy::CellGrid< T, Tacc, Tcalc, T4 >::contributeForces | ( | PhaseSpaceSynthesis * | dest, |
HybridTargetLevel | tier, | ||
const GpuDetails & | gpu ) const |
Contribute forces accumulated in the cell grid back to a coordinate synthesis with its own force accumulators and forces from other sources.
Overloaded:
dest | Specify a different synthesis to receive the accumulated forces |
tier | Indicate whether to initialize forces on the CPU host or GPU device |
gpu | Details of the GPU that will perform the initialization of device memory |
CellGridWriter< T, Tacc, Tcalc, T4 > stormm::energy::CellGrid< T, Tacc, Tcalc, T4 >::data | ( | CoordinateCycle | orientation, |
HybridTargetLevel | tier = HybridTargetLevel::HOST ) |
Obtain the object's abstract in order to access its members in C-programming fashion, whether on the CPU host or GPU device.
Overloaded:
orientation | The point in the time cycle at which to obtain some pointers |
tier | Indicate whether to target pointers to memory on the CPU host or GPU device |
int stormm::energy::CellGrid< T, Tacc, Tcalc, T4 >::getCellCount | ( | int | index | ) | const |
Get the number of cells allotted to any one system.
Overloaded:
index | The system of interest |
axis | The axis of interest |
uint stormm::energy::CellGrid< T, Tacc, Tcalc, T4 >::getImageIndex | ( | int | system_index, |
int | atom_index, | ||
CoordinateCycle | orientation ) const |
Find the image location of an atom based on its system and topological atom number.
Overloaded:
system_index | The system of interest within the associated synthesis |
atom_index | Topological atom index within the system |
orientation | Specify whether to look in the primary or |
const CellOriginsReader stormm::energy::CellGrid< T, Tacc, Tcalc, T4 >::getRulers | ( | CoordinateCycle | orientation, |
HybridTargetLevel | tier = HybridTargetLevel::HOST ) const |
Obtain the object's rulers for determining the origins of each neighbor list cell in terms of the fixed-precision coordinate system of the accompanying PhaseSpaceSynthesis.
Overloaded:
orientation | The point in the time cycle at which to obtain the rulers |
tier | Indicate whether to target pointers to memory on the CPU host or GPU device |
const PsSynthesisBorders stormm::energy::CellGrid< T, Tacc, Tcalc, T4 >::getUnitCellTransforms | ( | CoordinateCycle | orientation, |
HybridTargetLevel | tier = HybridTargetLevel::HOST ) const |
Get a miniature abstract of the underlying coordinate synthesis containing pointers to the arays of unit cell transformation matrices for each system.
Overloaded:
orientation | Indicate whether to get the transformation matrices for the WHITE or BLACK stages of the coordinate cycle in the underlying synthesis. If unspecified, the coordinate cycle position of the CellGrid itself will be submitted. |
tier | Indicate whether to retrieve pointers to memory on the CPU host or GPU device |
void stormm::energy::CellGrid< T, Tacc, Tcalc, T4 >::initializeForces | ( | HybridTargetLevel | tier = HybridTargetLevel::HOST, |
const GpuDetails & | gpu = null_gpu ) |
Initialize forces for the cell grid, on the CPU host or GPU device. This standalone feature provides a means for performing this activity at will, although in the most optimized code the process it performs will be fused with some other kernel. This will initialize forces for the current cell layout and should only be called once the contents of image_cell_limits or image_cell_limits_alt have been settled, whichever is appropriate to the current time cycle. There is only one array of forces in the CellGrid, as once they are accumulated in this object they are added to one of the two time-cycle dependent arrays in the accompanying PhaseSpaceSynthesis.
tier | Indicate whether to initialize forces on the CPU host or GPU device |
gpu | Details of the GPU that will perform the initialization of device memory |
void stormm::energy::CellGrid< T, Tacc, Tcalc, T4 >::markImageRelevance | ( | HybridTargetLevel | tier = HybridTargetLevel::HOST, |
const GpuDetails & | gpu = null_gpu ) |
Compute whether various atoms may be within range of the tower when they are in the plate of various neutral territory decompositions. Bitwise information (1 for relevant, 0 for irrelevant) is recorded in a 16-bit unsigned integer for each atom in the current image, as indicated by its stage in the coordinate cycle.
On GPU resources, the functionality in this routine is fused with particle migration.
The bitwise map is such that any work unit with a given home cell and twelve surrounding "plate" cells 0 (-2 cells along the A axis, -2 cells along the B axis relative to the home cell), 1 (-1, -2 cells along the A and B axes), ..., 4 (+2, -2), ..., 9 (+2, -1), 10 (-2, 0) and 11 (-1, 0) will look in bits 0, 1, ..., 11 to determine whether the atom is relevant. The relevance of atoms in the "tower" cells 0 (-2 along the C axis relative to the home cell), 1 (-1 along the C axis), 3 (+1 along the C axis), and 4 (+2 along the C axis) can be tested by looking at bits 12, 13, 14, and 15, respectively. Only atoms in the bottom and top cells of the tower might be culled due to being more than the cutoff from the nearest plane bounding the plate. Similarly, atoms in the "inner" four cells of the plate (6, 7, 8, and 10) will almost always be relevant. These facts will be utilized when filling the bit mask: only two cells of the tower (for which the calculation is a simple comparison of the Cartesian Z position and the height of a neighbor list cell) and eight cells of the plate are needed. If the unit cell is orthorhombic, the calculations for eight plate cells are likewise very simple. If it is not, the calculations involve finding the location of a point along which to situate a line parallel to the unit cell C axis, then finding the distance of the particle to that line.
tier | Indicate whether to update the relevance array on the CPU host or GPU device |
gpu | Specifications of the GPU that will carry out calculations |
void stormm::energy::CellGrid< T, Tacc, Tcalc, T4 >::updateCyclePosition | ( | ) |
Update the object's cycle position as it participates in the WHITE:BLACK tick-tock mechanics of other coordinate objects.
Overloaded:
time_point | The point in the time cycle which the object is to take as "current" |