STORMM Source Documentation
stormm::synthesis::NonbondedWorkUnit Class Reference

Collect a series of tiles for non-bonded computations as well as the required atom imports to carry them out. This will accomplish the task of planning the non-bonded computation, given a single topology or a synthesis of topologies, to optimize the thread utilization on a GPU.

#include <nonbonded_workunit.h>

Public Member Functions

int getTileCount () const
 Get the tile count of this work unit.
 
int getImportCount () const
 Get the number of tile_length atom imports needed for this work unit.
 
int getInitializationMask () const
 Get the mask for initializing per-atom properties on any of the imported atom groups.
 
int4 getTileLimits (int index) const
 Get the abscissa and ordinate atom limits for a tile from within this work unit. The abscissa limits are returned in the x and y members, the ordinate limits in the z and w members.
 
const std::vector< uint2 > & getTileInstructions () const
 Get the list of tile instructions for this work unit.
 
std::vector< int > getAbstract (int instruction_start=0) const
 Get an abstract for this work unit, laid out to work within an AtomGraphSynthesis. For TILE_GROUPS-type work units serving systems in isolated boundary conditions, the non-bonded abstract follows the slot layout given in the detailed description of this function.
 
void setInitializationMask (int mask_in)
 Set the initialization mask for atomic properties that contribute once to an accumulated sum, i.e. contributions of baseline atomic radii to the Generalized Born effective radii.
 
void setRefreshAtomIndex (int index_in)
 Set the atom index at which to begin accumulator refreshing. The actual work done depends on the accumulator refresh code set by the next member function.
 
void setRefreshWorkCode (int code_in)
 Set the accumulator refreshing instructions for this work unit, i.e. "set X force accumulators to zero for this many atoms." The refreshing starts at the atom index set by the preceding member function.
 
 NonbondedWorkUnit (const StaticExclusionMask &se, const std::vector< int3 > &tile_list)
 The constructor accepts an exclusion mask and a list of non-bonded interaction tiles to compute. Non-bonded interaction tiles go according to "abscissa atoms" and "ordinate atoms," although both abscissa and ordinate indices refer to the same list of imported atoms. The three cases to consider are described in the detailed constructor documentation.
 
 NonbondedWorkUnit (const StaticExclusionMask &se, int abscissa_start, int ordinate_start)
 
 NonbondedWorkUnit (const StaticExclusionMaskSynthesis &se, const std::vector< int3 > &tile_list)
 
 NonbondedWorkUnit (const StaticExclusionMaskSynthesis &se, int abscissa_start, int ordinate_start, int system_index)
 
 NonbondedWorkUnit (const NonbondedWorkUnit &original)=default
 Take the default copy and move constructors as well as assignment operators for this object, which uses only Standard Template Library member variable types.
 
 NonbondedWorkUnit (NonbondedWorkUnit &&original)=default
 
NonbondedWorkUnit & operator= (const NonbondedWorkUnit &other)=default
 
NonbondedWorkUnit & operator= (NonbondedWorkUnit &&other)=default
 

Detailed Description

Collect a series of tiles for non-bonded computations as well as the required atom imports to carry them out. This will accomplish the task of planning the non-bonded computation, given a single topology or a synthesis of topologies, to optimize the thread utilization on a GPU.

Constructor & Destructor Documentation

◆ NonbondedWorkUnit() [1/2]

stormm::synthesis::NonbondedWorkUnit::NonbondedWorkUnit ( const StaticExclusionMask & se,
const std::vector< int3 > & tile_list )

The constructor accepts an exclusion mask and a list of non-bonded interaction tiles to compute. Non-bonded interaction tiles go according to "abscissa atoms" and "ordinate atoms," although both abscissa and ordinate indices refer to the same list of imported atoms. There are three cases to consider:

Static exclusion mask, tiny up to large work units: This will be the majority of the cases with implicit-solvent systems. In practice, the non-bonded work unit becomes a list of 50 integers. A complete description is available in the documentation for the getAbstract() member function of this object. These cases will run with a 256-thread block size.

Static exclusion mask, huge work units: This will handle cases of implicit-solvent systems with sizes so large that millions of smaller work units would be required to cover everything. In practice, the non-bonded work unit is reduced to a list of 8 integers, now representing the lower limits of the abscissa and ordinate atoms to import, and the numbers of atoms to import along each axis, in a supertile for which the work unit is to compute all interactions. There is no meaningful list of all interactions in this case, as it might be prohibitive even to store such a thing. Instead, the work unit will proceed over all tiles in the supertile after computing whether it lies along the diagonal. This will require a larger block size (512 threads minimum, up to 768 depending on the architecture).

Forward exclusion mask: This will handle cases of neighbor list-based non-bonded work units. The work unit assumes a honeycomb-packed image of all atoms in or about the primary unit cell (some honeycomb pencils will straddle the unit cell boundary but their positions will be known as part of the decomposition). The work unit will consist of thirty integers: seven starting locations of atom imports, seven bit-packed integers detailing the lengths of each stretch of atoms (first 20 bits) and the obligatory y- and z- imaging moves to make with such atoms (last 12 bits), seven integers detailing segments of each stretch of imported atoms to replicate in +x (and where in the list of imported atoms to put them), and finally seven integers detailing segments of each stretch of imported atoms to replicate in -x (and where to put the replicas). The final two integers state the starting and ending indices of a list of tile instructions to process. The tile instructions for the neighbor list-based non-bonded work units are much more complex than those for non-bonded work based on a static exclusion mask.
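The bit-packed stretch descriptors mentioned for the neighbor list case can be illustrated with a minimal sketch. It assumes that "first 20 bits" means the 20 least significant bits and that the remaining 12 bits hold the imaging moves as one raw field; how those 12 bits are divided between y and z is not stated here, so the sketch does not split them. The names StretchDescriptor, packStretch, and unpackStretch are hypothetical and are not part of STORMM.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical container for one decoded import descriptor (not a STORMM type)
struct StretchDescriptor {
  uint32_t atom_count;    // length of the stretch of imported atoms
  uint32_t imaging_bits;  // raw 12-bit field holding the y- and z-imaging moves
};

// Assumption: the length occupies the 20 least significant bits
constexpr uint32_t kLengthBits = 20;
constexpr uint32_t kLengthMask = (1u << kLengthBits) - 1u;
constexpr uint32_t kImagingBits = 12;

// Combine a stretch length and a raw imaging field into one 32-bit value
inline uint32_t packStretch(uint32_t atom_count, uint32_t imaging_bits) {
  assert(atom_count <= kLengthMask);
  assert(imaging_bits < (1u << kImagingBits));
  return (imaging_bits << kLengthBits) | atom_count;
}

// Recover the stretch length and the raw imaging field
inline StretchDescriptor unpackStretch(uint32_t packed) {
  return { packed & kLengthMask, packed >> kLengthBits };
}
```

With 20 bits for the length, a single stretch in this layout can describe at most 1,048,575 atoms.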

Parameters
se              Static exclusion mask, or synthesis thereof, for one or more systems in isolated boundary conditions
tile_list       Paired with a static exclusion mask and non-huge tiles, the specific list of tiles to include for this work unit in the x- and y-members, plus the system index number in the z member (if more than one system is present in a synthesis). Among them, the tiles must not call for importing more than small_block_max_atoms atoms.
abscissa_start  The abscissa axis start of the supertile to process. Paired with a static exclusion mask in extreme circumstances of very large implicit solvent systems. This will call for importing 512 atoms (2 x supertile_length) in the most general case and will require larger thread blocks. If computing for a synthesis of static exclusion masks, the abscissa starting point is given as a relative index within the local system to which the supertile belongs.
ordinate_start  The ordinate axis start of the supertile to process. If computing for a synthesis of static exclusion masks, the ordinate starting point is given as a relative index within the local system to which the supertile belongs.
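The huge work unit case, in which the work unit walks over every tile of a supertile rather than reading a stored tile list, might look roughly like the sketch below. It takes the supertile length as 256 (inferred from "512 atoms (2 x supertile_length)") and the tile length as 16. The function enumerateTiles is hypothetical, and the choice to keep only tiles with ordinate start at or below the abscissa start on a diagonal supertile is an assumption made for illustration, not a statement about which triangle STORMM evaluates.

```cpp
#include <algorithm>
#include <cassert>
#include <utility>
#include <vector>

constexpr int kTileLength = 16;        // tile length quoted in the getAbstract() notes
constexpr int kSupertileLength = 256;  // inferred from "2 x supertile_length" = 512

// Hypothetical helper: list the (abscissa, ordinate) starting atoms of every
// tile in one supertile, clipped to the number of atoms in the system.
std::vector<std::pair<int, int>> enumerateTiles(int abscissa_start, int ordinate_start,
                                                int atom_count) {
  assert(abscissa_start >= 0 && ordinate_start >= 0);
  const bool on_diagonal = (abscissa_start == ordinate_start);
  const int absc_end = std::min(abscissa_start + kSupertileLength, atom_count);
  const int ordi_end = std::min(ordinate_start + kSupertileLength, atom_count);
  std::vector<std::pair<int, int>> tiles;
  for (int i = abscissa_start; i < absc_end; i += kTileLength) {
    for (int j = ordinate_start; j < ordi_end; j += kTileLength) {
      // Assumption: a diagonal supertile only needs one triangle of tiles
      if (on_diagonal && j > i) {
        continue;
      }
      tiles.emplace_back(i, j);
    }
  }
  return tiles;
}
```

Under these assumptions a full off-diagonal supertile yields 16 x 16 = 256 tiles, while a full diagonal supertile yields 136, which is why no explicit tile list needs to be stored for this case.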

◆ NonbondedWorkUnit() [2/2]

stormm::synthesis::NonbondedWorkUnit::NonbondedWorkUnit ( const StaticExclusionMaskSynthesis & se,
int abscissa_start,
int ordinate_start,
int system_index )

Obtain the system index

Member Function Documentation

◆ getAbstract()

std::vector< int > stormm::synthesis::NonbondedWorkUnit::getAbstract ( int instruction_start = 0) const

Get an abstract for this work unit, laid out to work within an AtomGraphSynthesis. For TILE_GROUPS-type work units serving systems in isolated boundary conditions, the non-bonded abstract has the following format:

Slot    Description
0       The total number of atom group imports for various tiles. Each group is up to 16 atoms, and all atoms pertain to a single system in the synthesis.
1-20    Starting atom indices of each tile group in the topology / coordinate synthesis (the extent of this segment is set by small_block_max_imports)
21-25   Counts of the numbers of atoms in each tile group, with four groups' counts packed into each int. In this scheme, the tile length may be extended up to 128 atoms (256 if these values are read as unsigned ints), but a tile length of 16 appears to be optimal for NVIDIA and probably other architectures as well.
26-27   Starting and ending locations of the tile instruction list. After reading up to 20 tile atom groups, each work unit tries to combine the groups into different combinations of actual tiles. The two interacting groups are given in the x-member of a uint2 tuple, while the starting location of exclusion bit masks for the tile is given in the y-member. The difference between values in slots 26 and 27 indicates the number of tiles to perform, and is designed to be a multiple of eight if possible (anticipating blocks of four or eight warps, with one warp handling each tile).
28-47   System indices within the synthesis to which the atoms in each tile group pertain
48      Bitmask indicating whether the current work unit is the first to import a particular group of atoms and therefore can be tasked with contributing certain per-atom effects when accumulating forces or radii due to those atoms
49-50   Lower and upper limits of atoms for which the work unit is tasked with initializing force or other property accumulators to have them ready for use in future iterations of the molecular mechanics force / energy evaluation cycle. The integer in slot 49 indicates the first atom of a contiguous list, and needs to be an atom that the work unit imported for one of its tile computations. The integer in slot 50 is a bit-packed value, with the low 16 bits indicating up to 65,504 (not 65,536) atoms following the first index. The high 16 bits of the integer in slot 50 indicate whether to initialize force X, Y, or Z as well as psi or sum_deijda accumulators for the next iteration of the cycle (one bit per accumulator), as well as whether to perform random number caching for up to 15 cycles. These two slots of the work unit can be altered after the non-bonded work unit list is constructed.
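As a rough illustration of reading the slots described above, the sketch below pulls one group's atom count out of slots 21-25 and computes the number of tile instructions from slots 26 and 27. It assumes eight bits per count with group k stored in byte (k % 4) of slot 21 + k / 4, lowest byte first; that byte order is an assumption, as only the four-per-int packing is documented here. The helper names getGroupAtomCount and getTileInstructionCount are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

constexpr int kCountSlotStart = 21;  // first of slots 21-25
constexpr int kMaxImports = 20;      // extent of slots 1-20

// Hypothetical helper: atom count of one imported tile group.
// Assumption: group k sits in byte (k % 4) of slot 21 + k / 4, lowest byte first.
inline int getGroupAtomCount(const std::vector<int> &abstract, int group) {
  assert(group >= 0 && group < kMaxImports);
  assert(abstract.size() > static_cast<std::size_t>(kCountSlotStart + group / 4));
  const unsigned int packed =
      static_cast<unsigned int>(abstract[kCountSlotStart + group / 4]);
  return static_cast<int>((packed >> (8 * (group % 4))) & 0xffu);
}

// Hypothetical helper: number of tiles, taken as the difference between
// the values in slots 26 and 27
inline int getTileInstructionCount(const std::vector<int> &abstract) {
  assert(abstract.size() > 27);
  return abstract[27] - abstract[26];
}
```

A caller would pass the vector returned by getAbstract(); the helpers only read it and make no claim about how the constructor fills it.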

Parameters
instruction_start  The starting point of instructions for this group of tiles, if the work unit will fit on a small thread block, having fewer than huge_nbwu_tiles tiles. Otherwise no instruction start is needed.

◆ getTileLimits()

int4 stormm::synthesis::NonbondedWorkUnit::getTileLimits ( int index) const

Get the abscissa and ordinate atom limits for a tile from within this work unit. The abscissa limits are returned in the x and y members, the ordinate limits in the z and w members.

Parameters
index  Index of the tile from within the work unit
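A small sketch of consuming the returned tuple follows. The Int4 struct is a stand-in for the int4 tuple type, and the helpers are hypothetical. The sketch also assumes each pair of limits is half-open, that is [lower, upper); this page does not say whether the upper limit is inclusive.

```cpp
#include <cassert>

// Stand-in for the int4 tuple type (hypothetical; the real type comes from
// the CUDA or STORMM headers)
struct Int4 {
  int x, y, z, w;
};

// Assumption: limits are half-open, so the count is upper minus lower
inline int abscissaAtomCount(const Int4 &limits) {
  assert(limits.y >= limits.x);
  return limits.y - limits.x;  // x and y hold the abscissa limits
}

inline int ordinateAtomCount(const Int4 &limits) {
  assert(limits.w >= limits.z);
  return limits.w - limits.z;  // z and w hold the ordinate limits
}

// Number of atom pairs spanned by the tile under the same assumption
inline int tilePairCount(const Int4 &limits) {
  return abscissaAtomCount(limits) * ordinateAtomCount(limits);
}
```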

◆ setInitializationMask()

void stormm::synthesis::NonbondedWorkUnit::setInitializationMask ( int mask_in)

Set the initialization mask for atomic properties that contribute once to an accumulated sum, i.e. contributions of baseline atomic radii to the Generalized Born effective radii.

Parameters
mask_in  The new mask to use. If the jth bit of this mask is set to 1, the work unit will perform initialization protocols on its jth set of atom imports, i.e. the Born radii derivatives for those atoms will be contributed to the force accumulators.
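The per-import meaning of the mask bits can be sketched as follows. Both helpers are hypothetical, and the limit of 20 imports is taken from the extent of slots 1-20 in the getAbstract() layout.

```cpp
#include <cassert>

constexpr int kMaxImports = 20;  // extent of the import list in the abstract

// Hypothetical helper: does the work unit initialize its jth set of imports?
inline bool importNeedsInitialization(int mask, int j) {
  assert(j >= 0 && j < kMaxImports);
  return ((static_cast<unsigned int>(mask) >> j) & 1u) != 0u;
}

// Hypothetical helper: return a copy of the mask with the jth bit set
inline int withImportInitialization(int mask, int j) {
  assert(j >= 0 && j < kMaxImports);
  return static_cast<int>(static_cast<unsigned int>(mask) | (1u << j));
}
```

A mask built this way could then be handed to setInitializationMask(), so that only the first work unit to import a given atom group contributes its once-only terms.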

◆ setRefreshAtomIndex()

void stormm::synthesis::NonbondedWorkUnit::setRefreshAtomIndex ( int index_in)

Set the atom index at which to begin accumulator refreshing. The actual work done depends on the accumulator refresh code set by the next member function.

Parameters
index_in  The starting atom index relevant to this work unit

◆ setRefreshWorkCode()

void stormm::synthesis::NonbondedWorkUnit::setRefreshWorkCode ( int code_in)

Set the accumulator refreshing instructions for this work unit, i.e. "set X force accumulators to zero for this many atoms." The refreshing starts at the atom index set by the preceding member function.

Parameters
code_in  The refresh code to execute
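The getAbstract() documentation describes the refresh code as a bit-packed integer: the low 16 bits hold an atom count (up to 65,504) and the high 16 bits hold the accumulator and random number caching flags. A minimal sketch of splitting and building such a value follows. The helper names are hypothetical, the positions of individual flags within the high half are not given on this page and are left opaque, and the sketch restricts flags to 15 bits only to keep the result non-negative.

```cpp
#include <cassert>

constexpr int kMaxRefreshAtoms = 65504;  // limit quoted for the low 16 bits

// Hypothetical helper: combine an atom count and an opaque flag field
inline int buildRefreshCode(int atom_count, int flags) {
  assert(atom_count >= 0 && atom_count <= kMaxRefreshAtoms);
  assert(flags >= 0 && flags <= 0x7fff);  // simplification made by this sketch
  return (flags << 16) | atom_count;
}

// Number of atoms to refresh, read from the low 16 bits
inline int refreshAtomCount(int code) {
  return static_cast<int>(static_cast<unsigned int>(code) & 0xffffu);
}

// Opaque flag field, read from the high 16 bits
inline int refreshFlags(int code) {
  return static_cast<int>((static_cast<unsigned int>(code) >> 16) & 0xffffu);
}
```

A value assembled this way would be passed to setRefreshWorkCode(), with the matching first atom supplied through setRefreshAtomIndex().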

The documentation for this class was generated from the following files: