STORMM Source Documentation
|
A class to guide the implementation of GPU kernels, with selected thread counts per block and block counts per launch grid for a specific GPU based on the workload. This object is constructed in a stepwise manner, with each kind of work unit contributing new launch specifications. More...
#include <core_kernel_manager.h>
Public Member Functions | |
int | getArchBlockMultiplier () const |
Get the architecture-specific block multiplier. This will run a minimum number of blocks per streaming multiprocessor on some cards, specifically NVIDIA's GTX 1080-Ti, when large blocks cannot use more than 32k registers in all. | |
int2 | getValenceKernelDims (PrecisionModel prec, EvaluateForce eval_force, EvaluateEnergy eval_nrg, AccumulationMethod acc_meth, VwuGoal purpose, ClashResponse collision_handling) const |
Get the block and thread counts for a valence kernel. | |
int2 | getPmeValenceKernelDims (PrecisionModel prec, PrecisionModel neighbor_prec, NeighborListKind neighbor_layout, EvaluateEnergy eval_nrg, AccumulationMethod acc_meth, ClashResponse collision_handling) const |
Get the block and thread counts for a PME-compatible valence kernel. All such kernels will take abstracts from one or two neighbor list objects to gather non-bonded forces on particles. Because particle movement is an inherent feature of these kernels, evaluation of forces due to valence interactions is obligatory. | |
int2 | getIntegrationKernelDims (PrecisionModel prec, AccumulationMethod acc_meth, IntegrationStage process) const |
Get the block and thread counts for a stand-alone velocity constraint kernel. | |
int2 | getNonbondedKernelDims (PrecisionModel prec, NbwuKind kind, EvaluateForce eval_force, EvaluateEnergy eval_nrg, AccumulationMethod acc_meth, ImplicitSolventModel igb, ClashResponse collision_handling) const |
Get the block and thread counts for a non-bonded kernel. Parameter descriptions for this function follow from getValenceKernelDims() above, with the addition of: | |
int2 | getBornRadiiKernelDims (PrecisionModel prec, NbwuKind kind, AccumulationMethod acc_meth, ImplicitSolventModel igb) const |
Get the block and thread counts for a Born radii computation kernel. Parameter descriptions for this function follow from getValenceKernelDims() and getNonbondedKernelDims(), above. | |
int2 | getBornDerivativeKernelDims (PrecisionModel prec, NbwuKind kind, AccumulationMethod acc_meth, ImplicitSolventModel igb) const |
Get the block and thread counts for a Born derivative computation kernel. Parameter descriptions for this function follow from getValenceKernelDims() and getNonbondedKernelDims(), above. | |
int2 | getReductionKernelDims (PrecisionModel prec, ReductionGoal purpose, ReductionStage process) const |
Get the block and thread counts for a reduction kernel. | |
int2 | getVirtualSiteKernelDims (PrecisionModel prec, VirtualSiteActivity purpose) const |
Get the block and thread counts for a virtual site placement or force transmission kernel. | |
int2 | getRMSDKernelDims (PrecisionModel prec, RMSDTask order) const |
Get the launch parameters for an RMSD calculation kernel. | |
int2 | getPMEPairsKernelDims (PrecisionModel coord_prec, PrecisionModel calc_prec, NeighborListKind grid_configuration, TinyBoxPresence has_tiny_box, EvaluateForce eval_frc, EvaluateEnergy eval_nrg, ClashResponse mitigation) const |
Get the block and thread counts for a PME pair interactions kernel evaluating the purview of each nieghbor list cell in a neutral territory decomposition. | |
int2 | getMigrationKernelDims (PrecisionModel coord_prec, NeighborListKind grid_configuration, int stage_number, int gpos_bits=0, int chain_count=0) const |
Get the block and thread counts for a particle migration kernel needed to update particle positions and cell locations as part of a molecular dynamics time step. | |
CoreKlManager (const GpuDetails &gpu_in, const AtomGraphSynthesis *poly_ag) | |
The constructor will fill in values as if this were a single-threaded CPU "launch.". | |
CoreKlManager (const GpuDetails &gpu_in, const AtomGraphSynthesis &poly_ag) | |
int2 | getDensityMappingKernelDims (QMapMethod approach, PrecisionModel prec, size_t cg_tmat, int order) const |
Get the launch parameters for a general-purpose particle-mesh density mapping kernel. These kernels loop over atoms in a cell grid (see cellgrid.h) and issues atomic instructions to accumulate density on a set of particle-mesh interaction grids. | |
int2 | getDensityMappingKernelDims (QMapMethod approach, PrecisionModel calc_prec, PrecisionModel acc_prec, bool overflow, size_t cg_tmat, int order) const |
![]() | |
const GpuDetails & | getGpu () const |
Get the GPU information for the active GPU. | |
void | printLaunchParameters (const std::string &k_key=std::string(""), const char *true_class_name=generic_kernel_manager_name) const |
Print out the kernel launch parameters found for this workload. | |
Additional Inherited Members | |
![]() | |
KernelManager (const GpuDetails &gpu_in=null_gpu) | |
The constructor for this base class takes the GPU specifications. | |
virtual | ~KernelManager () |
A virtual destructor ensures proper behavior in the destructors of derived classes for managing particular groups of kernels. | |
![]() | |
GpuDetails | gpu |
The details of the GPU in use are simply copied into this object. | |
std::map< std::string, KernelFormat > | k_dictionary |
A class to guide the implementation of GPU kernels, with selected thread counts per block and block counts per launch grid for a specific GPU based on the workload. This object is constructed in a stepwise manner, with each kind of work unit contributing new launch specifications.
stormm::card::CoreKlManager::CoreKlManager | ( | const GpuDetails & | gpu_in, |
const AtomGraphSynthesis * | poly_ag ) |
The constructor will fill in values as if this were a single-threaded CPU "launch.".
gpu_in | Details of the GPU in use (this is relevant, as it will be used to interpret the layout of any kernels) |
poly_ag | Topologies for all systems, offering details of the workload |
int2 stormm::card::CoreKlManager::getDensityMappingKernelDims | ( | QMapMethod | approach, |
PrecisionModel | prec, | ||
size_t | cg_tmat, | ||
int | order ) const |
Get the launch parameters for a general-purpose particle-mesh density mapping kernel. These kernels loop over atoms in a cell grid (see cellgrid.h) and issues atomic instructions to accumulate density on a set of particle-mesh interaction grids.
Overloaded:
approach | The mapping approach guiding which kernel to choose |
prec | Precision in which all calculations will be performed. The precision of the coordinate representation is a separate parameter, but does not tend to affect the number of registers used by the cards and is thus not a factor when determining the launch grid size. |
calc_prec | Precision in which all calculations will be performed (supply this for the ACC_SHARED approach) |
acc_prec | Precision in which accumulations will be expressed. 95-bit accumulators are used for real-valued double-precision grids, 63-bit accumulators are used for real-valued single-precision grids) |
overflow | Indicate whether the overflow bits might be engaged by the fixed-precision model |
cg_tmat | Type identifier code for the coordinate representation in the associated cell grid (a neighbor list spatial decomposition of the coordinate synthesis) |
order | Order of the B-spline based interpolation |
int2 stormm::card::CoreKlManager::getIntegrationKernelDims | ( | PrecisionModel | prec, |
AccumulationMethod | acc_meth, | ||
IntegrationStage | process ) const |
Get the block and thread counts for a stand-alone velocity constraint kernel.
prec | The precision model for arithmetic to be used in applying constraints |
acc_meth | The method in which the velocities will be stored in shared memory arrays. The distinction comes about due to the multi-functional nature of the core code in this kernel: stand-alone or included in a valence interaction kernel. |
process | The part of the integration workflow to perform |
int2 stormm::card::CoreKlManager::getMigrationKernelDims | ( | PrecisionModel | coord_prec, |
NeighborListKind | grid_configuration, | ||
int | stage_number, | ||
int | gpos_bits = 0, | ||
int | chain_count = 0 ) const |
Get the block and thread counts for a particle migration kernel needed to update particle positions and cell locations as part of a molecular dynamics time step.
coord_prec | Level of detail in which particle coordinate representations and accumulation will take place |
grid_configuration | Indicate whether there are one or two neighbor list grids to evaluate |
stage_number | The stage of the calculation |
gpos_bits | The number of fixed-precision bits after the point used in the global particle position representation. This number is relevant when stage_number is 1. If left at its default value, kernel launch dimensions for the more laborious form of the kernel (invoking the overflow bits of the fixed-precision representation) will be provided. |
chain_count | The number of chains in the cell grids, each of which will become the subject of one thread block and one work unit. This can be provided when retrieving the launch parameters for modification of the launch grid based on the exact neighbor list situation. |
int2 stormm::card::CoreKlManager::getNonbondedKernelDims | ( | PrecisionModel | prec, |
NbwuKind | kind, | ||
EvaluateForce | eval_force, | ||
EvaluateEnergy | eval_nrg, | ||
AccumulationMethod | acc_meth, | ||
ImplicitSolventModel | igb, | ||
ClashResponse | collision_handling ) const |
Get the block and thread counts for a non-bonded kernel. Parameter descriptions for this function follow from getValenceKernelDims() above, with the addition of:
kind | The type of non-bonded work unit: tile groups, supertiles, or honeycomb |
igb | Type of implicit solvent model to use ("NONE" in periodic boundary conditions) |
int2 stormm::card::CoreKlManager::getPMEPairsKernelDims | ( | PrecisionModel | coord_prec, |
PrecisionModel | calc_prec, | ||
NeighborListKind | grid_configuration, | ||
TinyBoxPresence | has_tiny_box, | ||
EvaluateForce | eval_frc, | ||
EvaluateEnergy | eval_nrg, | ||
ClashResponse | mitigation ) const |
Get the block and thread counts for a PME pair interactions kernel evaluating the purview of each nieghbor list cell in a neutral territory decomposition.
coord_prec | Level of detail in which particle coordinate representations and accumulation will take place |
calc_prec | Level of detail in which arithmetic calculations will take place |
grid_configuration | Indicate whether there are one or two neighbor list grids to evaluate |
has_tiny_box | Indicate whether there is a very small box somewhere within the synthesis of systems |
eval_frc | Indicate whether to evaluate the forces on all particle |
eval_nrg | Indicate whether to evaluate each system's non-bonded energy |
mitigation | Action to take in response to clashes in the structure |
int2 stormm::card::CoreKlManager::getPmeValenceKernelDims | ( | PrecisionModel | prec, |
PrecisionModel | neighbor_prec, | ||
NeighborListKind | neighbor_layout, | ||
EvaluateEnergy | eval_nrg, | ||
AccumulationMethod | acc_meth, | ||
ClashResponse | collision_handling ) const |
Get the block and thread counts for a PME-compatible valence kernel. All such kernels will take abstracts from one or two neighbor list objects to gather non-bonded forces on particles. Because particle movement is an inherent feature of these kernels, evaluation of forces due to valence interactions is obligatory.
prec | Indicate whether arithmetic is to be performed in either single- or double-precision floating point numbers |
neighbor_prec | Precision of the neighbor list coordinates. This implies the type of the fixed-precision primary accumulator, single-precision corresponding to int32_t and double-precisin corresponding to int64_t accumulators. In either case, the neighbor list will have an int32_t overflow accumulator for each force component. |
neighbor_layout | Indicate whether there is a single neighbor list for all such non-bonded forces or if there are two neighbor lists for electrostatic and Lennard-Jones forces |
eval_nrg | Indication of whether to evaluate the energy of the system as a whole |
acc_meth | The force accumulation method (SPLIT or WHOLE, AUTOMATIC will produce an error in this context) |
collision_handling | Indication of whether clashes are to be dampened |
int2 stormm::card::CoreKlManager::getReductionKernelDims | ( | PrecisionModel | prec, |
ReductionGoal | purpose, | ||
ReductionStage | process ) const |
Get the block and thread counts for a reduction kernel.
prec | The type of floating point numbers represented by the kernel's substrate (all reductions happenin double-precision real numbers) |
purpose | The procedure requested, for which the appropriate kernel shall be found |
process | The reduction step to perform |
int2 stormm::card::CoreKlManager::getRMSDKernelDims | ( | PrecisionModel | prec, |
RMSDTask | order ) const |
Get the launch parameters for an RMSD calculation kernel.
prec | The precision model in which to perform calculations and present results |
order | The order of the calculation (all to reference, or all to all) |
int2 stormm::card::CoreKlManager::getValenceKernelDims | ( | PrecisionModel | prec, |
EvaluateForce | eval_force, | ||
EvaluateEnergy | eval_nrg, | ||
AccumulationMethod | acc_meth, | ||
VwuGoal | purpose, | ||
ClashResponse | collision_handling ) const |
Get the block and thread counts for a valence kernel.
prec | The type of floating point numbers in which the kernel shall work |
eval_force | Indication of whether the kernel will evaluate forces on atoms |
eval_nrg | Indication of whether to evaluate the energy of the system as a whole |
acc_meth | The force accumulation method (SPLIT or WHOLE, AUTOMATIC will produce an error in this context) |
collision_handling | Indication of whether clashes are to be dampened |
int2 stormm::card::CoreKlManager::getVirtualSiteKernelDims | ( | PrecisionModel | prec, |
VirtualSiteActivity | purpose ) const |
Get the block and thread counts for a virtual site placement or force transmission kernel.
prec | Implies a level of detail in the fixed-precision coordinate representation, and specifies the precision in which to perform the placement or force transmission operations |
purpose | The process to perform with standalone virtual site kernels: place virtual sites or transmit forces from them to frame atoms with mass |