STORMM Source Documentation
Loading...
Searching...
No Matches
stormm::trajectory::CoreKlManager Class Reference

A class to guide the implementation of GPU kernels, with selected thread counts per block and block counts per launch grid for a specific GPU based on the workload. This object is constructed in a stepwise manner, with each kind of work unit contributing new launch specifications. More...

#include <core_kernel_manager.h>

Inheritance diagram for stormm::trajectory::CoreKlManager:
Collaboration diagram for stormm::trajectory::CoreKlManager:

Public Member Functions

int getArchBlockMultiplier () const
 Get the architecture-specific block multiplier. This will run a minimum number of blocks per streaming multiprocessor on some cards, specifically NVIDIA's GTX 1080-Ti, when large blocks cannot use more than 32k registers in all.
 
int2 getValenceKernelDims (PrecisionModel prec, EvaluateForce eval_force, EvaluateEnergy eval_nrg, AccumulationMethod acc_meth, VwuGoal purpose, ClashResponse collision_handling) const
 Get the block and thread counts for a valence kernel.
 
int2 getPmeValenceKernelDims (PrecisionModel prec, PrecisionModel neighbor_prec, NeighborListKind neighbor_layout, EvaluateEnergy eval_nrg, AccumulationMethod acc_meth, ClashResponse collision_handling) const
 Get the block and thread counts for a PME-compatible valence kernel. All such kernels will take abstracts from one or two neighbor list objects to gather non-bonded forces on particles. Because particle movement is an inherent feature of these kernels, evaluation of forces due to valence interactions is obligatory.
 
int2 getIntegrationKernelDims (PrecisionModel prec, AccumulationMethod acc_meth, IntegrationStage process) const
 Get the block and thread counts for a stand-alone velocity constraint kernel.
 
int2 getNonbondedKernelDims (PrecisionModel prec, NbwuKind kind, EvaluateForce eval_force, EvaluateEnergy eval_nrg, AccumulationMethod acc_meth, ImplicitSolventModel igb, ClashResponse collision_handling) const
 Get the block and thread counts for a non-bonded kernel. Parameter descriptions for this function follow from getValenceKernelDims() above, with the addition of:
 
int2 getBornRadiiKernelDims (PrecisionModel prec, NbwuKind kind, AccumulationMethod acc_meth, ImplicitSolventModel igb) const
 Get the block and thread counts for a Born radii computation kernel. Parameter descriptions for this function follow from getValenceKernelDims() and getNonbondedKernelDims(), above.
 
int2 getBornDerivativeKernelDims (PrecisionModel prec, NbwuKind kind, AccumulationMethod acc_meth, ImplicitSolventModel igb) const
 Get the block and thread counts for a Born derivative computation kernel. Parameter descriptions for this function follow from getValenceKernelDims() and getNonbondedKernelDims(), above.
 
int2 getReductionKernelDims (PrecisionModel prec, ReductionGoal purpose, ReductionStage process) const
 Get the block and thread counts for a reduction kernel.
 
int2 getVirtualSiteKernelDims (PrecisionModel prec, VirtualSiteActivity purpose) const
 Get the block and thread counts for a virtual site placement or force transmission kernel.
 
int2 getRMSDKernelDims (PrecisionModel prec, RMSDTask order) const
 Get the launch parameters for an RMSD calculation kernel.
 
int2 getPMEPairsKernelDims (PrecisionModel coord_prec, PrecisionModel calc_prec, NeighborListKind grid_configuration, TinyBoxPresence has_tiny_box, EvaluateForce eval_frc, EvaluateEnergy eval_nrg, ClashResponse mitigation) const
 Get the block and thread counts for a PME pair interactions kernel evaluating the purview of each nieghbor list cell in a neutral territory decomposition.
 
int2 getMigrationKernelDims (PrecisionModel coord_prec, NeighborListKind grid_configuration, int stage_number, int gpos_bits=0, int chain_count=0) const
 Get the block and thread counts for a particle migration kernel needed to update particle positions and cell locations as part of a molecular dynamics time step.
 
 CoreKlManager (const GpuDetails &gpu_in, const AtomGraphSynthesis *poly_ag)
 The constructor will fill in values as if this were a single-threaded CPU "launch.".
 
 CoreKlManager (const GpuDetails &gpu_in, const AtomGraphSynthesis &poly_ag)
 
int2 getDensityMappingKernelDims (QMapMethod approach, PrecisionModel prec, size_t cg_tmat, int order) const
 Get the launch parameters for a general-purpose particle-mesh density mapping kernel. These kernels loop over atoms in a cell grid (see cellgrid.h) and issues atomic instructions to accumulate density on a set of particle-mesh interaction grids.
 
int2 getDensityMappingKernelDims (QMapMethod approach, PrecisionModel calc_prec, PrecisionModel acc_prec, bool overflow, size_t cg_tmat, int order) const
 
- Public Member Functions inherited from stormm::card::KernelManager
const GpuDetailsgetGpu () const
 Get the GPU information for the active GPU.
 
void printLaunchParameters (const std::string &k_key=std::string(""), const char *true_class_name=generic_kernel_manager_name) const
 Print out the kernel launch parameters found for this workload.
 

Additional Inherited Members

- Protected Member Functions inherited from stormm::card::KernelManager
 KernelManager (const GpuDetails &gpu_in=null_gpu)
 The constructor for this base class takes the GPU specifications.
 
virtual ~KernelManager ()
 A virtual destructor ensures proper behavior in the destructors of derived classes for managing particular groups of kernels.
 
- Protected Attributes inherited from stormm::card::KernelManager
GpuDetails gpu
 The details of the GPU in use are simply copied into this object.
 
std::map< std::string, KernelFormatk_dictionary
 

Detailed Description

A class to guide the implementation of GPU kernels, with selected thread counts per block and block counts per launch grid for a specific GPU based on the workload. This object is constructed in a stepwise manner, with each kind of work unit contributing new launch specifications.

Constructor & Destructor Documentation

◆ CoreKlManager()

stormm::card::CoreKlManager::CoreKlManager ( const GpuDetails & gpu_in,
const AtomGraphSynthesis * poly_ag )

The constructor will fill in values as if this were a single-threaded CPU "launch.".

Parameters
gpu_inDetails of the GPU in use (this is relevant, as it will be used to interpret the layout of any kernels)
poly_agTopologies for all systems, offering details of the workload

Member Function Documentation

◆ getDensityMappingKernelDims()

int2 stormm::card::CoreKlManager::getDensityMappingKernelDims ( QMapMethod approach,
PrecisionModel prec,
size_t cg_tmat,
int order ) const

Get the launch parameters for a general-purpose particle-mesh density mapping kernel. These kernels loop over atoms in a cell grid (see cellgrid.h) and issues atomic instructions to accumulate density on a set of particle-mesh interaction grids.

Overloaded:

  • Provide only the calculation precision (valid for GENERAL_PURPOSE and ACC_REGISTER kernels)
  • Also provide the accumulation precision (needed for ACC_SHARED kernels)
Parameters
approachThe mapping approach guiding which kernel to choose
precPrecision in which all calculations will be performed. The precision of the coordinate representation is a separate parameter, but does not tend to affect the number of registers used by the cards and is thus not a factor when determining the launch grid size.
calc_precPrecision in which all calculations will be performed (supply this for the ACC_SHARED approach)
acc_precPrecision in which accumulations will be expressed. 95-bit accumulators are used for real-valued double-precision grids, 63-bit accumulators are used for real-valued single-precision grids)
overflowIndicate whether the overflow bits might be engaged by the fixed-precision model
cg_tmatType identifier code for the coordinate representation in the associated cell grid (a neighbor list spatial decomposition of the coordinate synthesis)
orderOrder of the B-spline based interpolation

◆ getIntegrationKernelDims()

int2 stormm::card::CoreKlManager::getIntegrationKernelDims ( PrecisionModel prec,
AccumulationMethod acc_meth,
IntegrationStage process ) const

Get the block and thread counts for a stand-alone velocity constraint kernel.

Parameters
precThe precision model for arithmetic to be used in applying constraints
acc_methThe method in which the velocities will be stored in shared memory arrays. The distinction comes about due to the multi-functional nature of the core code in this kernel: stand-alone or included in a valence interaction kernel.
processThe part of the integration workflow to perform

◆ getMigrationKernelDims()

int2 stormm::card::CoreKlManager::getMigrationKernelDims ( PrecisionModel coord_prec,
NeighborListKind grid_configuration,
int stage_number,
int gpos_bits = 0,
int chain_count = 0 ) const

Get the block and thread counts for a particle migration kernel needed to update particle positions and cell locations as part of a molecular dynamics time step.

Parameters
coord_precLevel of detail in which particle coordinate representations and accumulation will take place
grid_configurationIndicate whether there are one or two neighbor list grids to evaluate
stage_numberThe stage of the calculation
gpos_bitsThe number of fixed-precision bits after the point used in the global particle position representation. This number is relevant when stage_number is 1. If left at its default value, kernel launch dimensions for the more laborious form of the kernel (invoking the overflow bits of the fixed-precision representation) will be provided.
chain_countThe number of chains in the cell grids, each of which will become the subject of one thread block and one work unit. This can be provided when retrieving the launch parameters for modification of the launch grid based on the exact neighbor list situation.

◆ getNonbondedKernelDims()

int2 stormm::card::CoreKlManager::getNonbondedKernelDims ( PrecisionModel prec,
NbwuKind kind,
EvaluateForce eval_force,
EvaluateEnergy eval_nrg,
AccumulationMethod acc_meth,
ImplicitSolventModel igb,
ClashResponse collision_handling ) const

Get the block and thread counts for a non-bonded kernel. Parameter descriptions for this function follow from getValenceKernelDims() above, with the addition of:

Parameters
kindThe type of non-bonded work unit: tile groups, supertiles, or honeycomb
igbType of implicit solvent model to use ("NONE" in periodic boundary conditions)

◆ getPMEPairsKernelDims()

int2 stormm::card::CoreKlManager::getPMEPairsKernelDims ( PrecisionModel coord_prec,
PrecisionModel calc_prec,
NeighborListKind grid_configuration,
TinyBoxPresence has_tiny_box,
EvaluateForce eval_frc,
EvaluateEnergy eval_nrg,
ClashResponse mitigation ) const

Get the block and thread counts for a PME pair interactions kernel evaluating the purview of each nieghbor list cell in a neutral territory decomposition.

Parameters
coord_precLevel of detail in which particle coordinate representations and accumulation will take place
calc_precLevel of detail in which arithmetic calculations will take place
grid_configurationIndicate whether there are one or two neighbor list grids to evaluate
has_tiny_boxIndicate whether there is a very small box somewhere within the synthesis of systems
eval_frcIndicate whether to evaluate the forces on all particle
eval_nrgIndicate whether to evaluate each system's non-bonded energy
mitigationAction to take in response to clashes in the structure

◆ getPmeValenceKernelDims()

int2 stormm::card::CoreKlManager::getPmeValenceKernelDims ( PrecisionModel prec,
PrecisionModel neighbor_prec,
NeighborListKind neighbor_layout,
EvaluateEnergy eval_nrg,
AccumulationMethod acc_meth,
ClashResponse collision_handling ) const

Get the block and thread counts for a PME-compatible valence kernel. All such kernels will take abstracts from one or two neighbor list objects to gather non-bonded forces on particles. Because particle movement is an inherent feature of these kernels, evaluation of forces due to valence interactions is obligatory.

Parameters
precIndicate whether arithmetic is to be performed in either single- or double-precision floating point numbers
neighbor_precPrecision of the neighbor list coordinates. This implies the type of the fixed-precision primary accumulator, single-precision corresponding to int32_t and double-precisin corresponding to int64_t accumulators. In either case, the neighbor list will have an int32_t overflow accumulator for each force component.
neighbor_layoutIndicate whether there is a single neighbor list for all such non-bonded forces or if there are two neighbor lists for electrostatic and Lennard-Jones forces
eval_nrgIndication of whether to evaluate the energy of the system as a whole
acc_methThe force accumulation method (SPLIT or WHOLE, AUTOMATIC will produce an error in this context)
collision_handlingIndication of whether clashes are to be dampened

◆ getReductionKernelDims()

int2 stormm::card::CoreKlManager::getReductionKernelDims ( PrecisionModel prec,
ReductionGoal purpose,
ReductionStage process ) const

Get the block and thread counts for a reduction kernel.

Parameters
precThe type of floating point numbers represented by the kernel's substrate (all reductions happenin double-precision real numbers)
purposeThe procedure requested, for which the appropriate kernel shall be found
processThe reduction step to perform

◆ getRMSDKernelDims()

int2 stormm::card::CoreKlManager::getRMSDKernelDims ( PrecisionModel prec,
RMSDTask order ) const

Get the launch parameters for an RMSD calculation kernel.

Parameters
precThe precision model in which to perform calculations and present results
orderThe order of the calculation (all to reference, or all to all)

◆ getValenceKernelDims()

int2 stormm::card::CoreKlManager::getValenceKernelDims ( PrecisionModel prec,
EvaluateForce eval_force,
EvaluateEnergy eval_nrg,
AccumulationMethod acc_meth,
VwuGoal purpose,
ClashResponse collision_handling ) const

Get the block and thread counts for a valence kernel.

Parameters
precThe type of floating point numbers in which the kernel shall work
eval_forceIndication of whether the kernel will evaluate forces on atoms
eval_nrgIndication of whether to evaluate the energy of the system as a whole
acc_methThe force accumulation method (SPLIT or WHOLE, AUTOMATIC will produce an error in this context)
collision_handlingIndication of whether clashes are to be dampened

◆ getVirtualSiteKernelDims()

int2 stormm::card::CoreKlManager::getVirtualSiteKernelDims ( PrecisionModel prec,
VirtualSiteActivity purpose ) const

Get the block and thread counts for a virtual site placement or force transmission kernel.

Parameters
precImplies a level of detail in the fixed-precision coordinate representation, and specifies the precision in which to perform the placement or force transmission operations
purposeThe process to perform with standalone virtual site kernels: place virtual sites or transmit forces from them to frame atoms with mass

The documentation for this class was generated from the following files: