A class to guide the implementation of GPU kernels, with selected thread counts per block and block counts per launch grid for a specific GPU based on the workload. This object is constructed in a stepwise manner, with each kind of work unit contributing new launch specifications. More...

#include <core_kernel_manager.h>

Inheritance diagram for stormm::mm::CoreKlManager:

[legend]

Collaboration diagram for stormm::mm::CoreKlManager:

[legend]

Public Member Functions
int	getArchBlockMultiplier () const
	Get the architecture-specific block multiplier. This will run a minimum number of blocks per streaming multiprocessor on some cards, specifically NVIDIA's GTX 1080-Ti, when large blocks cannot use more than 32k registers in all.

int2	getValenceKernelDims (PrecisionModel prec, EvaluateForce eval_force, EvaluateEnergy eval_nrg, AccumulationMethod acc_meth, VwuGoal purpose, ClashResponse collision_handling) const
	Get the block and thread counts for a valence kernel.

int2	getPmeValenceKernelDims (PrecisionModel prec, PrecisionModel neighbor_prec, NeighborListKind neighbor_layout, EvaluateEnergy eval_nrg, AccumulationMethod acc_meth, ClashResponse collision_handling) const
	Get the block and thread counts for a PME-compatible valence kernel. All such kernels will take abstracts from one or two neighbor list objects to gather non-bonded forces on particles. Because particle movement is an inherent feature of these kernels, evaluation of forces due to valence interactions is obligatory.

int2	getIntegrationKernelDims (PrecisionModel prec, AccumulationMethod acc_meth, IntegrationStage process) const
	Get the block and thread counts for a stand-alone velocity constraint kernel.

int2	getNonbondedKernelDims (PrecisionModel prec, NbwuKind kind, EvaluateForce eval_force, EvaluateEnergy eval_nrg, AccumulationMethod acc_meth, ImplicitSolventModel igb, ClashResponse collision_handling) const
	Get the block and thread counts for a non-bonded kernel. Parameter descriptions for this function follow from getValenceKernelDims() above, with the addition of:

int2	getBornRadiiKernelDims (PrecisionModel prec, NbwuKind kind, AccumulationMethod acc_meth, ImplicitSolventModel igb) const
	Get the block and thread counts for a Born radii computation kernel. Parameter descriptions for this function follow from getValenceKernelDims() and getNonbondedKernelDims(), above.

int2	getBornDerivativeKernelDims (PrecisionModel prec, NbwuKind kind, AccumulationMethod acc_meth, ImplicitSolventModel igb) const
	Get the block and thread counts for a Born derivative computation kernel. Parameter descriptions for this function follow from getValenceKernelDims() and getNonbondedKernelDims(), above.

int2	getReductionKernelDims (PrecisionModel prec, ReductionGoal purpose, ReductionStage process) const
	Get the block and thread counts for a reduction kernel.

int2	getVirtualSiteKernelDims (PrecisionModel prec, VirtualSiteActivity purpose) const
	Get the block and thread counts for a virtual site placement or force transmission kernel.

int2	getRMSDKernelDims (PrecisionModel prec, RMSDTask order) const
	Get the launch parameters for an RMSD calculation kernel.

int2	getPMEPairsKernelDims (PrecisionModel coord_prec, PrecisionModel calc_prec, NeighborListKind grid_configuration, TinyBoxPresence has_tiny_box, EvaluateForce eval_frc, EvaluateEnergy eval_nrg, ClashResponse mitigation) const
	Get the block and thread counts for a PME pair interactions kernel evaluating the purview of each nieghbor list cell in a neutral territory decomposition.

int2	getMigrationKernelDims (PrecisionModel coord_prec, NeighborListKind grid_configuration, int stage_number, int gpos_bits=0, int chain_count=0) const
	Get the block and thread counts for a particle migration kernel needed to update particle positions and cell locations as part of a molecular dynamics time step.


	CoreKlManager (const GpuDetails &gpu_in, const AtomGraphSynthesis *poly_ag)
	The constructor will fill in values as if this were a single-threaded CPU "launch.".

	CoreKlManager (const GpuDetails &gpu_in, const AtomGraphSynthesis &poly_ag)


int2	getDensityMappingKernelDims (QMapMethod approach, PrecisionModel prec, size_t cg_tmat, int order) const
	Get the launch parameters for a general-purpose particle-mesh density mapping kernel. These kernels loop over atoms in a cell grid (see cellgrid.h) and issues atomic instructions to accumulate density on a set of particle-mesh interaction grids.

int2	getDensityMappingKernelDims (QMapMethod approach, PrecisionModel calc_prec, PrecisionModel acc_prec, bool overflow, size_t cg_tmat, int order) const

Public Member Functions inherited from stormm::card::KernelManager
const GpuDetails &	getGpu () const
	Get the GPU information for the active GPU.

void	printLaunchParameters (const std::string &k_key=std::string(""), const char *true_class_name=generic_kernel_manager_name) const
	Print out the kernel launch parameters found for this workload.

Additional Inherited Members
Protected Member Functions inherited from stormm::card::KernelManager
	KernelManager (const GpuDetails &gpu_in=null_gpu)
	The constructor for this base class takes the GPU specifications.

virtual	~KernelManager ()
	A virtual destructor ensures proper behavior in the destructors of derived classes for managing particular groups of kernels.

Protected Attributes inherited from stormm::card::KernelManager
GpuDetails	gpu
	The details of the GPU in use are simply copied into this object.

std::map< std::string, KernelFormat >	k_dictionary

Detailed Description

A class to guide the implementation of GPU kernels, with selected thread counts per block and block counts per launch grid for a specific GPU based on the workload. This object is constructed in a stepwise manner, with each kind of work unit contributing new launch specifications.

Constructor & Destructor Documentation

◆ CoreKlManager()

stormm::card::CoreKlManager::CoreKlManager	(	const GpuDetails &	gpu_in,
		const AtomGraphSynthesis *	poly_ag )

The constructor will fill in values as if this were a single-threaded CPU "launch.".

Parameters

gpu_in	Details of the GPU in use (this is relevant, as it will be used to interpret the layout of any kernels)
poly_ag	Topologies for all systems, offering details of the workload

Member Function Documentation

◆ getDensityMappingKernelDims()

int2 stormm::card::CoreKlManager::getDensityMappingKernelDims	(	QMapMethod	approach,
		PrecisionModel	prec,
		size_t	cg_tmat,
		int	order ) const

Get the launch parameters for a general-purpose particle-mesh density mapping kernel. These kernels loop over atoms in a cell grid (see cellgrid.h) and issues atomic instructions to accumulate density on a set of particle-mesh interaction grids.

Overloaded:

Provide only the calculation precision (valid for GENERAL_PURPOSE and ACC_REGISTER kernels)
Also provide the accumulation precision (needed for ACC_SHARED kernels)

Parameters

approach	The mapping approach guiding which kernel to choose
prec	Precision in which all calculations will be performed. The precision of the coordinate representation is a separate parameter, but does not tend to affect the number of registers used by the cards and is thus not a factor when determining the launch grid size.
calc_prec	Precision in which all calculations will be performed (supply this for the ACC_SHARED approach)
acc_prec	Precision in which accumulations will be expressed. 95-bit accumulators are used for real-valued double-precision grids, 63-bit accumulators are used for real-valued single-precision grids)
overflow	Indicate whether the overflow bits might be engaged by the fixed-precision model
cg_tmat	Type identifier code for the coordinate representation in the associated cell grid (a neighbor list spatial decomposition of the coordinate synthesis)
order	Order of the B-spline based interpolation

◆ getIntegrationKernelDims()

int2 stormm::card::CoreKlManager::getIntegrationKernelDims	(	PrecisionModel	prec,
		AccumulationMethod	acc_meth,
		IntegrationStage	process ) const

Get the block and thread counts for a stand-alone velocity constraint kernel.

Parameters

prec	The precision model for arithmetic to be used in applying constraints
acc_meth	The method in which the velocities will be stored in shared memory arrays. The distinction comes about due to the multi-functional nature of the core code in this kernel: stand-alone or included in a valence interaction kernel.
process	The part of the integration workflow to perform

◆ getMigrationKernelDims()

int2 stormm::card::CoreKlManager::getMigrationKernelDims	(	PrecisionModel	coord_prec,
		NeighborListKind	grid_configuration,
		int	stage_number,
		int	gpos_bits = 0,
		int	chain_count = 0 ) const

Get the block and thread counts for a particle migration kernel needed to update particle positions and cell locations as part of a molecular dynamics time step.

Parameters

coord_prec	Level of detail in which particle coordinate representations and accumulation will take place
grid_configuration	Indicate whether there are one or two neighbor list grids to evaluate
stage_number	The stage of the calculation
gpos_bits	The number of fixed-precision bits after the point used in the global particle position representation. This number is relevant when stage_number is 1. If left at its default value, kernel launch dimensions for the more laborious form of the kernel (invoking the overflow bits of the fixed-precision representation) will be provided.
chain_count	The number of chains in the cell grids, each of which will become the subject of one thread block and one work unit. This can be provided when retrieving the launch parameters for modification of the launch grid based on the exact neighbor list situation.

◆ getNonbondedKernelDims()

int2 stormm::card::CoreKlManager::getNonbondedKernelDims	(	PrecisionModel	prec,
		NbwuKind	kind,
		EvaluateForce	eval_force,
		EvaluateEnergy	eval_nrg,
		AccumulationMethod	acc_meth,
		ImplicitSolventModel	igb,
		ClashResponse	collision_handling ) const

Get the block and thread counts for a non-bonded kernel. Parameter descriptions for this function follow from getValenceKernelDims() above, with the addition of:

Parameters

kind	The type of non-bonded work unit: tile groups, supertiles, or honeycomb
igb	Type of implicit solvent model to use ("NONE" in periodic boundary conditions)

◆ getPMEPairsKernelDims()

int2 stormm::card::CoreKlManager::getPMEPairsKernelDims	(	PrecisionModel	coord_prec,
		PrecisionModel	calc_prec,
		NeighborListKind	grid_configuration,
		TinyBoxPresence	has_tiny_box,
		EvaluateForce	eval_frc,
		EvaluateEnergy	eval_nrg,
		ClashResponse	mitigation ) const

Get the block and thread counts for a PME pair interactions kernel evaluating the purview of each nieghbor list cell in a neutral territory decomposition.

Parameters

coord_prec	Level of detail in which particle coordinate representations and accumulation will take place
calc_prec	Level of detail in which arithmetic calculations will take place
grid_configuration	Indicate whether there are one or two neighbor list grids to evaluate
has_tiny_box	Indicate whether there is a very small box somewhere within the synthesis of systems
eval_frc	Indicate whether to evaluate the forces on all particle
eval_nrg	Indicate whether to evaluate each system's non-bonded energy
mitigation	Action to take in response to clashes in the structure

◆ getPmeValenceKernelDims()

int2 stormm::card::CoreKlManager::getPmeValenceKernelDims	(	PrecisionModel	prec,
		PrecisionModel	neighbor_prec,
		NeighborListKind	neighbor_layout,
		EvaluateEnergy	eval_nrg,
		AccumulationMethod	acc_meth,
		ClashResponse	collision_handling ) const

Get the block and thread counts for a PME-compatible valence kernel. All such kernels will take abstracts from one or two neighbor list objects to gather non-bonded forces on particles. Because particle movement is an inherent feature of these kernels, evaluation of forces due to valence interactions is obligatory.

Parameters

prec	Indicate whether arithmetic is to be performed in either single- or double-precision floating point numbers
neighbor_prec	Precision of the neighbor list coordinates. This implies the type of the fixed-precision primary accumulator, single-precision corresponding to int32_t and double-precisin corresponding to int64_t accumulators. In either case, the neighbor list will have an int32_t overflow accumulator for each force component.
neighbor_layout	Indicate whether there is a single neighbor list for all such non-bonded forces or if there are two neighbor lists for electrostatic and Lennard-Jones forces
eval_nrg	Indication of whether to evaluate the energy of the system as a whole
acc_meth	The force accumulation method (SPLIT or WHOLE, AUTOMATIC will produce an error in this context)
collision_handling	Indication of whether clashes are to be dampened

◆ getReductionKernelDims()

int2 stormm::card::CoreKlManager::getReductionKernelDims	(	PrecisionModel	prec,
		ReductionGoal	purpose,
		ReductionStage	process ) const

Get the block and thread counts for a reduction kernel.

Parameters

prec	The type of floating point numbers represented by the kernel's substrate (all reductions happenin double-precision real numbers)
purpose	The procedure requested, for which the appropriate kernel shall be found
process	The reduction step to perform

◆ getRMSDKernelDims()

int2 stormm::card::CoreKlManager::getRMSDKernelDims	(	PrecisionModel	prec,
		RMSDTask	order ) const

Get the launch parameters for an RMSD calculation kernel.

Parameters

prec	The precision model in which to perform calculations and present results
order	The order of the calculation (all to reference, or all to all)

◆ getValenceKernelDims()

int2 stormm::card::CoreKlManager::getValenceKernelDims	(	PrecisionModel	prec,
		EvaluateForce	eval_force,
		EvaluateEnergy	eval_nrg,
		AccumulationMethod	acc_meth,
		VwuGoal	purpose,
		ClashResponse	collision_handling ) const

Get the block and thread counts for a valence kernel.

Parameters

prec	The type of floating point numbers in which the kernel shall work
eval_force	Indication of whether the kernel will evaluate forces on atoms
eval_nrg	Indication of whether to evaluate the energy of the system as a whole
acc_meth	The force accumulation method (SPLIT or WHOLE, AUTOMATIC will produce an error in this context)
collision_handling	Indication of whether clashes are to be dampened

◆ getVirtualSiteKernelDims()

int2 stormm::card::CoreKlManager::getVirtualSiteKernelDims	(	PrecisionModel	prec,
		VirtualSiteActivity	purpose ) const

Get the block and thread counts for a virtual site placement or force transmission kernel.

Parameters

prec	Implies a level of detail in the fixed-precision coordinate representation, and specifies the precision in which to perform the placement or force transmission operations
purpose	The process to perform with standalone virtual site kernels: place virtual sites or transmit forces from them to frame atoms with mass

The documentation for this class was generated from the following files:

src/Accelerator/core_kernel_manager.h
src/Accelerator/core_kernel_manager.cpp

Public Member Functions

Additional Inherited Members

Detailed Description

Constructor & Destructor Documentation

◆ CoreKlManager()

Member Function Documentation

◆ getDensityMappingKernelDims()

◆ getIntegrationKernelDims()

◆ getMigrationKernelDims()

◆ getNonbondedKernelDims()

◆ getPMEPairsKernelDims()

◆ getPmeValenceKernelDims()

◆ getReductionKernelDims()

◆ getRMSDKernelDims()

◆ getValenceKernelDims()

◆ getVirtualSiteKernelDims()