Encapsulate the operations to store and retrieve information about a kernel's format.

Public Member Functions
int2	getLaunchParameters () const
	Get the optimal block and grid sizes for kernel launches with the present GPU.

int	getRegisterUsage () const
	Get the register usage of the kernel.

int	getBlockSizeLimit () const
	Get the maximum thread count for a single block in the kernel launch.

int	getSharedMemoryRequirement () const
	Get the amount of shared memory needed by any one block.

const std::string &	getKernelName () const
	Get the name of this kernel.


	KernelFormat ()
	The constructor takes launch bounds and other information that can be plucked from a cudaFuncAttributes object.

	KernelFormat (int lb_max_threads_per_block, int lb_min_blocks_per_smp, int register_usage_in, int shared_usage_in, int block_subdivision, const GpuDetails &gpu, const std::string &kernel_name_in=std::string(""))

	KernelFormat (int lb_max_threads_per_block, int lb_min_blocks_per_smp, int register_usage_in, int shared_usage_in, const GpuDetails &gpu, const std::string &kernel_name_in=std::string(""))


	KernelFormat (const KernelFormat &original)=default
	Take the default copy and move constructors as well as assignment operators.

	KernelFormat (KernelFormat &&original)=default

KernelFormat &	operator= (const KernelFormat &other)=default

KernelFormat &	operator= (KernelFormat &&other)=default

Detailed Description

Encapsulate the operations to store and retrieve information about a kernel's format.

Constructor & Destructor Documentation

stormm::card::KernelFormat::KernelFormat ( )

The constructor takes launch bounds and other information that can be plucked from a cudaFuncAttributes object.

Overloaded:

Construct a blank object
Provide explicit instructions on whether to break the blocks up into smaller units
Assume that the largest possible block size is always used
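
The relationship between the three overloads can be illustrated with a minimal sketch. Everything in it is a hypothetical stand-in rather than STORMM code: KernelFormatSketch, GpuSketch, LaunchSketch, and demoOverloads are invented for illustration, and the arithmetic that turns the launch bounds into block and grid sizes is an assumption, not the library's confirmed method. The sketch assumes that omitting block_subdivision behaves like a subdivision of 1 and that block_subdivision is at least 1.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-ins for illustration only; none of these are STORMM types.
struct GpuSketch { int smp_count; };           // number of streaming multiprocessors
struct LaunchSketch { int grid; int block; };  // plays the role of the int2 result

class KernelFormatSketch {
public:
  // Overload 1: a blank object with every field zeroed and an empty name.
  KernelFormatSketch() = default;

  // Overload 2: the caller states how far each block should be subdivided.
  // Assumes block_subdivision >= 1.
  KernelFormatSketch(int lb_max_threads_per_block, int lb_min_blocks_per_smp,
                     int register_usage_in, int shared_usage_in, int block_subdivision,
                     const GpuSketch &gpu,
                     const std::string &kernel_name_in = std::string(""))
      : block_size_limit{lb_max_threads_per_block},
        register_usage{register_usage_in},
        shared_usage{shared_usage_in},
        // Assumed rule: splitting a block k ways yields k times as many blocks
        // per multiprocessor, each holding 1/k of the threads.
        block_dimension{lb_max_threads_per_block / block_subdivision},
        grid_dimension{gpu.smp_count * lb_min_blocks_per_smp * block_subdivision},
        kernel_name{kernel_name_in} {}

  // Overload 3: no subdivision, so the largest block size is always used.
  KernelFormatSketch(int lb_max_threads_per_block, int lb_min_blocks_per_smp,
                     int register_usage_in, int shared_usage_in, const GpuSketch &gpu,
                     const std::string &kernel_name_in = std::string(""))
      : KernelFormatSketch(lb_max_threads_per_block, lb_min_blocks_per_smp,
                           register_usage_in, shared_usage_in, 1, gpu, kernel_name_in) {}

  LaunchSketch getLaunchParameters() const { return {grid_dimension, block_dimension}; }
  int getRegisterUsage() const { return register_usage; }
  int getBlockSizeLimit() const { return block_size_limit; }
  int getSharedMemoryRequirement() const { return shared_usage; }
  const std::string &getKernelName() const { return kernel_name; }

private:
  int block_size_limit = 0;
  int register_usage = 0;
  int shared_usage = 0;
  int block_dimension = 0;
  int grid_dimension = 0;
  std::string kernel_name;
};

// Exercise all three overloads on a made-up GPU with 80 multiprocessors.
inline void demoOverloads() {
  const GpuSketch gpu{80};
  const KernelFormatSketch blank;
  const KernelFormatSketch split(256, 4, 40, 0, 2, gpu, "kDemo");
  const KernelFormatSketch whole(256, 4, 40, 0, gpu, "kDemo");
  assert(blank.getLaunchParameters().block == 0);
  assert(split.getLaunchParameters().block == 128);
  assert(whole.getLaunchParameters().block == 256);
}
```

In this sketch the third overload simply delegates to the second, which keeps the sizing rule in one place. Whether the real class is organized the same way is not stated in this page.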

Parameters

lb_max_threads_per_block	Maximum threads per block, as stated in the launch bounds
lb_min_blocks_per_smp	Minimum blocks per multiprocessor, from the launch bounds
register_usage_in	Input register usage
shared_usage_in	Input shared memory usage
block_subdivision	Preferred block multiplicity (this will compound the input minimum number of blocks per multiprocessor)
attr	Result of a CUDA runtime query to get kernel specifications
gpu	Details of the available GPU (likely passed in from a CoreKlManager struct containing many KernelFormat objects)
kernel_name_in	Name of the kernel, for reporting purposes later (optional)
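
As a rough illustration of how the integer parameters above might be filled from a CUDA runtime query, the sketch below maps a function-attributes record onto them. FuncAttributesSketch is a stand-in that copies only three field names (sharedSizeBytes, maxThreadsPerBlock, numRegs) from the CUDA runtime's cudaFuncAttributes so that the example compiles without the CUDA toolkit. KernelFormatArgs and argsFromAttributes are hypothetical helpers, not part of STORMM. The choice of which attribute feeds which parameter is an assumption. The minimum number of blocks per multiprocessor is passed in separately because cudaFuncAttributes has no field for it.

```cpp
#include <cstddef>
#include <string>

// Stand-in mirroring three fields of cudaFuncAttributes. With the CUDA toolkit
// available, the real struct would be filled by cudaFuncGetAttributes(&attr, kernel).
struct FuncAttributesSketch {
  std::size_t sharedSizeBytes;  // statically allocated shared memory per block, in bytes
  int maxThreadsPerBlock;       // largest block size the kernel can be launched with
  int numRegs;                  // registers used by each thread
};

// Hypothetical bundle of the constructor arguments that do not depend on the GPU.
struct KernelFormatArgs {
  int lb_max_threads_per_block;
  int lb_min_blocks_per_smp;
  int register_usage_in;
  int shared_usage_in;
  std::string kernel_name_in;
};

// Assumed mapping from the attribute record to the documented parameters.
inline KernelFormatArgs argsFromAttributes(const FuncAttributesSketch &attr,
                                           int lb_min_blocks_per_smp,
                                           const std::string &kernel_name_in = std::string("")) {
  return {attr.maxThreadsPerBlock, lb_min_blocks_per_smp, attr.numRegs,
          static_cast<int>(attr.sharedSizeBytes), kernel_name_in};
}
```

A caller would then forward these values, together with a GpuDetails object, to whichever KernelFormat constructor fits.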
