STORMM Source Documentation
stormm::card::KernelFormat Class Reference

Encapsulate the operations to store and retrieve information about a kernel's format.

#include <kernel_format.h>

Public Member Functions

int2 getLaunchParameters () const
 Get the optimal block and grid sizes for kernel launches with the present GPU.
 
int getRegisterUsage () const
 Get the register usage of the kernel.
 
int getBlockSizeLimit () const
 Get the maximum thread count for a single block in the kernel launch.
 
int getSharedMemoryRequirement () const
 Get the amount of shared memory needed by any one block.
 
const std::string & getKernelName () const
 Get the name of this kernel.
 
 KernelFormat ()
 The constructor takes launch bounds and other information that can be plucked from a cudaFuncAttributes object.
 
 KernelFormat (int lb_max_threads_per_block, int lb_min_blocks_per_smp, int register_usage_in, int shared_usage_in, int block_subdivision, const GpuDetails &gpu, const std::string &kernel_name_in=std::string(""))
 
 KernelFormat (int lb_max_threads_per_block, int lb_min_blocks_per_smp, int register_usage_in, int shared_usage_in, const GpuDetails &gpu, const std::string &kernel_name_in=std::string(""))
 
 KernelFormat (const KernelFormat &original)=default
 Take the default copy and move constructors as well as assignment operators.
 
 KernelFormat (KernelFormat &&original)=default
 
KernelFormat & operator= (const KernelFormat &other)=default
 
KernelFormat & operator= (KernelFormat &&other)=default
 

Detailed Description

Encapsulate the operations to store and retrieve information about a kernel's format.
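
The store-and-retrieve role described above can be illustrated with a minimal, self-contained sketch. It does not use the STORMM headers: FormatRecord and FormatCatalog are hypothetical stand-ins invented for this example, and the fields only mirror the quantities reported by the documented accessors. Treating the shared memory figure as a byte count is an assumption.

```cpp
#include <map>
#include <string>

// Hypothetical stand-in for KernelFormat: holds only the quantities that the
// documented accessors report. This is not the STORMM class itself.
struct FormatRecord {
  int block_size_limit;  // cf. getBlockSizeLimit()
  int register_usage;    // cf. getRegisterUsage()
  int shared_memory;     // cf. getSharedMemoryRequirement(), assumed to be in bytes
  std::string name;      // cf. getKernelName()
};

// Hypothetical catalog keyed by kernel name, standing in for a manager object
// that holds many format records.
class FormatCatalog {
public:
  // Store a record, replacing any earlier record with the same name.
  void store(const FormatRecord &rec) { records_[rec.name] = rec; }

  // Return a pointer to the stored record, or nullptr if the name is unknown.
  const FormatRecord* find(const std::string &name) const {
    const auto it = records_.find(name);
    return (it == records_.end()) ? nullptr : &it->second;
  }

private:
  std::map<std::string, FormatRecord> records_;
};
```

Keying the catalog by name is only one possible design; it is chosen here because the class documents a kernel name intended for later reporting.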

Constructor & Destructor Documentation

◆ KernelFormat()

stormm::card::KernelFormat::KernelFormat ( )

The constructor takes launch bounds and other information that can be plucked from a cudaFuncAttributes object.

Overloaded:

  • Construct a blank object
  • Provide explicit instructions on whether to consider breaking up the blocks into smaller units
  • Assume that the largest possible block size is always to be used

Parameters
lb_max_threads_per_block  Maximum threads per block, as stated in the launch bounds
lb_min_blocks_per_smp     Minimum blocks per multiprocessor, from the launch bounds
register_usage_in         Input register usage
shared_usage_in           Input shared memory usage
block_subdivision         Preferred block multiplicity (this will compound the input minimum number of blocks per multiprocessor)
attr                      Result of a CUDA runtime query to get kernel specifications
gpu                       Details of the available GPU (likely passed in from a CoreKlManager struct containing many KernelFormat objects)
kernel_name_in            Name of the kernel, for reporting purposes later (optional)
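
To make the effect of block_subdivision more concrete, the following sketch shows one way the launch parameters could be derived from the constructor arguments. The rule it applies is an assumption for illustration and is not taken from the STORMM source: a subdivision factor k shrinks the block to 1/k of the launch-bounds maximum and multiplies the blocks per multiprocessor by k. MockGpu, LaunchParams, and sketchLaunchParams are hypothetical names standing in for GpuDetails, int2, and the class internals.

```cpp
// Hypothetical stand-in for the GpuDetails fields this sketch needs.
struct MockGpu {
  int smp_count;  // number of streaming multiprocessors on the device
};

// Hypothetical result type; the documented accessor returns an int2.
struct LaunchParams {
  int grid;   // number of blocks to launch
  int block;  // threads per block
};

// Assumed rule, not taken from the STORMM source: subdividing by a factor k
// divides the block size by k and multiplies the blocks per multiprocessor by k.
LaunchParams sketchLaunchParams(int lb_max_threads_per_block, int lb_min_blocks_per_smp,
                                int block_subdivision, const MockGpu &gpu) {
  // Treat a non-positive subdivision as "no subdivision".
  const int k = (block_subdivision > 0) ? block_subdivision : 1;
  LaunchParams lp;
  lp.block = lb_max_threads_per_block / k;
  lp.grid = k * lb_min_blocks_per_smp * gpu.smp_count;
  return lp;
}

// Overload without a subdivision argument: assume the largest block size is used.
LaunchParams sketchLaunchParams(int lb_max_threads_per_block, int lb_min_blocks_per_smp,
                                const MockGpu &gpu) {
  return sketchLaunchParams(lb_max_threads_per_block, lb_min_blocks_per_smp, 1, gpu);
}
```

Under this assumed rule, launch bounds of 1024 threads and 1 block per multiprocessor on a hypothetical device with 80 multiprocessors would give 256-thread blocks and a grid of 320 blocks for a subdivision of 4, and 1024-thread blocks with a grid of 80 blocks when no subdivision is requested.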

The documentation for this class was generated from the following files: