STORMM Source Documentation
Encapsulate the operations to store and retrieve information about a kernel's format.
#include <kernel_format.h>
Public Member Functions | |
int2 | getLaunchParameters () const |
Get the optimal block and grid sizes for kernel launches with the present GPU. | |
int | getRegisterUsage () const |
Get the register usage of the kernel. | |
int | getBlockSizeLimit () const |
Get the maximum thread count for a single block in the kernel launch. | |
int | getSharedMemoryRequirement () const |
Get the amount of shared memory needed by any one block. | |
const std::string & | getKernelName () const |
Get the name of this kernel. | |
KernelFormat () | |
The constructor takes launch bounds and other information that can be plucked from a cudaFuncAttributes object. | |
KernelFormat (int lb_max_threads_per_block, int lb_min_blocks_per_smp, int register_usage_in, int shared_usage_in, int block_subdivision, const GpuDetails &gpu, const std::string &kernel_name_in=std::string("")) | |
KernelFormat (int lb_max_threads_per_block, int lb_min_blocks_per_smp, int register_usage_in, int shared_usage_in, const GpuDetails &gpu, const std::string &kernel_name_in=std::string("")) | |
KernelFormat (const KernelFormat &original)=default | |
Take the default copy and move constructors as well as assignment operators. | |
KernelFormat (KernelFormat &&original)=default | |
KernelFormat & | operator= (const KernelFormat &other)=default |
KernelFormat & | operator= (KernelFormat &&other)=default |
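The defaulted copy and move members listed above mean the compiler generates member-wise copies and moves. The following is a minimal sketch of that idiom, not the STORMM class itself: `SketchFormat`, its two data members, and `sketchRoundTrip` are hypothetical stand-ins chosen only to illustrate the behavior.

```cpp
#include <cassert>
#include <string>
#include <type_traits>
#include <utility>

// Hypothetical stand-in (NOT stormm::card::KernelFormat): a plain integer plus a
// std::string, the kinds of members for which defaulted special members suffice.
class SketchFormat {
public:
  SketchFormat() = default;
  SketchFormat(const SketchFormat &original) = default;
  SketchFormat(SketchFormat &&original) = default;
  SketchFormat& operator=(const SketchFormat &other) = default;
  SketchFormat& operator=(SketchFormat &&other) = default;

  int register_usage = 0;    // hypothetical member
  std::string kernel_name;   // hypothetical member
};

// The defaulted members make the type copyable and movable in every form.
static_assert(std::is_copy_constructible<SketchFormat>::value, "copy constructor");
static_assert(std::is_move_constructible<SketchFormat>::value, "move constructor");
static_assert(std::is_copy_assignable<SketchFormat>::value, "copy assignment");
static_assert(std::is_move_assignable<SketchFormat>::value, "move assignment");

// Copy an object, then move the copy, and check that the contents survive both steps.
inline bool sketchRoundTrip() {
  SketchFormat a;
  a.register_usage = 40;
  a.kernel_name = "kExample";
  SketchFormat b = a;             // member-wise copy
  SketchFormat c = std::move(b);  // member-wise move
  return (c.register_usage == 40 && c.kernel_name == "kExample" &&
          a.kernel_name == "kExample");
}
```

Because every member manages its own storage, no hand-written copy or move logic is needed, which is presumably why the class simply takes the defaults.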
stormm::card::KernelFormat::KernelFormat | ( | ) |
The constructor takes launch bounds and other information that can be plucked from a cudaFuncAttributes object.
Overloaded. Parameters across the overloads:
lb_max_threads_per_block | Maximum threads per block, as stated in the launch bounds |
lb_min_blocks_per_smp | Minimum blocks per multiprocessor, from the launch bounds |
register_usage_in | Input register usage |
shared_usage_in | Input shared memory usage |
block_subdivision | Preferred block multiplicity (this will compound the input minimum number of blocks per multiprocessor) |
attr | Result of a CUDA runtime query to get kernel specifications (a cudaFuncAttributes object; not taken by any of the constructor signatures listed on this page) |
gpu | Details of the available GPU (likely passed in from a CoreKlManager struct containing many KernelFormat objects) |
kernel_name_in | Name of the kernel, for reporting purposes later (optional) |
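The parameter descriptions above hint at how launch bounds become launch dimensions. The following is a minimal sketch of one plausible reading, not STORMM's implementation: `SketchGpu`, `SketchLaunch`, and `sketchLaunchParameters` are hypothetical names, and the arithmetic is an assumption about what "compound" means. It assumes `block_subdivision` divides the thread limit per block and multiplies the minimum blocks per multiprocessor, and that the grid holds one such set of blocks on every multiprocessor.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical stand-in for the GPU description; only the multiprocessor count is used.
struct SketchGpu {
  int smp_count;
};

// Hypothetical result type playing the role of the pair of launch sizes.
struct SketchLaunch {
  int grid_size;   // total number of blocks to launch
  int block_size;  // threads per block
};

// Assumed arithmetic (an illustration, not the library's code):
//   blocks per multiprocessor = lb_min_blocks_per_smp * block_subdivision
//   threads per block         = lb_max_threads_per_block / block_subdivision
//   grid size                 = blocks per multiprocessor * multiprocessor count
inline SketchLaunch sketchLaunchParameters(const int lb_max_threads_per_block,
                                           const int lb_min_blocks_per_smp,
                                           const int block_subdivision,
                                           const SketchGpu &gpu) {
  if (block_subdivision < 1 || lb_max_threads_per_block % block_subdivision != 0) {
    throw std::invalid_argument("block_subdivision must evenly divide the thread limit");
  }
  const int blocks_per_smp = lb_min_blocks_per_smp * block_subdivision;
  const int threads_per_block = lb_max_threads_per_block / block_subdivision;
  return { blocks_per_smp * gpu.smp_count, threads_per_block };
}
```

Under these assumptions, a device with 80 multiprocessors, launch bounds of 1024 threads and 1 block per multiprocessor, and a `block_subdivision` of 4 gives 320 blocks of 256 threads each. The total number of resident threads per multiprocessor is unchanged, which is the sense in which the subdivision compounds the block count rather than adding work.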