STORMM Source Documentation
Encapsulate the operations to store and retrieve information about a kernel's format.
#include <kernel_format.h>
Public Member Functions
| int2 | getLaunchParameters () const |
| | Get the optimal block and grid sizes for kernel launches with the present GPU. |
| int | getRegisterUsage () const |
| | Get the register usage of the kernel. |
| int | getBlockSizeLimit () const |
| | Get the maximum thread count for a single block in the kernel launch. |
| int | getSharedMemoryRequirement () const |
| | Get the amount of shared memory needed by any one block. |
| const std::string & | getKernelName () const |
| | Get the name of this kernel. |
| | KernelFormat () |
| | The constructor takes launch bounds and other information that can be plucked from a cudaFuncAttributes object. |
| | KernelFormat (int lb_max_threads_per_block, int lb_min_blocks_per_smp, int register_usage_in, int shared_usage_in, int block_subdivision, const GpuDetails &gpu, const std::string &kernel_name_in=std::string("")) |
| | KernelFormat (int lb_max_threads_per_block, int lb_min_blocks_per_smp, int register_usage_in, int shared_usage_in, const GpuDetails &gpu, const std::string &kernel_name_in=std::string("")) |
| | KernelFormat (const KernelFormat &original)=default |
| | Use the default copy and move constructors, as well as the default copy and move assignment operators. |
| | KernelFormat (KernelFormat &&original)=default |
| KernelFormat & | operator= (const KernelFormat &other)=default |
| KernelFormat & | operator= (KernelFormat &&other)=default |
stormm::card::KernelFormat::KernelFormat ()
The constructor takes launch bounds and other information that can be plucked from a cudaFuncAttributes object.
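As a rough illustration of what "plucked from a cudaFuncAttributes object" might look like, the sketch below copies three fields into the values the constructor expects. `MockFuncAttributes` is a hypothetical stand-in that mirrors only the `maxThreadsPerBlock`, `numRegs`, and `sharedSizeBytes` fields of the CUDA runtime's `cudaFuncAttributes`, so that the example compiles without CUDA. `FormatInputs` and `extractFormatInputs` are likewise invented for illustration and are not part of STORMM. Since `cudaFuncAttributes` does not report a minimum block count per multiprocessor, the sketch assumes the caller supplies that value separately.

```cpp
#include <cstddef>

// Hypothetical stand-in for a subset of cudaFuncAttributes (field names match CUDA's).
struct MockFuncAttributes {
  int maxThreadsPerBlock;       // largest block the kernel can be launched with
  int numRegs;                  // registers used by each thread
  std::size_t sharedSizeBytes;  // statically allocated shared memory per block
};

// Hypothetical bundle of the integer arguments taken by the KernelFormat constructors.
struct FormatInputs {
  int lb_max_threads_per_block;
  int lb_min_blocks_per_smp;
  int register_usage_in;
  int shared_usage_in;
};

// Map the queried attributes onto constructor inputs. The minimum block count per
// multiprocessor is passed in by the caller (an assumption of this sketch).
inline FormatInputs extractFormatInputs(const MockFuncAttributes &attr,
                                        const int lb_min_blocks_per_smp) {
  return { attr.maxThreadsPerBlock, lb_min_blocks_per_smp, attr.numRegs,
           static_cast<int>(attr.sharedSizeBytes) };
}
```

In real code the attributes would presumably come from a CUDA runtime query such as `cudaFuncGetAttributes`; the mapping shown here is only one plausible reading of the description above.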
Parameters, across all overloads:
| lb_max_threads_per_block | Maximum threads per block, as stated in the launch bounds |
| lb_min_blocks_per_smp | Minimum blocks per multiprocessor, from the launch bounds |
| register_usage_in | Input register usage |
| shared_usage_in | Input shared memory usage |
| block_subdivision | Preferred block multiplicity (this will compound the input minimum number of blocks per multiprocessor) |
| attr | Result of a CUDA runtime query to get kernel specifications |
| gpu | Details of the available GPU (likely passed in from a CoreKlManager struct containing many KernelFormat objects) |
| kernel_name_in | Name of the kernel, for reporting purposes later (optional) |
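The parameter list above can be made concrete with a small mock. This is a minimal sketch, not the STORMM implementation: `MockGpu`, `Int2`, and `MockKernelFormat` are hypothetical stand-ins for `GpuDetails`, CUDA's `int2`, and `KernelFormat`. The arithmetic is an assumption chosen to illustrate how `block_subdivision` could compound the minimum number of blocks per multiprocessor: the block count is multiplied by it and the block size limit is divided by it. The ordering of the returned pair (grid size in `x`, block size in `y`) is also assumed.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for GpuDetails: only the multiprocessor count is needed here.
struct MockGpu {
  int smp_count;
};

// Stand-in for CUDA's int2 so the sketch builds without CUDA headers.
struct Int2 {
  int x;
  int y;
};

// Illustrative mock of the documented interface (not the STORMM class).
class MockKernelFormat {
public:
  // Overload with an explicit block subdivision.
  MockKernelFormat(int lb_max_threads_per_block, int lb_min_blocks_per_smp,
                   int register_usage_in, int shared_usage_in, int block_subdivision,
                   const MockGpu &gpu, const std::string &kernel_name_in = std::string("")) :
      block_size_limit{lb_max_threads_per_block / checkedSubdivision(block_subdivision)},
      grid_size{lb_min_blocks_per_smp * block_subdivision * gpu.smp_count},
      register_usage{register_usage_in},
      shared_usage{shared_usage_in},
      kernel_name{kernel_name_in}
  {}

  // Overload without block_subdivision: assumed equivalent to a multiplicity of one.
  MockKernelFormat(int lb_max_threads_per_block, int lb_min_blocks_per_smp,
                   int register_usage_in, int shared_usage_in, const MockGpu &gpu,
                   const std::string &kernel_name_in = std::string("")) :
      MockKernelFormat(lb_max_threads_per_block, lb_min_blocks_per_smp, register_usage_in,
                       shared_usage_in, 1, gpu, kernel_name_in)
  {}

  // Assumed layout: x holds the grid size (blocks), y the block size (threads).
  Int2 getLaunchParameters() const { return { grid_size, block_size_limit }; }
  int getRegisterUsage() const { return register_usage; }
  int getBlockSizeLimit() const { return block_size_limit; }
  int getSharedMemoryRequirement() const { return shared_usage; }
  const std::string& getKernelName() const { return kernel_name; }

private:
  // Guard against a zero or negative subdivision before it is used as a divisor.
  static int checkedSubdivision(int n) {
    assert(n > 0);
    return n;
  }

  int block_size_limit;
  int grid_size;
  int register_usage;
  int shared_usage;
  std::string kernel_name;
};
```

Under these assumptions, a kernel with launch bounds of 256 threads and 4 blocks per multiprocessor, subdivided by 2 on an 80-multiprocessor device, would report a block size limit of 128 and a grid of 640 blocks. The overload without `block_subdivision` would report 256 threads and 320 blocks.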