Low-Priority Recommendations
On version 2.2 of the CUDA Toolkit (and later), use zero-copy operations on integrated GPUs. (Zero Copy)
For kernels with long argument lists, place some arguments into constant memory to save shared memory. (Shared Memory and Memory Banks)
Use shift operations to avoid expensive division and modulo calculations. (Division and Modulo Operations)
Avoid automatic conversion of doubles to floats. (Other Arithmetic Instructions)
Make it easy for the compiler to use branch predication in lieu of loops or control statements. (Branching Predication)
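The two arithmetic recommendations above can be sketched in plain C++, which compiles without a GPU; the same expressions should carry over unchanged into device code. This is a minimal illustration, not code from the guide: the helper names div_pow2, mod_pow2, and scale_float are hypothetical, and the shift and mask identities are assumed to apply only to unsigned operands with a power-of-two divisor.

```cpp
#include <type_traits>

// Hypothetical helper: i / n for n = 2^log2n, written as a right shift.
// Assumes an unsigned operand; signed negative values behave differently.
inline unsigned div_pow2(unsigned i, unsigned log2n) {
    return i >> log2n;
}

// Hypothetical helper: i % n for a power-of-two n, written as a bit mask.
inline unsigned mod_pow2(unsigned i, unsigned n) {
    return i & (n - 1u);
}

// A float multiplied by an unsuffixed literal is evaluated in double and
// then converted back; the "f" suffix keeps the expression in float.
static_assert(std::is_same<decltype(1.0f * 0.5), double>::value,
              "unsuffixed literal promotes the expression to double");
static_assert(std::is_same<decltype(1.0f * 0.5f), float>::value,
              "f-suffixed literal keeps the expression in float");

// Hypothetical helper: scaling done entirely in single precision.
inline float scale_float(float x) {
    return x * 0.5f;
}
```

For example, div_pow2(37u, 3u) and mod_pow2(37u, 8u) give the same results as 37u / 8u and 37u % 8u. When the divisor is a compile-time literal the compiler can usually make this substitution itself, so the explicit form matters mostly when the divisor is only known at run time.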
Copyright © 2011 NVIDIA Corporation | www.nvidia.com