Tuning for maximum performance
- Updated on 16 Nov 2024
To get the most out of your Composer setup, there are several items to check and configure. Below is a summary of the most common topics.
Composer
- Ensure Composer uses the Lock GPU clocks option in the GPU and rendering tab.
- Use NVDEC for decoding your video inputs. This will reduce the CPU load. (Requires Composer R3 2023 or newer.)
- Use NVENC for encoding your RTMP outputs. This will reduce the CPU load. However, there is a limit on how many NVENC instances can run in parallel. If you reach this limit, Composer automatically falls back to the CPU as the encoder.
- Use the latest version of the Blackmagic Capture and Output components. When available, activate the option "Use pinned memory".
- When possible, use still images instead of video clips. Video clips generate more load because of the continuous memory transfer from the host to the GPU.
- Avoid using CAV files unless your project needs an animated alpha channel. CAV files generate more load than an H.264 file decoded with NVDEC.
- Make sure all inputs and targets use the same frame rate.
- Reducing the frame rate will reduce your server's overall load and leave more time to process each frame.
- Use 48 kHz audio instead of 44.1 kHz. This will remove the need for audio transcoding and reduce the load.
- DeckLink output is somewhat slow compared to other outputs. If you need DeckLink out, try activating the experimental Free-running Targets option (Settings/GPU and Rendering Options). This can reduce the workload and decrease the total processing time.
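To put the frame-rate and video-versus-still points above into numbers, the following Python sketch computes the time available per frame and the amount of uncompressed pixel data a single video input would move from host to GPU each second. The function names and the figure of 4 bytes per pixel (8-bit RGBA) are assumptions for illustration only and do not describe Composer internals.

```python
def frame_budget_ms(fps: float) -> float:
    """Time available to process one frame, in milliseconds."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return 1000.0 / fps


def upload_rate_mb_per_s(width: int, height: int, fps: float,
                         bytes_per_pixel: int = 4) -> float:
    """Uncompressed pixel data uploaded per second, in MB (10^6 bytes).

    bytes_per_pixel=4 assumes 8-bit RGBA; the real pixel format may differ.
    """
    return width * height * bytes_per_pixel * fps / 1e6


if __name__ == "__main__":
    for fps in (25, 50, 60):
        budget = frame_budget_ms(fps)
        rate = upload_rate_mb_per_s(1920, 1080, fps)
        print(f"{fps} fps: {budget:.2f} ms per frame, "
              f"{rate:.1f} MB/s per 1080p input")
```

Halving the frame rate from 50 to 25 fps doubles the per-frame budget from 20 ms to 40 ms and halves the per-second transfer volume. A still image avoids the continuous transfer altogether.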
Hardware
- Ensure your capture cards are mounted in a PCIe slot with x4 or x8 lanes. Using a slot with fewer lanes will limit your capture card's throughput, and you might experience audio or video stuttering.
- Generally, the faster the CPU (clock speed) and the more CPU cores, the better. Single-core performance is more critical than multicore performance.
- A dual-CPU (two-way) system does not always perform better than a single-CPU system.
- Xeon processors are often more expensive than regular workstation processors from Intel and AMD. However, the higher cost does not always mean higher performance. When choosing a CPU, please check single-core (and multicore) performance using third-party comparison tools or websites.
- Running multiple low-cost servers is usually more cost-effective than running a “super server”. An AMD Ryzen 9 5950X with an Nvidia RTX 4000 is one example of a cost-efficient setup.
- In many cases, a server's bottleneck is the host-to-GPU memory transfer. A typical Nvidia GPU does not support more than two memory transfers in parallel, no matter how many CUDA cores it has. During this memory copy, the GPU is blocked from other operations. This can result in low GPU load but high processing time.
- Make sure your server does not overheat. If the temperature is high, your server might throttle the GPU, CPU performance, or video capture performance.
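One way to check GPU temperature and the GPU's PCIe link is to query nvidia-smi, which is installed with the Nvidia driver. The Python sketch below is not part of Composer. It reads a few standard `--query-gpu` fields and flags values that deserve a closer look. The 80 °C threshold is an arbitrary placeholder, and the script only covers the GPU, not capture cards.

```python
import subprocess

# Field names accepted by `nvidia-smi --query-gpu`
# (run `nvidia-smi --help-query-gpu` for the full list).
FIELDS = ["name", "temperature.gpu", "pcie.link.gen.current",
          "pcie.link.width.current", "pcie.link.width.max"]


def parse_gpu_csv(text: str) -> list[dict[str, str]]:
    """Turn `--format=csv,noheader,nounits` output into one dict per GPU."""
    rows = []
    for line in text.strip().splitlines():
        values = [v.strip() for v in line.split(",")]
        if len(values) == len(FIELDS):
            rows.append(dict(zip(FIELDS, values)))
    return rows


def to_int(value: str):
    """Return the value as int, or None for entries such as "[N/A]"."""
    try:
        return int(value)
    except ValueError:
        return None


def warnings_for(gpu: dict[str, str], max_temp_c: int = 80) -> list[str]:
    """List findings for one GPU. max_temp_c is a hypothetical threshold."""
    notes = []
    temp = to_int(gpu["temperature.gpu"])
    if temp is not None and temp >= max_temp_c:
        notes.append(f"temperature {temp} C is at or above {max_temp_c} C")
    current = to_int(gpu["pcie.link.width.current"])
    maximum = to_int(gpu["pcie.link.width.max"])
    if current is not None and maximum is not None and current < maximum:
        notes.append(f"PCIe link runs at x{current}, GPU supports x{maximum}")
    return notes


def query_gpus() -> list[dict[str, str]]:
    """Run nvidia-smi and parse its output."""
    result = subprocess.run(
        ["nvidia-smi", f"--query-gpu={','.join(FIELDS)}",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    )
    return parse_gpu_csv(result.stdout)


if __name__ == "__main__":
    try:
        for gpu in query_gpus():
            print(gpu["name"], warnings_for(gpu) or "no findings")
    except (FileNotFoundError, subprocess.CalledProcessError) as exc:
        print(f"nvidia-smi is not available: {exc}")
```

Run the check while the server is under its normal production load, since an idle system will report lower temperatures.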
Selecting a CPU
As previously mentioned, single-core performance is an essential factor when choosing a CPU for your setup. You can use https://www.cpubenchmark.net/singleThread.html to list the single-core performance of different processors. We strongly recommend using a CPU with a single-thread score of at least 3000, a minimum of 8 cores (16 threads), and a multi-core score of at least 19000.
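When comparing several candidate CPUs, the three recommended minimums above can be captured in a small check. The Python sketch below is illustrative only: the `CpuCandidate` type and the example scores are placeholders, not real benchmark results. Look up the actual scores on the benchmark site before deciding.

```python
from dataclasses import dataclass

# Recommended minimums taken from the paragraph above.
MIN_SINGLE_THREAD = 3000
MIN_CORES = 8
MIN_MULTI_CORE = 19000


@dataclass
class CpuCandidate:
    name: str
    single_thread_score: int
    cores: int
    multi_core_score: int


def shortfalls(cpu: CpuCandidate) -> list[str]:
    """Return the recommendations the candidate does not meet."""
    problems = []
    if cpu.single_thread_score < MIN_SINGLE_THREAD:
        problems.append(f"single-thread score below {MIN_SINGLE_THREAD}")
    if cpu.cores < MIN_CORES:
        problems.append(f"fewer than {MIN_CORES} cores")
    if cpu.multi_core_score < MIN_MULTI_CORE:
        problems.append(f"multi-core score below {MIN_MULTI_CORE}")
    return problems


if __name__ == "__main__":
    # Placeholder values for illustration; these are not measured scores.
    candidates = [
        CpuCandidate("Example CPU A", 3500, 16, 40000),
        CpuCandidate("Example CPU B", 2500, 6, 15000),
    ]
    for cpu in candidates:
        print(cpu.name, shortfalls(cpu) or "meets all recommendations")
```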
Use multiple memory banks.
- If your system supports four memory modules, make sure to utilize all of them. For a 32 GB system, it is better to use 4x8 GB instead of 2x16 GB or 1x32 GB.
Use fast memory modules.
- The faster the memory modules, the better. Higher clock speed improves Composer's overall performance.
- Use DDR5 (or newer) if possible. Use single-rank memory modules instead of dual-rank memory modules.
- Ensure your BIOS is configured to utilize the full speed of your memory modules. For example, your memory modules might support up to 4800 MHz, but your BIOS is configured to run them at 2133 MHz.
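To illustrate how much a conservative BIOS memory setting can cost, the following Python sketch estimates theoretical peak memory bandwidth for the two speeds in the example above. It assumes a dual-channel platform with a 64-bit (8-byte) data path per channel and treats the quoted speed as mega-transfers per second. Real-world throughput is lower, and your platform may differ.

```python
def peak_bandwidth_gb_s(mega_transfers_per_s: float,
                        channels: int = 2,
                        bus_bytes: int = 8) -> float:
    """Theoretical peak bandwidth in GB/s (10^9 bytes per second).

    channels=2 and bus_bytes=8 are assumptions (dual channel, 64-bit).
    """
    return mega_transfers_per_s * bus_bytes * channels / 1000.0


if __name__ == "__main__":
    for speed in (2133, 4800):
        print(f"{speed}: {peak_bandwidth_gb_s(speed):.1f} GB/s theoretical peak")
```

Under these assumptions, modules left at 2133 reach well under half of the theoretical bandwidth available at 4800 (roughly 34 GB/s compared with 76.8 GB/s).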
Selecting a GPU
- Any Nvidia GPU compatible with CUDA 12.2 (or later) is compatible with Composer. We do not recommend the consumer version of GPUs (GeForce) for production setups. The Nvidia RTX series (4000, 4500, 5000, and 6000) is preferred.
- The minimum recommended GPU is the RTX A4000, followed by A4500 and A5000.
- Do not use the GPU that runs Composer for your display. By default, your operating system will likely use the fastest GPU available on your server as the display adapter. If your system only has one GPU or display adapter, your OS will share the GPU resource with Composer. This will result in lower Composer performance and in processing times that are both higher and less stable over time.
Options:
- If possible, use the built-in graphics capabilities of your CPU (Intel only).
- Add a secondary GPU. The AMD Radeon RX 6400 is one such example. It is a compact GPU that does not require external power cables (Molex) and only consumes 55 W. The small size and low power consumption allow the GPU to be mounted in almost any chassis. However, your motherboard needs to provide two (2) full-length PCIe x16 slots to fit both the Nvidia GPU and the AMD GPU.
Currently, Composer does not support the utilization of multiple GPUs.
Disable Hyper-V in BIOS
- If your system supports Hyper-V, disable Hyper-V in BIOS.
Operating System
Activate screen saver (Windows only)
On a single-GPU system that uses the Nvidia GPU as the display adapter, processing time will become more stable as soon as the Windows screen saver is activated.
Windows 11 - GPU Power Management Mode
The default configuration of your Nvidia GPU uses a power-saving option called “Normal”, which is optimized for a balance between performance and energy savings. This setting reduces the GPU's capacity during low load and increases it during high load. As a result, an operator's processing time can decrease when the load increases, which is counterintuitive.
This behavior is not wrong per se, but it can be misleading when you compare the processing time for a particular operator running on two different systems. A faster system (GPU) can have a longer processing time if the GPU load is low. However, this will change as the overall load increases, and the faster system will eventually have a lower processing time.
Fortunately, there are two ways of avoiding this behavior:
Option 1 (preferred): Lock the GPU clocks in settings.
Option 2: Change the power-saving option to “Performance”. This will increase power consumption, fan speed, and noise. Changing to “Performance” mode will not increase the GPU's maximum capacity.
Use the Nvidia Control Panel to adjust the setting. Go to 3D Settings > Manage 3D Settings and either select a Global configuration (all applications) or change the Program Settings for Vindral Composer. Scroll down to the Power management mode option and select the Prefer maximum performance option.
Windows 10 does have this behavior.
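Before comparing processing times between two systems, it can help to confirm whether the GPU is currently running at reduced clocks. The Python sketch below is not a Composer feature. It reads the current and maximum SM clock, the performance state, and the GPU utilization from nvidia-smi using standard `--query-gpu` fields. Only the first GPU is inspected, and the function names are hypothetical.

```python
import subprocess

# Standard `nvidia-smi --query-gpu` fields: current SM clock, maximum SM
# clock, performance state (P0 is the highest), and GPU utilization.
QUERY = "clocks.sm,clocks.max.sm,pstate,utilization.gpu"


def parse_clock_line(line: str) -> dict:
    """Parse one line of `--format=csv,noheader,nounits` output."""
    sm, sm_max, pstate, util = [part.strip() for part in line.split(",")]
    return {
        "sm_mhz": int(sm),
        "sm_max_mhz": int(sm_max),
        "pstate": pstate,
        "util_percent": int(util),
        "clock_ratio": int(sm) / int(sm_max),
    }


def read_first_gpu() -> dict:
    """Query nvidia-smi and return the parsed values for the first GPU."""
    result = subprocess.run(
        ["nvidia-smi", f"--query-gpu={QUERY}",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    )
    return parse_clock_line(result.stdout.strip().splitlines()[0])


if __name__ == "__main__":
    try:
        state = read_first_gpu()
        print(f"SM clock {state['sm_mhz']} of {state['sm_max_mhz']} MHz "
              f"({state['clock_ratio']:.0%}), state {state['pstate']}, "
              f"utilization {state['util_percent']}%")
    except (FileNotFoundError, subprocess.CalledProcessError,
            ValueError, IndexError) as exc:
        print(f"Could not read GPU clocks: {exc}")
```

A low clock ratio combined with low utilization suggests that the GPU has scaled down, in which case processing times measured at that moment are not directly comparable with those from a system running at full clocks.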
Remote access (Windows)
During a remote access session (TeamViewer, Remote Desktop, or similar), your GPU might be used by the remote access session itself. This might cause additional GPU load and increase Composer processing time.
As soon as the Remote Access session ends, the load will decrease.
Use the runtime version of Composer.
The runtime version might perform slightly better compared to the Workstation version. This is because the application does not have a GUI. However, if your OS uses a separate GPU for your display, the runtime version will perform very similarly to the Workstation version.
Use the Linux version.
The Linux operating system can be more efficient than the Windows OS.
Monitoring (R4 2023)
The best method of monitoring performance in Composer is to use Grafana and the Extended Monitoring plugin. This plugin provides visual graphs of all essential performance metrics.
Extended Monitoring is licensed separately from Composer. Contact RealSprint for more information.