What You Will Learn

  • How NVENC hardware encoding works and its prerequisites
  • The basic usage of h264_nvenc and hevc_nvenc
  • How to choose preset and quality options (-rc, -cq, -b:v)
  • A speed and quality comparison against CPU software encoding
  • Common errors and how to fix them

Tested with: FFmpeg 6.1 (NVIDIA GPU environment)
Target OS: Windows / Linux (the NVIDIA driver, which provides the CUDA driver library, is required)


What Is NVENC

NVENC is the dedicated hardware video encoder built into NVIDIA GPUs. Compared to CPU software encoding, it is typically several times faster and puts very little encoding load on the CPU. The trade-off is that it compresses slightly less efficiently than software encoding at the same quality.

Item                    NVENC Hardware                         libx264 Software
Speed                   5–10× faster                           Baseline
CPU usage               Low (GPU processing)                   High
Compression efficiency  Slightly worse                         Excellent
Supported codecs        H.264, H.265, AV1 (RTX 40 and later)   H.264

Prerequisites

To use NVENC, you need the following:

  1. An NVIDIA GPU: Kepler generation (GTX 600) or later
  2. NVIDIA driver: the latest version is recommended (Linux: 520 or higher)
  3. An NVENC-enabled FFmpeg build: confirm that the H264/HEVC NVENC encoders appear with ffmpeg -encoders | grep nvenc

Checking whether NVENC is available:

※ This command requires an NVIDIA GPU environment
ffmpeg -encoders | grep nvenc
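
For scripts, a plain grep can be tightened so that it matches one exact encoder name. The sketch below is a minimal example, not part of FFmpeg: encoder_listed is a hypothetical helper name, and it assumes the usual `ffmpeg -encoders` layout in which the encoder name is the second whitespace-separated column.

```shell
# Succeeds (exit 0) if the encoder named in $1 appears in
# `ffmpeg -encoders` style output read from stdin.
# Assumption: the encoder name is the second column of each line.
encoder_listed() {
    awk -v name="$1" '$2 == name { found = 1 } END { exit !found }'
}

# Usage, assuming ffmpeg is on PATH:
#   ffmpeg -hide_banner -encoders 2>/dev/null | encoder_listed hevc_nvenc \
#       && echo "hevc_nvenc is compiled in"
#
# A listed encoder only shows that the build supports it. A one-second encode
# of a synthetic test pattern (no input file needed) also exercises the
# driver and the GPU:
#   ffmpeg -hide_banner -f lavfi -i testsrc=duration=1:size=1280x720:rate=30 \
#       -c:v h264_nvenc -f null -
```

If the test encode fails even though the encoder is listed, the cause is more likely the driver or GPU than the FFmpeg build.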

Basic h264_nvenc Command

The simplest NVENC encode:

※ This command requires an NVIDIA GPU environment
ffmpeg -i input.mp4 -c:v h264_nvenc -c:a copy output_nvenc.mp4

To control quality, set -cq (0–51; lower values mean higher quality):

※ This command requires an NVIDIA GPU environment
ffmpeg -i input.mp4 -c:v h264_nvenc -rc vbr -cq 23 -c:a aac -b:a 128k output_nvenc.mp4

hevc_nvenc (H.265)

Reduces file size compared to H.264 at equivalent quality:

※ This command requires an NVIDIA GPU environment
ffmpeg -i input.mp4 -c:v hevc_nvenc -rc vbr -cq 28 -c:a aac -b:a 128k output_hevc_nvenc.mp4

Presets

NVENC presets control the balance between speed and quality:

※ This command requires an NVIDIA GPU environment
ffmpeg -i input.mp4 -c:v h264_nvenc -preset p4 -rc vbr -cq 23 output.mp4

Recent FFmpeg builds and drivers use -preset p1 (fastest) through -preset p7 (highest quality). Older setups use fast, medium, and slow, which current FFmpeg still accepts as aliases.

Preset        Speed     Quality
p1 / fast     Fastest   Lower
p4 / medium   Balanced  Standard
p7 / slow     Slow      High
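
Because the effect of a preset depends on the GPU and the content, it can help to encode the same clip at several presets and compare the results. The following is a minimal sketch under stated assumptions: preset_sweep, the FFMPEG override variable, and the out_<preset>.mp4 naming are all hypothetical names chosen for illustration, and the encoder options simply repeat the ones used above.

```shell
# Encode one input at presets p1, p4 and p7 with otherwise identical settings.
# $1: input file. Outputs are written as out_p1.mp4, out_p4.mp4, out_p7.mp4.
# Setting FFMPEG=echo prints the argument lists instead of running ffmpeg.
preset_sweep() {
    input="$1"
    for preset in p1 p4 p7; do
        "${FFMPEG:-ffmpeg}" -hide_banner -y -i "$input" \
            -c:v h264_nvenc -preset "$preset" -rc vbr -cq 23 \
            -c:a copy "out_${preset}.mp4"
    done
}

# Usage:
#   preset_sweep input.mp4
#   ls -l out_p*.mp4    # compare the resulting file sizes
```

Keeping -cq fixed means the sweep isolates the preset, so differences in file size and encode time come from the preset alone.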

Combining GPU Decoding + NVENC Encoding

By performing decoding on the GPU as well, you can lower CPU load even further:

※ This command requires an NVIDIA GPU environment
ffmpeg -hwaccel cuda -hwaccel_output_format cuda -i input.mp4 -c:v h264_nvenc -preset p4 -rc vbr -cq 23 output.mp4

Specifying -hwaccel_output_format cuda lets you pass the decoded frames in GPU memory directly to NVENC.


Bitrate-Targeted Encoding

When you want to target a specific bitrate:

※ This command requires an NVIDIA GPU environment
ffmpeg -i input.mp4 -c:v h264_nvenc -rc cbr -b:v 4M -maxrate 4M -bufsize 8M output.mp4
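
With a fixed bitrate the output size is predictable before you encode: size is roughly total bitrate times duration. The helper below is a rough sketch of that arithmetic; estimate_size_mb is a hypothetical name, the 128 kbit/s audio figure is borrowed from the earlier examples, and container overhead is ignored.

```shell
# Estimate output size in decimal megabytes for a constant total bitrate.
# $1: video bitrate in kbit/s, $2: audio bitrate in kbit/s, $3: duration in seconds
# (kbit/s * s = kbit; / 8 = kB; / 1000 = MB; integer division rounds down)
estimate_size_mb() {
    echo $(( ($1 + $2) * $3 / 8 / 1000 ))
}

# Usage: 4M video (-b:v 4M) plus 128k audio for a 10-minute clip
#   estimate_size_mb 4000 128 600    # prints 309
```

The same arithmetic explains the buffer setting in the command above: -bufsize 8M with -maxrate 4M corresponds to about two seconds of video at the target rate.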

Common Errors and Fixes

No NVENC capable devices found

  • The NVIDIA driver is outdated or not installed
  • Check whether the GPU is recognized with nvidia-smi
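
The driver check can be scripted. This sketch compares the driver's major version against the 520 minimum listed under Prerequisites for Linux; driver_at_least is a hypothetical helper name, and the nvidia-smi query in the usage comment assumes nvidia-smi is installed and on PATH.

```shell
# Succeeds if the major version of a driver string meets a minimum.
# $1: minimum major version (e.g. 520), $2: version string (e.g. 535.129.03)
driver_at_least() {
    major="${2%%.*}"          # keep everything before the first dot
    [ "$major" -ge "$1" ]
}

# Usage:
#   ver=$(nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -n 1)
#   driver_at_least 520 "$ver" || echo "driver $ver is older than 520" >&2
```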

NVENC: OpenEncodeSessionEx failed: out of memory (10)

  • Another process has used up all available NVENC sessions
  • NVIDIA caps concurrent NVENC sessions per board; the limit varies by GPU, driver, and product line, so check NVIDIA’s official support matrix for your card

Cannot load libcuda.so.1 (Linux)

  • The CUDA library path is not set
  • Verify with ldconfig -p | grep libcuda, or check whether libcuda.so.1 exists in /usr/lib/x86_64-linux-gnu/

Choosing Between Software and Hardware Encoding

Use Case                           Recommendation
Archiving / high quality           libx264 / libx265 (software)
Real-time / live streaming         NVENC
Batch conversion / time-critical   NVENC
Environment without a GPU          Software encoding

Speed and Quality Ballparks (Typical Ranges)

These figures vary widely with GPU generation, input resolution, and preset, so treat them as the typical ranges observed in public benchmarks rather than a single measured result.

  • Encode speed: An RTX 3060-class GPU often encodes 1080p H.264 at roughly 8–12× realtime (several hundred fps), an order of magnitude faster than libx264 -preset medium (CPU), which typically runs around 1.5–3× realtime. At 4K, expect this to drop to roughly 2–4×. Exact numbers depend on GPU, driver, content, preset, and resolution.
  • Quality trade-off: At a matched quality target, NVENC generally needs a higher bitrate than libx264 to reach the same VMAF; the exact gap depends on the generation and the content. It narrows with newer generations, and Ada Lovelace (RTX 40) is reported to close much of it.
  • Where it wins: Because the speed difference is an order of magnitude, streaming, large batch jobs, and deadline-driven work strongly favor NVENC. For one-time archival encodes that you keep long term, the better compression efficiency of libx264 or libx265 is worth the extra time.
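
Since these ranges depend so much on the machine, it is worth measuring on your own hardware. FFmpeg prints a speed= multiplier in its progress line, and the sketch below extracts the final value from a captured log. last_speed is a hypothetical helper name, and the parsing assumes the progress-line format of recent FFmpeg builds (speed=<number>x).

```shell
# Print the last "speed=" multiplier found in an ffmpeg log read from stdin.
# Progress updates are separated by carriage returns, so those are turned
# into newlines before matching.
last_speed() {
    tr '\r' '\n' | sed -n 's/.*speed= *\([0-9.]*\)x.*/\1/p' | tail -n 1
}

# Usage: encode to the null muxer so disk writes do not affect the timing
#   ffmpeg -y -i input.mp4 -c:v h264_nvenc -preset p4 -an -f null - 2>&1 | last_speed
#   ffmpeg -y -i input.mp4 -c:v libx264 -preset medium -an -f null - 2>&1 | last_speed
```

Dividing the first result by the second gives the NVENC speedup for that particular clip and machine.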

Common Pitfalls

-preset slow has no effect / presets seem ignored

  • Symptom: Specifying -preset slow doesn’t give the speed/quality balance you expected.
  • Cause: Both naming schemes are valid in modern FFmpeg. The old fast/medium/slow names remain as compatibility aliases, but the p1–p7 scheme is the current, more predictable control.
  • Fix: Prefer the new presets for reproducibility, e.g. -preset p7 -tune hq. Use p6–p7 for high quality, p4 for balanced, and p1–p2 for low latency.

B-frames added but compression doesn’t improve

  • Symptom: Adding -bf 3 barely reduces output size, or is ignored.
  • Cause: HEVC B-frame support is generation-dependent — it was added with Turing (RTX 20), so older GPUs may not apply HEVC B-frames. (H.264 B-frames are broadly supported across generations.)
  • Fix: Check your GPU generation and combine -b_ref_mode middle -bf 3. If HEVC B-frames don’t take effect on an older GPU, lean on bitrate or use H.264 to hold quality.

Encoding two or more streams at once fails with out of memory (10)

  • Symptom: Running several NVENC processes in parallel during a batch fails with OpenEncodeSessionEx failed once you exceed the board’s session limit.
  • Cause: NVIDIA imposes a per-board cap on concurrent NVENC sessions that varies by GPU, driver, and product line (consumer GeForce currently around a dozen; professional cards vary, some unrestricted). The exact number is not fixed across the lineup.
  • Fix: Check NVIDIA’s official support matrix for your specific GPU/driver/product line to find the concurrent-session cap, then limit parallelism accordingly or update the driver. Producing multiple outputs within a single process can also be more stable.
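
One way to limit parallelism in a batch is to let xargs cap the number of concurrent jobs. This is a minimal sketch: batch_limited is a hypothetical helper name, the value 2 in the usage comment is only a placeholder for your board's actual limit, and -P is a GNU/BSD xargs extension rather than POSIX.

```shell
# Run the given command once per line of stdin, with at most $1 jobs
# running at the same time. Every {} in the command is replaced by the line.
batch_limited() {
    max_jobs="$1"
    shift
    xargs -P "$max_jobs" -I{} "$@"
}

# Usage (the out/ directory must already exist):
#   printf '%s\n' *.mp4 | batch_limited 2 \
#       ffmpeg -nostdin -y -i {} -c:v h264_nvenc -preset p4 -rc vbr -cq 23 \
#       -c:a copy out/{}
```

File names that contain quotes or newlines need extra care with this approach, because xargs interprets them while reading its input.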

CPU-decode → GPU-encode isn’t as fast as expected

  • Symptom: You switched to NVENC but CPU usage stays high and speed plateaus.
  • Cause: Decoding is still done in software (CPU), and the CPU→GPU frame transfer becomes the bottleneck.
  • Fix: Add -hwaccel cuda -hwaccel_output_format cuda on the input side so decoding also happens on the GPU and frames are handed to NVENC in GPU memory. This removes the transfer cost and lowers CPU load.


Primary sources: trac.ffmpeg.org/wiki/HWAccelIntro / ffmpeg.org/ffmpeg-codecs.html


FAQ

How does NVENC compare to libx264 in image quality?

At the same bitrate, NVENC is slightly inferior. However, the latest NVENC (Ada Lovelace generation and later) comes close to the level of libx264 medium, and given the speed it is more than sufficient in practice.

NVENC isn’t working

Make sure the GPU is made by NVIDIA, the driver is up to date, and your FFmpeg build was configured with --enable-nvenc. You can verify this with ffmpeg -encoders | grep nvenc (on Windows without grep, use findstr nvenc instead).

Should I use CRF or CQ?

With NVENC you use -cq (-crf is for libx264). The value ranges from 0 to 51, and just like libx264’s CRF, smaller values mean higher quality. 19–23 is typical.

Should I use B-frames?

Yes. Enabling B-frames with -b_ref_mode middle -bf 3 significantly improves compression efficiency. Note that HEVC B-frame support is generation-dependent (added with Turing), so on older GPUs HEVC B-frames may not apply.

Can I do streaming delivery with NVENC?

Yes, it is ideal for low-latency streaming. You can minimize latency by combining -tune ll or -tune ull (ultra-low-latency) with -preset p1.