Demeler Deinterlacer Performance
The Demeler deinterlacer uses high-performance graphics cards to achieve realtime performance for 1080i input at 60 or 50 fields/sec. At this performance level, many host system properties can restrict throughput: CPU clock rate, number of processors, cache size, PCIe bandwidth, number of PCIe lanes, system load, and so on. The amount of motion in the video also affects GPU performance.
For realtime 1080i60 deinterlacing at a high quality level as of November 2013, we recommend an Intel i7
3930K processor (with 40 PCIe lanes) overclocked to 4.2GHz, with three Titan graphics cards, and
one or more Western Digital VelociRaptor disk drives or Samsung SSD 830/840 drives for local video
storage or caching if needed.
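To see why fast local storage matters, consider the raw data rate of uncompressed 1080p60 video in YUV420. This is a back-of-the-envelope sketch assuming 8-bit samples; actual pipeline rates also depend on container overhead:

```python
# Back-of-the-envelope data rate for uncompressed 1080p60 YUV 4:2:0 video.
# Assumes 8-bit samples (an assumption; the document does not state bit depth).

width, height = 1920, 1080
bytes_per_pixel = 1.5        # 4:2:0 subsampling: Y + U/4 + V/4 = 1.5 bytes/pixel
fps = 60

frame_bytes = width * height * bytes_per_pixel   # 3,110,400 bytes per frame
rate_mb_s = frame_bytes * fps / 1e6              # sustained write rate in MB/s

print(f"{frame_bytes:.0f} bytes/frame, {rate_mb_s:.1f} MB/s sustained")
```

At roughly 187 MB/s sustained, uncompressed output approaches or exceeds what a single conventional hard drive of the era could reliably deliver, which is one reason VelociRaptor or SSD storage is suggested here.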
Demeler File I/O and Bandwidth Issues for 1080i Deinterlacing
Writing Demeler output to disk drives may limit throughput for 1080p, even for YUV420 with its reduced chrominance bandwidth. See performance issues for an in-depth discussion of this subject. The output rates in the Demeler throughput table below reflect a full file-to-file pipeline: y4mzip reads pre-compressed files from an SSD drive, the resulting uncompressed output is piped into Demeler, Demeler output is piped through y4mzip for compression, and the compressed output is finally written to a Samsung SSD 830 solid-state drive. In the table, we give
average input fields/sec figures for our suite of 1080p test sequences (with Meler used to provide interlaced input to Demeler for testing) and for the listed card configurations. As with many software-based algorithms, the processing times for both compression and deinterlacing depend on image content. The standard deviation of throughput on our tests is about 6% of the average.
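The file-to-file pipeline described above can be sketched as a single shell command. This is illustrative pseudocode only: the actual y4mzip and Demeler option names and file extensions are not given in this document, so everything below other than the program names is an assumption.

```
# Illustrative pseudocode: option names and file extensions are assumptions.
# Decompress source from SSD -> deinterlace -> recompress -> write to SSD.
y4mzip -d < input.y4z | Demeler | y4mzip > output.y4z
```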
The table below is for graphics cards as delivered (no further overclocking).
Card    | Nr. of cards | Avg 1080p output frames/sec, deinterlacing file-to-file
GTX 690 | 1            | 50
GTX 690 | 2(b)         | 98
Titan   | 1            | 41
Titan   | 2(b)         | 80
Titan   | 3(b)         | 116
Notes:
(a) 720i input field rates are about 2.2x the rates given for 1080i.
(b) At the time of writing, multiple GTX cards in SLI mode give no CUDA performance improvement over a single card. With SLI disabled, the software automatically detects the number of graphics cards and seamlessly partitions video processing. Partitioning gives performance that scales almost linearly with the number of graphics cards, up to a CPU compute limit. We have verified 1080i deinterlacing performance using a host i7 3930K CPU (6-core) overclocked to 4.2GHz, with 16 lanes of PCIe 2.x to each of two graphics cards, and have also tested a third card with just eight PCIe 2.x lanes.
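The near-linear scaling can be checked against the Titan rows of the throughput table above. This small sketch uses only the published per-configuration numbers; the CPU compute limit itself is not stated in this document:

```python
# Scaling efficiency of multi-card partitioning, computed from the
# Titan rows of the throughput table (output frames/sec, file-to-file).
single_card = 41.0
measured = {1: 41.0, 2: 80.0, 3: 116.0}   # cards -> measured frames/sec

for n, fps in measured.items():
    ideal = n * single_card               # perfectly linear scaling
    efficiency = fps / ideal              # fraction of ideal achieved
    print(f"{n} card(s): {fps:.0f} fps, {efficiency:.1%} of linear")
```

Efficiency drops only a few percent per added card (about 98% of linear at two cards, 94% at three), consistent with "almost linear" scaling that gradually approaches a host-side limit.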