Demeler Deinterlacer Performance


The Demeler deinterlacer uses high-performance graphics cards to achieve realtime performance for 1080i input at 60 or 50 fields/sec. At this performance level, many properties of the host system can restrict throughput, including CPU clock rate, number of processors, cache size, PCIe bandwidth, number of PCIe lanes, and system load.

The amount of motion in the video also has an effect on GPU performance. For realtime 1080i60 deinterlacing at a high quality level as of November 2013, we recommend an Intel i7 3930K processor (with 40 PCIe lanes) overclocked to 4.2GHz, with three Titan graphics cards, and one or more Western Digital VelociRaptor disk drives or Samsung SSD 830/840 drives for local video storage or caching if needed.

Demeler File I/O and Bandwidth Issues for 1080i Deinterlacing


Writing Demeler output to disk drives may limit throughput for 1080p, even for reduced chrominance bandwidth YUV420. See performance issues for an in-depth discussion on this subject. The rates in the Demeler throughput table below are for a complete file-to-file pipeline: y4mzip reads pre-compressed files from an SSD, the resulting uncompressed output is piped into Demeler, Demeler output is piped through y4mzip for compression, and the compressed output is written to a Samsung SSD 830 solid-state drive. In the table, we give average input fields/sec figures for our suite of 1080p test sequences (with Meler used to provide interlaced input to Demeler for testing) and for the listed card configurations. As with many software-based algorithms, the processing times for both compression and deinterlacing depend on image content. The standard deviation of throughput across our tests is about 6% of the average. The table below is for graphics cards as delivered (no further overclocking).
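To give a rough sense of why disk writes can become the bottleneck, the uncompressed data rate of 1080p YUV420 output can be estimated with simple arithmetic. The sketch below is illustrative only: it assumes 8-bit samples and one output frame per input field, neither of which is stated explicitly above, and the function names are hypothetical.

```python
def yuv420_frame_bytes(width, height, bits_per_sample=8):
    """Bytes in one YUV 4:2:0 frame (assumption: 8-bit samples by default)."""
    # 4:2:0 stores full-resolution luma plus two quarter-resolution chroma
    # planes, i.e. 1.5 samples per pixel.
    samples = width * height * 3 // 2
    return samples * bits_per_sample // 8


def data_rate_mb_per_sec(width, height, frames_per_sec):
    """Uncompressed data rate in MB/s (1 MB = 10**6 bytes)."""
    return yuv420_frame_bytes(width, height) * frames_per_sec / 1e6


frame_bytes = yuv420_frame_bytes(1920, 1080)
rate_60 = data_rate_mb_per_sec(1920, 1080, 60)
rate_50 = data_rate_mb_per_sec(1920, 1080, 50)

print(f"bytes per 1080p YUV420 frame: {frame_bytes}")
print(f"uncompressed rate at 60 frames/sec: {rate_60:.1f} MB/s")
print(f"uncompressed rate at 50 frames/sec: {rate_50:.1f} MB/s")
```

Under these assumptions a 1080p frame is about 3.1 MB, so sustained 60 frames/sec output needs roughly 187 MB/s of write bandwidth before compression.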

Card       Nr. of cards   Avg 1080p output frames/sec, deinterlacing file-to-file
GTX 690    1              50
GTX 690    2 (b)          98
Titan      1              41
Titan      2 (b)          80
Titan      3 (b)          116

Notes:
(a) 720i input field rates are about 2.2x the rates given for 1080i.
(b) At the time of writing, multiple GTX cards in SLI mode give no CUDA performance improvement over a single card.
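As a sanity check on note (a), the factor of about 2.2 can be compared with the ratio of pixel counts between the two frame sizes. This is only an observation, not an explanation given by the source; the sketch assumes frame sizes of 1920x1080 and 1280x720.

```python
# Assumed frame dimensions (not stated in the notes above).
pixels_1080 = 1920 * 1080
pixels_720 = 1280 * 720

# Ratio of pixels per frame: an upper bound on the speedup if processing
# cost scaled purely with pixel count.
pixel_ratio = pixels_1080 / pixels_720

print(f"pixel ratio 1080/720: {pixel_ratio}")
```

The pixel ratio is 2.25, so a measured speedup of about 2.2x is close to what pure per-pixel scaling would predict.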

With SLI disabled, the software automatically detects the number of graphics cards and partitions video processing across them. Partitioning gives throughput that scales almost linearly with the number of graphics cards, up to a CPU compute limit. We have verified 1080i deinterlacing performance using a host i7 3930K CPU (6-core) overclocked to 4.2GHz, with 16 lanes of PCIe 2.X to each of two graphics cards, and have also tested a third card with just eight PCIe 2.X lanes.
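The "almost linear" scaling can be checked directly against the throughput figures in the table. The following sketch computes the speedup and per-card efficiency relative to a single card; the dictionary layout and function name are illustrative, and only the numbers come from the table.

```python
# Average 1080p output frames/sec from the throughput table,
# keyed by card model and number of cards.
THROUGHPUT = {
    "GTX 690": {1: 50, 2: 98},
    "Titan": {1: 41, 2: 80, 3: 116},
}


def scaling(card):
    """Return {n_cards: (speedup, efficiency)} relative to one card."""
    base = THROUGHPUT[card][1]
    return {
        n: (fps / base, fps / (base * n))
        for n, fps in sorted(THROUGHPUT[card].items())
    }


for card in THROUGHPUT:
    for n, (speedup, efficiency) in scaling(card).items():
        print(f"{card} x{n}: speedup {speedup:.2f}, efficiency {efficiency:.0%}")
```

On these figures, two cards deliver about 1.95x to 1.96x the single-card rate, and three Titans about 2.83x, i.e. roughly 94% to 98% efficiency per card.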
