Untangling Frame Rates for a Worldwide High Quality Internet TV Service
The Input Frame Rate [1]
This is the frame rate of the input video stream or file, in frames per
second. In input files or streams, the rate may be carried in metadata that
tells a display system how to play the video at the correct average rate.
Playing onto a screen at this rate gives "live" playback (i.e. not "fastmo"
or "slowmo").
The Converted Frame Rate [2]
This is the frame rate converter output frame rate, such that playing back
converted video onto a screen at this rate gives live playback. Note: this
value is NOT the same as the speed at which the converter can physically
generate these converted frames (see [4] below)! For help with
understanding the algorithmic process of frame rate conversion, follow this link:
frame rate conversion
The Legato command line option:
-r <converted_frame_rate>:<input_frame_rate>
specifies the integer ratio corresponding to [2] / [1], allowing the server
to control this aspect of the frame rate converter. Note: the frame rate
processing that converts from the input frame rate [1] to the converted frame
rate [2] depends only on this ratio. It is purely an algorithmic process,
with some similarities to other types of signal resampling. However, frame
rate conversion is much more complex and compute intensive.
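As an illustration of how a server might derive the -r argument, the sketch below reduces [2] / [1] to an integer ratio using exact fractions. The helper name ratio_option and the example rates are assumptions made for this illustration; they are not part of Legato.

```python
from fractions import Fraction

def ratio_option(converted_rate: Fraction, input_rate: Fraction) -> str:
    """Build a '-r <converted>:<input>' argument as a reduced integer ratio.

    Hypothetical helper: only the -r syntax itself comes from the text above.
    """
    ratio = converted_rate / input_rate  # Fraction reduces to lowest terms
    return f"-r {ratio.numerator}:{ratio.denominator}"

# Film at 24000/1001 fps converted for a "60" Hz (60000/1001) display.
print(ratio_option(Fraction(60000, 1001), Fraction(24000, 1001)))  # -r 5:2

# 25 fps input converted to 50 fps.
print(ratio_option(Fraction(50), Fraction(25)))  # -r 2:1
```

Because only the ratio matters, the 1001 divisor cancels in the first case: converting 24000/1001 to 60000/1001 uses the same 5:2 ratio as converting 24 to 60.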
The Refresh Frame Rate [3]
This is the rate at which the client device updates its own display. It is
usually a fixed rate for each device. It is assumed that a client device
knows its own display refresh rate. For large Internet TV displays, it
goes by region: 50Hz in Europe, and "60" (actually 60,000/1001=59.94...)
Hz in North America. For some cell phones, it may be a custom, much lower
rate. In any case, as a client of the content server, the client device
can inform the server of its refresh rate when it requests video.
The Render Rate [4]
The render rate is the target number of new frames per second that the
server sends to the client device. This render rate is embedded as metadata
in the video served to the client. The client display device then does its
best to render the video based on its refresh rate. For example, at a
refresh rate of [3] = 60Hz, smooth results can be obtained by repeating each
render frame a small, fixed integer number of times (n = 1, 2, or 3) at the
refresh rate, i.e. [4] = [3] / n. Bigger display devices generally need
higher render rates (i.e. n=1, so [4]=60) in order to reduce motion judder.
On a cell phone display, judder may be acceptable at a render rate lower
than the refresh rate.
The client device is then expected to render at this average rate, and
should do a good job if the server follows the calculation above. The value
of n may depend on server load, network bandwidth, free service versus paid
subscription, or other considerations. In any case, the client player must be
able to play the video at the metadata render rate sent by the server.
In addition, if n is not an integer, low frequency components may become
visible on the client display, which the eye also interprets as judder.
The Legato option:
-O <render_numerator>:<render_denominator>
(uppercase 'O', not zero) defines the output render rate metadata as a
fraction of two integers. If the denominator is equal to 1 (i.e. the
render rate is an integer number of Hz), then the ':1' can be omitted.
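The relation [4] = [3] / n and the -O syntax can be combined in a short sketch. The helper name render_option is hypothetical, and the refresh rates shown are just the regional examples mentioned earlier.

```python
from fractions import Fraction

def render_option(refresh_rate: Fraction, n: int) -> str:
    """Build a '-O' argument for a render rate of refresh_rate / n.

    Hypothetical helper: n is the integer number of refreshes per render frame.
    """
    if n < 1:
        raise ValueError("n must be a positive integer")
    render_rate = refresh_rate / n
    if render_rate.denominator == 1:
        # Integer number of Hz: the ':1' can be omitted.
        return f"-O {render_rate.numerator}"
    return f"-O {render_rate.numerator}:{render_rate.denominator}"

print(render_option(Fraction(60), 1))           # -O 60
print(render_option(Fraction(50), 2))           # -O 25
print(render_option(Fraction(60000, 1001), 2))  # -O 30000:1001
```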
Bringing it all Together
The client device usually buffers some number of frames prior to starting
playback. Buffering is an attempt to get smoother playback despite
temporary Internet delivery pauses. Once
the client device starts to play, then if its buffer gets too empty it
can request more frames from the server, and if its buffer gets too
full, it can temporarily stop the server from sending new video data.
The important thing is that the average rate of frame delivery
corresponds to the render rate, and that the buffer is big enough so
that it does not frequently empty or overflow. Long network connection
latencies or reduced data throughput may cause problems with
buffering.
If the server cannot create converted frames at the average rate needed by the
display device, then the buffer will empty and the playback will freeze or
judder.
If the server can create converted frames much faster than the render rate
to the client, then the server will spend more time waiting for the client
device to send a request for more data. This is a good thing - it allows the
server to perform other tasks, such as serving other clients, while
appearing to give the client its undivided attention.
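The buffering behaviour described above can be explored with a toy simulation. This is only a sketch under stated assumptions: the function name simulate, the watermark thresholds, the buffer capacity and the prefill amount are all invented for illustration and do not describe any real player or server protocol.

```python
def simulate(server_fps, render_fps, seconds,
             capacity=120, low=30, high=90, prefill=60):
    """Toy client buffer model, ticking once per rendered frame.

    Returns (stalls, busy_fraction): the number of ticks with no frame
    available to show, and the fraction of ticks the server spent sending.
    All thresholds are assumed values, not taken from a real system.
    """
    buffered = float(prefill)
    sending = True
    stalls = 0
    busy_ticks = 0
    ticks = int(seconds * render_fps)
    frames_per_tick = server_fps / render_fps
    for _ in range(ticks):
        if buffered <= low:
            sending = True       # buffer too empty: request more frames
        elif buffered >= high:
            sending = False      # buffer too full: pause the server
        if sending:
            buffered = min(capacity, buffered + frames_per_tick)
            busy_ticks += 1
        if buffered >= 1.0:
            buffered -= 1.0      # one frame rendered
        else:
            stalls += 1          # nothing to render: playback freezes
    return stalls, busy_ticks / ticks

slow_stalls, slow_busy = simulate(server_fps=30, render_fps=60, seconds=10)
fast_stalls, fast_busy = simulate(server_fps=240, render_fps=60, seconds=10)
print("slow server:", slow_stalls, "stalls, busy fraction", slow_busy)
print("fast server:", fast_stalls, "stalls, busy fraction", round(fast_busy, 2))
```

In this model, a server that produces frames at half the render rate drains the buffer and then stalls repeatedly while staying busy all the time, whereas a server four times faster than the render rate never stalls and is idle for most ticks, leaving it free for other clients.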
As part of its initial request, the client could specify a
slowmo value. A slowmo value of 2 means playback at half speed, and a slowmo
of 1/2 means playback at double speed. If a render rate has been chosen by
the server, then:
converted_rate [2] = slowmo * render_rate [4]
For live video streaming, slowmo should be 1 (i.e. the converted rate [2]
is chosen to equal the render rate [4]) because:
a) the input video cannot be told to stop or change, and
b) the client display render rate cannot easily change without
reprogramming the frame rate converter, potentially introducing judder or
causing client buffer over/underflow.
In a high quality live video streaming situation, [2] = [4] = [3] at the
server. Furthermore, if the input video metadata already specifies the
desired converted rate, then [1] = [2], and no frame rate conversion is
needed! The computational load is a lot less in this (hopefully common)
case, allowing a server to handle many more typical requests than its GPU
limits would normally allow.
For file input, the slowmo value can be anything you like, without
any buffer over/underflow considerations.
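As a worked example of the slowmo relation, the sketch below computes the converted rate [2] from a chosen render rate [4] and then the [2] / [1] ratio that would be passed to -r. The input and render rates are assumed example values, and the function name converted_rate is hypothetical.

```python
from fractions import Fraction

def converted_rate(slowmo: Fraction, render_rate: Fraction) -> Fraction:
    """converted_rate [2] = slowmo * render_rate [4]."""
    return slowmo * render_rate

input_rate = Fraction(30)   # [1], assumed file frame rate
render_rate = Fraction(60)  # [4], assumed render rate chosen by the server

for slowmo in (Fraction(1), Fraction(2), Fraction(1, 2)):
    conv = converted_rate(slowmo, render_rate)
    ratio = conv / input_rate  # [2] / [1], reduced to lowest terms
    print(f"slowmo={slowmo}: converted={conv} fps, "
          f"-r {ratio.numerator}:{ratio.denominator}")
# slowmo=1: converted=60 fps, -r 2:1
# slowmo=2: converted=120 fps, -r 4:1
# slowmo=1/2: converted=30 fps, -r 1:1
```

With slowmo = 2, the converter generates 120 frames for every second of input, and showing them at 60 frames per second stretches each input second over two seconds of playback.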
What happens without frame rate conversion?
The video may be sent to the client's device directly at the input frame
rate [1]. If the input frame rate matches the client's display refresh rate,
fine. Otherwise the display device drops or duplicates frames to match the
average input rate given in the video metadata. This causes visible
judder, and may waste bandwidth if the client system drops frames.
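The drop and duplicate patterns can be made concrete with a simple display model. The sketch assumes that at each refresh the display shows the most recent input frame, i.e. frame floor(k * input_rate / refresh_rate) at refresh k; real devices may choose frames differently. The function names are hypothetical, and integer rates are used for simplicity.

```python
from itertools import groupby

def shown_frames(input_rate, refresh_rate, refreshes):
    """Input frame index shown at each display refresh (assumed model)."""
    return [(k * input_rate) // refresh_rate for k in range(refreshes)]

def repeat_counts(frames):
    """How many consecutive refreshes each shown frame stays on screen."""
    return [len(list(group)) for _, group in groupby(frames)]

def dropped_frames(frames):
    """Input frames that were sent but never shown."""
    shown = set(frames)
    return [f for f in range(frames[0], frames[-1] + 1) if f not in shown]

# 24 fps input on a 60 Hz display: frames alternate between 3 and 2 refreshes.
film = shown_frames(24, 60, 10)
print(repeat_counts(film))      # [3, 2, 3, 2]

# 30 fps input on a 60 Hz display: every frame is shown exactly twice.
print(repeat_counts(shown_frames(30, 60, 10)))  # [2, 2, 2, 2, 2]

# 60 fps input on a 50 Hz display: one frame in six is never shown.
fast = shown_frames(60, 50, 10)
print(dropped_frames(fast))     # [5]
```

The uneven 3, 2, 3, 2 cadence in the first case is the kind of irregular repetition that is seen as judder, and the dropped frame in the last case was transmitted but never displayed.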