GPU-based motion estimation

Motion estimation, also known as motion vector estimation, is an important method for predicting and/or analyzing motion in 2D data. It is mostly used in video compression. Examples are the H.263 or MPEG encoders, which make use of so-called “interframe coding”. For this, a reference frame is taken and subdivided into “macroblocks”, typically 16×16 pixels in size, and the movement of each block towards the current frame is calculated. With this data, the next frame(s) can be predicted, so only the motion vectors have to be transmitted or stored, not the whole frame.
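To illustrate how a frame can be predicted from motion vectors alone, here is a minimal sketch in plain Python. The function name `predict_frame`, the `(dy, dx)` vector convention and the tiny 2×2 block size are assumptions made for this example, not part of any encoder. Real encoders also transmit a residual to correct the prediction, which is omitted here.

```python
def predict_frame(reference, vectors, block=16):
    """Predict the next frame by moving macroblocks of the reference frame.

    reference: 2D list of pixel values (rows of equal length).
    vectors:   dict mapping a block's top-left (y, x) in the reference frame
               to its displacement (dy, dx); blocks without an entry stay put.
    """
    height, width = len(reference), len(reference[0])
    # Start from a copy so that areas no block moves into keep their content.
    predicted = [row[:] for row in reference]
    for (by, bx), (dy, dx) in vectors.items():
        for y in range(by, min(by + block, height)):
            for x in range(bx, min(bx + block, width)):
                ty, tx = y + dy, x + dx
                # Pixels that would land outside the frame are dropped.
                if 0 <= ty < height and 0 <= tx < width:
                    predicted[ty][tx] = reference[y][x]
    return predicted


# Tiny 4x4 frame with 2x2 blocks so the result is easy to inspect.
reference = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]
# The bright block moves to the bottom-right corner and the dark block
# from there moves to the top-left corner.
vectors = {(0, 0): (2, 2), (2, 2): (-2, -2)}
predicted = predict_frame(reference, vectors, block=2)
for row in predicted:
    print(row)
```

Only the two vectors would need to be stored for this frame instead of all sixteen pixel values.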

To tell macroblocks apart and find where they reappear in neighbouring positions as they move, several approaches have been suggested over time. Most typically, the sum of absolute differences (SAD) or error values such as the mean absolute error (MAE) or mean squared error (MSE) are used as characteristic values. Most of these values cannot be judged by the human eye. Luminance, hue, or a chosen mixture of both can be used to calculate them. To obtain the motion vector, only the candidate with the closest match to the reference block is taken from the search space, which might be a 64×64-pixel area. Larger search spaces result in better vectors but increase computing time drastically.
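The matching criteria and the search described above can be sketched as follows. This is a plain exhaustive search on the CPU, written for clarity rather than speed, and not the shader-based implementation. All names (`sad`, `mae`, `mse`, `get_block`, `best_vector`) and the `radius` parameter are assumptions for this illustration, and pixel values are assumed to be single numbers such as luminance.

```python
def sad(a, b):
    """Sum of absolute differences between two equally sized blocks."""
    return sum(abs(p - q) for ra, rb in zip(a, b) for p, q in zip(ra, rb))


def mae(a, b):
    """Mean absolute error: SAD divided by the number of pixels."""
    return sad(a, b) / (len(a) * len(a[0]))


def mse(a, b):
    """Mean squared error between two equally sized blocks."""
    total = sum((p - q) ** 2 for ra, rb in zip(a, b) for p, q in zip(ra, rb))
    return total / (len(a) * len(a[0]))


def get_block(frame, y, x, size):
    """Cut a size x size block with top-left corner (y, x) out of a frame."""
    return [row[x:x + size] for row in frame[y:y + size]]


def best_vector(reference, current, by, bx, size=16, radius=24):
    """Find where the reference block at (by, bx) reappears in `current`.

    Every displacement (dy, dx) within +/- radius is tried and the one with
    the lowest SAD wins. A radius of 24 around a 16x16 block covers a
    64x64-pixel search area.
    """
    height, width = len(current), len(current[0])
    block = get_block(reference, by, bx, size)
    best, best_cost = (0, 0), sad(block, get_block(current, by, bx, size))
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            y, x = by + dy, bx + dx
            if y < 0 or x < 0 or y + size > height or x + size > width:
                continue  # candidate would leave the frame
            cost = sad(block, get_block(current, y, x, size))
            if cost < best_cost:
                best, best_cost = (dy, dx), cost
    return best, best_cost


def make_frame(size, y, x):
    """Black test frame with a distinctive 2x2 pattern at (y, x)."""
    frame = [[0] * size for _ in range(size)]
    frame[y][x], frame[y][x + 1] = 9, 7
    frame[y + 1][x], frame[y + 1][x + 1] = 5, 3
    return frame


reference = make_frame(8, 2, 2)
current = make_frame(8, 3, 4)  # pattern moved 1 row down, 2 columns right
vector, cost = best_vector(reference, current, 2, 2, size=2, radius=3)
print(vector, cost)
```

The two nested loops over `dy` and `dx` show why larger search spaces get expensive: the number of candidates grows with the square of the radius, and every candidate costs one full block comparison.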

Because the calculations consist mostly of a huge number of integer or floating-point comparisons, the compression, or rather the motion vector estimation, needs a lot of computing power when done on a single CPU. This gave rise to the idea of using the highly parallel architecture of modern graphics hardware to speed things up.

Using existing shader programming techniques, it is possible to speed up motion estimation several times over. The only limitation is the graphics hardware's fill rate, which limits the maximum size of a single frame. This has to be taken into account when processing HD material at resolutions of 2K or even 4K.

The pictures on this page are screenshots of a development version that uses the rather old and not very speed-optimized avifile library to decode the sample AVI files. The goal of this research was to get motion estimation done in real time, so I needed “real-time” material, meaning a movie running at 24 or 25 fps.

The large green area marks a vertical movement from top to bottom. In the upper left of the picture you can see a tombstone, which is the cause of this. (Clip from "Nightmare before Christmas")

A CPU-based sample implementation needed 0.3 to 0.5 seconds per frame for 512×512 data on an Athlon XP 2500+ with 1 GB RAM. The same system needed as long as 7 to 8 seconds for 2K data! By optimizing things a bit and using shader-based algorithms on a rather old GeForce 6600, the speed improvement was huge: 0.04 to 0.05 s for 512×512 and 0.5 to 1.0 s for 2K, a speed increase of up to 600 %! And the only thing done here was moving the comparison operations to the graphics hardware; the macroblock values were still prepared by the CPU. After moving this calculation to the GPU too, the results were up to an additional 400 % faster, giving 70 to 80 fps with 512 px and 8 to 10 fps with 2K material.
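To put these timings side by side, the following small sketch converts them into speedup ratios and frame rates. It uses only the per-frame times quoted above; the helper name `speedup_range` is made up for this example. Read as plain ratios, the first GPU step comes out at roughly 6 to 12.5 times faster for 512×512 and 7 to 16 times faster for 2K.

```python
def speedup_range(before, after):
    """Return (lowest, highest) speedup for two (fastest, slowest) timings.

    The lowest ratio pairs the fastest old time with the slowest new time,
    the highest ratio pairs the slowest old time with the fastest new time.
    """
    return before[0] / after[1], before[1] / after[0]


# Seconds per frame as (fastest, slowest), taken from the text above.
cpu_512, gpu_512 = (0.3, 0.5), (0.04, 0.05)
cpu_2k, gpu_2k = (7.0, 8.0), (0.5, 1.0)

low_512, high_512 = speedup_range(cpu_512, gpu_512)
low_2k, high_2k = speedup_range(cpu_2k, gpu_2k)

# Frame rate is the reciprocal of the time per frame.
fps_512 = (1 / gpu_512[1], 1 / gpu_512[0])

print(f"512x512: {low_512:.1f}x to {high_512:.1f}x faster")
print(f"2K:      {low_2k:.1f}x to {high_2k:.1f}x faster")
print(f"512x512 after the first step: {fps_512[0]:.0f} to {fps_512[1]:.0f} fps")
```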

Note Sam in the cornfield: his head turns from left to right, so this area is marked red. (Clip from "Lord of the Rings")

The most interesting challenge is to get large 2K images (2048×2048 pixels) onto the graphics card and processed quickly enough to reach real time. Several methods could be imagined to achieve this, for example breaking larger images up into several smaller ones and doing more passes. But those things are beyond the scope of this work. I have successfully shown that motion estimation can be done much faster with the aid of graphics hardware.
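As a rough idea of what breaking a large image into smaller ones could look like, here is a purely hypothetical sketch that only computes the tile rectangles. It was not part of this work. The function name `split_into_tiles`, the 512-pixel tile size and the optional overlap `margin` (so that a block search near a tile edge can still look into the neighbouring tile) are all assumptions.

```python
def split_into_tiles(width, height, tile=512, margin=0):
    """Return (x0, y0, x1, y1) rectangles that together cover the image.

    Each rectangle is a tile of at most tile x tile pixels, extended by
    `margin` pixels on every side and clipped to the image borders.
    """
    tiles = []
    for y in range(0, height, tile):
        for x in range(0, width, tile):
            tiles.append((
                max(x - margin, 0),
                max(y - margin, 0),
                min(x + tile + margin, width),
                min(y + tile + margin, height),
            ))
    return tiles


# A 2K frame cut into 512x512 tiles with a 24-pixel overlap.
tiles = split_into_tiles(2048, 2048, tile=512, margin=24)
print(len(tiles), tiles[0], tiles[-1])
```

Each tile would then be uploaded and processed in its own pass, at the cost of handling the overlapping borders more than once.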

The work on libMotionEstimation was done in collaboration with Thomson Grass Valley and is protected by a non-disclosure agreement, while still being a non-commercial research project. Thus, no source code or more specific papers can be released.
