An Idea for Doing Background Removal from a Sequence of Stationary Images, Manual-Style [Updated]

Update: Finally looking at implementing this and I realize that thinking of that fully populated tree is probably good for understanding it, I don’t need to store anything but the left edge. When a new frame comes in, I will calculate a new left edge based on the new frame and the previous left edge.

My memory requirements for a size N triangle are then N-1 (I don’t need to save the result if no one will ask for it again) and while calculating I need to store N-frames plus whatever my image processing library uses, etc. The fact this scales linearly with the length of my background history is nice, I can go long for cheap. The time to calculate does scale with the length of the history, but still linear.

Another thought: natural vision systems pretty much only see change, make something stand still long enough and it will go away. It might make sense to spend the linear time and memory to compute a long history, but allow the caller to choose how quickly stationery objects disappear; compute the whole left edge to maintain the chain of history, but choose to look at a more recent step.

A final correction: This is not really parallelizible, the library doing the underlying image processing could well parallelize, but these steps need to be done in sequence.

Back to the original post…

[Warning, this completely techie, musing about computer vision by someone who doesn't really know much about computer vision. But heck, sometimes those who don't know the right way to do something occasionally come up with something cool.]

How about something like this. Maintain a triangular poly tree where at the base is a history of recent frames.

                              x
                             / \
                            x   x
                           / \ / \
                          x   x   x
                         / \ / \ / \
                        x   x   x   x
                       / \ / \ / \ / \
                      x   x   x   x   x
                     / \ / \ / \ / \ / \
                    x   x   x   x   x   x
Newer              / \ / \ / \ / \ / \ / \              Older
 <-               x   x   x   x   x   x   x               ->
                 / \ / \ / \ / \ / \ / \ / \
                x   x   x   x   x   x   x   x
               / \ / \ / \ / \ / \ / \ / \ / \
              x   x   x   x   x   x   x   x   x
             / \ / \ / \ / \ / \ / \ / \ / \ / \
            x   x   x   x   x   x   x   x   x   x
           / \ / \ / \ / \ / \ / \ / \ / \ / \ / \
          x   x   x   x   x   x   x   x   x   x   x
         / \ / \ / \ / \ / \ / \ / \ / \ / \ / \ / \
        x   x   x   x   x   x   x   x   x   x   x   x
       / \ / \ / \ / \ / \ / \ / \ / \ / \ / \ / \ / \
      x   x   x   x   x   x   x   x   x   x   x   x   x
     / \ / \ / \ / \ / \ / \ / \ / \ / \ / \ / \ / \ / \
    x   x   x   x   x   x   x   x   x   x   x   x   x   x
   / \ / \ / \ / \ / \ / \ / \ / \ / \ / \ / \ / \ / \ / \
  x   x   x   x   x   x   x   x   x   x   x   x   x   x   x
 / \ / \ / \ / \ / \ / \ / \ / \ / \ / \ / \ / \ / \ / \ / \
x   x   x   x   x   x   x   x   x   x   x   x   x   x   x   x

A 16-base tree then has 120 vertices in it. The 16 Xs along the base are 16 historical frames. At 5fps, this covers a history of three seconds.

The first row of Xs above the base is made by taking the two images below it and doing:

an absdiff;
a threshold of the result to make contours of the areas in common; and
using those contours, a masking of one of the frames to make masked image of what is in common between the two.

This is a reduction operation. We start with whole frames and we produce new frames that are at most whole frames, but quite likely reduced (masked) frame areas.

At this point every pixel that has made it up to the second row is in some sense a good pixel, it has matched some other pixel.

We create the rest of the tree by continuing to do pair-wise operations on the images below, but the operation for the rest of the tree is a bit different from the first of operations.

To begin with we do do the same operation, we do the matching and reduction (for any area in both masks, if the pixels match the they get added to the output mask sent up to the next level).
But then we do a supplementing operation: for any pixels in one input mask but not the other input mask, they get added to the output mask and included in the output sent up to the next level.This continues at each layer to yield one masked image at the top.

I won’t know what this looks like until I see it, but imagine something moving through time, casting a shadow from under the pyramid, maybe tampering with, say 6-frames. Looking up the tree, it can only influence the triangle above it for 5-layers up, then is gets out-voted by constant stuff from before or after it in time.

This poly tree scheme is expensive and so can only go back a short distance in time. The number of operations in that whole triangle, to compute the apex is great, too much of a cheap CPU to calculate per frame at any reasonable frame rates. So instead we trade memory for CPU. We keep most all the output data from each new frame’s computation, and just computer the change for each. What is that computation of change?

- Age out the oldest frame: remove the 16 Xs that go down the right edge;

- Add the newest frame: a new X on the left at the bottom row; and

- Do the 15 calculations necessary to put 15 new Xs up the left edge above that new frame.

Motivation: I have played with OpenCV and the cv2.BackgroundSubtractorMOG() and cv2.BackgroundSubtractorMOG2() background removal functions and I don’t like them.

First, they aren’t working for me: old background information never ages out, a big change in scene continues to be included in the foreground and never displaces the old background.

Second, they are too slow. Particularly MOG2. I can’t keep up with a reasonable frame rate on a Raspberry Pi 2.

Falling back on a simple absdiff for motion detection I discovered I can, in place of the MOG2 that was falling behind, do a stupid loop of 30 absdiff’s and not fall behind. With this scheme I estimate I will have to do about that much work. And, unlike MOG2, this can be parallelized to multiple threads which can run on the multiple CPUs of a Raspberry Pi 2.

There are probably better ways, but it was easier to think up this one than to go read a couple books on computer vision. And it looks pretty easy to build. I just need to find the time to try it. And how do I efficiently represent a polytree in Python without breaking my brain. Will it be easy or hard?Will a nice recursive model work…?

-kb

Tags: background removal, BackgroundSubtractorMOG, BackgroundSubtractorMOG2, cv2, opencv, triangular poly tree

This entry was posted on Monday, January 11th, 2016 at 19:54 and is filed under Tech. You can follow any responses to this entry through the RSS 2.0 feed. You can leave a response, or trackback from your own site.

borg.org

An Idea for Doing Background Removal from a Sequence of Stationary Images, Manual-Style [Updated]

Leave a Reply