USE OF KHOROS FOR DEBLURRING TECHNIQUES IN VIDEO IMAGE PROCESSING

 

Keywords: maximum entropy, total variation, image restoration

AUTHOR: ZENO GERADTS

Netherlands Forensic Institute

Volmerlaan 17

2288 GD Rijswijk
Netherlands

http://forensic.to

 

Email zeno@holmes.nl

 

 

ABSTRACT

In the last decade many systems for forensic images have been developed. Further in banks, shops and roads many video camera's are making images of robberies and other criminal activities. Often images are blurred due to defocusing, movement or poor lighting condition. For this reason often the question for deblurring the images is requested at forensic laboratories.

In this research report different methods for deblurring of digital images are compared Since the computer speed is getting faster each year, more complex methods for deblurring can be used. In the last five years many new methods for deblurring images are developed. They are developed for aerospace and medicine research (Maximum Entropy method) or for military purposes (Total Variation). A comparison is made with other methods as Wiener-filtering. In preliminary research it appeared however that simple methods as local contrast enhancement might give a better result.

It appears that the Maximum Entropy-method in combination with a premodel of the estimated blur function, will give a better result than the Wiener restoration. The resulting image is more stable. Further for larger areas the Total Variation method works well for noise suppression. Sometimes simpler methods can be more suitable for deblurring the images. A problem in forensic images is that often the blur function is not known. More efforts should be made to improve the Total Variation Method and reduce the computing time.

 

1         Table of Contents

 

ABSTRACT. 1

1      Table of Contents. 1

2      Introduction. 2

3      Blur and Noise. 3

4      Maximum Entropy. 3

4.1       Introduction. 3

4.2       Classical Maximum Entropy. 3

4.3       Bayesian probability theory: the Cox axioms. 4

4.4       The axioms of maximum entropy. 4

4.5       Quantification. 6

4.5.1        Choice of regularization parameter 7

4.6       Premodel by Gull 8

5      Minimal Total Variation. 9

5.1       MTV definitions. 10

5.2       Discretization. 13

5.3       MTV algorithm.. 15

6      Experiments. 15

6.1       Image processing package Khoros. 15

6.2       Images for experiments. 16

6.3       Methods used in Experiments. 16

6.3.1        Experiment with Wiener-filter 17

6.3.2        Experiment with Cannon-filter 17

6.3.3        Experiments with Maximum Entropy. 18

6.3.4        Experiments with MTV.. 19

7      Conclusions. 20

 

2         Introduction

In forensic investigations, often the visual information is kept in for example handwriting, hairs, fibers, fingerprints, toolmarks and video-images. At present more often the images are stored and digitized by an image acquisition system. By acquiring the image in the system, there will be often some degradation of the image in the system. For that reason, image restoration and enhancement techniques for improving degraded forensic images attract high attention. Under some conditions these techniques are able to enhance obscured, blurred or noisy images. In practice sometimes the following degradations can be restored:

·        Over- and Under Exposure

·        Pictures Taken out of focus

·        Point Spread Function of the lens

·        Smear by movement of camera or subject

·        Grain structures

·        Noise from system

·        Jitter in the video system

 

Since the techniques for over- and under-exposure are well known, in this report the techniques for the Gaussian blur and noise blur will be handled.

 

In the past often linear techniques (Wiener filter) have been applied to restorate the image. From aerospace research it appeared however that non-linear methods as the Maximum Entropy-method might be more appropriate to restore images. Further Rudin et.al. have published on the Total Variation-techniques to restore blurred image. Many new improvements on those methods have been developed. The developers of those algorithms claim that the results are better than the Wiener restoration or linear techniques. This is explained in the fact video-systems and photographic emulsions have non-linearity.

This research report is a result of a literature and experimental study of two months at the National Research Institute of Police Science in Japan by Zeno Geradts and implementation of the MTV-algorithm at the Netherlands Forensic Science Laboratory. The background and the principles of the different methods are described and some results of artificial and real cases are given. The reason for choosing the non-linear methods will be given. In the first paragraphs the physical backgrounds of blur and noise will be given.

 

The development in integrated image processing systems in new video system also plays an important role in the acquisition of those images. For this reason attention is given to image processing algorithms that are implemented or expected in image acquisition systems. Other sources for degradation in acquiring the digital images are described.

Further the practical implementation of the algorithms in our imaging software is given. A short manual on installation procedures and use of the system is described. Some examples are given for actual cases and simulated cases.

In the final part of this report conclusions and suggestions for further research are given.

 

3         Blur and Noise

In forensic investigation nowadays often images are acquired by digitizing images. Also in image databases as the system Drugfire in which cartridge cases of firearms are stored. Another example are video images that are taken from for example bank robberies. In this system blur and noise can be added in the acquired image.

Since the astronomical, military and medical researcher could use parallel computers, they developed many new algorithms. Nowadays the faster computers are also available for forensic laboratories and the algorithms can be used for actual cases and databases. In forensics research often the whole procedure of image processing has to be proven in court. For this reason it is important to know the background of the source for blur and noise. The results of image processing should be handled with great care and qualified examiners should interpret the results of the image processing algorithms.

The next sources for blur are often described in literature:

·        Motion blur caused by movement of the system or object

·        Gaussian blur caused by defocusing of the system

·        Blur caused by air pollution

·        Blur caused by the lens system

 

Also the next sources for noise are well known:

·        Quantization noise that is caused because the data in each pixel (separate element of an image) is quantified in grey-values from 0 till 255.

·        system noise (white noise), for example noise which is a result of under lightening

 

In the past few decades many image restoration and reconstruction algorithms have been developed. Most of those methods where applied in the astronomy or medical equipment like X-ray photographs. In astronomy the Gaussian blur often can be determined from the system. This is caused because in practice the images of many objects (for example stars) can be used to determine the blur function. In forensic imaging however often the Gaussian blur function is not known. The shape of the blur function is the other unknown parameter.

4         Maximum Entropy

 

4.1    Introduction

In this chapter the maximum entropy will be explained. First the basics of the classical maximum entropy will be given as described by Gull et.al.. Since we use a specially developed toolbox from the NASA research fund for the maximum entropy, the next models will be explained in more detail:

·        Classical Bayesian maximum entropy

·        Premodel by Gull

·        Chi-method of determining regularization parameters by Thompson et.al..

 

The maximum entropy-filter will guarantee positive restoration and is based on modeling the object as a probability density function through Maxwell-Boltzmann statistical arguments. If the object f is normalized to unit energy, then each fi scalar value can be interpreted as a probability (possibly the probability that grey-value is present at the i-th sample of the object). Then the entropy of the object would be given by: Equation 3-1

 

4.2    Classical Maximum Entropy

In this chapter the Maximum Entropy is explained further. Since the maximum Entropy algorithm requires many higher order functions and numerical approximations, this chapter will be rather mathematical. First the base of the Bayesian Maximum Entropy is described, after this the mathematical part is described in detail.

A simple mode of reasoning allows us to construct general theories. If there is a general theory at all, it must apply to particular cases. In particular, if we already know the answer for the simple cases, this constrains the general theory by falsifying all those which gave the wrong answer. If enough such cases can be found to constrain the general theory completely, then there will not be any freedom be left and the theory will be fully defined.

In real life general theories have some "loose ends", such as the cosmological terms in the general relativity. Those loose ends are allowed by experimental errors on the constraining observations. However, sometimes there are some real loose ends in the theory.

 

This mode of reasoning will be used three times here, leading to the Bayesian probability theory, the maximum entropy (MaxEnt) and finally to images. Firstly, if there is a general language for assumptions, it must be that of ordinary probability theory, with our assumptions being quantified as probabilities. So we will use the probability theory . However, probability theory describes how we must modulate our assumptions in the light of evidence, but it does not tell us how to assign the prior probabilities that we need in order to start the scheme. Secondly, if there is a general way of assigning positive additive distributions (such as probability distributions), then it must be MaxEnt: one source of this argument is Shore and Johnson. This argument does not address in the reliability of MaxEnt distributions, in the sense that the distributions are not quantified. Thirdly if there is a general quantification of distributions, their probabilities must be exponential in their entropy. Frieden, Gull and Daniel prove this

 

Of course it remains possible that there is no general theory, in which case all arguments would break down. Different problems need different theories. Although in sociological studies are analyzed in different ways, there are no known examples in which Bayesian methods give a demonstrably incorrect result.

The classic MaxEnt formulae in fact will give much freedom, and the underlying mathematics can be turned into advantage.

The following chapters will summarize the use of Classic MaxEnt in the real image reconstruction.

 

4.3    Bayesian probability theory: the Cox axioms

 

In science we will base theories based on experiences in the past. Logical reasoning, added by mathematics, is the intellectual tool we bring in for making the new theories applicable to our experiences.

It is well accepted in science that if we have the different possibilities' k, l, m, n, a minimal requirement is that we are able to rank our preferences consistently

.Equation 3-2

In this way the transitive ranking can be mapped onto real number, by assigning numerical codes P(k), P(l):Equation 3-3

If there is a general common language it must apply for simple cases. The next axioms are given by Cox and are difficult to argue against:

 

Axiom A:

If we first specify our preference for k being true, then specify our preference for l being true (given k), then we have implicitly defined our preference for k and l together.

 

Axiom B:

If we specify our preference for k being true, then we have implicitly specified our preference for its negation k

Considering these mild requirements, Cox showed that there is a mapping of the original code P into other codes' pr that obey the usual rules of the probability theory.


Equation 3-4

Equation 3-5

 

Although many efforts are expended on arguments to contradict axioms A en B with an axiom C, no such contradictory axiom has been demonstrated yet. Bays' theorem itself tells us how to modulate the probabilities in accordance with extra evidence. It does not assign probabilities in the first place. For this we use MaxEnt.

 

4.4    The axioms of maximum entropy

The probability distribution p(x) of a variable x is an example of a positive additive distribution. It is positive in construction. It is additive in the sense that the overall probability in a domain D equals the sum of the probabilities in any decomposition into sub domains, and we write it as: Equation 3-6

In an optical image (with incoherent light) we will have also a positive, additive distribution.

It is simpler to investigate the general problem of assigning a positive additive distribution (PAD) then the specific problem of probability distribution.

 

We investigate the general problem of assigning PAD f(x), given some definitive but uncompleted constraints on it. If there is a rule for assigning a PAD, then it must give sensible results in simple cases. The "four entropy-axioms"- so-called because they lead to the entropic formulae relate to such cases. Proofs of the consequences of the axioms as formulated are given by Skilling.

 

Axiom I : "Subset Independence"

A separate treatment of individual separate distributions should give the same assignment as joint treatment of their union. More formally: if constraint C1 applies to f(x) in domain x D1 and C2 applies to a separate domain x D2, then the assignment

should give: Equation 3-7

where f(D|C) means the PAD assigned in domain D on basis of constraints C. The consequence of this constraint is that the PAD f should be assigned by maximizing over f some integral of the form: Equation 3-8

: function as yet unknown

m: Lesbesgue measure associated with x which must be given before the integral can be defined

The effect of this basic axiom is to eliminate all cross-terms between the different domains.

 

 

Axiom 2: "co-ordinate invariance"

The PAD should transform as a density under co-ordinate transformations. The consequence of this axiom is that the PAD should be assigned by maximising over f some integral of invariants:Equation 3-9

And still is unknown. The most important axiom is next.

 

Axiom 3: System Independence

If a portion q of a population has a certain property, then the proportion of any sub-population having that property should properly assign as q. For example if 1/3 of the people have blue eyes in the Netherlands, then the proportion of the left handed people having blue eyes should also be assigned a value of 1/3.

 

The consequence of this axiom is that the only integral of invariants whose maximum always selects these assignments, regardless of any other subdivision that may be present, is:Equation 3-10

where c is a constant

Axiom 4: Scaling

In the absence of additional information, the PAD should be assigned equal to the given measure (instead of being merely proportional). This a practical condition, since this is more convenient than that it is a deep requirement.

Consequence: The PAD f should be assigned by maximizing over f: Equation 3-11

m(x)dx: ensure that the global maximum of S, at f(x) = m(x) is zero, which is both convenient and

required for other purposes ix

Caused by the entropic form, we call S as defined in Equation 3-11 the entropy of the positive, additive distribution f. It reduces to the usual cross-entropy formula: Equation 3-12

if f and m are normalized.

The MaxEnt-entropy gives sensible results in simple cases, so if there is general assignment method, it should be MaxEnt, since no other axiom 5 has been proven yet.

The arguments above do not proof that MaxEnt assignment is reliable. Would a slightly different PAD be much more inferior? Furthermore experimental data are usually noisy, so they do not constitute testable information about a PAD-function. Noisy images will define the likelihood or conditional probability p(data|f) as a function of f. In a proper Bayesian analysis we need the quantified probability p(f), because we have to set a measure m.

 

4.5    Quantification

The reliability of an estimate is usually described in terms of ranges and domains, leading us to investigate probability integrals over domains V of possible PADs f(x), digitized for convenience into r cells as (f1, f2..fr) .Equation 3-13

Where M(f) is the measure on the space of PAD's. By definition the single PAD we most prefer is the most probable. We identify this with the PAD assigned by MaxEnt. Hence p(f|m) must be of the form :Equation 3-14

 

 

We do not know which function this might be. Now S has the units (dimensions) of the total f, so this monotonic function must incorporate a dimensional constant . This is not an absolute constant, so that :Equation 3-15

dr

Where is a monotonic function of dimensionless argument and:Equation 3-16

In the partition function that ensures that p(f|m) is properly normalized.

To find , we repeat the same kind of reasoning we had before. Finding a simple case for which the results are known and arguing that any general theory must apply for this specific example. Suppose it is raining (drops each with the quantum size q) at r cells (i=1,2....r) at random with Poisson expectation . This arrangement satisfies the entropic axioms, and the probability of occupation numbers n is known (from symmetry and straightforward counting of possible outcomes) to be: Equation 3-17

Define fi = ni q and mi =i.q to remain finite as the quantum size q is allowed to approach zero. Then the image-space of f becomes constructed from micro cells of volume qr, each associated with one lattice-point of integers (n1,n2...nr). So we have, as q tends to 0:
Equation 3-18

Because we take n large, we may Stirling's formula: Equation 3-19

to obtain (accurately to within O(1/n)):Equation 3-20

Here we recognize the entropy on r cells: Equation 3-21

so that Equation 3-22

If we compare this with the previous formula Equation 3-15, then we can identify: Equation 3-23

Further we can identify: Equation 3-24

It is important that the square-root factors in the Stirling's-formula could enable us to derive the measure M. This allowed us to make the passage between the point wise probability comparisons and full probability integrals over domains.

A natural interpretation of the measure is as the invariant volume of a metric g defined on the space. The space of the PADs will be:
Equation 3-25

which equals the entropy curvature S 2S / f.f. This quantity is also known as the Fisher information matrix.

This analysis used large numbers of small quanta q, so that is large. This limit will also ensure us that each ni will almost certainly be close the expected value I. Further we can expect that: Equation 3-26

To summarize if there is a general prior for positive additive distributions in f, it must be: Equation 3-27

and furthermore: Equation 3-28

where Equation 3-29

In this quantified prior only one variable exists which cannot be fixed a priori because it is dimensional.

 

4.5.1      Choice of regularization parameter

One drawback of the classical maximum entropy-method is that images do not consist out of raindrops of which are thrown randomly as is given in Equation 3-15. The results of the classical maximum entropy restoration were often disappointing. In the literature different ways for better results are described. For this reason Gull developed some better models for choosing the regularization parameter.

For the regularization parameter, Gullx described a way to determine this variable. The problem with this variable is that it is dimensional, and its best-fitting value varies quite strongly with the type and quality of the data that is available. Further the influence of noise in images is very important for the results of this kind of image processing.

For this reason it can only be determined posterior. For this reason Gull went to the other side of the problem, the likelihood that is written as: Equation 3-30

where

N: number of data

L(f): all details of experimental set-up and accuracy's of measurement

For the common case of independent, Gaussian errors, this reduces to , but other types of errors like Poisson noises are important. In forensic image processing the overall level of noise is most often not known, so we will eventually generalize to: Equation 3-31

However we assume that the errors are known in advance, so that =1. The joint PDF of data and image is: Equation 3-32

Based on Bayes theorem this is also proportional to the posterior probability distribution for f: p(f|D,,m). The maximum of this distribution as a function of f might be our best reconstruction and will occur at the maximum of: Equation 3-33

Methods of choosing by CHI-square-method

For choosing the smoothing parameter the method of regularization is described by Thomson et.al.. The objective of those methods is to produce an estimate of some unknown quantity f, usually a vector or a function, with the help of the data g. The estimate can be formulated in: Equation 3-34

Where:

: a measure of lack of fidelity to the data g

: a regularization functional, measuring roughness in a well-defined sense

: the regularization parameter

The criterion of Equation 3-34 is expressed in very general terms, and specification of a particular method entails making choices of , and . As far as and are concerned, sensible choices are usually made on the basis of context and/or prior knowledge about the true f.

The models of images can be described as linear additive models. If n denotes the number of pixels, we have:

Equation 3-35

in which g and f are n x 1 vectors, H is the n x n point-spread-matrix defining the systematic blur, and n x 1 vector describes the blur. We assume that the noise has a Gaussian distribution.

This will motivate the choice of: Equation 3-36

For we shall choose a quadratic function of the form: Equation 3-37

In this formula C represents a nonnegative-definite matrix. Our estimate f to be denoted as f() is the solution of :Equation 3-38

It follows that: Equation 3-39

In maximum entropy no explicit minimum exists. In practice however this method also seems to work for other problems in which no explicit minimum are existent. To find a value for one can first determine this value by the chi-test and further trying setting the manually. For this we purpose will use the CHI-test by :Equation 3-40

This will be calculated further numerically.

 

4.6    Premodel by Gull

In practice often the Bayesian choice of will often lead to a reconstruction that is over-fitted. Despite this the classic entropy-model is the correct answer to the problem. If we would modify variable there would be a lack of theoretical grounds to defend this choice. In fact it was the purity of the derivation of the joint pdf that is conditional on the knowledge of the initial model m. This model was first introduced as a measure on the x-scale of pixels, but is a point in the f-space and acts as a "model" here. The only freedom that was left in the hypothesis space is to consider variation in the model. The fact that the model is flat, expresses our lack of prior information about the structure of the picture.

There are large areas of the picture where the lighting is generally light or dark, with interesting details superimposed. There are correlation's from pixel to pixel present in the image that we have ignored so far. Our earlier MaxEnt-Axiom I forbids us to put the pixel-to-pixel-correlation's directly into our p(f|m,). For this reason we must circumvent the axiom in a subtle way.

Suppose we have a case in which four images are combined in one video. We have for example two images of the counter of the bank and two images of the entrance at night. Axiom I is designed to protect us from letting the reconstruction of the images of the counter influence our images of the entrance at night. There is however nothing to prevent us from taking different m0 -levels for each part. In fact is would be extremely desirable to have different values for those levels. So we can subdivide the space further. If we have the counter there might be a desk and some people working behind it. We can subdivide the image further.

If we continue to subdivide, we can get a better model, closer to the reconstruction and we expect that will increase. On the other side by introducing extra parameters there might be some drawbacks. We might get penalties for this in a way that it makes no sense to modify the model. In this model it is very important that no sharp boundaries exist, since this will influence our image.

We are now in a position to formulate a new, flexible hypothesis that is suitable for the reconstruction itself: Equation 3-41

Where

b: model-blur PSF

 

To complete the analysis we must assign a prior for a premodel m'. We treat m' as an image and use the entropic prior:Equation 3-42

where

T: S(m',flat)

: new Lagrange multiplier for the m'-space entropy T

We restrict ourselves to the mathematically tractable case of quadratic S and T, gaussian blurs and uniform noise-level, for which L, S and T matrices are all simultaneously diagonal in Fourier space. The Bayesian calculation of ' and ', now yields in :
Equation 3-43

Equation 3-44

where are eigenvalue of BtB and Equation 3-45

The noise level can be estimated as before:Equation 3-46

Here is once again a neat division of the degrees of freedom between S, T and L.

5         Minimal Total Variation

The Total Variation-technique is described by Rudin et.al. in a sequence of papers,.. This method claims to improve the blur and remove noise. The method is based on the total variation in the image that is minimized subject to constraints involving the point spread function and the statistics of the noise. The constraints are implemented using the Lagrange multipliers. L. Rudin claims in one of his articles that the method is non-invasive and non-oscillatory.

The theoretical backgrounds are based on shockwave-theories. The TV-based theories involve minimizing:

the absolute integral of the curvature along a feature curve, so also no smooth curves are allowed

The total variation of the magnitude of the gradient of the intensity along and across feature curves

Total variation of curvature along level sets to allow cusps and piecewise convex interfaces

The variational problems (1) and (2) yield fourth order non-linear PDE's for the evolution part of the restoration procedure, while (3) yield in sixth order PDE. The Total Variation Measure will minimize the geometrical oscillations of 'features' in reconstructed images, subject to image degradation constraints.

Noise might be spatially correlated to the solution of this algorithm. A set of global constraints will reduce the resolution of the restoration. Fine features may be grossly altered when the algorithm interprets them to be the results of the statistical variation. A better restoration is obtained by using the Lagrange-multipliers whose steady state solution satisfies constraints that are local to individual curves and regions. The next constraints are considered:

·        blurring kernel and noise statistics

·        divergence free and hence-area preserving

 

5.1    MTV definitions

Suppose we have the following problem:
Equation 4-1

where

: gaussian white noise

·  u0(x,y): observed in