Research Statement
Onur G. Guleryuz
My research interests are in the areas of statistical signal processing
and information theory with primary applications to digital images and
video.
For any given problem,
if one can list available mathematical techniques on an axis of
increasing
generality, one typically finds that the performance of solutions
is degraded as one moves toward using more general techniques, i.e.,
one typically improves performance by utilizing techniques that are
specialized to the problem at hand.
While generality is very much desired, it is clear that one must keep
in mind
the possible loss in performance/optimality when employing general
techniques on specific domains.
I believe that in many areas that I am interested in,
classical research focuses on applying very generic signal processing
and information theory techniques to problems with very specific
domains.
For example, while facilitating applications such as transmission,
compression, and noise removal
at the most basic level,
it can be said that classical techniques do not recognize the very
specific
sets that natural images are defined on and that there are significant
opportunities for
improvement (see Figures 1 and 2).
My research is geared toward fully understanding problems and the
domains
that they are defined on in order to design a repertoire of models and
techniques with the degree of robustness tuned to the point that
achieves the highest performance on the respective domain.
My recent research concentrates on robust image and video processing
using sparse representations,
designing analytically tractable stochastic models in large dimensions,
and discovering energy efficient compression/routing solutions in
sensor networks.
Figure 1: Typical images generated by random bits fed to
the decoders
of (a) SPIHT (wavelets), (b) JPEG (DCTs), (c)
JPEG2000 (wavelets), (d) JPEG-LS (pixel domain prediction),
assuming ~ 1 bit/pixel compression [8].
Inputting a sequence of random bits
to a compression decoder
would enable one to see, with high probability,
the set of images the compression algorithm tries
to achieve high performance on.
Since typical results are not even remotely close to natural images,
one can infer
that these algorithms are trying to be applicable in very general
settings.
Although SPIHT and JPEG2000 in particular are regarded as
state-of-the-art,
much higher performance may be possible for algorithms that are tuned
to
more specific yet more relevant domains.
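The random-bit experiment can be illustrated with a minimal sketch. The decoder below is a hypothetical stand-in, not SPIHT, JPEG, JPEG2000, or JPEG-LS: it assumes a fixed-length code that spends one bit on the sign of every coefficient of an 8x8 block DCT (1 bit/pixel). The names `toy_decode`, `horizontal_correlation`, and the step size are assumptions made for illustration. Real coders use entropy coding and produce different typical images, but the principle is the same: decoding random bits samples the set of images the coder implicitly favors.

```python
import math
import random

N = 8  # block size, chosen to mirror JPEG-style 8x8 blocks (assumption)

def dct_basis(n):
    """Orthonormal DCT-II basis; row k is the k-th cosine basis vector."""
    basis = []
    for k in range(n):
        scale = math.sqrt((1.0 if k == 0 else 2.0) / n)
        basis.append([scale * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                      for i in range(n)])
    return basis

BASIS = dct_basis(N)

def inverse_dct_2d(coeffs):
    """Separable inverse 2-D DCT of one N x N coefficient block."""
    # Invert the horizontal frequency index first, then the vertical one.
    tmp = [[sum(coeffs[u][v] * BASIS[v][j] for v in range(N))
            for j in range(N)] for u in range(N)]
    return [[sum(tmp[u][j] * BASIS[u][i] for u in range(N))
             for j in range(N)] for i in range(N)]

def toy_decode(bits, blocks_per_side, step=16.0):
    """Hypothetical decoder: each bit selects +step or -step for one coefficient."""
    size = blocks_per_side * N
    image = [[0.0] * size for _ in range(size)]
    stream = iter(bits)
    for by in range(blocks_per_side):
        for bx in range(blocks_per_side):
            coeffs = [[step if next(stream) else -step for _ in range(N)]
                      for _ in range(N)]
            block = inverse_dct_2d(coeffs)
            for i in range(N):
                for j in range(N):
                    image[by * N + i][bx * N + j] = block[i][j]
    return image

def horizontal_correlation(image):
    """Sample correlation between horizontally adjacent pixels."""
    left = [row[j] for row in image for j in range(len(row) - 1)]
    right = [row[j + 1] for row in image for j in range(len(row) - 1)]
    mean_l = sum(left) / len(left)
    mean_r = sum(right) / len(right)
    cov = sum((a - mean_l) * (b - mean_r) for a, b in zip(left, right))
    var_l = sum((a - mean_l) ** 2 for a in left)
    var_r = sum((b - mean_r) ** 2 for b in right)
    return cov / math.sqrt(var_l * var_r)

# Feed random bits to the toy decoder and measure neighbor correlation.
rng = random.Random(0)
bits = [rng.getrandbits(1) for _ in range(4 * 4 * N * N)]
image = toy_decode(bits, blocks_per_side=4)
print(round(horizontal_correlation(image), 3))  # near zero: noise-like output
```

Neighboring pixels in natural images are strongly correlated, so a near-zero correlation is one simple sign that the decoded samples are far from natural images.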

Figure 2: Examples where a next generation,
edge-aware compression algorithm ([1]) obtains substantial improvements over the
state-of-the-art. The algorithm obtains these improvements by noting
that natural images tend to have discontinuities along geometric
curves.
Geometric discontinuities constrain natural images to lie in
typical sets much different from those for images in Figure 1.
To the right of each image is the rate-distortion compression
performance obtained by the edge-aware algorithm
(G-DPCM) and by the SPIHT algorithm (higher plots are better as they
correspond to better rate-distortion pairs).
Top row contains toy images where G-DPCM attains
very significant improvements. Bottom row includes realistic images
with sharp edges
where improvements are more moderate but still significant.
G-DPCM obtains improvements by discovering and taking advantage of
statistical dependencies over edges present in the image.
G-DPCM performs at least as well as SPIHT in all cases, since it
reduces to SPIHT when no edges are present.
Perhaps the most significant example of a specific domain that
is of interest to me is given
by the set of images and video frames obtained by imaging the world
around us.
This set captures
our environment at single or multiple time instances.
Hence the data that is obtained by examining images and video contains
the richness of our environment and us, albeit through the eyes of a
two dimensional projection.
Image and video data feel very natural to humans, as most of us
have strong intuitions for these types of data.
For example, when confronted with image data and stock price data,
many of us will think of the former as "easy" but
we will seek financial advisers at considerable expense
to help us deal with the latter.
Fundamentally though, the two types of data have more similarities
than differences (Figure 3). They are both very
hard as they are governed by nonstationary statistics
with "events", which delineate sophisticated interfaces between
predictable portions of the data.
Both require robust techniques that determine what is predictable
and when events have happened.
Some of my research is geared toward designing robust signal processing
techniques that thrive on this type of nonstationary data
that is omnipresent in the natural world around us (see for example,
Figures 4 and 5).
My aim is to enable image/video applications that automatically exploit
the underlying nonstationarity,
to design future search, processing, and compression algorithms,
and to better define the space of natural images and video,
i.e., to better understand
the facets of the evolving real world surrounding us.
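As a minimal sketch of what "determining what is predictable and when events have happened" can mean, the code below predicts each sample from a short window of its recent past and flags an event when the prediction error is large relative to the recent variability. This is an illustrative change detector under simple assumptions, not the estimation algorithms of [3,4]. The function name `detect_events` and its parameters (`window`, `threshold`, `floor`) are hypothetical.

```python
import math

def detect_events(samples, window=10, threshold=4.0, floor=1e-3):
    """Return indices where a sample departs sharply from its recent past.

    The predictor is the mean of up to `window` previous samples taken from
    the current predictable segment. An event is declared when the prediction
    error exceeds `threshold` times the standard deviation of that history.
    """
    events = []
    segment_start = 0  # first index of the current predictable segment
    for t in range(len(samples)):
        history = samples[max(segment_start, t - window):t]
        if len(history) < 3:
            continue  # too little history to judge predictability
        mean = sum(history) / len(history)
        var = sum((h - mean) ** 2 for h in history) / len(history)
        scale = max(math.sqrt(var), floor)  # floor avoids a zero scale
        if abs(samples[t] - mean) > threshold * scale:
            events.append(t)
            segment_start = t  # statistics change after an event: restart
    return events

# Two predictable regions separated by a jump at index 50.
signal = ([1.0 + 0.05 * math.sin(t) for t in range(50)]
          + [5.0 + 0.05 * math.sin(t) for t in range(50, 100)])
print(detect_events(signal))
```

Restarting the history after each event reflects the observation that the statistics on the two sides of an event can differ drastically, so samples from before the event should not be used to predict samples after it.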
Figure 3: "Events" on stock prices (a), and on
images (b), (c). Events are unexpected regions that
separate otherwise predictable
portions of data, forming
sophisticated interfaces between the predictable portions.
The statistical behavior of data typically changes drastically once
an event is crossed.
In (a) events correspond to news stories about the company,
in (b) and (c) events mark the transition between
various regions in the image
(for example, in (b), the event around sample point 290 marks
the transition from the carpet to the woman's scarf, i.e., from
a region of slowly varying values to one of high fluctuations).
In images, events lie along two-dimensional curves and sometimes
manifest themselves
as sharp edges, but not always.
The event in (c) is more sophisticated and again results in a rapid
change in statistics.
The estimation algorithms outlined in [3,4] are
specifically geared toward
dealing with events that form interfaces between regions of drastically
different statistics.

Figure 4: (a) The same robust algorithm predicting
missing regions
of periodic, texture, and edge pixels [3,4]. The
algorithm does not
"know" or detect what type of region it is operating on.
Rather, it depends on an implicit local model based on sparse
local representations.
The properties that make this algorithm robust allow one to generalize
the algorithm to other types of data such as speech, seismic data, etc.
In (b), the same algorithm operates on another scenario and
predicts
"missing" high resolution information over edges using only low
resolution information [6].
The algorithm starts with only the lowest frequency band (LLLL) of a
two-level wavelet transform and predicts the remaining coefficients.
The resolution is effectively increased by a factor of 4 by
automatically exploiting nonlinear statistical dependencies over edges.
This application is very significant as one of the key problems
limiting state-of-the-art compression is the lack of tools that take
advantage of
these nonlinear dependencies.
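The idea of predicting missing data from a sparse local representation can be sketched in one dimension. The code below is a simplified illustration, not the adaptive algorithms of [3,4]: it assumes the signal is sparse in a DCT basis, uses a single fixed hard threshold, and alternates between thresholding the transform coefficients and restoring the known samples. The names `recover_missing`, `dct_matrix`, and the parameter values are assumptions made for this example.

```python
import math

def dct_matrix(n):
    """Orthonormal DCT-II basis; row k is the k-th cosine basis vector."""
    rows = []
    for k in range(n):
        scale = math.sqrt((1.0 if k == 0 else 2.0) / n)
        rows.append([scale * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                     for i in range(n)])
    return rows

def forward(basis, x):
    """Transform coefficients: inner product of x with each basis vector."""
    return [sum(b * v for b, v in zip(row, x)) for row in basis]

def inverse(basis, coeffs):
    """Reconstruct the signal from its transform coefficients."""
    n = len(coeffs)
    return [sum(coeffs[k] * basis[k][i] for k in range(n)) for i in range(n)]

def recover_missing(samples, known, threshold, iterations=20):
    """Fill in samples where known[i] is False by iterated hard thresholding."""
    basis = dct_matrix(len(samples))
    estimate = [s if k else 0.0 for s, k in zip(samples, known)]
    for _ in range(iterations):
        coeffs = forward(basis, estimate)
        # Keep only the significant coefficients (the sparse approximation).
        kept = [c if abs(c) > threshold else 0.0 for c in coeffs]
        denoised = inverse(basis, kept)
        # Known samples are trusted; only the missing ones are updated.
        estimate = [s if k else d
                    for s, k, d in zip(samples, known, denoised)]
    return estimate

# A signal that is exactly sparse in the DCT domain, with four samples lost.
n = 32
signal = [10.0 * v for v in dct_matrix(n)[3]]
known = [not (12 <= i < 16) for i in range(n)]
observed = [s if k else 0.0 for s, k in zip(signal, known)]
recovered = recover_missing(observed, known, threshold=5.0)
print(max(abs(r - s) for r, s in zip(recovered, signal)))
```

The routine never inspects what kind of region it is filling. It only assumes that a few transform coefficients explain the data, which is the sense in which the model is implicit.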
Part of my research activity is concentrated on better understanding
the set of natural images and video using localized models and
associated approximation strategies
(early attempts that quantify the strength of such approaches
have been reported in [10]).
On a global scale, images/video exhibit structure that is easily
discernible
by human beings but rather difficult to exploit by signal processing
algorithms.
For example, the entities/objects in various scenes, their apparent
locations, etc., easily allow humans to make accurate guesses
about the scenes and summarize the content of the video sequence
depicting the scenes.
In comparison, with automated algorithms even the detection of
low-level information that may aid high-level decisions could be
problematic.
However, with high performance computing,
I believe it should be the case that
automated algorithms significantly outperform
humans in guessing the contents of a small region, and similarly,
perform local denoising, deblurring, and estimation in a way that is at
least as good
as an expert human operator.
One of my research objectives is to
design localized, sparse mathematical representations for images and
video, which when combined with simple and robust
techniques, allow one to construct such algorithms
(see Figures 4 and 5 for
examples from my recent research
that designs and exploits some sparse mathematical representations in
the contexts of images and video respectively).
A further objective is to expand these localized algorithms to more
global
scales in order to enable more sophisticated
applications and identify the performance
limits of "unintelligent" or low-level signal processing.
Further objectives include extensions to other types of signals,
such as displacement fields that govern time evolutions of video
sequences [7],
and the development of robust tools that are applicable over image-like
nonstationary signals [2].
As the amount of accumulated data increases, one is faced with three
important questions:
how does one compress, communicate, and make sense of all this
information?
Whether we are trying to filter spam, generate automated news
summaries, or
compress and store large amounts of data in databases, there is a need
to do sophisticated modeling and processing on data defined
in large dimensions.
However, unless one can clearly delineate simple subsets in large
dimensions
and restrict algorithms to those subsets, the well-known
"curse of dimensionality" phenomenon starts to hinder progress.
Some of my research interests involve the design of
tractable signal and data models that allow one
to robustly operate in large dimensions while allowing one to closely
approximate
real-world data.
The dual problem of extraction and communication of relevant portions
of high
dimensional data, when the eventual application is only concerned with
these portions,
is a related research interest.
For example, if one is only interested in doing certain
detection/estimation
tasks using the output of deployed sensors, then one can define new
transmission,
compression, and routing methods that significantly outperform the
generic
"observe and transmit everything" strategy [5].

Figure 5: Spatial Sparsity Induced Temporal Prediction. A
simple temporal prediction algorithm that enables
successful predictive encoding during fades, blended scenes, temporally
decorrelated noise, and
many other temporal evolutions which force predictors used in
traditional hybrid video compression to
fail. The frame to be coded is predicted using the reference frame in a
hybrid video compression setting (IPP...). Both frames contain
additive,
temporally decorrelated noise.
The fourth column provides a high-level summary of the required
processing for successful prediction (our algorithm only performs
simple low-level prediction).
Traditional motion compensated prediction
relies on translated blocks from reference frames to directly match
blocks in the frame to be coded.
However, translated reference frame blocks typically consist of two
superimposed parts: One part that is relevant for prediction
and another part that is not relevant. In many types of temporal
evolutions in video,
such as fades from one scene to another, blended scenes, temporally
decorrelated noise, etc.,
the prediction-irrelevant part can become severe and significantly hurt
prediction accuracy.
By performing prediction in a domain where the video frames are
spatially sparse, this work allows the automatic isolation of the
prediction-relevant parts of predicting blocks. These are then used to
enable
better prediction than would be possible otherwise.
Predictions formed by traditional techniques fail, which forces the
encoding of the current frame
without prediction, i.e., as an INTRA frame.
The sparsity induced prediction algorithm generates successful results
because it exploits the fact that
natural images and video frames lie in non-convex sets. Correctly
determining the non-convexity through sparse representations
allows high performance results. Observe that the prediction is
successful even under complicated scenarios that involve brightness
changes and sophisticated fades. The algorithm manages to "fish out"
scenes, recombine them, correct lighting, etc., to form these
predictors.
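The non-convexity remark can be made concrete with a simplified stand-in for the set of natural images: the set of signals with at most k nonzero coefficients in some transform domain. This model and the symbols below are assumptions introduced for illustration and do not appear in the cited work.

```latex
\[
  \Sigma_k = \{\, x \in \mathbb{R}^N : \|x\|_0 \le k \,\}, \qquad
  x = e_1 \in \Sigma_1, \quad y = e_2 \in \Sigma_1, \qquad
  \tfrac{1}{2}(x + y) = \tfrac{1}{2}(e_1 + e_2) \notin \Sigma_1 .
\]
```

Here e_1 and e_2 are distinct unit coefficient vectors. Averaging two 1-sparse signals yields a signal with two nonzero coefficients, so the set of 1-sparse signals is not convex. A blend of two scenes is an average of this kind, which suggests why working on individual coefficients in a sparse domain can help separate the prediction-relevant scene from the irrelevant one.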
References
- [1] O. G. Guleryuz and A. L. da Cunha,
"Image Compression with a Geometrical Entropy Coder,"
Proc. IEEE Int'l Conf. on Image Proc. (ICIP2006), Atlanta, GA, Oct. 2006.
- [2] O. G. Guleryuz,
"Linear, Worst-Case Estimators for Denoising Quantization Noise in
Transform Coded Images,"
IEEE Transactions on Image Processing, vol. 15, no. 10, pp. 2967-2986,
October 2006.
- [3] O. G. Guleryuz,
"Nonlinear Approximation Based Image Recovery Using Adaptive Sparse
Reconstructions and Iterated Denoising: Part I - Theory,"
IEEE Transactions on Image Processing,
vol. 15, no. 3, pp. 539-554, March 2006.
- [4] O. G. Guleryuz,
"Nonlinear Approximation Based Image Recovery Using Adaptive Sparse
Reconstructions and Iterated Denoising: Part II - Adaptive Algorithms,"
IEEE Transactions on Image Processing,
vol. 15, no. 3, pp. 555-571, March 2006.
- [5] O. G. Guleryuz and U. C. Kozat,
"Joint Compression, Detection, and Routing in Capacity Constrained
Wireless Sensor Networks,"
Proc. IEEE Statistical Signal Processing Workshop,
Bordeaux, France, July 2005.
- [6] O. G. Guleryuz,
"Predicting Wavelet Coefficients Over Edges Using Estimates
Based on Nonlinear Approximants,"
Proc. Data Compression Conference, IEEE DCC-04, April 2004.
- [7] L. Sendur and O. G. Guleryuz,
"Globally Optimal Wavelet-Based Motion Estimation Using Interscale
Edge and Occlusion Models,"
Proc. SPIE Conf. on Visual Communication and Image Processing,
Jan. 2004.
- [8] O. G. Guleryuz, V. Ratnakar, R. Radhakrishnan, and N. Memon,
"Stochastic Sampling from Image Coder Induced Probability
Distributions,"
Proc. Asilomar Conference on Signals and Systems,
Pacific Grove, CA, Nov. 2003.
- [9] O. G. Guleryuz, E. Lutwak, D. Yang, and G. Zhang,
"Information Theoretic Inequalities for Contoured Probability
Distributions,"
IEEE Transactions on Information Theory,
vol. 48, no. 8, pp. 2377-2383, August 2002.
- [10] A. Cohen, I. Daubechies, O. G. Guleryuz, and M. T. Orchard,
"On the Importance of Combining Wavelet-Based Nonlinear Approximation
with Coding Strategies,"
IEEE Transactions on Information Theory,
vol. 48, no. 7, pp. 1895-1921, July 2002.
File translated from TeX by TTH, version 3.76, on 04 Dec 2006, 15:52.