Research Statement 
Onur G. Guleryuz
My research interests are in the areas of statistical signal processing and information theory with primary applications to digital images and video. 
For any given problem, if one can list available mathematical techniques on an axis of increasing generality, one typically finds that the performance of solutions is degraded as one moves toward using more general techniques, i.e., one typically improves performance by utilizing techniques that are specialized to the problem at hand. While generality is very much desired, it is clear that one must keep in mind the possible loss in performance/optimality when employing general techniques on specific domains. I believe that in many areas that I am interested in, classical research focuses on applying very generic signal processing and information theory techniques to problems with very specific domains. For example, while facilitating applications such as transmission, compression, and noise removal at the most basic level, it can be said that classical techniques do not recognize the very specific sets that natural images are defined on and that there are significant opportunities for improvement (see Figures 1 and 2). My research is geared toward fully understanding problems and the domains that they are defined on in order to design a repertoire of models and techniques with the degree of robustness tuned to the point that achieves the highest performance on the respective domain. My recent research concentrates on robust image and video processing using sparse representations, designing analytically tractable stochastic models in large dimensions, and discovering energy efficient compression/routing solutions in sensor networks.

Figure 1: Typical images generated by random bits fed to the decoders of (a) SPIHT (wavelets), (b) JPEG (DCTs), (c) JPEG2000 (wavelets), (d) JPEG-LS (pixel domain prediction), assuming ~ 1 bit/pixel compression [8]. Inputting a sequence of random bits to a compression decoder would enable one to see, with high probability, the set of images the compression algorithm tries to achieve high performance on. Since typical results are not even remotely close to natural images, one can infer that these algorithms are trying to be applicable in very general settings. While especially SPIHT and JPEG2000 are viewed as being state-of-the-art, much higher performance can be possible for algorithms that are tuned to more specific yet more relevant domains.

Figure 2: Examples where a next generation, edge-aware compression algorithm [1] obtains substantial improvements over the state-of-the-art. The algorithm obtains these improvements by noting that natural images tend to have discontinuities along geometric curves. Geometric discontinuities constrain natural images to lie in typical sets much different from those for the images in Figure 1. To the right of each image is the rate-distortion compression performance obtained by the edge-aware algorithm (G-DPCM) and by the SPIHT algorithm (higher plots are better as they correspond to better rate-distortion pairs). The top row contains toy images where G-DPCM attains very significant improvements. The bottom row includes realistic images with sharp edges where improvements are more moderate but still significant. G-DPCM obtains improvements by discovering and taking advantage of statistical dependencies over edges present in the image. It is uniformly better than SPIHT since, when no edges are present, it reduces to SPIHT.

Perhaps the most significant example of a specific domain that is of interest to me is given by the set of images and video frames obtained by imaging the world around us. This set captures our environment at single or multiple time instances. Hence the data obtained by examining images and video contains the richness of our environment and of us, albeit through the eyes of a two-dimensional projection. Image and video data feel natural to humans, as most of us have significant intuition for these types of data. For example, when confronted with image data and stock price data, many of us will think of the former as "easy" but will seek financial advisers at considerable expense to help us deal with the latter. Fundamentally though, the two types of data have more similarities than differences (Figure 3). They are both very hard as they are governed by nonstationary statistics with "events", which delineate sophisticated interfaces between predictable portions of the data. Both require robust techniques that determine what is predictable and when events have happened. Some of my research is geared toward designing robust signal processing techniques that thrive on this type of nonstationary data, which is omnipresent in the natural world around us (see, for example, Figures 4 and 5). My aim is to enable image/video applications that automatically exploit the underlying nonstationarity, to design future search, processing, and compression algorithms, and to better define the space of natural images and video, i.e., to better understand the facets of the evolving real world surrounding us.


Figure 3: "Events" on stock prices (a), and on images (b), (c). Events are unexpected regions that separate otherwise predictable portions of data, forming sophisticated interfaces between the predictable portions. The statistical behavior of data typically changes drastically once an event is crossed. In (a), events correspond to news stories about the company; in (b) and (c), events mark the transition between various regions in the image (for example, in (b), the event around sample point 290 marks the transition from the carpet to the woman's scarf, i.e., from a region of slowly varying values to one of high fluctuations). On images, events lie along two-dimensional curves and sometimes, but not always, manifest themselves as sharp edges. The event in (c) is more sophisticated and again results in a rapid change in statistics. The estimation algorithms outlined in [3,4] are specifically geared toward dealing with events that form interfaces between regions of drastically different statistics.

Figure 4: (a) The same robust algorithm predicting missing regions of periodic, texture, and edge pixels [3,4]. The algorithm does not "know" or detect what type of region it is operating on. Rather, it depends on an implicit local model based on sparse local representations. The properties that make this algorithm robust allow one to generalize the algorithm to other types of data such as speech, seismic data, etc. In (b), the same algorithm operates on another scenario and predicts "missing" high resolution information over edges using only low resolution information [6]. The algorithm starts with only the lowest frequency band (LLLL) of a 2 level wavelet transform and predicts the remaining coefficients. The resolution is effectively increased by a factor of 4 by automatically exploiting nonlinear statistical dependencies over edges. This application is very significant as one of the key problems limiting state-of-the-art compression is the lack of tools that take advantage of these nonlinear dependencies.
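The recovery idea behind Figure 4(a) can be illustrated with a minimal one-dimensional sketch. It is not the algorithm of [3,4], which operates on images with adaptive sparse reconstructions; the sketch assumes a single orthonormal DCT, a toy signal that is exactly sparse in that DCT, and a hand-picked decreasing threshold schedule. The names dct_matrix and recover_missing are hypothetical and chosen only for this illustration.

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II matrix; row k is the k-th basis vector."""
    k = np.arange(n)[:, None]
    t = np.arange(n)[None, :]
    d = np.sqrt(2.0 / n) * np.cos(np.pi * (t + 0.5) * k / n)
    d[0, :] = np.sqrt(1.0 / n)
    return d

def recover_missing(y, known, thresholds):
    """Fill unknown samples by repeated hard thresholding in the DCT domain.

    y          : observed signal (values at unknown positions are ignored)
    known      : boolean mask, True where the sample was observed
    thresholds : decreasing sequence of hard-threshold levels (assumed schedule)
    """
    d = dct_matrix(len(y))
    est = np.where(known, y, 0.0)           # start with zeros in the gap
    for t in thresholds:
        c = d @ est                         # forward transform
        c[np.abs(c) <= t] = 0.0             # keep only significant coefficients
        denoised = d.T @ c                  # inverse transform
        est = np.where(known, y, denoised)  # observed samples are never changed
    return est

# Toy signal: two active DCT coefficients, eight consecutive samples missing.
n = 64
d = dct_matrix(n)
x = 3.0 * d[5] + 2.0 * d[12]
known = np.ones(n, dtype=bool)
known[20:28] = False
y = np.where(known, x, 0.0)

est = recover_missing(y, known, np.linspace(2.0, 0.5, 30))
err_before = np.linalg.norm(x - y)    # error of simply zero-filling the gap
err_after = np.linalg.norm(x - est)   # error after iterated thresholding
print(err_before, err_after)
```

In this toy setting the routine never asks whether the gap contains a periodic pattern, a texture, or an edge; it relies only on the assumption that the signal is sparse in the chosen transform, which is the property the caption attributes the robustness to.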



Part of my research activities are concentrated on better understanding the set of natural images and video using localized models and associated approximation strategies (early attempts that quantify the strength of such approaches have been reported in [10]). On a global scale, images/video exhibit structure that is easily discernible by human beings but rather difficult to exploit by signal processing algorithms. For example, the entities/objects in various scenes, their apparent locations, etc., easily allow humans to make accurate guesses about the scenes and summarize the content of the video sequence depicting the scenes. In comparison, for automated algorithms even the detection of low-level information that may aid high-level decisions can be problematic. However, with high performance computing, I believe automated algorithms should significantly outperform humans in guessing the contents of a small region and, similarly, perform local denoising, deblurring, and estimation in a way that is at least as good as an expert human operator. One of my research objectives is to design localized, sparse mathematical representations for images and video, which when combined with simple and robust techniques, allow one to construct such algorithms (see Figures 4 and 5 for examples from my recent research that designs and exploits some sparse mathematical representations in the contexts of images and video respectively). A further objective is to expand these localized algorithms to more global scales in order to enable more sophisticated applications and identify the performance limits of "unintelligent" or low-level signal processing. Further objectives include extensions to other types of signals, such as displacement fields that govern the time evolution of video sequences [7], and the development of robust tools that are applicable over image-like nonstationary signals [2].
As the amount of accumulated data increases, one is faced with three important issues: how does one compress, communicate, and make sense of all this information? Whether we are trying to filter spam, generate automated news summaries, or compress and store large amounts of data in databases, there is a need to do sophisticated modeling and processing on data defined in large dimensions. However, unless one can clearly delineate simple subsets in large dimensions and restrict algorithms to those subsets, the well-known "curse of dimensionality" starts to hinder progress. Some of my research interests involve the design of tractable signal and data models that allow one to robustly operate in large dimensions while closely approximating real-world data. The dual problem of extracting and communicating the relevant portions of high dimensional data, when the eventual application is only concerned with these portions, is a related research interest. For example, if one is only interested in doing certain detection/estimation tasks using the output of deployed sensors, then one can define new transmission, compression, and routing methods that significantly outperform the generic "observe and transmit everything" strategy [5].

Figure 5: Spatial Sparsity Induced Temporal Prediction. A simple temporal prediction algorithm that enables successful predictive encoding during fades, blended scenes, temporally decorrelated noise, and many other temporal evolutions which force predictors used in traditional hybrid video compression to fail. The frame to be coded is predicted using the reference frame in a hybrid video compression setting (IPP...). Both frames contain additive, temporally decorrelated noise. The fourth column provides a high-level summary of the required processing for successful prediction (our algorithm only performs simple low-level prediction). Traditional motion compensated prediction relies on translated blocks from reference frames to directly match blocks in the frame to be coded. However, translated reference frame blocks typically consist of two superimposed parts: one part that is relevant for prediction and another part that is not relevant. In many types of temporal evolutions in video, such as fades from one scene to another, blended scenes, temporally decorrelated noise, etc., the prediction-irrelevant part can become severe and significantly hurt prediction accuracy. By performing prediction in a domain where the video frames are spatially sparse, this work allows the automatic isolation of the prediction-relevant parts of predicting blocks. These are then used to enable better prediction than would be possible otherwise. Predictions formed by traditional techniques fail, which forces the encoding of the current frame without prediction, i.e., as an INTRA frame. The sparsity induced prediction algorithm generates successful results because it exploits the fact that natural images and video frames lie in non-convex sets. Correctly determining the non-convexity through sparse representations allows high performance results. Observe that the prediction is successful even under complicated scenarios that involve brightness changes and sophisticated fades. The algorithm manages to "fish out" scenes, recombine them, correct lighting, etc., to form these predictors.
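The separation of a reference block into prediction-relevant and prediction-irrelevant parts can be made concrete with a small sketch for the temporally decorrelated noise case only. It is a simplified illustration under stated assumptions, not the algorithm described in the caption: the "frame" is one-dimensional, there is no motion compensation, the scene is exactly sparse in a single orthonormal DCT, and the threshold is hand-picked at five noise standard deviations. The names dct_matrix and sparse_predict are hypothetical.

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II matrix; row k is the k-th basis vector."""
    k = np.arange(n)[:, None]
    t = np.arange(n)[None, :]
    d = np.sqrt(2.0 / n) * np.cos(np.pi * (t + 0.5) * k / n)
    d[0, :] = np.sqrt(1.0 / n)
    return d

def sparse_predict(reference, threshold):
    """Predict from the reference using only its significant DCT coefficients."""
    d = dct_matrix(len(reference))
    c = d @ reference
    c[np.abs(c) <= threshold] = 0.0   # discard the part treated as irrelevant
    return d.T @ c

rng = np.random.default_rng(0)
n = 1024
d = dct_matrix(n)
scene = 3.0 * d[5] + 2.0 * d[12]      # content shared by both frames
sigma = 0.1                           # assumed noise standard deviation

# Each frame carries its own independent (temporally decorrelated) noise.
reference = scene + sigma * rng.standard_normal(n)
current = scene + sigma * rng.standard_normal(n)

# Squared prediction error against the frame to be coded.
direct_err = np.sum((current - reference) ** 2)
sparse_err = np.sum((current - sparse_predict(reference, 5 * sigma)) ** 2)
print(direct_err, sparse_err)
```

Copying the reference directly carries its noise into the predictor, so the residual contains the noise of both frames. Thresholding in the sparse domain keeps mostly the shared content, leaving roughly only the noise of the current frame in the residual.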


References

[1] O. G. Guleryuz and A. L. da Cunha, "Image Compression with a Geometrical Entropy Coder," Proc. IEEE Int'l Conf. on Image Proc. (ICIP2006), Atlanta, GA, Oct. 2006.
[2] O. G. Guleryuz, "Linear, Worst-Case Estimators for Denoising Quantization Noise in Transform Coded Images," IEEE Transactions on Image Processing, vol. 15, no. 10, pp. 2967-2986, October 2006.
[3] O. G. Guleryuz, "Nonlinear Approximation Based Image Recovery Using Adaptive Sparse Reconstructions and Iterated Denoising: Part I - Theory," IEEE Transactions on Image Processing, vol. 15, no. 3, pp. 539-554, March 2006.
[4] O. G. Guleryuz, "Nonlinear Approximation Based Image Recovery Using Adaptive Sparse Reconstructions and Iterated Denoising: Part II - Adaptive Algorithms," IEEE Transactions on Image Processing, vol. 15, no. 3, pp. 555-571, March 2006.
[5] O. G. Guleryuz and U. C. Kozat, "Joint Compression, Detection, and Routing in Capacity Constrained Wireless Sensor Networks," Proc. IEEE Statistical Signal Processing Workshop, Bordeaux, France, July 2005.
[6] O. G. Guleryuz, "Predicting Wavelet Coefficients Over Edges Using Estimates Based on Nonlinear Approximants," Proc. Data Compression Conference, IEEE DCC-04, April 2004.
[7] L. Sendur and O. G. Guleryuz, "Globally Optimal Wavelet-Based Motion Estimation Using Interscale Edge and Occlusion Models," Proc. SPIE Conf. on Visual Communication and Image Processing, Jan. 2004.
[8] O. G. Guleryuz, V. Ratnakar, R. Radhakrishnan, and N. Memon, "Stochastic Sampling from Image Coder Induced Probability Distributions," Proc. Asilomar Conference on Signals and Systems, Pacific Grove, CA, Nov. 2003.
[9] O. G. Guleryuz, E. Lutwak, D. Yang and G. Zhang, "Information Theoretic Inequalities for Contoured Probability Distributions," IEEE Transactions on Information Theory, vol. 48, no. 8, pp. 2377-2383, August 2002.
[10] A. Cohen, I. Daubechies, O. G. Guleryuz, and M. T. Orchard, "On the Importance of Combining Wavelet-Based Nonlinear Approximation with Coding Strategies," IEEE Transactions on Information Theory, vol. 48, no. 7, pp. 1895-1921, July 2002.


