Face Hallucination

Sparse Representation-Based Approaches to Face Hallucination

The field of sparse and redundant representation modeling has gone through a major revolution in the past two decades. This started with a series of algorithms for approximating the sparsest solutions of linear systems of equations, later to be followed by surprising theoretical results that guarantee these algorithms’ performance. With these contributions in place, major barriers in making this model practical and applicable were removed, and sparsity and redundancy became central, leading to state-of-the-art results in various disciplines. One of the main beneficiaries of this progress is the field of image processing, where this model has been shown to lead to unprecedented performance in various applications [10].
Recently, sparse representation has achieved great progress in computer vision (Wright et al. 2010) and data analysis (Zhou and Tao 2013) [1]. The sparse representation of signals has already been applied in many fields, such as object recognition [22,23], text categorization [24], and signal classification [21]. Compared with other conventional methods, sparse representation usually offers better performance, thanks to its capacity for efficient signal modeling [21].

Sparse representation of a signal is based on the assumption that most or all signals can be represented as a linear combination of only a small number of elementary signals, called atoms, drawn from an overcomplete dictionary.
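To make this concrete, here is a minimal numpy sketch (an illustration, not code from any of the cited papers) in which a 16-dimensional signal is built from only three atoms of a 64-atom overcomplete dictionary; the names A, ψ and ω match the notation used later in this post.

```python
import numpy as np

# Toy illustration: a 16-dimensional signal built from only 3 atoms of a
# 16 x 64 overcomplete dictionary (more atoms than signal dimensions).
rng = np.random.default_rng(0)
M, N, k = 16, 64, 3                       # signal dimension, number of atoms, sparsity

A = rng.standard_normal((M, N))
A /= np.linalg.norm(A, axis=0)            # normalise each atom (column) to unit length

omega = np.zeros(N)                       # sparse coefficient vector
support = rng.choice(N, size=k, replace=False)
omega[support] = rng.standard_normal(k)   # only k non-zero entries

psi = A @ omega                           # observed signal = sparse combination of atoms
print("non-zero coefficients:", np.count_nonzero(omega), "out of", N)
```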

Single-image super-resolution produces a reconstructed HR image from an input LR image using prior knowledge learned from a set of LR–HR training image pairs, and the reconstructed HR image should be consistent with the LR input. The observation model relating an HR image and its corresponding LR counterpart is given as follows:

Il = S(r) H Ih + N

where Il and Ih denote the LR and HR images, respectively; H represents a blurring filter; S(r) is a down-sampling operator with a scaling factor of r in both the horizontal and vertical dimensions; and N is a noise term, such as Gaussian white noise. Here, we focus on the situation where the blur kernel is the Dirac delta function, as in [11,44], i.e. H is the identity matrix [7].
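As a rough illustration of this observation model (a sketch under the same Dirac-delta-blur assumption, not code from [7] or [11]), the degradation can be simulated by plain decimation plus additive Gaussian noise:

```python
import numpy as np

def degrade(I_h, r=2, sigma=2.0, rng=np.random.default_rng(0)):
    """Simulate Il = S(r) H Ih + N with H = identity (Dirac delta blur),
    S(r) = decimation by a factor r in both dimensions, N = Gaussian noise."""
    I_l = I_h[::r, ::r]                               # down-sampling operator S(r)
    return I_l + rng.normal(0.0, sigma, I_l.shape)    # additive Gaussian white noise N

# Example: a synthetic 64x64 "HR image" degraded to a noisy 32x32 LR image.
I_h = np.tile(np.linspace(0.0, 255.0, 64), (64, 1))
I_l = degrade(I_h, r=2, sigma=2.0)
print(I_h.shape, "->", I_l.shape)
```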

Therefore, the purpose of SR is to recover as much of the information lost in the down-sampling process as possible. Since the reconstruction problem remains ill-posed, different priors can be used to guide and constrain the reconstruction results [7].

In recent years, the sparse representation model (SRM) has been used as the prior model and has shown promising results in image super-resolution. Sparse representation-based approaches promise better performance and efficient signal modeling. Moreover, they avoid the need for image registration or alignment, and it is no longer necessary to estimate the blurring operator used in down-sampling the original high-resolution image [7].

A common formulation of the problem of finding the sparse representation of a signal over an overcomplete dictionary is as follows:

ωo = arg min ||ω||0, s.t. ψ = Aω

where A is an M × N matrix whose columns are the elements (atoms) of the overcomplete dictionary, with M < N, and ψ ∈ R^(M×1) is an observed signal. The purpose of sparse representation is to find an N × 1 coefficient vector ω that is sparse, i.e. most of its entries are zero, except for those associated with the atoms of A that represent the observed signal ψ [7].
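Exact l0 minimisation is NP-hard, so greedy or convex approximations are used in practice. The following sketch (an illustration, not the algorithm of any cited paper) uses Orthogonal Matching Pursuit from scikit-learn to recover a k-sparse ω from ψ = Aω:

```python
import numpy as np
from sklearn.linear_model import OrthogonalMatchingPursuit

rng = np.random.default_rng(1)
M, N, k = 32, 128, 5                      # M < N: overcomplete dictionary

A = rng.standard_normal((M, N))
A /= np.linalg.norm(A, axis=0)            # unit-norm atoms

omega_true = np.zeros(N)                  # ground-truth k-sparse coefficients
support = rng.choice(N, size=k, replace=False)
omega_true[support] = rng.standard_normal(k)
psi = A @ omega_true                      # observed signal

# Greedy approximation of the l0 problem: select k atoms one at a time.
omp = OrthogonalMatchingPursuit(n_nonzero_coefs=k, fit_intercept=False)
omp.fit(A, psi)
omega_hat = omp.coef_

print("true support:     ", np.sort(support))
print("recovered support:", np.flatnonzero(omega_hat))
```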

ScSR:

In particular, methods have been proposed for image reconstruction and state-of-the-art results have been obtained (Mairal et al. 2008a,b). Yang et al. (2008b) applied the idea of the sparse representation model with a coupled learning process to face image super-resolution and achieved good results [1].

Results of ScSR compared to other methods

Yang et al. (2008b) [8] argued that in face hallucination, the most frequently used subspace method for modeling the human face is PCA, which chooses a new coordinate system such that the variances of the dataset are preserved in decreasing order. However, the PCA bases are holistic, which makes the representation sensitive to occlusions. Compared with NMF, the reconstruction results of PCA are less intuitive and harder to interpret, because PCA allows subtractive combinations of the basis images. Even though faces are objects with a lot of variance, they are made up of several relatively independent parts such as eyes, eyebrows, noses, mouths, cheeks and chins. The idea behind Non-negative Matrix Factorization (NMF) [1] is to extract these relevant parts and find an additive combination of these local features, inspired by psychological and physiological principles suggesting that humans learn objects in a parts-based manner [8].
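As a brief illustration of the additive, parts-based decomposition behind NMF (random data standing in for vectorised face images; this is not the setup of [8]):

```python
import numpy as np
from sklearn.decomposition import NMF

# V (one vectorised "face" per row) is factorised as V ≈ W @ H with W, H >= 0,
# so each face is a purely additive combination of the learned basis images in H,
# unlike PCA, where subtractive combinations are allowed.
rng = np.random.default_rng(0)
V = rng.random((100, 32 * 32))            # stand-in for 100 vectorised 32x32 face images

nmf = NMF(n_components=16, init="nndsvda", max_iter=400, random_state=0)
W = nmf.fit_transform(V)                  # non-negative combination weights per face
H = nmf.components_                       # 16 non-negative "part" basis images
print("all weights non-negative:", W.min() >= 0 and H.min() >= 0)
```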

The method of Yang et al. (2008b) [8], denoted ScSR, is based on the idea of sparse signal representation, whereby the linear relationships among HR training signals can be accurately recovered from their low-dimensional projections. This method assumes that image patches can be well represented as a sparse linear combination of elements from a specific dictionary, and a pair of HR–LR dictionaries is constructed to force LR–HR patches to have the same sparse coefficients [7].

In ScSR, the structures of LR images are used to form a sparse prior model, which is then employed to reconstruct the HR image or HR patches. High-resolution patches have a sparse linear representation with respect to a compact, learned overcomplete dictionary of patches randomly sampled from similar images [7].
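A minimal sketch of this patch step (assuming a pre-trained coupled dictionary pair D_l, D_h, and using the usual l1 relaxation of the l0 problem; this is an illustration, not the authors' implementation):

```python
import numpy as np
from sklearn.linear_model import Lasso

def hallucinate_patch(y_l, D_l, D_h, lam=0.01):
    """Sparse-code the LR patch y_l over the LR dictionary D_l, then reuse the
    SAME coefficients with the HR dictionary D_h to synthesise the HR patch."""
    coder = Lasso(alpha=lam, fit_intercept=False, max_iter=5000)
    coder.fit(D_l, y_l)
    alpha = coder.coef_                   # shared sparse coefficients
    return D_h @ alpha                    # hallucinated HR patch

# Toy dimensions: 5x5 LR patches, 10x10 HR patches, 256 coupled atoms
# (random stand-ins for dictionaries that would normally be learned jointly).
rng = np.random.default_rng(0)
D_l = rng.standard_normal((25, 256));  D_l /= np.linalg.norm(D_l, axis=0)
D_h = rng.standard_normal((100, 256)); D_h /= np.linalg.norm(D_h, axis=0)
y_l = rng.standard_normal(25)
print(hallucinate_patch(y_l, D_l, D_h).shape)   # (100,) i.e. one 10x10 HR patch
```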

Algorithm: Face hallucination via sparse coding

Yang et al. (2008b) is not the end of the application of sparse representation to face hallucination: the method exploits less prior knowledge than face images actually provide, so the effective exploration of the sparsity of face images remains an interesting open problem [1].

Although Yang et al. (2008b) only use a small database and a simple face alignment algorithm, the results already reveal the potential of their algorithm for hallucinating faces. A larger training database and more sophisticated face alignment, as in [12] and [2], promise better results, which they leave to future work [8].

SCDL:

Wang et al. [9] propose a simple yet more general model to solve cross-style image synthesis problems; for example, up-converting a low-resolution image to a high-resolution one, or converting a face sketch into a photo for matching [9].

SCDL Flow Chart

Their method trains a dictionary pair and a mapping function simultaneously. The pair of dictionaries characterizes the two structural domains of the image styles, while the mapping function reveals the intrinsic relationship between them. Their model can be adapted to various cross-style image applications. In summary, their paper has two main contributions [9]:

1. A novel coupled dictionary learning approach (the two dictionaries are not fully coupled, which allows much flexibility for synthesis).

2. A reliable style transform algorithm (successfully applied to image super-resolution and photo-sketch transformation).

The proposed SCDL approach involves two algorithms: the dictionary and mapping learning algorithm and the image synthesis algorithm, which are summarized in Algorithm 1 and Algorithm 2 below, respectively [9].

Algorithm 1: Semi-Coupled Dictionary Learning

Algorithm 2: Cross-Style Image Synthesis
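As a highly simplified sketch of the synthesis idea behind Algorithm 2 (the training stage of Algorithm 1, which jointly learns D_x, D_y and the mapping W, is not reproduced here; all trained quantities below are random stand-ins):

```python
import numpy as np
from sklearn.linear_model import Lasso

def scdl_synthesis(x, D_x, D_y, W, lam=0.01):
    """Code the source-style signal x over its dictionary D_x, map the sparse
    code into the target domain with the learned linear mapping W, and
    reconstruct the target-style signal with the target dictionary D_y."""
    coder = Lasso(alpha=lam, fit_intercept=False, max_iter=5000)
    coder.fit(D_x, x)
    alpha_x = coder.coef_                 # sparse code in the source style
    alpha_y = W @ alpha_x                 # mapping between the two coefficient spaces
    return D_y @ alpha_y                  # synthesised target-style signal

# Toy example with random stand-ins for the trained quantities.
rng = np.random.default_rng(0)
D_x = rng.standard_normal((64, 128)); D_x /= np.linalg.norm(D_x, axis=0)
D_y = rng.standard_normal((64, 128)); D_y /= np.linalg.norm(D_y, axis=0)
W = np.eye(128)                           # placeholder mapping (identity)
x = rng.standard_normal(64)
print(scdl_synthesis(x, D_x, D_y, W).shape)
```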


In future work, they plan to adapt SCDL to more types of cross-modality synthesis tasks and extend it to cross-style image recognition tasks.

SCDL Results

Li et al. [7]: Sparse Local-Pixel Structure


The difference between ScSR and the method of Li et al. [7] is that ScSR represents image patches as a sparse linear combination of elements from an appropriately chosen overcomplete dictionary, whereas in Li et al.'s method a pixel is represented as a sparse linear combination of its neighboring pixels [7].
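A very simplified sketch of this "sparse local-pixel structure" idea (fitting one sparse neighbour-weight vector on a toy image; the actual method in [7] learns such structures from HR training faces and uses them as a prior during reconstruction):

```python
import numpy as np
from sklearn.linear_model import Lasso

# Express the centre pixel of each 3x3 patch as a sparse linear combination
# of its 8 neighbouring pixels, fitting one shared weight vector on a toy image.
x = np.linspace(0.0, 3.0, 32)
img = np.outer(np.sin(x), np.cos(x))      # smooth synthetic image

X, y = [], []
for i in range(1, 31):
    for j in range(1, 31):
        patch = img[i - 1:i + 2, j - 1:j + 2].ravel()
        X.append(np.delete(patch, 4))     # the 8 neighbours
        y.append(patch[4])                # the centre pixel
X, y = np.array(X), np.array(y)

lasso = Lasso(alpha=0.001, fit_intercept=False, max_iter=10000)
lasso.fit(X, y)
print("non-zero neighbour weights:", np.count_nonzero(lasso.coef_), "out of 8")
```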

Single Image Super-Resolution – A Quantitative Comparison
http://www.ijert.org/view-pdf/13246/single-image-super-resolution—a-quantitative-comparison

From their quantitative comparison, Lisha P P and Jayasree V K conclude that the learning-based algorithm using a sparse dictionary performs best.

References:

[7] Yongchao Li, Cheng Cai, Guoping Qiu and Kin-Man Lam. "Face hallucination based on sparse local-pixel structure." Pattern Recognition, Vol. 47, No. 3, March 2014, pp. 1261–1270.

[8] Jianchao Yang, Hao Tang, Yi Ma and T. Huang. "Face hallucination via sparse coding." 15th IEEE International Conference on Image Processing (ICIP 2008), 12–15 Oct. 2008, pp. 1264–1267.

[9] Shenlong Wang, D. Zhang, Yan Liang and Quan Pan. "Semi-coupled dictionary learning with applications to image super-resolution and photo-sketch synthesis." IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2012), 16–21 June 2012, pp. 2216–2223.

[10] Michael Elad. Sparse and Redundant Representations: From Theory to Applications in Signal and Image Processing. Springer, New York, 2010.

[11] Guillermo Sapiro (Duke University). Image and Video Processing: From Mars to Hollywood with a Stop at the Hospital. Coursera, Lectures 64–68.
