Machine Vision, Image Recognition and Machine Learning

[Multi-view Learning for Image Understanding]

Paper:

Lei Zhang* and David Zhang, Visual Understanding via Multi-Feature Shared Learning With Global Consistency, IEEE Transactions on Multimedia (T-MM), vol. 18, no. 2, pp. 247-259, 2016. [paper]

Abstract:

Image/video data is usually represented with multiple visual features, and the value of fusing such multi-source information for establishing attributes has been widely recognized. Multi-feature visual recognition has therefore received much attention in multimedia applications. This paper studies visual understanding via a newly proposed ℓ2-norm-based multi-feature shared learning framework, which simultaneously learns a global label matrix and multiple sub-classifiers from the labeled multi-feature data. Additionally, a group graph manifold regularizer composed of Laplacian and Hessian graphs is proposed. It better preserves the manifold structure of each feature, so that the label prediction power is much improved through semi-supervised learning with global label consistency. For convenience, we call the proposed approach the global-label-consistent classifier (GLCC). The merits of the proposed method are as follows: 1) the manifold structure information of each feature is exploited in learning, resulting in more faithful classification owing to the global label consistency; 2) a group graph manifold regularizer based on Laplacian and Hessian regularization is constructed; and 3) an efficient alternating optimization method is introduced as a fast solver, whose speed owes to the convexity of its sub-problems. Experiments for multimedia understanding are conducted on several benchmark visual datasets: the 17-category Oxford Flower dataset, the challenging 101-category Caltech dataset, the YouTube & Consumer Videos dataset, and the large-scale NUS-WIDE dataset. The results demonstrate that the proposed approach compares favorably with state-of-the-art algorithms. An extensive experiment using deep convolutional activation features also shows the effectiveness of the proposed approach.
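
To make the shared-learning idea concrete, the following is a minimal sketch of how such an objective could be written; the notation (a global label matrix F, per-feature data X_v with sub-classifier parameters W_v and b_v, Laplacian and Hessian graphs L_v and H_v, and trade-off weights α_v, λ, γ) is illustrative and does not reproduce the paper's exact formulation:

    \min_{F,\,\{W_v, b_v\}} \;
        \sum_{v=1}^{V} \left\| X_v W_v + \mathbf{1} b_v^{\top} - F \right\|_F^2
      + \lambda \sum_{v=1}^{V} \operatorname{tr}\!\left( F^{\top} \big( \alpha_v L_v + (1-\alpha_v) H_v \big) F \right)
      + \gamma \sum_{v=1}^{V} \left\| W_v \right\|_F^2
    \quad \text{s.t.} \quad F_l = Y_l .

Here the first term ties every per-feature sub-classifier to the shared label matrix F, the second term is a group graph manifold regularizer mixing the Laplacian and Hessian graphs of each feature, and the constraint F_l = Y_l enforces global label consistency on the labeled samples. With F fixed, each (W_v, b_v) update is a convex regularized least-squares sub-problem, and with the classifiers fixed the update of F is convex as well, which is what makes an alternating solver fast.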

Approach:

Overview of the proposed framework. In the left part (the training phase), the proposed algorithm performs multi-feature shared learning over the multiple visual features extracted from the training images. In the right part (the testing phase), a joint decision function, parameterized by the learned sub-classifiers, is computed on the visual features extracted from the test image.
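
As a rough illustration of the testing phase only, the snippet below (a minimal sketch, not the authors' code) fuses the class scores of the learned per-feature sub-classifiers into one joint decision; the weighted-sum fusion and the uniform default weights are assumptions made for illustration.

    import numpy as np

    def joint_decision(features, classifiers, weights=None):
        """Fuse per-feature sub-classifier scores into a single joint prediction.

        features    : list of 1-D arrays, one extracted visual feature per view
        classifiers : list of (W_v, b_v) pairs learned in the training phase
        weights     : optional per-view fusion weights (uniform if omitted)
        """
        if weights is None:
            weights = np.full(len(features), 1.0 / len(features))
        # Accumulate the weighted class scores contributed by each view.
        scores = sum(w * (x @ W + b) for w, x, (W, b) in zip(weights, features, classifiers))
        # The joint decision is the class with the highest fused score.
        return int(np.argmax(scores))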

Experiments:

Experiments for multimedia understanding are conducted on the 17-category Oxford Flower dataset, the Caltech 101 dataset, the YouTube & Consumer Videos dataset, and the large-scale NUS-WIDE dataset. Additionally, we have conducted an extensive experiment on convolutional neural network (CNN) based deep features for object recognition.
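
For the deep-feature experiment, activations from a pretrained CNN serve as one more visual feature. The sketch below shows one common way to obtain such features; it uses a torchvision ResNet-18 purely as a stand-in, since the specific network used in the paper's experiment is not spelled out here.

    import torch
    from torchvision import models, transforms
    from PIL import Image

    # Pretrained CNN with its classification head removed, so that the
    # penultimate-layer activations serve as a deep visual feature.
    cnn = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
    cnn.fc = torch.nn.Identity()
    cnn.eval()

    preprocess = transforms.Compose([
        transforms.Resize(256),
        transforms.CenterCrop(224),
        transforms.ToTensor(),
        transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
    ])

    def deep_feature(image_path):
        """Return the CNN activation feature of one image as a 1-D numpy array."""
        img = preprocess(Image.open(image_path).convert("RGB")).unsqueeze(0)
        with torch.no_grad():
            return cnn(img).squeeze(0).numpy()

The resulting vector can then be treated as an additional view alongside hand-crafted features when training the multi-feature classifier.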
