Mining the Displacement of Max-pooling in Convolutional Neural Networks

Source: yl23411永利 | Published: 2022-05-14

Title: Mining the Displacement of Max-pooling in Convolutional Neural Networks

Speaker: Dr. Yuchen Zheng (郑煜辰)

Time: May 18, 2022, 16:00-17:30

Venue: Room 绿2-303

Tencent Meeting ID: 438-2370-8414

About the Speaker

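Yuchen Zheng is an associate professor and master's supervisor. His main research interests include artificial intelligence, pattern recognition and machine learning, document analysis and recognition, handwritten signature verification, medical image processing, and data mining. He received his bachelor's degree in June 2014 and his master's degree in June 2017 from the College of Computer Science and Technology, Faculty of Information Science and Engineering, Ocean University of China. In September 2020 he received his Ph.D. in information and intelligence engineering from Kyushu University, a national university in Japan (ranked 137th in the QS World University Rankings 2022 and 4th in Japan in the Times Higher Education World University Rankings 2021).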

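He has long worked on research in artificial intelligence. Specific applications include a handwritten signature verification system based on capturing "micro deformations" and a medical image analysis system based on deep learning-to-rank algorithms, used in research on cancer detection and eye disease detection. He maintains long-standing, extensive exchanges and collaborations with universities and research institutions in China and abroad, including the German Research Center for Artificial Intelligence, Saitama Institute of Technology (Japan), the National University of Sciences and Technology (Pakistan), and Peking University.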

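He has published more than 20 papers in leading journals and conferences in China and abroad. He has served as a program committee member of the International Conference on Document Analysis and Recognition (ICDAR 2021 PC Member), the International Conference on Artificial Neural Networks (ICANN 2022 PC Member), and the international conference on image and signal processing (ISPR 2021 PC Member), and is a member of the Chinese Association of Automation and the China Computer Federation. He reviews for international journals including Pattern Recognition, Multimedia Systems, IEEE TCSVT, IEEE TNNLS, IJDAR, IET Image Processing, and IJCAS, and for international conferences including ICANN, ICDAR, IJCNN, ICFHR, CVPR, and IJCAI.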

Abstract

The max-pooling operation is a common step in modern deep convolutional neural networks (CNNs), often introduced to obtain translation-invariant representations and to downsample the feature maps of convolutional layers. In doing so, however, it loses some spatial information. In this thesis, we extract a novel feature from the max-pooling operation in CNNs, called the displacement feature. Displacement features record the location coordinates of the maximum values within the pooling windows of the max-pooling operation. We then analyze the class-wise trends and behavior of the displacement features in several ways. To verify their effectiveness, we apply them to two classical tasks: text recognition and offline signature verification.

For text recognition, we extract the displacement features from the max-pooling layer and combine them with the features resulting from max-pooling to capture the micro differences between similar classes. Extensive experimental results and discussions on three text datasets (MNIST, HASY, and Chars74K-font) demonstrate that the proposed displacement features can improve the performance of CNN-based architectures and address the issues with micro deformations in max-pooling for text recognition tasks.

For offline signature verification, we extract the displacement features of the maxima in the max-pooling operation and fuse them with the pooling features, as a feature extraction procedure, to capture the micro deformations between genuine signatures and skilled forgeries. Extensive experimental results and analysis on the GPDS-150, GPDS-300, GPDS-1000, GPDS-2000, and GPDS-5000 datasets demonstrate that the proposed method discriminates well between genuine signatures and their corresponding skilled forgeries and achieves state-of-the-art performance on these datasets.
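As a rough illustration of what "recording the location coordinates of the maximum values in pooling windows" could look like, here is a minimal pure-Python sketch. It is not the speaker's implementation. The function names `max_pool_with_displacement` and `fuse`, the (row, col) offset encoding, the non-overlapping 2x2 windows, and the fusion by simple concatenation are all assumptions made for this example; the abstract does not specify any of them.

```python
def max_pool_with_displacement(feature_map, size=2):
    """Non-overlapping max pooling (stride == size) over a 2-D list.

    Returns (pooled, displacement), where displacement[i][j] is the
    (row, col) offset of the maximum inside pooling window (i, j).
    The offset encoding is an assumption made for illustration.
    """
    rows, cols = len(feature_map), len(feature_map[0])
    pooled, displacement = [], []
    for top in range(0, rows - size + 1, size):
        pooled_row, disp_row = [], []
        for left in range(0, cols - size + 1, size):
            best_val, best_off = feature_map[top][left], (0, 0)
            for dr in range(size):
                for dc in range(size):
                    val = feature_map[top + dr][left + dc]
                    if val > best_val:  # ties keep the first row-major position
                        best_val, best_off = val, (dr, dc)
            pooled_row.append(best_val)
            disp_row.append(best_off)
        pooled.append(pooled_row)
        displacement.append(disp_row)
    return pooled, displacement


def fuse(pooled, displacement):
    """Hypothetical fusion: pooled values followed by flattened offsets."""
    fused = [v for row in pooled for v in row]
    fused += [c for row in displacement for off in row for c in off]
    return fused


feature_map = [
    [1, 3, 2, 0],
    [4, 2, 1, 5],
    [0, 1, 7, 2],
    [6, 2, 3, 1],
]
pooled, displacement = max_pool_with_displacement(feature_map)
fused = fuse(pooled, displacement)
print(pooled)        # [[4, 5], [6, 7]]
print(displacement)  # [[(1, 0), (1, 1)], [(1, 0), (0, 0)]]
```

Two inputs whose maxima shift slightly inside their windows would yield identical `pooled` values but different `displacement` values, which is the kind of spatial information the abstract says ordinary max-pooling discards.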