2020-02-12
The 34th AAAI Conference on Artificial Intelligence (AAAI-20) is now underway in New York. As one of the world’s leading conferences in artificial intelligence, AAAI-20 received over 8,800 submissions, of which 7,737 were reviewed and 1,591 were accepted, an acceptance rate of 20.6 percent.
This year, Baidu had a record 28 research papers accepted, covering a wide range of topics from natural language processing and machine learning to computer vision and more. Although a number of our authors are unable to attend the conference because of the recent coronavirus travel ban, we encourage attendees to stop by our booth and chat with our experts about our latest research projects and career opportunities.
In this blog post, we spotlight three of our accepted research papers in more detail.
Pre-trained language model
Unsupervised pre-trained language models have made significant progress on various natural language processing tasks. However, existing models are typically trained on the co-occurrence of words and sentences, so they capture only part of the valuable information available in the training corpora.
In the paper ERNIE 2.0: A Continual Pre-training Framework for Language Understanding, our researchers proposed a continual pre-training framework named ERNIE 2.0, which incrementally builds pre-training tasks and learns them through continual multi-task learning. In this framework, models can learn different aspects of the knowledge in training corpora, including named entities, semantic closeness, and discourse relations.
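To make the idea of incrementally adding tasks concrete, here is a minimal sketch of a continual multi-task schedule. It is an illustration only, not the ERNIE 2.0 implementation: the function name, the task identifiers, and the equal split of training steps across tasks are all assumptions made for this example.

```python
from typing import Dict, List


def continual_multitask_schedule(
    tasks: List[str], steps_per_stage: int
) -> List[Dict[str, int]]:
    """Build one training stage per task.

    Each stage introduces one new task and keeps training every task
    seen so far, so earlier knowledge continues to be rehearsed.
    The equal split of steps is an assumption of this sketch.
    """
    schedule: List[Dict[str, int]] = []
    seen: List[str] = []
    for task in tasks:
        seen.append(task)
        base, extra = divmod(steps_per_stage, len(seen))
        # Spread the step budget over all tasks seen so far;
        # the first `extra` tasks absorb any remainder.
        stage = {
            name: base + (1 if i < extra else 0)
            for i, name in enumerate(seen)
        }
        schedule.append(stage)
    return schedule


# Hypothetical task identifiers, loosely named after the kinds of
# knowledge mentioned above.
tasks = ["named_entity", "semantic_closeness", "discourse_relation"]
schedule = continual_multitask_schedule(tasks, steps_per_stage=600)
for stage_index, stage in enumerate(schedule, start=1):
    print(stage_index, stage)
```

In this sketch the first stage trains only the first task, and each later stage divides the same step budget among all tasks introduced so far. A real system would likely weight tasks unevenly.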
Experimental results show that ERNIE 2.0 outperforms BERT and XLNet on 16 tasks, including English tasks from the GLUE benchmark and several common Chinese tasks. Last year, the ERNIE model achieved new state-of-the-art performance on GLUE and became the world's first model to score over 90 in terms of the macro-average score (90.1), surpassing the human baseline by 3 points. Today, ERNIE is widely applied in real-world scenarios, where it improves language understanding capabilities.
The paper has been accepted for oral presentation, and the source code and pre-trained models have been released on GitHub.
Machine reading comprehension
Adversarial training has proved to be an effective method for training robust machine reading comprehension models. However, existing approaches that build adversarial samples from manually designed rules cannot cover all possible kinds of adversarial samples.
In the paper A Robust Adversarial Training Approach to Machine Reading Comprehension, our researchers presented a model-driven approach that automatically discovers previously undetected kinds of adversarial samples and uses them to improve the robustness of machine reading comprehension models.
Specifically, the researchers used an adversarial method to generate a perturbation vector for the input of each training sample, aiming to mislead the reading comprehension model. They then sampled according to the lexical weights of the perturbation vectors to extract the corresponding discretized perturbation texts, built adversarial samples from those texts, and used the samples to train the reading comprehension model. These steps are repeated until the model converges.
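The discretization step can be illustrated with a small sketch. It assumes, purely for illustration, that the perturbation vector assigns one weight to each vocabulary word, that words are sampled with softmax probabilities, and that the sampled text is appended to the passage. The paper's actual procedure may differ, and the gradient step that produces the weights is omitted. All names and values below are hypothetical.

```python
import math
import random
from typing import List, Sequence


def softmax(weights: Sequence[float]) -> List[float]:
    """Turn raw lexical weights into sampling probabilities."""
    peak = max(weights)  # subtract the max for numerical stability
    exps = [math.exp(w - peak) for w in weights]
    total = sum(exps)
    return [e / total for e in exps]


def sample_perturbation_text(
    vocab: Sequence[str],
    weights: Sequence[float],
    length: int,
    rng: random.Random,
) -> List[str]:
    """Sample `length` words, favoring words with higher weights."""
    return rng.choices(list(vocab), weights=softmax(weights), k=length)


def make_adversarial_sample(
    passage_tokens: Sequence[str], perturbation_tokens: Sequence[str]
) -> List[str]:
    """Assumption of this sketch: the perturbation text is appended."""
    return list(passage_tokens) + list(perturbation_tokens)


# Hypothetical vocabulary and lexical weights for one training sample.
vocab = ["the", "river", "was", "built", "in", "1890"]
weights = [0.1, 2.0, 0.3, 1.5, 0.2, 2.5]
rng = random.Random(0)  # seeded so the sketch is repeatable

perturbation = sample_perturbation_text(vocab, weights, length=4, rng=rng)
passage = ["the", "bridge", "opened", "in", "1932"]
adversarial = make_adversarial_sample(passage, perturbation)
print(adversarial)
```

In a full training loop, the weights would be recomputed against the current model each round, and the resulting samples would be added to the training data until the model converges.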
The results showed that the proposed adversarial training technique significantly improved performance on different adversarial datasets and generated diverse adversarial samples. The paper also notes that the method needs further improvement, as some of the generated adversarial samples were not natural language.
Computer vision
3D object detection is playing an increasingly critical role in autonomous driving, but stereo imagery-based 3D detection methods are still no match for lidar-based methods. In the paper ZoomNet: Part-Aware Adaptive Zooming Neural Network for 3D Object Detection, our researchers proposed adaptive zooming, a technique by which distant cars are analyzed on a larger scale to achieve more accurate depth estimation.
The resulting architecture, named ZoomNet, surpassed existing state-of-the-art methods by significant margins on the popular KITTI 3D detection benchmark. More importantly, ZoomNet is the first stereo imagery-based solution to reach performance comparable to current lidar-based methods at a relatively lower threshold.
More specifically, ZoomNet performs a fine-grained analysis of 2D instances from the left and right bounding boxes. The foreground pixels in 2D are then projected into 3D space for pose regression. With adaptive zooming, ZoomNet resizes each 2D instance bounding box to a uniform resolution and adjusts the camera’s intrinsic parameters accordingly. As a result, ZoomNet can obtain higher-quality disparity maps from the resized images and construct point clouds of similar density for instances at different depths. In addition, the researchers introduced part locations as a generalized version of key-points to better localize cars and to improve resistance to occlusion.
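The relationship between resizing a crop and adjusting the camera intrinsics follows from the standard pinhole camera model, and can be sketched as follows. This is not ZoomNet's code: the class, the function names, and the numeric values are hypothetical, half-pixel conventions are ignored, and the left and right crops are assumed to share the same horizontal offset so that disparity simply scales with the zoom factor.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Intrinsics:
    fx: float  # focal length in pixels (x)
    fy: float  # focal length in pixels (y)
    cx: float  # principal point (x)
    cy: float  # principal point (y)


def zoom_intrinsics(
    k: Intrinsics,
    box: Tuple[float, float, float, float],
    target_size: Tuple[float, float],
) -> Intrinsics:
    """Intrinsics of a crop `box` = (x0, y0, width, height) after it is
    resized to `target_size` = (width, height)."""
    x0, y0, w, h = box
    tw, th = target_size
    sx, sy = tw / w, th / h
    # Cropping shifts the principal point; resizing scales everything.
    return Intrinsics(k.fx * sx, k.fy * sy, (k.cx - x0) * sx, (k.cy - y0) * sy)


def depth_from_disparity(fx: float, baseline: float, disparity: float) -> float:
    """Standard stereo relation: depth = fx * baseline / disparity."""
    return fx * baseline / disparity


# Hypothetical camera and a small box around a distant car.
camera = Intrinsics(fx=720.0, fy=720.0, cx=620.0, cy=187.0)
box = (600.0, 170.0, 64.0, 32.0)
zoomed = zoom_intrinsics(camera, box, target_size=(256.0, 128.0))

baseline = 0.54        # meters, hypothetical
disparity = 8.0        # pixels in the original image
zoom = 256.0 / 64.0    # horizontal zoom factor
depth_original = depth_from_disparity(camera.fx, baseline, disparity)
depth_zoomed = depth_from_disparity(zoomed.fx, baseline, disparity * zoom)
print(zoomed, depth_original, depth_zoomed)
```

Because both the focal length and the disparity scale by the same factor, the recovered depth is unchanged, while the zoomed crop gives the network more pixels to work with for a distant instance.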
Our researchers also presented the KITTI Fine-Grained car (KFG) dataset, which extends KITTI with instance-wise 3D CAD models and pixel-wise fine-grained annotations. Both the KFG dataset and our code will be made publicly available soon.
Accepted papers
Generative Adversarial Regularized Mutual Information Policy Gradient Framework for Automatic Diagnosis
Yuan Xia, Jingbo Zhou, Zhenhui Shi, Chao Lu, Haifeng Huang
Capturing Sentence Relations for Answer Sentence Selection with Multi-Perspective Graph Encoder
Zhixing Tian, Yuanzhe Zhang, Xinwei Feng, Wenbin Jiang, Yajuan Lyu, Kang Liu, Jun Zhao
Distributed Primal-Dual Optimization for Online Multi-task Learning
Peng Yang, Ping Li
Meta-CoTGAN: A Meta Cooperative Training Paradigm for Improving Adversarial Text Generation
Haiyan Yin, Dingcheng Li, Xu Li, Ping Li
IVFS: Simple and Efficient Feature Selection for High Dimensional Topology Preservation
Xiaoyun Li, Chenxi Wu, Ping Li
ERNIE 2.0: A Continual Pre-training Framework for Language Understanding
Yu Sun, Shuohuan Wang, Yukun Li, Shikun Feng, Hao Tian, Hua Wu, Haifeng Wang
Knowledge Graph Grounded Goal Planning for Open-Domain Conversation Generation
Jun Xu, Haifeng Wang, Zheng-Yu Niu, Hua Wu, Wanxiang Che
Synchronous Speech Recognition and Speech-to-Text Translation with Interactive Decoding
Yuchen Liu, Jiajun Zhang, Hao Xiong, Long Zhou, Zhongjun He, Hua Wu, Haifeng Wang, Chengqing Zong
A Robust Adversarial Training Approach to Machine Reading Comprehension
Kai Liu, Xin Liu, An Yang, Jin Liu, Jinsong Su, Sujian Li, Qiaoqiao She
Multi-Label Classification with Label Graph Superimposing
Ya Wang, Dongliang He, Fu Li, Xiang Long, Zhichao Zhou, Jinwen Ma, Shilei Wen
ZoomNet: Part-Aware Adaptive Zooming Neural Network for 3D Object Detection
Zhenbo Xu, Wei Zhang, Xiaoqing Ye, Xiao Tan, Wei Yang, Shilei Wen, Errui Ding, Ajin Meng, Liusheng Huang
Cross-Modality Attention with Semantic Graph Embedding for Multi-Label Classification
Renchun You, Zhiyao Guo, Lei Cui, Xiang Long, Yingze Bao, Shilei Wen
Dynamic Instance Normalization for Arbitrary Style Transfer
Yongcheng Jing, Xiao Liu, Yukang Ding, Xinchao Wang, Errui Ding, Mingli Song, Shilei Wen
SetRank: A Setwise Bayesian Approach for Collaborative Ranking from Implicit Feedback
Chao Wang, Hengshu Zhu, Chen Zhu, Chuan Qin, Hui Xiong
Relational Graph Neural Network with Hierarchical Attention for Knowledge Graph Completion
Zhao Zhang, Fuzhen Zhuang, Hengshu Zhu, Zhiping Shi, Hui Xiong, Qing He
Why We Go Where We Go: Profiling User Decisions on Choosing POIs
Renjun Hu, Xinjiang Lu, Chuanren Liu, Yanyan Li, Hao Liu, Shuai Ma, Hui Xiong
Semi-Supervised Hierarchical Recurrent Graph Neural Network for City-Wide Parking Availability Prediction
Weijia Zhang, Hao Liu, Yanchi Liu, Jingbo Zhou, Hui Xiong
Learning Conceptual-Contextual Embeddings for Medical Text
Xiao Zhang, Dejing Dou, Ji Wu
Ultrafast Photorealistic Style Transfer via Neural Architecture Search
Jie An*, Haoyi Xiong*, Jun Huan, Jiebo Luo
Person Tube Retrieval via Language Description
Hehe Fan, Yi Yang
Context Modulated Dynamic Networks for Actor and Action Video Segmentation From a Sentence
Hao Wang, Cheng Deng, Fan Ma, Yi Yang
Symbiotic Attention with Privileged Information for Egocentric Action Recognition
Xiaohan Wang, Yu Wu, Linchao Zhu, Yi Yang
Adversarial Localized Energy Network for Structured Prediction
Pingbo Pan, Ping Liu, Yan Yan, Tianbao Yang, Yi Yang
EEMEFN: Low-Light Image Enhancement via Edge-Enhanced Multi-Exposure Fusion Network
Minfeng Zhu, Pingbo Pan, Wei Chen, Yi Yang
Relation-Aware Pedestrian Attribute Recognition with Graph Convolutional Networks
Zichang Tan, Yang Yang, Jun Wan, Guodong Guo, Stan Z. Li
GBCNs: Genetic Binary Convolutional Networks for Enhancing the Performance of 1-bit DCNNs
Chunlei Liu, Wenrui Ding, Yuan Hu, Baochang Zhang, Jianzhuang Liu, Guodong Guo
AutoRemover: Automatic Object Removal for Autonomous Driving Videos
Rong Zhang, Wei Li, Peng Wang, Chenye Guan, Jin Fang, Yuhang Song, Jinhui Yu, Baoquan Chen, Weiwei Xu, Ruigang Yang
CSPN++: Learning Context and Resource Aware Convolutional Spatial Propagation Networks for Depth Completion
Xinjing Cheng, Peng Wang, Chenye Guan, Ruigang Yang