Deep Learning

Finding Task-Relevant Features for Few-Shot Learning by Category Traversal – CVPR 2019
Edge-labeling Graph Neural Network for Few-shot Learning – CVPR 2019
Generating Classification Weights with GNN Denoising Autoencoders for Few-Shot Learning – CVPR 2019
Kervolutional Neural Networks – CVPR 2019
Why ReLU Networks Yield High-Confidence Predictions Far Away From the Training Data and How to Mitigate the Problem – CVPR 2019
Neural Rejuvenation: Improving Deep Network Training by Enhancing Computational Resource Utilization – CVPR 2019
Hardness-Aware Deep Metric Learning – CVPR 2019
Auto-DeepLab: Hierarchical Neural Architecture Search for Semantic Image Segmentation – CVPR 2019
Learning Loss for Active Learning – CVPR 2019
Striking the Right Balance with Uncertainty – CVPR 2019
AutoAugment: Learning Augmentation Policies from Data – CVPR 2019
Parsing R-CNN for Instance-Level Human Analysis
Large Scale Incremental Learning
TopNet: Structural Point Cloud Decoder
Perceive Where to Focus: Learning Visibility-aware Part-level Features for Partial Person Re-identification
Meta-Transfer Learning for Few-Shot Learning
Structured Binary Neural Networks for Accurate Image Classification and Semantic Segmentation
Deep RNN Framework for Visual Sequential Applications
Graph-Based Global Reasoning Networks
SSN: Learning Sparse Switchable Normalization via SparsestMax
Spherical Fractal Convolutional Neural Networks for Point Cloud Recognition

Finding Task-Relevant Features for Few-Shot Learning by Category Traversal – CVPR 2019

Introduced a Category Traversal Module that can be inserted as a plug-and-play component into most metric-learning based few-shot learners. The module traverses the entire support set at once, identifying task-relevant features based on both intra-class commonality and inter-class uniqueness in the feature space. Incorporating the module improves performance considerably (5%-10% relative) over baseline systems.

Edge-labeling Graph Neural Network for Few-shot Learning – CVPR 2019

Proposed a novel edge-labeling graph neural network (EGNN), which adapts a deep neural network on the edge-labeling graph, for few-shot learning. On both supervised and semi-supervised few-shot image classification tasks across two benchmark datasets, the proposed EGNN significantly improves performance over existing GNNs.

Generating Classification Weights with GNN Denoising Autoencoders for Few-Shot Learning – CVPR 2019

Proposed the use of a Denoising Autoencoder network (DAE) that, during training, takes as input a set of classification weights corrupted with Gaussian noise and learns to reconstruct the target-discriminative classification weights. To capture the co-dependencies between different classes in a given task instance of the meta-model, the DAE is implemented as a Graph Neural Network (GNN).

Kervolutional Neural Networks – CVPR 2019

Existing works mainly rely on activation layers, which can only provide point-wise non-linearity. To address this limitation, a new operation, kervolution (kernel convolution), is introduced to approximate complex behaviors of human perception systems using the kernel trick. It generalizes convolution, enhances model capacity, and captures higher-order interactions of features via patch-wise kernel functions, without introducing additional parameters. Extensive experiments show that kervolutional neural networks (KNN) achieve higher accuracy and faster convergence than baseline CNNs.
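To make the idea of a patch-wise kernel function concrete, here is a minimal 1D sketch in plain Python, assuming a polynomial kernel of the form (x·w + offset)^degree. The function names and the `degree` and `offset` parameters are illustrative choices, not the authors' code. With `degree=1` and `offset=0` the operation reduces to an ordinary sliding inner product, and the number of learnable weights is the same in both cases.

```python
def kervolution_1d(signal, weights, degree=2, offset=1.0):
    """Slide `weights` over `signal` and apply a polynomial kernel per patch.

    Assumed kernel: (patch . weights + offset) ** degree.
    """
    k = len(weights)
    out = []
    for i in range(len(signal) - k + 1):
        patch = signal[i:i + k]
        inner = sum(p * w for p, w in zip(patch, weights))
        out.append((inner + offset) ** degree)
    return out


def convolution_1d(signal, weights):
    """Ordinary sliding inner product: the degree-1, zero-offset special case."""
    return kervolution_1d(signal, weights, degree=1, offset=0.0)


signal = [1.0, 2.0, 3.0, 4.0]
weights = [1.0, 0.0, -1.0]
linear_response = convolution_1d(signal, weights)
kernel_response = kervolution_1d(signal, weights, degree=2, offset=1.0)
```

The same three weights produce both responses; only the patch-wise function applied to the inner product changes.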

Why ReLU Networks Yield High-Confidence Predictions Far Away From the Training Data and How to Mitigate the Problem – CVPR 2019

A classifier should make low-confidence predictions far away from the training data. Showed that ReLU-type neural networks, which yield a piecewise linear classifier function, fail in this regard: they almost always produce high-confidence predictions far away from the training data. For bounded domains such as images, proposed a new robust optimization technique, similar to adversarial training, that enforces low-confidence predictions far away from the training data.
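The effect can be illustrated with a toy example. The sketch below uses a tiny two-class ReLU network with hand-picked, hypothetical weights (not taken from the paper) and compares the maximum softmax probability at a point with the same point scaled far from the origin. It only illustrates the phenomenon; it is not the paper's analysis or its mitigation.

```python
import math


def relu_net_logits(x):
    """Tiny two-class ReLU network with hand-picked (hypothetical) weights."""
    hidden = [max(0.0, x[0]), max(0.0, x[1])]  # identity weights followed by ReLU
    return [hidden[0] - hidden[1], hidden[1] - hidden[0]]  # linear output layer


def confidence(logits):
    """Maximum softmax probability, computed in a numerically stable way."""
    top = max(logits)
    exps = [math.exp(v - top) for v in logits]
    return max(exps) / sum(exps)


def scaled(x, factor):
    """Move a point away from the origin along its own direction."""
    return [factor * v for v in x]


near = [1.0, 0.5]
conf_near = confidence(relu_net_logits(near))
conf_far = confidence(relu_net_logits(scaled(near, 100.0)))
```

Because the network is linear along the ray, the logit gap grows with the scaling factor, so `conf_far` approaches 1 even though the scaled point resembles nothing in any training set.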

Neural Rejuvenation: Improving Deep Network Training by Enhancing Computational Resource Utilization – CVPR 2019

Our method detects dead neurons and computes resource utilization in real time, rejuvenates dead neurons by resource reallocation and reinitialization, and trains them with new training schemes. By simply replacing standard optimizers with Neural Rejuvenation, we are able to improve the performance of neural networks by a very large margin while using a similar training effort and maintaining the original resource usage.

Hardness-Aware Deep Metric Learning – CVPR 2019

Most previous deep metric learning methods utilize only a subset of the training data, which may not be enough to characterize the global geometry of the embedding space comprehensively. To address this problem, we perform linear interpolation on embeddings to adaptively manipulate their hardness levels and generate corresponding label-preserving synthetics for recycled training, so that the information buried in all samples can be fully exploited and the metric is always challenged with proper difficulty.
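As a rough illustration of the interpolation step, the sketch below moves a negative embedding toward the anchor along the line connecting them, which makes it a harder negative. The function names and the fixed `ratio` parameter are assumptions made for this example; the paper sets the interpolation amount adaptively, and that schedule is not reproduced here.

```python
import math


def distance(a, b):
    """Euclidean distance between two embeddings."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def harden_negative(anchor, negative, ratio):
    """Linearly interpolate the negative toward the anchor.

    `ratio` in (0, 1]: 1.0 leaves the negative unchanged, smaller values
    pull it closer to the anchor (hypothetical, fixed here for simplicity).
    """
    return [a + ratio * (n - a) for a, n in zip(anchor, negative)]


anchor = [0.0, 0.0]
negative = [4.0, 0.0]
harder = harden_negative(anchor, negative, ratio=0.5)
```

In this example the synthetic negative sits at half the original distance from the anchor, so a margin-based loss would find it harder to separate.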

Auto-DeepLab: Hierarchical Neural Architecture Search for Semantic Image Segmentation – CVPR 2019

Neural Architecture Search (NAS) has successfully identified neural network architectures that exceed human-designed ones on large-scale image classification. We propose to search the network-level structure in addition to the cell-level structure, which forms a hierarchical architecture search space. We present a network-level search space that includes many popular designs, and develop a formulation that allows efficient gradient-based architecture search.

Learning Loss for Active Learning – CVPR 2019

Proposed a novel active learning method that is simple yet task-agnostic and works efficiently with deep networks. We attach a small parametric module, named the “loss prediction module,” to a target network and train it to predict the target losses of unlabeled inputs.
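One way the predicted losses can drive an active learning round is sketched below: rank the unlabeled inputs by predicted loss and send the highest-ranked ones for annotation. The function name, the `budget` parameter, and the top-k selection rule are assumptions for illustration, and the loss prediction module itself is not modeled.

```python
def select_for_labeling(predicted_losses, budget):
    """Return indices of the `budget` unlabeled inputs with the largest predicted loss."""
    ranked = sorted(
        range(len(predicted_losses)),
        key=lambda i: predicted_losses[i],
        reverse=True,
    )
    return ranked[:budget]


# Hypothetical predicted losses for four unlabeled inputs.
predicted = [0.1, 0.9, 0.4, 0.7]
chosen = select_for_labeling(predicted, budget=2)
```

Since the selection depends only on a scalar per input, the same routine applies regardless of the task the target network solves.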

Striking the Right Balance with Uncertainty – CVPR 2019

Rare classes tend to get a concentrated representation in the classification space which hampers the generalization of learned boundaries to new test examples. Demonstrated that the Bayesian uncertainty estimates directly correlate with the rarity of classes and the difficulty level of individual samples. Presented a novel framework for uncertainty based class imbalance learning that follows two key insights: First, classification boundaries should be extended further away from a more uncertain (rare) class to avoid overfitting and enhance its generalization. Second, each sample should be modeled as a multi-variate Gaussian distribution with a mean vector and a covariance matrix defined by the sample’s uncertainty. The learned boundaries should respect not only the individual samples but also their distribution in the feature space.

AutoAugment: Learning Augmentation Policies from Data – CVPR 2019

Described a simple procedure called AutoAugment to automatically search for improved data augmentation policies. In our implementation, we have designed a search space where a policy consists of many sub-policies, one of which is randomly chosen for each image in each mini-batch. A sub-policy consists of two operations, each operation being an image processing function such as translation, rotation, or shearing, and the probabilities and magnitudes with which the functions are applied. We use a search algorithm to find the best policy such that the neural network yields the highest validation accuracy on a target dataset.
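The policy structure described above can be sketched as plain data plus a small sampler. In this illustration a sub-policy is a list of two `(operation, probability, magnitude)` triples, and the two toy operations on nested-list images (`translate_x`, `rotate90`) are hypothetical stand-ins for real image processing functions. The search algorithm that finds a good policy is not shown.

```python
import random


def translate_x(image, magnitude):
    """Shift each row right by `magnitude` pixels, padding with zeros."""
    shift = int(magnitude)
    if shift <= 0:
        return [row[:] for row in image]
    return [[0] * min(shift, len(row)) + row[:max(len(row) - shift, 0)] for row in image]


def rotate90(image, magnitude):
    """Rotate clockwise by `magnitude` quarter turns."""
    for _ in range(int(magnitude) % 4):
        image = [list(col) for col in zip(*image[::-1])]
    return [row[:] for row in image]


OPS = {"translate_x": translate_x, "rotate90": rotate90}


def apply_sub_policy(image, sub_policy, rng):
    """Apply each operation of the sub-policy with its own probability."""
    for op_name, probability, magnitude in sub_policy:
        if rng.random() < probability:
            image = OPS[op_name](image, magnitude)
    return image


def apply_policy(image, policy, rng):
    """Pick one sub-policy at random for this image, then apply it."""
    return apply_sub_policy(image, rng.choice(policy), rng)


policy = [[("translate_x", 1.0, 1), ("rotate90", 1.0, 1)]]
augmented = apply_policy([[1, 2], [3, 4]], policy, random.Random(0))
```

Calling `apply_policy` once per image in a mini-batch gives each image its own randomly chosen sub-policy, which is where the diversity of the augmentation comes from.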

Parsing R-CNN for Instance-Level Human Analysis

Presented an end-to-end pipeline for instance-level human analysis, named Parsing R-CNN. It processes a set of human instances simultaneously by jointly considering the characteristics of the region-based approach and the appearance of a human, which allows it to represent the details of instances.

Large Scale Incremental Learning

Incremental learning methods have been proposed to retain the knowledge acquired from the old classes by using knowledge distillation and keeping a few exemplars from the old classes. However, these methods struggle to scale up to a large number of classes. Distinguishing between an increasing number of visually similar classes is particularly challenging when the training data is unbalanced. We found that the last fully connected layer has a strong bias towards the new classes, and that this bias can be corrected by a linear model. With two bias parameters, our method performs remarkably well on two large datasets.
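A two-parameter linear correction of this kind can be sketched as follows, assuming the correction is applied only to the logits of the new classes while old-class logits pass through unchanged. The names `alpha` and `beta`, and the example values, are illustrative; how the two parameters are fitted is not shown.

```python
def correct_bias(logits, new_class_indices, alpha, beta):
    """Apply alpha * logit + beta to new-class logits; keep old-class logits."""
    new = set(new_class_indices)
    return [alpha * o + beta if k in new else o for k, o in enumerate(logits)]


def predict(logits):
    """Index of the largest logit."""
    return max(range(len(logits)), key=lambda k: logits[k])


# Classes 0 and 1 are old; class 2 is new and its logit is inflated.
raw = [2.0, 1.0, 4.0]
corrected = correct_bias(raw, new_class_indices=[2], alpha=0.5, beta=-0.5)
```

In this toy case the raw logits favor the new class, while the corrected logits restore the old class as the prediction.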

TopNet: Structural Point Cloud Decoder

Presented a novel decoder that generates a structured point cloud without assuming any specific structure or topology on the underlying point set.

Perceive Where to Focus: Learning Visibility-aware Part-level Features for Partial Person Re-identification

Proposed a Visibility-aware Part Model (VPM), which learns to perceive the visibility of regions through self-supervision. The visibility awareness allows VPM to extract region-level features and compare two images with focus on their shared regions (which are visible on both images).

Meta-Transfer Learning for Few-Shot Learning

The key idea is to leverage a large number of similar few-shot tasks in order to learn how to adapt a base-learner to a new task for which only a few labeled samples are available. Proposed a novel few-shot learning method called meta-transfer learning (MTL), which learns to adapt a deep NN for few-shot learning tasks. Specifically, “meta” refers to training on multiple tasks, and “transfer” is achieved by learning scaling and shifting functions of DNN weights for each task. In addition, we introduce the hard task (HT) meta-batch scheme as an effective learning curriculum for MTL.
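The scaling and shifting idea can be sketched on a single linear layer: the pretrained weights and bias stay frozen, and only a per-output scale and shift are learned for each task. This is a minimal sketch under the assumption that the scale multiplies the weights and the shift is added to the bias; the function and parameter names are illustrative.

```python
def scaled_shifted_linear(x, weights, bias, scale, shift):
    """Linear layer with frozen `weights`/`bias` and task-specific `scale`/`shift`.

    Output unit j computes sum_i (scale[j] * weights[j][i] * x[i]) + (bias[j] + shift[j]).
    """
    out = []
    for w_row, b, s, t in zip(weights, bias, scale, shift):
        out.append(sum(s * w * xi for w, xi in zip(w_row, x)) + (b + t))
    return out


x = [1.0, 2.0]
frozen_weights = [[1.0, 0.0], [0.0, 1.0]]
frozen_bias = [0.0, 1.0]

# Scale 1 and shift 0 reproduce the pretrained layer exactly.
pretrained_out = scaled_shifted_linear(x, frozen_weights, frozen_bias, [1.0, 1.0], [0.0, 0.0])
# A task-specific scale and shift adapt the layer without touching the weights.
adapted_out = scaled_shifted_linear(x, frozen_weights, frozen_bias, [2.0, 0.5], [1.0, -1.0])
```

Only two numbers per output unit change between tasks, which keeps the number of adapted parameters small compared with fine-tuning every weight.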

Structured Binary Neural Networks for Accurate Image Classification and Semantic Segmentation

Proposed to train convolutional neural networks (CNNs) with both binarized weights and activations, leading to quantized models suited specifically to mobile devices with limited power capacity and computation resources. We proposed a network decomposition strategy in which the network is divided into groups. In this way, each full-precision group can be effectively reconstructed by aggregating a set of homogeneous binary branches. We learn effective connections among groups to improve the representational capability.

Deep RNN Framework for Visual Sequential Applications

Two novel designs underpin this deep RNN framework. The first is a new RNN module, the Context Bridge Module (CBM), which splits the information flowing along the sequence (temporal direction) from the information flowing along depth (spatial representation direction); balancing these two directions makes deep networks easier to train. The second is the Overlap Coherence Training Scheme, which reduces the training complexity of long visual sequential tasks under limited computing resources.

Graph-Based Global Reasoning Networks

Convolutional Neural Networks (CNNs) excel at modeling local relations by convolution operations, but they are typically inefficient at capturing global relations between distant regions and require stacking multiple convolution layers. Proposed a new approach for reasoning globally in which a set of features is globally aggregated over the coordinate space and then projected to an interaction space where relational reasoning can be efficiently computed. After reasoning, relation-aware features are distributed back to the original coordinate space for downstream tasks. We further present a highly efficient instantiation of the proposed approach and introduce the Global Reasoning unit (GloRe unit), which implements the coordinate-interaction space mapping by weighted global pooling and weighted broadcasting, and the relation reasoning via graph convolution on a small graph in interaction space.
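The three stages (pool into an interaction space, reason on a small graph, broadcast back) can be sketched with plain matrix products. The sketch assumes that broadcasting reuses the transpose of the pooling weights and that one graph convolution step is `adjacency @ nodes @ node_weights`. All names and shapes are illustrative, and in a real unit these matrices would be learned rather than fixed.

```python
def matmul(a, b):
    """Plain matrix product of two nested-list matrices."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def transpose(m):
    return [list(col) for col in zip(*m)]


def global_reasoning(features, projection, adjacency, node_weights):
    """features: L x C, projection: N x L, adjacency: N x N, node_weights: C x C."""
    nodes = matmul(projection, features)                       # weighted global pooling, N x C
    reasoned = matmul(matmul(adjacency, nodes), node_weights)  # graph convolution, N x C
    return matmul(transpose(projection), reasoned)             # weighted broadcasting, L x C


features = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]  # 3 locations, 2 channels
projection = [[1.0, 1.0, 1.0]]                    # one node that pools every location
adjacency = [[1.0]]
node_weights = [[1.0, 0.0], [0.0, 1.0]]
relation_aware = global_reasoning(features, projection, adjacency, node_weights)
```

With a single node that pools every location, each output location receives the same global summary, which shows how information from distant regions reaches every position in one step.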

SSN: Learning Sparse Switchable Normalization via SparsestMax

The recently proposed switchable normalization (SN) provides a new perspective for deep learning: it learns to select different normalizers for different convolution layers of a ConvNet. However, SN uses a softmax function to learn the importance ratios that combine normalizers, leading to redundant computations compared to a single normalizer. This work addresses this issue by presenting Sparse Switchable Normalization (SSN), where the importance ratios are constrained to be sparse. SSN has several appealing properties. (1) It inherits all benefits from SN, such as applicability to various tasks and robustness to a wide range of batch sizes. (2) It is guaranteed to select only one normalizer for each normalization layer, avoiding redundant computations. (3) SSN can be transferred to various tasks in an end-to-end manner.

Spherical Fractal Convolutional Neural Networks for Point Cloud Recognition

We present a generic, flexible and 3D rotation-invariant framework based on spherical symmetry for point cloud recognition. By introducing a regular icosahedral lattice and its fractals to approximate and discretize the sphere, convolution can be easily implemented to process 3D points. Based on the fractal structure, a hierarchical feature learning framework together with an adaptive sphere projection module is proposed to learn deep features in an end-to-end manner. Our framework not only inherits the strong representation power and generalization capability of convolutional neural networks for image recognition, but also extends CNNs to learn robust features resistant to rotations and perturbations.