On Learning Under Dataset Noise

I am extremely curious about how data, or experience, drives a model to learn imperfect but useful prediction rules. One aspect of this learning process is the noise in the data from which the model learns. On the last day of the second decade of the 21st century, I would like to summarize some papers I have encountered during my daily browsing.

Note that learning under noise is closely related to topics such as curriculum learning, data augmentation, data selection, and active learning, which may or may not be covered in this post; I hope to write something about them one day.

I want to write this summary because of one particular paper (and its accompanying blog post):

This is a method paper aimed at empirical improvement. Its basic ideas are:

  1. Dataset pruning
  2. Example ranking
  3. Confidence-weighted training

which, when done properly, is in my view the best practice for learning under noise.
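As a concrete illustration (not the paper's actual implementation), here is a minimal sketch of that pipeline, assuming a per-example confidence score is already available from some upstream estimate; the names `prune_and_weight`, `confidences`, and `prune_fraction` are hypothetical.

```python
import numpy as np
import torch
import torch.nn.functional as F

def prune_and_weight(confidences, prune_fraction=0.1):
    """Rank examples by a per-example confidence score, drop the lowest
    fraction, and return the surviving indices plus their weights.

    `confidences` is assumed to come from an upstream uncertainty estimate."""
    confidences = np.asarray(confidences)
    order = np.argsort(confidences)            # noisiest examples first
    n_prune = int(len(order) * prune_fraction)
    keep = order[n_prune:]                     # pruning + ranking
    return keep, confidences[keep]             # weights for training

def confidence_weighted_loss(logits, targets, weights):
    """Cross-entropy where each example's contribution is scaled by its
    confidence, so suspected label noise pulls less on the gradients."""
    per_example = F.cross_entropy(logits, targets, reduction="none")
    weights = torch.as_tensor(weights, dtype=per_example.dtype,
                              device=per_example.device)
    return (per_example * weights).mean()
```

The design choice here is simply that pruning removes the worst examples outright, while the remaining ones are still discounted in proportion to how much we trust their labels.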

One thing that really interests me is their so-called model-agnostic dataset uncertainty estimation method.
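I do not yet know the details of their estimator, so the following is only my guess at what a model-agnostic estimate could look like: train several models (or several cross-validation folds), average their predicted class probabilities per example, and use the predictive entropy of that average as the uncertainty score. The function name `ensemble_uncertainty` and the array layout are my own assumptions.

```python
import numpy as np

def ensemble_uncertainty(prob_matrix):
    """Per-example uncertainty from the predictions of several models.

    `prob_matrix` has shape (n_models, n_examples, n_classes): class
    probabilities from models trained independently or on different
    cross-validation folds. Returns the predictive entropy of the
    averaged distribution, one score per example."""
    mean_probs = np.asarray(prob_matrix).mean(axis=0)
    return -(mean_probs * np.log(mean_probs + 1e-12)).sum(axis=1)
```

Examples whose averaged prediction is close to uniform (high entropy) would then be treated as uncertain and pruned or down-weighted first.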

Learning under noise

Uncertainty estimate

Data selection

Curriculum learning