3.1 Attribution
Interpretability is a means to build trustworthy machine learning systems that can generate ‘rationales’ to explain why they make a decision.
Today (Sep. 19), we are going to dig into a recent NeurIPS paper, Robust Attribution Regularization, which introduces the concept of robust attribution. The work builds on the Integrated Gradients (IG) method, proposing training objectives in the style of classic robust optimization to achieve robust IG.
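As background, IG attributes a prediction to input feature i by integrating the gradient of the model output along the straight-line path from a baseline x' to the input x, scaled by (x_i - x'_i). A minimal NumPy sketch of this integral via a Riemann sum (the quadratic function f and weights w below are illustrative choices, not from the paper):

```python
import numpy as np

def integrated_gradients(grad_f, x, baseline, steps=50):
    """Approximate IG_i(x) = (x_i - x'_i) * \int_0^1 df/dx_i(x' + a(x - x')) da
    with a midpoint Riemann sum over the path from baseline to x."""
    alphas = (np.arange(steps) + 0.5) / steps
    total = np.zeros_like(x, dtype=float)
    for a in alphas:
        total += grad_f(baseline + a * (x - baseline))
    return (x - baseline) * total / steps

# Illustrative model: f(x) = (w . x)^2, with gradient 2 (w . x) w.
w = np.array([1.0, -2.0, 3.0])
f = lambda x: float(np.dot(w, x) ** 2)
grad_f = lambda x: 2.0 * np.dot(w, x) * w

x = np.array([0.5, 1.0, -0.5])
baseline = np.zeros(3)
attr = integrated_gradients(grad_f, x, baseline)

# Completeness axiom: attributions sum to f(x) - f(baseline).
print(attr.sum(), f(x) - f(baseline))
```

The final check illustrates the completeness axiom of IG: the attributions sum to the difference in model output between the input and the baseline.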
3.1.1 Saliency with Guarantees
- Discovering Conditionally Salient Features with Statistical Guarantees, ICML 2019.
- Interpreting Black-Box Models via Hypothesis Testing, arXiv Apr. 2019.
- Statistically Consistent Saliency Estimation, ICLR 2020.