3.1 Attribution

Interpretability is a means to build trustworthy machine learning system which can generate ‘rationales’ to explain why it make a decision.

Today (Sep. 19), we are going to dig into a recent NeurIPS paper Robust Attribution Regularization which introduces the concept of robust attribution. Their work is built on the Integrated Gradient (IG) methods by proposing training objectives in classic robust optimization to achieve robust IG.