The explanation game: a formal framework for interpretable machine learning
DOI: 10.1007/s11229-020-02629-9
Journal: Synthese
ISSN: 0039-7857
Abstract
We propose a formal framework for interpretable machine learning. Combining elements from statistical learning, causal interventionism, and decision theory, we design an idealised explanation game in which players collaborate to find the best explanation(s) for a given algorithmic prediction. Through an iterative procedure of questions and answers, the players establish a three-dimensional Pareto frontier that describes the optimal trade-offs between explanatory accuracy, simplicity, and relevance. Multiple rounds are played at different levels of abstraction, allowing the players to explore overlapping causal patterns of variable granularity and scope. We characterise the conditions under which such a game is almost surely guaranteed to converge on a (conditionally) optimal explanation surface in polynomial time, and highlight obstacles that will tend to prevent the players from advancing beyond certain explanatory thresholds. The game serves a descriptive and a normative function, establishing a conceptual space in which to analyse and compare existing proposals, as well as design new and improved solutions.
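To illustrate the central construct, here is a minimal sketch of how a Pareto frontier over candidate explanations might be computed when each candidate is scored on the three dimensions the abstract names. The `Explanation` class, the candidate names, and the scoring scales are illustrative assumptions for this sketch, not the paper's formalism, which defines these quantities game-theoretically.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Explanation:
    # Hypothetical container for a candidate explanation's scores.
    # Higher is better on every dimension in this sketch.
    name: str
    accuracy: float    # fidelity to the algorithmic prediction
    simplicity: float  # e.g. an inverse measure of description length
    relevance: float   # usefulness to the inquiring agent

def dominates(a: Explanation, b: Explanation) -> bool:
    """a dominates b if it is at least as good on every criterion
    and strictly better on at least one."""
    at_least_as_good = (a.accuracy >= b.accuracy
                        and a.simplicity >= b.simplicity
                        and a.relevance >= b.relevance)
    strictly_better = (a.accuracy > b.accuracy
                       or a.simplicity > b.simplicity
                       or a.relevance > b.relevance)
    return at_least_as_good and strictly_better

def pareto_frontier(candidates: list[Explanation]) -> list[Explanation]:
    # An explanation is on the frontier iff no other candidate dominates it.
    return [c for c in candidates
            if not any(dominates(d, c) for d in candidates)]
```

For example, a highly accurate but complex explanation and a simpler but less faithful one can both sit on the frontier, while a candidate that is worse on all three counts is excluded.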