The explanation game: a formal framework for interpretable machine learning
DOI: 10.1007/s11229-020-02629-9
Journal: Synthese
ISSN: 0039-7857
Abstract
We propose a formal framework for interpretable machine learning. Combining elements from statistical learning, causal interventionism, and decision theory, we design an idealised explanation game in which players collaborate to find the best explanation(s) for a given algorithmic prediction. Through an iterative procedure of questions and answers, the players establish a three-dimensional Pareto frontier that describes the optimal trade-offs between explanatory accuracy, simplicity, and relevance. Multiple rounds are played at different levels of abstraction, allowing the players to explore overlapping causal patterns of variable granularity and scope. We characterise the conditions under which such a game is almost surely guaranteed to converge on a (conditionally) optimal explanation surface in polynomial time, and highlight obstacles that will tend to prevent the players from advancing beyond certain explanatory thresholds. The game serves a descriptive and a normative function, establishing a conceptual space in which to analyse and compare existing proposals, as well as design new and improved solutions.
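To illustrate the central construct, here is a minimal sketch of how a Pareto frontier over candidate explanations might be computed when each candidate is scored on the three dimensions the abstract names. The `Explanation` class, the candidate names, and the scoring scales are illustrative assumptions for this sketch, not the paper's formalism, which defines these quantities game-theoretically.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Explanation:
    # Hypothetical container for a candidate explanation's scores.
    # Higher is better on every dimension in this sketch.
    name: str
    accuracy: float    # fidelity to the algorithmic prediction
    simplicity: float  # e.g. an inverse measure of description length
    relevance: float   # usefulness to the inquiring agent

def dominates(a: Explanation, b: Explanation) -> bool:
    """a dominates b if it is at least as good on every criterion
    and strictly better on at least one."""
    at_least_as_good = (a.accuracy >= b.accuracy
                        and a.simplicity >= b.simplicity
                        and a.relevance >= b.relevance)
    strictly_better = (a.accuracy > b.accuracy
                       or a.simplicity > b.simplicity
                       or a.relevance > b.relevance)
    return at_least_as_good and strictly_better

def pareto_frontier(candidates: list[Explanation]) -> list[Explanation]:
    # An explanation is on the frontier iff no other candidate dominates it.
    return [c for c in candidates
            if not any(dominates(d, c) for d in candidates)]
```

For example, a highly accurate but complex explanation and a simpler but less faithful one can both sit on the frontier, while a candidate that is worse on all three counts is excluded.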