Towards Explainable Classifiers Using the Counterfactual Approach - Global Explanations for Discovering Bias in Data

The paper proposes summarized, attribution-based post-hoc explanations for detecting and identifying bias in data. It introduces a global explanation together with a step-by-step framework for detecting and testing bias. Because removing unwanted bias from data is often a complicated and laborious task, the suspected bias is instead inserted automatically, and its effect is then evaluated with the proposed counterfactual approach. The results are validated on a sample skin lesion dataset. Using the proposed method, a number of possible bias-causing artifacts in dermoscopy images are identified and confirmed. In particular, black frames are confirmed to have a strong influence on the predictions of a Convolutional Neural Network: 22% of them changed the prediction from benign to malignant.
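
The counterfactual test described above can be illustrated with a minimal sketch. It is not the authors' implementation: the helper names (`insert_black_frame`, `flip_rate`, `toy_predict`), the representation of an image as a nested list of grayscale values, and the choice to measure flips only among images originally predicted benign are all assumptions made for illustration.

```python
from typing import Callable, List, Sequence

Image = List[List[float]]  # grayscale image, pixel values in [0, 1] (assumed format)


def insert_black_frame(image: Image, width: int = 1) -> Image:
    """Return a copy of `image` with a black border `width` pixels wide."""
    rows, cols = len(image), len(image[0])
    framed = [row[:] for row in image]  # copy so the original stays untouched
    for r in range(rows):
        for c in range(cols):
            on_border = (
                r < width or r >= rows - width or c < width or c >= cols - width
            )
            if on_border:
                framed[r][c] = 0.0
    return framed


def flip_rate(
    images: Sequence[Image],
    predict: Callable[[Image], str],
    insert_bias: Callable[[Image], Image],
    source: str = "benign",
    target: str = "malignant",
) -> float:
    """Fraction of images predicted `source` that switch to `target`
    once the artifact is inserted (hypothetical metric definition)."""
    originally_source = [img for img in images if predict(img) == source]
    if not originally_source:
        return 0.0
    flipped = sum(predict(insert_bias(img)) == target for img in originally_source)
    return flipped / len(originally_source)


def toy_predict(image: Image) -> str:
    """Stand-in for a trained classifier: labels dark images as malignant."""
    pixels = [p for row in image for p in row]
    return "malignant" if sum(pixels) / len(pixels) < 0.5 else "benign"


# Three synthetic 10x10 images of uniform brightness.
bright = [[1.0] * 10 for _ in range(10)]
medium = [[0.6] * 10 for _ in range(10)]
dark = [[0.3] * 10 for _ in range(10)]

rate = flip_rate([bright, medium, dark], toy_predict, insert_black_frame)
print(f"benign -> malignant flip rate after inserting a frame: {rate:.0%}")
```

In practice `toy_predict` would be replaced by the trained classifier and `insert_black_frame` by whichever artifact is under suspicion; a high flip rate suggests the model relies on the artifact rather than on the lesion itself.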

eISSN: 2083-2567
Language: English

Publication timeframe: 4 times per year
Journal Subjects: Computer Sciences, Databases and Data Mining, Artificial Intelligence

Published Online: Dec 03, 2020

Page range: 51 - 67

Received: May 15, 2020

Accepted: Sep 30, 2020

DOI: https://doi.org/10.2478/jaiscr-2021-0004

Keywords
explainable classifiers, counterfactual approach, bias detection

© 2021 Agnieszka Mikołajczyk et al., published by Sciendo

This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
