The goal of our four-phase research project was to test whether a
machine-learning-based loan screening application (5D) could detect bad loans
subject to the following constraints: a) utilize a minimal-optimal number of
features unrelated to the credit history, gender, race or ethnicity of the
borrower (BiMOPT features); b) comply with the European Banking Authority and
EU Commission principles on trustworthy Artificial Intelligence (AI). All
datasets have been anonymized and pseudonymized. In Phase 0 we selected a
subset of 10 BiMOPT features out of a total of 84 features; in Phase I we
trained 5D to detect bad loans in a historical dataset extracted from a
mandatory report to the Bank of Italy consisting of 7,289 non-performing loans
(NPLs) closed in the period 2010-2021; in Phase II we assessed the baseline
performance of 5D on a distinct validation dataset consisting of an active
portfolio of 63,763 outstanding loans (performing and non-performing) for a
total financed value of over EUR 11.5 billion as of December 31, 2021; in Phase
III we will monitor performance against this baseline for a period of 5 years (2023-27)
to assess the prospective real-world bias mitigation and performance of the 5D
system and its utility in credit and fintech institutions. At baseline, 5D
correctly detected 1,461 bad loans out of a total of 1,613 (Sensitivity = 0.91,
Prevalence = 0.0253, Positive Predictive Value = 0.19), and correctly
classified 55,866 out of the other 62,150 exposures (Specificity = 0.90,
Negative Predictive Value = 0.997). Our preliminary results support the
hypothesis that Big Data & Advanced Analytics applications based on AI can
mitigate bias and improve consumer protection in the loan screening process
without compromising the efficacy of the credit risk assessment. Further
validation is required to assess the prospective performance and utility of 5D
in credit and fintech institutions.
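
The baseline figures reported above can be re-derived from the four counts given in the text. The following is a minimal sketch, not the authors' evaluation code: the function name `confusion_metrics` and the variable names are hypothetical, and the false-positive and false-negative counts are not stated in the abstract but inferred here by subtraction from the reported totals.

```python
# Counts reported in the abstract for the Phase II validation portfolio.
total_loans = 63_763      # active portfolio size
bad_loans = 1_613         # loans that were actually bad
true_positives = 1_461    # bad loans correctly detected by 5D
other_loans = 62_150      # remaining exposures
true_negatives = 55_866   # remaining exposures correctly classified

# Inferred by subtraction (assumption: not stated explicitly in the text).
false_negatives = bad_loans - true_positives
false_positives = other_loans - true_negatives


def confusion_metrics(tp, fp, tn, fn):
    """Return standard screening metrics from confusion-matrix counts."""
    n = tp + fp + tn + fn
    return {
        "sensitivity": tp / (tp + fn),   # share of bad loans detected
        "specificity": tn / (tn + fp),   # share of other loans cleared
        "prevalence": (tp + fn) / n,     # share of bad loans in portfolio
        "ppv": tp / (tp + fp),           # flagged loans that are truly bad
        "npv": tn / (tn + fn),           # cleared loans that are truly fine
    }


metrics = confusion_metrics(
    true_positives, false_positives, true_negatives, false_negatives
)

for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

The low Positive Predictive Value alongside high Sensitivity and Specificity follows from the low Prevalence: with roughly 2.5% bad loans in the portfolio, even a 10% false-positive rate among the other exposures yields several times more false alarms than true detections.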