The rise of algorithmic decision-making has spawned much research on fair
machine learning (ML). Financial institutions use ML for building risk
scorecards that support a range of credit-related decisions. Yet, the
literature on fair ML in credit scoring is scarce. The paper makes three
contributions. First, we revisit statistical fairness criteria and examine
their adequacy for credit scoring. Second, we catalog algorithmic options for
incorporating fairness goals in the ML model development pipeline. Third, we
empirically compare different fairness processors in a profit-oriented credit
scoring context using real-world data. The empirical results support the
evaluation of fairness measures, identify suitable options for implementing
fair credit scoring, and clarify the profit-fairness trade-off in lending
decisions.
We find that multiple fairness criteria can be approximately satisfied at once
and recommend separation as a suitable criterion for measuring the fairness of a
scorecard. We also find that fair in-processors deliver a good balance between
profit and fairness and show that algorithmic discrimination can be reduced to
a reasonable level at a relatively low cost. The code corresponding to the
paper is available on GitHub.
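To make the recommended separation criterion concrete, the sketch below shows one common way to quantify it for a binary scorecard decision: comparing true-positive and false-positive rates across groups defined by a protected attribute. This is a minimal illustration, not code from the paper's repository; the function and variable names are hypothetical.

```python
import numpy as np

def separation_gaps(y_true, y_pred, group):
    """Measure violation of separation (equalized odds) for a binary decision.

    Separation holds when the decision is independent of the protected
    attribute conditional on the true outcome, i.e. true-positive and
    false-positive rates are equal across groups. Assumes binary labels,
    binary decisions, and a binary group indicator.
    """
    y_true, y_pred, group = map(np.asarray, (y_true, y_pred, group))
    rates = {}
    for g in np.unique(group):
        mask = group == g
        tpr = y_pred[mask & (y_true == 1)].mean()  # P(D=1 | Y=1, A=g)
        fpr = y_pred[mask & (y_true == 0)].mean()  # P(D=1 | Y=0, A=g)
        rates[g] = (tpr, fpr)
    (tpr_a, fpr_a), (tpr_b, fpr_b) = rates.values()
    return abs(tpr_a - tpr_b), abs(fpr_a - fpr_b)

# Illustrative usage with synthetic data (1 = loan granted / good repayment)
rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, size=1000)                        # true outcomes
group = rng.integers(0, 2, size=1000)                         # protected attribute
y_pred = (rng.random(1000) < 0.5 + 0.1 * group).astype(int)   # biased decisions
tpr_gap, fpr_gap = separation_gaps(y_true, y_pred, group)
print(f"TPR gap: {tpr_gap:.3f}, FPR gap: {fpr_gap:.3f}")
```

Smaller gaps indicate a scorecard closer to satisfying separation; a gap of zero for both rates corresponds to exact equalized odds.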