This paper introduces the Objective Fairness Index (OFI), a legally grounded metric for evaluating bias in machine learning that compares marginal benefits across groups, and demonstrates its ability to detect algorithmic bias in applications like COMPAS and Folktable’s Adult Employment dataset, where the traditional Disparate Impact metric fails.
Fairness, AI Ethics, Classification, Robustness, Trustworthy AI
Jarren Briscoe, Assefaw Gebremedhin
Washington State University
Generated by grok-3
Background Problem
The paper addresses the pervasive issue of bias in machine learning models, particularly in sensitive applications like recidivism prediction (COMPAS) and employment prediction. It highlights the inconsistency in how bias is defined across contexts and the inadequacy of the Disparate Impact (DI) metric in capturing the legal nuances of objective testing, as established by precedents like Griggs v. Duke Power Co. and Ricci v. DeStefano. The key problem addressed is the lack of a legally consistent, context-aware metric for bias evaluation, one that differentiates discriminatory algorithmic tests from systemic disparities outside the algorithm’s control.
Method
The core method introduced is the Objective Fairness Index (OFI), which evaluates bias as the difference in marginal benefits between two groups, where a group’s marginal benefit is the discrepancy between the benefits it actually received (positive predictions) and the benefits it was due (positive labels). It is formalized as OFI = (FP_i - FN_i)/n_i - (FP_j - FN_j)/n_j, where FP_i and FN_i are the false-positive and false-negative counts for group i, n_i is that group’s size, and likewise for group j. The method integrates the legal principle of objective testing by grounding the bias assessment in what happened versus what should have happened, and it relies only on binary confusion matrices for generalizability. Key steps include defining individual and group benefits, computing marginal benefits, and deriving OFI as a comparative metric, with the practical threshold interval [-0.3, 0.3] established via standard-deviation analysis.
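The computation itself is small; the sketch below is a minimal illustration of the formula above, assuming binary 0/1 labels and scikit-learn’s confusion_matrix. The function names and toy data are illustrative choices, not taken from the paper’s implementation.

```python
# Minimal sketch of the OFI computation; names and toy data are illustrative.
import numpy as np
from sklearn.metrics import confusion_matrix


def marginal_benefit(y_true, y_pred):
    """Marginal benefit of one group: (FP - FN) / n, the gap between benefits
    received (positive predictions) and benefits due (positive labels),
    normalized by group size."""
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
    return (fp - fn) / len(y_true)


def ofi(y_true_i, y_pred_i, y_true_j, y_pred_j):
    """Objective Fairness Index between group i and group j:
    OFI = (FP_i - FN_i)/n_i - (FP_j - FN_j)/n_j."""
    return marginal_benefit(y_true_i, y_pred_i) - marginal_benefit(y_true_j, y_pred_j)


# Toy example: group i is over-benefited relative to its labels (2 FP, 0 FN),
# group j is under-benefited (0 FP, 2 FN).
y_true_i = np.array([0, 0, 1, 1, 0]); y_pred_i = np.array([1, 1, 1, 1, 0])
y_true_j = np.array([1, 1, 1, 0, 0]); y_pred_j = np.array([0, 0, 1, 0, 0])
print(ofi(y_true_i, y_pred_i, y_true_j, y_pred_j))  # 0.4 - (-0.4) = 0.8, outside [-0.3, 0.3]
```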
Experiment
The experiments apply OFI to two datasets: COMPAS for recidivism prediction and Folktable’s Adult Employment dataset for employment prediction, using Random Forest and Naïve Bayes classifiers. The setup compares OFI’s verdicts with DI’s to test whether OFI detects algorithmic bias that DI misses, focusing on ethnicity pairs in COMPAS and various demographic groups in Folktable. Results show OFI corroborating DI on COMPAS, confirming algorithmic bias against certain races, and revealing discrepancies on Folktable (e.g., identifying bias against Pacific Islanders where DI suggests positive bias). The setup is reasonable for binary classification but lacks comparison with other fairness metrics and does not address multi-class or regression tasks. The results partially match expectations by highlighting OFI’s legal consistency, though the limited scope and the absence of robustness tests against data noise or class imbalance are notable shortcomings.
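To make the OFI-versus-DI discrepancy concrete, the toy sketch below contrasts the two metrics for a single group pair, taking DI as the standard selection-rate ratio and OFI as defined in the Method section. The data, the four-fifths threshold reading, and the sign interpretation (negative OFI indicating bias against group i) are illustrative assumptions, not results from the paper.

```python
# Illustrative contrast between Disparate Impact (DI) and OFI for one group pair.
# DI here is the common selection-rate ratio; data and thresholds are toy values.
import numpy as np


def disparate_impact(y_pred_i, y_pred_j):
    """Ratio of positive-prediction (selection) rates; values below ~0.8 are
    commonly flagged under the four-fifths rule."""
    return y_pred_i.mean() / y_pred_j.mean()


def ofi(y_true_i, y_pred_i, y_true_j, y_pred_j):
    """OFI = (FP_i - FN_i)/n_i - (FP_j - FN_j)/n_j (see the sketch above)."""
    mb = lambda yt, yp: (np.sum((yp == 1) & (yt == 0)) - np.sum((yp == 0) & (yt == 1))) / len(yt)
    return mb(y_true_i, y_pred_i) - mb(y_true_j, y_pred_j)


# Toy case mirroring the reported Folktable-style discrepancy: group i has the
# higher selection rate (so DI suggests favorable treatment), yet its positives
# are under-predicted relative to its labels, so OFI reports bias against it.
y_true_i = np.array([1, 1, 1, 1, 1, 0]); y_pred_i = np.array([1, 1, 1, 0, 0, 0])
y_true_j = np.array([0, 0, 0, 1, 0, 0]); y_pred_j = np.array([1, 0, 0, 1, 0, 0])

print(disparate_impact(y_pred_i, y_pred_j))          # 1.5 -> DI sees no adverse impact on group i
print(ofi(y_true_i, y_pred_i, y_true_j, y_pred_j))   # -0.5 -> OFI flags bias against group i
```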
Further Thoughts
The introduction of OFI opens up intriguing possibilities for integrating legal frameworks into AI fairness metrics, but it also raises questions about scalability and adaptability. Could OFI be extended to non-binary classification tasks, such as multi-class problems or regression, without losing its legal grounding? Additionally, the paper’s focus on legal standards might benefit from a comparative analysis with ethical frameworks from other disciplines, such as sociology or philosophy, to capture a broader spectrum of fairness. Another area of exploration could be the interaction between OFI and existing debiasing techniques—does OFI merely identify bias, or can it inform corrective measures at the algorithmic level? Finally, connecting this work to privacy-preserving machine learning could be insightful, as legal contexts often intersect with privacy concerns, especially in sensitive applications like COMPAS. These intersections could guide future interdisciplinary research to ensure AI systems are not only legally compliant but also ethically sound and socially beneficial.