
Add a fairness metric that computes Equalized Odds #772

Open
@npatki


Problem Description

While SDMetrics currently includes privacy metrics, it does not have any fairness metrics. We should add an Equalized Odds fairness metric that computes whether there is a bias against a particular group of records.

Expected behavior

Create a new metric called EqualizedOddsImprovement that measures whether the synthetic data improves fairness compared to the real data. The score should stay within the [0, 1] range, and its value should indicate both the direction and the degree of improvement:

  • score > 0.5 indicates that the synthetic data is fairer than the real data
  • score < 0.5 indicates that the synthetic data is less fair than the real data
  • score = 0.5 indicates no difference in fairness between the two

As an example, consider a loan application dataset where we'd like to predict whether each application will be approved. The Equalized Odds metric tells us whether the predictions are biased with respect to a certain group of applications, such as applications where requestor_race='Asian'.

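```python
from sdmetrics.single_table import EqualizedOddsImprovement

# Proposed API: real_dataset, synthetic_dataset, test_dataset and my_metadata
# stand for the user's own tables and metadata.
EqualizedOddsImprovement.compute_breakdown(
    real_training_data=real_dataset,
    synthetic_data=synthetic_dataset,
    real_validation_data=test_dataset,
    metadata=my_metadata,
    prediction_column_name='loan_approved',   # column the classifier predicts
    positive_class_label='True',              # value treated as the positive outcome
    sensitive_column_name='requestor_race',   # column that defines the group
    sensitive_column_value='Asian',           # group checked for bias
    classifier='XGBoost',
)
```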

The breakdown should provide the final score, along with the sub-scores for the real data and for the synthetic data. For each, it should also include the prediction counts on the validation data for both strata (Asian=True and Asian=False).

```python
{
  'score': 0.7189,
  'real_training_data': {
    'equalized_odds': 0.96033,
    'prediction_counts_validation': {
      'Asian=True': {
        'true_positive': XX,
        'false_positive': XX,
        'true_negative': XX,
        'false_negative': XX
      },
      'Asian=False': {
        'true_positive': XX,
        'false_positive': XX,
        'true_negative': XX,
        'false_negative': XX
      }
    }
  },
  'synthetic_data': {
    # same structure as 'real_training_data'
  }
}
```
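The per-stratum counts in this breakdown could be tallied directly from the validation labels and a model's predictions. This is a minimal sketch assuming plain Python lists and the hypothetical helper name `prediction_counts`; the keys follow the '<value>=True' / '<value>=False' pattern from the example, and labels are compared as strings to match positive_class_label='True'.

```python
def prediction_counts(y_true, y_pred, sensitive, sensitive_value, positive_label):
    """Tally confusion-matrix counts for the sensitive group and for all other records."""
    counts = {}
    for in_group in (True, False):
        bucket = {'true_positive': 0, 'false_positive': 0,
                  'true_negative': 0, 'false_negative': 0}
        for actual, predicted, value in zip(y_true, y_pred, sensitive):
            if (value == sensitive_value) != in_group:
                continue  # record belongs to the other stratum
            is_actual = actual == positive_label
            is_predicted = predicted == positive_label
            if is_actual and is_predicted:
                bucket['true_positive'] += 1
            elif is_predicted:
                bucket['false_positive'] += 1
            elif is_actual:
                bucket['false_negative'] += 1
            else:
                bucket['true_negative'] += 1
        # Key format mirrors the example breakdown, e.g. 'Asian=True'.
        counts[f'{sensitive_value}={in_group}'] = bucket
    return counts


# Toy validation data: 4 'Asian' records followed by 4 records from another group.
y_true = ['True', 'True', 'False', 'False', 'True', 'True', 'False', 'False']
y_pred = ['True', 'False', 'False', 'False', 'True', 'True', 'True', 'False']
race = ['Asian'] * 4 + ['Other'] * 4
counts = prediction_counts(y_true, y_pred, race, 'Asian', positive_label='True')
print(counts['Asian=True'])
# {'true_positive': 1, 'false_positive': 0, 'true_negative': 2, 'false_negative': 1}
```

The returned dictionary has the shape expected under 'prediction_counts_validation'; the same helper would be called once with the predictions of the model trained on real data and once with those of the model trained on synthetic data.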

Additional context

More details about this metric can be found in this doc (visible to DataCebo team members only). Once the metric is implemented, we will publish its methodology and details in the SDMetrics documentation.
