Global Representativeness Index

Global Representativeness Index (GRI)

A rigorous, open-source framework for measuring how well survey samples represent the global population.

License: MIT · Python 3.8+ · arXiv

The Problem

Large-scale surveys and public consultations increasingly inform AI policy, technology design, and global governance. Yet there is no standardized way to measure how representative these samples actually are. A survey of 1,000 people can look impressive, but if 80% of respondents come from three countries, the results may not generalize to the global population.

The GRI provides a single, interpretable score quantifying representativeness across demographic dimensions, grounded in a well-understood statistical distance measure.

The Formula

\[ \text{GRI} = 1 - \text{TVD}(p, q) = 1 - \frac{1}{2} \sum_{i=1}^{k} |p_i - q_i| \]

where \(p_i\) is the sample proportion and \(q_i\) is the population proportion for stratum \(i\).

The GRI is built on Total Variation Distance (TVD), the largest possible difference between the probabilities that two distributions assign to any event. The complement maps this to a 0–1 scale where 1.0 = perfect representation and 0.0 = complete mismatch.
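The formula can be checked by hand with a small sketch. The function names (`tvd`, `gri`, `max_event_gap`) and the three-stratum proportions below are illustrative assumptions and are not taken from the `gri` package. The brute-force helper exists only to show that half the summed absolute differences equals the largest gap over any event.

```python
from itertools import combinations


def tvd(p, q):
    """Total Variation Distance: half the sum of absolute differences.

    p and q map stratum -> proportion; a missing stratum counts as 0.
    """
    strata = set(p) | set(q)
    return 0.5 * sum(abs(p.get(s, 0.0) - q.get(s, 0.0)) for s in strata)


def gri(p, q):
    """GRI = 1 - TVD(p, q), so 1.0 is a perfect match and 0.0 a complete mismatch."""
    return 1.0 - tvd(p, q)


def max_event_gap(p, q):
    """Brute force: largest |P(A) - Q(A)| over every event A (subset of strata).

    Exponential in the number of strata, so only suitable for tiny examples.
    """
    strata = sorted(set(p) | set(q))
    best = 0.0
    for size in range(len(strata) + 1):
        for event in combinations(strata, size):
            p_mass = sum(p.get(s, 0.0) for s in event)
            q_mass = sum(q.get(s, 0.0) for s in event)
            best = max(best, abs(p_mass - q_mass))
    return best


# Hypothetical proportions for illustration only (not survey data).
sample = {"A": 0.5, "B": 0.3, "C": 0.2}
population = {"A": 0.2, "B": 0.3, "C": 0.5}

print(round(tvd(sample, population), 3))            # 0.3
print(round(gri(sample, population), 3))            # 0.7
print(round(max_event_gap(sample, population), 3))  # 0.3, same as the TVD
```

Here stratum A is over-represented by 0.3 and stratum C under-represented by 0.3, so the TVD is 0.3 and the GRI is 0.7.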

Key Results

GRI scores from six waves of the Global Dialogues survey (N = 971–1,280 per wave):

Dimension                 GD1    GD2    GD3    GD4    GD5    GD6
Country x Gender x Age    0.293  0.282  0.374  0.319  0.301  0.292
Country x Religion        0.471  0.474  0.515  0.518  0.484  0.481
Country x Environment     0.369  0.339  0.387  0.390  0.354  0.345
Gender                    0.989  0.990  0.996  0.979  0.986  0.995
Continent                 0.832  0.830  0.886  0.883  0.773  0.802

These scores reveal that single-axis representation is strong (Gender stays at or above 0.979, Continent at or above 0.773), while fine-grained intersectional dimensions remain challenging: Country x Gender x Age never exceeds 0.374 in any wave. This finding is invisible without the GRI framework. See full results for all 13 dimensions.
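A toy example shows how single-axis scores can be high while an intersectional score is lower. Everything below is a hypothetical sketch: the two-country, two-gender population, the 100-row sample, and the helper names are assumptions for illustration, and the sketch does not reproduce or explain the Global Dialogues numbers.

```python
from collections import Counter


def gri(p, q):
    """GRI = 1 - TVD, with p and q as dicts of stratum -> proportion."""
    strata = set(p) | set(q)
    return 1.0 - 0.5 * sum(abs(p.get(s, 0.0) - q.get(s, 0.0)) for s in strata)


def proportions(rows):
    """Turn a list of stratum labels into a dict of stratum -> proportion."""
    counts = Counter(rows)
    total = sum(counts.values())
    return {stratum: n / total for stratum, n in counts.items()}


def marginal(joint, axis):
    """Collapse a joint distribution over tuples onto one axis."""
    out = {}
    for key, mass in joint.items():
        out[key[axis]] = out.get(key[axis], 0.0) + mass
    return out


# Hypothetical population: every (country, gender) cell holds 25%.
population_joint = {
    ("X", "F"): 0.25, ("X", "M"): 0.25,
    ("Y", "F"): 0.25, ("Y", "M"): 0.25,
}

# Hypothetical sample of 100: balanced on each axis, skewed within cells.
sample_rows = (
    [("X", "F")] * 40 + [("X", "M")] * 10
    + [("Y", "F")] * 10 + [("Y", "M")] * 40
)
sample_joint = proportions(sample_rows)

gri_country = gri(marginal(sample_joint, 0), marginal(population_joint, 0))
gri_gender = gri(marginal(sample_joint, 1), marginal(population_joint, 1))
gri_joint = gri(sample_joint, population_joint)

print(round(gri_country, 3))  # 1.0
print(round(gri_gender, 3))   # 1.0
print(round(gri_joint, 3))    # 0.7
```

Collapsing strata can only shrink the Total Variation Distance, so a joint dimension never scores higher than any of the single axes it is built from.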

Quick Start

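pip install gri

from gri import GRIAnalysis

# Load the survey responses from a CSV file
analysis = GRIAnalysis.from_survey_file("survey_data.csv")
# Score each dimension, including the maximum possible score
scorecard = analysis.calculate_scorecard(include_max_possible=True)
# Save a plot of the scorecard and print a text report
analysis.plot_scorecard(save_to="scorecard.png")
print(analysis.generate_report())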

See the library documentation for the full API reference and examples.

Learn More

  • Methodology — TVD framework, multi-dimensional scorecards, maximum achievable scores
  • Results — Complete benchmark results from Global Dialogues waves 1–6
  • Python Library — Installation, API reference, and usage examples
  • About — Citation, authors, license, and data sources
 

Built with Quarto · MIT License · © 2025 The Collective Intelligence Project
