Python Library
Installation
```bash
pip install gri
```
Requires Python 3.8+. Dependencies: pandas, numpy, matplotlib, pyyaml.
Quick Start
Object-Oriented API
```python
from gri import GRIAnalysis

# Load survey and compute full scorecard
analysis = GRIAnalysis.from_survey_file("survey_data.csv")
scorecard = analysis.calculate_scorecard(include_max_possible=True)

# Visualize and report
analysis.plot_scorecard(save_to="scorecard.png")
print(analysis.generate_report())
```
Functional API
```python
from gri import calculate_gri, load_benchmark_suite, load_gd_survey

# Load data
survey = load_gd_survey(3)  # Global Dialogues wave 3
benchmarks = load_benchmark_suite()

# Calculate GRI for a single dimension
gri_score = calculate_gri(
    survey_df=survey,
    benchmark_df=benchmarks["country_gender_age"],
    strata_cols=["country", "gender", "age_group"]
)
print(f"GRI: {gri_score:.3f}")
```
Key Functions
| Function | Description |
|---|---|
| `calculate_gri()` | Core GRI calculation via total variation distance (TVD) |
| `calculate_diversity_score()` | Strata coverage measurement |
| `calculate_gri_scorecard()` | Multi-dimensional scorecard |
| `calculate_design_effect()` | Design effect, effective N, and precision retained |
| `calculate_sri()` | Strategic Representativeness Index |
| `monte_carlo_max_scores()` | Monte Carlo simulation of the maximum achievable score |
| `calculate_efficiency_ratio()` | Ratio of actual score to theoretical maximum |
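The table says `calculate_gri()` works via total variation distance. As a rough illustration only, here is a stdlib sketch that assumes GRI = 1 - TVD between the sample's and the benchmark's stratum proportions, and assumes the diversity (coverage) score is the fraction of benchmark strata that contain at least one respondent. Both formulas are assumptions, not the library's verified internals, and `gri_from_shares` and `coverage_share` are hypothetical helper names.

```python
from typing import Dict, Hashable


def total_variation_distance(p: Dict[Hashable, float], q: Dict[Hashable, float]) -> float:
    """TVD = 0.5 * sum over strata of |p_s - q_s|; strata missing from one side count as 0."""
    strata = set(p) | set(q)
    return 0.5 * sum(abs(p.get(s, 0.0) - q.get(s, 0.0)) for s in strata)


def gri_from_shares(sample: Dict[Hashable, float], benchmark: Dict[Hashable, float]) -> float:
    """Hypothetical helper: assumes GRI = 1 - TVD, so 1.0 means a perfect match."""
    return 1.0 - total_variation_distance(sample, benchmark)


def coverage_share(sample: Dict[Hashable, float], benchmark: Dict[Hashable, float]) -> float:
    """Hypothetical helper: fraction of benchmark strata with a nonzero sample share."""
    populated = [s for s in benchmark if sample.get(s, 0.0) > 0]
    return len(populated) / len(benchmark)


# Toy (country, gender) strata; each distribution sums to 1. Illustrative values only.
benchmark = {("India", "Female"): 0.30, ("India", "Male"): 0.30,
             ("Brazil", "Female"): 0.20, ("Brazil", "Male"): 0.20}
sample = {("India", "Female"): 0.50, ("India", "Male"): 0.30,
          ("Brazil", "Female"): 0.20}

print(f"GRI: {gri_from_shares(sample, benchmark):.2f}")        # GRI: 0.80
print(f"Coverage: {coverage_share(sample, benchmark):.2f}")    # Coverage: 0.75
```

The over-sampled stratum (+0.20) and the empty stratum (-0.20) together give a TVD of 0.20, so the assumed GRI is 0.80.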
Key Classes
| Class | Description |
|---|---|
| `GRIAnalysis` | High-level analysis wrapper (load, calculate, plot, report) |
| `GRIScorecard` | Configuration-driven scorecard generator |
| `GRIConfig` | Configuration management |
Data Format
Survey data should be a CSV with one row per respondent and demographic columns:
```csv
country,gender,age_group,religion,environment
India,Female,25-34,Hindu,Urban
Brazil,Male,35-44,Christian,Urban
Nigeria,Female,18-24,Muslim,Rural
...
```
Required columns depend on which dimensions you calculate:
| Column | Values | Used In |
|---|---|---|
| `country` | Country name | All geographic dimensions |
| `gender` | Male, Female | Gender-related dimensions |
| `age_group` | Age bands (e.g., 18-24, 25-34) | Age-related dimensions |
| `religion` | Major religion category | Religion dimensions |
| `environment` | Urban, Rural | Environment dimensions |
The library includes built-in loaders for Global Dialogues data (load_gd_survey()) and World Values Survey data (load_wvs_survey()).
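To show how respondent rows become the stratum proportions that a GRI calculation compares, here is a stdlib sketch that reads CSV text in the format above, checks that the requested columns exist, and computes each stratum's share of the sample. `stratum_shares` is a hypothetical helper, not part of the library; the in-memory CSV is toy data (the first three rows come from the example above, the fourth is made up).

```python
import csv
import io
from collections import Counter
from typing import Dict, List, TextIO, Tuple

# Toy survey rows in the documented format (illustrative only).
CSV_TEXT = """country,gender,age_group,religion,environment
India,Female,25-34,Hindu,Urban
Brazil,Male,35-44,Christian,Urban
Nigeria,Female,18-24,Muslim,Rural
India,Female,25-34,Hindu,Rural
"""


def stratum_shares(csv_file: TextIO, strata_cols: List[str]) -> Dict[Tuple[str, ...], float]:
    """Hypothetical helper: share of respondents in each combination of strata_cols."""
    reader = csv.DictReader(csv_file)
    # Fail early if the file lacks a column the chosen dimension needs.
    missing = [c for c in strata_cols if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"Missing required columns: {missing}")
    counts = Counter(tuple(row[c] for c in strata_cols) for row in reader)
    total = sum(counts.values())
    return {stratum: n / total for stratum, n in counts.items()}


shares = stratum_shares(io.StringIO(CSV_TEXT), ["country", "gender", "age_group"])
for stratum, share in sorted(shares.items()):
    print(stratum, f"{share:.2f}")
```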
Interpreting Scores
| GRI Range | Interpretation |
|---|---|
| 0.90–1.00 | Excellent — near-perfect demographic match |
| 0.70–0.89 | Good — strong representation with minor gaps |
| 0.50–0.69 | Moderate — noticeable demographic skew |
| 0.30–0.49 | Low — significant underrepresentation in key strata |
| 0.00–0.29 | Very low — major demographic mismatch |
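Context matters: a GRI of 0.35 on Country × Gender × Age (2,699 strata) is a stronger result than 0.35 on Continent (6 strata). A finite sample cannot populate thousands of strata in their benchmark proportions, so the maximum achievable score on fine-grained dimensions is lower. Always compare a score against the maximum achievable score for that dimension and sample size.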
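The bands in the table above, and the advice to compare against the achievable maximum, can be written as two small functions. This is a sketch: `interpret_gri` simply encodes the table, and `efficiency_ratio` assumes the ratio is actual GRI divided by maximum achievable GRI, which matches how `calculate_efficiency_ratio()` is described but may differ in detail from the library.

```python
def interpret_gri(score: float) -> str:
    """Map a GRI score to the interpretation bands from the table above."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("GRI must lie in [0, 1]")
    bands = [(0.90, "Excellent"), (0.70, "Good"), (0.50, "Moderate"), (0.30, "Low")]
    for threshold, label in bands:
        if score >= threshold:
            return label
    return "Very low"


def efficiency_ratio(actual_gri: float, max_gri: float) -> float:
    """Assumed definition: actual score as a fraction of the maximum achievable score."""
    if max_gri <= 0:
        raise ValueError("max_gri must be positive")
    return actual_gri / max_gri


# Same raw score against two illustrative ceilings (not library output).
print(interpret_gri(0.35))                    # Low
print(f"{efficiency_ratio(0.35, 0.50):.0%}")  # 70%
print(f"{efficiency_ratio(0.35, 0.95):.0%}")  # 37%
```

The raw score lands in the same band either way; only the ratio to the ceiling shows that 0.35 is strong on a dimension whose achievable maximum is low.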
Advanced Features
Design Effect and Effective Sample Size
Quantifies the precision cost of demographic mismatch. Returns the design effect, effective sample size, and precision retained:
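These three numbers are linked. A common way to measure the precision cost of reweighting a sample is Kish's approximation: deff = n·Σw² / (Σw)², effective N = n / deff, and precision retained = 1 / deff. The sketch below assumes post-stratification weights (benchmark share divided by sample share) for each respondent's stratum. Whether `calculate_design_effect()` uses exactly this formula is an assumption; `kish_design_effect` is a hypothetical helper whose result keys merely mirror the library example.

```python
from typing import Dict, Hashable


def kish_design_effect(sample_counts: Dict[Hashable, int],
                       benchmark_shares: Dict[Hashable, float]) -> Dict[str, float]:
    """Kish's approximation deff = n * sum(w^2) / (sum w)^2 over respondents (assumed)."""
    n = sum(sample_counts.values())
    # Post-stratification weight for every respondent in stratum s:
    # benchmark share divided by sample share. Benchmark strata with no
    # respondents cannot be weighted up and are ignored here.
    weights = {s: benchmark_shares.get(s, 0.0) / (c / n)
               for s, c in sample_counts.items() if c > 0}
    sum_w = sum(sample_counts[s] * w for s, w in weights.items())
    sum_w2 = sum(sample_counts[s] * w ** 2 for s, w in weights.items())
    deff = n * sum_w2 / sum_w ** 2
    return {
        "design_effect": deff,
        "effective_n": n / deff,
        "precision_retention": 1 / deff,
    }


# Toy sample of 100 where stratum A is over-represented against a 50/50 benchmark.
result = kish_design_effect({"A": 80, "B": 20}, {"A": 0.5, "B": 0.5})
print(f"Design Effect: {result['design_effect']:.2f}")
print(f"Effective N: {result['effective_n']:.0f}")
print(f"Precision Retained: {result['precision_retention']:.1%}")
```

A perfectly matched sample gives every respondent weight 1 and a design effect of 1. In the library, all three quantities come from one call: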
```python
from gri import calculate_design_effect

# survey and benchmarks as loaded in the Functional API example
result = calculate_design_effect(
    survey_df=survey,
    benchmark_df=benchmarks["country_gender_age"],
    strata_cols=["country", "gender", "age_group"]
)
print(f"Design Effect: {result['design_effect']:.2f}")
print(f"Effective N: {result['effective_n']:.0f}")
print(f"Precision Retained: {result['precision_retention']:.1%}")
```
Strategic Representativeness Index (SRI)
Combines GRI with diversity scores using configurable weights:
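If the combination is a plain weighted average (an assumption; the weighting scheme inside `calculate_sri()` is not documented here), the arithmetic is the sketch below. `sri_from_components` is a hypothetical helper. Dividing by the weight total is a design choice so that weights not summing to 1 still yield a score on the 0 to 1 scale.

```python
def sri_from_components(gri: float, diversity: float,
                        gri_weight: float = 0.7, diversity_weight: float = 0.3) -> float:
    """Assumed SRI: weighted average of GRI and diversity score, both on a 0-1 scale."""
    total = gri_weight + diversity_weight
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    return (gri_weight * gri + diversity_weight * diversity) / total


# Illustrative component scores, not library output.
print(f"SRI: {sri_from_components(gri=0.60, diversity=0.80):.2f}")
```

The library's `calculate_sri()` instead takes the survey and benchmark directly, along with the two weights: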
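```python
from gri import calculate_sri

# survey and benchmarks as loaded in the Functional API example
sri = calculate_sri(
    survey_df=survey,
    benchmark_df=benchmarks["country_gender_age"],
    strata_cols=["country", "gender", "age_group"],
    gri_weight=0.7,
    diversity_weight=0.3
)
```
Visualization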
```python
from gri import plot_gri_scorecard, plot_segment_deviations

# analysis and scorecard as created in the Object-Oriented API example

# Full scorecard heatmap
plot_gri_scorecard(scorecard, save_to="heatmap.png")

# Segment-level deviation analysis
plot_segment_deviations(analysis, dimension="Country × Gender × Age",
                        top_n=20, save_to="deviations.png")
```
Monte Carlo Simulation
Estimate the maximum achievable GRI for a given sample size:
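Even a random sample drawn from the benchmark itself will not match it exactly, because N respondents cannot fill every stratum in the right proportions. The stdlib sketch below illustrates that idea: draw N respondents from the benchmark distribution, score each draw with GRI (assumed here to be 1 - TVD), and average across simulations. Reporting the mean is an assumption, since the library may summarize differently, and `simulate_max_gri` is a hypothetical helper.

```python
import random
from collections import Counter
from statistics import mean
from typing import Dict, Hashable


def gri(sample: Dict[Hashable, float], benchmark: Dict[Hashable, float]) -> float:
    """Assumed GRI = 1 - total variation distance between two share distributions."""
    strata = set(sample) | set(benchmark)
    return 1.0 - 0.5 * sum(abs(sample.get(s, 0.0) - benchmark.get(s, 0.0)) for s in strata)


def simulate_max_gri(benchmark: Dict[Hashable, float], sample_size: int,
                     n_simulations: int, seed: int = 0) -> float:
    """Hypothetical helper: mean GRI of random samples drawn from the benchmark."""
    rng = random.Random(seed)
    strata = list(benchmark)
    probs = [benchmark[s] for s in strata]
    scores = []
    for _ in range(n_simulations):
        # Draw sample_size respondents with stratum probabilities equal to the benchmark.
        draws = Counter(rng.choices(strata, weights=probs, k=sample_size))
        shares = {s: c / sample_size for s, c in draws.items()}
        scores.append(gri(shares, benchmark))
    return mean(scores)


# Toy benchmark with 200 equally sized strata: small samples cannot cover them all.
toy = {f"stratum_{i}": 1 / 200 for i in range(200)}
for n in (100, 1000, 10000):
    print(f"Max GRI at N={n}: {simulate_max_gri(toy, n, n_simulations=50):.3f}")
```

With 100 respondents spread over 200 strata, at least half the strata stay empty, so the ceiling cannot exceed 0.5 under this definition. The library's own simulation is invoked as follows: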
```python
from gri import monte_carlo_max_scores

# benchmarks as loaded in the Functional API example
max_scores = monte_carlo_max_scores(
    benchmark_df=benchmarks["country_gender_age"],
    strata_cols=["country", "gender", "age_group"],
    sample_size=1000,
    n_simulations=1000
)
print(f"Max GRI at N=1000: {max_scores['max_gri']:.3f}")
```
Source Code
The full source code is available on GitHub. Issues and contributions are welcome.