RegiStream

Infrastructure for Register Data Research

The open application layer between register data and research. Modules for labeling, lookup, and synthetic data. In Stata, Python, and R. More on the way. Free to use.

  • Modular toolkit. autolabel, datamirror, more on the way.
  • 64,367 variables translated into English, across 6 agencies in 4 Nordic countries.
  • Nordic coverage. Sweden, Denmark, Norway, Iceland.

Where we are

Shipped, building, next.

An open look at what's live, what we're working on right now, and what's coming. Updated as milestones land.

Shipped (4)

  • Catalog · 4 Nordic countries, 6 agencies · Live
  • autolabel · Stata, Python, R · v3.0.0
  • datamirror · Stata · v1.0.0
  • Open bundle format · schema v2 · Published

Building now (1)

  • datamirror comes to Python and R · Coming

Next (3)

  • More domains · EU and beyond
  • Multilingual at scale
  • MCP server · catalog via LLMs

autolabel

Replace hours of manual labeling with one command. Variable names and value labels come straight from any published catalog, in English or the source language.
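To make the idea of catalog-driven labeling concrete, here is a minimal sketch in plain Python. It is not autolabel's API: the catalog layout, the variable names (`sex_code`, `muni_code`), and both helper functions are hypothetical and exist only to show what attaching variable labels and value labels from a catalog could look like.

```python
# Hypothetical catalog: the structure and entries are illustrative assumptions,
# not RegiStream's published bundle format.
CATALOG = {
    "sex_code": {"label": "Sex", "values": {1: "Male", 2: "Female"}},
    "muni_code": {"label": "Municipality of residence", "values": {}},
}


def variable_labels(columns, catalog):
    """Map each column found in the catalog to its variable label.

    Lookup is case-insensitive; columns missing from the catalog are skipped.
    """
    return {
        col: catalog[col.lower()]["label"]
        for col in columns
        if col.lower() in catalog
    }


def decode_values(column, values, catalog):
    """Replace coded values with their value labels where one is defined.

    Codes without a label are passed through unchanged.
    """
    mapping = catalog.get(column.lower(), {}).get("values", {})
    return [mapping.get(v, v) for v in values]


labels = variable_labels(["SEX_CODE", "income"], CATALOG)
decoded = decode_values("sex_code", [1, 2, 9], CATALOG)
```

In this sketch, unknown columns and unknown codes are left alone rather than raising an error, which is one reasonable choice when a dataset is only partly covered by a catalog.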

datamirror

Develop outside the secure environment. Deploy inside. Ship replication packages reviewers can actually rerun, even when the underlying data can never leave. Synthetic data that preserves regression coefficients, not just distributions, across OLS, fixed effects, IV, logit, probit, Poisson, and negative binomial.

  • Stata · v1.0.0
  • Python · Coming
  • R · Coming
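The idea of synthetic data that preserves regression coefficients can be illustrated with a small, self-contained sketch. This is not datamirror's algorithm, which this page does not describe. It is one simple technique for a single-regressor OLS model: draw fresh regressors and noise, make the noise exactly orthogonal to the regressor and the constant, and build the outcome from the target coefficients. The function names and the toy "confidential" data are assumptions made for illustration.

```python
import random


def ols(x, y):
    """Return (slope, intercept) of a one-regressor OLS fit."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def synthesize(slope, intercept, n, seed=0):
    """Draw synthetic (x, y) whose OLS fit reproduces slope and intercept."""
    rng = random.Random(seed)
    x = [rng.gauss(0, 1) for _ in range(n)]
    noise = [rng.gauss(0, 1) for _ in range(n)]
    # Residualize the noise on [1, x] so it has zero mean and zero
    # covariance with x; the OLS fit then cannot move away from the targets.
    b, a = ols(x, noise)
    noise = [e - a - b * xi for xi, e in zip(x, noise)]
    y = [intercept + slope * xi + e for xi, e in zip(x, noise)]
    return x, y


# Toy stand-in for confidential data (made-up numbers).
real_x = [1.0, 2.0, 3.0, 4.0, 5.0]
real_y = [2.1, 3.9, 6.2, 7.8, 10.1]
real_slope, real_intercept = ols(real_x, real_y)

# Only the fitted coefficients are used to build the synthetic sample.
syn_x, syn_y = synthesize(real_slope, real_intercept, n=200)
syn_slope, syn_intercept = ols(syn_x, syn_y)
```

Refitting on the synthetic sample returns the original slope and intercept up to floating-point error, while no original row is reused. Extending this to fixed effects, IV, or nonlinear models such as logit and Poisson requires different constructions that this sketch does not attempt.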

Deployed on

Statistics Sweden MONA

Built and maintained by

The team

Stockholm-based PhD economists working with administrative microdata daily. RegiStream is the infrastructure we wished existed.

Jeffrey Clark

PhD Student, Economics

Stockholm University

Jie Wen

PhD Student, Business Administration

Stockholm School of Economics