Frequently asked questions
Short answers to common questions about installing, using, and citing RegiStream. If your question isn't here, open an issue on GitHub.
About RegiStream
What is RegiStream?
RegiStream is open infrastructure for register-data research. It has two parts: a toolkit of open-source packages (autolabel, datamirror) that researchers install into their statistical software, and a catalog of register metadata translated to English and distributed as versioned bundles. The catalog starts with six Nordic agencies and is built to grow.
Who maintains it?
RegiStream is maintained by Jeffrey Clark (Stockholm University) and Jie Wen (Stockholm School of Economics), both PhD economists working with administrative microdata daily. Development happens in public at github.com/registream/registream.
Is it free?
Yes. Free to install, free to use, free to cite. The code is BSD 3-Clause licensed. Catalog metadata is redistributed under the license terms of each source agency; see each domain page in the catalog for attribution.
Data coverage
Which agencies are in the catalog?
Six Nordic register agencies: Statistics Sweden (SCB), Statistics Denmark (DST), Statistics Norway (SSB), Försäkringskassan (FK), Socialstyrelsen (SoS), and Statistics Iceland (Hagstofa). See the catalog for live variable and register counts plus per-agency attribution.
What languages are available?
Every bundle ships in the agency's native language and in English. Iceland publishes English directly, so no translation layer is needed there. Language coverage per agency is listed on each domain page.
What about other countries?
Register data exists worldwide; the long-term goal is global coverage. The catalog starts with the Nordics because that's where the metadata was most accessible for a first release. The bundle format is agency-agnostic and can support any agency whose metadata is publicly documented. Additions will be announced on GitHub when they ship.
Installing
How do I install autolabel?
Platform-specific install commands live in the autolabel docs. Stata installs via net install from registream.org (an SSC release is on the way). Python installs via pip install registream. R installs from registream.org/r/, which is served as a CRAN-format repository (a CRAN release is on the way).
How do I install datamirror?
datamirror v1 is Stata-only today. Python and R ports are in progress. See the datamirror docs for the current install path.
Can I use RegiStream offline or inside a secure environment?
Yes. autolabel downloads bundles once and caches them locally; after that, there are no runtime calls to the registream.org server. For secure environments with no internet, see the secure-environment install. For institutional-scale deployments, see institutional setup.
Versioning
What's the difference between schema version and data version?
The schema version (e.g., 2.0) describes the bundle format: which columns exist, what values they hold, how registers and variables relate. The data version (e.g., v20260309) is a dated snapshot of metadata from one agency. Schema changes are breaking; data updates are not.
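As a small illustration of the data-version tag, the sketch below turns a tag such as v20260309 into a calendar date. It assumes the tag is always the letter v followed by YYYYMMDD, which is inferred from the single example above; the function name parse_data_version is hypothetical and not part of any RegiStream package.

```python
from datetime import date, datetime


def parse_data_version(tag: str) -> date:
    """Convert a data-version tag to a date.

    Assumption: tags look like 'v' + YYYYMMDD (e.g. 'v20260309').
    This helper is illustrative and not part of autolabel.
    """
    if not tag.startswith("v"):
        raise ValueError(f"expected a tag starting with 'v', got {tag!r}")
    # strptime raises ValueError if the remainder is not a valid date.
    return datetime.strptime(tag[1:], "%Y%m%d").date()


snapshot = parse_data_version("v20260309")
print(snapshot.isoformat())  # prints 2026-03-09
```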
Which data version should I use?
For reproducible research, pin an explicit date-versioned bundle in your code. For interactive use, the latest bundle is fine. autolabel defaults to latest unless you specify otherwise. registream cite lists every installed catalog version so a replicator can see which snapshots the session used.
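The pin-or-latest rule can be sketched in a few lines. This is not autolabel's API: resolve_version is a hypothetical helper, and the list of installed tags is made up for illustration. It relies on the assumption that tags are fixed-width (v plus YYYYMMDD), so plain string comparison orders them chronologically.

```python
from typing import Optional, Sequence


def resolve_version(installed: Sequence[str], pinned: Optional[str] = None) -> str:
    """Pick a data version: the pinned tag if given, otherwise the newest.

    Hypothetical helper; assumes fixed-width 'vYYYYMMDD' tags, which
    sort chronologically when compared as strings.
    """
    if not installed:
        raise LookupError("no data versions installed")
    if pinned is not None:
        if pinned not in installed:
            raise LookupError(f"pinned version {pinned!r} is not installed")
        return pinned
    return max(installed)


# Illustrative tags only; 'v20250101' is invented for this example.
installed = ["v20250101", "v20260309"]
print(resolve_version(installed))                      # latest
print(resolve_version(installed, pinned="v20250101"))  # reproducible
```

Failing loudly when a pinned snapshot is missing, rather than falling back to the latest, keeps a replication from silently running against different metadata.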
Custom data
Can I add labels for data that isn't in the catalog?
Yes. autolabel reads Schema v2 bundles from anywhere on disk. Build your own bundle with CSV files following the schema and point autolabel at the folder. See the Schema v2 reference for the file layout.
Can I share custom metadata with collaborators?
Yes. Bundles are plain directories of CSVs. Zip one and send it; your collaborators unzip it locally. Nothing routes through registream.org.
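Because a bundle is just a directory, packing and unpacking one needs nothing beyond the Python standard library. The sketch below assumes a bundle folder somewhere on disk; pack_bundle and unpack_bundle are hypothetical helper names, and no RegiStream code is involved.

```python
import shutil
from pathlib import Path


def pack_bundle(bundle_dir, out_dir) -> Path:
    """Zip a bundle directory into out_dir and return the archive path."""
    bundle_dir = Path(bundle_dir).resolve()
    out_dir = Path(out_dir).resolve()
    out_dir.mkdir(parents=True, exist_ok=True)
    # root_dir/base_dir keep the bundle's folder name as the top-level
    # entry inside the zip, so it unpacks into a single directory.
    archive = shutil.make_archive(
        str(out_dir / bundle_dir.name),
        "zip",
        root_dir=str(bundle_dir.parent),
        base_dir=bundle_dir.name,
    )
    return Path(archive)


def unpack_bundle(archive, dest) -> Path:
    """Unzip an archive made by pack_bundle and return the bundle folder."""
    archive = Path(archive)
    dest = Path(dest)
    shutil.unpack_archive(str(archive), str(dest))
    return dest / archive.stem
```

The sender calls pack_bundle on the bundle folder and shares the resulting .zip; the recipient calls unpack_bundle and points autolabel at the returned directory.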
Citing and licensing
How do I cite RegiStream in a paper?
See how to cite RegiStream for APA and BibTeX forms. Each client's cite command (autolabel cite in Stata, cite() in Python / R) prints a version-pinned citation block.
What's the license?
The code is BSD 3-Clause: do anything you want with the code, keep the copyright notice, and don't use the RegiStream name to endorse derivative products without permission. Catalog metadata is redistributed per each source agency's license terms; check each domain page in the catalog before republishing or redistributing.
Contact
How do I report a bug or request a feature?
Open an issue on GitHub. For bugs, include your platform (Stata / Python / R and version), autolabel version, the command you ran, and a minimal reproduction. For feature requests or new-agency suggestions, describe the use case.
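For Python users, the environment details requested above can be gathered with the standard library. This is a minimal sketch, assuming the installed distribution is named registream (as in the pip command in the install section); environment_report is a hypothetical helper, and Stata and R users would collect the equivalent details with their own tools.

```python
import platform
import sys
from importlib import metadata


def environment_report(package: str = "registream") -> str:
    """Summarize OS, Python version, and a package version for a bug report.

    Assumption: the pip distribution name is 'registream'.
    """
    try:
        pkg_version = metadata.version(package)
    except metadata.PackageNotFoundError:
        pkg_version = "not installed"
    lines = [
        f"OS: {platform.platform()}",
        f"Python: {sys.version.split()[0]}",
        f"{package}: {pkg_version}",
    ]
    return "\n".join(lines)


print(environment_report())
```

Paste the printed lines into the issue along with the command you ran and the minimal reproduction.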