Install in secure research environments
MONA, DST Forskermaskinen, SSB Dapla, Hagstofa secure room, and similar. These environments have no internet, so autolabel can't auto-download metadata. Here's the manual workflow.
What this is for
Register-data research often happens inside a secure environment provisioned by a national statistical agency. These environments are designed so that microdata never leaves: no internet, constrained file transfer, audit logs on everything moved in or out.
On a normal dev machine, autolabel update datasets pulls metadata bundles from registream.org automatically. Inside a secure environment that command fails (no network), so the metadata needs to be moved in manually by an authorized person.
Overview of the flow
- Dev machine — install RegiStream, download the bundle(s) you need for the domains you'll work with.
- Transfer — move the downloaded files into the secure environment through the agency's standard file-in channel.
- Secure env — install RegiStream inside the env (from a locally cached package), place the bundle files at the expected cache path, run a one-time offline setup.
- Use autolabel — behaves identically to online; no network calls are made on subsequent runs.
1. On your dev machine
Install
* Meta-package: installs core + autolabel + datamirror in one go
net install registream, from("https://registream.org/install/stata/latest") replace
Download the bundle(s) you need
Visit the catalog for each domain you need and download the bundle ZIP — named {domain}_{lang}_v{version}.zip — for each language. That single file is what you'll transfer; you don't need to unzip it on the dev machine.
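If you download several bundles, the naming pattern can be checked mechanically before you queue files for transfer. The following is a minimal Python sketch, assuming domain and language codes are lowercase and the version is a run of digits (as in scb_eng_v20260309.zip). The helper name `parse_bundle_name` is hypothetical and not part of RegiStream.

```python
import re
from typing import NamedTuple, Optional

# Assumed shape of {domain}_{lang}_v{version}.zip; the exact character
# rules for each part are a guess based on the scb/eng example.
_BUNDLE_RE = re.compile(
    r"^(?P<domain>[a-z0-9]+)_(?P<lang>[a-z]+)_v(?P<version>\d+)\.zip$"
)


class BundleName(NamedTuple):
    domain: str
    lang: str
    version: str


def parse_bundle_name(filename: str) -> Optional[BundleName]:
    """Split a bundle ZIP filename into its parts, or return None."""
    match = _BUNDLE_RE.match(filename)
    if match is None:
        return None
    return BundleName(match["domain"], match["lang"], match["version"])
```

Calling `parse_bundle_name("scb_eng_v20260309.zip")` yields the domain, language, and version as separate fields, which is handy for building a transfer manifest for the data steward.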
Alternatively, if your dev machine has internet, run autolabel scope, domain(scb) lang(eng) to download and process the bundle into ~/.registream/autolabel/scb/. That cache is also a valid thing to transfer (see step 3), but the ZIP is simpler.
Save the RegiStream installers too
If the secure environment has no Stata package manager (or has a restricted one), you'll need to move the package files themselves. Two ways to do this; pick whichever fits your transfer channel.
Option A — download per-package zips (simplest)
Download the package(s) you need, transfer the zip(s) into the secure env, and unzip in step 3.
- registream (core): required for autolabel and datamirror
- autolabel: variable + value labeling
- datamirror: checkpoint-constrained synthetic data
Option B — copy from your own ado/plus
If you've already run net install registream on the dev machine, the files are already on disk. Save the installers from ~/ado/plus/ — typically:
~/ado/plus/r/registream.ado
~/ado/plus/r/registream.sthlp
~/ado/plus/a/autolabel.ado
~/ado/plus/a/autolabel.sthlp
~/ado/plus/_/_rs_*.ado
~/ado/plus/_/_al_*.ado
Copy the whole tree or zip the relevant files. You'll restore them in step 3.
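For Option B, collecting the files can be scripted. This is a rough Python sketch, assuming the file patterns listed above are the complete set on your machine (they are described as typical, so check your own ado/plus tree first). The function name `zip_installers` is illustrative only.

```python
import zipfile
from pathlib import Path
from typing import List

# Patterns copied from the listing above; adjust if your install differs.
PATTERNS = [
    "r/registream.ado",
    "r/registream.sthlp",
    "a/autolabel.ado",
    "a/autolabel.sthlp",
    "_/_rs_*.ado",
    "_/_al_*.ado",
]


def zip_installers(plus_dir: Path, out_zip: Path) -> List[str]:
    """Zip the matching files, keeping paths relative to plus_dir."""
    added: List[str] = []
    with zipfile.ZipFile(out_zip, "w", zipfile.ZIP_DEFLATED) as zf:
        for pattern in PATTERNS:
            for path in sorted(plus_dir.glob(pattern)):
                if path.is_file():
                    rel = path.relative_to(plus_dir).as_posix()
                    zf.write(path, rel)
                    added.append(rel)
    return added
```

A call such as `zip_installers(Path.home() / "ado" / "plus", Path("registream_installers.zip"))` returns the relative paths it stored, so you can confirm nothing expected is missing. Because the archive keeps the r/, a/, and _/ subfolders, unzipping it over ~/ado/plus/ in step 3 restores the same layout.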
2. Transfer into the secure environment
Use whatever file-in channel the agency provides. This is typically an upload-box web form, an SFTP drop-box for the project, or email-to-airlock for small files. Some environments require a data steward to approve the transfer; others are self-service for reference data. Check the agency's docs.
You're moving two things:
- The bundle: the ZIP from the catalog, or the directory tree from ~/.registream/autolabel/<domain>/.
- The RegiStream installer files: the per-package zips, or the files from ~/ado/plus/.
Per-file size limits and chunking
Most agency upload boxes cap individual files. SCB MONA's drag-and-drop UI, for example, has historically used a per-file ceiling around 10 MB. To accommodate this, autolabel bundles ship pre-chunked: instead of one large variables.csv, you get variables/0000.csv, variables/0001.csv, … each well under 10 MB. Same pattern for scope/, value_labels/, and release_sets/.
The chunked layout mirrors what autolabel expects on disk; you don't reassemble anything. Drag each chunk file into the upload box one by one (or zip-volume the whole tree before upload if the agency allows zips and your zip-volume sizes also fit the cap).
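Before uploading, it can help to confirm that every file in the tree fits under the agency's cap. Here is a small Python sketch, assuming a 10 MB cap (the figure quoted for MONA is approximate, so confirm the real limit with your agency). `oversized_files` is a hypothetical helper name.

```python
from pathlib import Path
from typing import List, Tuple

# Assumed cap; the 10 MB figure above is historical and approximate.
CAP_BYTES = 10_000_000


def oversized_files(root: Path, cap_bytes: int = CAP_BYTES) -> List[Tuple[str, int]]:
    """Return (relative path, size in bytes) for each file above the cap."""
    hits: List[Tuple[str, int]] = []
    for path in sorted(root.rglob("*")):
        if path.is_file():
            size = path.stat().st_size
            if size > cap_bytes:
                hits.append((path.relative_to(root).as_posix(), size))
    return hits
```

An empty result means every file can go through the upload box individually. Anything listed needs a different channel or a conversation with the data steward.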
3. Install inside the secure env
Place the bundle at the expected cache directory
Each language has a default cache directory that autolabel reads from. If your home directory isn't writable in the env, override it via the REGISTREAM_DIR environment variable.
| Language | Default cache dir | Override |
|---|---|---|
| Stata | ~/.registream/ (mac/linux); ~/AppData/Local/registream/ (Windows) | REGISTREAM_DIR env var |
| Python | ~/.registream/ (mac/linux); ~/AppData/Local/registream/ (Windows) | REGISTREAM_DIR env var |
| R | tools::R_user_dir("registream", "cache"), CRAN-compliant per OS (e.g. ~/Library/Application Support/org.R-project.R/R/registream/cache/ on macOS) | REGISTREAM_DIR env var, or cache_dir field in config_r.toml |
Setting REGISTREAM_DIR makes all three clients share one directory — useful if you want a single offline cache to feed Stata, Python, and R at once.
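To see which directory you should be staging files into, the lookup order in the table can be written out as follows. This Python sketch mirrors the Stata and Python rows only. It is an illustration of the documented behaviour, not the clients' actual resolution code, and `resolve_cache_dir` is a made-up name.

```python
import os
import sys
from pathlib import Path
from typing import Mapping, Optional


def resolve_cache_dir(
    env: Optional[Mapping[str, str]] = None,
    platform: str = sys.platform,
    home: Optional[Path] = None,
) -> Path:
    """Return the cache directory implied by the table above."""
    env = os.environ if env is None else env
    home = Path.home() if home is None else home
    override = env.get("REGISTREAM_DIR")
    if override:
        # The override wins on every platform.
        return Path(override)
    if platform.startswith("win"):
        return home / "AppData" / "Local" / "registream"
    return home / ".registream"
```

Running `resolve_cache_dir()` inside the secure environment, then comparing the result with what `registream info` reports, is a quick way to catch a REGISTREAM_DIR that is set in one shell but not in the session that launches Stata.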
Three ways to stage the bundle
autolabel accepts the bundle in any of three forms; pick whichever your file-in channel makes easiest. On first call, autolabel detects which form is present and does the rest.
# Option 1 (simplest) — drop the ZIP as-is.
$REGISTREAM_DIR/autolabel/scb_eng_v20260309.zip
# Option 2 — drop the unzipped bundle (the ZIP's internal layout).
$REGISTREAM_DIR/autolabel/scb_eng/
manifest/ scope/ variables/ value_labels/ release_sets/
# Option 3 — drop the post-processed flat cache (CSV+DTA pairs)
# from another machine, e.g. one that has run `autolabel scope` already.
$REGISTREAM_DIR/autolabel/scb/
manifest_eng.csv scope_eng.csv release_sets_eng.csv
variables_eng.csv variables_eng.dta
value_labels_eng.csv value_labels_eng.dta
scope_eng.dta release_sets_eng.dta
On first call, options 1 and 2 are processed automatically: autolabel unzips the file (option 1) or reads the constituent folder (option 2), produces the flat cache layout (option 3), and runs the labeling. There is no separate ingest step. Option 3 is the form a previous autolabel run leaves behind, so transferring it from a colleague's machine works the same way.
Option 1 is the recommended default for MONA / Forskermaskinen / Dapla because it's a single file and matches the catalog download exactly. The agency's file-in channel typically caps individual files; if the ZIP exceeds the cap, choose option 2 (the bundle's internal sharding keeps each CSV well under typical 10 MB limits).
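After the transfer, you may want to confirm which of the three forms actually landed in the cache directory. The sketch below is a Python check built from the layouts listed above. Which marker files autolabel itself inspects is not documented here, so the choices (a manifest/ folder for option 2, variables_{lang}.csv for option 3) are assumptions, and `staged_forms` is a hypothetical helper.

```python
from pathlib import Path
from typing import List


def staged_forms(autolabel_dir: Path, domain: str, lang: str) -> List[str]:
    """List which staging forms are present for one domain and language."""
    found: List[str] = []
    # Option 1: the catalog ZIP, e.g. scb_eng_v20260309.zip
    if any(autolabel_dir.glob(f"{domain}_{lang}_v*.zip")):
        found.append("zip")
    # Option 2: the unzipped bundle folder (assumed marker: manifest/)
    if (autolabel_dir / f"{domain}_{lang}" / "manifest").is_dir():
        found.append("unzipped")
    # Option 3: the flat cache (assumed marker: variables_{lang}.csv)
    if (autolabel_dir / domain / f"variables_{lang}.csv").is_file():
        found.append("flat-cache")
    return found
```

An empty list for `staged_forms(cache / "autolabel", "scb", "eng")` usually means the files went to a different directory than the one autolabel reads from.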
Place the Stata installer files
Stata-side, the installer ado files go on Stata's adopath:
~/ado/plus/
r/registream.ado
a/autolabel.ado
_/_rs_*.ado
_/_al_*.ado
If ~/ado/plus/ isn't on the adopath inside the env, add its location manually with adopath + before first use.
Run the offline first-run
* First call — registream sees no internet, prompts you for offline mode
use my_register_data.dta, clear
autolabel variables, domain(scb) lang(eng)
The first-run wizard detects the lack of connectivity and defaults to Offline Mode: no auto-updates, no usage telemetry, no network calls. Accept the default.
You can also pre-configure before first use, which avoids the wizard entirely. Drop a minimal offline config at ~/.registream/config_stata.csv:
key;value
usage_logging;true
telemetry_enabled;false
internet_access;false
auto_update_check;false
See Institutional setup → Shared configuration for the full template (with the equivalent TOML for Python and R).
Verify your install
Three quick checks confirm autolabel is using the right cache, the right config, and (if you turned it on) is logging usage as intended.
* Where does this autolabel install think its cache lives?
registream info
* What does the active config look like?
registream config
registream info prints the resolved cache directory, the install version, and the schema version of any cached bundles. registream config prints the current key/value table and the path of the file backing it.
Where to find the files autolabel writes:
- $REGISTREAM_DIR/config_stata.csv: the active Stata config (key/value, semicolon-delimited). Equivalent files: config_python.toml, config_r.toml.
- $REGISTREAM_DIR/usage_stata.csv: the local usage log, written only when usage_logging is true. One row per autolabel call; columns include timestamp, command, domain, lang. Equivalent for the other languages.
- $REGISTREAM_DIR/autolabel/datasets.csv: the cache index. Updated only when autolabel runs the API download path; pre-staged bundles (options 2 and 3 above) leave this file alone, since the client doesn't know which release line a manually-transferred file belongs to.
Confirm usage logging is on: after running one labeling command, usage_stata.csv should have grown by one row. If it doesn't appear, set usage_logging;true in config_stata.csv and re-run.
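The "grew by one row" check can be done by counting rows before and after a labeling call. This is a minimal Python sketch, assuming the log has a single header line and one line per call. The exact delimiter and column layout are not specified here, so the code counts non-empty lines rather than parsing fields. `usage_row_count` is a hypothetical name.

```python
from pathlib import Path


def usage_row_count(log_path: Path) -> int:
    """Count data rows in the usage log; 0 if the file does not exist."""
    if not log_path.is_file():
        return 0
    with log_path.open(encoding="utf-8") as fh:
        lines = [line for line in fh if line.strip()]
    # Assumption: the first non-empty line is a header.
    return max(len(lines) - 1, 0)
```

Record the count, run one autolabel command in Stata, and count again. A difference of one indicates logging is active.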
Configure the config file directly
The first-run wizard sets reasonable defaults; the table below is for cases where you'd rather pre-configure (or audit what the wizard wrote). All three languages use the same key set; only the file format differs.
| Key | Type | Default | What it does |
|---|---|---|---|
| usage_logging | boolean | true | Append one row to usage_{lang}.csv per autolabel call. Local file only; no network. |
| telemetry_enabled | boolean | false | Send anonymous usage pings to registream.org. Always set to false in secure environments. |
| internet_access | boolean | true | Whether autolabel may make network calls (downloads, version checks). Set to false in secure environments to harden against accidental network use. |
| auto_update_check | boolean | true | Daily check whether a newer bundle version exists for cached domains. Honours internet_access. |
To change a key after install, either edit the config file directly or run registream config, <key>(<value>) from Stata (with equivalents in Python and R). The file is the source of truth; the command is a thin writer.
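When auditing what the wizard wrote, the two keys that matter most in a secure environment can be checked in a few lines. This Python sketch assumes the semicolon-delimited key;value layout shown for config_stata.csv and fills in missing keys from the defaults in the table. The function names `read_config` and `audit_secure_config` are illustrative only.

```python
import csv
from pathlib import Path
from typing import Dict, List

# Defaults as documented in the table above.
DEFAULTS = {
    "usage_logging": "true",
    "telemetry_enabled": "false",
    "internet_access": "true",
    "auto_update_check": "true",
}
# Values the table recommends for secure environments.
REQUIRED = {"telemetry_enabled": "false", "internet_access": "false"}


def read_config(path: Path) -> Dict[str, str]:
    """Read a key;value config file into a dict."""
    with path.open(newline="", encoding="utf-8") as fh:
        reader = csv.DictReader(fh, delimiter=";")
        return {
            row["key"].strip(): (row.get("value") or "").strip()
            for row in reader
            if row.get("key")
        }


def audit_secure_config(config: Dict[str, str]) -> List[str]:
    """Return a list of problems; an empty list means the config passes."""
    effective = {**DEFAULTS, **{k: v.lower() for k, v in config.items()}}
    problems: List[str] = []
    for key, want in REQUIRED.items():
        if effective[key] != want:
            problems.append(f"{key} is {effective[key]}, expected {want}")
    return problems
```

Note that an empty or missing config does not pass this audit, because internet_access defaults to true. That is the main reason to pre-configure the file rather than rely on defaults.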
4. Use autolabel
Once installed offline with bundle files in place, autolabel behaves identically to online use. No network calls happen. The tool reads bundles directly from the cache directory: ~/.registream/autolabel/ by default, or $REGISTREAM_DIR/autolabel/ if you set the override.
use lisa_2020.dta, clear
autolabel variables, domain(scb) lang(eng)
autolabel values, domain(scb) lang(eng)
* Pin a scope (recommended for mixed panels)
autolabel variables, domain(scb) lang(eng) scope("LISA" "Individer 16 år och äldre")
* Inspect
autolabel lookup kon kommun, domain(scb) lang(eng) detail
autolabel scope, domain(scb) lang(eng)
See autolabel Stata reference for the full command set.
Updating later
When a new catalog version is released (you'll typically hear about it via a mailing list or a paper release note), repeat the bundle part of steps 1–3:
- On your dev machine, run autolabel update datasets, domain(scb) lang(eng) to fetch the new bundle.
- Transfer the refreshed directory tree into the secure env again.
- Replace the contents of ~/.registream/autolabel/<domain>/ with the new tree.
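The replacement step can be done by hand, but a scripted swap avoids leaving a half-copied cache if the copy is interrupted. The following Python sketch is one possible approach, assuming you have write access to the cache directory. The ".incoming" staging folder and the name `replace_domain_cache` are inventions for this example, not RegiStream conventions.

```python
import shutil
from pathlib import Path


def replace_domain_cache(new_tree: Path, cache_domain_dir: Path) -> None:
    """Replace cache_domain_dir with a full copy of new_tree."""
    staging = cache_domain_dir.with_name(cache_domain_dir.name + ".incoming")
    if staging.exists():
        shutil.rmtree(staging)
    # Copy everything first, so the old cache stays intact if this fails.
    shutil.copytree(new_tree, staging)
    if cache_domain_dir.exists():
        shutil.rmtree(cache_domain_dir)
    staging.rename(cache_domain_dir)
```

The old tree is deleted only after the new one has been copied in full, so a failed transfer leaves the previous labels usable.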
The schema version is pinned in the bundle itself (schema_version = 2.0 in the manifest), so mismatched tool + bundle versions are caught at load time with a clear error rather than producing wrong labels.
Per-environment notes
SCB MONA (Sweden)
- Files move in through the MONA file-in interface; check with your project's data steward for size limits.
- autolabel has been deployed on MONA by several research groups; the workflow above has been validated there.
DST Forskermaskinen (Denmark)
- File-in channel is via the DST secure-transfer portal per project agreement.
- The dst domain is the relevant catalog. hagstofa (Iceland) is not currently relevant inside Forskermaskinen unless you're running a joint project.
SSB Dapla (Norway)
- Dapla is newer and has some tooling surface that may evolve how third-party tools are installed. Check with SSB for current guidance before first install.
- ssb is the relevant catalog.
Other environments
The pattern is the same everywhere: move the installer files + bundle tree in, place at the expected paths, run offline setup. If your environment isn't listed here and you hit a wall, email jeffrey@registream.org — we'll add the walkthrough.
See also
- Install overview — platforms + first-run basics
- Institutional setup — shared config + private domains for team deployments
- autolabel Stata reference
- Catalog — all domains with bundle downloads