autolabel · Stata
Stata reference
Complete command reference for autolabel in Stata — syntax, options, labeling rules, deferred execution, examples, and institutional metadata.
Install
Requires Stata 16.0 or later.
net install registream, from("https://registream.org/install/stata/latest") replace
net install registream installs the meta-package: core plus every module (autolabel and datamirror). If you only want autolabel, run net install autolabel instead; it pulls in core as a dependency automatically. Full install notes and the first-run wizard: install guide.
Quick start
use lisa_2020.dta, clear
* Apply variable and value labels from SCB metadata (English)
autolabel variables, domain(scb) lang(eng)
autolabel values, domain(scb) lang(eng)
That's it: variable and value labels are now applied based on the best-matching scope in the SCB metadata. See Labeling rules for what happens under the hood when no scope is pinned.
Syntax
Labeling commands
autolabel variables [varlist], domain(string) lang(string)
[scope(string) release(string)
exclude(varlist) suffix(string) dryrun savedo(filename)]
autolabel values [varlist], domain(string) lang(string)
[scope(string) release(string)
exclude(varlist) suffix(string) dryrun savedo(filename)]
Inspection commands
autolabel lookup varlist, domain(string) lang(string) [scope(string) detail]
autolabel scope [search], [domain(string) lang(string) scope(string) list]
autolabel suggest, [domain(string) lang(string) scope(string) list]
Maintenance commands
autolabel update [package | datasets] [, domain(string) lang(string) version(string)]
autolabel info
autolabel version
autolabel cite
Description
autolabel applies variable and value labels from structured metadata to datasets. It downloads and caches metadata from registream.org, then matches variables in your dataset against the catalog to apply human-readable labels.
RegiStream hosts metadata for government statistical agencies including Statistics Sweden (scb), Statistics Denmark (dst), Statistics Norway (ssb), Försäkringskassan (fk), Socialstyrelsen (sos), and Statistics Iceland (hagstofa). Any institution can also create metadata files for their own data sources — see Institutional metadata.
Autolabel schema v2 preserves register-level context: each variable can appear in multiple scopes (registers, sub-populations) with different labels, value definitions, and releases. When no scope() is specified, autolabel automatically infers the best-matching scope by analyzing the variables in your dataset.
Scope levels are defined per domain. For SCB the two scope levels are Register (e.g. LISA) and Variant (e.g. Individer 16 år och äldre). For SSB the levels are Source and Group. Scope levels are declared in each domain's manifest file and can vary across providers — the schema supports any depth.
The first time you run autolabel, RegiStream asks you to choose a setup mode (Offline, Standard, or Full). This determines whether metadata downloads automatically and whether usage data is collected. Change it later with registream config.
Options — Required
domain(string)
The metadata domain. Each domain represents an institution or data provider (e.g. scb for Statistics Sweden, dst for Statistics Denmark). Institutions can create custom domains for their own data; see Institutional metadata.
lang(string)
The language for labels. Available languages depend on the domain. For scb: eng (English) or swe (Swedish).
Options — Filtering
scope(string)
Filter metadata to a specific scope within the domain. Pass one or more quoted strings — one per scope level. For SCB (2-level: Register + Variant):
- scope("LISA"): matches all sub-scopes under LISA
- scope("LISA" "Individer 16 år och äldre"): matches that specific scope
Each quoted token is matched against the corresponding scope-level column. Matching priority per level: (1) exact alias match (case-insensitive), then (2) name substring match (case-insensitive). scope("LISA") matches the alias "LISA"; scope("Integration") matches the full name "Longitudinell integrationsdatabas..." via substring.
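The two-stage priority can be illustrated with a small self-contained Stata sketch. It is not autolabel's internal code: the two-row scope table, the BARN alias, and the variable names alias_hit, name_hit, and match_rank are invented for the illustration, and the LISA name is shortened as in the text above.

```stata
* Hypothetical two-row scope table; aliases and names are illustrative only
clear
input str8 alias str40 name
"LISA" "Longitudinell integrationsdatabas"
"BARN" "Barnregistret"
end

* The token a user would pass inside scope("...")
local token "Integration"

* Priority 1: exact alias match, case-insensitive
generate byte alias_hit = lower(alias) == lower("`token'")

* Priority 2: substring match on the full name, case-insensitive
generate byte name_hit = strpos(lower(name), lower("`token'")) > 0

* 1 = alias match, 2 = name substring match, missing = no match
generate byte match_rank = cond(alias_hit, 1, cond(name_hit, 2, .))
list alias name match_rank, noobs
```

With the token "Integration", the first row matches through the name substring rule (rank 2) and the second row does not match. Changing the token to "lisa" would give the first row rank 1 through the alias rule.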
When omitted, autolabel automatically infers the best-matching scope by analyzing which scope's variable list best overlaps your dataset's variables. The detected scope and match percentage are displayed.
release(string)
Filter to a specific release. For example, release(2005) keeps metadata rows whose release set includes the "2005" release. Useful when value label sets change over time (e.g. municipality codes, education classifications).
Options — Output
exclude(varlist)
Variables to exclude from labeling.
suffix(string)
Append a suffix to create new labeled variables instead of modifying existing ones. When using autolabel values on string categorical variables, the original string codes are permanently replaced with numeric codes. Use suffix() to preserve the original variable. See Important limitations.
detail
For autolabel lookup only. Show every scope level and release entry for each variable instead of the default summary view. Without detail, lookup shows one block per variable with the most common label and a scope count. With detail, every individual entry is displayed.
list
For autolabel scope and autolabel suggest. Show all scopes instead of the default limit of 10.
Options — Deferred execution
dryrun
Display the generated labeling commands without executing them. The commands are shown exactly as they would be applied, so you can inspect what autolabel will do before it modifies your dataset. See Deferred execution.
savedo(filename)
Save the generated labeling commands to a do-file without executing them. The saved file is a complete, executable do-file that can be reviewed, edited, and run later with do. See Deferred execution.
Labeling rules
When you run autolabel variables or autolabel values, autolabel needs to decide which metadata row to use for each variable. This section describes the exact rule — both for automatic mode and for explicit-pin mode.
Automatic mode (no pin)
Automatic mode is what you get when you run autolabel variables, domain(scb) lang(eng) with no scope() / release() options. Two steps happen:
1. Primary scope inference. For each scope in the domain, autolabel counts how many variables from your dataset it contains. The scope with the highest count is reported as the primary:
Primary scope (inferred): LISA [Longitudinell...] — 160 of 175 variables (91%)
If the top scope covers fewer than 10% of your dataset's variables, autolabel reports "no strong primary scope" instead. This signals that the dataset is a mixed panel with no obvious dominant source.
2. Per-variable collapse with majority fallback. For every variable in your dataset that has at least one metadata row, autolabel picks the winning row using this priority:
- If the variable is in the inferred primary scope, use the row from that scope.
- Otherwise, use the row whose label appears most often across all the variable's candidate rows (majority rule over scopes). This is how variables that aren't in the primary scope still get sensible labels from wherever they do appear.
- Deterministic tiebreak on scope level columns so results are reproducible.
Every variable with any metadata row gets labeled — not just those in the primary scope. The primary-scope preference is a sort-key bias, not a filter. Variables not in the primary fall through to the majority-label fallback across the scopes they do appear in.
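The two steps can be sketched in plain Stata on a toy candidate table. This is a minimal illustration under stated assumptions, not the package's implementation: the scope names FDB, RAMS, and BARN, most of the labels, the dataset size of 5 variables, and all helper variable names are hypothetical.

```stata
* Hypothetical candidate rows: one row per (variable, scope) pair in the metadata.
* Scope names other than LISA and most labels are invented for the illustration.
clear
input str8 varname str8 scope str20 lbl
"kon"    "LISA" "Gender"
"kon"    "BARN" "Gender of child"
"kommun" "LISA" "Municipality"
"alder"  "LISA" "Age"
"cfarnr" "FDB"  "Workplace id"
"cfarnr" "RAMS" "Workplace id"
"cfarnr" "BARN" "Workplace number"
end

* Assumed number of variables in the dataset (one of them has no metadata row)
local n_dataset_vars 5

* Step 1: the scope holding the most dataset variables is the primary,
* provided it covers at least 10% of the dataset
bysort scope: generate n_in_scope = _N
egen max_n = max(n_in_scope)
generate byte in_primary = (n_in_scope == max_n) & (max_n / `n_dataset_vars' >= 0.10)

* Step 2: how often each label occurs among a variable's candidate rows
bysort varname lbl: generate label_freq = _N

* Negate the keys so that larger values sort first
generate neg_primary = -in_primary
generate neg_freq = -label_freq

* Winner per variable: primary-scope row first, then the most frequent label,
* then scope name as a deterministic tiebreak
bysort varname (neg_primary neg_freq scope): keep if _n == 1
list varname scope lbl, noobs
```

In this toy table kon keeps the LISA label "Gender" because LISA is the primary scope, while cfarnr, which has no LISA row, receives "Workplace id" because that label appears in two of its three candidate rows. The primary-scope flag acts only as a sort key, so no variable is dropped.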
The success message gives you an honest split:
✓ Variable labels applied to 165 of 175 variable(s).
160 from LISA (primary inferred scope, 91%)
5 from other scopes (majority label fallback)
10 skipped (no entry in scb metadata — existing labels preserved)
Explicit-pin mode
When you pass scope() (and optionally release()), autolabel skips inference entirely and filters the metadata to the pinned subset before the collapse. Only rows matching your pin are considered.
autolabel variables lopnr kon alder, domain(scb) lang(eng) scope("LISA")
Label-wipe guard: variables in your varlist that have no row in the pinned scope are skipped, so their pre-existing labels are not overwritten. This makes it safe to chain multiple explicit-pin calls for multi-scope panels.
Multi-scope panels
If your panel mixes variables from multiple scopes, you have two options:
Option A — automatic (simple): run autolabel variables, domain(scb) lang(eng) with no pin. Primary scope is inferred; variables not in the primary fall through to majority-label. One command, everything gets labeled, and the success message reports exactly how many labels came from each source.
Option B — explicit-pin per subset (reproducible): run autolabel suggest first to preview coverage per scope, then pin each subset with its own varlist:
autolabel variables lopnr kon alder kommun, ///
domain(scb) lang(eng) scope("LISA" "Individer 16 år och äldre")
autolabel variables cfarnr bransch anstallda, ///
domain(scb) lang(eng) scope("Företagsregister")
The label-wipe guard ensures earlier calls' labels survive later calls. Overlap only happens if you explicitly list the same variable in two different pinned calls — in which case the later call wins (last-write semantics, same as Stata's own label variable).
Preview coverage (autolabel suggest)
autolabel suggest analyzes your currently-loaded dataset against the domain metadata and reports which scopes would contribute labels under automatic mode, without actually applying anything. It's the recommended first step for mixed-panel labeling workflows.
autolabel suggest, domain(string) lang(string) [scope(string) list]
Top-level view
. use my_panel.dta, clear
. autolabel suggest, domain(scb) lang(eng)
Displays a compact coverage table of scopes that would contribute labels under automatic mode:
- Scope — the scope name (clickable, drills into the detail view)
- Count — how many of your dataset's variables would be labeled from this scope
- Share — the same count as a percentage of your dataset variables
A leading * marks the inferred primary scope (if one exists with ≥10% coverage). The header shows the domain, language, and total coverage (e.g. scb/eng | 165 of 174 covered).
By default the top 10 scopes are shown. Pass list to show all contributing scopes.
Scope-detail view
. autolabel suggest, domain(scb) lang(eng) scope("LISA")
Lists the variables in your dataset that would be labeled from the specified scope, along with the label each would receive. Each variable name is a clickable hyperlink that drills into autolabel lookup for that variable.
Also prints a copy-pasteable explicit-pin command for the subset:
. autolabel variables agstfa akassa akters ..., ///
domain(scb) lang(eng) scope("LISA" "Individer 16 år och äldre")
This is the foundation of the multi-scope panel workflow. Run suggest once to see per-scope breakdown, then either apply labels automatically or pin each subset explicitly by copy-pasting the commands for the scopes you want.
Under the hood. autolabel suggest runs the same collapse that autolabel variables uses (primary inference + majority-label fallback), then groups the per-variable scope attribution so you can see exactly what automatic mode would produce before you run it.
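How the Count and Share columns relate to the per-variable attribution can be sketched as follows. The attribution rows, the FDB scope name, the dataset size of 5 variables, and the names n_vars and share are assumptions made for the illustration, not output from autolabel suggest.

```stata
* Hypothetical result of the collapse: one winning scope per labeled variable
clear
input str8 varname str8 scope
"kon"    "LISA"
"kommun" "LISA"
"alder"  "LISA"
"cfarnr" "FDB"
end

* Assumed total number of variables in the dataset, labeled or not
local n_dataset_vars 5

* Count = labeled variables attributed to each scope
contract scope, freq(n_vars)

* Share = Count as a percentage of all dataset variables
generate share = 100 * n_vars / `n_dataset_vars'

* Largest contributor first, as in the coverage table
gsort -n_vars
list scope n_vars share, noobs
```

Because Share is computed against all dataset variables rather than only the labeled ones, the shares do not sum to 100% when some variables have no metadata entry.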
Deferred execution
By default, autolabel variables and autolabel values generate and immediately execute labeling commands. The dryrun and savedo() options separate inspection from application, giving you full control over what is applied to your dataset.
A typical deferred workflow:
1. Inspect what labels are available:
. autolabel lookup kon kommun, domain(scb) lang(eng)
2. Preview the commands that would be generated:
. autolabel variables, domain(scb) lang(eng) dryrun
3. Save to a do-file for review and editing:
. autolabel variables, domain(scb) lang(eng) savedo("my_labels.do")
. view "my_labels.do"
4. Apply when satisfied:
. do "my_labels.do"
* or directly:
. autolabel variables, domain(scb) lang(eng)
The dryrun and savedo() options work with both variables and values modes. The lookup command is always non-destructive and does not need these options.
Examples — Basic labeling
. autolabel variables, domain(scb) lang(eng)
Label all variables using SCB metadata in English.
. autolabel values, domain(scb) lang(swe)
Apply value labels to all variables using SCB metadata in Swedish.
. autolabel variables ku*ink yrkarbtyp, domain(scb) lang(eng) exclude(ku3ink)
Label specific variables (with wildcard), excluding ku3ink.
. autolabel values kon, domain(scb) lang(eng) suffix("_lbl")
Create a new labeled variable kon_lbl, preserving the original.
Examples — Scope-specific
. autolabel variables, domain(scb) lang(eng) scope("LISA") release("2005")
Apply labels specific to LISA for the 2005 release. For example, kon receives the label "Gender" (from LISA), not "Gender of child" (from Barnregistret).
. autolabel variables, domain(scb) lang(eng) scope("LISA" "Individer 16 år och äldre")
Apply labels from a specific scope level 1 + level 2 combination using multi-string syntax.
. autolabel values, domain(scb) lang(eng) scope("Barnregistret")
Apply value labels from Barnregistret using scope level 1 match.
Examples — Lookup
. autolabel lookup kon kommun, domain(scb) lang(eng)
Display metadata for kon and kommun across all scopes.
. autolabel lookup kon, domain(scb) lang(eng) detail
Show every scope level and release entry for kon.
Examples — Browse scopes
. autolabel scope, domain(scb) lang(eng)
Browse top-level scopes (first 10).
. autolabel scope LISA, domain(scb) lang(eng)
Search for scopes matching "LISA" by name or alias.
. autolabel scope, domain(scb) lang(eng) scope("LISA")
Drill into LISA: shows sub-scopes (variants) with release counts.
. autolabel scope, domain(scb) lang(eng) scope("LISA" "Individer 16 år och äldre")
Show releases for a specific scope.
. autolabel scope, domain(scb) lang(eng) scope("LISA" "Individer 16 år och äldre" "2005")
Show variables in a specific release (overflow token = release).
. autolabel scope, domain(scb) lang(eng) list
Show all scopes (no 10-row limit).
Examples — Deferred execution
. autolabel variables, domain(scb) lang(eng) dryrun
Preview labeling commands without applying them.
. autolabel values, domain(scb) lang(eng) savedo("my_value_labels.do")
Save value labeling commands to a do-file for review.
Examples — Dataset updates
. autolabel update datasets
Check for and download metadata updates for all cached domains.
. autolabel update datasets, domain(scb) lang(eng)
Check for updates for a specific domain and language.
Important limitations
When using autolabel values on string categorical variables, the original string codes are permanently replaced with sequential numeric codes (1, 2, 3...). This means:
- Original string codes cannot be recovered after encoding
- You cannot filter by original string values after labeling
- Re-running autolabel values requires reloading the original data
Numeric categorical variables do not have this limitation — they preserve original numeric codes when labels are applied.
Solution: use the suffix() option to preserve original data:
. autolabel values astsni2007, domain(scb) lang(eng) suffix("_lbl")
This keeps the original astsni2007 variable unchanged and creates a new labeled variable astsni2007_lbl.
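Why the original codes are lost can be seen in a toy example built from standard Stata commands. It only mimics the effect described above and is not what autolabel runs internally: the codes, the label text, and the variable names sni and sni_lbl are made up, and the sorted-order numbering comes from egen's group() function, which may differ from autolabel's numbering.

```stata
* Toy string categorical with made-up codes (not real metadata)
clear
input str5 sni
"01110"
"62010"
"01110"
"85201"
end

* Sequential integer codes in sorted order of the strings: 1, 2, 3
egen long sni_lbl = group(sni)

* Attach descriptive value labels (invented text for the illustration)
label define sni_lbl 1 "Cereal farming" 2 "Programming" 3 "Primary school"
label values sni_lbl sni_lbl

* The new variable holds 1, 2, 3 plus label text; the string "62010" is in neither
list sni sni_lbl, noobs
list sni sni_lbl, noobs nolabel

* With the original kept alongside (the suffix() pattern), filtering by code still works
count if sni == "62010"
```

If sni were overwritten instead of kept, the final count would have nothing to match against, which is the situation suffix() avoids.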
Institutional metadata
Any institution can create metadata for use with autolabel. This is useful for organizations that maintain their own register data, administrative records, or survey datasets with standardized variable definitions.
Requirements
Create five semicolon-delimited CSV files following the autolabel schema v2:
- {domain}_manifest_{lang}.csv: key-value manifest declaring domain metadata, scope depth, and scope level names
- {domain}_scope_{lang}.csv: atomic scope-release manifest with scope_level_1, scope_level_1_alias, etc. columns
- {domain}_variables_{lang}.csv: variable names, labels, definitions, and the value_label_id and release_set_id foreign keys
- {domain}_value_labels_{lang}.csv: value label mappings in both JSON and Stata format
- {domain}_release_sets_{lang}.csv: junction table linking release sets to scope atoms
See the schema v2 reference for the full specification.
Installation
Place the CSV files in ~/.registream/autolabel/{domain}/. No internet access is required — the files are read directly from disk. Use with:
. autolabel variables, domain(yourdomain) lang(yourlang)
Secure environments
For secure environments (MONA, DST Forskermaskinen, SSB Dapla, and similar): an authorized person copies the CSV files onto the secure server. No network access is needed at runtime.
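After copying the files, whether locally or onto a secure server, a quick presence check can catch a missing file before the first autolabel call. The sketch below uses only built-in Stata commands. The domain name mydomain and language eng are placeholders, and it assumes Stata expands ~ to your home directory on your platform; substitute an absolute path if it does not.

```stata
* Placeholder domain and language codes; replace with your own
local domain "mydomain"
local lang   "eng"
local dir    "~/.registream/autolabel/`domain'"

* Check that all five schema v2 files are present
local n_missing 0
foreach part in manifest scope variables value_labels release_sets {
    local f "`dir'/`domain'_`part'_`lang'.csv"
    capture confirm file "`f'"
    if _rc {
        display as error "missing: `f'"
        local ++n_missing
    }
}
display as text "`n_missing' of 5 files missing"
```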
See also
- Schema v2 reference — the data format autolabel reads
- Catalog — available domains, bundle downloads, attribution
- Install guide — including secure environments and institutional setup
- Citation, or run autolabel cite for a version-pinned block
Core package commands (from registream):
- registream info: view current configuration
- registream config, option(value): change settings
- registream update: check for package updates
- registream stats: view usage statistics