General Questions
What is RegiStream?
RegiStream is a tool that provides harmonized metadata for research datasets. It automatically applies variable labels, value labels, and documentation to your data, saving time and ensuring consistency across research projects.
Which statistical software does RegiStream support?
RegiStream currently supports Stata and Python. R support is planned for future releases.
Is RegiStream free to use?
Yes, RegiStream is completely free and open source. You can view the source code and contribute on GitHub.
What data sources are available?
Currently, RegiStream provides metadata for Statistics Sweden (SCB) datasets. We are actively working on adding more data sources. You can also create your own custom metadata for proprietary or institution-specific datasets.
Installation Questions
How do I install RegiStream for Stata?
The recommended installation method is via net install:
net install registream, from("https://registream.org/install/stata/latest") replace
SSC installation will become available once the package has been submitted to and approved by the SSC archive.
For offline or high-security environments, you can also manually download the package. See the complete Stata installation guide for all methods and detailed instructions.
How do I install RegiStream for Python?
The recommended installation method is via pip:
pip install registream
This will install the latest version (currently v1.0.1) from PyPI.
For offline or high-security environments, you can manually download the wheel file from PyPI. See the complete Python installation guide for all methods and detailed instructions.
How do I update RegiStream?
RegiStream can update both the package itself and the metadata datasets. We recommend updating regularly to get the latest features, bug fixes, and metadata improvements.
Quick Commands:
- Update package: registream update or registream update package
- Update datasets: registream update dataset
Can I use RegiStream offline?
Yes! RegiStream downloads metadata files to your local machine. Once downloaded, you can use them offline. You'll only need an internet connection to:
- Download new datasets
- Update existing metadata
- Check for package updates
Metadata is stored in ~/.registream/autolabel_keys/ on Mac/Linux or %USERPROFILE%\AppData\Local\registream\autolabel_keys\ on Windows.
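To confirm what is available for offline use, you can list the contents of that folder. The following is a minimal sketch using only the Python standard library; it does not call RegiStream itself, and the helper names autolabel_keys_dir and list_metadata_files are made up for this example. The paths are the ones given above.

```python
import os
import sys
from pathlib import Path


def autolabel_keys_dir(platform=sys.platform, env=os.environ):
    """Return the metadata folder described in this FAQ for a given platform."""
    if platform.startswith("win"):
        # %USERPROFILE%\AppData\Local\registream\autolabel_keys\
        profile = Path(env.get("USERPROFILE", str(Path.home())))
        base = profile / "AppData" / "Local" / "registream"
    else:
        # ~/.registream/autolabel_keys/
        base = Path.home() / ".registream"
    return base / "autolabel_keys"


def list_metadata_files(directory):
    """Return sorted file names, or an empty list if nothing was downloaded yet."""
    directory = Path(directory)
    if not directory.is_dir():
        return []
    return sorted(p.name for p in directory.iterdir() if p.is_file())


if __name__ == "__main__":
    folder = autolabel_keys_dir()
    print(folder)
    for name in list_metadata_files(folder):
        print("  ", name)
```

An empty listing simply means no metadata has been downloaded on this machine yet.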
Usage Questions
What is the difference between schema version and data version?
Schema version (e.g., Schema 1.0) defines the structure and required columns for metadata files. Data version (e.g., v20251018) refers to a specific release of a dataset with updated content.
For example, SCB may release multiple data versions (v20251018, v20260115) that all follow Schema 1.0 specifications.
How do I know which version of a dataset to use?
By default, RegiStream uses the latest version of each dataset. For reproducibility in published research, we recommend specifying an exact version:
* Use latest version (default)
autolabel variables, domain(scb) lang(eng)
* Use specific version for reproducibility
autolabel variables, domain(scb) lang(eng) version(20251018)
What languages are available?
Available languages depend on the dataset. Statistics Sweden (SCB) currently offers:
- eng - English
- swe - Swedish
Check the datasets documentation for available languages per dataset.
Can I apply labels in multiple languages?
Yes, but only one language can be active at a time in your dataset. You can switch languages by running autolabel again with a different lang() option.
Custom Data Questions
How do I create my own custom label data?
You can create custom metadata for your own datasets by following the RegiStream schema specifications. This allows you to use RegiStream's labeling features with proprietary or institution-specific data.
Steps:
- Create a variables CSV file with your metadata (variable names, labels, types, etc.)
- Optionally create a value labels CSV file for categorical variables
- Save files in ~/.registream/autolabel_keys/ following the naming convention
- Use the autolabel command with your custom domain name
See the complete Custom Datasets Guide for detailed instructions, file format specifications, and examples.
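As a rough illustration of the first step, a variables file can be generated with Python's standard csv module. This is only a sketch: the column names in COLUMNS are hypothetical placeholders, not the columns the RegiStream schema requires, and the required file name is not shown here. Take both from the Custom Datasets Guide.

```python
import csv
import io

# Hypothetical column names for illustration only; the real required
# columns are defined by the RegiStream schema specification.
COLUMNS = ["variable_name", "variable_label", "variable_type"]


def variables_csv_text(rows):
    """Render rows (a list of dicts) as semicolon-delimited CSV text."""
    buffer = io.StringIO(newline="")
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS, delimiter=";")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def save_variables_csv(rows, path):
    """Write the CSV text to disk with UTF-8 encoding."""
    with open(path, "w", encoding="utf-8", newline="") as handle:
        handle.write(variables_csv_text(rows))


if __name__ == "__main__":
    example = [
        {"variable_name": "kon", "variable_label": "Sex",
         "variable_type": "categorical"},
    ]
    print(variables_csv_text(example))
```

The csv module quotes any label that itself contains a semicolon, so the delimiter stays unambiguous.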
Can I share my custom metadata with collaborators?
Yes! Custom metadata files are just CSV files stored locally. You can share them with collaborators by:
- Directly sharing the CSV files (they can place them in their ~/.registream/autolabel_keys/ directory)
- Storing them in a shared network drive or repository
- Including them in your project's supplementary materials
Note: You're sharing only the metadata (labels, descriptions), not the actual data.
What file format do custom datasets use?
Custom datasets must be semicolon-delimited (;) CSV files with UTF-8 encoding. See the Schema Requirements section for complete specifications.
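Before placing a custom file in the metadata folder, it can help to check these two properties. The sketch below is not part of RegiStream. The function name check_custom_csv is invented for this example, and it only tests the encoding and the delimiter, not the schema's required columns.

```python
import csv
import io


def check_custom_csv(raw):
    """Return a list of format problems found in the raw bytes of a CSV file."""
    try:
        text = raw.decode("utf-8")
    except UnicodeDecodeError as err:
        return [f"not valid UTF-8: {err}"]

    # Parse with ';' as the delimiter and skip completely blank lines.
    reader = csv.reader(io.StringIO(text, newline=""), delimiter=";")
    rows = [row for row in reader if row]
    if not rows:
        return ["file is empty"]

    problems = []
    width = len(rows[0])
    if width < 2:
        problems.append("header has a single column; is the delimiter really ';'?")
    for number, row in enumerate(rows[1:], start=2):
        if len(row) != width:
            problems.append(f"record {number} has {len(row)} fields, expected {width}")
    return problems


if __name__ == "__main__":
    sample = "name;label\nkon;Sex\n".encode("utf-8")
    print(check_custom_csv(sample) or "looks fine")
```

To check a file on disk, pass its bytes, for example `check_custom_csv(open(path, "rb").read())`.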
Troubleshooting Questions
Why am I getting "file not found" errors?
This usually means RegiStream hasn't downloaded the metadata files yet. The first time you run autolabel, it will automatically download the required metadata.
If the issue persists, force a fresh download:
* Force re-download of metadata
autolabel variables *, domain(scb) lang(eng) force
Also check your internet connection and verify that the files are being stored in the ~/.registream/autolabel_keys/ directory.
Labels aren't being applied to my variables. What's wrong?
Common causes:
- Variable names don't match: RegiStream matches on variable names. Matching is case-insensitive, but the name must otherwise match exactly (e.g., kon matches, but gender won't).
- Wrong domain: Make sure you're using the correct domain() option for your dataset source.
- Variables not in metadata: Not all variables may have metadata available in the dataset you're using.
- Metadata not downloaded: Try running the command with the force option to re-download metadata.
After running autolabel, check the output to see how many variables were labeled successfully.
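The name-matching rule in the first cause can be illustrated with a few lines of Python. This is a sketch of the rule as described above, not RegiStream's actual implementation. The function split_by_match and the variable name lopnr are made up for the example.

```python
def split_by_match(dataset_vars, metadata_vars):
    """Split dataset variable names into those with and without metadata.

    Comparison ignores case, but the names must otherwise be identical.
    """
    known = {name.lower() for name in metadata_vars}
    matched = [v for v in dataset_vars if v.lower() in known]
    unmatched = [v for v in dataset_vars if v.lower() not in known]
    return matched, unmatched


if __name__ == "__main__":
    matched, unmatched = split_by_match(
        ["Kon", "gender", "LopNr"],   # names in your dataset
        ["kon", "lopnr"],             # names present in the metadata
    )
    print("would be labeled:", matched)
    print("no metadata found:", unmatched)
```

A variable such as gender stays unlabeled until it is renamed to the name used in the metadata.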
How do I report bugs or request features?
Please report issues on our GitHub Issues page. Include:
- Your Stata/Python version
- RegiStream version (which registream in Stata)
- Exact error message
- Steps to reproduce the issue
Can I use older schema versions?
We recommend using the latest schema version for new projects. However, older versions remain available for download to ensure reproducibility of published research. Note that deprecated schemas are not actively maintained and may have limitations.
Where are RegiStream files stored on my computer?
RegiStream uses the following directories:
- Mac/Linux: ~/.registream/
- Windows: %USERPROFILE%\AppData\Local\registream\
Metadata files are in the autolabel_keys/ subdirectory.
How much disk space does RegiStream use?
Each metadata file is typically 1-10 MB compressed. The total footprint depends on how many datasets and languages you download. Most users will use less than 100 MB total.
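To see the footprint on your own machine, you can add up the file sizes under the RegiStream directory. A minimal sketch using only the standard library follows. The function name directory_size_mb is invented for this example, and the default path in the demo is the Mac/Linux location given in this FAQ, so Windows users should substitute the Windows path.

```python
from pathlib import Path


def directory_size_mb(directory):
    """Return the total size of all files under a directory, in megabytes."""
    directory = Path(directory)
    if not directory.is_dir():
        return 0.0
    total_bytes = sum(p.stat().st_size for p in directory.rglob("*") if p.is_file())
    return total_bytes / (1024 * 1024)


if __name__ == "__main__":
    # Mac/Linux location from this FAQ; adjust on Windows.
    folder = Path.home() / ".registream"
    print(f"{folder}: {directory_size_mb(folder):.1f} MB")
```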
Still have questions?
Check the full documentation for Stata or Datasets, or visit our GitHub Issues page to ask the community.