RegiStream

Philosophy

RegiStream for Python is designed to integrate seamlessly with pandas, making the labeling process as intuitive as possible.

We use pandas accessors to provide a natural, pandas-native experience. By simply adding .lab to your DataFrame or Series, you gain access to labeled versions of your data. This approach means you can continue using all the pandas functionality you already know, with the added benefit of proper variable and value labels.
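For readers unfamiliar with the accessor mechanism, the sketch below shows how pandas lets a library attach its own namespace to every DataFrame. It relies only on pandas' public register_dataframe_accessor decorator. The accessor name demo_lab, the class, and the labels are hypothetical and chosen for illustration; this is not RegiStream's actual implementation.

```python
import pandas as pd


@pd.api.extensions.register_dataframe_accessor("demo_lab")  # hypothetical accessor name
class DemoLabelAccessor:
    """Toy accessor: shows column labels instead of raw column names."""

    def __init__(self, pandas_obj: pd.DataFrame) -> None:
        self._obj = pandas_obj

    def head(self, n: int = 5) -> pd.DataFrame:
        # Labels are read from DataFrame.attrs; columns without a label keep their name.
        labels = self._obj.attrs.get("variable_labels", {})
        return self._obj.head(n).rename(columns=labels)


df = pd.DataFrame({"kon": [1, 2], "inkpens": [0.0, 1200.0]})
df.attrs["variable_labels"] = {"kon": "Sex", "inkpens": "Pension income"}  # made-up labels

labeled = df.demo_lab.head()
print(list(labeled.columns))
```

The original DataFrame keeps its raw column names; only the view returned through the accessor is relabeled.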

Whether you're previewing data, creating tables, plotting, or running regressions, RegiStream enhances your workflow without changing it: you just add .lab to bring your labels into the picture.

Installation

RegiStream for Python can be installed via pip or manually for offline environments.

Pip Install (Recommended)

The quickest way to install RegiStream is directly from PyPI using pip:

pip install registream

This command will download and install the latest version of RegiStream (currently v1.0.1), including all necessary dependencies.

Offline Installation for Secure Environments

If you are working on an offline server or in a high-security environment (e.g., MONA), you can download the RegiStream wheel file manually and transfer it to the secure system:

1. Download the wheel file (*.whl) from PyPI on your local system.
2. Transfer the wheel file to your secure server.
3. On the secure system, install the package from the local wheel file:

pip install /path/to/registream-1.0.1-py3-none-any.whl

Note: For information on downloading and setting up metadata files in secure environments, see the Datasets Guide.

Verification

To verify the installation was successful, run:

import registream
registream.__version__
# Expected output: '1.0.1'

This should display the installed version.

Uninstalling

To uninstall RegiStream:

pip uninstall registream

Updating RegiStream

To update RegiStream to the latest version, use pip's upgrade option:

# Update to latest version
pip install --upgrade registream

# Or check for updates first (on Windows, use findstr instead of grep)
pip list --outdated | grep registream

Note: Updating the package will not affect your cached metadata files. Metadata is stored separately in ~/.registream/autolabel_keys/ and persists across package updates.

Checking Current Version

To see which version you have installed:

import registream
print(registream.__version__)
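If importing the package fails, the version can also be read from the installed package metadata. The sketch below uses only the standard library's importlib.metadata; the helper name installed_version is hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str = "registream") -> Optional[str]:
    """Return the installed version string, or None if the package is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


found = installed_version()
print(found if found is not None else "registream is not installed in this environment")
```

This is handy in troubleshooting, because it distinguishes "not installed in this environment" from an import that fails for another reason.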

Quick Start

Get started with RegiStream in just a few lines of code!

import pandas as pd
import registream

# Load your data
df = pd.read_stata('path/to/your/data.dta')

# Apply variable labels
df.autolabel(domain='scb', lang='eng')

# Apply value labels
df.autolabel(label_type='values', domain='scb', lang='eng')

# View labeled data
df.lab.head()

That's it! Your DataFrame now has variable and value labels that you can access using the .lab accessor.

Usage Guide

Applying Labels to DataFrames

The autolabel() method applies metadata labels to your pandas DataFrame:

# Apply variable labels (labels for column names)
df.autolabel(domain='scb', lang='eng')

# Apply value labels (labels for categorical values)
df.autolabel(label_type='values', domain='scb', lang='eng')

# Apply both (one call per label type)
df.autolabel(domain='scb', lang='eng')
df.autolabel(label_type='values', domain='scb', lang='eng')

Viewing Labeled Data

Use the .lab accessor to view your data with labels:

# Preview with variable labels (column names are labeled)
df.lab.head()

# Preview with both variable and value labels
df.lab.show_values().head()

# View specific columns with labels
cols_to_show = ['astsni2007', 'ssyk3', 'astsni2002']
df[cols_to_show].lab.head()

What's the difference?

df.lab.head() - Shows data with variable labels as column names
df.lab.show_values().head() - Shows data with both variable labels AND value labels for categorical data
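The distinction can be illustrated with plain pandas, without RegiStream. In this sketch the codes and label dictionaries are made up for illustration: rename stands in for variable labels and map stands in for value labels.

```python
import pandas as pd

df = pd.DataFrame({"kon": [1, 2, 2], "inkpens": [0.0, 1200.0, 950.0]})

# Hypothetical labels, for illustration only (not taken from SCB metadata).
variable_labels = {"kon": "Sex", "inkpens": "Pension income"}
value_labels = {"kon": {1: "Man", 2: "Woman"}}

# Variable labels only: column names change, cell values stay as codes.
with_variable_labels = df.rename(columns=variable_labels)

# Variable and value labels: coded cells are also replaced by their labels.
with_both = df.copy()
for column, mapping in value_labels.items():
    with_both[column] = with_both[column].map(mapping)
with_both = with_both.rename(columns=variable_labels)

print(with_variable_labels)
print(with_both)
```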

Tabulation with Labels

Create frequency tables using labeled values:

# Without labels
df.astsni2007.value_counts()

# With value labels
df.lab.astsni2007.value_counts()

# Crosstab with labels
pd.crosstab(df.lab.kon, df.lab.syssstatj)

Creating Labeled Plots

Use .lab with plotting libraries for automatically labeled visualizations:

import matplotlib.pyplot as plt
import seaborn as sns

# Aggregate data by year and employment status
agg_df = df.groupby(["examar", "syssstatj"], as_index=False).agg({"inkpens": "mean"})

# Create a plot with labeled variables
plt.figure(figsize=(10, 6))
ax = sns.scatterplot(data=agg_df.lab, x='examar', y='inkpens', hue='syssstatj', palette="viridis")

# Style adjustments
plt.title('Pension Income by Year and Employment Status')
plt.xlabel('Year')
plt.ylabel('Average Pension Income')
plt.xticks(rotation=45)

plt.show()

Regression Analysis with Labels

Extract labeled variables for statistical modeling:

import statsmodels.api as sm

# Extract labeled variables
income = df.lab['dispinkfam04']  # Numeric variable
industry = df.lab['astsni2007']  # Categorical variable
outcome = df.lab['dispink04']    # Target variable

# Convert categorical variable into one-hot encoding
X = pd.get_dummies(industry, drop_first=True, dtype=float)
X[income.name] = income.astype(float)

# Add intercept
X = sm.add_constant(X)

# Run the regression
model = sm.OLS(outcome.astype(float), X).fit()
print(model.summary())

API Reference

autolabel()

Apply variable or value labels to a DataFrame.

df.autolabel(label_type='variables', domain='scb', lang='eng', force=False)

Parameters:

Parameter	Type	Description
`label_type`	str	Type of labels: `'variables'` or `'values'` (default: `'variables'`)
`domain`	str	Dataset domain (currently: `'scb'`)
`lang`	str	Language: `'eng'` (English) or `'swe'` (Swedish)
`force`	bool	Force re-download of metadata (default: `False`)

.lab Accessor

Access labeled versions of your DataFrame or Series.

# Access labeled DataFrame
df.lab.head()

# Access labeled column
df.lab.astsni2007

# Show value labels
df.lab.show_values().head()

meta_search()

Search for variables containing a specific keyword in their name or label.

# Search for variables related to "industry"
df.meta_search("industry")

# Search for "income" variables
df.meta_search("income")

Variable Label Methods

Get and set variable labels programmatically:

# Get a single variable label
df.get_variable_labels('astsni2007')

# Get multiple variable labels
df.get_variable_labels(['astsni2007', 'ssyk3'])

# Set a variable label
df.set_variable_labels('astsni2007', "Industry classification (SNI 2007)")

Value Label Methods

Get and set value labels for categorical variables:

# Get value labels for a variable
labels = df.get_value_labels('astsni2007')

# Update value labels
updates = {'00000': 'custom label 1', '01110': 'custom label 2'}
df.set_value_labels('astsni2007', updates)

lookup()

Look up variable metadata from the RegiStream domain without loading data:

from registream import lookup

# Lookup a variable in the SCB domain
lookup('astsni2007', domain='scb', lang='eng')

Configuration

Data Storage Locations

By default, RegiStream stores metadata files in your system's user home directory:

macOS/Linux: ~/.registream/autolabel_keys/
Windows: %USERPROFILE%\AppData\Local\registream\autolabel_keys\

This default location enables seamless sharing of label data across projects and programming languages (Stata, Python, R).

Secure Environments (MONA, etc.)

If you are working on a secure system that does not have a standard user home directory, you need to specify a custom directory for RegiStream to store files.

Option 1: Set in Your Script

import os
os.environ['REGISTREAM_DIR'] = "/path/to/custom/directory"

# Then import and use registream
import registream
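Option 1 can be wrapped in a small helper that also creates the directory if it does not exist yet. This is a minimal sketch using only the standard library; the function name configure_registream_dir is hypothetical, and it assumes, as the snippet above suggests, that the variable must be set before registream is imported.

```python
import os
from pathlib import Path


def configure_registream_dir(directory: str) -> Path:
    """Create the directory if needed and point REGISTREAM_DIR at it."""
    path = Path(directory).expanduser().resolve()
    path.mkdir(parents=True, exist_ok=True)
    os.environ["REGISTREAM_DIR"] = str(path)
    return path


# Call this before `import registream`:
# configure_registream_dir("/path/to/custom/directory")
```

Setting the variable this way affects only the current Python process, which is useful when you cannot change system-wide settings.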

Option 2: Set System-Wide Environment Variable

Linux/macOS:

Add to your .bashrc, .zshrc, or equivalent shell configuration file:

export REGISTREAM_DIR="/path/to/custom/directory"

Then reload your shell configuration: source ~/.bashrc or restart your terminal.

Windows:

Set via System Properties → Advanced → Environment Variables, or use Command Prompt (as Administrator):

setx REGISTREAM_DIR "C:\path\to\custom\directory"

Restart any command prompts or applications for the change to take effect.

Setting Up Data Files in Secure Environments

After setting your custom directory, create the required folder structure:

# Required directories (create them with your file manager or mkdir)
/path/to/custom/directory/registream/
/path/to/custom/directory/registream/autolabel_keys/

Download and extract metadata files into the appropriate subdirectories:

1. Download the ZIP files (e.g., scb_variables_eng.zip)
2. Extract to get folders like scb_variables_eng/
3. Move the CSV files into /path/to/custom/directory/registream/autolabel_keys/scb_variables_eng/

Important: Folder names must match exactly (e.g., scb_variables_eng). RegiStream will automatically recognize and use them when you run your code.
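The manual steps above could be scripted roughly as follows. This is a sketch under stated assumptions: the target folder is named after the ZIP file (scb_variables_eng.zip becomes scb_variables_eng/), and because the internal layout of the archives is not documented here, any folder nesting inside the ZIP is flattened. The function name install_metadata_zip is hypothetical.

```python
import zipfile
from pathlib import Path


def install_metadata_zip(zip_path: Path, registream_dir: Path) -> Path:
    """Extract the CSV files from a metadata ZIP into autolabel_keys/<zip stem>/."""
    # Assumption: the folder name must equal the ZIP file name without ".zip".
    target = registream_dir / "registream" / "autolabel_keys" / zip_path.stem
    target.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        for member in archive.infolist():
            if member.is_dir() or not member.filename.lower().endswith(".csv"):
                continue
            # Flatten any folder nesting inside the archive.
            destination = target / Path(member.filename).name
            destination.write_bytes(archive.read(member))
    return target


# Example call (paths are placeholders):
# install_metadata_zip(Path("scb_variables_eng.zip"), Path("/path/to/custom/directory"))
```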

Troubleshooting

Import Error: No module named 'registream'

Make sure RegiStream is installed in your current Python environment:

pip install registream

Labels Not Appearing

Common causes:

Metadata not downloaded: The first run of autolabel() downloads metadata automatically, but only if you have internet access
Variable names don't match: Check that your DataFrame column names match the variable names in the metadata (matching is case-insensitive)
Wrong domain/language: Verify that you're using the correct domain and lang parameters
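To narrow down a name mismatch, you can compare your column names against the variable names in the metadata yourself. The helper below is a hypothetical, standard-library-only sketch of a case-insensitive comparison; it does not read RegiStream's metadata files, so you supply both lists.

```python
from typing import Iterable, List


def unmatched_columns(columns: Iterable[str], metadata_names: Iterable[str]) -> List[str]:
    """Return the columns that have no case-insensitive match in metadata_names."""
    known = {name.casefold() for name in metadata_names}
    return [column for column in columns if column.casefold() not in known]


# Hypothetical inputs: in practice, pass df.columns and the names from your metadata file.
print(unmatched_columns(["ASTSNI2007", "ssyk3", "my_custom_var"], ["astsni2007", "ssyk3"]))
# prints ['my_custom_var']
```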

File Not Found Errors in Secure Environments

Ensure you've set the REGISTREAM_DIR environment variable correctly and that metadata files are in the right location.
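A quick diagnostic along these lines can confirm both points. It is a sketch using only the standard library; the function name check_metadata_folder is hypothetical, and the expected layout (registream/autolabel_keys/<folder>/ containing CSV files) is taken from the setup instructions in the Configuration section.

```python
import os
from pathlib import Path
from typing import List, Optional


def check_metadata_folder(folder_name: str, base_dir: Optional[str] = None) -> List[str]:
    """Return a list of problems found; an empty list means the layout looks right."""
    base = base_dir if base_dir is not None else os.environ.get("REGISTREAM_DIR")
    if not base:
        return ["REGISTREAM_DIR is not set"]
    folder = Path(base) / "registream" / "autolabel_keys" / folder_name
    if not folder.is_dir():
        return [f"missing folder: {folder}"]
    if not any(folder.glob("*.csv")):
        return [f"no CSV files in: {folder}"]
    return []


for problem in check_metadata_folder("scb_variables_eng"):
    print(problem)
```

If nothing is printed, the variable is set and the folder contains at least one CSV file.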

Getting Help

Additional Resources

For dataset-specific documentation, see the Datasets Guide. To create custom metadata, check out the Custom Datasets Guide.
