
berdl-remote

Requires Python 3.13+.

CLI tool for remote notebook execution on BERDL JupyterHub.

Execute notebooks and shell commands on your BERDL Hub environment from your local machine.

Installation

# Install directly from GitHub
pip install "git+https://github.com/BERDataLakehouse/berdl_remote.git"

# For development (from source)
git clone https://github.com/BERDataLakehouse/berdl_remote.git
cd berdl_remote
pip install -e .

# With development dependencies
pip install -e ".[dev]"

Via pyproject.toml (Poetry-style git dependency)

# Add to pyproject.toml dependencies
berdl-remote = { git = "https://github.com/BERDataLakehouse/berdl_remote.git", rev = "main" }

Quick Start

1. Get Credentials from JupyterLab

The easiest way to configure berdl-remote is using the BERDL Access Request Extension in JupyterLab:

  1. Open JupyterLab on hub.berdl.kbase.us.
  2. Click the "Get Credentials" button (key icon) in the toolbar.
  3. Click Download Config File (saves as remote-config.yaml).
  4. Move the file to ~/.berdl/remote-config.yaml.

Alternatively, you can copy the configuration text from the modal and paste it into ~/.berdl/remote-config.yaml.

Manual Method (DevTools)
  1. Log in to the Hub at hub.berdl.kbase.us.
  2. Open Browser DevTools (F12 → Application → Cookies).
  3. Copy _xsrf, jupyterhub-session-id, and jupyterhub-user-USERNAME.
  4. Run berdl-remote configure.

2. Configure

berdl-remote configure

Follow the prompts to enter your Hub URL, username, and cookies.

Configuration is saved to ~/.berdl/remote-config.yaml.

3. Verify Connection

berdl-remote status

Important: Your Jupyter server must be running with at least one notebook open (for an active kernel).

Commands

berdl-remote status

Check connection to your Jupyter server.

berdl-remote status

berdl-remote run (Papermill)

Execute a notebook with optional parameters. Creates a separate output file.

# Basic execution (local file in your home directory)
berdl-remote run /home/myuser/notebooks/analysis.ipynb

# With parameters
berdl-remote run /home/myuser/notebooks/analysis.ipynb \
  -p batch_size 100 \
  --output /home/myuser/notebooks/analysis_executed.ipynb

# Using S3/MinIO paths directly
berdl-remote run s3://cdm-lake/users-general-warehouse/myuser/notebooks/analysis.ipynb \
  --output s3://cdm-lake/users-general-warehouse/myuser/notebooks/analysis_executed.ipynb
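The -p flags are applied by Papermill, which expects the notebook to contain a cell tagged parameters; injected values override the defaults defined there. A minimal sketch of such a cell (variable names are illustrative, not from this repo):

```python
# Contents of the notebook cell tagged "parameters" (in JupyterLab:
# Property Inspector -> Cell Tags -> add "parameters").
# Papermill injects a new cell after this one containing the overrides.
batch_size = 10            # default; `-p batch_size 100` overrides it
output_table = "results"   # default; illustrative name
```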

berdl-remote nbconvert

Execute a notebook in place using nbconvert.

berdl-remote nbconvert /home/myuser/notebooks/quick_test.ipynb --inplace

berdl-remote python

Execute Python code directly on the remote kernel. The code has access to all kernel variables (e.g. spark, get_settings).

# Print settings
berdl-remote python "print(get_settings().USER)"

# Run Spark queries
berdl-remote python "spark = get_spark_session(); spark.sql('SHOW DATABASES').show()"

# Check environment
berdl-remote python "import os; print(os.environ.get('MINIO_ENDPOINT_URL'))"

berdl-remote shell

Execute shell commands on the remote server.

# List files
berdl-remote shell "ls -la /minio/my-files/notebooks/"

# Check papermill version
berdl-remote shell "papermill --version"

# Execute notebook with papermill manually (via shell)
berdl-remote shell "papermill s3://cdm-lake/users-general-warehouse/myuser/notebooks/analysis.ipynb s3://cdm-lake/users-general-warehouse/myuser/notebooks/analysis_executed.ipynb"

Working with Notebooks in BERDL MinIO Object Storage

For detailed instructions on accessing MinIO, configuring the MinIO client (mc), and using Python/boto3 with BERDL MinIO, please refer to the BERDL MinIO Guide.

MinIO Path Structure

Files are stored in the cdm-lake bucket with this structure:

| Type | S3 Path |
| --- | --- |
| Personal files | s3://cdm-lake/users-general-warehouse/YOUR_USERNAME/ |
| Personal SQL warehouse | s3://cdm-lake/users-sql-warehouse/YOUR_USERNAME/ |
| Tenant files | s3://cdm-lake/tenant-general-warehouse/TENANT_NAME/ |
| Tenant SQL warehouse | s3://cdm-lake/tenant-sql-warehouse/TENANT_NAME/ |

Example for user myuser:

  • Notebook location: s3://cdm-lake/users-general-warehouse/myuser/notebooks/test.ipynb
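The path layout above can be captured in a small helper. This is a hypothetical sketch: the bucket name and prefix come from the table, but the function itself is illustrative and not part of berdl-remote.

```python
# Hypothetical helper for building personal-warehouse paths; the bucket
# and prefix are taken from the table above, the function name is ours.
BUCKET = "cdm-lake"

def personal_path(username: str, *parts: str) -> str:
    """Build an s3:// path under the user's general-warehouse prefix."""
    return "/".join([f"s3://{BUCKET}/users-general-warehouse/{username}", *parts])

# personal_path("myuser", "notebooks", "test.ipynb")
# -> "s3://cdm-lake/users-general-warehouse/myuser/notebooks/test.ipynb"
```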

Executing Notebooks

# Execute notebook from S3/MinIO
berdl-remote run s3://cdm-lake/users-general-warehouse/myuser/notebooks/test.ipynb \
  --output s3://cdm-lake/users-general-warehouse/myuser/notebooks/test_executed.ipynb

# Using shell (direct papermill control)
berdl-remote shell "papermill s3://cdm-lake/users-general-warehouse/myuser/notebooks/test.ipynb s3://cdm-lake/users-general-warehouse/myuser/notebooks/test_executed.ipynb"

# With parameters
berdl-remote run s3://cdm-lake/users-general-warehouse/myuser/notebooks/analysis.ipynb \
  --output s3://cdm-lake/users-general-warehouse/myuser/notebooks/analysis_output.ipynb \
  -p date 2024-01-15

Automation Workflow

Example: Automated Notebook Execution

#!/bin/bash
# run_analysis.sh

# 1. Upload notebook to MinIO
mc cp analysis.ipynb berdl-minio/cdm-lake/users-general-warehouse/myuser/notebooks/

# 2. Execute remotely with parameters
berdl-remote run s3://cdm-lake/users-general-warehouse/myuser/notebooks/analysis.ipynb \
  -p date "2024-01-15" \
  -p data_source "s3://bucket/data/" \
  --output s3://cdm-lake/users-general-warehouse/myuser/notebooks/analysis_2024-01-15.ipynb

# 3. Download results
mc cp berdl-minio/cdm-lake/users-general-warehouse/myuser/notebooks/analysis_2024-01-15.ipynb ./results/

Configuration

Config File Format

hub_url: https://hub.berdl.kbase.us
username: your_username
cookies:
  _xsrf: "abc123..."
  jupyterhub-session-id: "xyz789..."
  jupyterhub-user-your_username: "token..."
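As an illustration of how a client could use these values, the sketch below builds an authenticated request against the Jupyter Server REST API. This is an assumption about the mechanism for demonstration purposes; berdl-remote's actual auth flow may differ. The names mirror the config keys above.

```python
# Hypothetical sketch: turning the saved config values into a Jupyter
# Server API request. Not berdl-remote's real implementation.
import urllib.request

def build_status_request(hub_url: str, username: str, cookies: dict) -> urllib.request.Request:
    """Build a GET request for the user's server status, sending the saved cookies."""
    url = f"{hub_url}/user/{username}/api/status"
    cookie_header = "; ".join(f"{k}={v}" for k, v in cookies.items())
    return urllib.request.Request(url, headers={"Cookie": cookie_header})
```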

Multiple Configurations

# Use a specific config file
berdl-remote --config ~/.berdl/prod-config.yaml status
berdl-remote --config ~/.berdl/dev-config.yaml status

Security Notes

  • Credentials are stored in ~/.berdl/remote-config.yaml with file permissions set to 0600
  • Session cookies expire; you may need to reconfigure periodically
  • Never share your config file or cookies
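The 0600 expectation can be verified with a few lines of standard-library Python. A sketch only: berdl-remote sets the permissions itself when it writes the file.

```python
import os
import stat

def config_perms_ok(path: str) -> bool:
    """True if the file is readable/writable by its owner only (mode 0600)."""
    return stat.S_IMODE(os.stat(path).st_mode) == 0o600

# To tighten an over-permissive config file:
# os.chmod(os.path.expanduser("~/.berdl/remote-config.yaml"), 0o600)
```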
