2 changes: 1 addition & 1 deletion docs/commands/build.md
Expand Up @@ -104,7 +104,7 @@ The HTML visualization provides an interactive graph that can be viewed in a web

### `release`

Build a release RO-Crate in a directory, scanning for and linking existing sub-RO-Crates. This creates a parent RO-Crate that references and contextualizes the sub-crates.
Build a release RO-Crate in a directory, scanning for and linking existing sub-RO-Crates. This creates a parent RO-Crate that references and contextualizes the sub-crates. For more details, see the [workflow documentation](release_creation.md).

```bash
fairscape-cli build release [OPTIONS] RELEASE_DIRECTORY
```
44 changes: 44 additions & 0 deletions docs/commands/release_creation.md
@@ -0,0 +1,44 @@
# How to build a FAIRSCAPE Release

The process is a little complicated. I hope to explain its current state and come up with a better future solution without just adding more commands.

## Overview

The `build release` command can operate in two modes:

1. **Full pre-processing** (default): Processes all subcrates, then creates the release crate.
2. **Skip pre-processing** (`--skip-subcrate-processing`): Creates the release crate without processing subcrates. The sub-crates must then be linked to the release crate and processed later.

## What does processing sub-crates mean?

The processing function is located in `src/fairscape_cli/utils/build_utils.py`. It performs four steps on each subcrate found in the release directory:

| Step | Function | Description |
| ---- | -------------------------- | ----------------------------------------------------------------------- |
| 1    | `process_link_inverses()`  | Adds OWL inverse properties using the EVI ontology                      |
| 2 | `process_add_io()` | Calculates and adds `EVI:inputs` and `EVI:outputs` to the root ro-crate |
| 3 | `process_evidence_graph()` | Generates provenance graph JSON and HTML |
| 4 | `process_croissant()` | Converts RO-Crate metadata to Croissant |
| 5\*  | `build_preview()`          | Builds HTML preview for the RO-Crate                                    |

## Mapping to CLI Commands

Each processing step can be executed using CLI commands:

| Processing Step | Equivalent CLI Command |
| -------------------------- | -------------------------------------------------------- |
| `process_link_inverses()` | `fairscape augment link-inverses <rocrate-path>` |
| `process_add_io()` | `fairscape augment add-io <rocrate-path>` |
| `process_evidence_graph()` | `fairscape build evidence-graph <rocrate-path> <ark-id>` |
| `process_croissant()` | `fairscape build croissant <rocrate-path>` |
| `build_preview()` | `fairscape build preview <rocrate-path>` |

## Why do subcrates need to be processed?

While crates are being created and populated, the CLI cannot tell when a subcrate is complete. Once the user has finished a subcrate, some post-processing makes it "release ready". This post-processing adds missing terms, i.e. it fills in `generated` or `generatedBy` so that all terms point in both directions. It fills in I/O information, which is important for crates stating that they were generated by other crates. It also creates useful supporting documents: the HTML preview, Croissant, and evidence graphs. A sub-crate is valid without all this, but these steps are an important part of our release processing.
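
As a rough illustration of the inverse-linking idea, the sketch below fills in a missing `generated` property on a computation whenever a dataset declares `generatedBy`. The graph layout, the `INVERSES` map, and the `link_inverses` helper are assumptions made for this example only; the real `process_link_inverses()` takes its inverse pairs from the EVI OWL ontology and may behave differently.

```python
from typing import Any

# Hypothetical inverse-property map for illustration; the real tool reads
# such pairs from an OWL ontology rather than hard-coding them.
INVERSES = {"generatedBy": "generated", "generated": "generatedBy"}


def as_ids(value: Any) -> list:
    """Normalize a property value (dict, string, or list) to a list of @id strings."""
    if value is None:
        return []
    items = value if isinstance(value, list) else [value]
    return [item["@id"] if isinstance(item, dict) else str(item) for item in items]


def link_inverses(graph: list) -> list:
    """Add the missing inverse reference for every known property, in place."""
    by_id = {entity["@id"]: entity for entity in graph}
    for entity in graph:
        for prop, inverse in INVERSES.items():
            for target_id in as_ids(entity.get(prop)):
                target = by_id.get(target_id)
                if target is None:
                    continue  # the target is defined outside this crate
                existing = as_ids(target.get(inverse))
                if entity["@id"] not in existing:
                    target[inverse] = [{"@id": i} for i in existing + [entity["@id"]]]
    return graph


graph = [
    {"@id": "ark:1/dataset", "generatedBy": {"@id": "ark:1/computation"}},
    {"@id": "ark:1/computation"},
]
link_inverses(graph)
print(graph[1]["generated"])
```

After the call, the computation points back at the dataset, so the relationship can be followed in both directions.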

## How can we make the release-first, sub-crates-later workflow better?

- Add build preview (needed regardless).
- Add an augment sub-crate command that does all 4 steps so you don't need to run them individually.
- Somehow link and rebuild the release? This would rebuild the aggregated metrics and the datasheet with pointers to the sub-crates.
220 changes: 220 additions & 0 deletions docs/subcrate-processing-workflow.md
@@ -0,0 +1,220 @@
# Subcrate Processing Workflow

This document describes how `process_all_subcrates` works in the `build release` command and how its steps map to individual CLI commands for flexible workflow support.

## Overview

The `build release` command can operate in two modes:
1. **Full processing** (default): Processes all subcrates, then creates the release crate
2. **Skip processing** (`--skip-subcrate-processing`): Creates the release crate without processing subcrates. The sub-crates must be linked later, and the top-level RO-Crate will be missing aggregated metrics.

## What `process_all_subcrates` Does

Located in `src/fairscape_cli/utils/build_utils.py`, this function performs five steps on each subcrate found in the release directory:

| Step | Function | Description |
|------|----------|-------------|
| 1 | `process_link_inverses()` | Adds OWL inverse properties using the EVI ontology |
| 2 | `process_add_io()` | Calculates and adds `EVI:inputs` and `EVI:outputs` to the root dataset |
| 3 | `process_evidence_graph()` | Generates provenance graph JSON and HTML visualization |
| 4 | `process_croissant()` | Converts RO-Crate metadata to Croissant JSON-LD format |
| 5 | `process_preview()` | Generates `ro-crate-preview.html` for browser viewing |

## Mapping to CLI Commands

Each processing step can be executed individually using existing CLI commands:

| Processing Step | Equivalent CLI Command |
|-----------------|------------------------|
| `process_link_inverses()` | `fairscape augment link-inverses <rocrate-path>` |
| `process_add_io()` | `fairscape augment add-io <rocrate-path>` |
| `process_evidence_graph()` | `fairscape build evidence-graph <rocrate-path> <ark-id>` |
| `process_croissant()` | `fairscape build croissant <rocrate-path>` |
| `process_preview()` | `fairscape build preview <rocrate-path>` |

**All-in-one command:** Use `fairscape build subcrate <path>` to run all five steps on a single subcrate.

## Supported Workflows

### Workflow 1: Subcrates First, Then Release (Default)

This is the current default behavior. Subcrates are processed automatically before the release crate is created.

```bash
# Single command handles everything
fairscape build release ./my-release \
--name "My Release" \
--organization-name "My Org" \
--project-name "My Project" \
--description "Release description" \
--keywords "keyword1" --keywords "keyword2"
```

**What happens internally:**
1. `process_all_subcrates()` finds and processes all subcrates in `./my-release`
2. Subcrate metadata is collected (authors, keywords)
3. Release RO-Crate is created with aggregated metadata
4. Subcrates are linked to the release via `LinkSubcrates()`
5. Release-level Croissant and datasheet are generated
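
The discovery part of step 1 can be pictured with a small sketch. It assumes a subcrate is any directory below the release directory that contains an `ro-crate-metadata.json` file; the `find_subcrates` helper is hypothetical and is not taken from `build_utils.py`, which may use different rules.

```python
import pathlib
import tempfile


def find_subcrates(release_dir: pathlib.Path) -> list:
    """Return directories below release_dir that contain ro-crate-metadata.json."""
    found = []
    for metadata in release_dir.rglob("ro-crate-metadata.json"):
        if metadata.parent != release_dir:  # skip the release crate itself
            found.append(metadata.parent)
    return sorted(found)


# Build a throwaway release layout to demonstrate the scan.
with tempfile.TemporaryDirectory() as tmp:
    release = pathlib.Path(tmp)
    for name in ("subcrate1", "subcrate2"):
        (release / name).mkdir()
        (release / name / "ro-crate-metadata.json").write_text("{}")
    (release / "ro-crate-metadata.json").write_text("{}")
    names = [path.name for path in find_subcrates(release)]
    print(names)
```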

### Workflow 2: Release First, Then Subcrates Later

Use this when you need to create the release crate structure first and add/process subcrates afterward.

```bash
# Step 1: Create release crate without processing subcrates
fairscape build release ./my-release \
--name "My Release" \
--organization-name "My Org" \
--project-name "My Project" \
--description "Release description" \
--keywords "keyword1" \
--skip-subcrate-processing

# Step 2: Add subcrates to the release directory
# (manually copy or create subcrate directories)

# Step 3: Process each subcrate (all-in-one command)
fairscape build subcrate ./my-release/subcrate1 --release-directory ./my-release
fairscape build subcrate ./my-release/subcrate2 --release-directory ./my-release

# Or process each step individually if needed:
# fairscape augment link-inverses ./my-release/subcrate1
# fairscape augment add-io ./my-release/subcrate1
# fairscape build evidence-graph ./my-release/subcrate1 <ark-id>
# fairscape build croissant ./my-release/subcrate1
# fairscape build preview ./my-release/subcrate1
```
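
With many subcrates, step 3 can be scripted. The sketch below only assembles the `fairscape build subcrate` invocations shown above and prints them; the `subcrate_commands` helper and the directory names are made up for illustration, and actually running the commands is left as a commented-out `subprocess.run` call.

```python
import pathlib
import shlex


def subcrate_commands(release_dir: str, subcrates: list) -> list:
    """Build one 'fairscape build subcrate' argument list per subcrate name."""
    release = pathlib.PurePosixPath(release_dir)
    return [
        ["fairscape", "build", "subcrate", str(release / name),
         "--release-directory", str(release)]
        for name in subcrates
    ]


commands = subcrate_commands("./my-release", ["subcrate1", "subcrate2"])
for argv in commands:
    print(shlex.join(argv))
    # To execute instead of printing: subprocess.run(argv, check=True)
```

Note that `PurePosixPath` normalizes `./my-release` to `my-release`, which refers to the same directory.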

## Potential Enhancements

### 1. Batch Subcrate Processing Command

A new command to process all subcrates in an existing release:

```bash
fairscape augment subcrates <release-directory>
```

This would call `process_all_subcrates()` on an existing release, enabling:
1. Build release first with `--skip-subcrate-processing`
2. Add subcrates to the release directory
3. Run batch processing on all subcrates

### 2. Re-link Subcrates Command

A command to update the release's `hasPart` references after adding new subcrates:

```bash
fairscape augment link-subcrates <release-directory>
```

This would call `LinkSubcrates()` to update the release metadata with references to any newly added subcrates.

### 3. Combined Post-Processing Command

A single command to both process subcrates and re-link them:

```bash
fairscape augment finalize-release <release-directory>
```

This would:
1. Run `process_all_subcrates()` to process all subcrates
2. Run `LinkSubcrates()` to update release references
3. Regenerate release-level Croissant and datasheet

## Command Reference

### `augment link-inverses`

Adds OWL inverse properties to an RO-Crate based on the EVI ontology.

```bash
fairscape augment link-inverses <rocrate-path> [--ontology-path PATH] [--namespace URI]
```

**Options:**
- `--ontology-path`: Custom OWL ontology file (defaults to bundled `evi.xml`)
- `--namespace`: Primary namespace URI for property keys (defaults to EVI namespace)

### `augment add-io`

Calculates and adds `EVI:inputs` and `EVI:outputs` to the root dataset.

```bash
fairscape augment add-io <rocrate-path> [--verbose]
```

**Inputs are:**
- All `EVI:Sample` entities
- Datasets referenced in `usedDataset` that were not generated by a computation
- Datasets referenced in `usedDataset` but not defined in the `@graph`

**Outputs are:**
- All datasets that were not used by any computation
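
The rules above can be expressed as simple set operations. The following is a minimal sketch, not the actual `process_add_io()` implementation: it assumes computations list what they produce under a `generated` property, matches entity types by suffix (`Sample`, `Dataset`, `Computation`), and does not treat the root dataset specially. The example identifiers are invented.

```python
def ids(value):
    """Normalize a property value to a list of @id strings."""
    if value is None:
        return []
    items = value if isinstance(value, list) else [value]
    return [item["@id"] if isinstance(item, dict) else item for item in items]


def has_type(entity, name):
    """True if any @type of the entity ends with the given name."""
    types = entity.get("@type", [])
    types = types if isinstance(types, list) else [types]
    return any(str(t).endswith(name) for t in types)


def compute_io(graph):
    defined = {e["@id"] for e in graph}
    computations = [e for e in graph if has_type(e, "Computation")]
    used = {i for c in computations for i in ids(c.get("usedDataset"))}
    generated = {i for c in computations for i in ids(c.get("generated"))}
    datasets = {e["@id"] for e in graph if has_type(e, "Dataset")}
    samples = {e["@id"] for e in graph if has_type(e, "Sample")}

    # Inputs: all samples, used datasets nobody generated, used-but-undefined datasets.
    inputs = samples | (used - generated) | (used - defined)
    # Outputs: datasets that no computation consumed.
    outputs = datasets - used
    return sorted(inputs), sorted(outputs)


graph = [
    {"@id": "s1", "@type": "EVI:Sample"},
    {"@id": "d_raw", "@type": "EVI:Dataset"},
    {"@id": "d_mid", "@type": "EVI:Dataset"},
    {"@id": "d_final", "@type": "EVI:Dataset"},
    {"@id": "c1", "@type": "EVI:Computation",
     "usedDataset": [{"@id": "d_raw"}, {"@id": "d_ext"}],
     "generated": [{"@id": "d_mid"}]},
    {"@id": "c2", "@type": "EVI:Computation",
     "usedDataset": {"@id": "d_mid"},
     "generated": {"@id": "d_final"}},
]
inputs, outputs = compute_io(graph)
print(inputs, outputs)
```

Here `d_ext` counts as an input because it is used but never defined in the graph, and `d_mid` is neither an input nor an output because it is an intermediate product.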

### `build evidence-graph`

Generates a provenance graph for a specific ARK identifier.

```bash
fairscape build evidence-graph <rocrate-path> <ark-id> [--output-file PATH]
```

**Outputs:**
- `provenance-graph.json`: JSON representation of the evidence graph
- `provenance-graph.html`: Interactive HTML visualization

### `build croissant`

Converts an RO-Crate to Croissant JSON-LD format.

```bash
fairscape build croissant <rocrate-path> [--output PATH]
```

**Output:**
- `croissant.json` (or custom path): Croissant-formatted metadata

### `build preview`

Generates a lightweight HTML preview for an RO-Crate.

```bash
fairscape build preview <rocrate-path> [--published]
```

**Options:**
- `--published`: Indicate if the crate is published (affects link rendering)

**Output:**
- `ro-crate-preview.html`: Browser-viewable summary of the crate

### `build subcrate`

Processes a single subcrate with all augmentation and build steps. This is the recommended command for processing individual subcrates.

```bash
fairscape build subcrate <subcrate-path> [--release-directory PATH] [--published]
```

**Options:**
- `--release-directory`: Parent release directory (used for relative paths in evidence graphs)
- `--published`: Indicate if the crate is published

**Steps performed:**
1. Link inverse properties (OWL ontology entailments)
2. Add `EVI:inputs` and `EVI:outputs` to the root dataset
3. Generate evidence graph (JSON + HTML visualization)
4. Generate Croissant export (JSON-LD)
5. Generate preview HTML

**Example:**
```bash
# Process a subcrate within a release
fairscape build subcrate ./my-release/experiment-1 --release-directory ./my-release

# Process a standalone subcrate
fairscape build subcrate ./my-subcrate
```
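
The command reports one status per step plus a list of errors. A minimal sketch of that pattern follows; the `run_steps` helper, the stand-in lambdas, and the choice to continue after a failed step are assumptions for illustration, not a description of how `process_subcrate()` is written. Only the result keys mirror the ones the command prints.

```python
from typing import Callable, Dict


def run_steps(steps: Dict[str, Callable[[], bool]]) -> dict:
    """Run each step in order, recording success flags and error messages."""
    results = {name: False for name in steps}
    results["errors"] = []
    for name, step in steps.items():
        try:
            results[name] = bool(step())
        except Exception as exc:  # keep going so later steps still run
            results["errors"].append(f"{name}: {exc}")
    return results


def failing_step() -> bool:
    raise RuntimeError("no ARK identifier found")


# Stand-in callables; a real pipeline would call the processing functions here.
results = run_steps({
    "link_inverses": lambda: True,
    "add_io": lambda: True,
    "evidence_graph": failing_step,
    "croissant": lambda: True,
    "preview": lambda: True,
})
for name in ("link_inverses", "add_io", "evidence_graph", "croissant", "preview"):
    print(f"{name}: {'OK' if results[name] else 'FAILED'}")
```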
96 changes: 94 additions & 2 deletions src/fairscape_cli/commands/build_commands.py
Expand Up @@ -13,7 +13,9 @@
from fairscape_cli.utils.build_utils import (
process_all_subcrates,
process_croissant,
process_datasheet
process_datasheet,
process_preview,
process_subcrate
)

from fairscape_cli.models import (
Expand Down Expand Up @@ -534,4 +536,94 @@ def build_croissant(ctx, rocrate_path, output):
except Exception as e:
click.echo(f"ERROR: Failed to convert RO-Crate to Croissant: {e}", err=True)
traceback.print_exc()
ctx.exit(1)
ctx.exit(1)


@build_group.command('preview')
@click.argument('rocrate-path', type=click.Path(exists=True, path_type=pathlib.Path))
@click.option('--published', is_flag=True, default=False, help="Indicate if the crate is considered published (affects link rendering).")
@click.pass_context
def build_preview_command(ctx, rocrate_path: pathlib.Path, published: bool):
    """
    Generate a preview HTML file (ro-crate-preview.html) for an RO-Crate.

    This creates a lightweight HTML summary of the RO-Crate that can be
    viewed in a browser. Useful for quickly inspecting crate contents.
    """
    # Accept either the crate directory or its ro-crate-metadata.json file.
    if rocrate_path.is_dir():
        crate_dir = rocrate_path
    elif rocrate_path.name == "ro-crate-metadata.json":
        crate_dir = rocrate_path.parent
    else:
        click.echo("ERROR: Input path must be an RO-Crate directory or a ro-crate-metadata.json file.", err=True)
        ctx.exit(1)

    metadata_file = crate_dir / "ro-crate-metadata.json"
    if not metadata_file.exists():
        click.echo(f"ERROR: Metadata file not found: {metadata_file}", err=True)
        ctx.exit(1)

    click.echo(f"Generating preview for: {crate_dir}")

    if process_preview(crate_dir, published=published):
        click.echo(f"Preview generated: {crate_dir / 'ro-crate-preview.html'}")
    else:
        click.echo("ERROR: Failed to generate preview", err=True)
        ctx.exit(1)


@build_group.command('subcrate')
@click.argument('subcrate-path', type=click.Path(exists=True, path_type=pathlib.Path))
@click.option('--release-directory', type=click.Path(exists=True, path_type=pathlib.Path), default=None,
              help="Parent release directory (used for relative paths in evidence graphs).")
@click.option('--published', is_flag=True, default=False, help="Indicate if the crate is considered published.")
@click.pass_context
def build_subcrate_command(ctx, subcrate_path: pathlib.Path, release_directory: Optional[pathlib.Path], published: bool):
    """
    Process a subcrate with all augmentation and build steps.

    This command performs the following steps on a single subcrate:

    \b
    1. Link inverse properties (OWL ontology entailments)
    2. Add EVI:inputs and EVI:outputs to the root dataset
    3. Generate evidence graph (JSON + HTML visualization)
    4. Generate Croissant export (JSON-LD)
    5. Generate preview HTML

    Use this command to fully process a subcrate before or after adding it
    to a release. This is the individual-crate equivalent of the subcrate
    processing that happens during 'build release'.
    """
    # Accept either the crate directory or its ro-crate-metadata.json file.
    if subcrate_path.is_dir():
        crate_dir = subcrate_path
    elif subcrate_path.name == "ro-crate-metadata.json":
        crate_dir = subcrate_path.parent
    else:
        click.echo("ERROR: Input path must be an RO-Crate directory or a ro-crate-metadata.json file.", err=True)
        ctx.exit(1)

    metadata_file = crate_dir / "ro-crate-metadata.json"
    if not metadata_file.exists():
        click.echo(f"ERROR: Metadata file not found: {metadata_file}", err=True)
        ctx.exit(1)

    click.echo(f"\n=== Processing subcrate: {crate_dir.name} ===")

    results = process_subcrate(crate_dir, release_directory=release_directory, published=published)

    # Summary
    click.echo("\n=== Summary ===")
    click.echo(f" Link inverses: {'OK' if results['link_inverses'] else 'FAILED'}")
    click.echo(f" Add I/O: {'OK' if results['add_io'] else 'FAILED'}")
    click.echo(f" Evidence graph: {'OK' if results['evidence_graph'] else 'SKIPPED/FAILED'}")
    click.echo(f" Croissant: {'OK' if results['croissant'] else 'FAILED'}")
    click.echo(f" Preview: {'OK' if results['preview'] else 'FAILED'}")

    if results['errors']:
        click.echo("\nErrors encountered:")
        for error in results['errors']:
            click.echo(f" - {error}")
        ctx.exit(1)
    else:
        click.echo("\nSubcrate processing completed successfully.")