Commit b0ca5b0

Forward-merge branch-24.10 into branch-25.02 (nv-morpheus#1980)
## Description
* Manually resolving forward merger issues from PR nv-morpheus#1966

## By Submitting this PR I confirm:
- I am familiar with the [Contributing Guidelines](https://github.com/nv-morpheus/Morpheus/blob/main/docs/source/developer_guide/contributing.md).
- When the PR is ready for review, new or existing tests cover these changes.
- When the PR is ready for review, the documentation is up to date with these changes.

2 parents 258acf4 + 3882ec1 commit b0ca5b0

66 files changed, +621 -2904 lines changed

.devcontainer/Dockerfile

+1 -1
@@ -96,7 +96,7 @@ ENV PYTHON_PACKAGE_MANAGER="${PYTHON_PACKAGE_MANAGER}"
 
 ENV SCCACHE_REGION="us-east-2"
 ENV SCCACHE_BUCKET="rapids-sccache-devs"
-ENV VAULT_HOST="https://vault.ops.k8s.rapids.ai"
+ENV AWS_ROLE_ARN="arn:aws:iam::279114543810:role/nv-gha-token-sccache-devs"
 ENV HISTFILE="/home/coder/.cache/._bash_history"
 
 ENV MORPHEUS_SUPPORT_DOCA=ON

.devcontainer/cuda12.5-conda/devcontainer.json

+1 -1
@@ -5,7 +5,7 @@
 "args": {
 "CUDA": "12.5",
 "PYTHON_PACKAGE_MANAGER": "conda",
-"BASE": "rapidsai/devcontainers:24.10-cpp-mambaforge-ubuntu22.04"
+"BASE": "rapidsai/devcontainers:24.12-cpp-mambaforge-ubuntu22.04"
 }
 },
 "privileged": true,

ci/release/update-version.sh

-1
@@ -91,7 +91,6 @@ sed_runner "s/v${CURRENT_FULL_VERSION}-runtime/v${NEXT_FULL_VERSION}-runtime/g"
 examples/digital_fingerprinting/production/docker-compose.yml \
 examples/digital_fingerprinting/production/Dockerfile
 sed_runner "s/v${CURRENT_FULL_VERSION}-runtime/v${NEXT_FULL_VERSION}-runtime/g" examples/digital_fingerprinting/production/Dockerfile
-sed_runner "s|blob/branch-${CURRENT_SHORT_TAG}|blob/branch-${NEXT_SHORT_TAG}|g" examples/digital_fingerprinting/starter/README.md
 
 # examples/developer_guide
 sed_runner 's/'"VERSION ${CURRENT_FULL_VERSION}.*"'/'"VERSION ${NEXT_FULL_VERSION}"'/g' \
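
The line removed in this hunk was a `sed` substitution that retargeted `blob/branch-<tag>` links in the starter README. As a rough illustration of that pattern only, the sketch below applies the same expression to a sample string. It does not reproduce the project's `sed_runner` helper, which is not shown in this diff. The function name `rewrite_branch_links`, the sample URL path, and the two tag values are assumptions for the example; the tag values are inferred from the branch names in the commit title.

```bash
#!/usr/bin/env bash
# Assumed values, inferred from the commit title (branch-24.10 -> branch-25.02).
CURRENT_SHORT_TAG="24.10"
NEXT_SHORT_TAG="25.02"

# Hypothetical helper: apply the same substitution as the removed line to stdin.
rewrite_branch_links() {
    sed "s|blob/branch-${CURRENT_SHORT_TAG}|blob/branch-${NEXT_SHORT_TAG}|g"
}

# The sample URL path is illustrative only.
echo "https://github.com/nv-morpheus/Morpheus/blob/branch-24.10/README.md" | rewrite_branch_links
```

Using `|` as the `sed` delimiter, as the original line does, avoids having to escape the `/` characters in the path.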

ci/vale/styles/config/vocabularies/morpheus/accept.txt

+2
@@ -46,6 +46,7 @@ LLM(s?)
 # https://github.com/logpai/loghub/
 Loghub
 Milvus
+PyPI
 [Mm]ixin
 MLflow
 Morpheus
@@ -71,6 +72,7 @@ pytest
 [Ss]ubcard(s?)
 [Ss]ubgraph(s?)
 [Ss]ubword(s?)
+[Ss]uperset(s?)
 [Tt]imestamp(s?)
 [Tt]okenization
 [Tt]okenizer(s?)

docs/CMakeLists.txt

+2 -2
@@ -30,15 +30,15 @@ add_custom_target(${PROJECT_NAME}_docs
 BUILD_DIR=${CMAKE_CURRENT_BINARY_DIR} ${SPHINX_EXECUTABLE} ${SPHINX_HTML_ARGS} ${SPHINX_SOURCE} ${SPHINX_BUILD}
 WORKING_DIRECTORY ${CMAKE_CURRENT_BINARY_DIR}
 COMMENT "Generating documentation with Sphinx"
-DEPENDS morpheus-package-outputs morpheus_llm-package-outputs
+DEPENDS morpheus-package-outputs morpheus_llm-package-outputs morpheus_dfp-package-outputs
 )
 
 add_custom_target(${PROJECT_NAME}_docs_linkcheck
 COMMAND
 BUILD_DIR=${CMAKE_CURRENT_BINARY_DIR} ${SPHINX_EXECUTABLE} ${SPHINX_LINKCHECK_ARGS} ${SPHINX_SOURCE} ${SPHINX_LINKCHECK_OUT}
 WORKING_DIRECTORY ${CMAKE_CURRENT_BINARY_DIR}
 COMMENT "Checking documentation links with Sphinx"
-DEPENDS morpheus-package-outputs morpheus_llm-package-outputs
+DEPENDS morpheus-package-outputs morpheus_llm-package-outputs morpheus_dfp-package-outputs
 )
 
 list(POP_BACK CMAKE_MESSAGE_CONTEXT)

docs/source/basics/overview.rst

+1 -4
@@ -27,7 +27,7 @@ The Morpheus CLI is built on the Click Python package which allows for nested co
 together. At a high level, the CLI is broken up into two main sections:
 
 * ``run``
-* For running AE, FIL, NLP or OTHER pipelines.
+* For running FIL, NLP or OTHER pipelines.
 * ``tools``
 * Tools/Utilities to help set up, configure and run pipelines and external resources.
 
@@ -58,16 +58,13 @@ run:
 --help Show this message and exit.
 
 Commands:
-pipeline-ae Run the inference pipeline with an AutoEncoder model
 pipeline-fil Run the inference pipeline with a FIL model
 pipeline-nlp Run the inference pipeline with a NLP model
 pipeline-other Run a custom inference pipeline without a specific model type
 
 
 Currently, Morpheus pipeline can be operated in four different modes.
 
-* ``pipeline-ae``
-* This pipeline mode is used to run training/inference on the AutoEncoder model.
 * ``pipeline-fil``
 * This pipeline mode is used to run inference on FIL (Forest Inference Library) models such as XGBoost, RandomForestClassifier, etc.
 * ``pipeline-nlp``

docs/source/cloud_deployment_guide.md

+3 -43
@@ -32,7 +32,6 @@ limitations under the License.
 - [Verify Model Deployment](#verify-model-deployment)
 - [Create Kafka Topics](#create-kafka-topics)
 - [Example Workflows](#example-workflows)
-- [Run AutoEncoder Digital Fingerprinting Pipeline](#run-autoencoder-digital-fingerprinting-pipeline)
 - [Run NLP Phishing Detection Pipeline](#run-nlp-phishing-detection-pipeline)
 - [Run NLP Sensitive Information Detection Pipeline](#run-nlp-sensitive-information-detection-pipeline)
 - [Run FIL Anomalous Behavior Profiling Pipeline](#run-fil-anomalous-behavior-profiling-pipeline)
@@ -383,10 +382,9 @@ kubectl -n $NAMESPACE exec deploy/broker -c broker -- kafka-topics.sh \
 
 This section describes example workflows to run on Morpheus. Four sample pipelines are provided.
 
-1. AutoEncoder pipeline performing Digital Fingerprinting (DFP).
-2. NLP pipeline performing Phishing Detection (PD).
-3. NLP pipeline performing Sensitive Information Detection (SID).
-4. FIL pipeline performing Anomalous Behavior Profiling (ABP).
+1. NLP pipeline performing Phishing Detection (PD).
+2. NLP pipeline performing Sensitive Information Detection (SID).
+3. FIL pipeline performing Anomalous Behavior Profiling (ABP).
 
 Multiple command options are given for each pipeline, with varying data input/output methods, ranging from local files to Kafka Topics.
 
@@ -424,44 +422,6 @@ helm install --set ngc.apiKey="$API_KEY" \
 morpheus-sdk-client
 ```
 
-
-### Run AutoEncoder Digital Fingerprinting Pipeline
-The following AutoEncoder pipeline example shows how to train and validate the AutoEncoder model and write the inference results to a specified location. Digital fingerprinting has also been referred to as **HAMMAH (Human as Machine <> Machine as Human)**.
-These use cases are currently implemented to detect user behavior changes that indicate a change from a human to a machine or a machine to a human, thus leaving a "digital fingerprint." The model is an ensemble of an autoencoder and fast Fourier transform reconstruction.
-
-Inference and training based on a user ID (`user123`). The model is trained once and inference is conducted on the supplied input entries in the example pipeline below. The `--train_data_glob` parameter must be removed for continuous training.
-
-```bash
-helm install --set ngc.apiKey="$API_KEY" \
---set sdk.args="morpheus --log_level=DEBUG run \
---edge_buffer_size=4 \
---pipeline_batch_size=1024 \
---model_max_batch_size=1024 \
-pipeline-ae \
---columns_file=data/columns_ae_cloudtrail.txt \
---userid_filter=user123 \
---feature_scaler=standard \
---userid_column_name=userIdentitysessionContextsessionIssueruserName \
---timestamp_column_name=event_dt \
-from-cloudtrail --input_glob=/common/models/datasets/validation-data/dfp-cloudtrail-*-input.csv \
---max_files=200 \
-train-ae --train_data_glob=/common/models/datasets/training-data/dfp-cloudtrail-*.csv \
---source_stage_class=morpheus.stages.input.cloud_trail_source_stage.CloudTrailSourceStage \
---seed 42 \
-preprocess \
-inf-pytorch \
-add-scores \
-timeseries --resolution=1m --zscore_threshold=8.0 --hot_start \
-monitor --description 'Inference Rate' --smoothing=0.001 --unit inf \
-serialize \
-to-file --filename=/common/data/<YOUR_OUTPUT_DIR>/cloudtrail-dfp-detections.csv --overwrite" \
---namespace $NAMESPACE \
-<YOUR_RELEASE_NAME> \
-morpheus-sdk-client
-```
-
-For more information on the Digital Fingerprint use cases, refer to the starter example and a more production-ready example that can be found in the `examples` source directory.
-
 ### Run NLP Phishing Detection Pipeline
 
 The following Phishing Detection pipeline examples use a pre-trained NLP model to analyze emails (body) and determine phishing or benign. Here is the sample data as shown below is used to pass as an input to the pipeline.

docs/source/conda_packages.md

+128
@@ -0,0 +1,128 @@
+<!--
+SPDX-FileCopyrightText: Copyright (c) 2024, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+SPDX-License-Identifier: Apache-2.0
+
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+
+# Morpheus Conda Packages
+The Morpheus stages are the building blocks for creating pipelines. The stages are organized into libraries by use case. The current libraries are:
+- `morpheus-core`
+- `morpheus-dfp`
+- `morpheus-llm`
+
+The libraries are hosted as Conda packages on the [`nvidia`](https://anaconda.org/nvidia/) channel.
+
+The split into multiple libraries allows for a more modular approach to using the Morpheus stages. For example, if you are building an application for Digital Fingerprinting, you can install just the `morpheus-dfp` library. This reduces the size of the installed package. It also limits the dependencies, eliminating unnecessary version conflicts.
+
+
+## Morpheus Core
+The `morpheus-core` library contains the core stages that are common across all use cases. The Morpheus core library is built from the source code in the `python/morpheus` directory of the Morpheus repository. The core library is installed as a dependency when you install any of the other Morpheus libraries.
+To set up a Conda environment with the [`morpheus-core`](https://anaconda.org/nvidia/morpheus-core) library you can run the following commands:
+### Create a Conda environment
+```bash
+export CONDA_ENV_NAME=morpheus
+conda create -n ${CONDA_ENV_NAME} python=3.10
+conda activate ${CONDA_ENV_NAME}
+```
+### Add Conda channels
+These channels are required for installing the runtime dependencies:
+```bash
+conda config --env --add channels conda-forge &&\
+conda config --env --add channels nvidia &&\
+conda config --env --add channels rapidsai &&\
+conda config --env --add channels pytorch
+```
+### Install the `morpheus-core` library
+```bash
+conda install -c nvidia morpheus-core
+```
+The `morpheus-core` Conda package installs the `morpheus` python package. It also pulls down all the necessary Conda runtime dependencies for the core stages including [`mrc`](https://anaconda.org/nvidia/mrc) and [`libmrc`](https://anaconda.org/nvidia/libmrc).
+### Install additional PyPI dependencies
+Some of the stages in the core library require additional dependencies that are hosted on PyPI. These dependencies are included as a requirements file in the `morpheus` python package. The requirements file can be located and installed by running the following commands:
+```bash
+MORPHEUS_CORE_PKG_DIR=$(dirname $(python -c "import morpheus; print(morpheus.__file__)"))
+pip install -r ${MORPHEUS_CORE_PKG_DIR}/requirements_morpheus_core.txt
+```
+
+## Morpheus DFP
+Digital Fingerprinting (DFP) is a technique used to identify anomalous behavior and uncover potential threats in the environment. The `morpheus-dfp` library contains stages for DFP. It is built from the source code in the `python/morpheus_dfp` directory of the Morpheus repository. To set up a Conda environment with the [`morpheus-dfp`](https://anaconda.org/nvidia/morpheus-dfp) library you can run the following commands:
+### Create a Conda environment
+```bash
+export CONDA_ENV_NAME=morpheus-dfp
+conda create -n ${CONDA_ENV_NAME} python=3.10
+conda activate ${CONDA_ENV_NAME}
+```
+### Add Conda channels
+These channels are required for installing the runtime dependencies:
+```bash
+conda config --env --add channels conda-forge &&\
+conda config --env --add channels nvidia &&\
+conda config --env --add channels rapidsai &&\
+conda config --env --add channels pytorch
+```
+### Install the `morpheus-dfp` library
+```bash
+conda install -c nvidia morpheus-dfp
+```
+The `morpheus-dfp` Conda package installs the `morpheus_dfp` python package. It also pulls down all the necessary Conda runtime dependencies including [`morpheus-core`](https://anaconda.org/nvidia/morpheus-core).
+### Install additional PyPI dependencies
+Some of the DFP stages in the library require additional dependencies that are hosted on PyPI. These dependencies are included as a requirements file in the `morpheus_dfp` python package and can be installed by running the following commands:
+```bash
+MORPHEUS_DFP_PKG_DIR=$(dirname $(python -c "import morpheus_dfp; print(morpheus_dfp.__file__)"))
+pip install -r ${MORPHEUS_DFP_PKG_DIR}/requirements_morpheus_dfp.txt
+```
+
+## Morpheus LLM
+The `morpheus-llm` library contains stages for Large Language Models (LLM) and Vector Databases. These stages are used for setting up Retrieval Augmented Generation (RAG) pipelines. The `morpheus-llm` library is built from the source code in the `python/morpheus_llm` directory of the Morpheus repository.
+To set up a Conda environment with the [`morpheus-llm`](https://anaconda.org/nvidia/morpheus-llm) library you can run the following commands:
+### Create a Conda environment
+```bash
+export CONDA_ENV_NAME=morpheus-llm
+conda create -n ${CONDA_ENV_NAME} python=3.10
+conda activate ${CONDA_ENV_NAME}
+```
+### Add Conda channels
+These channels are required for installing the runtime dependencies:
+```bash
+conda config --env --add channels conda-forge &&\
+conda config --env --add channels nvidia &&\
+conda config --env --add channels rapidsai &&\
+conda config --env --add channels pytorch
+```
+### Install the `morpheus-llm` library
+```bash
+conda install -c nvidia morpheus-llm
+```
+The `morpheus-llm` Conda package installs the `morpheus_llm` python package. It also pulls down all the necessary Conda packages including [`morpheus-core`](https://anaconda.org/nvidia/morpheus-core).
+### Install additional PyPI dependencies
+Some of the stages in the library require additional dependencies that are hosted on PyPI. These dependencies are included as a requirements file in the `morpheus_llm` python package and can be installed by running the following commands:
+```bash
+MORPHEUS_LLM_PKG_DIR=$(dirname $(python -c "import morpheus_llm; print(morpheus_llm.__file__)"))
+pip install -r ${MORPHEUS_LLM_PKG_DIR}/requirements_morpheus_llm.txt
+```
+
+## Miscellaneous
+### Morpheus Examples
+The Morpheus examples are not included in the Morpheus Conda packages. To use them you need to clone the Morpheus repository and run the examples from source. For details refer to the [Morpheus Examples](./examples.md).
+
+### Namespace Updates
+If you were using a Morpheus release prior to 24.10 you may need to update the namespace for the DFP, LLM and vector database stages.
+
+A script, `scripts/morpheus_namespace_update.py`, has been provided to help with that and can be run as follows:
+```bash
+python scripts/morpheus_namespace_update.py --directory <directory> --dfp
+```
+```bash
+python scripts/morpheus_namespace_update.py --directory <directory> --llm
+```
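
The new `conda_packages.md` repeats the same two-step PyPI install for each library: find the installed package directory, then `pip install -r` the bundled requirements file. A minimal sketch of how those steps could be folded into one helper is given below. The function names `morpheus_pkg_info` and `install_morpheus_pypi_deps` are hypothetical and not part of Morpheus; the module and requirements file names are taken from the added documentation.

```bash
#!/usr/bin/env bash
# Map a Morpheus Conda package to "<python module> <requirements file>",
# using the names given in the added conda_packages.md.
morpheus_pkg_info() {
    case "$1" in
        morpheus-core) echo "morpheus requirements_morpheus_core.txt" ;;
        morpheus-dfp)  echo "morpheus_dfp requirements_morpheus_dfp.txt" ;;
        morpheus-llm)  echo "morpheus_llm requirements_morpheus_llm.txt" ;;
        *) echo "unknown package: $1" >&2; return 1 ;;
    esac
}

# Hypothetical wrapper: locate the installed module and install its PyPI extras.
install_morpheus_pypi_deps() {
    local info module req_file module_file pkg_dir
    info="$(morpheus_pkg_info "$1")" || return 1
    read -r module req_file <<< "${info}"
    # Fails here if the Conda package is not installed in the active environment.
    module_file="$(python -c "import ${module}; print(${module}.__file__)")" || return 1
    pkg_dir="$(dirname "${module_file}")"
    pip install -r "${pkg_dir}/${req_file}"
}

# Usage (requires the corresponding Conda package to be installed):
#   install_morpheus_pypi_deps morpheus-dfp
```

Quoting the path passed to `pip install -r` keeps the command working when the environment lives under a directory whose name contains spaces.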

docs/source/developer_guide/contributing.md

+37 -7
@@ -151,7 +151,35 @@ This workflow utilizes a Docker container to set up most dependencies ensuring a
 
 ### Build in a Conda Environment
 
-If a Conda environment on the host machine is preferred over Docker, it is relatively easy to install the necessary dependencies (In reality, the Docker workflow creates a Conda environment inside the container).
+If a [Conda](https://docs.conda.io/projects/conda/en/latest/) environment on the host machine is preferred over Docker, it is relatively easy to install the necessary dependencies (In reality, the Docker workflow creates a Conda environment inside the container).
+
+#### Conda Environment YAML Files
+Morpheus provides multiple Conda environment files to support different workflows. Morpheus utilizes [rapids-dependency-file-generator](https://pypi.org/project/rapids-dependency-file-generator/) to manage these multiple environment files. All of Morpheus' Conda and [pip](https://pip.pypa.io/en/stable/) dependencies along with the different environments are defined in the `dependencies.yaml` file.
+
+The following are the available Conda environment files, all are located in the `conda/environments` directory, with the following naming convention: `<environment>_<cuda_version>_arch-<architecture>.yaml`.
+| Environment | File | Description |
+| --- | --- | --- |
+| `all` | `all_cuda-125_arch-x86_64.yaml` | All dependencies required to build, run and test Morpheus, along with all of the examples. This is a superset of the `dev`, `runtime` and `examples` environments. |
+| `dev` | `dev_cuda-125_arch-x86_64.yaml` | Dependencies required to build, run and test Morpheus. This is a superset of the `runtime` environment. |
+| `examples` | `examples_cuda-125_arch-x86_64.yaml` | Dependencies required to run all examples. This is a superset of the `runtime` environment. |
+| `model-utils` | `model-utils_cuda-125_arch-x86_64.yaml` | Dependencies required to train models independent of Morpheus. |
+| `runtime` | `runtime_cuda-125_arch-x86_64.yaml` | Minimal set of dependencies strictly required to run Morpheus. |
+
+
+##### Updating Morpheus Dependencies
+Changes to Morpheus dependencies can be made in the `dependencies.yaml` file, then run `rapids-dependency-file-generator` to update the individual environment files in the `conda/environments` directory.
+
+Install `rapids-dependency-file-generator` into the base Conda environment:
+```bash
+conda run -n base --live-stream pip install rapids-dependency-file-generator
+```
+
+Then, to update the individual environment files, run:
+```bash
+conda run -n base --live-stream rapids-dependency-file-generator
+```
+
+When ready, commit both the changes to the `dependencies.yaml` file and the updated environment files into the repo.
 
 #### Prerequisites
 
@@ -170,19 +198,21 @@ If a Conda environment on the host machine is preferred over Docker, it is relat
 ```bash
 git submodule update --init --recursive
 ```
-1. Create the Morpheus Conda environment
+1. Create the Morpheus Conda environment using either the `dev` or `all` environment file. Refer to the [Conda Environment YAML Files](#conda-environment-yaml-files) section for more information.
 ```bash
 conda env create --solver=libmamba -n morpheus --file conda/environments/dev_cuda-125_arch-x86_64.yaml
-conda activate morpheus
 ```
+or
+```bash
+conda env create --solver=libmamba -n morpheus --file conda/environments/all_cuda-125_arch-x86_64.yaml
 
-This creates a new environment named `morpheus`, and activates that environment.
+```
 
-> **Note**: The `dev_cuda-121_arch-x86_64.yaml` Conda environment file specifies all of the dependencies required to build Morpheus and run Morpheus. However many of the examples, and optional packages such as `morpheus_llm` require additional dependencies. Alternately the following command can be used to create the Conda environment:
+This creates a new environment named `morpheus`. Activate the environment with:
 ```bash
-conda env create --solver=libmamba -n morpheus --file conda/environments/all_cuda-121_arch-x86_64.yaml
 conda activate morpheus
 ```
+
 1. Build Morpheus
 ```bash
 ./scripts/compile.sh
@@ -345,7 +375,7 @@ Launching a full production Kafka cluster is outside the scope of this project;
 
 ### Pipeline Validation
 
-To verify that all pipelines are working correctly, validation scripts have been added at `${MORPHEUS_ROOT}/scripts/validation`. There are scripts for each of the main workflows: Anomalous Behavior Profiling (ABP), Humans-as-Machines-Machines-as-Humans (HAMMAH), Phishing Detection (Phishing), and Sensitive Information Detection (SID).
+To verify that all pipelines are working correctly, validation scripts have been added at `${MORPHEUS_ROOT}/scripts/validation`. There are scripts for each of the main workflows: Anomalous Behavior Profiling (ABP), Phishing Detection (Phishing), and Sensitive Information Detection (SID).
 
 To run all of the validation workflow scripts, use the following commands:
 
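
The first `contributing.md` hunk introduces the naming convention `<environment>_<cuda_version>_arch-<architecture>.yaml` for files under `conda/environments`. As a small sketch of how that convention composes into a path, the hypothetical helper below builds the file name from its parts. The function name `conda_env_file` and its default arguments are assumptions for illustration; the defaults mirror the `cuda-125` and `x86_64` values that appear in the table in that hunk.

```bash
#!/usr/bin/env bash
# Hypothetical helper: build an environment file path from the naming convention
# <environment>_<cuda_version>_arch-<architecture>.yaml described in contributing.md.
conda_env_file() {
    local environment="$1"
    local cuda_version="${2:-cuda-125}"   # assumed default, taken from the table
    local architecture="${3:-x86_64}"     # assumed default, taken from the table
    echo "conda/environments/${environment}_${cuda_version}_arch-${architecture}.yaml"
}

# Print the path for the `dev` environment.
conda_env_file dev
```

The result could then be passed to the command from the diff, for example `conda env create --solver=libmamba -n morpheus --file "$(conda_env_file all)"`.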
