### Run AutoEncoder Digital Fingerprinting Pipeline
429
-
The following AutoEncoder pipeline example shows how to train and validate the AutoEncoder model and write the inference results to a specified location. Digital fingerprinting has also been referred to as **HAMMAH (Human as Machine <> Machine as Human)**.
430
-
These use cases are currently implemented to detect user behavior changes that indicate a change from a human to a machine or a machine to a human, thus leaving a "digital fingerprint." The model is an ensemble of an autoencoder and fast Fourier transform reconstruction.
431
-
432
-
Inference and training based on a user ID (`user123`). The model is trained once and inference is conducted on the supplied input entries in the example pipeline below. The `--train_data_glob` parameter must be removed for continuous training.
For more information on the Digital Fingerprint use cases, refer to the starter example and a more production-ready example that can be found in the `examples` source directory.
464
-
465
425
### Run NLP Phishing Detection Pipeline
466
426
467
427
The following Phishing Detection pipeline examples use a pre-trained NLP model to analyze email bodies and classify each email as phishing or benign. The sample data shown below is used as input to the pipeline.
docs/source/developer_guide/contributing.md
+1-1
@@ -375,7 +375,7 @@ Launching a full production Kafka cluster is outside the scope of this project;
375
375
376
376
### Pipeline Validation
377
377
378
-
To verify that all pipelines are working correctly, validation scripts have been added at `${MORPHEUS_ROOT}/scripts/validation`. There are scripts for each of the main workflows: Anomalous Behavior Profiling (ABP), Humans-as-Machines-Machines-as-Humans (HAMMAH), Phishing Detection (Phishing), and Sensitive Information Detection (SID).
378
+
To verify that all pipelines are working correctly, validation scripts have been added at `${MORPHEUS_ROOT}/scripts/validation`. There are scripts for each of the main workflows: Anomalous Behavior Profiling (ABP), Phishing Detection (Phishing), and Sensitive Information Detection (SID).
379
379
380
380
To run all of the validation workflow scripts, use the following commands:
docs/source/developer_guide/guides/5_digital_fingerprinting.md
+12-43
@@ -23,7 +23,7 @@ Every account, user, service, and machine has a digital fingerprint that represe
23
23
To construct this digital fingerprint, we will be training unsupervised behavioral models at various granularities, including a generic model for all users in the organization along with fine-grained models for each user to monitor their behavior. These models are continuously updated and retrained over time, and alerts are triggered when deviations from normality occur for any user.
24
24
25
25
## Training Sources
26
-
The data we will want to use for the training and inference will be any sensitive system that the user interacts with, such as VPN, authentication and cloud services. The digital fingerprinting example (`examples/digital_fingerprinting/README.md`) included in Morpheus ingests logs from [AWS CloudTrail](https://docs.aws.amazon.com/cloudtrail/index.html), [Azure Active Directory](https://docs.microsoft.com/en-us/azure/active-directory/reports-monitoring/concept-sign-ins), and [Duo Authentication](https://duo.com/docs/adminapi).
26
+
The data we will want to use for training and inference comes from any sensitive system that the user interacts with, such as VPN, authentication, and cloud services. The digital fingerprinting example (`examples/digital_fingerprinting/README.md`) included in Morpheus ingests logs from [Azure Active Directory](https://docs.microsoft.com/en-us/azure/active-directory/reports-monitoring/concept-sign-ins) and [Duo Authentication](https://duo.com/docs/adminapi).
27
27
28
28
The location of these logs could be either local to the machine running Morpheus, a shared file system like NFS, or on a remote store such as [Amazon S3](https://aws.amazon.com/s3/).
29
29
@@ -44,54 +44,23 @@ Adding a new source for the DFP pipeline requires defining five critical pieces:
44
44
1. A [`DataFrameInputSchema`](6_digital_fingerprinting_reference.md#dataframe-input-schema-dataframeinputschema) for the [`DFPFileToDataFrameStage`](6_digital_fingerprinting_reference.md#file-to-dataframe-stage-dfpfiletodataframestage) stage.
45
45
1. A [`DataFrameInputSchema`](6_digital_fingerprinting_reference.md#dataframe-input-schema-dataframeinputschema) for the [`DFPPreprocessingStage`](6_digital_fingerprinting_reference.md#preprocessing-stage-dfppreprocessingstage).
46
46
47
-
## DFP Examples
48
-
The DFP workflow is provided as two separate examples: a simple, "starter" pipeline for new users and a complex, "production" pipeline for full scale deployments. While these two examples both perform the same general tasks, they do so in very different ways. The following is a breakdown of the differences between the two examples.
49
-
50
-
### The "Starter" Example
51
-
52
-
This example is designed to simplify the number of stages and components and provide a fully contained workflow in a single pipeline.
53
-
54
-
Key Differences:
55
-
* A single pipeline which performs both training and inference
56
-
* Requires no external services
57
-
* Can be run from the Morpheus CLI
58
-
59
-
This example is described in more detail in `examples/digital_fingerprinting/starter/README.md`.
60
-
61
-
### The "Production" Example
47
+
## Production Deployment Example
62
48
63
49
This example is designed to illustrate a full-scale, production-ready, DFP deployment in Morpheus. It contains all of the necessary components (such as a model store), to allow multiple Morpheus pipelines to communicate at a scale that can handle the workload of an entire company.
64
50
65
-
Key Differences:
51
+
Key Features:
66
52
* Multiple pipelines are specialized to perform either training or inference
67
-
* Requires setting up a model store to allow the training and inference pipelines to communicate
53
+
* Uses a model store to allow the training and inference pipelines to communicate
68
54
* Organized into a docker-compose deployment for easy startup
69
55
* Contains a Jupyter notebook service to ease development and debugging
70
56
* Can be deployed to Kubernetes using provided Helm charts
71
57
* Uses many customized stages to maximize performance.
72
58
73
59
This example is described in `examples/digital_fingerprinting/production/README.md` as well as the rest of this document.
74
60
75
-
### DFP Features
61
+
## DFP Features
76
62
77
-
#### AWS CloudTrail
78
-
| Feature | Description |
79
-
| ------- | ----------- |
80
-
|`userIdentityaccessKeyId`| for example, `ACPOSBUM5JG5BOW7B2TR`, `ABTHWOIIC0L5POZJM2FF`, `AYI2CM8JC3NCFM4VMMB4`|
81
-
|`userAgent`| for example, `Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 10.0; Trident/5.1)`, `Mozilla/5.0 (Linux; Android 4.3.1) AppleWebKit/536.1 (KHTML, like Gecko) Chrome/62.0.822.0 Safari/536.1`, `Mozilla/5.0 (Macintosh; U; PPC Mac OS X 10 7_0; rv:1.9.4.20) Gecko/2012-06-10 12:09:43 Firefox/3.8`|
82
-
|`userIdentitysessionContextsessionIssueruserName`| for example, `role-g`|
83
-
|`sourceIPAddress`| for example, `208.49.113.40`, `123.79.131.26`, `128.170.173.123`|
84
-
|`userIdentityaccountId`| for example, `Account-123456789`|
85
-
|`errorMessage`| for example, `The input fails to satisfy the constraints specified by an AWS service.`, `The specified subnet cannot be found in the VPN with which the Client VPN endpoint is associated.`, `Your account is currently blocked. Contact [email protected] if you have questions.`|
86
-
|`userIdentitytype`| for example, `FederatedUser`|
87
-
|`eventName`| for example, `GetSendQuota`, `ListTagsForResource`, `DescribeManagedPrefixLists`|
88
-
|`userIdentityprincipalId`| for example, `39c71b3a-ad54-4c28-916b-3da010b92564`, `0baf594e-28c1-46cf-b261-f60b4c4790d1`, `7f8a985f-df3b-4c5c-92c0-e8bffd68abbf`|
89
-
|`errorCode`| for example, success, `MissingAction`, `ValidationError`|
90
-
|`eventSource`| for example, `lopez-byrd.info`, `robinson.com`, `lin.com`|
91
-
|`userIdentityarn`| for example, `arn:aws:4a40df8e-c56a-4e6c-acff-f24eebbc4512`, `arn:aws:573fd2d9-4345-487a-9673-87de888e4e10`, `arn:aws:c8c23266-13bb-4d89-bce9-a6eef8989214`|
92
-
|`apiVersion`| for example, `1984-11-26`, `1990-05-27`, `2001-06-09`|
93
-
94
-
#### Azure Active Directory
63
+
### Azure Active Directory
95
64
| Feature | Description |
96
65
| ------- | ----------- |
97
66
|`appDisplayName`| for example, `Windows sign in`, `MS Teams`, `Office 365` |
@@ -104,14 +73,14 @@ This example is described in `examples/digital_fingerprinting/production/README.
104
73
|`location.countryOrRegion`| country or region name |
105
74
|`location.city`| city name |
106
75
107
-
##### Derived Features
76
+
#### Derived Features
108
77
| Feature | Description |
109
78
| ------- | ----------- |
110
79
|`logcount`| tracks the number of logs generated by a user within that day (increments with every log) |
111
80
|`locincrement`| increments every time we observe a new city (`location.city`) in a user's logs within that day |
112
81
|`appincrement`| increments every time we observe a new app (`appDisplayName`) in a user's logs within that day |
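
As a rough illustration of the derived features in the table above, the following sketch computes `logcount`, `locincrement`, and `appincrement` per user and per day in plain Python. It is not the Morpheus implementation: the function name `add_derived_features`, the `user` and `day` keys, and the choice to start each counter at 1 are assumptions made for this example only.

```python
from collections import defaultdict


def add_derived_features(logs):
    """Return copies of `logs` annotated with the three derived features.

    Assumptions (hypothetical, not taken from Morpheus): `logs` is a list of
    dicts already sorted by time, each with the keys 'user', 'day',
    'location.city' and 'appDisplayName'; counters start at 1.
    """
    log_counts = defaultdict(int)   # (user, day) -> number of logs seen so far
    seen_cities = defaultdict(set)  # (user, day) -> distinct cities seen so far
    seen_apps = defaultdict(set)    # (user, day) -> distinct apps seen so far

    annotated = []
    for log in logs:
        key = (log["user"], log["day"])
        log_counts[key] += 1
        seen_cities[key].add(log["location.city"])
        seen_apps[key].add(log["appDisplayName"])
        annotated.append({
            **log,
            "logcount": log_counts[key],            # increments with every log
            "locincrement": len(seen_cities[key]),  # increments on each new city
            "appincrement": len(seen_apps[key]),    # increments on each new app
        })
    return annotated


sample_logs = [
    {"user": "a", "day": "2022-01-01", "location.city": "X", "appDisplayName": "MS Teams"},
    {"user": "a", "day": "2022-01-01", "location.city": "Y", "appDisplayName": "MS Teams"},
    {"user": "a", "day": "2022-01-01", "location.city": "X", "appDisplayName": "Office 365"},
    {"user": "a", "day": "2022-01-02", "location.city": "X", "appDisplayName": "Office 365"},
]
features = add_derived_features(sample_logs)
```

Because every counter is keyed on the (user, day) pair, the values reset when a new day begins, which matches the "within that day" wording in the feature descriptions.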
113
82
114
-
#### Duo Authentication
83
+
### Duo Authentication
115
84
| Feature | Description |
116
85
| ------- | ----------- |
117
86
|`auth_device.name`| phone number |
@@ -121,7 +90,7 @@ This example is described in `examples/digital_fingerprinting/production/README.
121
90
|`reason`| reason for the results, for example, `User Cancelled`, `User Approved`, `User Mistake`, `No Response` |
122
91
|`access_device.location.city`| city name |
123
92
124
-
##### Derived Features
93
+
#### Derived Features
125
94
| Feature | Description |
126
95
| ------- | ----------- |
127
96
|`logcount`| tracks the number of logs generated by a user within that day (increments with every log) |
@@ -133,16 +102,16 @@ DFP in Morpheus is accomplished via two independent pipelines: training and infe
docs/source/getting_started.md
-30
@@ -375,36 +375,6 @@ Commands:
375
375
trigger Buffer data until the previous stage has completed.
376
376
validate Validate pipeline output for testing.
377
377
```
378
-
379
-
And for the AE pipeline:
380
-
381
-
```
382
-
$ morpheus run pipeline-ae --help
383
-
Usage: morpheus run pipeline-ae [OPTIONS] COMMAND1 [ARGS]... [COMMAND2 [ARGS]...]...
384
-
385
-
<Help Paragraph Omitted>
386
-
387
-
Commands:
388
-
add-class Add detected classifications to each message.
389
-
add-scores Add probability scores to each message.
390
-
buffer (Deprecated) Buffer results.
391
-
delay (Deprecated) Delay results for a certain duration.
392
-
filter Filter message by a classification threshold.
393
-
from-azure Source stage is used to load Azure Active Directory messages.
394
-
from-cloudtrail Load messages from a CloudTrail directory.
395
-
from-duo Source stage is used to load Duo Authentication messages.
396
-
inf-pytorch Perform inference with PyTorch.
397
-
inf-triton Perform inference with Triton Inference Server.
398
-
monitor Display throughput numbers at a specific point in the pipeline.
399
-
preprocess Prepare Autoencoder input DataFrames for inference.
400
-
serialize Includes & excludes columns from messages.
401
-
timeseries Perform time series anomaly detection and add prediction.
402
-
to-file Write all messages to a file.
403
-
to-kafka Write all messages to a Kafka cluster.
404
-
train-ae Train an Autoencoder model on incoming data.
405
-
trigger Buffer data until the previous stage has completed.
406
-
validate Validate pipeline output for testing.
407
-
```
408
378
Note: The available commands for different types of pipelines are not the same. This means that the same stage, when used in different pipelines, may have different options. Check the CLI help for the most up-to-date information during development.