Commit 4c21316

feat: set vector as default data pipeline
1 parent bea7a48 commit 4c21316

6 files changed

Lines changed: 59 additions & 53 deletions

docs/technical_documentation/concepts/pipelines.rst

Lines changed: 17 additions & 16 deletions
@@ -3,10 +3,26 @@ Pipelines
 
 Aspects provide two Pipelines which are detailed below.
 
+Vector Pipeline
+###############
+
+The Vector pipeline is the default pipeline. It works by capturing the standard output from
+the LMS logs and sending them directly to configured "sinks" or data destinations.
+It implements two similar pipelines, one for xAPI data and one for tracking logs.
+
+Vector is lighter weight, and generally data will arrive faster.
+It can also be a good choice if you want to add other listeners for that data
+(ex: to store xAPI statements to S3).
+
+To learn more about Vector, see the `Vector documentation <https://vector.dev/docs/>`_.
+
+To configure Vector as your pipeline, see the :ref:`Quick Start - Vector guide <quick-start-vector>`.
+
+
 Ralph Pipeline
 ##############
 
-The Ralph pipeline is the default pipeline, and is the most robust. It will retry the
+The Ralph pipeline is an alternative pipeline, and is the most robust. It will retry the
 most important failed events, and will catch most duplicates before they hit the database.
 This pipeline consist of a plugin in the LMS (`event-routing-backends`) that will send
 through HTTP the events to the Ralph API.
@@ -19,18 +35,3 @@ Ralph is for sharing xAPI data using the LRS standard.
 To learn more about Ralph, see the `Ralph documentation <https://openfun.github.io/ralph/>`_.
 
 To configure Ralph as your pipeline, see the :ref:`Quick Start - Ralph guide <quick-start-ralph>`.
-
-Vector Pipeline
-###############
-
-The Vector pipeline instead works by capturing the standard output from the LMS logs
-and sending them directly to configured "sinks" or data destinations. It implements two
-similar pipelines, one for xAPI data and one for tracking logs.
-
-Vector is lighter weight, and generally data will arrive a little faster, but doesn’t retry.
-It can also be a good choice if you want to add other listeners for that data
-(ex: to store xAPI statements to S3).
-
-To learn more about Vector, see the `Vector documentation <https://vector.dev/docs/>`_.
-
-To configure Vector as your pipeline, see the :ref:`Quick Start - Vector guide <quick-start-vector>`.

docs/technical_documentation/concepts/ralph.rst

Lines changed: 4 additions & 1 deletion
@@ -14,10 +14,13 @@ Although Ralph has usages such as:
 - Validate xAPI statements.
 - Store events to different backends `backends <https://openfun.github.io/ralph/latest/features/backends/>`_.
 
-In the aspects project, Ralph is optionally used as the API server that connects Open edX
+In the aspects project, Ralph is an optional API server that connects Open edX
 and Clickhouse database. Ralph receives the xAPI statements from Open edX and stores them
 in the Clickhouse database after validating the data.
 
+To use Ralph as your xAPI transport, you must set ``ASPECTS_XAPI_SOURCE: ralph`` and
+``RUN_RALPH: True`` in your Tutor configuration.
+
 By default, Ralph is connected to the Open edX platform via Event Routing Backends without any filter
 and receives all the xAPI statements. To learn more about event-routing-backends, please
 refer to the `documentation <https://event-routing-backends.readthedocs.io/en/latest/>`_.

docs/technical_documentation/concepts/vector.rst

Lines changed: 2 additions & 2 deletions
@@ -4,7 +4,7 @@ Vector
 ******
 
 Vector is lightweight and ultra-fast tool for building observability pipelines.
-In the Aspects project, Vector can optionally be used as a replacement for Ralph to
+In the Aspects project, Vector is the default tool used to
 capture xAPI learner statements in the ClickHouse database, and/or as a way to
 store raw tracking log statements. It can be used as a general purpose log collector
 and forwarder.
@@ -43,6 +43,6 @@ Those tables are controlled by the variables:
 
 ASPECTS_VECTOR_DATABASE: "openedx"
 ASPECTS_VECTOR_RAW_TRACKING_LOGS_TABLE: "_tracking"
-ASPECTS_VECTOR_RAW_XAPI_TABLE: "xapi_events_all"
+ASPECTS_RAW_XAPI_TABLE: "xapi_events_all"
 
 To learn more about Vector, see the `Vector documentation <https://vector.dev/docs/>`_.

docs/technical_documentation/how-tos/production_configuration.rst

Lines changed: 24 additions & 23 deletions
@@ -10,6 +10,30 @@ Aspects can be configured to send xAPI events to ClickHouse in several different
 
 At a high level the options are:
 
+Vector (default)
+----------------
+
+**Recommended for:** Most deployments, from resource-constrained Tutor local environments to larger production stacks.
+
+Vector is a log forwarding service that monitors the logs from docker containers or Kubernetes pods. It writes events directly to ClickHouse and automatically batches events based on volume. The LMS is configured to transform and log xAPI events in-process and Vector picks them up by reading the logs.
+
+Pros:
+
+- Removes the need to run or scale Ralph
+- Automatic batching adjustments
+- Fastest delivery times to ClickHouse
+- Vector failures do not impact other systems
+- Allows to backup and restore data from an S3 compatible backend
+
+Cons:
+
+- It is a new service for most operators
+- Events are not de-duplicated before insert, which can result in some (mostly temporary) incorrect data in a disaster recovery
+- Disaster recovery hasn't been tested with Aspects yet
+- Needs a pod run for every LMS or CMS Kubernetes worker
+- When run in-process, adds a small amount of overhead to any LMS request that sends an xAPI statement
+
+
 Celery tasks without batching (default as of 1.0.0)
 ---------------------------------------------------
 
@@ -52,29 +76,6 @@ Cons:
 - Batching is not as well tested (as of Redwood) and may have edge cases until it has been used in production
 
 
-Vector
-------
-
-**Recommended for:** Resource-constrained Tutor local environments, experienced operators on larger deployments.
-
-Vector is a log forwarding service that monitors the logs from docker containers or Kubernetes pods. It writes events directly to ClickHouse and automatically batches events based on volume. The LMS can be configured to transform and log xAPI events in-process and Vector will pick them up by reading the logs.
-
-Pros:
-
-- Removes the need to run or scale Ralph
-- Automatic batching adjustments
-- Fastest delivery times to ClickHouse
-- Vector failures do not impact other systems
-
-Cons:
-
-- It is a new service for most operators
-- Events are not de-duplicated before insert, which can result in some (mostly temporary) incorrect data in a disaster recovery
-- Disaster recovery hasn't been tested with Aspects yet
-- Needs a pod run for every LMS or CMS Kubernetes worker
-- When run in-process, adds a small amount of overhead to any LMS request that sends an xAPI statement
-
-
 Event Bus (experimental)
 ------------------------
 

docs/technical_documentation/quickstarts/ralph.rst

Lines changed: 6 additions & 3 deletions
@@ -5,16 +5,19 @@ Ralph
 
 Installation instructions for Aspects are available on the plugin site: https://github.com/openedx/tutor-contrib-aspects
 
-Ralph is the default option to send xAPI events to Clickhouse. To run it make sure to enable the `RUN_RALPH` option in the `config.yml` file.
+Ralph is an alternative option to send xAPI events to Clickhouse, providing full LRS support and deduplication. To use Ralph as your xAPI pipeline, you need to enable it and set it as the source in your `config.yml` file.
 
 .. code-block:: yaml
 
 RUN_RALPH: True
+ASPECTS_XAPI_SOURCE: ralph
 
-# We recommend only running Ralph or Vector for performance reasons, so
-# suggest turning off Vector here
+# We recommend only running one transport for performance reasons, so
+# suggest turning off Vector if you are using Ralph for xAPI
 RUN_VECTOR: False
 
+When ``ASPECTS_XAPI_SOURCE`` is set to ``ralph``, the xAPI data will be stored in the database defined by ``RALPH_DATABASE`` (defaults to ``xapi``).
+
 
 Aspects provides the following configuration options:
 

docs/technical_documentation/quickstarts/vector.rst

Lines changed: 6 additions & 8 deletions
@@ -5,18 +5,16 @@ Vector
 
 Installation instructions for Aspects are available on the plugin site: https://github.com/openedx/tutor-contrib-aspects
 
-Vector is an alternative option to send xAPI events to Clickhouse. It can be run along with Ralph, but to optimize resources we encourage you to only use one.
-
-To configure Vector as the xAPI event handler, you can use the following configuration:
+Vector is the default option to send xAPI events to Clickhouse in Aspects. It is enabled by default with the following settings:
 
 .. code-block:: yaml
 
-# Disable ralph
-RUN_RALPH: False
-# Enable vector
+# Default settings
 RUN_VECTOR: True
-# Change the xAPI database to the one Vector uses
-ASPECTS_XAPI_DATABASE: "openedx"
+RUN_RALPH: False
+ASPECTS_XAPI_SOURCE: vector
+
+When ``ASPECTS_XAPI_SOURCE`` is set to ``vector``, the xAPI data will be stored in the database defined by ``ASPECTS_VECTOR_DATABASE`` (defaults to ``openedx``).
 
 
 Aspects provides the following configuration options:
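
Illustrative note (not part of the commit): the settings documented in the two quickstarts interact, so a small sketch may help when reviewing a ``config.yml``. The Python below is a hypothetical helper, not code from tutor-contrib-aspects. The function names ``xapi_database`` and ``config_warnings`` are invented for illustration; the setting names and default values are taken only from the documentation text in this diff, and the real plugin may resolve them differently.

```python
# Hypothetical helper for reasoning about the documented settings.
# The defaults below mirror the values stated in the docs in this commit.
DEFAULTS = {
    "ASPECTS_XAPI_SOURCE": "vector",
    "RUN_VECTOR": True,
    "RUN_RALPH": False,
    "ASPECTS_VECTOR_DATABASE": "openedx",
    "RALPH_DATABASE": "xapi",
}


def xapi_database(config):
    """Return the database that xAPI data lands in, per the quickstart docs."""
    merged = {**DEFAULTS, **config}
    source = merged["ASPECTS_XAPI_SOURCE"]
    if source == "vector":
        return merged["ASPECTS_VECTOR_DATABASE"]
    if source == "ralph":
        return merged["RALPH_DATABASE"]
    raise ValueError(f"unknown ASPECTS_XAPI_SOURCE: {source!r}")


def config_warnings(config):
    """Collect mismatches between the chosen source and the enabled services."""
    merged = {**DEFAULTS, **config}
    source = merged["ASPECTS_XAPI_SOURCE"]
    warnings = []
    if source == "ralph" and not merged["RUN_RALPH"]:
        warnings.append("ASPECTS_XAPI_SOURCE is ralph but RUN_RALPH is not enabled")
    if source == "vector" and not merged["RUN_VECTOR"]:
        warnings.append("ASPECTS_XAPI_SOURCE is vector but RUN_VECTOR is not enabled")
    if merged["RUN_RALPH"] and merged["RUN_VECTOR"]:
        warnings.append("both transports are enabled; the docs recommend running only one")
    return warnings
```

For example, passing ``{"ASPECTS_XAPI_SOURCE": "ralph"}`` without also setting ``RUN_RALPH: True`` yields a warning, which matches the requirement stated in the ralph.rst change that both settings be set together.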
