@@ -6,258 +6,152 @@ With this set of C++ plugins, an iRODS server can populate an Elasticsearch or O
66
77When a collection is annotated for indexing, all its subcollections and data objects are queued for indexing. If a new data object is added, or a data object is modified, it will be queued for indexing.
88
9- ## Working with Metalnx
9+ ## Installation
1010
11- The Metalnx web application provides a browse and search interface into iRODS. By default, its searches query the iRODS Catalog directly. But, with an additional search endpoint, it can get its results from Elasticsearch or OpenSearch instead.
11+ Install the following plugins for the Indexing Capability:
1212
13- - [1. iRODS](#irods)
14- - [2. Elasticsearch or OpenSearch](#elasticsearch-or-opensearch)
15- - [3. Indexing Capability](#indexing-capability)
16- - [4. Pluggable Search Endpoint](#pluggable-search-endpoint)
17- - [5. Metalnx](#metalnx)
13+ - irods-rule-engine-plugin-indexing
14+ - irods-rule-engine-plugin-elasticsearch
1815
19- ![Indexing with Metalnx](../images/indexing.png)
16+ The Elasticsearch plugin is only required if you are using Elasticsearch. Other plugins can be installed as new technologies are implemented.
2017
21- ### 1. iRODS
18+ Packages can also be built from source: [https://github.com/irods/irods_capability_indexing](https://github.com/irods/irods_capability_indexing)
2219
23- requires 4.2.10+
20+ ## Configuration
2421
25- ### 2. Elasticsearch or OpenSearch
22+ Here's an example configuration that enables the Indexing and Elasticsearch plugins. The comments are for explanation only and should not be included in an actual configuration. The values shown are the defaults.
23+ ``` javascript
24+ "rule_engines": [
25+ {
26+ "instance_name": "irods_rule_engine_plugin-indexing-instance",
27+ "plugin_name": "irods_rule_engine_plugin-indexing",
28+ "plugin_specific_configuration": {
29+ // The lower limit for randomly generated delay task intervals.
30+ "minimum_delay_time": 1,
2631
27- #### run
32+ // The upper limit for randomly generated delay task intervals.
33+ "maximum_delay_time": 30,
2834
29- ```
30- $ docker run -d \
31- -p 9200:9200 -p 9300:9300 \
32- -e "discovery.type=single-node" \
33- -v elast_data:/usr/share/elasticsearch/data \
34- docker.elastic.co/elasticsearch/elasticsearch:latest
35- ```
36-
37- #### add indices
38-
39- ```
40- $ curl -X PUT -H'Content-Type: application/json' http://localhost:9200/full_text
41-
42- $ curl -X PUT -H'Content-Type: application/json' http://localhost:9200/full_text/_mapping/text --data '{ "properties" : { "absolutePath" : { "type" : "keyword" }, "data" : { "type" : "text" } } }'
43- ```
44-
45- ```
46- $ curl -X PUT -H'Content-Type: application/json' http://localhost:9200/metadata
47-
48- $ curl -X PUT -H'Content-Type: application/json' http://localhost:9200/metadata/_mapping/text --data-binary "@/home/repos/irods_capability_indexing/es_mapping.json"
49-
50- ```
51-
52- show indices
53-
54- ```
55- $ curl http://localhost:9200/_cat/indices
56- ```
57-
58- ### 3. Indexing Capability
59-
60- requires 4.2.11.0+
61-
62- #### install
63-
64- install from `packages.irods.org`:
65-
66- ```
67- $ sudo yum install -y irods-rule-engine-plugin-elasticsearch irods-rule-engine-plugin-indexing
68- ```
69-
70- #### configure
71-
72- add to `server_config.json` (note the es_version):
73-
74- ```
75- "rule_engines": [
76- {
77- "instance_name": "irods_rule_engine_plugin-indexing-instance",
78- "plugin_name": "irods_rule_engine_plugin-indexing",
79- "plugin_specific_configuration": {
80- }
81- },
82- {
83- "instance_name": "irods_rule_engine_plugin-elasticsearch-instance",
84- "plugin_name": "irods_rule_engine_plugin-elasticsearch",
85- "plugin_specific_configuration": {
86- "hosts" : ["http://localhost:9200/"],
87- "bulk_count" : 100,
88- "es_version" : "6.x",
89- "read_size" : 4194304
90- }
91- },
92- ]
93- ```
94-
95- #### use
96-
97- create a collection, and designate it for indexing
98-
99- ```
100- $ imkdir indexme
101- $ imeta add -C indexme irods::indexing::index metadata::metadata elasticsearch
102- ```
103-
104- confirm the system is watching that collection
35+ // The maximum number of delay rules allowed to be scheduled for
36+ // a particular collection at a time.
37+ //
38+ // If set to 0, the limit is disabled.
39+ "job_limit_per_collection_indexing_operation": 1000
40+ }
41+ },
42+ {
43+ "instance_name": "irods_rule_engine_plugin-elasticsearch-instance",
44+ "plugin_name": "irods_rule_engine_plugin-elasticsearch",
45+ "plugin_specific_configuration": {
46+ // The list of URLs identifying the elasticsearch service.
47+ //
48+ // Important things to keep in mind:
49+ //
50+ // - URLs must contain the port number
51+ // - If TLS communication is desired, the URL must begin with "https"
52+ "hosts": [
53+ "http://localhost:9200"
54+ ],
55+
56+ // The number of text chunks processed at once for elasticsearch
57+ // full-text indexing.
58+ "bulk_count": 100,
59+
60+ // The size of an individual text chunk for elasticsearch full-text
61+ // indexing.
62+ "read_size": 4194304,
63+
64+ // The absolute path to a TLS certificate used for secure communication
65+ // with elasticsearch. If empty, OS-dependent default paths are used for
66+ // certificate verification.
67+ //
68+ // This option only takes effect for host entries beginning with "https".
69+ "tls_certificate_file": "",
70+
71+ // The encoded basic authentication credentials for elasticsearch. The
72+ // value must match one of the following:
73+ //
74+ // - base64_encode(url_encode(username) + ":" + url_encode(password))
75+ // - base64_encode(username + ":" + password)
76+ //
77+ // This option is not used when empty. Recommended when using TLS, but
78+ // not required.
79+ "authorization_basic_credentials": ""
80+ }
81+ },
10582
106- ```
107- $ iqstat
108- id name
109- 10018 {"collection-name":"/tempZone/home/rods/indexme","index-name":"metadata","index-type":"metadata","indexer":"elasticsearch","rule-engine-instance-name":"irods_rule_engine_plugin-indexing-instance","rule-engine-operation":"irods_policy_indexing_collection_index","user-name":"rods"}
83+ // ... Previously installed rule engine plugin configs ...
84+ ]
11085```
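
The two accepted encodings for `authorization_basic_credentials` described in the comments above can be produced with a few lines of Python. This is a minimal sketch, not part of the plugin: the function name `basic_credentials` and the sample username and password are hypothetical, and only the encoding formulas come from the configuration comments.

```python
import base64
from urllib.parse import quote


def basic_credentials(username: str, password: str, url_encode: bool = False) -> str:
    """Return base64(username + ":" + password).

    With url_encode=True, the username and password are percent-encoded
    first, matching the other accepted form:
    base64_encode(url_encode(username) + ":" + url_encode(password)).
    """
    if url_encode:
        # safe="" so that every reserved character, including "/" and ":",
        # is percent-encoded.
        username = quote(username, safe="")
        password = quote(password, safe="")
    raw = f"{username}:{password}".encode("utf-8")
    return base64.b64encode(raw).decode("ascii")


if __name__ == "__main__":
    # Hypothetical credentials, for illustration only.
    print(basic_credentials("user", "pass"))
    print(basic_credentials("user", "p@ss:word", url_encode=True))
```

The resulting string is what would be pasted into the `authorization_basic_credentials` value. Percent-encoding is mainly useful when the username or password contains a `:` or other reserved characters.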
11186
112- #### confirm
87+ ## Setting up indexing
11388
114- wait for the delay server to wake up a couple times...
89+ ### Create an index
11590
116- confirm elastic has been informed
117-
118- ```
119- $ curl -XGET 'localhost:9200/metadata/_search?pretty' -H 'Content-Type: application/json' -d'{}'
120- {
121- "took" : 2,
122- "timed_out" : false,
123- "_shards" : {
124- "total" : 5,
125- "successful" : 5,
126- "skipped" : 0,
127- "failed" : 0
128- },
129- "hits" : {
130- "total" : 1,
131- "max_score" : 1.0,
132- "hits" : [
133- {
134- "_index" : "metadata",
135- "_type" : "_doc",
136- "_id" : "10016",
137- "_score" : 1.0,
138- "_source" : {
139- "absolutePath" : "/tempZone/home/rods/indexme",
140- "dataSize" : 0,
141- "fileName" : "indexme",
142- "isFile" : false,
143- "lastModifiedDate" : 1635537255,
144- "metadataEntries" : [ ],
145- "mimeType" : "",
146- "parentPath" : "/tempZone/home/rods",
147- "url" : "http://tempZone/home/rods/indexme",
148- "zoneName" : "tempZone"
91+ To create a full-text index, run the following:
92+ ``` bash
93+ curl -X PUT -H 'Content-Type: application/json' http://localhost:9200/full_text_index -d '{
94+ "mappings": {
95+ "properties": {
96+ "absolutePath": {"type": "keyword"},
97+ "data": {"type": "text"}
98+ }
99+ }
100+ }'
101+ ```
102+
103+ To create a metadata index, run the following:
104+ ``` bash
105+ curl -X PUT -H 'Content-Type: application/json' http://localhost:9200/metadata_index -d '{
106+ "mappings": {
107+ "properties": {
108+ "url": {"type": "text"},
109+ "zoneName": {"type": "keyword"},
110+ "absolutePath": {"type": "keyword"},
111+ "fileName": {"type": "text"},
112+ "parentPath": {"type": "text"},
113+ "isFile": {"type": "boolean"},
114+ "dataSize": {"type": "long"},
115+ "mimeType": {"type": "keyword"},
116+ "lastModifiedDate": {"type": "date", "format": "epoch_second"},
117+ "metadataEntries": {
118+ "type": "nested",
119+ "properties": {
120+ "attribute": {"type": "keyword"},
121+ "value": {"type": "text"},
122+ "unit": {"type": "keyword"}
149123 }
150124 }
151- ]
125+ }
152126 }
153- }
154- ```
155-
156- ### 4. Pluggable Search Endpoint
157-
158- eventually via docker hub, but for now, unreleased
159-
160- #### configure
161-
162- ```
163- $ git clone https://github.com/irods/metalnx_search_endpoint_elasticsearch
164- $ cd metalnx_search_endpoint_elasticsearch
165- $ cp server.properties.template server.properties
166- ```
167-
168- edit `es.baseurl` and `jwt.secret`
169-
170- ```
171- es.baseurl=http://172.17.0.1:9200
172- jwt.issuer=metalnx
173- jwt.secret=secretsecretsecretsecretsecretsecretsecretsecret
174- jwt.lifetime.seconds=600
175- jwt.algo=HS384
176- project.url.prefix=deprecated
177- ```
178-
179- the `jwt.secret` must match the configured value in Metalnx
180-
181- #### build
182-
183- ```
184- $ docker build -t metalnx_search_elasticsearch .
185- ```
186-
187- #### run
188-
127+ }'
189128```
190- $ docker run -p 8082:8082 -v $(pwd)/server.properties:/etc/irods-ext/project-and-sample-search.properties metalnx_search_elasticsearch
191- ```
192-
193- ### 5. Metalnx
129+ Note: The properties shown above are all of the currently supported metadata fields. Any of them can be omitted from the mapping as desired.
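
For scripted setups, the same index-creation request can be prepared from Python's standard library instead of `curl`. The sketch below only builds the request and does not send it. The helper name `build_create_index_request` is hypothetical, and the base URL is the `localhost:9200` address used in the examples above.

```python
import json
import urllib.request


def build_create_index_request(base_url: str, index_name: str, properties: dict) -> urllib.request.Request:
    """Build (but do not send) a PUT request that creates an index with a mapping."""
    body = json.dumps({"mappings": {"properties": properties}}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/{index_name}",
        data=body,
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


# Same mapping as the full-text index example above.
FULL_TEXT_PROPERTIES = {
    "absolutePath": {"type": "keyword"},
    "data": {"type": "text"},
}

if __name__ == "__main__":
    request = build_create_index_request(
        "http://localhost:9200", "full_text_index", FULL_TEXT_PROPERTIES
    )
    print(request.get_method(), request.full_url)
    # To actually create the index, pass the request to
    # urllib.request.urlopen(request) against a running service.
```

Keeping the mapping in a dictionary makes it easy to drop properties that are not needed before the request is sent.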
194130
195- 2.6.0+ will work from docker hub.
131+ ### Tag a collection for indexing
196132
197- #### configure
133+ Indexing operates on specific AVUs annotated to iRODS collections. Indexing metadata takes the following form:
198134
199- clone repository
135+ - A: `irods::indexing::index`
136+ - V: `<index_name>::<index_type>`
137+ - U: `<technology>`
200138
201- ```
202- $ git clone https://github.com/irods-contrib/metalnx-web
203- $ cd metalnx-web
139+ To tag a collection `full_text_collection` for a `full_text` index with Elasticsearch, annotate it with metadata like this:
140+ ``` bash
141+ imeta set -C full_text_collection irods::indexing::index full_text_index::full_text elasticsearch
204142```
205143
206- copy three configuration files
207-
208- ```
209- $ mkdir metalnx-configuration
210- $ cd metalnx-configuration
211- $ cp ../docker-test-framework/etc/irods-ext/customMetalnxConfig.xml .
212- $ cp ../docker-test-framework/etc/irods-ext/metalnx.properties .
213- $ cp ../docker-test-framework/etc/irods-ext/metalnxConfig.xml .
214- ```
144+ If any data objects exist in `full_text_collection`, they are scheduled for indexing immediately. From this point forward, any new data objects created under `full_text_collection` or its subcollections are scheduled for indexing upon creation.
215145
216- update `metalnx.properties`
146+ The same process applies to `metadata` indexes or any other `index_type` values that may exist.
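
The AVU layout described in this section (attribute `irods::indexing::index`, value `<index_name>::<index_type>`, unit `<technology>`) can be assembled programmatically when many collections need tagging. This is an illustrative sketch: the helper names `indexing_avu` and `imeta_set_command` are hypothetical, and the code only builds the `imeta` command string without running it.

```python
import shlex

INDEXING_ATTRIBUTE = "irods::indexing::index"


def indexing_avu(index_name: str, index_type: str, technology: str) -> tuple:
    """Return the (attribute, value, unit) triple for an indexing annotation."""
    return (INDEXING_ATTRIBUTE, f"{index_name}::{index_type}", technology)


def imeta_set_command(collection: str, index_name: str, index_type: str, technology: str) -> str:
    """Return a shell-quoted `imeta set -C` command for the given collection."""
    attribute, value, unit = indexing_avu(index_name, index_type, technology)
    return shlex.join(["imeta", "set", "-C", collection, attribute, value, unit])


if __name__ == "__main__":
    # Reproduces the full-text example above.
    print(imeta_set_command("full_text_collection", "full_text_index", "full_text", "elasticsearch"))
    # A metadata index follows the same pattern (collection name is hypothetical).
    print(imeta_set_command("metadata_collection", "metadata_index", "metadata", "elasticsearch"))
```

Using `shlex.join` keeps the command safe to paste into a shell even if a collection path contains spaces.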
217147
218- ```
219- irods.host=172.17.0.1
220- db.url=jdbc:postgresql://db:5432/metalnxdb
221- db.username=metalnxuser
222- db.password=superdupersecret
223- jwt.secret=secretsecretsecretsecretsecretsecretsecretsecret
224- pluggablesearch.endpointRegistryList=http://172.17.0.1:8082/v1
225- pluggablesearch.enabled=true
226- ```
148+ ### Tag a resource for indexing
227149
228- create `docker-compose.yml`
150+ An administrator may wish to restrict indexing activities to particular resources, for example when automatically ingesting data.
229151
152+ To indicate that a resource is available for indexing, annotate it with metadata like so:
153+ ``` bash
154+ imeta add -R <resource_name> irods::indexing::index true
230155```
231- version: '3'
232-
233- services:
234-
235- db:
236- image: postgres:11
237- restart: always
238- environment:
239- POSTGRES_PASSWORD: superdupersecret
240- POSTGRES_USER: metalnxuser
241- POSTGRES_DB: metalnxdb
242156
243- metalnx:
244- image: irods/metalnx:latest
245- restart: always
246- volumes:
247- - ./metalnx-configuration:/etc/irods-ext
248- ports:
249- - 9000:8080
250- ```
251-
252- #### run
253-
254- get docker compose
255-
256- ```
257- $ sudo curl -L "https://github.com/docker/compose/releases/download/latest/docker-compose-$(uname -s)-$(uname -m)" -o /usr/local/bin/docker-compose
258- $ sudo chmod +x /usr/local/bin/docker-compose
259- ```
260-
261- ```
262- $ docker-compose up
263- ```
157+ If no resource has this metadata, all resources are assumed to be available for indexing. If the tag exists on *any* resource in the system, only the tagged resources are considered available for indexing.
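
The resource-selection rule above can be modeled in a few lines. This sketch only illustrates the stated rule and is not the plugin's implementation. The function name and the resource names are hypothetical, and the input is assumed to be a mapping from resource name to whether it carries the `irods::indexing::index` tag.

```python
from typing import Dict, Set


def resources_available_for_indexing(tagged_by_resource: Dict[str, bool]) -> Set[str]:
    """Apply the rule: no tags anywhere means every resource is available;
    otherwise only the tagged resources are available."""
    tagged = {name for name, is_tagged in tagged_by_resource.items() if is_tagged}
    if not tagged:
        return set(tagged_by_resource)
    return tagged


if __name__ == "__main__":
    # No resource is tagged, so all are available.
    print(sorted(resources_available_for_indexing({"demoResc": False, "archiveResc": False})))
    # One resource is tagged, so only that one is available.
    print(sorted(resources_available_for_indexing({"demoResc": True, "archiveResc": False})))
```

In practice this means tagging a single resource is enough to exclude every untagged resource from indexing.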