Merged
43 commits
1cfa50d
Add a new field - objectPath and format the class.
taojing2002 May 9, 2025
b5e1625
Parse docid rather than file path.
taojing2002 May 13, 2025
aab8f06
Added the tests for the header doc_id.
taojing2002 May 13, 2025
98f696f
Splitted long statements.
taojing2002 May 13, 2025
e71c01d
Refactoried the code to make the DataManager class support both hashs…
taojing2002 May 14, 2025
de4e435
In the ObjectManagerFactory class, renamed the method from getInstanc…
taojing2002 May 14, 2025
565e45b
Added the junit test for the ObjectmanagerFactory class.
taojing2002 May 14, 2025
12c8dba
Added a new class - LegacyStoreObjManager to handle the Metacat old s…
taojing2002 May 14, 2025
a1d16f4
Added test methods for ObjectManager.
taojing2002 May 15, 2025
23b67a1
Changed the evn variable name.
taojing2002 May 15, 2025
e99406a
Added the method to test the getSystemMetadataByAPI method.
taojing2002 May 15, 2025
bd12080
Env variables overwrites the properies values.
taojing2002 May 15, 2025
707af75
Rewored some log statements.
taojing2002 May 15, 2025
d34348e
Added a new class to test LegacyStoreObjManager.
taojing2002 May 16, 2025
e793993
Added the tests of the failed scenarios of getSystemMetadata and getO…
taojing2002 May 16, 2025
c2984c8
Added the code to handle docid.
taojing2002 May 19, 2025
986d51f
Merge branch 'develop' into feature-222-legacy-metacat-storage
taojing2002 May 19, 2025
85eebc2
Evaluated the exception as ServiceFailure as well.
taojing2002 May 20, 2025
f9a05d0
Added a test python client file.
taojing2002 May 20, 2025
269045a
Fixed a bug in the log statement when the identifier is null.
taojing2002 May 20, 2025
127095d
In the constructor, added the code to initialize the d1node.
taojing2002 May 21, 2025
c1c9533
Added the code to parse the command line.
taojing2002 May 21, 2025
6505045
Remove an unneeded log statement for error.
taojing2002 Jun 11, 2025
253b9ae
Merge pull request #244 from DataONEorg/bug-243-remove-confusing-log
taojing2002 Jun 11, 2025
3b8d77a
shorten the conflict version waiting time to 10 mini-seconds.
taojing2002 Jun 11, 2025
3984d2d
Fixed a typo.
taojing2002 Jun 12, 2025
3f6b77b
Change the default version conflict waiting time to 10 milliseconds.
taojing2002 Jun 12, 2025
4874cd1
Merge pull request #246 from DataONEorg/bug-245-decrease-wait-time-ve…
taojing2002 Jun 12, 2025
ee6673c
Merge branch 'develop' into feature-222-legacy-metacat-storage
taojing2002 Jun 12, 2025
9b7b06c
Excluded the postgresql jar.
taojing2002 Jun 17, 2025
2cb9723
Merge pull request #248 from DataONEorg/bug-247-exclude-postgres
taojing2002 Jun 17, 2025
db6ff56
Added the documentation about switching the storage system.
taojing2002 Jun 18, 2025
0d5e2d0
Fixed a typo.
taojing2002 Jun 20, 2025
f773a27
Fixed a typo.
taojing2002 Jun 20, 2025
3641c95
Changed the name of the env variable to DATAONE_INDEXER_AUTH_TOKEN
taojing2002 Jun 20, 2025
4d6e1b4
Fixed a typo.
taojing2002 Jun 20, 2025
07f8343
Combined the catch clauses into one.
taojing2002 Jun 20, 2025
a8813e9
Rewored a sentence and changed a format of an assignment.
taojing2002 Jun 20, 2025
02cdf8d
If the ObjectManager is for the legacy store, it will use the error l…
taojing2002 Jun 23, 2025
36230ee
Merge pull request #249 from DataONEorg/feature-222-legacy-metacat-st…
taojing2002 Jun 23, 2025
fdde328
version -> 3.1.5; chart 1.3.2
artntek Jun 25, 2025
7e0bec0
release notes
artntek Jun 25, 2025
439e616
Merge pull request #253 from DataONEorg/feature-251-v315-release-prep
artntek Jun 25, 2025
8 changes: 8 additions & 0 deletions README.md
@@ -284,6 +284,14 @@ vhost = / # Used as default for declare / delete / list
for n in $(seq 1 30); do echo $n; rabbitmqadmin -c rmq.conf -N default -U rmq -p $RMQPW publish exchange=testexchange routing_key=testqueue payload="Message: ${n}" --vhost=/; done
```

## Switching the Storage System
The DataONE Indexer can be configured to use different storage systems by setting the environment
variable `DATAONE_INDEXER_OBJECT_MANAGER_CLASS_NAME`.
By default, this variable is not set, and the indexer uses
`org.dataone.cn.indexer.object.hashstore.HashStoreObjManager`, which enables support for HashStore.
To use the legacy Metacat storage system instead, set the variable to
`org.dataone.cn.indexer.object.legacystore.LegacyStoreObjManager`.
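
As a rough illustration of how a class name taken from this variable could be turned into an object, the sketch below resolves the name with a fallback to the default and then instantiates it reflectively. `StorageClassSketch`, `resolveClassName`, and `instantiate` are hypothetical names, and both the blank-means-unset rule and the no-argument constructor are assumptions; the indexer's actual `ObjectManagerFactory` may work differently.

```java
import java.lang.reflect.InvocationTargetException;
import java.util.Map;

/** Hypothetical sketch; not the project's actual ObjectManagerFactory. */
final class StorageClassSketch {
    static final String ENV_NAME = "DATAONE_INDEXER_OBJECT_MANAGER_CLASS_NAME";
    static final String DEFAULT_CLASS =
        "org.dataone.cn.indexer.object.hashstore.HashStoreObjManager";

    /** Returns the configured class name, or the default when unset or blank (assumption). */
    static String resolveClassName(Map<String, String> env) {
        String value = env.get(ENV_NAME);
        if (value == null || value.trim().isEmpty()) {
            return DEFAULT_CLASS;
        }
        return value.trim();
    }

    /** Instantiates the named class through a no-argument constructor (assumption). */
    static Object instantiate(String className)
        throws ClassNotFoundException, NoSuchMethodException, InstantiationException,
               IllegalAccessException, InvocationTargetException {
        return Class.forName(className).getDeclaredConstructor().newInstance();
    }

    public static void main(String[] args) throws Exception {
        System.out.println("Would load: " + resolveClassName(System.getenv()));
        // java.util.ArrayList stands in for a manager class so the sketch runs anywhere.
        System.out.println(instantiate("java.util.ArrayList").getClass().getName());
    }
}
```

The reflective call is what makes the five checked exceptions (`ClassNotFoundException`, `NoSuchMethodException`, and so on) possible, so a misspelled class name in the variable would surface at startup.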

## History

This is a refactored version of the original DataONE [d1_cn_index_processor](https://github.com/DataONEorg/d1_cn_index_processor) that runs
22 changes: 22 additions & 0 deletions RELEASE-NOTES.md
@@ -1,5 +1,27 @@
# dataone-indexer Release Notes

> [!CAUTION]
> **If upgrading from Helm chart v1.2.0 or earlier, note the section entitled:
> `Caution - ENSURE THAT THE RABBITMQ QUEUE IS EMPTY,` [in the release notes for helm chart v1.3.0
> below!](#helm-chart-version-130)**

## dataone-indexer version 3.1.5 & helm chart version 1.3.2

### Release date: 2025-06-26

### dataone-indexer version 3.1.5

This is a patch release with the following minor fixes and upgrades:

- Dataone-indexer can now handle the legacy Metacat object repository ([Issue #222](https://github.com/DataONEorg/dataone-indexer/issues/222))
- Remove some extra log statements (for version conflict retries) that are confusing to users ([Issue #243](https://github.com/DataONEorg/dataone-indexer/issues/243))
- Indexer performance improvement: decrease the retry waiting time for a version conflict error ([Issue #245](https://github.com/DataONEorg/dataone-indexer/issues/245))
- Remove the unnecessary dependency on the PostgreSQL jar ([Issue #247](https://github.com/DataONEorg/dataone-indexer/issues/247))

### helm chart version 1.3.2
- Bump indexer App version to 3.1.5


## dataone-indexer version 3.1.4 & helm chart version 1.3.1

### Release date: 2025-05-20
4 changes: 2 additions & 2 deletions helm/Chart.yaml
@@ -21,13 +21,13 @@ type: application
# This is the chart version. This version number should be incremented each time you make changes
# to the chart and its templates, including the app version.
# Versions are expected to follow Semantic Versioning (https://semver.org/)
version: "1.3.1"
version: "1.3.2"

# This is the version number of the application being deployed. This version number should be
# incremented each time you make changes to the application. Versions are not expected to
# follow Semantic Versioning. They should reflect the version the application is using.
# It is recommended to use it with quotes.
appVersion: "3.1.4"
appVersion: "3.1.5"

# Chart dependencies
dependencies:
2 changes: 1 addition & 1 deletion helm/config/dataone-indexer.properties
@@ -38,7 +38,7 @@ dataone.mn.registration.serviceType.url={{ .Values.idxworker.d1_serviceType_url

index.resourcemap.waitingComponent.time={{ default 800 .Values.idxworker.resourcemapWaitMs }}
index.resourcemap.waitingComponent.max.attempts={{ default 25 .Values.idxworker.resourcemapMaxTries }}
index.solr.versionConflict.waiting.time={{ default 1000 .Values.idxworker.solrVerConflictWaitMs }}
index.solr.versionConflict.waiting.time={{ default 10 .Values.idxworker.solrVerConflictWaitMs }}
index.solr.versionConflict.max.attempts={{ default 25000 .Values.idxworker.solrVerConflictMaxTries }}

# Storage properties
2 changes: 1 addition & 1 deletion helm/templates/deployment.yaml
@@ -117,7 +117,7 @@ spec:
- name: IDX_JAVA_MEM
value: {{ .Values.idxworker.javaMem | quote }}
{{- end }}
- name: DATAONE_AUTH_TOKEN
- name: DATAONE_INDEXER_AUTH_TOKEN
valueFrom:
secretKeyRef:
name: {{ .Release.Name }}-indexer-token
2 changes: 1 addition & 1 deletion helm/values.yaml
@@ -167,7 +167,7 @@ idxworker:
## @param idxworker.solrVerConflictWaitMs wait time (ms) before indexer grabs a newer version
## of solr doc after a version conflict
##
solrVerConflictWaitMs: 1000
solrVerConflictWaitMs: 10

## @param idxworker.solrVerConflictMaxTries Number of tries to get a newer version of solr doc
## after a version conflict
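
The two settings above (a wait time and a maximum number of tries) describe a bounded retry loop. The sketch below shows one way such a loop might look; `VersionConflictRetrySketch` and `retry` are hypothetical names, the `BooleanSupplier` stands in for whatever Solr update the indexer really performs, and the values 10 and 25000 are taken from the defaults shown in this diff. It is not the indexer's actual retry code.

```java
import java.util.function.BooleanSupplier;

/** Hypothetical sketch of a bounded retry with a fixed wait between tries. */
final class VersionConflictRetrySketch {

    /**
     * Calls the attempt until it reports success or maxAttempts is reached,
     * sleeping waitMs between tries. Returns the number of tries used,
     * or -1 if every try failed or the thread was interrupted.
     */
    static int retry(BooleanSupplier attempt, long waitMs, int maxAttempts) {
        for (int tries = 1; tries <= maxAttempts; tries++) {
            if (attempt.getAsBoolean()) {
                return tries;
            }
            if (tries < maxAttempts) {
                try {
                    Thread.sleep(waitMs);
                } catch (InterruptedException e) {
                    // Restore the interrupt flag and give up.
                    Thread.currentThread().interrupt();
                    return -1;
                }
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        int[] calls = {0};
        // Simulated update that only succeeds on the third call.
        int used = retry(() -> ++calls[0] >= 3, 10, 25000);
        System.out.println("Succeeded after " + used + " attempts");
    }
}
```

With a 10 ms wait, a conflict that clears after two failed tries costs about 20 ms of waiting in this sketch, compared with about 2 seconds under the previous 1000 ms default.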
8 changes: 7 additions & 1 deletion pom.xml
@@ -3,7 +3,7 @@
<modelVersion>4.0.0</modelVersion>
<groupId>org.dataone</groupId>
<artifactId>dataone-index-worker</artifactId>
<version>3.1.4</version>
<version>3.1.5</version>
<packaging>jar</packaging>
<name>dataone-index-worker</name>
<url>http://maven.apache.org</url>
@@ -306,6 +306,12 @@
<groupId>org.dataone</groupId>
<artifactId>hashstore</artifactId>
<version>1.1.0</version>
<exclusions>
<exclusion>
<groupId>org.postgresql</groupId>
<artifactId>postgresql</artifactId>
</exclusion>
</exclusions>
</dependency>
<dependency>
<groupId>uk.org.webcompere</groupId>
37 changes: 23 additions & 14 deletions src/main/java/org/dataone/cn/indexer/IndexWorker.java
@@ -2,6 +2,7 @@

import java.io.File;
import java.io.IOException;
import java.lang.reflect.InvocationTargetException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
@@ -18,14 +19,13 @@
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.xpath.XPathExpressionException;

import com.rabbitmq.client.ShutdownSignalException;
import org.apache.commons.codec.EncoderException;
import org.apache.commons.configuration.ConfigurationException;
import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;
import org.apache.solr.client.solrj.SolrServerException;
import org.dataone.cn.indexer.annotation.OntologyModelService;
import org.dataone.cn.indexer.object.ObjectManager;
import org.dataone.cn.indexer.object.ObjectManagerFactory;
import org.dataone.configuration.Settings;
import org.dataone.exceptions.MarshallingException;
import org.dataone.indexer.queue.IndexQueueMessageParser;
@@ -59,7 +59,7 @@ public class IndexWorker {
//"create" is the index task type used when a new object is created, so the solr index will be generated.
//"delete" is the index task type used when an object is deleted, so the solr index will be deleted.
//"sysmeta" is the index task type used when the system metadata of an existing object is updated.
public final static String CREATE_INDEXT_TYPE = "create";
public final static String CREATE_INDEX_TYPE = "create";
public final static String DELETE_INDEX_TYPE = "delete";
public final static String SYSMETA_CHANGE_TYPE = "sysmeta"; //this handle for resource map only

@@ -219,7 +219,10 @@ public static void loadAdditionalPropertyFile(String propertyFile) {
* @throws TimeoutException
* @throws ServiceFailure
*/
public IndexWorker() throws IOException, TimeoutException, ServiceFailure {
public IndexWorker()
throws IOException, TimeoutException, ServiceFailure, ClassNotFoundException,
InvocationTargetException, NoSuchMethodException, InstantiationException,
IllegalAccessException {
this(true);
}

@@ -231,7 +234,9 @@ public IndexWorker() throws IOException, TimeoutException, ServiceFailure {
* @throws TimeoutException
* @throws ServiceFailure
*/
public IndexWorker(Boolean initialize) throws IOException, TimeoutException {
public IndexWorker(Boolean initialize)
throws IOException, TimeoutException, ClassNotFoundException, InvocationTargetException,
NoSuchMethodException, InstantiationException, IllegalAccessException {
String value = System.getenv("KUBERNETES_SERVICE_HOST");
// Java doc says: the string value of the variable, or null if the variable is not defined
// in the system environment
@@ -243,7 +248,7 @@ public IndexWorker(Boolean initialize) throws IOException, TimeoutException {
initExecutorService();//initialize the executor first
initIndexQueue();
initIndexParsers();
ObjectManager.getInstance();
ObjectManagerFactory.getObjectManager();
OntologyModelService.getInstance();
}
}
@@ -385,9 +390,11 @@ public void run() {
indexObject(parser, multipleThread);
}
} catch (InvalidRequest e) {
logger.error(
"cannot index the task for identifier " + parser.getIdentifier().getValue()
+ " since " + e.getMessage());
String error = "Cannot index the task for the object since " + e.getMessage();
if (parser.getIdentifier() != null) {
error = error + " with the identifier " + parser.getIdentifier().getValue();
}
logger.error(error);
boolean requeue = false;
rabbitMQchannel.basicReject(envelope.getDeliveryTag(), requeue);
}
@@ -449,21 +456,22 @@ private void indexObject(IndexQueueMessageParser parser, boolean multipleThread)
Identifier pid = parser.getIdentifier();
String indexType = parser.getIndexType();
int priority = parser.getPriority();
String docId = parser.getDocId();// It can be null.
try {
long threadId = Thread.currentThread().getId();
logger.info("IndexWorker.consumer.indexObject by multiple thread? " + multipleThread
+ ", with the thread id " + threadId
+ " - Received the index task from the index queue with the identifier: "
+ pid.getValue() + " , the index type: " + indexType
+ ", the priority: " + priority);
+ ", the priority: " + priority + ", the docId(can be null): " + docId);
switch (indexType) {
case CREATE_INDEXT_TYPE -> {
case CREATE_INDEX_TYPE -> {
boolean sysmetaOnly = false;
solrIndex.update(pid, sysmetaOnly);
solrIndex.update(pid, sysmetaOnly, docId);
}
case SYSMETA_CHANGE_TYPE -> {
boolean sysmetaOnly = true;
solrIndex.update(pid, sysmetaOnly);
solrIndex.update(pid, sysmetaOnly, docId);
}
case DELETE_INDEX_TYPE -> solrIndex.remove(pid);
default -> throw new InvalidRequest(
@@ -482,7 +490,8 @@ private void indexObject(IndexQueueMessageParser parser, boolean multipleThread)
ServiceFailure | XPathExpressionException | UnsupportedType | SAXException |
ParserConfigurationException | SolrServerException | MarshallingException |
EncoderException | InterruptedException | IOException | InstantiationException |
IllegalAccessException e) {
IllegalAccessException | ClassNotFoundException | InvocationTargetException |
NoSuchMethodException e) {
logger.error("Cannot index the task for identifier " + pid.getValue()
+ " since " + e.getMessage(), e);
}
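
The `InvalidRequest` handler in this diff now checks the identifier for null before adding it to the log message. A minimal standalone sketch of that null-safe pattern is shown below; `IndexErrorMessageSketch` and `describe` are hypothetical names, plain strings stand in for the `Identifier` and exception types, and the wording of the message differs slightly from the PR's.

```java
/** Hypothetical sketch of building a log message when the identifier may be null. */
final class IndexErrorMessageSketch {

    /** Builds the message, mentioning the identifier only when one is available. */
    static String describe(String identifier, String reason) {
        StringBuilder message = new StringBuilder("Cannot index the task");
        if (identifier != null) {
            message.append(" for the identifier ").append(identifier);
        }
        return message.append(" since ").append(reason).toString();
    }

    public static void main(String[] args) {
        System.out.println(describe(null, "the index type is not supported"));
        System.out.println(describe("pid-1", "the index type is not supported"));
    }
}
```

Placing the identifier before the reason keeps the sentence readable in both cases, and the null check avoids a `NullPointerException` inside the error handler itself.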