Merge pull request #391 from apache/refactoring/390-Merge-uimaFIT-modules-into-UIMAJ-repository
Issue #390: Merge uimaFIT modules into UIMA-J repository
Showing 432 changed files with 44,818 additions and 515 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,40 @@ | ||
# How to contribute to Apache UIMA Java SDK | ||
|
||
Thank you for your interest in contributing to the Apache UIMA Java SDK project. | ||
As an open-source community, we highly appreciate external contributions to our project. | ||
|
||
To make the process smooth for the project *committers* (those who review and accept changes) and *contributors* (those who propose new changes via pull requests), there are a few rules to follow. | ||
|
||
## Contribution Guidelines | ||
|
||
Please read the [How to get involved](https://uima.apache.org/get-involved.html) page to understand how contributions are made. | ||
The coding standards and guidelines that you should follow are listed in the [Apache UIMA Code Conventions](https://uima.apache.org/codeConventions.html). | ||
For pull requests, there is a [check list](PULL_REQUEST_TEMPLATE.md) with criteria for acceptable contributions. | ||
|
||
## Preparing a Pull Request (PR) | ||
|
||
In order to contribute to the project, you need to create a **pull request**. | ||
This section briefly guides you through the best way of doing this: | ||
|
||
* Before creating a pull request, create an issue in the issue tracker of the project to which | ||
you wish to contribute | ||
* Fork the project on GitHub | ||
* Create a branch based on the branch to which you wish to contribute. Normally, you should create | ||
this branch from the **main** branch of the respective project. If you want to fix | ||
a bug in the latest released version, consider branching off the latest maintenance | ||
branch (e.g. **2.4.x**). If you are not sure, ask via the issue you have just created. Do **not** | ||
make changes directly to the main or maintenance branches in your fork. The name of the branch | ||
should be e.g. `feature/UIMA-[ISSUE-NUMBER]-[SHORT-ISSUE-DESCRIPTION]` or `bugfix/UIMA-[ISSUE-NUMBER]-[SHORT-ISSUE-DESCRIPTION]`. | ||
* Make your changes on your branch. When committing to your branch, use the format shown below | ||
for your commit messages. | ||
``` | ||
[UIMA-<ISSUE-NUMBER>] <ISSUE TITLE> | ||
<EMPTY LINE> | ||
- <CHANGE 1> | ||
- <CHANGE 2> | ||
- ... | ||
``` | ||
* You can create the pull request any time after your first commit; you do not have to wait | ||
until you are completely finished with your implementation. Creating a pull request early | ||
tells other developers that you are actively working on an issue and makes it easier to ask | ||
questions about and discuss implementation details. |
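The branch-name and commit-message conventions above can be checked mechanically. The following is a minimal sketch, not part of the UIMA code base: the class `ContributionFormat` and its methods are hypothetical, and the patterns encode one strict reading of the templates (a numeric issue number, an empty second line, and every following line starting with `- `). It also assumes that a subject-only commit message is acceptable, which the guidelines do not state.

```java
import java.util.regex.Pattern;

/** Hypothetical helper illustrating the naming conventions; not part of UIMA. */
final class ContributionFormat {

    // feature/UIMA-<number>-<short-description> or bugfix/UIMA-<number>-<short-description>
    private static final Pattern BRANCH =
            Pattern.compile("(feature|bugfix)/UIMA-\\d+-[A-Za-z0-9][A-Za-z0-9-]*");

    // First line of a commit message: [UIMA-<number>] <issue title>
    private static final Pattern SUBJECT = Pattern.compile("\\[UIMA-\\d+\\] \\S.*");

    static boolean isValidBranchName(String name) {
        return BRANCH.matcher(name).matches();
    }

    static boolean isValidCommitMessage(String message) {
        // Leading/trailing whitespace is ignored; \R splits on any line break.
        String[] lines = message.trim().split("\\R");
        if (!SUBJECT.matcher(lines[0]).matches()) {
            return false;
        }
        if (lines.length == 1) {
            return true; // assumption: a subject-only message is acceptable
        }
        if (!lines[1].isEmpty()) {
            return false; // the second line must be empty
        }
        for (int i = 2; i < lines.length; i++) {
            if (!lines[i].startsWith("- ")) {
                return false; // each change is listed as "- <CHANGE>"
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isValidBranchName("feature/UIMA-390-merge-uimafit-modules"));
        System.out.println(isValidCommitMessage(
                "[UIMA-390] Merge uimaFIT modules\n\n- Move modules\n- Update POMs"));
    }
}
```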
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1 @@ | ||
Enables the "dependency-check" profile |
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -157,6 +157,106 @@ for (var anno : cas.<Annotation>select(entityType)) { | |
``` | ||
|
||
|
||
#### Using uimaFIT | ||
|
||
Configuring UIMA components is generally achieved by creating XML descriptor | ||
files which tell the framework at runtime how components should be | ||
instantiated and deployed. These XML descriptor files are very tightly | ||
coupled with the Java implementation of the components they describe. | ||
We have found that it is very difficult to keep the two consistent | ||
with each other, especially when code is refactored frequently. | ||
uimaFIT provides Java annotations which can be used to describe UIMA | ||
components directly in the code. This | ||
greatly simplifies refactoring a component definition (e.g. changing a | ||
configuration parameter name). It also makes it possible to generate | ||
XML descriptor files as part of the build cycle rather than | ||
maintaining them manually in parallel with the code. | ||
|
||
uimaFIT also makes | ||
it easy to instantiate UIMA components without using XML descriptor | ||
files at all by providing a number of convenience factory methods | ||
which allow programmatic/dynamic instantiation of UIMA components. | ||
This makes uimaFIT an ideal library for testing UIMA components | ||
because a component can be easily instantiated and invoked without | ||
requiring a descriptor file to be created first. uimaFIT is also | ||
helpful in research environments in which programmatic/dynamic | ||
instantiation of a pipeline can simplify experimentation. For example, | ||
when performing 10-fold cross-validation across a number of | ||
experimental conditions, it can be quite laborious to create a | ||
different set of descriptor files for each run or even a script that | ||
generates such descriptor files. | ||
|
||
uimaFIT is type system agnostic and | ||
does not depend on (or provide) a specific type system. | ||
|
||
uimaFIT is a library that provides factories, injection, and testing | ||
utilities for UIMA. The following list highlights some of the features | ||
uimaFIT provides: | ||
|
||
* **Factories:** simplify instantiating UIMA components programmatically | ||
without descriptor files. For example, to instantiate an AnalysisEngine a | ||
call like this could be made: | ||
|
||
AnalysisEngineFactory.createEngine(MyAEImpl.class, myTypeSystem, | ||
paramName1, paramValue1, | ||
paramName2, paramValue2, | ||
...) | ||
|
||
* **Injection:** handles the binding of configuration parameter values to the | ||
corresponding member variables in the analysis engines and handles the binding of | ||
external resources. For example, to bind a configuration parameter just annotate | ||
a member variable with `@ConfigurationParameter`. External resources can likewise | ||
be injected via the `@ExternalResource` annotation. | ||
Then add one line of code to your initialize method: | ||
|
||
ConfigurationParameterInitializer.initialize(this, uimaContext); | ||
|
||
This is handled automatically if you extend the uimaFIT `JCasAnnotator_ImplBase` class. | ||
|
||
* **Testing:** uimaFIT simplifies testing in a number of ways described in the | ||
documentation. By making it easy to instantiate your components without | ||
descriptor files a large amount of difficult-to-maintain and unnecessary XML can | ||
be eliminated from your test code. This makes tests easier to write and | ||
maintain. Also, running components as a pipeline can be accomplished with a | ||
method call like this: | ||
|
||
SimplePipeline.runPipeline(reader, ae1, ..., aeN, consumer1, ..., consumerN) | ||
|
||
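The three features can be combined in a few lines. The following is a minimal sketch, assuming `uimafit-core` and the UIMA Java SDK are on the classpath; `KeywordAnnotatorExample`, `KeywordAnnotator`, the `keyword` parameter, and `countKeyword` are hypothetical names introduced only for illustration. The component extends the uimaFIT `JCasAnnotator_ImplBase`, so the `@ConfigurationParameter` field is injected automatically. It is created by a factory call without a descriptor file and run with `SimplePipeline`. Note that selecting `Annotation.class` may also return the document annotation created by the framework, which is why the sketch filters on the covered text.

```java
import static org.apache.uima.fit.factory.AnalysisEngineFactory.createEngine;

import org.apache.uima.analysis_engine.AnalysisEngine;
import org.apache.uima.analysis_engine.AnalysisEngineProcessException;
import org.apache.uima.fit.component.JCasAnnotator_ImplBase;
import org.apache.uima.fit.descriptor.ConfigurationParameter;
import org.apache.uima.fit.factory.JCasFactory;
import org.apache.uima.fit.pipeline.SimplePipeline;
import org.apache.uima.fit.util.JCasUtil;
import org.apache.uima.jcas.JCas;
import org.apache.uima.jcas.tcas.Annotation;

public class KeywordAnnotatorExample {

  /** Hypothetical component: annotates every occurrence of a configured keyword. */
  public static class KeywordAnnotator extends JCasAnnotator_ImplBase {
    public static final String PARAM_KEYWORD = "keyword";

    // Injection: uimaFIT binds the parameter value to this field during initialization.
    // Assumes a non-empty keyword.
    @ConfigurationParameter(name = PARAM_KEYWORD, mandatory = true)
    private String keyword;

    @Override
    public void process(JCas jcas) throws AnalysisEngineProcessException {
      String text = jcas.getDocumentText();
      for (int i = text.indexOf(keyword); i >= 0; i = text.indexOf(keyword, i + 1)) {
        new Annotation(jcas, i, i + keyword.length()).addToIndexes();
      }
    }
  }

  /** Runs the annotator on the given text and counts the keyword annotations. */
  static long countKeyword(String text, String keyword) throws Exception {
    // Factories: create the engine from the class and name/value parameter pairs.
    AnalysisEngine engine = createEngine(KeywordAnnotator.class,
        KeywordAnnotator.PARAM_KEYWORD, keyword);

    // Testing: build a JCas in memory and run the engine as a pipeline.
    JCas jcas = JCasFactory.createText(text);
    SimplePipeline.runPipeline(jcas, engine);

    return JCasUtil.select(jcas, Annotation.class).stream()
        .filter(a -> keyword.equals(a.getCoveredText()))
        .count();
  }

  public static void main(String[] args) throws Exception {
    System.out.println(countKeyword("UIMA and uimaFIT build on UIMA.", "UIMA"));
  }
}
```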
uimaFIT is a part of the Apache UIMA(TM) project. uimaFIT can only be used in | ||
conjunction with a compatible version of the Apache UIMA Java SDK. | ||
For your convenience, the binary distribution package of uimaFIT includes all | ||
libraries necessary to use uimaFIT. Novice users in particular are strongly | ||
advised to obtain a copy of the full UIMA SDK separately. | ||
|
||
uimaFIT is available via Maven Central. If you use Maven for your build | ||
environment, then you can add uimaFIT as a dependency to your pom.xml file with the | ||
following: | ||
|
||
<dependencies> | ||
<dependency> | ||
<groupId>org.apache.uima</groupId> | ||
<artifactId>uimafit-core</artifactId> | ||
<version>3.5.0</version> | ||
</dependency> | ||
</dependencies> | ||
|
||
|
||
**Modules** | ||
- **uimafit-core** - the main uimaFIT module | ||
- **uimafit-cpe** - support for the Collection Processing Engine | ||
(multi-threaded pipelines) | ||
- **uimafit-maven** - a Maven plugin to automatically enhance UIMA components with | ||
uimaFIT metadata and to generate XML descriptors for uimaFIT-enabled components. | ||
- **uimafit-junit** - convenience code for implementing UIMA/uimaFIT | ||
tests with JUnit | ||
- **uimafit-assertj** - adds assertions for UIMA/uimaFIT types via the AssertJ | ||
framework | ||
- **uimafit-spring** - an experimental module serving as a proof-of-concept for the | ||
integration of UIMA with the Spring Framework. It uses invasive reflection to patch | ||
the UIMA framework such that it passes all components created by UIMA through Spring | ||
to provide for the wiring of Spring context dependencies. This module is made available | ||
for the adventurous but is currently not considered stable, finished, or even a | ||
proper part of the package. E.g. it is not included in the binary | ||
distribution package. | ||
|
||
|
||
#### Building | ||
|
||
To build Apache UIMA, you need at least a Java 17 JDK and a recent Maven 3 version. | ||
|
@@ -238,7 +338,54 @@ XMI format. | |
|
||
The Apache UIMA Java SDK is a Java-based implementation of the [UIMA specification][OASIS-UIMA]. | ||
|
||
#### Support | ||
|
||
Please direct questions to [email protected]. | ||
|
||
#### Reference | ||
|
||
If you use uimaFIT to support academic research, then please consider citing the | ||
following paper as appropriate: | ||
|
||
@InProceedings{ogren-bethard:2009:SETQA-NLP, | ||
author = {Ogren, Philip and Bethard, Steven}, | ||
title = {Building Test Suites for {UIMA} Components}, | ||
booktitle = {Proceedings of the Workshop on Software Engineering, Testing, and Quality Assurance for Natural Language Processing (SETQA-NLP 2009)}, | ||
month = {June}, | ||
year = {2009}, | ||
address = {Boulder, Colorado}, | ||
publisher = {Association for Computational Linguistics}, | ||
pages = {1--4}, | ||
url = {http://www.aclweb.org/anthology/W/W09/W09-1501} | ||
} | ||
|
||
#### History | ||
|
||
* **Early 2000s:** UIMA was originally developed by IBM as part of research into analyzing unstructured information (like text, audio, and video). It was designed to process large volumes of unstructured data in a scalable way, targeting natural language processing (NLP) applications. | ||
|
||
* **2004:** UIMA was open-sourced, allowing for broader use and contributions from outside IBM. | ||
|
||
* **2006:** The UIMA project was accepted into the Apache Incubator, starting the formal process of becoming an Apache project. | ||
|
||
* **2008:** UIMA graduated from the Apache Incubator and became a top-level Apache project, signifying its maturity and active development. | ||
|
||
* **2009:** Apache UIMA-AS (Asynchronous Scaleout) was introduced, enabling distributed and asynchronous processing of UIMA pipelines. | ||
|
||
* **2012:** uimaFIT was contributed to the Apache UIMA project. Apache uimaFIT was formerly known as uimaFIT, which in turn was formerly known as UUTUC. Prior to its contribution, it was a collaborative | ||
effort between the Center for Computational Pharmacology at the University of Colorado Denver, the | ||
Center for Computational Language and Education Research at the University of Colorado at Boulder, | ||
and the Ubiquitous Knowledge Processing (UKP) Lab at the Technische Universität Darmstadt. | ||
|
||
* **2013:** UIMA DUCC (Distributed UIMA Cluster Computing) was introduced as a sub-project of Apache UIMA. | ||
|
||
* **2016:** Apache UIMA Ruta (Rule-based Text Annotation) was introduced as an extension, providing a scripting language for rule-based text processing. | ||
|
||
* **2023:** UIMA DUCC and UIMA-AS were retired. | ||
|
||
* **2024:** uimaFIT was merged into the UIMA Java SDK. | ||
|
||
[UIMA]: https://uima.apache.org | ||
[OASIS-UIMA]: https://www.oasis-open.org/committees/tc_home.php?wg_abbrev=uima | ||
[MAVEN-CENTRAL]: https://search.maven.org/search?q=org.apache.uima | ||
[DKPRO-CASSIS]: https://github.com/dkpro/dkpro-cassis | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1 @@ | ||
Enables the "dependency-check" profile |