|
| 1 | + |
| 2 | + |
| 3 | +# cmd.csp.similarity |
| 4 | +[](https://opensource.org/licenses/MIT) |
| 5 | +[](https://GitHub.com/swelcker/cmd.csp.similarity/graphs/commit-activity) |
| 6 | +[](https://GitHub.com/swelcker/cmd.csp.similarity/releases/) |
| 7 | +[](https://GitHub.com/swelcker/cmd.csp.similarity/tags/) |
| 8 | +[](https://GitHub.com/swelcker/cmd.csp.similarity/commit/) |
| 9 | +[](https://GitHub.com/swelcker/cmd.csp.similarity/graphs/contributors/) |
| 10 | + |
| 11 | +A library implementing a range of string similarity and distance measures behind an easy-to-use API. About a dozen algorithms (including Levenshtein edit distance and its siblings, Jaro-Winkler, Longest Common Subsequence, cosine similarity, etc.) are currently implemented. |
| 12 | +It is used in the Cognitive Service Platform cmd.csp for the NLP and classifier components. |
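
As a rough illustration of what the simplest of these measures computes, the sketch below implements plain Levenshtein edit distance and a normalized similarity derived from it. It is a standalone example and not the library's own code: the class name `LevenshteinSketch`, its method names, and the choice to normalize by the longer string's length are all assumptions made for illustration.

```java
public class LevenshteinSketch {

    // Classic dynamic-programming edit distance, keeping only two rows in memory.
    static int distance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // turning "" into the first j chars of b takes j insertions
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // turning the first i chars of a into "" takes i deletions
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                int insertOrDelete = Math.min(curr[j - 1] + 1, prev[j] + 1);
                curr[j] = Math.min(insertOrDelete, prev[j - 1] + cost);
            }
            int[] tmp = prev; // swap rows for the next iteration
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Assumption: divide by the longer length so that 1.0 means "identical".
    static double similarity(String a, String b) {
        int max = Math.max(a.length(), b.length());
        if (max == 0) {
            return 1.0; // two empty strings are treated as identical
        }
        return 1.0 - (double) distance(a, b) / max;
    }

    public static void main(String[] args) {
        System.out.println(distance("kitten", "sitting"));
        System.out.println(similarity("kitten", "sitting"));
    }
}
```

The same "one minus normalized distance" idea is what turns a distance-style measure into a similarity score in the range 0 to 1.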
| 13 | + |
| 14 | + |
| 15 | +### Prerequisites |
| 16 | + |
| 17 | +There are no prerequisites. |
| 18 | + |
| 19 | +Included dependencies: |
| 20 | +```xml |
| 21 | +<dependency> |
| 22 | + <groupId>net.jcip</groupId> |
| 23 | + <artifactId>jcip-annotations</artifactId> |
| 24 | + <version>1.0</version> |
| 25 | +</dependency> |
| 26 | +``` |
| 27 | +### Installing/Usage |
| 28 | + |
| 29 | +To use, merge the following into your Maven POM (or the equivalent into your Gradle build script): |
| 30 | + |
| 31 | +```xml |
| 32 | +<repository> |
| 33 | + <id>github</id> |
| 34 | + <name>GitHub swelcker Apache Maven Packages</name> |
| 35 | + <url>https://maven.pkg.github.com/swelcker</url> |
| 36 | +</repository> |
| 37 | + |
| 38 | +<dependency> |
| 39 | + <groupId>cmd.csp</groupId> |
| 40 | + <artifactId>cspsimilarity</artifactId> |
| 41 | + <version>1.0.0</version> |
| 42 | +</dependency> |
| 43 | +``` |
| 44 | + |
| 45 | +Then add `import cspsimilarity.*;` to your application: |
| 46 | + |
| 47 | +```java |
| 48 | +// Example: score one string against another with several measures |
| 49 | +import cspsimilarity.*; |
| 50 | +... |
| 51 | + private NormalizedLevenshtein engineNL = new NormalizedLevenshtein(); |
| 52 | + private JaroWinkler engineJW = new JaroWinkler(); |
| 53 | + private MetricLCS engineMLCS = new MetricLCS(); |
| 54 | + private NGram engineNGRAM = new NGram(3); |
| 55 | + private Cosine engineCOSINE = new Cosine(9); |
| 56 | + private Jaccard engineJACCARD = new Jaccard(9); |
| 57 | + private SorensenDice engineSOREDICE = new SorensenDice(9); |
| 58 | +... |
| 59 | + String source = sourceText; // text to compare against |
| 60 | + String search = toSearch; // text being searched for |
| 61 | + |
| 62 | + double sS = 0d; // similarity score; each assignment below overwrites it |
| 63 | + |
| 64 | + sS = engineNL.similarity(source, search); |
| 65 | + sS = engineJW.similarity(source, search); |
| 66 | + sS = 1d - engineMLCS.distance(source, search); // distance converted to a similarity |
| 67 | + sS = 1d - engineNGRAM.distance(source, search); // distance converted to a similarity |
| 68 | + sS = engineCOSINE.similarity(source, search); |
| 69 | + sS = engineJACCARD.similarity(source, search); |
| 70 | + sS = engineSOREDICE.similarity(source, search); |
| 71 | +``` |
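
The integer passed to `Cosine`, `Jaccard`, and `SorensenDice` in the example above is presumably a shingle size (the length of the character n-grams each string is split into), as in the upstream java-string-similarity project. The following standalone sketch shows the general idea for a set-based Jaccard similarity. It does not reproduce the library's implementation: the class `ShingleJaccardSketch`, its methods, and the handling of strings shorter than `k` are hypothetical choices for illustration.

```java
import java.util.HashSet;
import java.util.Set;

public class ShingleJaccardSketch {

    // Collect every substring of length k (a "k-shingle") from s.
    static Set<String> shingles(String s, int k) {
        Set<String> out = new HashSet<>();
        for (int i = 0; i + k <= s.length(); i++) {
            out.add(s.substring(i, i + k));
        }
        return out;
    }

    // Jaccard similarity: size of the intersection divided by size of the union.
    static double jaccard(String a, String b, int k) {
        if (a.equals(b)) {
            return 1.0; // identical strings, even if shorter than k
        }
        Set<String> sa = shingles(a, k);
        Set<String> sb = shingles(b, k);
        Set<String> union = new HashSet<>(sa);
        union.addAll(sb);
        if (union.isEmpty()) {
            return 0.0; // assumption: different strings with no shingles share nothing
        }
        Set<String> intersection = new HashSet<>(sa);
        intersection.retainAll(sb);
        return (double) intersection.size() / union.size();
    }

    public static void main(String[] args) {
        // Shingles: {ab, bc, cd} and {bc, cd, de}: 2 shared out of 4 distinct.
        System.out.println(jaccard("abcd", "bcde", 2));
    }
}
```

In this sketch a larger `k` makes the measure stricter, and inputs shorter than `k` produce no shingles at all, so the shingle size is worth choosing with the typical input length in mind.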
| 72 | + |
| 73 | +## Built With |
| 74 | + |
| 75 | +* [Maven](https://maven.apache.org/) - Dependency Management |
| 76 | + |
| 77 | + |
| 78 | +## Contributing |
| 79 | + |
| 80 | +Please read [CONTRIBUTING.md](https://gist.github.com/PurpleBooth/b24679402957c63ec426) for details on our code of conduct, and the process for submitting pull requests to us. |
| 81 | + |
| 82 | +## Versioning |
| 83 | + |
| 84 | +We use [SemVer](http://semver.org/) for versioning. For the versions available, see the [tags on this repository](https://github.com/swelcker/cmd.csp.similarity/tags). |
| 85 | + |
| 86 | +## Authors |
| 87 | + |
| 88 | +* **Stefan Welcker** - *Modifications based on tdebatty/java-string-similarity* |
| 89 | + |
| 90 | +See also the list of [contributors](https://github.com/swelcker/cmd.csp.similarity/contributors) who participated in this project. |
| 91 | + |
| 92 | +## License |
| 93 | + |
| 94 | +This project is licensed under the MIT License. See the [LICENSE.md](LICENSE.md) file for details. |
| 95 | + |