| Interface | Description | 
|---|---|
| EditDistance<R> | Interface for Edit Distances. | 
| SimilarityScore<R> | Interface for the concept of a string similarity score. | 

| Class | Description |
|---|---|
| CosineDistance | Measures the cosine distance between two character sequences. | 
| CosineSimilarity | Measures the cosine similarity of two vectors of an inner product space and compares the angle between them. |
| EditDistanceFrom<R> | This stores an EditDistance implementation and a CharSequence "left" string. |
| FuzzyScore | A matching algorithm that is similar to the searching algorithms implemented in editors such as Sublime Text, TextMate, Atom and others. |
| HammingDistance | The Hamming distance between two strings of equal length is the number of positions at which the corresponding symbols are different. |
| JaccardDistance | Measures the Jaccard distance of two sets of character sequences. |
| JaccardSimilarity | Measures the Jaccard similarity (aka Jaccard index) of two sets of character sequences. |
| JaroWinklerDistance | A similarity algorithm indicating the percentage of matched characters between two character sequences. | 
| LevenshteinDetailedDistance | An algorithm for measuring the difference between two character sequences. | 
| LevenshteinDistance | An algorithm for measuring the difference between two character sequences. | 
| LevenshteinResults | Container class to store Levenshtein distance between two character sequences. | 
| LongestCommonSubsequence | A similarity algorithm indicating the length of the longest common subsequence between two strings. | 
| LongestCommonSubsequenceDistance | An edit distance algorithm based on the length of the longest common subsequence between two strings. | 
| SimilarityScoreFrom<R> | This stores a SimilarityScore implementation and a CharSequence "left" string. |
Provides algorithms for string similarity.
The algorithms that implement the EditDistance interface follow the same simple principle: the more similar (closer) two strings are, the lower the distance between them. For example, the words house and hose are closer than house and trousers.
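The house/hose comparison can be checked with a small plain-Java sketch of the Levenshtein edit distance. This is only an illustration of the principle: the class name `EditDistanceSketch` and its `levenshtein` method are hypothetical and are not part of this package, whose own implementations may differ in detail.

```java
// Plain-Java sketch of the Levenshtein edit distance (illustrative only,
// not the implementation shipped in this package).
class EditDistanceSketch {

    // Minimum number of single-character insertions, deletions and
    // substitutions needed to turn `left` into `right`.
    static int levenshtein(CharSequence left, CharSequence right) {
        int[] prev = new int[right.length() + 1];
        int[] curr = new int[right.length() + 1];
        for (int j = 0; j <= right.length(); j++) {
            prev[j] = j; // from the empty prefix of left: j insertions
        }
        for (int i = 1; i <= left.length(); i++) {
            curr[0] = i; // to the empty prefix of right: i deletions
            for (int j = 1; j <= right.length(); j++) {
                int cost = left.charAt(i - 1) == right.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(
                        Math.min(curr[j - 1] + 1, prev[j] + 1),
                        prev[j - 1] + cost);
            }
            // Reuse the two row buffers instead of allocating a full matrix.
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[right.length()];
    }

    public static void main(String[] args) {
        System.out.println(levenshtein("house", "hose"));     // 1
        System.out.println(levenshtein("house", "trousers")); // 4
    }
}
```

The smaller result for house and hose (1) than for house and trousers (4) is what "closer" means here.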
The following algorithms are available at the moment:
- Cosine Distance
- Cosine Similarity
- Fuzzy Score
- Hamming Distance
- Jaro-Winkler Distance
- Levenshtein Distance
- Longest Common Subsequence Distance

The Cosine Distance uses a regular expression tokenizer (\w+), and the Levenshtein Distance's behaviour can be changed to take a maximum threshold into account.
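To make the role of the \w+ tokenizer concrete, the following sketch splits each input into \w+ tokens, counts them, and derives a cosine distance as one minus the cosine similarity of the two count vectors. It uses only `java.util` and `java.util.regex`. The names `CosineSketch`, `termCounts` and `cosineDistance` are hypothetical, and the handling of empty input is an assumption rather than documented behaviour of this package.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative sketch only; not the implementation shipped in this package.
class CosineSketch {

    private static final Pattern WORD = Pattern.compile("\\w+");

    // Split the text into \w+ tokens and count how often each one occurs.
    static Map<String, Integer> termCounts(CharSequence text) {
        Map<String, Integer> counts = new HashMap<>();
        Matcher matcher = WORD.matcher(text);
        while (matcher.find()) {
            counts.merge(matcher.group(), 1, Integer::sum);
        }
        return counts;
    }

    static double sumOfSquares(Map<String, Integer> counts) {
        double sum = 0;
        for (int count : counts.values()) {
            sum += (double) count * count;
        }
        return sum;
    }

    // Cosine distance = 1 - cosine similarity of the two term-count vectors.
    static double cosineDistance(CharSequence left, CharSequence right) {
        Map<String, Integer> a = termCounts(left);
        Map<String, Integer> b = termCounts(right);
        double dot = 0;
        for (Map.Entry<String, Integer> entry : a.entrySet()) {
            dot += entry.getValue() * b.getOrDefault(entry.getKey(), 0);
        }
        double norm = Math.sqrt(sumOfSquares(a)) * Math.sqrt(sumOfSquares(b));
        // Assumption: inputs without any tokens are treated as maximally distant.
        return norm == 0 ? 1.0 : 1.0 - dot / norm;
    }
}
```

Because tokens are whole \w+ runs, punctuation and whitespace are ignored and word order does not affect the result: `cosineDistance("hello world", "world, hello!")` is approximately 0, while two inputs that share no tokens have distance 1.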
Copyright © 2014–2018 The Apache Software Foundation. All rights reserved.