public class LSH
extends java.lang.Object
Constructor and Description |
---|
LSH(java.util.List<Vector> dataset, HashFamily hashFamily) |
Modifier and Type | Method and Description |
---|---|
void | benchmark(int neighboursSize, DistanceMeasure measure) Benchmark the current LSH construction. |
void | buildIndex(int numberOfHashes, int numberOfHashTables) Build an index by creating a new one and adding each vector. |
static java.util.List<Vector> | linearSearch(java.util.List<Vector> dataset, Vector query, int resultSize, DistanceMeasure measure) Search for the actual nearest neighbours for a query vector using an exhaustive linear search. |
static void | main(java.lang.String[] args) |
java.util.List<Vector> | query(Vector query, int neighboursSize) Find the nearest neighbours for a query in the index. |
static java.util.List<Vector> | readDataset(java.lang.String file, int maxSize) Read a data set from a text file. |
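
The exhaustive search that linearSearch describes can be sketched as follows. This is an illustration under stated assumptions, not the library's code: plain double[] arrays stand in for Vector, a Euclidean distance stands in for DistanceMeasure, and the class name LinearSearchSketch is hypothetical. A priority queue ordered by distance to the query does the ranking.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical stand-in: double[] replaces Vector and Euclidean distance
// replaces DistanceMeasure, since neither type is shown in this page.
final class LinearSearchSketch {

    // Euclidean distance between two points of equal dimension.
    static double distance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    // Returns up to k points closest to the query, nearest first.
    // If the data set holds fewer than k points, all of them are returned.
    static List<double[]> linearSearch(List<double[]> dataset, double[] query, int k) {
        // The queue is ordered by distance to the query, so poll() yields
        // the nearest remaining point.
        PriorityQueue<double[]> queue = new PriorityQueue<>(
                Comparator.comparingDouble((double[] v) -> distance(v, query)));
        queue.addAll(dataset);
        List<double[]> result = new ArrayList<>();
        while (result.size() < k && !queue.isEmpty()) {
            result.add(queue.poll());
        }
        return result;
    }

    public static void main(String[] args) {
        List<double[]> dataset = Arrays.asList(
                new double[] {0, 0}, new double[] {5, 5}, new double[] {1, 1});
        for (double[] v : linearSearch(dataset, new double[] {0.9, 0.9}, 2)) {
            System.out.println(Arrays.toString(v));
        }
    }
}
```

Every point is compared with the query, so the cost grows linearly with the data set size. That makes this kind of search a natural ground truth when checking the results of an approximate index.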
public LSH(java.util.List<Vector> dataset, HashFamily hashFamily)

public void buildIndex(int numberOfHashes, int numberOfHashTables)
numberOfHashes - The number of hashes to use in each hash table.
numberOfHashTables - The number of hash tables to use.

public void benchmark(int neighboursSize, DistanceMeasure measure)
neighboursSize - The expected size of the neighbourhood.
measure - The measure to use to check for correctness.

public java.util.List<Vector> query(Vector query, int neighboursSize)
query - The query vector.
neighboursSize - The size of the neighbourhood. The returned list contains at most this many elements and may be empty.

public static java.util.List<Vector> linearSearch(java.util.List<Vector> dataset, Vector query, int resultSize, DistanceMeasure measure)
dataset - The data set of vectors to search.
query - The query vector.
resultSize - The number k of nearest neighbours to find. Returns k vectors if the data set contains more than k vectors.
measure - The distance measure used to order the priority queue.

public static java.util.List<Vector> readDataset(java.lang.String file, int maxSize)
[Identifier] coord1 coord2 ... coordN
[Identifier] coord1 coord2 ... coordN

For example, a data set with two elements with 4 dimensions looks like this:

Hans 12 24 18.5 -45.6
Jane 13 19 -12.0 49.8
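
A minimal sketch of parsing this format is shown below. It assumes that every line starts with an identifier, as in the example, and that fields are separated by whitespace. The Point type, the parse method, and the choice to take lines that are already in memory instead of a file name are hypothetical simplifications, not the library's behaviour.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class DatasetParseSketch {

    // Hypothetical record type: an identifier plus its coordinates.
    static final class Point {
        final String id;
        final double[] coords;

        Point(String id, double[] coords) {
            this.id = id;
            this.coords = coords;
        }
    }

    // Parses at most maxSize points. Each non-blank line is assumed to be
    // "identifier coord1 coord2 ... coordN", separated by whitespace.
    static List<Point> parse(List<String> lines, int maxSize) {
        List<Point> points = new ArrayList<>();
        for (String raw : lines) {
            if (points.size() >= maxSize) {
                break; // stop even if the input defines more points
            }
            String line = raw.trim();
            if (line.isEmpty()) {
                continue;
            }
            String[] tokens = line.split("\\s+");
            double[] coords = new double[tokens.length - 1];
            for (int i = 1; i < tokens.length; i++) {
                coords[i - 1] = Double.parseDouble(tokens[i]);
            }
            points.add(new Point(tokens[0], coords));
        }
        return points;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "Hans 12 24 18.5 -45.6",
                "Jane 13 19 -12.0 49.8");
        for (Point p : parse(lines, 10)) {
            System.out.println(p.id + " " + Arrays.toString(p.coords));
        }
    }
}
```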
file - The file to read.
maxSize - The maximum number of elements in the data set (even if the file defines more points).

public static void main(java.lang.String[] args)
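
To illustrate how the two buildIndex parameters and query relate, here is a minimal, self-contained sketch of an index built from several hash tables. It is not the library's implementation. The random-hyperplane hash family, the double[] vectors, the seed argument, the Euclidean ranking of candidates, and the class name LshIndexSketch are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Random;
import java.util.Set;

// Sketch only: random-hyperplane (sign) hashing is an assumed hash family,
// and double[] stands in for the Vector type.
final class LshIndexSketch {
    private final int dimensions;
    private final Random random;
    private double[][][] planes; // [table][hash][dimension]
    private List<Map<Integer, List<double[]>>> tables = new ArrayList<>();

    LshIndexSketch(int dimensions, long seed) {
        this.dimensions = dimensions;
        this.random = new Random(seed);
    }

    // Creates a fresh index and adds every vector to each hash table.
    // Assumes numberOfHashes is at most 32 so the key fits in an int.
    void buildIndex(List<double[]> dataset, int numberOfHashes, int numberOfHashTables) {
        planes = new double[numberOfHashTables][numberOfHashes][dimensions];
        tables = new ArrayList<>();
        for (int t = 0; t < numberOfHashTables; t++) {
            for (int h = 0; h < numberOfHashes; h++) {
                for (int d = 0; d < dimensions; d++) {
                    planes[t][h][d] = random.nextGaussian();
                }
            }
            tables.add(new HashMap<Integer, List<double[]>>());
        }
        for (double[] v : dataset) {
            for (int t = 0; t < numberOfHashTables; t++) {
                tables.get(t).computeIfAbsent(bucketKey(t, v), k -> new ArrayList<>()).add(v);
            }
        }
    }

    // Combines the sign bits of all hashes of table t into one bucket key.
    private int bucketKey(int t, double[] v) {
        int key = 0;
        for (double[] plane : planes[t]) {
            double dot = 0.0;
            for (int d = 0; d < dimensions; d++) {
                dot += plane[d] * v[d];
            }
            key = (key << 1) | (dot >= 0 ? 1 : 0);
        }
        return key;
    }

    static double distance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            sum += (a[i] - b[i]) * (a[i] - b[i]);
        }
        return Math.sqrt(sum);
    }

    // Collects the query's bucket from every table, ranks the candidates by
    // distance, and returns at most neighboursSize of them (possibly none).
    List<double[]> query(double[] query, int neighboursSize) {
        Set<double[]> candidates = new LinkedHashSet<>();
        for (int t = 0; t < tables.size(); t++) {
            List<double[]> bucket = tables.get(t).get(bucketKey(t, query));
            if (bucket != null) {
                candidates.addAll(bucket);
            }
        }
        List<double[]> ranked = new ArrayList<>(candidates);
        ranked.sort(Comparator.comparingDouble((double[] v) -> distance(v, query)));
        return new ArrayList<>(ranked.subList(0, Math.min(neighboursSize, ranked.size())));
    }
}
```

In this sketch, more hashes per table make each bucket more selective, while more tables give a true neighbour more chances to share a bucket with the query. Only vectors that collide with the query in at least one table are ever ranked, which is why the result can hold fewer than neighboursSize elements.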