be.hogent.tarsos.lsh
Class LSH

java.lang.Object
  extended by be.hogent.tarsos.lsh.LSH

public class LSH
extends java.lang.Object

Implements a Locality Sensitive Hash scheme.

Author:
Joren Six

Constructor Summary
LSH(java.util.List<Vector> dataset, HashFamily hashFamily)
           
 
Method Summary
 void benchmark(int neighboursSize, DistanceMeasure measure)
          Benchmark the current LSH construction.
static void benchmarkFamilies()
           
 void buildIndex(int numberOfHashes, int numberOfHashTables)
          Build an index by creating a new one and adding each vector.
static java.util.List<Vector> linearSearch(java.util.List<Vector> dataset, Vector query, int resultSize, DistanceMeasure measure)
          Search for the actual nearest neighbours for a query vector using an exhaustive linear search.
static void main(java.lang.String[] args)
           
 java.util.List<Vector> query(Vector query, int neighboursSize)
          Find the nearest neighbours for a query in the index.
static java.util.List<Vector> readDataset(java.lang.String file, int maxSize)
          Read a data set from a text file.
 
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

LSH

public LSH(java.util.List<Vector> dataset,
           HashFamily hashFamily)
Method Detail

buildIndex

public void buildIndex(int numberOfHashes,
                       int numberOfHashTables)
Build an index by creating a new one and adding each vector.

Parameters:
numberOfHashes - The number of hashes to use in each hash table.
numberOfHashTables - The number of hash tables to use.

benchmark

public void benchmark(int neighboursSize,
                      DistanceMeasure measure)
Benchmark the current LSH construction.

Parameters:
neighboursSize - the expected size of the neighbourhood.
measure - The measure to use to check for correctness.

query

public java.util.List<Vector> query(Vector query,
                                    int neighboursSize)
Find the nearest neighbours for a query in the index.

Parameters:
query - The query vector.
neighboursSize - The size of the neighbourhood. The returned list length contains the maximum number of elements, or less. Zero elements are possible.
Returns:
A list of nearest neigbours, according to the index. The returned list length contains the maximum number of elements, or less. Zero elements are possible.

linearSearch

public static java.util.List<Vector> linearSearch(java.util.List<Vector> dataset,
                                                  Vector query,
                                                  int resultSize,
                                                  DistanceMeasure measure)
Search for the actual nearest neighbours for a query vector using an exhaustive linear search. For each vector a priority queue is created, the distance between the query and other vectors is used to sort the priority queue. The closest k neighbours show up at the head of the priority queue.

Parameters:
dataset - The data set with a bunch of vectors.
query - The query vector.
resultSize - The k nearest neighbours to find. Returns k vectors if the data set size is larger than k.
measure - The distance measure used to sort the priority queue with.
Returns:
The list of k nearest neighbours to the query vector, according to the given distance measure.

readDataset

public static java.util.List<Vector> readDataset(java.lang.String file,
                                                 int maxSize)
Read a data set from a text file. The file has the following contents, with identifier being an optional string identifying the vector and a list of N coordinates (which should be doubles). This results in an N-dimensional vector.
 [Identifier] coord1 coord2 ... coordN 
 [Identifier] coord1 coord2 ... coordN
 
For example a data set with two elements with 4 dimensions looks like this:
 Hans 12 24 18.5 -45.6 
 Jane 13 19 -12.0 49.8
 

Parameters:
file - The file to read.
maxSize - The maximum number of elements in the data set (even if the file defines more points).
Returns:
a list of vectors, the data set.

main

public static void main(java.lang.String[] args)

benchmarkFamilies

public static void benchmarkFamilies()