be.hogent.tarsos.lsh
Class Index

java.lang.Object
  extended by be.hogent.tarsos.lsh.Index

public class Index
extends java.lang.Object

The index makes it easy to store vectors and lookup queries efficiently. For the moment the index is stored in memory. It holds a number of hash tables, each with a couple of hashes. Together they can be used for efficient lookup of nearest neighbours.

Author:
Joren Six

Constructor Summary
Index(HashFamily family, int numberOfHashes, int numberOfHashTables)
          Create a new index.
 
Method Summary
 int getNumberOfHashes()
          The number of hashes used in each hash table in the current index.
 int getNumberOfHashTables()
          The number of hash tables used in the current index.
 int getTouched()
          The number of near neighbour candidates that are evaluated during the queries on this index.
 void index(Vector vector)
          Add a vector to the current index.
 java.util.List<Vector> query(Vector query, int maxSize)
          Query for the k nearest neighbours in using the current index.
 
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

Index

public Index(HashFamily family,
             int numberOfHashes,
             int numberOfHashTables)
Create a new index.

Parameters:
family - The family of hash functions to use.
numberOfHashes - The number of hashes that are concatenated in each hash table. More concatenated hashes means that less candidates are selected for evaluation.
numberOfHashTables - The number of hash tables in use, recall increases with the number of hash tables. Memory use also increases. Time needed to compute a hash also increases marginally.
Method Detail

index

public void index(Vector vector)
Add a vector to the current index. The hashes are calculated with the current hash family and added in the right place.

Parameters:
vector - The vector to add.

getNumberOfHashTables

public int getNumberOfHashTables()
The number of hash tables used in the current index.

Returns:
The number of hash tables used in the current index.

getNumberOfHashes

public int getNumberOfHashes()
The number of hashes used in each hash table in the current index.

Returns:
The number of hashes used in each hash table in the current index.

query

public java.util.List<Vector> query(Vector query,
                                    int maxSize)
Query for the k nearest neighbours in using the current index. The performance (in computing time and recall/precision) depends mainly on how the current index is constructed and how the underlying data looks.

Parameters:
query - The query vector. The center of the neighbourhood.
maxSize - The maximum number of neighbours to return. Beware, the number of neighbours returned lays between zero and the chosen maximum.
Returns:
A list of nearest neighbours, the number of neighbours returned lays between zero and a chosen maximum.

getTouched

public int getTouched()
The number of near neighbour candidates that are evaluated during the queries on this index. Can be used to calculate the average evaluations per query.

Returns:
The number of near neighbour candidates that are evaluated during the queries on this index.