Simple Neighbors API Reference

class simpleneighbors.SimpleNeighbors(dims, metric='angular', backend=None)[source]

A Simple Neighbors index.

This class wraps backend implementations of approximate nearest neighbors indexes with a user-friendly API. When you instantiate this class, it will automatically select a backend implementation based on packages installed in your environment. It is HIGHLY RECOMMENDED that you install Annoy (pip install annoy) to enable the Annoy backend! (The alternatives are slower and not as accurate.) Alternatively, you can specify a backend of your choosing with the backend parameter.

Specify the number of dimensions in your data (i.e., the length of the list or array you plan to provide for each item) and the distance metric you want to use. The default is angular distance, an approximation of cosine distance. This metric is supported by all backends, as is euclidean (for Euclidean distance). Both of these parameters are passed directly to the backend; see the backend documentation for more details.

Parameters:
  • dims – the number of dimensions in your data
  • metric – the distance metric to use
  • backend – the nearest neighbors backend to use (default is annoy)
add_one(item, vector)[source]

Adds an item to the index.

You need to provide the item to add and a vector that corresponds to that item. (For example, if the item is the name of a color, the vector might be a (R, G, B) triplet corresponding to that color. If the item is a word, the vector might be a word2vec or GloVe vector corresponding to that word.

Items can be any hashable Python object. Vectors must be sequences of numbers. (Lists, tuples, and Numpy arrays should all be fine, for example.)

Note: If the index has already been built, you won’t be able to add new items.

Parameters:
  • item – the item to add
  • vector – the vector corresponding to that item
Returns:

None

build(n=10, params=None)[source]

Build the index.

After adding all of your items, call this method to build the index. The meaning of parameter n is different for each backend implementation. For the Annoy backend, it specifies the number of trees in the underlying Annoy index (a higher number will take longer to build but provide more precision when querying). For the Sklearn backend, the number specifies the leaf size when building the ball tree. (The Brute Force Pure Python backend ignores this value entirely.)

After you call build, you’ll no longer be able to add new items to the index.

Parameters:
  • n – backend-dependent (for Annoy: number of trees)
  • params – dictionary with extra parameters to pass to backend
dist(a, b)[source]

Returns the distance between two items.

Parameters:
  • a – first item
  • b – second item
Returns:

distance between a and b

feed(items)[source]

Add multiple items to the index.

Supply to this method a sequence of (item, vector) tuples (e.g., a list of tuples, a generator that produces tuples, etc.) and they’ll all be added to the index. Great for adding multiple items to the index at once.

Items can be any hashable Python object. Vectors must be sequences of numbers. (Lists, tuples, and Numpy arrays should all be fine, for example.)

Parameters:items – a sequence of (item, vector) tuples
Returns:None
classmethod load(prefix)[source]

Restores a previously-saved index.

This class method restores a previously-saved index using the specified file prefix.

Parameters:prefix – prefix used when saving
Returns:SimpleNeighbors object restored from specified files
nearest(vec, n=12)[source]

Returns the items nearest to a given vector.

The specified vector must have the same number of dimensions as the number given when initializing the index. The nearest neighbor search is limited to the given number of items, and results are sorted in order of proximity.

>>> from simpleneighbors import SimpleNeighbors
>>> sim = SimpleNeighbors(2, 'euclidean')
>>> sim.feed([('a', (4, 5)),
...     ('b', (0, 3)),
...     ('c', (-2, 8)),
...     ('d', (2, -2))])
>>> sim.build()
>>> sim.nearest((1, -1), n=1)
['d']
Parameters:
  • vec – search vector
  • n – number of results to return
Returns:

a list of items sorted in order of proximity

nearest_matching(vec, n=12, check=lambda x: True)[source]

Returns the items nearest a given vector that pass a test.

This method looks for the items in the index nearest the given vector that meet a particular criterion. It tries to find at least n items, expanding the search as needed. (It may yield fewer than the desired number if enough items can’t be found in the entire index.)

The function passed as check will be called with a single parameter: the item in question. It should return True if the item should be included in the results, and False otherwise.

This search process might be slow; in order to make it easier to display incremental results, this method returns a generator. You can easily get the results of this method as a list by enclosing your call inside the list() function.

>>> from simpleneighbors import SimpleNeighbors
>>> sim = SimpleNeighbors(2, 'euclidean')
>>> sim.feed([('a', (4, 5)),
...     ('b', (0, 3)),
...     ('c', (-2, 8)),
...     ('d', (2, -2))])
>>> sim.build()
>>> list(sim.nearest_matching((3.5, 4.5), n=1,
...     check=lambda x: ord(x) <= ord('b')))
['a']
Parameters:
  • vec – search vector
  • n – number of items to return
  • check – function to call on each item
Returns:

a generator yielding up to n items

neighbors(item, n=12)[source]

Returns the items nearest another item in the index.

This method returns the items closest to a given item in the index in order of proximity, limiting results to the number specified. (It’s just like nearest() except using the vector of an item already in the corpus.)

>>> from simpleneighbors import SimpleNeighbors
>>> sim = SimpleNeighbors(2, 'euclidean')
>>> sim.feed([('a', (4, 5)),
...     ('b', (0, 3)),
...     ('c', (-2, 8)),
...     ('d', (2, -2))])
>>> sim.build()
>>> sim.neighbors('b', n=3)
['b', 'a', 'c']
Parameters:
  • item – a data item in that has already been added to the index
  • n – the number of items to return
Returns:

a list of items sorted in order of proximity

neighbors_matching(item, n=12, check=None)[source]

Returns the items nearest an indexed item that pass a test.

This method is just like nearest_matching(), but finds items nearest a given item already in the index, instead of an arbitrary vector.

Parameters:
  • item – search item
  • n – number of items to return
  • check – function to call on each item
Returns:

a generator yielding up to n items

save(prefix)[source]

Saves the index to disk.

This method saves the index to disk. Each backend manages serialization a little bit differently: consult the documentation and source code for more details. For example, because Annoy indexes can’t be serialized with pickle, the Annoy backend’s implementation produces two files: the serialized Annoy index, and a pickle with the other data from the object.

This method’s parameter specifies the “prefix” to use for these files.

Parameters:prefix – filename prefix for Annoy index and object data
Returns:None
vec(item)[source]

Returns the vector for an item.

This method returns the vector that was originally provided when indexing the specified item. (Depending on how it was originally specified, they may have been converted to a different data type; e.g., integer vectors are converted to floats.)

Parameters:item – item to lookup
Returns:vector for item