📦 facets module

Hyperspace indexing and operations.

class pupyl.indexer.facets.Index(size, data_dir=None, trees=0.01, volatile=False)

Procedures over multidimensional spaces.

__enter__()

Context opening for an index.

Returns:

Context initialization.

Return type:

self

__exit__(exc_type, exc_val, exc_tb)

Context closing for an index.

__getitem__(position)

Returns the item at position, supporting negative indexing.

Parameters:

position (int) – The id of the item to return.

Returns:

With indexed tensors.

Return type:

list

Example

index[10] # Returns the item at position 10.

index[-1] # Returns the last item.
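The negative-position behaviour can be illustrated with a small stand-alone sketch. `ToyIndex` is a hypothetical stand-in written for this example, not pupyl's implementation. It only shows one way a negative position could be mapped onto a non-negative id, assuming zero-based ids.

```python
class ToyIndex:
    """Hypothetical stand-in for an index; not pupyl's implementation."""

    def __init__(self, tensors):
        self._tensors = [list(tensor) for tensor in tensors]

    def __len__(self):
        return len(self._tensors)

    def __getitem__(self, position):
        # Map a negative position onto its non-negative equivalent.
        if position < 0:
            position += len(self)
        if not 0 <= position < len(self):
            raise IndexError(f"position {position} is out of range")
        return self._tensors[position]


toy = ToyIndex([[0.0, 1.0], [2.0, 3.0], [4.0, 5.0]])
last = toy[-1]   # same item as toy[2]
first = toy[0]
```

Whether the real index raises IndexError for an out-of-range position in `__getitem__` is an assumption of this sketch.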

__init__(size, data_dir=None, trees=0.01, volatile=False)

Tensor indexing operations and approximate nearest neighbours search.

Parameters:
  • size (int) – Length of the one-dimensional vectors to be indexed.

  • data_dir (str) – Directory from which the index is loaded or in which it is saved.

  • trees (float) – Factor applied to the dataset size to determine the number of trees to create. Should be a number between 0 and 1.

  • volatile (bool) – Whether the index is temporary.

Raises:
  • OSError: – When the data_dir parameter is not a directory.

  • NoDataDirForPermanentIndex: – When no data_dir was passed for a permanent index.

  • DataDirDefinedForVolatileIndex: – If a data_dir was defined for a volatile index.

  • FileIsNotAnIndex: – When an attempt was made to load an index from a file that isn’t a valid index.
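The interaction between data_dir and volatile described in the Raises list can be sketched as follows. The exception classes are redefined locally as stand-ins, `resolve_data_dir` is a hypothetical helper, and both the order of the checks and the use of a temporary directory for a volatile index are assumptions rather than pupyl's actual code.

```python
import os
import tempfile


class NoDataDirForPermanentIndex(Exception):
    """Local stand-in for the exception of the same name."""


class DataDirDefinedForVolatileIndex(Exception):
    """Local stand-in for the exception of the same name."""


def resolve_data_dir(data_dir=None, volatile=False):
    """Hypothetical helper mirroring the documented Raises conditions."""
    if volatile:
        if data_dir is not None:
            raise DataDirDefinedForVolatileIndex
        # Assumption: a volatile index lives in a fresh temporary directory.
        return tempfile.mkdtemp()
    if data_dir is None:
        raise NoDataDirForPermanentIndex
    if os.path.exists(data_dir) and not os.path.isdir(data_dir):
        raise OSError(f"{data_dir} is not a directory")
    return data_dir
```

In this sketch a permanent index always needs an explicit directory, while a volatile one must not receive one.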

__iter__()

Returns an iterable for the index.

Yields:

list – With indexed tensors.

__len__()

Returns how many items are indexed.

Returns:

Describing how many items are indexed.

Return type:

int

Example

len(index) # Will return 10 for an index with 10 elements indexed

__next__()

Iterates over the iterable.

Returns:

With an indexed tensor.

Return type:

list

Raises:

StopIteration: – When the iterable is exhausted.
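The pairing of `__iter__` and `__next__` follows Python's iterator protocol. The sketch below uses a hypothetical `ToyIterableIndex` class to show that protocol in isolation. It assumes the iteration restarts from the first item each time, which may differ from the real index.

```python
class ToyIterableIndex:
    """Hypothetical stand-in showing the iterator protocol only."""

    def __init__(self, tensors):
        self._tensors = [list(tensor) for tensor in tensors]
        self._cursor = 0

    def __iter__(self):
        # Assumption: iteration restarts from the first item every time.
        self._cursor = 0
        return self

    def __next__(self):
        if self._cursor >= len(self._tensors):
            # Signals that the iterable is exhausted.
            raise StopIteration
        tensor = self._tensors[self._cursor]
        self._cursor += 1
        return tensor


toy = ToyIterableIndex([[1.0, 2.0], [3.0, 4.0]])
collected = [tensor for tensor in toy]
```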

__weakref__

list of weak references to the object (if defined)

append(tensor, check_unique=False)

Inserts a new tensor at the end of the index.

Attention

Be advised that this operation is linear in the index size (\(O(n)\)).

Parameters:
  • tensor (numpy.ndarray or list) – The tensor to insert into the index.

  • check_unique (bool) – Whether append should first check if a very similar tensor already exists in the index, i.e. whether the value is unique.

Warning

Be advised that the uniqueness check (check_unique=True) adds overhead to the append process.

Raises:

NullTensorError: – If a null (empty) tensor is passed through.
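To make the cost of the uniqueness check concrete, here is a minimal sketch over a plain Python list. `append_tensor`, the `tolerance` parameter, the Euclidean metric and the boolean return value are all assumptions made for illustration. `NullTensorError` is redefined locally as a stand-in, and nothing here reflects how pupyl actually performs the check.

```python
import math


class NullTensorError(Exception):
    """Local stand-in for the exception of the same name."""


def append_tensor(store, tensor, check_unique=False, tolerance=1e-6):
    """Hypothetical helper: append `tensor` to `store` (a list of lists)."""
    tensor = list(tensor)
    if not tensor:
        raise NullTensorError
    if check_unique:
        # Linear scan: every stored tensor is compared with the new one,
        # which is the overhead the warning above refers to.
        for existing in store:
            if math.dist(existing, tensor) <= tolerance:
                return False  # a very similar tensor already exists
    store.append(tensor)
    return True
```

With check_unique left at False, the same tensor can be appended repeatedly. With it set to True, a near-duplicate is skipped in this sketch.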

export_by_group_by(path, top=10, **kwargs)

Export images, grouping them by similarity.

Parameters:
  • path (str) – Place to create the directories and export the images.

  • top (int) – How many similar internal images should be filtered.

  • position (int) – Returns the groups based on a specified position.

export_results(path, similars, keep_ids=False, keep_names=False)

Export internal image at position by copying it to path.

Parameters:
  • path (str) – Place where to export the images.

  • similars (iterable) – Containing image ids to export to path.

  • keep_ids (bool) – If the original ids must be preserved or not.

  • keep_names (bool) – If the original names must be preserved or not.

flush()

Commits the indexer's work.

group_by(top=10, **kwargs)

Returns, for every item in the index (or for a single position), the items in the index that are similar to it.

Parameters:
  • top (int) – How many similar internal images should be returned.

  • position (int) – Returns the groups based on a specified position.

Yields:
  • list – If position is defined.

  • dict – Generator with a dictionary containing internal ids as key and a list of similar images as values.

Raises:
  • EmptyIndexError: – If the underlying index is null.

  • TopNegativeOrZero: – If top parameter is zero or below.
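The two shapes yielded by group_by (a list when position is given, otherwise one dictionary per item) can be sketched with a brute-force search over a plain list. The function names, the Euclidean metric, the exclusion of the query item from its own group and the order of the checks are assumptions. The exception classes are local stand-ins.

```python
import math


class EmptyIndexError(Exception):
    """Local stand-in for the exception of the same name."""


class TopNegativeOrZero(Exception):
    """Local stand-in for the exception of the same name."""


def neighbours(store, position, top):
    """Ids of the `top` closest other tensors, nearest first (brute force)."""
    others = [item for item in range(len(store)) if item != position]
    others.sort(key=lambda item: math.dist(store[item], store[position]))
    return others[:top]


def group_by(store, top=10, position=None):
    """Hypothetical generator mimicking the documented yield shapes."""
    if top <= 0:
        raise TopNegativeOrZero
    if not store:
        raise EmptyIndexError
    if position is not None:
        # A single list of similar ids for the requested position.
        yield neighbours(store, position, top)
        return
    for item in range(len(store)):
        # One dictionary per item: {id: [similar ids]}.
        yield {item: neighbours(store, item, top)}
```

Because this is a generator, the exceptions in the sketch surface when iteration starts, not when group_by is called.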

index(tensor)

Searches for the single most similar image to the query image.

Parameters:

tensor (numpy.ndarray or list) – A vector for which to find the most similar item.

Returns:

The index of the most similar item.

Return type:

int

property index_name

Getter for property index_name.

Returns:

With current index name.

Return type:

str

item(position, top=10, distances=False)

Searches the index using an internal position.

Parameters:
  • position (int) – The item id within index.

  • top (int) – How many similar items should be returned.

  • distances (bool) – Whether the distances between items should also be returned.

Returns:

  • list of tuples – If distances is True, a list of (item, distance) pairs.

  • list – If distances is False, a list of similar items.

items()

Returns indexed items.

Yields:

int – With item identification.

items_values()

Returns all items and values.

Yields:

tuple – With an int representing its id and a list with the actual tensor.

property path

Getter for property path.

Returns:

With the path set.

Return type:

str

pop(position=None)

Pops out the item at position, returning it.

Attention

Be advised that this operation is linear in the index size (\(O(n)\)).

Parameters:

position (int) – Removes and returns the value at position.

Returns:

With the popped-out item.

Return type:

int

refresh()

Refreshes all information about the index file by unloading it and then reloading it.

remove(position)

Removes the tensor at position from the database.

Attention

Be advised that this operation is linear in the index size (\(O(n)\)).

Parameters:

position (int) – The index that must be removed.

Raises:
  • IndexNotBuildYet: – If an attempt was made to remove a tensor from an index file that has not been built yet.

  • IndexError: – If position is greater than the current index size.

remove_feature_cache(index)

Removes a feature cache used during an indexing process.

Parameters:

index (int) – index associated to a cache marked for removal.

search(tensor, results=16, return_distances=False)

Searches for the images most similar to the query image, in order of increasing distance.

Parameters:
  • tensor (numpy.ndarray or list) – A vector to search for the most similar ones.

  • results (int (optional)(default: 16)) – How many results to return. If there are fewer similar images than results, all available results are returned.

  • return_distances (bool (optional)(default: False)) – Whether the distances between tensors should be returned.

Yields:

int or tuple – By default, an int with the index of the most similar item, then the second most similar, and so on. With return_distances=True, a tuple whose first element is that int and whose second element is a float with the distance between tensor and the indexed tensor.
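The yield behaviour of search can be illustrated with an exact, brute-force stand-in over a plain list. This is only a sketch: `toy_search` is a hypothetical function, the Euclidean metric is an assumption, and the real index performs an approximate nearest neighbours search instead of the exhaustive ranking shown here.

```python
import math


def toy_search(store, tensor, results=16, return_distances=False):
    """Hypothetical brute-force stand-in for an ANN search."""
    # Rank every stored tensor by its distance to the query, nearest first.
    ranked = sorted(
        (math.dist(candidate, tensor), item)
        for item, candidate in enumerate(store)
    )
    # Slicing naturally yields fewer items when the store is smaller
    # than `results`.
    for distance, item in ranked[:results]:
        yield (item, distance) if return_distances else item


store = [[0.0, 0.0], [1.0, 0.0], [3.0, 0.0]]
nearest_two = list(toy_search(store, [0.9, 0.0], results=2))
everything = list(toy_search(store, [0.9, 0.0]))
with_distances = list(toy_search(store, [0.9, 0.0], results=1, return_distances=True))
```

Here `everything` holds only three ids even though results defaults to 16, matching the documented behaviour when fewer items are available.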

property size

Getter for property size.

Returns:

Describing the size of an ANN tree.

Return type:

int

property trees

Getter for property trees.

Returns:

With the factor over the index size to make trees.

Return type:

float

values()

Returns indexed values.

Yields:

list – With indexed tensors.

property volatile

Getter for property volatile.

Returns:

Whether the index is volatile.

Return type:

bool