🔎 search module

🧿 pupyl

Pupyl is a really fast image search library that lets you index your own images (even millions of them) and find similar images in milliseconds.

class pupyl.search.PupylImageSearch(data_dir=None, extreme_mode=True, **kwargs)

Encapsulates every aspect of pupyl, from feature extraction to indexing and image storage.

__init__(data_dir=None, extreme_mode=True, **kwargs)

Pupyl image search factory.

Parameters:
  • data_dir (str) – The directory where all assets are stored.

  • extreme_mode (bool) – Whether extreme mode (faster execution, but higher memory consumption) should be enabled.

  • import_images (bool) – Whether images should be (or were already) imported into the database.

  • characteristic (Enum or int or str) – The feature-extraction characteristic to use. When reading from an existing database, it is retrieved from the (internal) configuration files. It can be given as a Characteristics member (enum), by its name (str) or by its value (int). For more information, see 📊 features module.
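Since characteristic accepts three different forms, a minimal sketch of how such a lookup can work with Python's enum module may help. It assumes a hypothetical Characteristics enum that mirrors the names and values in the characteristics table on this page (it is not pupyl's own class) and a hypothetical resolve_characteristic helper; pupyl's internal resolution logic may differ.

```python
from enum import Enum


class Characteristics(Enum):
    """Hypothetical mirror of the characteristics table (not pupyl's class)."""

    MINIMUMWEIGHT_FAST_SMALL_PRECISION = 1
    LIGHTWEIGHT_FAST_SMALL_PRECISION = 2
    LIGHTWEIGHT_FAST_SHORT_PRECISION = 3
    LIGHTWEIGHT_QUICK_SHORT_PRECISION = 4
    MEDIUMWEIGHT_QUICK_GOOD_PRECISION = 5
    MIDDLEWEIGHT_QUICK_GOOD_PRECISION = 6
    MIDDLEWEIGHT_SLOW_GOOD_PRECISION = 7
    HEAVYWEIGHT_SLOW_GOOD_PRECISION = 8
    HEAVYWEIGHT_SLOW_HUGE_PRECISION = 9


def resolve_characteristic(characteristic):
    """Hypothetical helper: accept an enum member, its name (str) or its value (int)."""
    if isinstance(characteristic, Characteristics):
        return characteristic
    if isinstance(characteristic, str):
        # Lookup by member name, e.g. 'HEAVYWEIGHT_SLOW_GOOD_PRECISION'.
        return Characteristics[characteristic]
    if isinstance(characteristic, int) and not isinstance(characteristic, bool):
        # Lookup by member value, e.g. 3; bools are rejected on purpose.
        return Characteristics(characteristic)
    raise TypeError(f"Unsupported characteristic type: {type(characteristic).__name__}")


# All three forms resolve to the same kind of enum member.
print(resolve_characteristic(3))
print(resolve_characteristic("HEAVYWEIGHT_SLOW_GOOD_PRECISION"))
```

Unknown names raise KeyError and unknown values raise ValueError, which are the standard enum lookup errors.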

Notes

A characteristic defines a feature extractor, each with its own complexity and its own balance between search precision and indexing speed. Below is a description of every available characteristic, together with the time it took to index the dataset https://github.com/policratus/pupyl/raw/main/samples/images.tar.xz (594 images).

Name                                 Value  Network           Indexing time (GPU)
MINIMUMWEIGHT_FAST_SMALL_PRECISION   1      MobileNet         19 s
LIGHTWEIGHT_FAST_SMALL_PRECISION     2      ResNet50V2        21.2 s
LIGHTWEIGHT_FAST_SHORT_PRECISION     3      ResNet101V2       20.3 s
LIGHTWEIGHT_QUICK_SHORT_PRECISION    4      DenseNet169       32.7 s
MEDIUMWEIGHT_QUICK_GOOD_PRECISION    5      DenseNet201       31.2 s
MIDDLEWEIGHT_QUICK_GOOD_PRECISION    6      InceptionV3       27 s
MIDDLEWEIGHT_SLOW_GOOD_PRECISION     7      Xception          20.2 s
HEAVYWEIGHT_SLOW_GOOD_PRECISION      8      EfficientNetV2M   39.3 s
HEAVYWEIGHT_SLOW_HUGE_PRECISION      9      EfficientNetV2L   1 min 1 s

All timings above were measured under the same conditions and with the same resources.
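The indexing times are easier to compare as throughput. The sketch below only does arithmetic on the figures from the table (594 images, GPU timings); the dictionary and the images_per_second helper are illustrative names, not part of pupyl. Extrapolating these rates to larger collections assumes throughput stays constant, which this page does not claim.

```python
# Indexing times copied from the characteristics table (seconds, GPU).
DATASET_SIZE = 594  # images in the sample dataset
INDEXING_SECONDS = {
    "MINIMUMWEIGHT_FAST_SMALL_PRECISION": 19.0,
    "LIGHTWEIGHT_FAST_SMALL_PRECISION": 21.2,
    "LIGHTWEIGHT_FAST_SHORT_PRECISION": 20.3,
    "LIGHTWEIGHT_QUICK_SHORT_PRECISION": 32.7,
    "MEDIUMWEIGHT_QUICK_GOOD_PRECISION": 31.2,
    "MIDDLEWEIGHT_QUICK_GOOD_PRECISION": 27.0,
    "MIDDLEWEIGHT_SLOW_GOOD_PRECISION": 20.2,
    "HEAVYWEIGHT_SLOW_GOOD_PRECISION": 39.3,
    "HEAVYWEIGHT_SLOW_HUGE_PRECISION": 61.0,  # 1 min 1 s
}


def images_per_second(name):
    """Average indexing throughput of one characteristic on the sample dataset."""
    return DATASET_SIZE / INDEXING_SECONDS[name]


for name in INDEXING_SECONDS:
    print(f"{name:<36} {images_per_second(name):5.1f} images/s")
```

On these numbers, MobileNet indexes roughly 31 images per second versus roughly 10 for EfficientNetV2L.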

By default, pupyl uses MINIMUMWEIGHT_FAST_SMALL_PRECISION: the fastest characteristic, but less precise than, for instance, HEAVYWEIGHT_SLOW_HUGE_PRECISION.

Examples

from pupyl.search import PupylImageSearch

# Creating a database using extractor number 3,
# LIGHTWEIGHT_FAST_SHORT_PRECISION (network ResNet101V2)
pupyl = PupylImageSearch(characteristic=3)

# Creating a database with extractor 'HEAVYWEIGHT_SLOW_GOOD_PRECISION',
# based on EfficientNetV2M
characteristic = 'HEAVYWEIGHT_SLOW_GOOD_PRECISION'
pupyl = PupylImageSearch(characteristic=characteristic)

_index_configuration(mode, **kwargs)

Loads an index configuration file, if one exists, or saves a new one.

Parameters:
  • mode (str) – The mode used to access the configuration file: ‘r’ for reading, ‘w’ for writing.

  • feature_size (int) – The size of the features produced by the current feature extraction method.

Returns:

A dict containing the database configuration if an existing database is found; True if a new configuration file was created; or False if a configuration could be neither created nor loaded.

Return type:

dict or bool
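The three possible return values can be illustrated with a small stand-in. This is a hypothetical sketch, not pupyl's implementation: it assumes a JSON file named index.json inside data_dir that stores only feature_size, and it raises ValueError for an unknown mode (how the real method handles that case is not documented here).

```python
import json
import os
import tempfile


def index_configuration(data_dir, mode, **kwargs):
    """Hypothetical sketch: read ('r') or write ('w') an index configuration.

    Returns the stored dict when reading succeeds, True when a new file is
    written, and False when the file can be neither loaded nor created.
    """
    # Hypothetical file name; pupyl's real on-disk layout may differ.
    path = os.path.join(data_dir, "index.json")

    if mode == "r":
        try:
            with open(path, encoding="utf-8") as config_file:
                return json.load(config_file)
        except (OSError, json.JSONDecodeError):
            return False  # missing or unreadable configuration

    if mode == "w":
        if "feature_size" not in kwargs:
            return False  # nothing meaningful to save
        try:
            with open(path, "w", encoding="utf-8") as config_file:
                json.dump({"feature_size": kwargs["feature_size"]}, config_file)
            return True
        except OSError:
            return False

    raise ValueError("mode must be 'r' or 'w'")


# Usage: round-trip a configuration in a throwaway directory.
with tempfile.TemporaryDirectory() as data_dir:
    missing = index_configuration(data_dir, "r")  # no file yet -> False
    created = index_configuration(data_dir, "w", feature_size=512)  # -> True
    loaded = index_configuration(data_dir, "r")  # -> {'feature_size': 512}
```

Returning False instead of raising keeps the caller's branching simple: a dict means "reuse the existing database", True means "a fresh one was set up".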

index(uri, **kwargs)

Performs parallel image indexing.

Parameters:
  • uri (str) – Directory or file, or http(s) location.

  • check_unique (bool) – Whether imported images should be checked for uniqueness during indexing (to avoid duplicates).

Attention

If check_unique is True, uniqueness checks are performed, which adds overhead to the indexing process and makes it slower.
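To see where that overhead comes from, here is a minimal sketch of one common way to check uniqueness: hashing each image's bytes with SHA-256 and skipping digests that were already seen. The unique_images helper is hypothetical and pupyl's actual check may work differently; note that content hashing only catches byte-identical duplicates, not resized or re-encoded copies.

```python
import hashlib


def unique_images(images):
    """Hypothetical helper: yield (position, content) for first-seen images only.

    Every image costs one SHA-256 digest plus a set lookup, which is the
    kind of per-image work a uniqueness check adds to indexing.
    """
    seen = set()
    for position, content in enumerate(images):
        digest = hashlib.sha256(content).hexdigest()
        if digest in seen:
            continue  # byte-identical duplicate: skip it
        seen.add(digest)
        yield position, content


# Raw image bytes would normally come from files; short literals stand in here.
print([position for position, _ in unique_images([b"a", b"b", b"a", b"c"])])
```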

remove(index)

Removes an indexed image from the storage.

Parameters:
  • index (int) – The index (internal image identifier) of the image to be deleted.

Example

search.remove(12) # Will remove image with index 12 from the storage.

search(query, top=4, return_metadata=False, return_distances=False)

Searches the database for images similar to the query image.

Parameters:
  • query (str) – URI of an image to be used as the query.

  • top (int (optional)(default: 4)) – How many results should be returned from the search process.

  • return_metadata (bool (optional)(default: False)) – Whether the metadata of the resulting images should also be returned.

  • return_distances (bool (optional)(default: False)) – Whether the method should also return the distances between the query image and the other images in the database.

Yields:

int, dict or tuple – Results in decreasing order of similarity to the query image: an int with the image index (internal identifier); a dict with metadata about the image (when return_metadata=True); or a tuple when distances were requested but metadata was not (when return_metadata=False and return_distances=True).
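The ordering contract of search (most similar first, optionally paired with distances) can be illustrated with a brute-force stand-in. This sketch assumes feature vectors are plain lists of floats compared by Euclidean distance via math.dist; pupyl's actual index structure and distance metric are not described on this page, and the hypothetical search_sketch helper omits the metadata branch.

```python
import math


def search_sketch(query, vectors, top=4, return_distances=False):
    """Hypothetical brute-force stand-in for a similarity search.

    Yields image indexes from most to least similar (smallest to largest
    Euclidean distance), or (index, distance) tuples when return_distances
    is True.
    """
    # Sort every (distance, index) pair; ties are broken by the lower index.
    ranked = sorted(
        (math.dist(query, vector), index) for index, vector in enumerate(vectors)
    )
    for distance, index in ranked[:top]:
        yield (index, distance) if return_distances else index


vectors = [[0.0, 0.0], [1.0, 1.0], [0.1, 0.0], [5.0, 5.0]]
print(list(search_sketch([0.0, 0.0], vectors, top=3)))  # [0, 2, 1]
```

A linear scan like this costs one distance computation per stored image, which is why libraries aimed at millions of images typically rely on an index structure instead.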