🔎 search module
🧿 pupyl
Pupyl is a really fast image search library with which you can index your own images (even millions of them) and find similar images in milliseconds.
- class pupyl.search.PupylImageSearch(data_dir=None, extreme_mode=True, **kwargs)
Encapsulates every aspect of pupyl, from feature extraction to indexing and image storage.
- __init__(data_dir=None, extreme_mode=True, **kwargs)
Pupyl image search factory.
- Parameters:
data_dir (str) – The directory where all assets are stored.
extreme_mode (bool) – Whether extreme mode (faster execution at the cost of higher memory consumption) should be enabled.
import_images (bool) – Whether images should be (or have been) imported into the database.
characteristic (Enum or int or str) – The characteristic (feature extractor) to use. When reading from an already created database, it is retrieved from the internal configuration files. It can be given as a member of the Characteristics enum, by its name (str) or by its value (int). For more information, see 📊 features module.
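The three accepted forms of `characteristic` (enum member, name, value) can be illustrated with a minimal sketch. The `Characteristics` class below is a hypothetical stand-in that only mirrors the names and values listed in the Notes table; the real enum lives in the 📊 features module and may differ, and `resolve_characteristic` is an illustrative helper, not part of pupyl's API.

```python
from enum import Enum


class Characteristics(Enum):
    """Hypothetical stand-in mirroring the names/values from the Notes table."""
    MINIMUMWEIGHT_FAST_SMALL_PRECISION = 1
    LIGHTWEIGHT_FAST_SMALL_PRECISION = 2
    LIGHTWEIGHT_FAST_SHORT_PRECISION = 3
    LIGHTWEIGHT_QUICK_SHORT_PRECISION = 4
    MEDIUMWEIGHT_QUICK_GOOD_PRECISION = 5
    MIDDLEWEIGHT_QUICK_GOOD_PRECISION = 6
    MIDDLEWEIGHT_SLOW_GOOD_PRECISION = 7
    HEAVYWEIGHT_SLOW_GOOD_PRECISION = 8
    HEAVYWEIGHT_SLOW_HUGE_PRECISION = 9


def resolve_characteristic(characteristic):
    """Normalize an enum member, a name (str) or a value (int) to an enum member."""
    if isinstance(characteristic, Characteristics):
        return characteristic
    if isinstance(characteristic, str):
        # Lookup by name; raises KeyError for an unknown name.
        return Characteristics[characteristic]
    if isinstance(characteristic, int) and not isinstance(characteristic, bool):
        # Lookup by value; raises ValueError for an unknown value.
        return Characteristics(characteristic)
    raise TypeError(f"Unsupported characteristic type: {type(characteristic).__name__}")
```

With this sketch, `resolve_characteristic(3)` and `resolve_characteristic('LIGHTWEIGHT_FAST_SHORT_PRECISION')` both return `Characteristics.LIGHTWEIGHT_FAST_SHORT_PRECISION`, while an unknown name raises `KeyError` and an unknown value raises `ValueError`.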
Notes
A characteristic defines a feature extractor, each with its own complexity and its own balance between search precision and indexing speed. Below is a description of every possible characteristic, benchmarked by indexing the dataset https://github.com/policratus/pupyl/raw/main/samples/images.tar.xz, which contains 594 images.

Name                                 Value  Network          Indexing time (GPU)
MINIMUMWEIGHT_FAST_SMALL_PRECISION   1      MobileNet        19 s
LIGHTWEIGHT_FAST_SMALL_PRECISION     2      ResNet50V2       21.2 s
LIGHTWEIGHT_FAST_SHORT_PRECISION     3      ResNet101V2      20.3 s
LIGHTWEIGHT_QUICK_SHORT_PRECISION    4      DenseNet169      32.7 s
MEDIUMWEIGHT_QUICK_GOOD_PRECISION    5      DenseNet201      31.2 s
MIDDLEWEIGHT_QUICK_GOOD_PRECISION    6      InceptionV3      27 s
MIDDLEWEIGHT_SLOW_GOOD_PRECISION     7      Xception         20.2 s
HEAVYWEIGHT_SLOW_GOOD_PRECISION      8      EfficientNetV2M  39.3 s
HEAVYWEIGHT_SLOW_HUGE_PRECISION      9      EfficientNetV2L  1 min 1 s
All the tests above were run under the same conditions and with the same resources.
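Dividing each total by the 594 images in the sample dataset gives an average per-image indexing time, which makes the trade-off easier to compare. The snippet below only does that arithmetic on the figures from the table; it assumes indexing time scales roughly linearly with the number of images, which the benchmark itself does not establish.

```python
# Indexing times (seconds, GPU) copied from the table above, for 594 images.
timings = {
    "MobileNet": 19.0,
    "ResNet50V2": 21.2,
    "ResNet101V2": 20.3,
    "DenseNet169": 32.7,
    "DenseNet201": 31.2,
    "InceptionV3": 27.0,
    "Xception": 20.2,
    "EfficientNetV2M": 39.3,
    "EfficientNetV2L": 61.0,  # 1 min 1 s
}
N_IMAGES = 594

# Average indexing time per image, in milliseconds.
per_image_ms = {net: 1000 * secs / N_IMAGES for net, secs in timings.items()}

# Print from fastest to slowest.
for net, ms in sorted(per_image_ms.items(), key=lambda item: item[1]):
    print(f"{net:<16} {ms:6.1f} ms/image")
```

On these figures, MobileNet averages about 32 ms per image and EfficientNetV2L about 103 ms, roughly 3.2 times slower.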
By default, pupyl chooses MINIMUMWEIGHT_FAST_SMALL_PRECISION, the fastest extractor, though less accurate than, for instance, HEAVYWEIGHT_SLOW_HUGE_PRECISION.
Examples
```python
from pupyl.search import PupylImageSearch

# Creating a database using the extractor number 3,
# LIGHTWEIGHT_FAST_SHORT_PRECISION, network ResNet101V2
pupyl = PupylImageSearch(characteristic=3)

# Creating a database with extractor 'HEAVYWEIGHT_SLOW_GOOD_PRECISION',
# based on EfficientNetV2M
characteristic = 'HEAVYWEIGHT_SLOW_GOOD_PRECISION'
pupyl = PupylImageSearch(characteristic=characteristic)
```
- _index_configuration(mode, **kwargs)
Loads or saves an index configuration file, if it exists.
- Parameters:
mode (str) – The mode in which the configuration file is accessed: 'r' for reading, 'w' for writing.
feature_size (int) – The feature (vector) size of the current feature extraction method.
- Returns:
A dict containing the database configuration if an already saved database is found; True if a new configuration file was created; or False if a configuration could neither be created nor loaded.
- Return type:
dict or bool
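The dict/True/False contract can be sketched as a load-or-save helper. This is a hypothetical illustration: it assumes a JSON file at a caller-supplied path, and the file name, the format, the example `feature_size=1024` and the `index_configuration` helper are all assumptions, not pupyl's actual implementation of `_index_configuration`.

```python
import json
import os
import tempfile


def index_configuration(config_path, mode, **kwargs):
    """Hypothetical load-or-save helper mirroring the documented return contract.

    mode='r': return the saved configuration (dict), or False if none can be loaded.
    mode='w': save kwargs (e.g. feature_size) and return True, or False on failure.
    """
    if mode == "r":
        try:
            with open(config_path, encoding="utf-8") as config_file:
                return json.load(config_file)
        except (OSError, json.JSONDecodeError):
            return False
    if mode == "w":
        try:
            with open(config_path, "w", encoding="utf-8") as config_file:
                json.dump(kwargs, config_file)
            return True
        except OSError:
            return False
    raise ValueError("mode must be 'r' or 'w'")


with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "index.json")  # hypothetical file name
    first_read = index_configuration(path, "r")  # nothing saved yet
    written = index_configuration(path, "w", feature_size=1024)  # hypothetical size
    second_read = index_configuration(path, "r")

print(first_read, written, second_read)
```

The demo first fails to read (`False`), then writes (`True`), then reads back `{'feature_size': 1024}`.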
- index(uri, **kwargs)
Performs parallel image indexing.
- Parameters:
uri (str) – A directory, a file, or an http(s) location.
check_unique (bool) – Whether imported images should be checked for uniqueness during indexing (to avoid duplicates).
Attention
If check_unique is True, uniqueness checks are performed, which add some overhead to the indexing process and make it slower.
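One common way to detect duplicates is to fingerprint each image's bytes with a cryptographic hash and skip any fingerprint already seen. The sketch below assumes that approach (pupyl's actual check may work differently); `index_images` and `content_hash` are hypothetical helpers, the hashing runs in a thread pool only to echo the parallel indexing, and feature extraction is left out entirely. It shows where the extra cost comes from: every image must be hashed and every fingerprint looked up before it can be indexed.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor


def content_hash(data: bytes) -> str:
    """Fingerprint raw image bytes; identical files get identical digests."""
    return hashlib.sha256(data).hexdigest()


def index_images(images, check_unique=False, workers=4):
    """Hypothetical sketch: return the positions of the images that get indexed.

    Without check_unique every image is kept. With it, every image is hashed
    (in parallel) and any image whose digest was already seen is skipped.
    """
    if not check_unique:
        return list(range(len(images)))

    # Hash all images in a thread pool; pool.map preserves input order.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        digests = list(pool.map(content_hash, images))

    seen = set()
    kept = []
    for position, digest in enumerate(digests):
        if digest not in seen:  # first occurrence of this content
            seen.add(digest)
            kept.append(position)
    return kept


images = [b"cat", b"dog", b"cat"]  # stand-ins for raw image bytes
print(index_images(images))                     # [0, 1, 2]
print(index_images(images, check_unique=True))  # [0, 1]
```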
- remove(index)
Removes an indexed image from the storage.
- Parameters:
index (int) – The index (internal image identifier) of the image to be deleted.
Example
```python
search.remove(12)  # Removes the image with index 12 from the storage.
```
- search(query, top=4, return_metadata=False, return_distances=False)
Executes a search for similar images throughout the database, based on the query image.
- Parameters:
query (str) – URI of an image to be used as the query.
top (int, optional, default: 4) – How many results should be returned from the search.
return_metadata (bool, optional, default: False) – Whether the metadata of the resulting images should also be returned.
return_distances (bool, optional, default: False) – Whether the method should also return the distances between the query image and the other images in the database.
- Yields:
int, dict or tuple – Respectively: the index (image identifier) of each result, in decreasing order of similarity to the query image; a dict with metadata about each resulting image (when return_metadata=True); or a tuple when the method is asked to return the distances but not the image metadata (when return_metadata=False and return_distances=True).
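The ordering of the yielded results (most similar first, optionally paired with a distance) can be illustrated with a brute-force nearest-neighbour sketch over plain feature vectors. This is a hypothetical stand-in: it compares vectors by Euclidean distance with `math.dist` (Python 3.8+), ignores metadata and image URIs, and does not reflect how pupyl's own index computes or stores distances.

```python
import heapq
import math


def search(query, database, top=4, return_distances=False):
    """Hypothetical brute-force similarity search over feature vectors.

    Yields database indexes from most to least similar (smallest Euclidean
    distance first), or (index, distance) tuples when return_distances=True.
    """
    # Keep only the `top` closest vectors; ties keep their database order.
    nearest = heapq.nsmallest(
        top,
        ((index, math.dist(query, vector)) for index, vector in enumerate(database)),
        key=lambda pair: pair[1],
    )
    for index, distance in nearest:
        yield (index, distance) if return_distances else index


database = [[3.0, 4.0], [0.0, 1.0], [10.0, 10.0], [0.0, 0.0]]
print(list(search([0.0, 0.0], database, top=2)))                         # [3, 1]
print(list(search([0.0, 0.0], database, top=2, return_distances=True)))  # [(3, 0.0), (1, 1.0)]
```

Because the function is a generator, callers can stop consuming results early, much like the documented method, which yields its results rather than returning a list.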