π° Getting startedΒΆ
π¬ InstallationΒΆ
The installation process for pupyl
is pretty easy. Using your preferred (or
even not knowing what is a)
python virtual environment,
you just need to issue:
Installing from pypi (stable versions):
pip install pupyl
Installing from anaconda (stable versions):
conda install -c policratus pupyl
Installing manually from github (development versions):
git clone git@github.com:policratus/pupyl.git
pip install pupyl/
If you encounter any installation issues, please go to troubleshooting.
π IndexingΒΆ
After installing, the next step that you need is to
index your own images (preferably lots of images). pupyl
supports several
ways to import and index images, but the most notably are
from a local directory, from a local or remote references inside a text file,
same as before but from zip, bzip2, gzip, lzma compressed files,
from local or remote tar files containing text files referencing images or
even compressed tar files (using the already mentioned compression methods)
containing images itself or text files referencing images. So, letβs see every
possible way of doing this. Before we begin, first instantiate pupyl
wrapper class:
import os
from pupyl.search import PupylImageSearch
# This will create pupyl database on the current directory, inside a
# directory called 'pupyl'
PUPYL = PupylImageSearch(data_dir=os.path.join(os.curdir, 'pupyl'))
Now, letβs discuss each way of importing images:
From a local directory:
pupyl
can look for images inside a starting directory, extract metadata from each image and recursively tries to find new images on sub-directories indefinitely (or until this search exhausts). So, suppose that exists a directory calledimages
containing images inside it (and with several sub-directories also containing images). To index it, just run thepupyl.search.PupylImageSearch.index()
method:PUPYL.index(os.path.join('path', 'to', 'your', 'images'))
having as a possible result the following:
π Importing images: 709 items. Indexing images: π¦π¦π¦π¦π¦π¦π¦π¦π¦π¦π¦π¦π¦π¦π¦π¦π¦π¦π¦π¦π¦π¦π¦π¦ 67%
From a (remote or local) compressed tar file (images within): Another example is using a
tar
file with a lot of images inside it.pupyl
smart scanner infers the location type (and compression algorithm) and can index directly. For instance, using the same methodpupyl.search.PupylImageSearch.index()
:PUPYL.index( 'https://github.com/policratus/pupyl' '/raw/main/samples/images.tar.xz' )
where this method also supports
http
protocol. For local files, same syntax:PUPYL.index(os.path.join('path', 'to, 'your', 'images.tar.xz'))
, where both cases above uses the
lzma
algorithm, but it would be atar.zip
,tar.gzip
,tar.bzip2
or simply atar
container.From (compressed or not) text files with references to images: Sometimes, images takes too much of storage devices and you just need to have the references (or URIs). For example, consider the following text file:
1http://www.norfolkmills.co.uk/images/Hardingham%20turbine%20Aug1965.jpg 2http://farm4.static.flickr.com/3161/2815856063_0ba82bed8a.jpg 3http://farm1.static.flickr.com/179/456107224_81f6430266.jpg 4http://farm4.static.flickr.com/3603/3569845078_ffebb00ec0.jpg 5http://farm4.static.flickr.com/3286/2945310084_ac9fdf53fa.jpg 6http://farm2.static.flickr.com/1361/816405038_030f573b86.jpg 7http://cimg2.163.com/catchpic/4/48/4823CF83B0B0D7F52BA1B80A9910D59C.jpg 8http://farm1.static.flickr.com/196/504807098_a11aff3acc.jpg 9...
therefore, to read the file above and resolve its references (suppose that the file is called
images.txt
):PUPYL.index(os.path.join('path', 'to', 'your', 'images.txt'))
If the file above is compressed with the already mentioned algorithms (
zip
,gzip
,lzma
,bzip2
), same thing:# For instance, compressed with bzip2 PUPYL.index(os.path.join('path', 'to', 'your', 'images.txt.bz2'))
This method also supports remote files and it goes like this:
# For instance, remote compressed file with gzip PUPYL.index('http://domain.com/images/image_references.gz')
After indexing some images, the next step is searching.
π SearchingΒΆ
Now that you already installed pupyl
and
indexed your own images, itβs time to do some searches. For
this example, please consider the following sample (lzma
compressed) remote
tar
file containing images:
https://github.com/policratus/pupyl/raw/main/samples/images.tar.xz. It
contains 709 jpg
images (stored at pupyl
github repository). The
example below will create pupyl
database on your home folder, inside a
directory called pupyl
:
import os
from pathlib import Path
from pupyl.search import PupylImageSearch
PUPYL = PupylImageSearch(os.path.join(Path.home(), 'pupyl'))
PUPYL.index(
'https://github.com/policratus/pupyl'
'/raw/main/samples/images.tar.xz'
)
Searching is pretty simple, just needing to pick an image (local or remote) URI as a query image (an image that you want to search inside the database to look for other similar images). For this example, consider this beautiful image by @dlanor_s (taken from Unsplash):
Hence, to search the image above on the already indexed sample database, just
use the pupyl.search.PupylImageSearch.search()
method:
QUERY_IMAGE = 'https://images.unsplash.com/photo-1520763185298-1b434c919102'
[*PUPYL.search(QUERY_IMAGE)]
# Here's the simplest result
[427, 473, 129, 346]
, which will yield image ids
regarding the most similar images inside the
database. If you want a more detailed result, just set the parameter
return_metadata
to True
. For instance:
[*PUPYL.search(QUERY_IMAGE, return_metadata=True)]
# The results with image metadata
[{'original_file_name': '4240609837_2a679c2d59.jpg',
'original_path': '/tmp/tmpyrmpbshx',
'original_file_size': '127K',
'original_access_time': '2021-06-17T15:14:19',
'id': 427},
{'original_file_name': '27690205_216ccaac66.jpg',
'original_path': '/tmp/tmpyrmpbshx',
'original_file_size': '52K',
'original_access_time': '2021-06-17T15:14:18',
'id': 473},
{'original_file_name': '405530418_3d186f2a26.jpg',
'original_path': '/tmp/tmpyrmpbshx',
'original_file_size': '82K',
'original_access_time': '2021-06-17T15:14:18',
'id': 129},
{'original_file_name': '4176670899_7633d38542.jpg',
'original_path': '/tmp/tmpyrmpbshx',
'original_file_size': '124K',
'original_access_time': '2021-06-17T15:14:19',
'id': 346}]
π¦ͺ Command line interfaceΒΆ
Do you want to use pupyl
like any another command line tool?
Itβs possible! The pupyl
command line interface (CLI) exposes most of the
internal package behaviors and features to your preferred terminal emulator.
To use it, you must first install the library. After that,
just call pupyl
(or pupyl -h
) see the arguments:
usage: pupyl [-h] [--data_dir DATA_DIR] {index,serve,search,export} ...
π§Ώ Manage pupyl CLI arguments. pupyl is a really fast image search library
which you can index your own (millions of) images and find similar
images in milliseconds.
positional arguments:
{index,serve,search,export}
index indexes images into the database.
serve creates a web interface to interact with the database.
search search inside a database for similar images.
export search inside database, but export result files to a directory.
optional arguments:
-h, --help show this help message and exit
--data_dir DATA_DIR data directory for database assets.
π₯ Contribute to pupyl on https://github.com/policratus/pupyl
Indexing like described on indexing section can be done like this:
# Unix based systems
pupyl --data_dir /path/to/your/data/dir index /path/to/images/
# Windows systems
pupyl -data_dir C:\path\to\your\data\dir index C:\path\to\images\
Searching (described on section searching) is also possible through CLI and it goes like this:
# Unix based systems
pupyl --data_dir /path/to/your/data/dir search /path/to/image.ext
# Windows systems
pupyl -data_dir C:\path\to\your\data\dir index C:\path\to\images.ext
Every single option on CLI has some features and flags which is not shown here for the sake of simplicity. You are encouraged to test it all.
π Web interfaceΒΆ
pupyl
also has another interface, in this case very visual. Itβs the web
interface, allowing you to interact with a created image database. This
interface looks like this:
Hence, to use the web interface (and using the library directly):
import os
from pupyl.web import interface
interface.serve(data_dir=os.path.join('path', 'to', 'your', 'database'))
or using the command line interface:
# Unix based systems
pupyl --data_dir /path/to/your/data/dir serve
# Windows systems
pupyl -data_dir C:\path\to\your\data\dir serve
Finally, using the same picture described on the searching section and the database created on the section indexing, letβs search and check the results:
Thatβs it! For further information about pupyl
or helping on the development,
please check the github repository.
If you want to donate, go to the
Patreon page, the
Open collective page or the
LFX Crowdfunding.
If you had some idea for the library, please let us know on the
Discussions page.
Please, enjoy π§Ώ pupyl
.