💽 database module

Operations and storage for images.

class pupyl.storage.database.ImageDatabase(import_images, data_dir=None, **kwargs)

Handling storage and database operations for images.

__getitem__(position)

Returns the item at the given position.

Parameters:

position (int) – The position inside the database to return.

Returns:

Metadata related to the item.

Return type:

dict

Example

from pupyl.storage.database import ImageDatabase

img_db = ImageDatabase(import_images=True, data_dir='pupyl')

img_db[10]

# May return:

{'original_file_name': '2610447919_1b91946bd1.jpg', 'original_path': '/tmp/tmpekd0cuie', 'original_file_size': '52K', 'original_access_time': '2021-06-14T19:07:27', 'id': 10}

__init__(import_images, data_dir=None, **kwargs)

Image storage and operations.

Parameters:
  • import_images (bool) – Whether images must be imported (copied) into the internal database.

  • data_dir (str) – Location to save the image storage files and assets. If this parameter is omitted, a new temporary folder is created in the operating system's default temporary directory.

  • bucket_size (int) – Defines the number of files per bucket inside the database. Since each file and its associated assets are saved together, splitting them across directories helps avoid issues like Too many files and also allows parallel reads of assets, among other features. In other words, this parameter describes how many image files are saved in one internal database directory before a new one is started.

  • image_size (tuple) – Defines the dimensions (in pixels, width x height) of images saved in the database. Only takes effect if import_images is True. If a resize happens, the aspect ratio of the original image is preserved, hence image_size is an approximation. For example, the image will be resized to dimensions close to 800x600, using a pair of dimensions that respects the image's aspect ratio.
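To make the "approximate" nature of image_size concrete, here is a minimal sketch of one common way to fit an image inside a target box while keeping its aspect ratio. The function name fit_within is hypothetical and the scaling rule is an assumption for illustration; it is not taken from pupyl's implementation, which may resize differently.

```python
def fit_within(original, target=(800, 600)):
    """Return (width, height) close to `target` that keeps the aspect ratio of `original`.

    Hypothetical helper for illustration only; not part of pupyl.
    """
    width, height = original
    max_width, max_height = target
    # Use the smaller scale factor so that both sides fit inside the target box.
    scale = min(max_width / width, max_height / height)
    return max(1, round(width * scale)), max(1, round(height * scale))


print(fit_within((1600, 900)))   # (800, 450)
print(fit_within((1000, 1000)))  # (600, 600)
```

Neither result is exactly 800x600: only one side reaches the target, and the other follows from the original proportions.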

Caution

If no value is passed to data_dir, all database assets will be created in the default temporary directory. Be advised that, in this case, your whole image search database will (probably) vanish after a system reboot. If you don't want this to happen, define a non-volatile data_dir.
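The fallback described above can be sketched with the standard library alone. The helper resolve_data_dir is hypothetical and only mirrors the documented behaviour (an explicit directory versus a fresh temporary one); how pupyl creates its temporary folder internally is an assumption here.

```python
import os
import tempfile


def resolve_data_dir(data_dir=None):
    """Hypothetical helper: use the given directory, or fall back to a temporary one."""
    if data_dir is None:
        # mkdtemp creates a new directory under the OS default temporary location,
        # which many systems clean up on reboot.
        return tempfile.mkdtemp()
    os.makedirs(data_dir, exist_ok=True)
    return data_dir


volatile = resolve_data_dir()
print(volatile)                    # a fresh path inside the temporary directory
print(tempfile.gettempdir())       # the OS default temporary location
```

Passing an explicit path, for example a folder in the user's home directory, is what keeps the database across reboots.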

__len__()

Returns how many items are indexed in the database.

Returns:

How many images are indexed in the current database.

Return type:

int

Example

from pupyl.storage.database import ImageDatabase

img_db = ImageDatabase(import_images=True, data_dir='pupyl')

len(img_db) # May return 709

property bucket_size

Getter for bucket_size property.

Returns:

How many files are stored per bucket.

Return type:

int

property image_size

Getter for image_size property.

Returns:

The internal (approximate) dimensions of each image. If _import_images is undefined, returns the default, 800x600.

Return type:

tuple

property import_images

Getter for import_images property.

Returns:

Whether images should be imported into the current database.

Return type:

bool

insert(index, uri)

Inserts an image into the database.

Parameters:
  • index (int) – The index number attributed to the image.

  • uri (str) – Where the original file is located.

list_images(return_ids=False, top=None)

Lists the images in the current database.

Parameters:
  • return_ids (bool) – Whether the method should also return the file ids inside the database.

  • top (int) – How many pictures from the image database should be listed. Leaving this parameter unset, or setting it to zero or below, returns all images in the database.

Yields:

tuple or str – If return_ids=True, a tuple (int, str) with, respectively, the index and the path inside the database. Otherwise, if return_ids=False, just a str with the full path.
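The interplay between return_ids and top can be illustrated with a small stand-in generator. The function list_paths and the sample paths are hypothetical; the sketch only reproduces the documented semantics over a plain list and does not touch a real database.

```python
from itertools import islice


def list_paths(paths, return_ids=False, top=None):
    """Hypothetical generator mimicking the documented return_ids/top behaviour."""
    # A missing, zero, or negative `top` means "no limit".
    limit = top if top and top > 0 else None
    for index, path in islice(enumerate(paths), limit):
        yield (index, path) if return_ids else path


stored = ['0/0.jpg', '0/1.jpg', '0/2.jpg']  # made-up paths for illustration

print(list(list_paths(stored, top=2)))
# ['0/0.jpg', '0/1.jpg']
print(list(list_paths(stored, return_ids=True, top=0)))
# [(0, '0/0.jpg'), (1, '0/1.jpg'), (2, '0/2.jpg')]
```

Because the result is a generator, nothing is listed until it is iterated, which is why the examples wrap the call in list().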

load_image(index, as_tensor=False)

Returns the image data at a specified index.

Parameters:
  • index (int) – The location of the image inside the database.

  • as_tensor (bool) – How to return the image from the database: as bytes (as_tensor=False) or as a numpy.ndarray tensor (as_tensor=True).

Returns:

The image bytes (as_tensor=False) or a numpy.ndarray (as_tensor=True) containing the image converted to its tensor representation.

Return type:

bytes or numpy.ndarray

load_image_metadata(index, **kwargs)

Loads the metadata for an image inside the database.

Parameters:
  • index (int) – The position of the image inside the database.

  • filtered (iterable) – Which fields to select for return.

  • distance (float) – The distance between the tensor represented by index and the query image.

Returns:

The parsed JSON metadata file.

Return type:

dict

Raises:

IndexError – When index is not found.
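As a rough illustration of what filtered and distance could do to a metadata record, the sketch below selects fields from a plain dict. The helper select_fields is hypothetical, and attaching the distance under a 'distance' key is an assumption rather than confirmed behaviour; the sample record reuses the values shown in the __getitem__ example.

```python
def select_fields(metadata, filtered=None, distance=None):
    """Hypothetical helper: keep only the requested metadata fields."""
    if filtered is not None:
        result = {key: value for key, value in metadata.items() if key in filtered}
    else:
        result = dict(metadata)
    if distance is not None:
        # Assumption: the distance is added to the result under a 'distance' key.
        result['distance'] = distance
    return result


record = {
    'original_file_name': '2610447919_1b91946bd1.jpg',
    'original_path': '/tmp/tmpekd0cuie',
    'original_file_size': '52K',
    'original_access_time': '2021-06-14T19:07:27',
    'id': 10,
}

print(select_fields(record, filtered=('id', 'original_file_name'), distance=0.25))
```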

mount_file_name(index, **kwargs)

Creates the full path under which the file will be saved inside the database.

Parameters:
  • index (int) – The indexer id associated with the file.

  • extension (str) – The file extension.

Returns:

The full path inside the database.

Return type:

str
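One plausible layout for such a path is data directory, then bucket number, then a file named after the index. The sketch below assumes exactly that; the function mount_path, the directory layout, and the file naming are illustrative assumptions, not a description of what pupyl writes to disk.

```python
import os


def mount_path(data_dir, index, extension, bucket_size):
    """Hypothetical helper: build <data_dir>/<bucket>/<index>.<extension>."""
    # Assumption: the bucket is the integer quotient of index by bucket_size.
    bucket = index // bucket_size
    return os.path.join(data_dir, str(bucket), f'{index}.{extension}')


print(mount_path('pupyl', 2500, 'jpg', bucket_size=1000))
```

With this layout, an image and any side files that share its index (for example a metadata file with another extension) end up in the same directory.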

remove(index)

Removes the image at index.

Parameters:

index (int) – The image index to remove from the database.

Danger

Use this method with caution. The deleted image cannot be restored. No prompt is shown before deletion.

Attention

Be advised that this operation is linear in the index size (\(O(n)\)). It renumbers the current image index: for instance, if the image at index 54 is deleted, every image with an index greater than 54 has its id decreased by one.
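The renumbering described above can be sketched on an in-memory list of records. The function remove_and_reindex and the record layout are hypothetical; the point is only to show why the cost is linear and how ids above the removed one shift down.

```python
def remove_and_reindex(records, index):
    """Hypothetical helper: drop the record with id `index`, shift later ids down by one."""
    if not any(record['id'] == index for record in records):
        raise IndexError(f'index {index} not found')
    remaining = []
    for record in records:  # a single pass over every record: O(n)
        if record['id'] == index:
            continue
        new_id = record['id'] - 1 if record['id'] > index else record['id']
        remaining.append({**record, 'id': new_id})
    return remaining


records = [{'id': i, 'original_file_name': f'img{i}.jpg'} for i in range(52, 57)]
after = remove_and_reindex(records, 54)

print([(r['id'], r['original_file_name']) for r in after])
# [(52, 'img52.jpg'), (53, 'img53.jpg'), (54, 'img55.jpg'), (55, 'img56.jpg')]
```

Any index obtained before a removal should therefore be treated as stale afterwards, since it may now point at a different image.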

save_image_metadata(index, uri)

Stores image metadata information retrieved from the file.

Parameters:
  • index (int) – The index related to the image.

  • uri (str) – Location where the image is stored.

what_bucket(index)

Discovers in which bucket the file should be saved.

Parameters:

index (int) – The index that references an image in the database.

Returns:

The number of the bucket in which the image is saved.

Return type:

int
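A simple way to map an index to a bucket, consistent with the bucket_size description ("how many image files will be saved on an internal database directory before starting to save to another new one"), is integer division. The sketch below assumes that rule; bucket_for is a hypothetical name, and the actual computation in pupyl may differ.

```python
from collections import defaultdict


def bucket_for(index, bucket_size):
    """Hypothetical helper: bucket number for an index, assuming integer division."""
    if bucket_size <= 0:
        raise ValueError('bucket_size must be positive')
    return index // bucket_size


# Group a few indexes to see how they spread across buckets of three files each.
buckets = defaultdict(list)
for index in range(7):
    buckets[bucket_for(index, bucket_size=3)].append(index)

print(dict(buckets))  # {0: [0, 1, 2], 1: [3, 4, 5], 2: [6]}
```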