📂 file_io module

Operations over files, introspection and more.

class pupyl.duplex.file_io.FileIO

Operations over files.

Handles operations such as temporary directories and files, retrieval of remote or local files, progress bars and file metadata, among others.

static _file_scheme_to_path(uri)

Converts a URI in the file:// scheme to a local path.

Parameters:

uri (str) – A URI to convert from the file:// scheme to a path.

Example

FileIO._file_scheme_to_path('file:///home/policratus/1073140.jpg') # Returns '/home/policratus/1073140.jpg'
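A conversion like this can be sketched with the standard library alone. This is a minimal illustration, not pupyl's implementation; the function name file_scheme_to_path is hypothetical, and it assumes POSIX-style paths with an empty (or localhost) authority part.

```python
from urllib.parse import unquote, urlparse


def file_scheme_to_path(uri):
    """Convert a file:// URI into a local filesystem path (sketch)."""
    parsed = urlparse(uri)
    if parsed.scheme != 'file':
        raise ValueError(f'Not a file:// URI: {uri}')
    # Percent-encoded characters (e.g. %20 for a space) are decoded back.
    return unquote(parsed.path)


print(file_scheme_to_path('file:///home/policratus/1073140.jpg'))
# /home/policratus/1073140.jpg
```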

classmethod _get_local(path)

Loads a local file returning its bytes.

Parameters:

path (str) – Location where the file is saved.

Returns:

The binary content of the file.

Return type:

bytes

static _get_terminal_size()

Returns the number of columns of the current terminal.

Returns:

Containing the number of columns of the current terminal emulator.

Return type:

int
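One portable way to obtain this value is shutil.get_terminal_size from the standard library. The sketch below is an assumption about how such a helper could work, not pupyl's code; get_terminal_columns and its 80-column fallback are hypothetical choices.

```python
import shutil


def get_terminal_columns(default=80):
    """Return the column count of the current terminal (sketch)."""
    # shutil.get_terminal_size checks the COLUMNS environment variable first,
    # then queries stdout, and falls back to the given (columns, lines) pair
    # when no terminal is attached (e.g. output piped to a file).
    return shutil.get_terminal_size(fallback=(default, 24)).columns


print(get_terminal_columns())
```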

classmethod _get_url(url, **kwargs)

Loads a file from a remote (http(s)) location.

Parameters:
  • url (str) – The URL where the image is stored.

  • headers (dict) – Headers to be passed with the HTTP request. Usually contains a user-agent definition, like {'User-Agent': 'Mozilla/5.0'}.

  • info (bool) – If True, returns metadata information about the URL instead of its bytes.

  • retry (int) – Counter for the number of retries already issued.

Returns:

bytes with the image binary information, or http.client.HTTPMessage with file information (if info is True).

Return type:

bytes or http.client.HTTPMessage
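A remote fetch with custom headers, an info mode and retries might look like the sketch below. It is a hypothetical illustration using urllib.request, not pupyl's implementation: the documented retry parameter counts retries already issued (suggesting a recursive design), whereas this sketch swaps in a plain loop with exponential backoff, and the names get_url, retries and backoff are assumptions.

```python
import time
import urllib.error
import urllib.request


def get_url(url, headers=None, info=False, retries=3, backoff=0.5):
    """Fetch a URL's bytes (or its headers when info=True), retrying (sketch)."""
    request = urllib.request.Request(
        url, headers=headers or {'User-Agent': 'Mozilla/5.0'}
    )
    for attempt in range(retries + 1):
        try:
            with urllib.request.urlopen(request, timeout=10) as response:
                # response.info() holds the headers (an http.client.HTTPMessage
                # for HTTP URLs); response.read() returns the body bytes.
                return response.info() if info else response.read()
        except urllib.error.URLError:
            # HTTPError subclasses URLError, so HTTP error codes retry too.
            if attempt == retries:
                raise
            # Exponential backoff between attempts: 0.5 s, 1 s, 2 s, ...
            time.sleep(backoff * 2 ** attempt)
```

Passing info=True returns only the response headers, which is cheaper to inspect than the body when only metadata such as Content-Type is needed.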

static _infer_protocol(uri)

Discovers the protocol to which the passed URI pertains.

Parameters:

uri (str) – URI that describes the file location.

Returns:

Referencing the discovered protocol

Return type:

Enum
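Protocol inference can be sketched by mapping the parsed URI scheme onto an enumeration. The Protocols class here is a hypothetical stand-in mirroring the members listed for pupyl.duplex.file_io.Protocols (UNKNOWN, HTTP, FILE); its numeric values are assumptions, and the scheme rules are an illustration rather than pupyl's actual logic.

```python
from enum import Enum
from urllib.parse import urlparse


class Protocols(Enum):
    """Hypothetical mirror of pupyl's Protocols enumeration."""
    UNKNOWN = 0
    HTTP = 1
    FILE = 2


def infer_protocol(uri):
    """Map a URI's scheme to a Protocols member (sketch)."""
    scheme = urlparse(uri).scheme.lower()
    if scheme in ('http', 'https'):
        return Protocols.HTTP
    # No scheme (a bare path), file://, or a one-letter Windows drive ("C:").
    if scheme in ('', 'file') or len(scheme) == 1:
        return Protocols.FILE
    return Protocols.UNKNOWN
```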

static bind(dump_file, output_dir)

Reads a packaged database and imports it.

Parameters:
  • dump_file (str) – The packaged database (dump) file to import.

  • output_dir (str) – Location where the imported database is saved.

dump(data_dir, output_dir)

Reads an entire database tree, compresses and exports it.

Parameters:
  • data_dir (str) – The directory containing all database assets.

  • output_dir (str) – Location where to save the exported dump file.
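The dump/bind pair can be pictured as a compressed tar round trip. The sketch below assumes a gzip-compressed tar format and a hypothetical archive name database.tar.gz; pupyl's actual container format and naming may differ. Extraction uses tarfile's 'data' filter where the running Python provides it.

```python
import os
import tarfile


def dump(data_dir, output_dir, name='database.tar.gz'):
    """Compress a whole database directory into a .tar.gz archive (sketch)."""
    os.makedirs(output_dir, exist_ok=True)
    archive = os.path.join(output_dir, name)
    with tarfile.open(archive, 'w:gz') as tar:
        # Store the tree under its own directory name, not its absolute path.
        tar.add(data_dir, arcname=os.path.basename(os.path.normpath(data_dir)))
    return archive


def bind(dump_file, output_dir):
    """Extract an archive produced by dump() into output_dir (sketch)."""
    with tarfile.open(dump_file, 'r:gz') as tar:
        if hasattr(tarfile, 'data_filter'):
            # Python versions with extraction filters: reject unsafe members.
            tar.extractall(output_dir, filter='data')
        else:
            tar.extractall(output_dir)
```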

static extension(uri)

Extracts the extension from a URI.

Parameters:

uri (str) – URI to extract the file extension.
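A possible approach is to parse the URI first, so query strings and fragments are ignored, and then split off the extension. The lowercase, dot-less return value is an assumption of this sketch; the exact format pupyl returns is not stated here.

```python
import os
from urllib.parse import urlparse


def extension(uri):
    """Return the lowercased file extension of a path or URL, without the dot."""
    # Parsing first drops query strings (?size=...) and fragments (#...).
    path = urlparse(uri).path
    # splitext keeps only the last suffix: 'archive.tar.gz' gives '.gz'.
    return os.path.splitext(path)[1][1:].lower()
```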

classmethod get(uri)

Loads a file from specified location, remote or local.

Parameters:

uri (str) – Location where the file is stored.

Returns:

The file bytes if successful, or an Enum indicating that the format wasn’t recognized.

Return type:

bytes or Enum

classmethod get_metadata(uri)

Returns underlying file metadata.

Parameters:

uri (str) – Location where the file is stored.

Returns:

Describing several file metadata attributes.

Return type:

dict
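For local files, a metadata dictionary can be assembled from os.stat. This is a sketch only: the key names below are hypothetical and not pupyl's actual keys, and remote files would need the HTTP headers (for example from a request in info mode) rather than os.stat.

```python
import os
from datetime import datetime, timezone


def _iso8601(timestamp):
    # UTC, formatted with the same mask documented for timestamp_to_iso8601.
    return datetime.fromtimestamp(timestamp, tz=timezone.utc).strftime(
        '%Y-%m-%dT%H:%M:%S'
    )


def get_metadata(path):
    """Collect basic metadata of a local file through os.stat (sketch)."""
    stats = os.stat(path)
    return {
        'name': os.path.basename(path),
        'directory': os.path.dirname(os.path.abspath(path)),
        'size': stats.st_size,  # in bytes
        'accessed': _iso8601(stats.st_atime),
        'modified': _iso8601(stats.st_mtime),
    }
```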

infer_file_type_from_uri(uri, mimetype=False)

Infers the file type from a URI, with optional MIME type discovery.

Parameters:
  • uri (str) – The file location to be analyzed.

  • mimetype (bool) – Whether to also return the discovered MIME type.

Returns:

A str describing the format if mimetype is False, or a tuple of (format, mimetype) if mimetype is True.

Return type:

str or tuple

Raises:

FileTypeNotSupportedYet – For an unsupported file type.

Example

infer_file_type_from_uri('image.jpg') # Returns 'JPG'

infer_file_type_from_uri('image.jpg', mimetype=True) # Returns ('JPG', 'image/jpeg')
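An extension-based approximation of this behaviour can be built on the standard mimetypes module. This sketch reproduces the documented examples but is not pupyl's implementation, which may inspect file contents instead; the FileTypeNotSupportedYet class here is a hypothetical local stand-in for pupyl's exception.

```python
import mimetypes
import os
from urllib.parse import urlparse


class FileTypeNotSupportedYet(Exception):
    """Hypothetical stand-in for pupyl's exception of the same name."""


def infer_file_type_from_uri(uri, mimetype=False):
    """Guess a file format and MIME type from the URI's extension (sketch)."""
    path = urlparse(uri).path  # ignore query strings and fragments
    mime, _ = mimetypes.guess_type(path)
    ext = os.path.splitext(path)[1][1:]
    if mime is None or not ext:
        raise FileTypeNotSupportedYet(uri)
    fmt = ext.upper()  # e.g. 'jpg' becomes 'JPG'
    return (fmt, mime) if mimetype else fmt
```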

classmethod progress(iterable, precise=False, message=None)

Utility method to report process progress to users through a progress bar. It supports two ways of unpacking the iterable, selected by the precise parameter. If precise is False (the default), iterable is consumed as is. This leads to an imprecise progress report: the method doesn’t know a priori the total number of elements in iterable. If precise is True, an iterable that is not already unpacked (like a generator) is first unrolled, which can be much slower, but yields a precise progress bar.

Parameters:
  • iterable (iter) – An object that supports iteration.

  • precise (bool) – If the progress should be precise (with actual percentage of completion) or just an interface during process running.

  • message (str) – A custom message when reporting progress.

Yields:

type – Each item of the iterable passed through the iterable parameter, whatever its type.
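The precise/imprecise trade-off described above can be sketched as a generator that either materialises the iterable to learn its length or only counts items as they pass. This is a simplified stand-in writing a single status line to stderr, not pupyl's actual progress bar.

```python
import sys


def progress(iterable, precise=False, message=None):
    """Yield items from iterable while drawing a simple progress line (sketch)."""
    label = message or 'Processing'
    if precise:
        # Materialise the iterable so its length is known up front; this can
        # be slow or memory-hungry for large generators.
        items = list(iterable)
        total = len(items)
        for index, item in enumerate(items, start=1):
            sys.stderr.write(f'\r{label}: {index}/{total} ({100 * index // total}%)')
            sys.stderr.flush()
            yield item
    else:
        # Total unknown: report only how many items have been seen so far.
        for index, item in enumerate(iterable, start=1):
            sys.stderr.write(f'\r{label}: {index} items')
            sys.stderr.flush()
            yield item
    sys.stderr.write('\n')
```

Writing to stderr keeps the progress line out of any data the caller prints to stdout.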

static pupyl_temp_data_dir()

Returns the path of a temporary directory to store pupyl assets.

Returns:

The path of the operating system’s temporary directory, joined with a dedicated subdirectory for saving pupyl assets.

Return type:

str

static resolve_path_end(path)

Removes directory separators from the end of a path (if present).

Parameters:

path (str) – Complete path to be analyzed.

Returns:

The path without trailing directory separators.

Return type:

str
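Trailing separators can be stripped with str.rstrip over the platform's separator characters. This sketch adds one assumption of its own: a path made only of separators (the filesystem root) keeps a single separator instead of becoming an empty string.

```python
import os


def resolve_path_end(path):
    """Strip trailing directory separators, keeping a bare root intact (sketch)."""
    # os.altsep is '/' on Windows and None on POSIX.
    separators = os.sep + (os.altsep or '')
    stripped = path.rstrip(separators)
    # A path made only of separators (e.g. '/') is the root: keep one.
    return stripped or path[:1]
```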

static safe_temp_file(**kwargs)

Creates a secure temporary file name, that is, a file with a unique name.

If a file with the same name is found, it is deleted before a new unique name is generated.

Parameters:

file_name (str) – Name of a temporary file to check.

Returns:

The complete path of the newly created temporary file.

Return type:

str
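The behaviour described above can be approximated with the tempfile module. This is a hypothetical sketch, not pupyl's code: with a file_name it removes any stale file at that path in the system temporary directory, otherwise it lets tempfile.mkstemp create a uniquely named file atomically. Note that the named branch is not race-free the way mkstemp is.

```python
import os
import tempfile


def safe_temp_file(file_name=None):
    """Return a unique temporary file path (sketch)."""
    if file_name is not None:
        path = os.path.join(tempfile.gettempdir(), file_name)
        # Remove a stale file with the same name so the path starts clean.
        if os.path.exists(path):
            os.remove(path)
        return path
    # mkstemp atomically creates a new, uniquely named file; close the
    # descriptor so the caller can reopen the path however it likes.
    handle, path = tempfile.mkstemp()
    os.close(handle)
    return path
```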

scan(uri)

Returns a validated URI, resolving several cases related to file types and the methods for reading them. It also chooses the best discovery method.

Parameters:

uri (str) – A file or directory to scan.

Yields:

str – With the actual underlying data, such as the bytes inside the compressed file container.

classmethod scan_compressed_tar_file(uri, file_reader)

Scans a compressed tar file.

Parameters:
  • uri (str) – Location where the tar file is stored.

  • file_reader (str) – Suitable file reader type.

Yields:

str – Paths of the already untarred files in the temporary directory.
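A tar scanner along these lines can extract the regular-file members into a fresh temporary directory and yield their paths. The sketch assumes that file_reader is a tarfile mode string such as 'r:gz' (defaulting to 'r:*', which auto-detects gzip, bz2 or xz); that reading of the parameter is an assumption, not confirmed by the documentation. The caller is responsible for removing the temporary directory afterwards.

```python
import os
import tarfile
import tempfile


def scan_compressed_tar_file(uri, file_reader='r:*'):
    """Extract a (compressed) tar archive and yield the extracted file paths."""
    target = tempfile.mkdtemp()
    with tarfile.open(uri, file_reader) as tar:
        # Only regular files are yielded; directories and links are skipped.
        members = [m for m in tar.getmembers() if m.isfile()]
        if hasattr(tarfile, 'data_filter'):
            tar.extractall(target, members=members, filter='data')
        else:
            tar.extractall(target, members=members)
    for member in members:
        yield os.path.join(target, member.name)
```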

classmethod scan_csv(uri)

Scanner for CSV text files.

Parameters:

uri (str) – Where the CSV file resides.

Yields:

str – With the discovered file paths.

Raises:

FileTypeNotSupportedYet – For an unsupported file type.
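A CSV scanner of this kind can be sketched with the standard csv module. The assumption here (not stated in the documentation) is that each row holds one file path or URL in its first column; blank rows are skipped.

```python
import csv


def scan_csv(uri):
    """Yield the first column of every non-empty row of a CSV file (sketch)."""
    # newline='' is what the csv module expects for correct quoting handling.
    with open(uri, newline='', encoding='utf-8') as handle:
        for row in csv.reader(handle):
            if row and row[0].strip():
                yield row[0].strip()
```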

classmethod scan_csv_bzip2(uri)

Scanner for CSV text files, compressed with the bzip2 algorithm.

Parameters:

uri (str) – Where the bzip2-compressed CSV file resides.

Yields:

str – With the discovered file paths.

classmethod scan_csv_gzip(uri)

Scanner for CSV formatted text files, compressed with the gzip algorithm.

Parameters:

uri (str) – Where the gzip-compressed CSV file resides.

Yields:

str – With the discovered file paths.

classmethod scan_csv_xz(uri)

Scanner for CSV text files, compressed with the LZMA algorithm.

Parameters:

uri (str) – Where the xz-compressed CSV file resides.

Yields:

str – With the discovered file paths.

classmethod scan_csv_zip(uri)

Scanner for CSV text files, compressed with the Zip algorithm.

Parameters:

uri (str) – Where the zipped CSV file resides.

Yields:

str – With the discovered file paths.
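The four compressed variants differ mainly in how the file is opened; once a text stream is available, the CSV parsing is the same. The sketch below dispatches on the file suffix (an assumption of this example) and, for Zip archives, assumes the CSV is the first member and reads it into memory. The names open_compressed_text and scan_compressed_csv are hypothetical.

```python
import bz2
import csv
import gzip
import io
import lzma
import zipfile


def open_compressed_text(uri):
    """Open a bzip2/gzip/xz/zip-compressed file as a text stream (sketch)."""
    if uri.endswith('.bz2'):
        return bz2.open(uri, 'rt', encoding='utf-8', newline='')
    if uri.endswith('.gz'):
        return gzip.open(uri, 'rt', encoding='utf-8', newline='')
    if uri.endswith('.xz'):
        return lzma.open(uri, 'rt', encoding='utf-8', newline='')
    if uri.endswith('.zip'):
        with zipfile.ZipFile(uri) as archive:
            # Assume the CSV is the first member; read it fully into memory.
            text = archive.read(archive.namelist()[0]).decode('utf-8')
        return io.StringIO(text, newline='')
    raise ValueError(f'Unsupported compression: {uri}')


def scan_compressed_csv(uri):
    """Yield the first column of each non-empty row (sketch)."""
    with open_compressed_text(uri) as handle:
        for row in csv.reader(handle):
            if row and row[0].strip():
                yield row[0].strip()
```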

static timestamp_to_iso8601(timestamp)

Converts a Unix epoch integer to ISO 8601 format. The converted date is in UTC (GMT+0).

Parameters:

timestamp (int) – An integer timestamp (seconds elapsed since 1970-01-01T00:00:00 UTC).

Returns:

A string with the date formatted using the mask %Y-%m-%dT%H:%M:%S.

Return type:

str
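The conversion can be expressed with datetime and an explicit UTC timezone, using the mask given above. This is a plausible sketch rather than a copy of pupyl's implementation; passing tz=timezone.utc avoids the host machine's local timezone leaking into the result.

```python
from datetime import datetime, timezone


def timestamp_to_iso8601(timestamp):
    """Format a Unix epoch timestamp as an ISO 8601 string in UTC (sketch)."""
    moment = datetime.fromtimestamp(timestamp, tz=timezone.utc)
    return moment.strftime('%Y-%m-%dT%H:%M:%S')


print(timestamp_to_iso8601(0))           # 1970-01-01T00:00:00
print(timestamp_to_iso8601(1700000000))  # 2023-11-14T22:13:20
```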

class pupyl.duplex.file_io.Protocols(value, names=None, *, module=None, qualname=None, type=None, start=1, boundary=None)

Defines several possible protocol enumerators to be discovered.

Notes

The current supported protocols are:

UNKNOWN: Unknown protocol

HTTP: Hypertext Transfer Protocol (HTTPS is also supported).

FILE: Local storage file.