📂 file_io module
Operations over files, introspection and more.
- class pupyl.duplex.file_io.FileIO
Operations over files.
Handles operations such as temporary directories and files, retrieval of remote or local files, progress bars and file metadata, among others.
- static _file_scheme_to_path(uri)
Converts a file:// URI to a path.
- Parameters:
uri (str) – A URI to convert from the file:// scheme to a path.
Example
FileIO._file_scheme_to_path('file:///home/policratus/1073140.jpg')  # Returns '/home/policratus/1073140.jpg'
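As an illustration only, the conversion described above can be sketched with the standard library. The name `file_scheme_to_path_sketch` is hypothetical, and this is not necessarily how pupyl implements it; it assumes a POSIX-style path and also decodes percent-escapes, which the original may or may not do.

```python
from urllib.parse import unquote, urlparse


def file_scheme_to_path_sketch(uri: str) -> str:
    """Hypothetical stand-in: drop the file:// scheme and decode escapes."""
    parsed = urlparse(uri)
    if parsed.scheme != "file":
        raise ValueError(f"Not a file:// URI: {uri}")
    # Everything after the (usually empty) host is the local path.
    return unquote(parsed.path)


print(file_scheme_to_path_sketch("file:///home/policratus/1073140.jpg"))
```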
- classmethod _get_local(path)
Loads a local file and returns its bytes.
- Parameters:
path (str) – Location where the file is saved.
- Returns:
The binary contents of the file.
- Return type:
bytes
- static _get_terminal_size()
Returns the number of columns of the current terminal.
- Returns:
The number of columns in the current terminal emulator.
- Return type:
int
- classmethod _get_url(url, **kwargs)
Loads a file from a remote (http(s)) location.
- Parameters:
url (str) – The URL where the image is stored.
headers (dict) – Headers to be passed with the HTTP request. Usually contains a user-agent header, like {'User-Agent': 'Mozilla/5.0'}.
info (bool) – Whether to return metadata about the url instead of its bytes.
retry (int) – Counter for the number of retries already issued.
- Returns:
bytes with the image binary information, or http.client.HTTPMessage with file information (when info is True).
- Return type:
bytes or http.client.HTTPMessage
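A minimal sketch of a fetch with custom headers, an info switch and retries, using only `urllib`. The names `fetch_sketch` and `max_attempts` and the 10-second timeout are assumptions made for illustration. The documented `retry` parameter counts retries already issued, whereas this sketch takes a simple attempt limit, so it is not a reproduction of pupyl's method.

```python
import urllib.request
from urllib.error import URLError


def fetch_sketch(url, headers=None, info=False, max_attempts=3):
    """Hypothetical helper (not pupyl's code): fetch bytes or response headers."""
    request = urllib.request.Request(url, headers=headers or {})
    last_error = None
    for _ in range(max(1, max_attempts)):
        try:
            with urllib.request.urlopen(request, timeout=10) as response:
                # For http(s) URLs, response.info() is an http.client.HTTPMessage.
                return response.info() if info else response.read()
        except URLError as error:
            last_error = error  # remember the failure and try again
    raise last_error
```

A call such as `fetch_sketch(url, headers={'User-Agent': 'Mozilla/5.0'})` would return the body bytes, and adding `info=True` would return the response headers instead.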
- static _infer_protocol(uri)
Discovers the protocol to which the passed uri belongs.
- Parameters:
uri (str) – URI that describes the file location.
- Returns:
The discovered protocol.
- Return type:
Enum
- static bind(dump_file, output_dir)
Reads a packaged database and imports it.
- Parameters:
dump_file (str) – The packaged database file to import.
output_dir (str) – Location where the imported database is saved.
- dump(data_dir, output_dir)
Reads an entire database tree, compresses and exports it.
- Parameters:
data_dir (str) – The directory containing all database assets.
output_dir (str) – Location where to save the exported dump file.
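The export and import round trip can be sketched as follows. This is an illustration under stated assumptions: the zip archive format, the archive name `database` and the helper names `dump_tree_sketch` and `bind_tree_sketch` are all hypothetical, and pupyl's actual dump format is not specified here.

```python
import os
import shutil
import tempfile


def dump_tree_sketch(data_dir, output_dir, name="database"):
    """Compress data_dir into output_dir/<name>.zip and return the archive path."""
    os.makedirs(output_dir, exist_ok=True)
    return shutil.make_archive(os.path.join(output_dir, name), "zip", root_dir=data_dir)


def bind_tree_sketch(dump_file, output_dir):
    """Unpack a previously exported archive into output_dir."""
    shutil.unpack_archive(dump_file, output_dir, "zip")


with tempfile.TemporaryDirectory() as workspace:
    source = os.path.join(workspace, "source")
    os.makedirs(source)
    with open(os.path.join(source, "index.bin"), "wb") as asset:
        asset.write(b"assets")
    archive = dump_tree_sketch(source, os.path.join(workspace, "export"))
    restored = os.path.join(workspace, "restored")
    bind_tree_sketch(archive, restored)
    with open(os.path.join(restored, "index.bin"), "rb") as asset:
        round_trip = asset.read()
```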
- static extension(uri)
Extracts the file extension from uri.
- Parameters:
uri (str) – URI to extract the file extension from.
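One possible way to extract an extension from either a URL or a local path is shown below. The name `extension_sketch` is hypothetical, and whether pupyl returns the extension with or without the leading dot is not stated above; this sketch keeps the dot.

```python
import os
from urllib.parse import urlparse


def extension_sketch(uri: str) -> str:
    """Hypothetical helper: return the extension (with its dot) of a URI's path."""
    # urlparse separates the path from any query string or fragment.
    path = urlparse(uri).path
    return os.path.splitext(path)[1]
```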
- classmethod get(uri)
Loads a file from the specified location, remote or local.
- Parameters:
uri (str) – Location where the file is stored.
- Returns:
If successful, the file bytes; otherwise an Enum describing that the format wasn’t recognized.
- Return type:
bytes or Enum
- classmethod get_metadata(uri)
Returns the underlying file metadata.
- Parameters:
uri (str) – Location where the file is stored.
- Returns:
A dictionary describing several file metadata fields.
- Return type:
dict
- infer_file_type_from_uri(uri, mimetype=False)
Infers the file type from a uri, with optional mime type discovery.
- Parameters:
uri (str) – The file location to be analyzed.
mimetype (bool) – Whether the discovered mime type should also be returned.
- Returns:
A str describing the format if mimetype is False, or a tuple that adds the mime type if mimetype is True.
- Return type:
str or tuple
- Raises:
FileTypeNotSupportedYet – For an unsupported file type.
Example
infer_file_type_from_uri('image.jpg')  # Returns 'JPG'
infer_file_type_from_uri('image.jpg', mimetype=True)  # Returns ('JPG', 'image/jpeg')
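The return shapes described above can be mimicked with the standard `mimetypes` module. This is a sketch only: it guesses from the file name, which may differ from how pupyl inspects files, the name `infer_file_type_sketch` is hypothetical, and `ValueError` stands in for `FileTypeNotSupportedYet`.

```python
import mimetypes
import os


def infer_file_type_sketch(uri, mimetype=False):
    """Hypothetical helper: guess the type from the file name only."""
    guessed_mime, _encoding = mimetypes.guess_type(uri)
    file_type = os.path.splitext(uri)[1].lstrip(".").upper()
    if guessed_mime is None or not file_type:
        # pupyl raises FileTypeNotSupportedYet; ValueError is a stand-in here.
        raise ValueError(f"Unsupported file type: {uri}")
    return (file_type, guessed_mime) if mimetype else file_type
```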
- classmethod progress(iterable, precise=False, message=None)
Utility method to show a progress bar to users. It supports two ways of unpacking the iterable, controlled by the precise parameter. If precise is set to False (the default), the iterable parameter is consumed as is. This leads to an imprecise progress report (in other words, the method doesn’t know a priori the total number of elements in iterable). Otherwise, if precise is set to True, an iterable which is not yet unpacked (like a generator) is first unrolled, which is much slower in some cases, but leads to a precise progress bar.
- Parameters:
iterable (iter) – An object which supports iteration.
precise (bool) – Whether the progress should be precise (with the actual percentage of completion) or just an indicator while the process runs.
message (str) – A custom message when reporting progress.
- Yields:
type – Each item of iterable, whatever its type.
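The trade-off between the two modes can be sketched as below. The function `progress_sketch`, its default label and its output format are assumptions for illustration, not pupyl's rendering: in precise mode the iterable is materialized first so a percentage can be computed, otherwise only a running count is available.

```python
import sys


def progress_sketch(iterable, precise=False, message=None):
    """Hypothetical generator: report progress on stderr while yielding items."""
    label = message or "Processing"
    total = None
    if precise:
        # Unroll the iterable (e.g. a generator) so the total is known upfront.
        iterable = list(iterable)
        total = len(iterable)
    for count, item in enumerate(iterable, start=1):
        if total:
            status = f"{label}: {100 * count / total:.0f}%"
        else:
            status = f"{label}: {count} items"
        print(f"\r{status}", end="", file=sys.stderr, flush=True)
        yield item
    print(file=sys.stderr)  # finish the progress line
```

Unrolling costs memory and time proportional to the input, which is why the imprecise mode is the cheaper default.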
- static pupyl_temp_data_dir()
Returns the path of a temporary directory to store pupyl assets.
- Returns:
The operating system's temporary directory, joined with a dedicated subdirectory for saving pupyl assets.
- Return type:
str
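A path of this kind could be composed as follows. The subdirectory name `pupyl` and the helper name `temp_data_dir_sketch` are assumptions; the actual directory name used by the library is not given above.

```python
import os
import tempfile


def temp_data_dir_sketch(subdirectory="pupyl"):
    """Hypothetical helper: system temp directory plus an assumed subdirectory."""
    # tempfile.gettempdir() resolves the platform's temporary directory.
    return os.path.join(tempfile.gettempdir(), subdirectory)
```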
- static resolve_path_end(path)
Removes directory separators from the end of a path (if any exist).
- Parameters:
path (str) – Complete path to be analyzed.
- Returns:
The path without a trailing separator.
- Return type:
str
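A minimal sketch of trailing-separator removal is shown below. The name `resolve_path_end_sketch` is hypothetical, and keeping a lone root such as "/" intact is a design assumption of this sketch, not documented behavior.

```python
import os


def resolve_path_end_sketch(path: str) -> str:
    """Hypothetical helper: strip trailing directory separators from a path."""
    separators = "/" + os.sep
    stripped = path.rstrip(separators)
    # Keep a lone root (e.g. "/") instead of returning an empty string.
    return stripped if stripped else path[:1]
```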
- static safe_temp_file(**kwargs)
Creates a secure temporary file name, meaning a file with a unique name.
If a file with the same name is found, it’s deleted before generating a new unique name.
- Parameters:
file_name (str) – Name of the temporary file to check for.
- Returns:
The complete path of the new temporary file created.
- Return type:
str
- scan(uri)
Returns a validated uri, resolving several cases related to file types and the methods for reading them. It also chooses the best discovery method.
- Parameters:
uri (str) – A file or directory to scan.
- Yields:
str – The actual underlying data, such as the bytes stored inside a compressed file container.
- classmethod scan_compressed_tar_file(uri, file_reader)
Scans a compressed tar file.
- Parameters:
uri (str) – Location where the tar file is stored.
file_reader (str) – Suitable file reader type.
- Yields:
str – Paths of the files already extracted to the temporary directory.
- classmethod scan_csv(uri)
Scanner for CSV text files.
- Parameters:
uri (str) – Where the CSV file resides.
- Yields:
str – The discovered file paths.
- Raises:
FileTypeNotSupportedYet – For an unsupported file type.
- classmethod scan_csv_bzip2(uri)
Scanner for CSV text files compressed with the bzip2 algorithm.
- Parameters:
uri (str) – Where the bzip2 CSV file resides.
- Yields:
str – The discovered file paths.
- classmethod scan_csv_gzip(uri)
Scanner for CSV text files compressed with the gzip algorithm.
- Parameters:
uri (str) – Where the gzip CSV file resides.
- Yields:
str – The discovered file paths.
- classmethod scan_csv_xz(uri)
Scanner for CSV text files compressed with the LZMA algorithm.
- Parameters:
uri (str) – Where the xz CSV file resides.
- Yields:
str – The discovered file paths.
- classmethod scan_csv_zip(uri)
Scanner for CSV text files compressed with the Zip algorithm.
- Parameters:
uri (str) – Where the zipped CSV file resides.
- Yields:
str – The discovered file paths.
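The stream-compressed CSV scanners above share one pattern, sketched here with the standard library. The names `OPENERS` and `scan_compressed_csv_sketch` are hypothetical, and reading the file location from the first column of each row is an assumption. Zip archives are containers rather than plain streams, so they would need `zipfile` and are left out of this sketch.

```python
import bz2
import csv
import gzip
import lzma
import os
import tempfile

# Each opener accepts text mode, so csv.reader can consume it directly.
OPENERS = {"gzip": gzip.open, "bzip2": bz2.open, "xz": lzma.open}


def scan_compressed_csv_sketch(path, compression):
    """Yield the first column of each non-empty row (assumed to be a file location)."""
    opener = OPENERS[compression]
    with opener(path, mode="rt", encoding="utf-8", newline="") as handle:
        for row in csv.reader(handle):
            if row:
                yield row[0]


# Small demonstration with a throwaway gzip-compressed CSV file.
with tempfile.TemporaryDirectory() as workspace:
    sample = os.path.join(workspace, "files.csv.gz")
    with gzip.open(sample, mode="wt", encoding="utf-8", newline="") as handle:
        handle.write("/data/a.jpg\n/data/b.jpg\n")
    discovered = list(scan_compressed_csv_sketch(sample, "gzip"))
```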
- static timestamp_to_iso8601(timestamp)
Converts a Unix epoch integer to ISO 8601 format. The converted date is in UTC (GMT+0).
- Parameters:
timestamp (int) – An integer timestamp (seconds since 00:00:00 UTC on 1 January 1970).
- Returns:
A string with the date formatted using the mask %Y-%m-%dT%H:%M:%S
- Return type:
str
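The conversion can be reproduced with `datetime`, using the mask given above. The function name `timestamp_to_iso8601_sketch` is hypothetical and this is a sketch, not necessarily the library's code.

```python
from datetime import datetime, timezone


def timestamp_to_iso8601_sketch(timestamp: int) -> str:
    """Hypothetical helper: format a Unix timestamp as an ISO 8601 string in UTC."""
    moment = datetime.fromtimestamp(timestamp, tz=timezone.utc)
    return moment.strftime("%Y-%m-%dT%H:%M:%S")


print(timestamp_to_iso8601_sketch(0))  # 1970-01-01T00:00:00
```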
- class pupyl.duplex.file_io.Protocols(value, names=None, *, module=None, qualname=None, type=None, start=1, boundary=None)
Defines several possible protocol enumerators to be discovered.
Notes
The currently supported protocols are:
UNKNOWN: Unknown protocol.
HTTP: Hypertext Transfer Protocol (HTTPS is also supported).
FILE: Local storage file.
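To show how such an enumeration might be paired with protocol discovery, here is a self-contained sketch. `ProtocolSketch` and `infer_protocol_sketch` are hypothetical stand-ins, not the library's classes, and treating a URI without a scheme as a local file is an assumption. Note that a Windows path such as `C:\images\a.jpg` would parse with scheme `c` and fall through to UNKNOWN in this sketch.

```python
from enum import Enum, auto
from urllib.parse import urlparse


class ProtocolSketch(Enum):
    """Local stand-in mirroring the three documented members."""
    UNKNOWN = auto()
    HTTP = auto()
    FILE = auto()


def infer_protocol_sketch(uri: str) -> ProtocolSketch:
    """Hypothetical helper: map a URI scheme to a protocol member."""
    scheme = urlparse(uri).scheme.lower()
    if scheme in ("http", "https"):
        return ProtocolSketch.HTTP
    if scheme in ("", "file"):
        # Assumption: a bare path with no scheme refers to local storage.
        return ProtocolSketch.FILE
    return ProtocolSketch.UNKNOWN
```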