pleiades.utils.files module

File utilities for PLEIADES neutron imaging data processing.

This module provides utilities for file discovery, metadata extraction, and data export. It includes functions for finding the files that share the most common extension in a folder, extracting timing information from filenames, and exporting processed data to ASCII format.

The module supports:
  • Automatic file discovery with extension filtering

  • Filename-based metadata extraction for neutron imaging files

  • ASCII data export with proper formatting

  • Robust error handling for file operations

Example

Basic file discovery and export:

>>> files, ext = retrieve_list_of_most_dominant_extension_from_folder("/path/to/data")
>>> print(f"Found {len(files)} {ext} files")
>>>
>>> data_dict = {"energy": [1, 2, 3], "transmission": [0.8, 0.6, 0.4]}
>>> export_ascii(data_dict, "output.txt")

pleiades.utils.files.retrieve_list_of_most_dominant_extension_from_folder(folder: str = '', files: List[str] = None) → Tuple[List[str], str]

Find and return files with the most common extension from a folder or file list.

Analyzes a folder or list of files to determine the most frequently occurring file extension, then returns all files with that extension. This is useful for automatically detecting the primary data format in imaging directories.

Parameters:
  • folder (str, optional) – Path to the folder to search for files. If provided, the files parameter is ignored. Defaults to "".

  • files (List[str], optional) – List of file paths to analyze. Only used if folder is empty. Defaults to None.

Returns:

A tuple containing:
  • List of absolute file paths with the dominant extension, sorted alphabetically

  • The dominant file extension (e.g., ‘.tiff’, ‘.fits’)

Return type:

Tuple[List[str], str]

Example

From folder:

>>> files, ext = retrieve_list_of_most_dominant_extension_from_folder("/path/to/data")
>>> print(f"Found {len(files)} files with extension {ext}")
Found 100 files with extension .tiff

From file list:

>>> file_list = ["/path/file1.tiff", "/path/file2.tiff", "/path/file3.fits"]
>>> files, ext = retrieve_list_of_most_dominant_extension_from_folder(files=file_list)
>>> ext
'.tiff'

Note

  • If folder is provided, it takes precedence over the files parameter

  • Files are returned as absolute paths and sorted alphabetically

  • Extension counting is case-sensitive

  • Hidden files (starting with ‘.’) are included in the search
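
The behaviour listed in the notes above can be sketched with the standard library. This is a minimal illustration, not the PLEIADES source: the name dominant_extension_sketch, the ValueError on empty input, and the tie-breaking rule (first extension encountered wins) are assumptions made for the example.

```python
import os
from collections import Counter
from typing import List, Optional, Tuple


def dominant_extension_sketch(
    folder: str = "", files: Optional[List[str]] = None
) -> Tuple[List[str], str]:
    """Hypothetical stand-in that mirrors the documented behaviour."""
    if folder:
        # A non-empty folder takes precedence over the files argument.
        candidates = [
            os.path.join(folder, name)
            for name in os.listdir(folder)
            if os.path.isfile(os.path.join(folder, name))
        ]
    else:
        candidates = list(files or [])

    if not candidates:
        # Assumption: the documented Raises section is empty, so the real
        # function may behave differently when nothing is found.
        raise ValueError("no files to analyze")

    # Case-sensitive count: ".TIFF" and ".tiff" are different extensions.
    # A hidden file such as ".cache" is counted with an empty extension.
    counts = Counter(os.path.splitext(path)[1] for path in candidates)
    dominant, _ = counts.most_common(1)[0]

    selected = sorted(
        os.path.abspath(path)
        for path in candidates
        if os.path.splitext(path)[1] == dominant
    )
    return selected, dominant


file_list = ["/path/file1.tiff", "/path/file2.tiff", "/path/file3.fits"]
selected, ext = dominant_extension_sketch(files=file_list)
print(ext, len(selected))
```

Counting with collections.Counter keeps the sketch short; when two extensions occur equally often, most_common returns the one seen first, which may not match the library's own choice.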

pleiades.utils.files.retrieve_number_of_frames_from_file_name(file_name: str) → int

Extract the number of time-of-flight frames from a neutron imaging filename.

Parses specially formatted filenames to extract the number of time frames. The expected format includes ‘T’ followed by the frame count, then ‘p’. This is commonly used in neutron imaging file naming conventions.

Parameters:

file_name (str) – Filename containing frame information in the format ‘…T{frame_count}p…’. Example: ‘image_m2M9997Ex512y512t1e6T2000p1e6P100.tiff’

Returns:

Number of time-of-flight frames extracted from the filename

Return type:

int

Example

>>> filename = "image_m2M9997Ex512y512t1e6T2000p1e6P100.tiff"
>>> frames = retrieve_number_of_frames_from_file_name(filename)
>>> frames
2000
>>> filename = "data_T500p.fits"
>>> frames = retrieve_number_of_frames_from_file_name(filename)
>>> frames
500

Raises:
  • ValueError – If the filename doesn’t contain required ‘T’ and ‘p’ markers

  • ValueError – If the extracted value cannot be converted to an integer

Note

  • The function looks for the pattern ‘T{number}p’ in the filename

  • Only the basename of the file is considered (path is ignored)

  • The number must be a valid integer
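
One way to implement the ‘T{number}p’ lookup described above is a regular expression applied to the basename. This is a sketch under assumptions: number_of_frames_sketch is a hypothetical name, and the library may locate the markers differently (for example by splitting on ‘T’ and ‘p’), in which case the two documented ValueError cases would be raised separately instead of being merged as they are here.

```python
import os
import re

# 'T', one or more digits, then 'p'.
_FRAMES_PATTERN = re.compile(r"T(\d+)p")


def number_of_frames_sketch(file_name: str) -> int:
    """Hypothetical stand-in: read the frame count from 'T{number}p'."""
    base = os.path.basename(file_name)  # the directory part is ignored
    match = _FRAMES_PATTERN.search(base)
    if match is None:
        raise ValueError(f"no 'T<number>p' marker found in {base!r}")
    return int(match.group(1))


print(number_of_frames_sketch("image_m2M9997Ex512y512t1e6T2000p1e6P100.tiff"))
print(number_of_frames_sketch("/some/dir/data_T500p.fits"))
```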

pleiades.utils.files.retrieve_time_bin_size_from_file_name(file_name: str) → float

Extract the time bin size from a neutron imaging filename.

Parses specially formatted filenames to extract the time bin size used for time-of-flight measurements. The expected format includes ‘t’ followed by the bin size, then ‘T’. Scientific notation is supported: an exponent written without its minus sign (for example ‘1e6’) is read as ‘1e-6’.

Parameters:

file_name (str) – Filename containing time bin information in the format ‘…t{bin_size}T…’. Example: ‘image_m2M9997Ex512y512t1e6T2000p1e6P100.tiff’. Scientific notation such as ‘1e6’ is accepted and interpreted as ‘1e-6’.

Returns:

Time bin size in seconds (typically of order 1e-6, i.e. microseconds)

Return type:

float

Example

>>> filename = "image_m2M9997Ex512y512t1e6T2000p1e6P100.tiff"
>>> bin_size = retrieve_time_bin_size_from_file_name(filename)
>>> bin_size
1e-06
>>> filename = "data_t0.001T500p.fits"
>>> bin_size = retrieve_time_bin_size_from_file_name(filename)
>>> bin_size
0.001

Raises:
  • ValueError – If the filename doesn’t contain required ‘t’ and ‘T’ markers

  • ValueError – If the extracted value cannot be converted to a float

Note

  • The function looks for the pattern ‘t{number}T’ in the filename

  • Automatically rewrites ‘e’ as ‘e-’ in scientific notation, because filenames commonly omit the minus sign of the exponent

  • Only the basename of the file is considered (path is ignored)

  • Supports both decimal and scientific notation
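
The ‘e’ to ‘e-’ correction described above can be sketched as follows. The function name time_bin_size_sketch and the regular expression are assumptions made for illustration; the library's actual parsing may differ, for instance in how it treats an exponent that already carries a minus sign.

```python
import os
import re

# 't', a decimal number with an optional exponent, then 'T'.
_BIN_PATTERN = re.compile(r"t(\d+(?:\.\d+)?(?:e-?\d+)?)T")


def time_bin_size_sketch(file_name: str) -> float:
    """Hypothetical stand-in: read the bin size from 't{number}T'."""
    base = os.path.basename(file_name)  # the directory part is ignored
    match = _BIN_PATTERN.search(base)
    if match is None:
        raise ValueError(f"no 't<number>T' marker found in {base!r}")

    token = match.group(1)
    # Filenames drop the minus sign of the exponent, so '1e6' means 1e-6.
    # Assumption: a token that already contains 'e-' is left unchanged.
    if "e" in token and "e-" not in token:
        token = token.replace("e", "e-")
    return float(token)


print(time_bin_size_sketch("image_m2M9997Ex512y512t1e6T2000p1e6P100.tiff"))
print(time_bin_size_sketch("data_t0.001T500p.fits"))
```

Requiring a digit immediately after ‘t’ keeps the pattern from matching the ‘t’ inside ordinary words such as ‘data’.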

pleiades.utils.files.export_ascii(data_dict: Dict[str, List | Any], file_path: str) → None

Export processed data to a tab-separated ASCII file.

Converts a dictionary of data arrays to a formatted ASCII file suitable for analysis in external tools. The output uses tab separation with column headers for easy import into spreadsheet or analysis software.

Parameters:
  • data_dict (Dict[str, Union[List, Any]]) – Dictionary containing data to export. Keys become column headers, values become data columns. All values should be array-like with the same length.

  • file_path (str) – Path to the output ASCII file. Parent directories will be created if they don’t exist.

Example

Basic export:

>>> data = {
...     "energy_eV": [1.0, 2.0, 3.0],
...     "transmission": [0.8, 0.6, 0.4],
...     "uncertainties": [0.1, 0.08, 0.06]
... }
>>> export_ascii(data, "transmission_results.txt")
Data exported to transmission_results.txt

Output file format (columns are tab-separated):

energy_eV    transmission    uncertainties
1.0          0.8             0.1
2.0          0.6             0.08
3.0          0.4             0.06

Raises:
  • ValueError – If data_dict is empty or contains mismatched array lengths

  • IOError – If file cannot be written (permissions, disk space, etc.)

  • KeyError – If data_dict contains invalid data types

Note

  • Uses tab separation for easy import into analysis software

  • Includes column headers in the first row

  • Creates parent directories if they don’t exist

  • Overwrites existing files without warning

  • All data columns must have the same length
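
The export rules above (header row, tab separation, equal column lengths, parent directories created on demand) can be sketched with plain file I/O. This is an illustrative sketch, not the library's implementation: export_ascii_sketch is a hypothetical name, and values are formatted with str(), which may differ from the number formatting PLEIADES applies.

```python
import os
import tempfile
from typing import Any, Dict, Sequence


def export_ascii_sketch(data_dict: Dict[str, Sequence[Any]], file_path: str) -> None:
    """Hypothetical stand-in: write columns as tab-separated text."""
    if not data_dict:
        raise ValueError("data_dict is empty")

    columns = {key: list(values) for key, values in data_dict.items()}
    if len({len(column) for column in columns.values()}) != 1:
        raise ValueError("all columns must have the same length")

    parent = os.path.dirname(file_path)
    if parent:
        os.makedirs(parent, exist_ok=True)  # create missing parent folders

    # Mode "w" overwrites an existing file without warning.
    with open(file_path, "w", encoding="ascii") as handle:
        handle.write("\t".join(columns) + "\n")  # header row from the keys
        for row in zip(*columns.values()):
            handle.write("\t".join(str(value) for value in row) + "\n")


# Usage: write into a temporary folder and read the result back.
with tempfile.TemporaryDirectory() as tmp:
    out_path = os.path.join(tmp, "results", "transmission_results.txt")
    export_ascii_sketch(
        {"energy_eV": [1.0, 2.0, 3.0], "transmission": [0.8, 0.6, 0.4]},
        out_path,
    )
    with open(out_path, encoding="ascii") as handle:
        written = handle.read()

print(written)
```

Validating the column lengths before opening the file means a failed export does not leave a partially written file behind.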