API Reference

Below are the main modules in the S2Omics pipeline, with purpose, CLI parameters, and I/O descriptions. Autodoc will generate detailed function and class listings from docstrings.

p1_histology_preprocess

Purpose: Rescale raw H&E-stained images to a target resolution, then pad them so that the image dimensions are divisible by the patch size.

Inputs:

- Raw image file (he-raw.jpg or .png/.tiff/.svs).
- pixel-size-raw.txt: physical resolution of the raw image in µm/pixel.
- pixel-size.txt: target resolution after rescaling (default 0.5 µm/pixel).

Outputs:

- he-scaled.jpg: rescaled image.
- he.jpg: rescaled and padded image.

CLI Arguments:

| Argument | Default | Description |
| --- | --- | --- |
| prefix | (positional) | Path to H&E image folder. |
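The rescale-then-pad arithmetic described in the Purpose can be sketched as follows. This is a minimal illustration, not the module's actual code: the function name `scaled_and_padded_size` is hypothetical, and the patch size of 16 is an assumption borrowed from the `--patch_size` default of the next step.

```python
import math


def scaled_and_padded_size(width, height, pixel_size_raw,
                           pixel_size=0.5, patch_size=16):
    """Return ((scaled_w, scaled_h), (padded_w, padded_h)) in pixels.

    Hypothetical helper: pixel_size_raw and pixel_size are in µm/pixel,
    as stored in pixel-size-raw.txt and pixel-size.txt.
    patch_size=16 is an assumption for this example.
    """
    # A raw pixel of 0.25 µm rescaled to 0.5 µm/pixel shrinks the image by 0.5.
    scale = pixel_size_raw / pixel_size
    scaled_w = round(width * scale)
    scaled_h = round(height * scale)
    # Pad each dimension up to the next multiple of the patch size.
    padded_w = math.ceil(scaled_w / patch_size) * patch_size
    padded_h = math.ceil(scaled_h / patch_size) * patch_size
    return (scaled_w, scaled_h), (padded_w, padded_h)


scaled, padded = scaled_and_padded_size(10010, 8030, pixel_size_raw=0.25)
print(scaled, padded)  # (5005, 4015) (5008, 4016)
```

In this example he-scaled.jpg would be 5005 x 4015 pixels and he.jpg would be 5008 x 4016 pixels, both dimensions divisible by 16.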

p2_superpixel_quality_control

Purpose: Split the histology image (he.jpg) into superpixels and filter out tiles without nuclei or with low structural quality, using density and texture analysis.

Outputs:

- qc_preserve_indicator.pickle: Boolean matrix of valid superpixels.
- qc_mask.jpg: QC mask image.

CLI Arguments:

| Argument | Default | Description |
| --- | --- | --- |
| prefix | (positional) | Input histology folder. |
| --save_folder | S2Omics_output | Output directory name. |
| --patch_size | 16 | Superpixel dimension (px). |
| --density_thresh | 100 | RGB density threshold. |
| --clean_background_flag | off | Preserve fibrous regions if set. |
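As a rough illustration of how a Boolean superpixel matrix can be derived from an image, the sketch below splits a grayscale image into `patch_size` x `patch_size` blocks and keeps the blocks that pass a threshold. The density statistic used here (255 minus the mean intensity of the block) is an assumption made for this example only; the module's actual density and texture criteria are not specified in this reference, and `superpixel_indicator` is a hypothetical name.

```python
def superpixel_indicator(image, patch_size=16, density_thresh=100):
    """Return a 2D list of booleans, one per superpixel.

    image: 2D list of grayscale intensities (0-255) whose dimensions
    are divisible by patch_size.
    ASSUMPTION: "density" is modelled as 255 - mean intensity, so darker
    (stained) blocks score higher than bright background.
    """
    n_rows = len(image) // patch_size
    n_cols = len(image[0]) // patch_size
    keep = []
    for r in range(n_rows):
        row_flags = []
        for c in range(n_cols):
            total = 0
            for y in range(r * patch_size, (r + 1) * patch_size):
                total += sum(image[y][c * patch_size:(c + 1) * patch_size])
            mean_intensity = total / (patch_size * patch_size)
            row_flags.append(255 - mean_intensity >= density_thresh)
        keep.append(row_flags)
    return keep


# Toy 32 x 32 image: bright background on the left, darker tissue on the right.
image = [[240] * 16 + [100] * 16 for _ in range(32)]
keep = superpixel_indicator(image)
print(keep)  # [[False, True], [False, True]]
```

The resulting matrix has one entry per superpixel, so a 32 x 32 image with a patch size of 16 yields a 2 x 2 matrix.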

p3_feature_extraction

Purpose: Apply a foundation model (UNI / Virchow / GigaPath) to extract hierarchical features from superpixels.

Outputs:

- Hierarchical embeddings saved in multiple .pickle parts.

CLI Arguments:

p4_get_histology_segmentation

Purpose: Cluster PCA-reduced embeddings into morphological clusters using the chosen clustering algorithm.

Outputs:

- cluster_image.pickle: cluster map.
- Cluster RGB image.

CLI Arguments:

p5_merge_over_clusters

Purpose: Merge highly similar morphological clusters, using hierarchical linkage, until the target number of clusters is reached.

Outputs:

- adjusted_cluster_image.pickle: merged cluster map.
- Adjusted segmentation image.

CLI Arguments:

| Argument | Default | Description |
| --- | --- | --- |
| prefix | (positional) | Input folder. |
| --save_folder | S2Omics_output | Output directory. |
| --target_n_clusters | 15 | Desired final cluster number. |
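The idea of merging similar clusters down to a target count can be sketched with a small greedy agglomerative loop. This is an illustration under stated assumptions, not the module's implementation: it uses centroid linkage with Euclidean distance (the reference does not say which linkage criterion or distance is used), and `merge_clusters` and its arguments are hypothetical names.

```python
import math


def merge_clusters(centroids, sizes, target_n_clusters=15):
    """Greedily merge the two closest clusters until the target count is met.

    centroids: {label: [float, ...]} mean embedding per cluster.
    sizes:     {label: int} number of superpixels per cluster.
    Returns {original_label: merged_label}.
    ASSUMPTION: centroid linkage with Euclidean distance.
    """
    centroids = {k: list(v) for k, v in centroids.items()}
    sizes = dict(sizes)
    mapping = {k: k for k in centroids}
    target = max(1, target_n_clusters)
    while len(centroids) > target:
        labels = sorted(centroids)
        best = None
        # Find the closest pair of current clusters.
        for i, a in enumerate(labels):
            for b in labels[i + 1:]:
                d = math.dist(centroids[a], centroids[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        # Merge b into a using a size-weighted mean of the two centroids.
        na, nb = sizes[a], sizes[b]
        centroids[a] = [(na * x + nb * y) / (na + nb)
                        for x, y in zip(centroids[a], centroids[b])]
        sizes[a] = na + nb
        del centroids[b], sizes[b]
        for label, merged in mapping.items():
            if merged == b:
                mapping[label] = a
    return mapping


centroids = {0: [0.0, 0.0], 1: [0.1, 0.0], 2: [5.0, 5.0], 3: [5.0, 5.2]}
sizes = {0: 10, 1: 10, 2: 10, 3: 10}
mapping = merge_clusters(centroids, sizes, target_n_clusters=2)
print(mapping)  # {0: 0, 1: 0, 2: 2, 3: 2}
```

Applying the returned mapping to every entry of a cluster map gives the merged cluster map.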

p6_roi_selection_rectangle

Purpose: Automatically select rectangular ROIs based on three scoring criteria:

- Scale score (size coverage)
- Coverage score (valid cell proportion)
- Balance score (match to the desired cluster composition)

Outputs:

- ROI visualizations on the segmentation and raw histology images.
- best_roi.pickle: ROI details and score breakdown.

CLI Arguments:

| Argument | Default | Description |
| --- | --- | --- |
| prefix | (positional) | Input folder. |
| --save_folder | S2Omics_output | Output folder. |
| --roi_size | [6.5, 6.5] | Physical size in mm (width height). |
| --num_roi | 0 | Number of ROIs (0 = auto-determine optimal). |
| --positive_prior | [] | Clusters to emphasize. |
| --negative_prior | [] | Clusters to de-prioritize. |

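Because --roi_size is given in millimetres while the segmentation is defined on superpixels, a unit conversion is needed at some point. The sketch below shows that arithmetic only; it assumes the default target resolution of 0.5 µm/pixel and a superpixel size of 16 px taken from the earlier steps, and the function name `roi_size_in_superpixels` is hypothetical.

```python
def roi_size_in_superpixels(roi_size_mm, pixel_size_um=0.5, patch_size=16):
    """Convert an ROI size in mm to pixels and to whole superpixels.

    ASSUMPTIONS: pixel_size_um is the rescaled resolution (µm/pixel) and
    patch_size is the superpixel edge length in pixels.
    Returns ((width_px, height_px), (width_sp, height_sp)).
    """
    width_mm, height_mm = roi_size_mm
    # 1 mm = 1000 µm, so pixels = mm * 1000 / (µm per pixel).
    width_px = round(width_mm * 1000 / pixel_size_um)
    height_px = round(height_mm * 1000 / pixel_size_um)
    # Count only whole superpixels that fit inside the ROI.
    return (width_px, height_px), (width_px // patch_size, height_px // patch_size)


pixels, superpixels = roi_size_in_superpixels([6.5, 6.5])
print(pixels, superpixels)  # (13000, 13000) (812, 812)
```

With the defaults, a 6.5 mm x 6.5 mm ROI spans 13000 x 13000 pixels, or 812 x 812 whole superpixels.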

p6_roi_selection_circle

Purpose: Same as rectangular ROI selection, but with circular geometry. Suitable for TMA cores or circular ROI scans.

CLI Arguments: Similar to p6_roi_selection_rectangle, with --roi_size interpreted as the radius.

p7_cell_label_broadcasting

Purpose: Train an autoencoder-based classifier on ROI-scale spatial omics cell annotations, then broadcast the predicted labels to the entire slide.

Outputs:

- S2Omics_whole_slide_prediction.jpg: predicted whole-slide cell type map.

CLI Arguments:

| Argument | Default | Description |
| --- | --- | --- |
| WSI_datapath | (positional) | Whole-slide input folder. |
| SO_datapath | (positional) | Spatial omics ROI input folder. |
| --foundation_model | uni | Model for embeddings. |

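To make the argument table concrete, here is a minimal argparse sketch that mirrors it: two positional paths and one optional flag with a default. It is a reconstruction from the table for illustration, not the module's actual parser, and the folder names passed in the example are placeholders.

```python
import argparse


def build_parser():
    """Build a parser mirroring the argument table (illustrative only)."""
    parser = argparse.ArgumentParser(prog="p7_cell_label_broadcasting")
    parser.add_argument("WSI_datapath", help="Whole-slide input folder.")
    parser.add_argument("SO_datapath", help="Spatial omics ROI input folder.")
    parser.add_argument("--foundation_model", default="uni",
                        help="Model for embeddings.")
    return parser


# Placeholder folder names; the optional flag falls back to its default.
args = build_parser().parse_args(["slide_folder", "roi_folder"])
print(args.WSI_datapath, args.SO_datapath, args.foundation_model)
# slide_folder roi_folder uni
```

The two positional arguments must be given in the order shown in the table: the whole-slide folder first, then the spatial omics ROI folder.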

Utility Modules

Low-level utilities used in multiple steps (I/O helpers, seeding, image operations).