API Reference

Below are the main modules in the S2Omics pipeline, with purpose, CLI parameters, and I/O descriptions. Autodoc will generate detailed function and class listings from docstrings.

s2omics.p1_histology_preprocess

histology_preprocess()

Purpose:

Scale and pad raw H&E stained images to a target resolution so that image dimensions are divisible by the patch size.

Parameters:

  • prefix: path to H&E image folder, str

  • show_image: whether to output the H&E image, bool, default = False

Return:

  • he-scaled.jpg: rescaled image. Saved under prefix folder

  • he.jpg: scaled + padded image. Saved under prefix folder
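
The padding target described above can be sketched as a small calculation. This is a minimal illustration only: the helper names are hypothetical, the patch size of 16 is borrowed from the default patch_size documented for the quality-control step, and how the library actually pads (which side, which fill color) is not stated here.

```python
def padded_size(height, width, patch_size=16):
    """Round each dimension up to the nearest multiple of patch_size."""
    # -(-x // p) is integer ceiling division, so no floating point is involved.
    padded_h = -(-height // patch_size) * patch_size
    padded_w = -(-width // patch_size) * patch_size
    return padded_h, padded_w


def padding_amounts(height, width, patch_size=16):
    """Return how many pixels must be added to the height and to the width."""
    padded_h, padded_w = padded_size(height, width, patch_size)
    return padded_h - height, padded_w - width


print(padded_size(1000, 2050))      # (1008, 2064)
print(padding_amounts(1000, 2050))  # (8, 14)
```

An image whose sides are already multiples of the patch size needs no padding, so the function returns its dimensions unchanged.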

s2omics.p2_superpixel_quality_control

superpixel_quality_control()

Purpose:

Split the histology image (he.jpg) into superpixels and filter out tiles without nuclei or with low structural quality, using density and texture analysis. The new version of S2Omics uses the HistoSweep QC package for this step.

Parameters:

  • prefix: path to H&E image folder, str

  • save_folder: path to results folder, str

  • density_thresh: HistoSweep parameter, threshold for identifying low-density superpixels, int, default = 100

  • clean_background_flag: HistoSweep parameter, whether to preserve fibrous regions that would otherwise be incorrectly filtered out, bool, default = False

  • patch_size: the side length of each superpixel in pixels, int, default = 16, meaning that all superpixels are 16 x 16 patches

  • show_image: whether to output the H&E image, bool, default = False

Return:

  • shapes.pickle: image and superpixel shape information in pickle format, saved under save_folder/pickle_files

  • qc_preserve_indicator.pickle: binary mask in pickle format, saved under save_folder/pickle_files

  • mask-small.png: binary mask, saved under HistoSweep_output folder
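
To make the idea of a per-superpixel binary mask concrete, the sketch below splits a tiny grayscale image into square patches and keeps the patches that are not blank background. The mean-intensity rule is a hypothetical stand-in chosen for brevity; it does not reproduce HistoSweep's density and texture criteria, and the function names are illustrative.

```python
def split_into_patches(image, patch_size):
    """Yield (row_index, col_index, patch) for a 2-D list whose sides are multiples of patch_size."""
    n_rows = len(image) // patch_size
    n_cols = len(image[0]) // patch_size
    for r in range(n_rows):
        for c in range(n_cols):
            patch = [row[c * patch_size:(c + 1) * patch_size]
                     for row in image[r * patch_size:(r + 1) * patch_size]]
            yield r, c, patch


def toy_keep_mask(image, patch_size, max_mean_intensity):
    """Mark a patch True when its mean gray value is below the threshold (assumed rule, not HistoSweep)."""
    n_rows = len(image) // patch_size
    n_cols = len(image[0]) // patch_size
    mask = [[False] * n_cols for _ in range(n_rows)]
    for r, c, patch in split_into_patches(image, patch_size):
        values = [v for row in patch for v in row]
        mask[r][c] = sum(values) / len(values) < max_mean_intensity
    return mask


# 4 x 4 image with 2 x 2 patches: the left half is dark (tissue-like), the right half is near white.
image = [[50, 60, 255, 255],
         [70, 80, 255, 255],
         [40, 30, 250, 255],
         [20, 10, 255, 245]]
mask = toy_keep_mask(image, patch_size=2, max_mean_intensity=200)
print(mask)  # [[True, False], [True, False]]
```

The mask has one entry per superpixel rather than per pixel, which is the shape a per-superpixel preserve indicator would take.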

s2omics.p3_feature_extraction

feature_extraction()

Purpose:

Apply a foundation model (UNI / Virchow / GigaPath) to extract hierarchical features from superpixels.

Parameters:

  • prefix: path to the folder containing the H&E-stained image, e.g. '/home/H&E_image/'

  • save_folder: the name of the folder where results are saved

  • foundation_model: the name of the foundation model used for feature extraction; choose from uni, virchow, and gigapath

  • ckpt_path: path to the folder containing the foundation model parameter file (which should be named 'pytorch_model.bin'), e.g. './checkpoints/uni/'

  • device: default = 'cuda:0'

  • batch_size: default = 32

  • down_samp_step: the down-sampling step. The default of 10 extracts features only for superpixels whose row_index and col_index are both divisible by 10 (roughly a 1:100 down-sampling rate); down_samp_step = 1 extracts features for every superpixel

  • num_workers: default = 4

Return:

  • Hierarchical embeddings, split into multiple .pickle parts and saved under save_folder/pickle_files
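
The effect of down_samp_step can be checked with a short sketch that lists which superpixels would be selected on a grid. The function name and the grid size are hypothetical; only the divisibility rule comes from the parameter description above.

```python
def sampled_indices(n_rows, n_cols, down_samp_step=10):
    """Return (row_index, col_index) pairs where both indices are divisible by down_samp_step."""
    return [(r, c)
            for r in range(0, n_rows, down_samp_step)
            for c in range(0, n_cols, down_samp_step)]


n_rows, n_cols = 100, 200  # hypothetical superpixel grid
selected = sampled_indices(n_rows, n_cols, down_samp_step=10)
print(len(selected), len(selected) / (n_rows * n_cols))  # 200 0.01
```

With down_samp_step = 1 every superpixel is selected, so the list has n_rows * n_cols entries.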

s2omics.single_section.p4_get_histology_segmentation

get_histology_segmentation()

Purpose: Cluster PCA-reduced embeddings into morphological clusters using the chosen algorithm.

Parameters:

  • prefix: path to the folder containing the H&E-stained image, e.g. '/home/H&E_image/'

  • save_folder: the name of the folder where results are saved

  • foundation_model: the name of the foundation model used for feature extraction; choose from uni, virchow, and gigapath

  • cache_path: the path to the extracted feature embedding files

  • down_samp_step: the down-sampling step used for feature extraction, default = 10, which corresponds to a 1:10^2 down-sampling rate

  • clustering_method: the clustering method used for H&E image segmentation; choose from {'kmeans': k-means++, 'fcm': fuzzy c-means, 'louvain': Louvain algorithm, 'leiden': Leiden algorithm}, default = 'kmeans'

  • n_clusters: the initial number of clusters for histology segmentation when using kmeans or fcm, default = 20. Note that this is not the final number of clusters when the clustering method is fcm.

  • resolution: resolution for the Leiden algorithm, default = 1.0

  • if_evaluate: whether to evaluate the clustering results with quantitative metrics, default = False

Return:

  • cluster_image.pickle: Cluster map.

  • Cluster RGB image.

s2omics.single_section.p5_merge_over_clusters

merge_over_clusters()

Purpose:

Merge highly similar morphological clusters until the target number of clusters is reached, using hierarchical linkage.

Parameters:

  • prefix: path to the folder containing the H&E-stained image, e.g. '/home/H&E_image/'

  • save_folder: the name of the folder where results are saved

  • target_n_clusters: the final number of clusters the user wants to keep, default = 15

Return:

  • adjusted_cluster_image.pickle: Merged cluster map.

  • Adjusted segmentation image.
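
The merging idea can be illustrated with a greedy agglomerative sketch that repeatedly joins the two closest cluster centroids until the target number remains. This is an assumption-laden illustration: it uses centroid linkage with Euclidean distance, and the linkage criterion, distance, and function name used by S2Omics itself are not specified in this reference.

```python
import math


def merge_closest_clusters(centroids, sizes, target_n_clusters):
    """Greedily merge the two closest centroids until target_n_clusters remain.

    centroids: dict mapping label -> feature vector (list of floats)
    sizes:     dict mapping label -> number of superpixels in that cluster
    Returns a dict mapping every original label to its merged label.
    """
    centroids = {k: list(v) for k, v in centroids.items()}  # work on copies
    sizes = dict(sizes)
    mapping = {k: k for k in centroids}
    while len(centroids) > target_n_clusters:
        labels = sorted(centroids)
        # Closest pair of remaining clusters under Euclidean distance.
        a, b = min(((x, y) for i, x in enumerate(labels) for y in labels[i + 1:]),
                   key=lambda pair: math.dist(centroids[pair[0]], centroids[pair[1]]))
        total = sizes[a] + sizes[b]
        # The merged centroid is the size-weighted mean of the two centroids.
        centroids[a] = [(sizes[a] * u + sizes[b] * v) / total
                        for u, v in zip(centroids[a], centroids[b])]
        sizes[a] = total
        del centroids[b], sizes[b]
        for original, current in mapping.items():
            if current == b:
                mapping[original] = a
    return mapping


centroids = {0: [0.0, 0.0], 1: [0.1, 0.0], 2: [5.0, 5.0], 3: [5.0, 5.2]}
sizes = {0: 10, 1: 10, 2: 10, 3: 10}
print(merge_closest_clusters(centroids, sizes, target_n_clusters=2))
# {0: 0, 1: 0, 2: 2, 3: 2}
```

The returned mapping can be applied to a cluster map to relabel every superpixel with its merged cluster.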

s2omics.single_section.p6_roi_selection_rectangle

roi_selection_for_single_section()

Purpose:

Automatically select rectangular ROIs based on scoring criteria:

  • Scale score (size coverage)

  • Coverage score (valid cell proportion)

  • Balance score (match desired cluster composition)

Parameters:

  • prefix: path to the folder containing the H&E-stained image, e.g. '/home/H&E_image/'

  • save_folder: the name of the folder where results are saved

  • has_annotation: if True, use the cell type annotation file instead of the histology segmentation results for ROI selection

  • cache_path: path to an alternative segmentation result to use for ROI selection, if desired

  • down_samp_step: the down-sampling step used for feature extraction, default = 10, which corresponds to a 1:10^2 down-sampling rate

  • roi_size: the physical size (mm x mm) of the ROIs, default = [6.5, 6.5], which is the physical size of a Visium HD ROI

  • rotation_seg: the number of different angles an ROI can be rotated to, default = 6, meaning that an ROI can be rotated to 30/60/90/120/150/180 degrees

  • num_roi: the number of ROIs to select, default = 0, which means the number is determined automatically

  • optimal_roi_thres: hyper-parameter for automatic ROI determination. The default of 0.03 is suitable for most cases; setting it to 0 is recommended when selecting FOVs. Lower this parameter to select more ROIs

  • fusion_weights: the weights of the three scores, default = [0.33, 0.33, 0.33]. The three weights should sum to 1 (if they do not, they are normalized)

  • emphasize_clusters, discard_clusters: prior information about histology clusters of interest and clusters not of interest, default = [], []

  • prior_preference: the larger this parameter, the more S2Omics focuses on the histology clusters of interest, default = 1

Return:

  • ROI visualizations on segmentation and raw histology image.

  • best_roi.pickle: ROI details and score breakdown.
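
The normalization of fusion_weights can be sketched as follows. Two points are assumptions rather than documented behavior: that the three scores are combined as a weighted sum, and that the weights are ordered as (scale, coverage, balance) to match the order in which the scores are listed above. The function names are hypothetical.

```python
def normalize_weights(weights):
    """Rescale the weights so that they sum to 1."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("fusion weights must have a positive sum")
    return [w / total for w in weights]


def fused_score(scale, coverage, balance, fusion_weights=(0.33, 0.33, 0.33)):
    """Combine the three ROI scores as a weighted sum (assumed combination rule)."""
    w_scale, w_coverage, w_balance = normalize_weights(fusion_weights)
    return w_scale * scale + w_coverage * coverage + w_balance * balance


# The default weights sum to 0.99, so normalization turns each into exactly one third.
print(normalize_weights([0.33, 0.33, 0.33]))
# Weights [2, 1, 1] are normalized to [0.5, 0.25, 0.25] before use.
print(fused_score(0.9, 0.6, 0.3, fusion_weights=[2, 1, 1]))
```

Because of the normalization, only the ratios between the weights matter, so [2, 1, 1] and [0.5, 0.25, 0.25] give the same fused score.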

s2omics.single_section.p6_roi_selection_circle

roi_selection_for_single_section()

Purpose: Same as rectangular ROI selection, but using circular geometry. Suitable for TMA core or circular ROI scans.

Parameters:

  • prefix: path to the folder containing the H&E-stained image, e.g. '/home/H&E_image/'

  • save_folder: the name of the folder where results are saved

  • has_annotation: if True, use the cell type annotation file instead of the histology segmentation results for ROI selection

  • cache_path: path to an alternative segmentation result to use for ROI selection, if desired

  • down_samp_step: the down-sampling step used for feature extraction, default = 10, which corresponds to a 1:10^2 down-sampling rate

  • roi_size: the physical size (mm x mm) of the circle-shaped ROIs, default = [0.5, 0.5], meaning r = 0.5

  • rotation_seg: the number of different angles an ROI can be rotated to, default = 6, meaning that an ROI can be rotated to 30/60/90/120/150/180 degrees

  • num_roi: the number of ROIs to select, default = 0, which means the number is determined automatically

  • optimal_roi_thres: hyper-parameter for automatic ROI determination. The default of 0.03 is suitable for most cases; setting it to 0 is recommended when selecting FOVs. Lower this parameter to select more ROIs

  • fusion_weights: the weights of the three scores, default = [0.33, 0.33, 0.33]. The three weights should sum to 1 (if they do not, they are normalized)

  • emphasize_clusters, discard_clusters: prior information about histology clusters of interest and clusters not of interest, default = [], []

  • prior_preference: the larger this parameter, the more S2Omics focuses on the histology clusters of interest, default = 1

Return:

  • ROI visualizations on segmentation and raw histology image.

  • best_roi.pickle: ROI details and score breakdown.

s2omics.single_section.p7_cell_label_broadcasting

label_broadcasting()

Purpose:

After the user has obtained spatial omics data for the selected small ROI, the superpixels in the paired H&E image can be annotated with cell type labels.

The label information can then be transferred to the previously stained whole-slide H&E image to obtain the whole-slide spatial distribution of cell types.

This function trains an autoencoder-based classifier on the ROI-scale spatial omics cell annotations, then broadcasts the labels to the entire slide.

Parameters:

  • WSI_datapath: path to the whole-slide H&E image

  • WSI_save_folder: path where results for the whole-slide H&E image are saved

  • SO_datapath: path to the spatial omics data and the corresponding H&E image

  • SO_save_folder: path where results for the spatial omics data and the corresponding H&E image are saved

  • WSI_cache_path: path to the extracted histology features of the WSI, if already available, default = ''

  • SO_cache_path: path to the extracted histology features of the SO, if already available, default = ''

  • device: default = 'cuda:0'

  • foundation_model: the name of the foundation model used for feature extraction; choose from uni, virchow, and gigapath

Return:

  • S2Omics_whole_slide_prediction.jpg: Predicted whole-slide cell type map.
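
The data flow of label broadcasting (fit on labeled ROI superpixels, predict on every whole-slide superpixel) can be illustrated with a deliberately simple stand-in. The sketch below uses a nearest-centroid classifier, not the autoencoder-based classifier described above, and all names, feature vectors, and labels are hypothetical.

```python
import math
from collections import defaultdict


def fit_centroids(features, labels):
    """Average the feature vectors of each label (stand-in for training a classifier)."""
    sums = {}
    counts = defaultdict(int)
    for vec, label in zip(features, labels):
        if label not in sums:
            sums[label] = [0.0] * len(vec)
        sums[label] = [s + v for s, v in zip(sums[label], vec)]
        counts[label] += 1
    return {label: [s / counts[label] for s in total] for label, total in sums.items()}


def broadcast_labels(centroids, wsi_features):
    """Assign each whole-slide superpixel the label of its nearest centroid."""
    return [min(centroids, key=lambda label: math.dist(centroids[label], vec))
            for vec in wsi_features]


# ROI superpixels with labels derived from spatial omics (toy 2-D features).
roi_features = [[0.0, 0.0], [0.2, 0.1], [1.0, 1.0], [0.9, 1.1]]
roi_labels = ["type_A", "type_A", "type_B", "type_B"]
# Whole-slide superpixels without labels.
wsi_features = [[0.1, 0.0], [1.0, 0.9], [0.8, 1.2]]

centroids = fit_centroids(roi_features, roi_labels)
print(broadcast_labels(centroids, wsi_features))  # ['type_A', 'type_B', 'type_B']
```

The point of the sketch is the direction of transfer: labels exist only inside the ROI, and features extracted by the same foundation model on both images are what make the prediction on the rest of the slide possible.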

s2omics.multiple_sections.p4_get_histology_segmentation

get_joint_histology_segmentation()

Purpose: Jointly cluster PCA-reduced embeddings of multiple slides into morphological clusters using the chosen algorithm.

Parameters:

  • prefix_list: list of paths to the folders containing the H&E-stained images, e.g. ['/home/H&E_image/']

  • save_folder_list: list of the names of the folders where results are saved

  • foundation_model: the name of the foundation model used for feature extraction; choose from uni, virchow, and gigapath

  • cache_path: the path to the extracted feature embedding files

  • down_samp_step: the down-sampling step used for feature extraction, default = 10, which corresponds to a 1:10^2 down-sampling rate

  • clustering_method: the clustering method used for H&E image segmentation; choose from {'kmeans': k-means++, 'fcm': fuzzy c-means, 'louvain': Louvain algorithm, 'leiden': Leiden algorithm}, default = 'kmeans'

  • n_clusters: the initial number of clusters for histology segmentation when using kmeans or fcm, default = 20. Note that this is not the final number of clusters when the clustering method is fcm.

  • resolution: resolution for the Leiden algorithm, default = 1.0

  • if_evaluate: whether to evaluate the clustering results with quantitative metrics, default = False

Return:

  • cluster_image.pickle: Cluster map.

  • Cluster RGB image.
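
Joint clustering implies pooling the embeddings of all slides, clustering them together, and then mapping the labels back to each slide. The bookkeeping for that round trip can be sketched as below; the helper names are hypothetical, the clustering step itself is replaced by a hard-coded label list, and the actual S2Omics implementation may organize this differently.

```python
def pool_slides(per_slide_features):
    """Concatenate the features of all slides and remember which slide each row came from."""
    pooled, slide_ids = [], []
    for slide_index, features in enumerate(per_slide_features):
        pooled.extend(features)
        slide_ids.extend([slide_index] * len(features))
    return pooled, slide_ids


def split_by_slide(joint_labels, slide_ids, n_slides):
    """Distribute jointly assigned cluster labels back to their slides, preserving order."""
    per_slide = [[] for _ in range(n_slides)]
    for label, slide_index in zip(joint_labels, slide_ids):
        per_slide[slide_index].append(label)
    return per_slide


slide_a = [[0.0, 0.1], [0.9, 1.0]]              # two superpixels
slide_b = [[0.1, 0.0], [1.0, 1.1], [0.0, 0.2]]  # three superpixels
pooled, slide_ids = pool_slides([slide_a, slide_b])

# Placeholder for the output of any clustering method run on the pooled features.
joint_labels = [0, 1, 0, 1, 0]
print(split_by_slide(joint_labels, slide_ids, n_slides=2))  # [[0, 1], [0, 1, 0]]
```

Because every slide is labeled by the same clustering run, cluster 0 on one slide refers to the same morphology as cluster 0 on another.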

s2omics.multiple_sections.p6_roi_selection_rectangle

roi_selection_for_multiple_sections()

Purpose:

Automatically select rectangular ROIs based on scoring criteria:

  • Scale score (size coverage)

  • Coverage score (valid cell proportion)

  • Balance score (match desired cluster composition)

Parameters:

  • prefix_list: list of paths to the folders containing the H&E-stained images, e.g. ['/home/H&E_image/']

  • save_folder_list: list of the names of the folders where results are saved

  • has_annotation: if True, use the cell type annotation file instead of the histology segmentation results for ROI selection

  • cache_path: path to an alternative segmentation result to use for ROI selection, if desired

  • down_samp_step: the down-sampling step used for feature extraction, default = 10, which corresponds to a 1:10^2 down-sampling rate

  • roi_size: the physical size (mm x mm) of the ROIs, default = [6.5, 6.5], which is the physical size of a Visium HD ROI

  • rotation_seg: the number of different angles an ROI can be rotated to, default = 6, meaning that an ROI can be rotated to 30/60/90/120/150/180 degrees

  • num_roi: the number of ROIs to select, default = 0, which means the number is determined automatically

  • optimal_roi_thres: hyper-parameter for automatic ROI determination. The default of 0.03 is suitable for most cases; setting it to 0 is recommended when selecting FOVs. Lower this parameter to select more ROIs

  • fusion_weights: the weights of the three scores, default = [0.33, 0.33, 0.33]. The three weights should sum to 1 (if they do not, they are normalized)

  • emphasize_clusters, discard_clusters: prior information about histology clusters of interest and clusters not of interest, default = [], []

  • prior_preference: the larger this parameter, the more S2Omics focuses on the histology clusters of interest, default = 1

Return:

  • ROI visualizations on segmentation and raw histology image.

  • best_roi.pickle: ROI details and score breakdown.
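
The relationship between rotation_seg and the candidate angles, and what a rotated rectangular ROI looks like geometrically, can be sketched as follows. The even spacing over 180 degrees is inferred from the documented default (6 gives 30/60/90/120/150/180); the behavior for other values and both function names are assumptions.

```python
import math


def candidate_angles(rotation_seg=6):
    """Return rotation_seg evenly spaced angles in degrees, ending at 180."""
    step = 180 / rotation_seg
    return [step * k for k in range(1, rotation_seg + 1)]


def rotated_corners(center, roi_size, angle_deg):
    """Return the four corners of a rectangle rotated counter-clockwise about its center."""
    cx, cy = center
    half_w, half_h = roi_size[0] / 2, roi_size[1] / 2
    theta = math.radians(angle_deg)
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    corners = []
    for dx, dy in [(-half_w, -half_h), (half_w, -half_h), (half_w, half_h), (-half_w, half_h)]:
        # Standard 2-D rotation of the offset (dx, dy), then translation to the center.
        corners.append((cx + dx * cos_t - dy * sin_t, cy + dx * sin_t + dy * cos_t))
    return corners


print(candidate_angles(6))  # [30.0, 60.0, 90.0, 120.0, 150.0, 180.0]
for angle in candidate_angles(2):
    print(angle, rotated_corners(center=(10.0, 10.0), roi_size=[6.5, 6.5], angle_deg=angle))
```

A rectangle rotated by 180 degrees covers the same area as the unrotated one, which is why angles beyond 180 add no new candidates.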

s2omics.multiple_sections.p6_roi_selection_circle

roi_selection_for_multiple_sections()

Purpose: Same as rectangular ROI selection, but using circular geometry. Suitable for TMA core or circular ROI scans.

Parameters:

  • prefix_list: list of paths to the folders containing the H&E-stained images, e.g. ['/home/H&E_image/']

  • save_folder_list: list of the names of the folders where results are saved

  • has_annotation: if True, use the cell type annotation file instead of the histology segmentation results for ROI selection

  • cache_path: path to an alternative segmentation result to use for ROI selection, if desired

  • down_samp_step: the down-sampling step used for feature extraction, default = 10, which corresponds to a 1:10^2 down-sampling rate

  • roi_size: the physical size (mm x mm) of the circle-shaped ROIs, default = [0.5, 0.5], meaning r = 0.5

  • rotation_seg: the number of different angles an ROI can be rotated to, default = 6, meaning that an ROI can be rotated to 30/60/90/120/150/180 degrees

  • num_roi: the number of ROIs to select, default = 0, which means the number is determined automatically

  • optimal_roi_thres: hyper-parameter for automatic ROI determination. The default of 0.03 is suitable for most cases; setting it to 0 is recommended when selecting FOVs. Lower this parameter to select more ROIs

  • fusion_weights: the weights of the three scores, default = [0.33, 0.33, 0.33]. The three weights should sum to 1 (if they do not, they are normalized)

  • emphasize_clusters, discard_clusters: prior information about histology clusters of interest and clusters not of interest, default = [], []

  • prior_preference: the larger this parameter, the more S2Omics focuses on the histology clusters of interest, default = 1

Return:

  • ROI visualizations on segmentation and raw histology image.

  • best_roi.pickle: ROI details and score breakdown.

s2omics.multiple_sections.p6_cell_label_broadcasting

label_broadcasting()

Purpose:

After the user has obtained spatial omics data for the selected small ROI, the superpixels in the paired H&E image can be annotated with cell type labels.

The label information can then be transferred to the previously stained whole-slide H&E image to obtain the whole-slide spatial distribution of cell types.

This function trains an autoencoder-based classifier on the ROI-scale spatial omics cell annotations, then broadcasts the labels to the entire slide.

Parameters:

  • WSI_datapath: path to the whole-slide H&E image

  • WSI_save_folder: path where results for the whole-slide H&E image are saved

  • SO_datapath: path to the spatial omics data and the corresponding H&E image

  • SO_save_folder: path where results for the spatial omics data and the corresponding H&E image are saved

  • WSI_cache_path: path to the extracted histology features of the WSI, if already available, default = ''

  • SO_cache_path: path to the extracted histology features of the SO, if already available, default = ''

  • device: default = 'cuda:0'

  • foundation_model: the name of the foundation model used for feature extraction; choose from uni, virchow, and gigapath

Return:

  • S2Omics_whole_slide_prediction.jpg: Predicted whole-slide cell type map.