API Reference ============= Below are the main modules in the S2Omics pipeline, with purpose, CLI parameters, and I/O descriptions. Autodoc will generate detailed function and class listings from docstrings. ---------------------------------------- s2omics.p1_histology_preprocess ---------------------------------------- **histology_preprocess()** **Purpose:** Scale and pad raw H&E stained images to a target resolution so that image dimensions are divisible by the patch size. **Parameters:** - prefix: path to H&E image folder, str - show_image: if output the H&E image or not, bool, default = False **Return:** - `he-scaled.jpg`: rescaled image. Saved under prefix folder - `he.jpg`: scaled + padded image. Saved under prefix folder ---------------------------------------- s2omics.p2_superpixel_quality_control ---------------------------------------- **superpixel_quality_control()** **Purpose:** Split histology image (`he.jpg`) into superpixels and filter out tiles without nuclei or with low structural quality, using density and texture analysis. The new version of S2-omics use QC package HistoSweep for this step. **Parameters:** - prefix: path to H&E image folder, str - save_folder: path to results folder, str - density_thresh: HistoSweep parameter, threshold for identifying low density superpixels, int, default=100 - clean_background_flag: HistoSweep parameter, whether to preserve fibrous regions that are otherwise being incorrectly filtered out, bool, default=False - patch_size: the shape of superpixels, int, default=16 means that all superpixels are 16*16 pathces - show_image: if output the H&E image or not, bool, default = False **Return:** - `shapes.pickle`: image and superpixel shape information in pickle format, saved under save_folder/pickle_files - `qc_preserve_indicator.pickle`: binary mask in pickle format, saved under save_folder/pickle_files - `mask-small.png`: binary mask, saved under HistoSweep_output folder ---------------------------------------- s2omics.p3_feature_extraction ---------------------------------------- **feature_extraction()** **Purpose:** Apply a foundation model (UNI / Virchow / GigaPath) to extract hierarchical features from superpixels. **Parameters:** - prefix: folder path of H&E stained image, '/home/H&E_image/' for an example - save_folder: the name of save folder - foundation_model: the name of foundation model used for feature extraction, user can select from uni, virchow and gigapath - ckpt_path: the path to foundation model parameter files (should be named as 'pytorch_model.bin'), './checkpoints/uni/' for an example - device: default = 'cuda:0' - batch_size: default = 32 - down_samp_step: the down-sampling step, default = 10 refers to only extract features for superpixels whose row_index and col_index can both be divided by 10 (roughly 1:100 down-sampling rate). down_samp_step = 1 means extract features for every superpixel - num_workers: default = 4 **Return:** - Hierarchical embeddings saved in multiple `.pickle` parts, saved under save_folder/pickle_files ---------------------------------------- s2omics.single_section.p4_get_histology_segmentation ---------------------------------------- **get_histology_segmentation()** **Purpose:** Cluster PCA-reduced embeddings into morphological clusters using chosen algorithm. **Parameters:** - prefix: folder path of H&E stained image, '/home/H&E_image/' for an example - save_folder: the name of save folder - foundation_model: the name of foundation model used for feature extraction, user can select from uni, virchow and gigapath - cache_path: the path to exatracted feature embedding files - down_samp_step: the down-sampling step for feature extraction, default = 10, which refers to 1:10^2 down-sampling rate - clustering_method: the clustering method used for H&E image segmentation, user can select among {'kmeans': k-means++, 'fcm': fuzzy c-means, 'louvain': Louvain algorithm, 'leiden': Leiden algorithm}, default = 'kmeans' - n_clusters: initial number of clusters for histology segmentation when using kmeans or fcm for clustering. default=20. Please notice that this is not the final number of clusters when clustering method is fcm. - resolution: resolution for leiden algorithm, default=1.0 - if_evaluate: if evaluate the clustering results by quantitative metrics, default=False **Return:** - `cluster_image.pickle`: Cluster map. - Cluster RGB image. ---------------------------------------- s2omics.single_section.p5_merge_over_clusters ---------------------------------------- **merge_over_clusters()** **Purpose:** Merge morphological clusters with high similarity to target number using hierarchical linkage. **Parameters:** - prefix: folder path of H&E stained image, '/home/H&E_image/' for an example - save_folder: the name of save folder - target_n_clusters: the final number of clusters user want to preserve, default=15 **Return:** - `adjusted_cluster_image.pickle`: Merged cluster map. - Adjusted segmentation image. ---------------------------------------- s2omics.single_section.p6_roi_selection_rectangle ---------------------------------------- **roi_selection_for_single_section()** **Purpose:** Automatically select rectangular ROIs based on scoring criteria: - **Scale score** (size coverage) - **Coverage score** (valid cell proportion) - **Balance score** (match desired cluster composition) **Parameters:** - prefix: folder path of H&E stained image, '/home/H&E_image/' for an example - save_folder: the name of save folder - has_annotation: if True, use the cell type annotation file instead of histology segmentation results for ROI selection - cache_path: if user want to specify another segmentation result for ROi selection, please insert the path here - down_samp_step: the down-sampling step for feature extraction, default = 10, which refers to 1:10^2 down-sampling rate - roi_size: the physical size (mm x mm) of ROIs, default = [6.5, 6.5] which is the physical size for Visium HD ROI - rotation_seg: the number of difference angles ROI can rotate, default=6 means the a ROI can rotate to 30/60/90/120/150/180 degrees - num_roi: number of ROIs to be selected, default = 0 refers to automatic determination - optimal_roi_thres: hyper-parameter for automatic ROI determination, default = 0.03 is suitable for most cases, recommend to be set as 0 when selecting FOVs. If you want to select more ROIs, please lower this parameter - fusion_weights: the weight of three scores, default=[0.33,0.33,0.33], the sum of three weights should be equal to 1 (if not they will be normalized) - emphasize_clusters, discard_clusters: prior information about interested and not-interested histology clusters, default = [],[] - prior_preference: the larger this parameter is, S2Omics will focus more on those interested histology clusters, default= 1 **Return:** - ROI visualizations on segmentation and raw histology image. - `best_roi.pickle`: ROI details and score breakdown. ---------------------------------------- s2omics.single_section.p6_roi_selection_circle ---------------------------------------- **roi_selection_for_single_section()** **Purpose:** Same as rectangular ROI selection, but using circular geometry. Suitable for TMA core or circular ROI scans. **Parameters:** - prefix: folder path of H&E stained image, '/home/H&E_image/' for an example - save_folder: the name of save folder - has_annotation: if True, use the cell type annotation file instead of histology segmentation results for ROI selection - cache_path: if user want to specify another segmentation result for ROi selection, please insert the path here - down_samp_step: the down-sampling step for feature extraction, default = 10, which refers to 1:10^2 down-sampling rate - roi_size: the physical size (mm x mm) of circle-shaped ROIs, default = [0.5, 0.5] means the r=0.5 - rotation_seg: the number of difference angles ROI can rotate, default=6 means the a ROI can rotate to 30/60/90/120/150/180 degrees - num_roi: number of ROIs to be selected, default = 0 refers to automatic determination - optimal_roi_thres: hyper-parameter for automatic ROI determination, default = 0.03 is suitable for most cases, recommend to be set as 0 when selecting FOVs. If you want to select more ROIs, please lower this parameter - fusion_weights: the weight of three scores, default=[0.33,0.33,0.33], the sum of three weights should be equal to 1 (if not they will be normalized) - emphasize_clusters, discard_clusters: prior information about interested and not-interested histology clusters, default = [],[] - prior_preference: the larger this parameter is, S2Omics will focus more on those interested histology clusters, default= 1 **Return:** - ROI visualizations on segmentation and raw histology image. - `best_roi.pickle`: ROI details and score breakdown. ---------------------------------------- s2omics.single_section.p7_cell_label_broadcasting ---------------------------------------- **label_broadcasting()** **Purpose:** After user obtained the spatial omics data of the selected small ROI, we can annotate the superpixels in the paired H&E image with cell type labels. Afterwards, we can transfer the label information to the previously stained whole-slide H&E image to obtain whole-slide level cell type spatial distribution. This function trains an Autoencoder-based classifier using ROI-scale spatial omics cell annotations, then broadcast labels to the entire slide. **Parameters:** - WSI_datapath: path to the whole slide H&E image - WSI_save_folder: save path to the whole slide H&E image results - SO_datapath: path to the spatial omics data and accroding H&E image - SO_save_folder: save path to the spatial omics data and accroding H&E image results - WSI_cache_path: path to the extracted histology feature of the WSI, if it is already obtained, default='' - SO_cache_path: path to the extracted histology feature of the SO, if it is already obtained, default='' - device: default='cuda:0' - foundation_model: the name of foundation model used for feature extraction, user can select from uni, virchow and gigapath **Return:** - `S2Omics_whole_slide_prediction.jpg`: Predicted whole-slide cell type map. ---------------------------------------- s2omics.multiple_sections.p4_get_histology_segmentation ---------------------------------------- **get_joint_histology_segmentation()** **Purpose:** Jointly cluster PCA-reduced embeddings of multiple slides into morphological clusters using chosen algorithm. **Parameters:** - prefix_list: list of folder path of H&E stained image, ['/home/H&E_image/'] for an example - save_folder_list: list of the name of save folder - foundation_model: the name of foundation model used for feature extraction, user can select from uni, virchow and gigapath - cache_path: the path to exatracted feature embedding files - down_samp_step: the down-sampling step for feature extraction, default = 10, which refers to 1:10^2 down-sampling rate - clustering_method: the clustering method used for H&E image segmentation, user can select among {'kmeans': k-means++, 'fcm': fuzzy c-means, 'louvain': Louvain algorithm, 'leiden': Leiden algorithm}, default = 'kmeans' - n_clusters: initial number of clusters for histology segmentation when using kmeans or fcm for clustering. default=20. Please notice that this is not the final number of clusters when clustering method is fcm. - resolution: resolution for leiden algorithm, default=1.0 - if_evaluate: if evaluate the clustering results by quantitative metrics, default=False **Return:** - `cluster_image.pickle`: Cluster map. - Cluster RGB image. ---------------------------------------- s2omics.multiple_sections.p6_roi_selection_rectangle ---------------------------------------- **roi_selection_for_multiple_sections()** **Purpose:** Automatically select rectangular ROIs based on scoring criteria: - **Scale score** (size coverage) - **Coverage score** (valid cell proportion) - **Balance score** (match desired cluster composition) **Parameters:** - prefix_list: list of folder path of H&E stained image, ['/home/H&E_image/'] for an example - save_folder_list: list of the name of save folder - has_annotation: if True, use the cell type annotation file instead of histology segmentation results for ROI selection - cache_path: if user want to specify another segmentation result for ROi selection, please insert the path here - down_samp_step: the down-sampling step for feature extraction, default = 10, which refers to 1:10^2 down-sampling rate - roi_size: the physical size (mm x mm) of ROIs, default = [6.5, 6.5] which is the physical size for Visium HD ROI - rotation_seg: the number of difference angles ROI can rotate, default=6 means the a ROI can rotate to 30/60/90/120/150/180 degrees - num_roi: number of ROIs to be selected, default = 0 refers to automatic determination - optimal_roi_thres: hyper-parameter for automatic ROI determination, default = 0.03 is suitable for most cases, recommend to be set as 0 when selecting FOVs. If you want to select more ROIs, please lower this parameter - fusion_weights: the weight of three scores, default=[0.33,0.33,0.33], the sum of three weights should be equal to 1 (if not they will be normalized) - emphasize_clusters, discard_clusters: prior information about interested and not-interested histology clusters, default = [],[] - prior_preference: the larger this parameter is, S2Omics will focus more on those interested histology clusters, default= 1 **Return:** - ROI visualizations on segmentation and raw histology image. - `best_roi.pickle`: ROI details and score breakdown. ---------------------------------------- s2omics.single_section.p6_roi_selection_circle ---------------------------------------- **roi_selection_for_multiple_sections()** **Purpose:** Same as rectangular ROI selection, but using circular geometry. Suitable for TMA core or circular ROI scans. **Parameters:** - prefix_list: list of folder path of H&E stained image, ['/home/H&E_image/'] for an example - save_folder_list: list of the name of save folder - has_annotation: if True, use the cell type annotation file instead of histology segmentation results for ROI selection - cache_path: if user want to specify another segmentation result for ROi selection, please insert the path here - down_samp_step: the down-sampling step for feature extraction, default = 10, which refers to 1:10^2 down-sampling rate - roi_size: the physical size (mm x mm) of circle-shaped ROIs, default = [0.5, 0.5] means the r=0.5 - rotation_seg: the number of difference angles ROI can rotate, default=6 means the a ROI can rotate to 30/60/90/120/150/180 degrees - num_roi: number of ROIs to be selected, default = 0 refers to automatic determination - optimal_roi_thres: hyper-parameter for automatic ROI determination, default = 0.03 is suitable for most cases, recommend to be set as 0 when selecting FOVs. If you want to select more ROIs, please lower this parameter - fusion_weights: the weight of three scores, default=[0.33,0.33,0.33], the sum of three weights should be equal to 1 (if not they will be normalized) - emphasize_clusters, discard_clusters: prior information about interested and not-interested histology clusters, default = [],[] - prior_preference: the larger this parameter is, S2Omics will focus more on those interested histology clusters, default= 1 **Return:** - ROI visualizations on segmentation and raw histology image. - `best_roi.pickle`: ROI details and score breakdown. ---------------------------------------- s2omics.multiple_sections.p6_cell_label_broadcasting ---------------------------------------- **label_broadcasting()** **Purpose:** After user obtained the spatial omics data of the selected small ROI, we can annotate the superpixels in the paired H&E image with cell type labels. Afterwards, we can transfer the label information to the previously stained whole-slide H&E image to obtain whole-slide level cell type spatial distribution. This function trains an Autoencoder-based classifier using ROI-scale spatial omics cell annotations, then broadcast labels to the entire slide. **Parameters:** - WSI_datapath: path to the whole slide H&E image - WSI_save_folder: save path to the whole slide H&E image results - SO_datapath: path to the spatial omics data and accroding H&E image - SO_save_folder: save path to the spatial omics data and accroding H&E image results - WSI_cache_path: path to the extracted histology feature of the WSI, if it is already obtained, default='' - SO_cache_path: path to the extracted histology feature of the SO, if it is already obtained, default='' - device: default='cuda:0' - foundation_model: the name of foundation model used for feature extraction, user can select from uni, virchow and gigapath **Return:** - `S2Omics_whole_slide_prediction.jpg`: Predicted whole-slide cell type map.