Tutorial 3: Designing a spatial omics experiment for consecutive breast cancer sections

Please download the corresponding demo data from the following link and place it in the demo folder:

Google Drive: https://drive.google.com/drive/folders/1z1nk0sF_e25LKMyHxJVMtROFjuWet2G_?usp=drive_link

Please also download the checkpoint file of the pathology foundation model (UNI) and place it in the checkpoints folder; Step 3 loads it from ../checkpoints/uni/.

Step 1: Preprocess the H&E image

Make sure the physical size of each pixel is 0.5 microns.
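A minimal sketch of the resolution arithmetic. It assumes that the scale factor printed by histology_preprocess is the ratio of the raw pixel size to the 0.5-micron target. This is an assumption, because the function's internals are not shown here. Under that reading, the logged scale of 0.571 corresponds to a raw pixel size of about 0.29 microns. The helper names are hypothetical.

```python
TARGET_UM_PER_PX = 0.5  # resolution the tutorial asks for (microns per pixel)


def rescale_factor(raw_um_per_px, target_um_per_px=TARGET_UM_PER_PX):
    """Resize factor that brings an image from raw_um_per_px to the target.

    Assumption: factor = raw pixel size / target pixel size, so a value
    below 1 shrinks the image (many fine pixels -> fewer 0.5 um pixels).
    """
    return raw_um_per_px / target_um_per_px


def rescaled_shape(height_px, width_px, raw_um_per_px):
    """Image height and width (pixels) after rescaling to the target size."""
    s = rescale_factor(raw_um_per_px)
    return round(height_px * s), round(width_px * s)


# Hypothetical scan: 20000 x 20000 pixels at 0.2857 microns per pixel.
print(rescale_factor(0.2857))                # about 0.571
print(rescaled_shape(20000, 20000, 0.2857))  # (11428, 11428)
```

The physical extent stays the same. In the example above, 20000 × 0.2857 = 11428 × 0.5 = 5714 microns. Only the number of pixels changes.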

[1]:
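import sys
sys.path.append('..')  # make the s2omics package importable from the parent directory
from s2omics.p1_histology_preprocess import histology_preprocess

# One folder per consecutive section; each folder contains the raw image he-raw.jpg
prefix_list = ['../demo/Tutorial_3_Consecutive_ROI_selection_breast/breast_cancer_g1/',
              '../demo/Tutorial_3_Consecutive_ROI_selection_breast/breast_cancer_g2/',
              '../demo/Tutorial_3_Consecutive_ROI_selection_breast/breast_cancer_g3/']
for prefix in prefix_list:
    # rescale the raw image and save he-scaled.jpg and he.jpg; show_image=True displays the result
    histology_preprocess(prefix, show_image=True)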
Image loaded from ../demo/Tutorial_3_Consecutive_ROI_selection_breast/breast_cancer_g1/he-raw.jpg
Rescaling image (scale: 0.571)...
282 sec
../demo/Tutorial_3_Consecutive_ROI_selection_breast/breast_cancer_g1/he-scaled.jpg
../demo/Tutorial_3_Consecutive_ROI_selection_breast/breast_cancer_g1/he.jpg
Preprocessed H&E image saved!
Image loaded from ../demo/Tutorial_3_Consecutive_ROI_selection_breast/breast_cancer_g2/he-raw.jpg
Rescaling image (scale: 0.571)...
277 sec
../demo/Tutorial_3_Consecutive_ROI_selection_breast/breast_cancer_g2/he-scaled.jpg
../demo/Tutorial_3_Consecutive_ROI_selection_breast/breast_cancer_g2/he.jpg
Preprocessed H&E image saved!
Image loaded from ../demo/Tutorial_3_Consecutive_ROI_selection_breast/breast_cancer_g3/he-raw.jpg
Rescaling image (scale: 0.571)...
272 sec
../demo/Tutorial_3_Consecutive_ROI_selection_breast/breast_cancer_g3/he-scaled.jpg
../demo/Tutorial_3_Consecutive_ROI_selection_breast/breast_cancer_g3/he.jpg
Preprocessed H&E image saved!
[Figure: H&E image displayed by histology_preprocess(show_image=True)]

Step 2: Quality control for all superpixels

Superpixels are square pseudo-cells of 8 × 8 microns (16 × 16 pixels at 0.5 microns per pixel).
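At 0.5 microns per pixel, one 8-micron superpixel spans 16 × 16 pixels. The sketch below maps pixels to superpixel indices using hypothetical helper names.

The log supports this grid reading:

- The array length 1,212,416 printed during QC for section g1 factors as 1024 × 1184.
- That matches the (103, 119) down-sampled grid reported in Step 5, since ceil(1024 / 10) = 103 and ceil(1184 / 10) = 119.

The 16384 × 18944-pixel image size used below is inferred from this arithmetic, not read from the data.

```python
UM_PER_PX = 0.5        # pixel size after Step 1
SUPERPIXEL_UM = 8.0    # side length of one superpixel

# 8 microns / 0.5 microns per pixel = 16 pixels per superpixel side.
PX_PER_SUPERPIXEL = round(SUPERPIXEL_UM / UM_PER_PX)


def superpixel_grid_shape(height_px, width_px):
    """Rows and columns of whole superpixels that fit in an image."""
    return height_px // PX_PER_SUPERPIXEL, width_px // PX_PER_SUPERPIXEL


def superpixel_of_pixel(row_px, col_px):
    """(row, col) index of the superpixel containing a given pixel."""
    return row_px // PX_PER_SUPERPIXEL, col_px // PX_PER_SUPERPIXEL


rows, cols = superpixel_grid_shape(16384, 18944)
print(PX_PER_SUPERPIXEL, rows, cols, rows * cols)  # 16 1024 1184 1212416
print(superpixel_of_pixel(170, 330))               # (10, 20)
```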

We use our new QC package, HistoSweep, for this step.
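HistoSweep's source is not shown here, but its log reports three grey-level co-occurrence matrix (GLCM) texture metrics: homogeneity, energy and entropy. The sketch below computes them for one pixel offset using common textbook definitions.

Two settings are assumptions, not HistoSweep's documented choices:

- quantizing intensities to 8 grey levels;
- dividing entropy by its maximum so it falls in [0, 1], like the logged values.

```python
import numpy as np


def glcm(patch, levels=8, offset=(0, 1)):
    """Normalized, symmetric grey-level co-occurrence matrix of a 2-D patch.

    Intensities (0-255) are quantized to `levels` grey levels, and pairs of
    pixels separated by `offset` (dy, dx), with dy, dx >= 0, are counted.
    """
    q = (patch.astype(int) * levels) // 256
    dy, dx = offset
    h, w = q.shape
    a = q[: h - dy, : w - dx].ravel()  # first pixel of each pair
    b = q[dy:, dx:].ravel()            # its neighbour at the offset
    m = np.zeros((levels, levels))
    np.add.at(m, (a, b), 1)
    m = m + m.T  # count each pair in both directions
    return m / m.sum()


def texture_metrics(p):
    """Homogeneity, energy and normalized entropy of a normalized GLCM."""
    i, j = np.indices(p.shape)
    homogeneity = np.sum(p / (1.0 + (i - j) ** 2))
    energy = np.sqrt(np.sum(p ** 2))
    nz = p[p > 0]
    # Assumption: divide by log2(number of cells) so entropy lies in [0, 1].
    entropy = -np.sum(nz * np.log2(nz)) / np.log2(p.size)
    return homogeneity, energy, entropy


flat = np.full((16, 16), 128, dtype=np.uint8)        # blank-looking patch
noisy = np.random.default_rng(0).integers(0, 256, size=(16, 16))
print(texture_metrics(glcm(flat)))   # homogeneity 1, energy 1, entropy 0
print(texture_metrics(glcm(noisy)))  # lower homogeneity/energy, higher entropy
```

A uniform patch such as blank glass gives high homogeneity and energy and low entropy. Textured tissue gives the opposite pattern.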

[2]:
from s2omics.p2_superpixel_quality_control import superpixel_quality_control

# One output folder per section, in the same order as prefix_list
save_folder_list = ['../demo/Tutorial_3_Consecutive_ROI_selection_breast/breast_cancer_g1/S2Omics_output',
                    '../demo/Tutorial_3_Consecutive_ROI_selection_breast/breast_cancer_g2/S2Omics_output',
                    '../demo/Tutorial_3_Consecutive_ROI_selection_breast/breast_cancer_g3/S2Omics_output']
for (prefix, save_folder) in zip(prefix_list, save_folder_list):
    # run HistoSweep QC on he.jpg and save qc_preserve_indicator.pickle under save_folder
    superpixel_quality_control(prefix, save_folder, show_image=True)
Image loaded from ../demo/Tutorial_3_Consecutive_ROI_selection_breast/breast_cancer_g1/he.jpg
0 0
../demo/Tutorial_3_Consecutive_ROI_selection_breast/breast_cancer_g1/S2Omics_output/pickle_files/shapes.pickle
[compute_metrics_memory_optimized] Current memory: 0.0452 GB; Peak memory: 1.0486 GB
[compute_low_density_mask] Current memory: 0.0012 GB; Peak memory: 0.0714 GB
Total selected for density filtering:  116905
✅ Entropy map saved as 'glcm_entropy_map_colored.png'
✅ Energy map saved as 'glcm_energy_map_colored.png'
✅ Homogeneity map saved as 'glcm_homogeneity_map_colored.png'

=== GLCM Metric Means ===
   homogeneity    energy   entropy
0     0.811977  0.364315  0.393492
1     0.433210  0.087103  0.793905
2     0.585925  0.166482  0.641603
3     0.325897  0.040935  0.879235

=== Cluster Scores ===
Cluster 0: Score = 0.7828
Cluster 1: Score = -0.2736
Cluster 2: Score = 0.1108
Cluster 3: Score = -0.5124

=== Number of Observations per Cluster ===
Cluster 0: 1073
Cluster 1: 4918
Cluster 2: 4614
Cluster 3: 8222
Total: 18827

✅ Clustered texture map saved as 'cluster_labels_colored.png'
[run_texture_analysis] Current memory: 0.0014 GB; Peak memory: 2.8918 GB
[run_ratio_filtering] Current memory: 0.0011 GB; Peak memory: 0.0274 GB
(1212416,)
✅ Final masks saved in: HistoSweep_output
[generate_final_mask] Current memory: 0.0000 GB; Peak memory: 0.2926 GB
Running successfully!
../demo/Tutorial_3_Consecutive_ROI_selection_breast/breast_cancer_g1/S2Omics_output/pickle_files/qc_preserve_indicator.pickle
Image loaded from ../demo/Tutorial_3_Consecutive_ROI_selection_breast/breast_cancer_g2/he.jpg
0 0
../demo/Tutorial_3_Consecutive_ROI_selection_breast/breast_cancer_g2/S2Omics_output/pickle_files/shapes.pickle
[compute_metrics_memory_optimized] Current memory: 0.0466 GB; Peak memory: 1.0760 GB
[compute_low_density_mask] Current memory: 0.0012 GB; Peak memory: 0.0729 GB
Total selected for density filtering:  153677
✅ Entropy map saved as 'glcm_entropy_map_colored.png'
✅ Energy map saved as 'glcm_energy_map_colored.png'
✅ Homogeneity map saved as 'glcm_homogeneity_map_colored.png'

=== GLCM Metric Means ===
   homogeneity    energy   entropy
0     0.461211  0.098785  0.757332
1     0.327859  0.042905  0.871905
2     0.632621  0.188482  0.604731
3     0.838028  0.349599  0.385862

=== Cluster Scores ===
Cluster 0: Score = -0.1973
Cluster 1: Score = -0.5011
Cluster 2: Score = 0.2164
Cluster 3: Score = 0.8018

=== Number of Observations per Cluster ===
Cluster 0: 5982
Cluster 1: 7926
Cluster 2: 4069
Cluster 3: 1547
Total: 19524

✅ Clustered texture map saved as 'cluster_labels_colored.png'
[run_texture_analysis] Current memory: 0.0012 GB; Peak memory: 2.9821 GB
[run_ratio_filtering] Current memory: 0.0012 GB; Peak memory: 0.0275 GB
(1250304,)
✅ Final masks saved in: HistoSweep_output
[generate_final_mask] Current memory: 0.0000 GB; Peak memory: 0.3017 GB
Running successfully!
../demo/Tutorial_3_Consecutive_ROI_selection_breast/breast_cancer_g2/S2Omics_output/pickle_files/qc_preserve_indicator.pickle
Image loaded from ../demo/Tutorial_3_Consecutive_ROI_selection_breast/breast_cancer_g3/he.jpg
0 0
../demo/Tutorial_3_Consecutive_ROI_selection_breast/breast_cancer_g3/S2Omics_output/pickle_files/shapes.pickle
[compute_metrics_memory_optimized] Current memory: 0.0466 GB; Peak memory: 1.0760 GB
[compute_low_density_mask] Current memory: 0.0012 GB; Peak memory: 0.0729 GB
Total selected for density filtering:  153432
✅ Entropy map saved as 'glcm_entropy_map_colored.png'
✅ Energy map saved as 'glcm_energy_map_colored.png'
✅ Homogeneity map saved as 'glcm_homogeneity_map_colored.png'

=== GLCM Metric Means ===
   homogeneity    energy   entropy
0     0.473863  0.104302  0.771179
1     0.768603  0.391862  0.425878
2     0.629004  0.196726  0.613754
3     0.369296  0.048044  0.865714

=== Cluster Scores ===
Cluster 0: Score = -0.1930
Cluster 1: Score = 0.7346
Cluster 2: Score = 0.2120
Cluster 3: Score = -0.4484

=== Number of Observations per Cluster ===
Cluster 0: 5641
Cluster 1: 1923
Cluster 2: 5789
Cluster 3: 7664
Total: 21017

✅ Clustered texture map saved as 'cluster_labels_colored.png'
[run_texture_analysis] Current memory: 0.0012 GB; Peak memory: 2.9821 GB
[run_ratio_filtering] Current memory: 0.0012 GB; Peak memory: 0.0275 GB
(1250304,)
✅ Final masks saved in: HistoSweep_output
[generate_final_mask] Current memory: 0.0000 GB; Peak memory: 0.3017 GB
Running successfully!
../demo/Tutorial_3_Consecutive_ROI_selection_breast/breast_cancer_g3/S2Omics_output/pickle_files/qc_preserve_indicator.pickle
[Figure: superpixel quality-control result displayed by superpixel_quality_control(show_image=True)]
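The per-cluster scores in the QC logs can be reproduced from the GLCM means as homogeneity + energy - entropy. This holds to the four printed decimals for all three sections. The formula is inferred from the log, not taken from HistoSweep's documentation.

The sketch checks the formula for section g1. The highest-scoring cluster has the smoothest, most uniform texture, and in g1 it is also the smallest cluster (1073 superpixels). The log does not state how HistoSweep uses the score. The function name texture_score is hypothetical.

```python
# GLCM metric means for section g1 (homogeneity, energy, entropy), from the log.
g1_means = {
    0: (0.811977, 0.364315, 0.393492),
    1: (0.433210, 0.087103, 0.793905),
    2: (0.585925, 0.166482, 0.641603),
    3: (0.325897, 0.040935, 0.879235),
}
# Scores as printed under "=== Cluster Scores ===" for g1.
g1_logged = {0: 0.7828, 1: -0.2736, 2: 0.1108, 3: -0.5124}


def texture_score(homogeneity, energy, entropy):
    """Inferred score: smooth, repetitive texture scores high."""
    return homogeneity + energy - entropy


scores = {k: texture_score(*v) for k, v in g1_means.items()}
for k, s in scores.items():
    print(f"Cluster {k}: Score = {s:.4f} (logged {g1_logged[k]:.4f})")

smoothest = max(scores, key=scores.get)
print("Highest-scoring cluster:", smoothest)  # 0
```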

Step 3: Histology feature extraction

[3]:
from s2omics.p3_feature_extraction import histology_feature_extraction

# down_samp_step: down-sampling step for feature extraction.
#   The default, 10, extracts features only for superpixels whose row and column indices are both divisible by 10 (roughly a 1:100 down-sampling rate).
#   down_samp_step = 1 extracts features for every superpixel.
for (prefix, save_folder) in zip(prefix_list, save_folder_list):
    histology_feature_extraction(prefix, save_folder,
                                 foundation_model='uni',
                                 ckpt_path='../checkpoints/uni/',
                                 device='cuda:0',
                                 batch_size=32,
                                 down_samp_step=10,
                                 num_workers=4)
Histology foundation model loaded!
    Foundation model name: uni
    Start extracting histology feature embeddings...
Image loaded from ../demo/Tutorial_3_Consecutive_ROI_selection_breast/breast_cancer_g1/he.jpg
../demo/Tutorial_3_Consecutive_ROI_selection_breast/breast_cancer_g1/S2Omics_output/pickle_files/num_patches.pickle
  0%|          | 0/384 [00:00<?, ?it/s]
Batch 0:
Shape of patches: torch.Size([32, 3, 224, 224])
Shape of positions[0]: torch.Size([32])
Content of positions[0][:10]: tensor([0, 0, 0, 0, 0, 0, 0, 0, 0, 0])
Content of positions[1][:10]: tensor([   0,  160,  320,  480,  640,  800,  960, 1120, 1280, 1440])
Shape of feature_emb: torch.Size([32, 197, 1024])
Shape of patch_emb: torch.Size([32, 1024, 14, 14])
100%|█████████▉| 383/384 [04:45<00:00,  1.27it/s]
Part 0 patch number: 12257
100%|██████████| 384/384 [04:46<00:00,  1.34it/s]
../demo/Tutorial_3_Consecutive_ROI_selection_breast/breast_cancer_g1/S2Omics_output/pickle_files/uni_embeddings_downsamp_10_part_0.pickle

Histology foundation model loaded!
    Foundation model name: uni
    Start extracting histology feature embeddings...
Image loaded from ../demo/Tutorial_3_Consecutive_ROI_selection_breast/breast_cancer_g2/he.jpg
../demo/Tutorial_3_Consecutive_ROI_selection_breast/breast_cancer_g2/S2Omics_output/pickle_files/num_patches.pickle
  0%|          | 0/395 [00:00<?, ?it/s]
Batch 0:
Shape of patches: torch.Size([32, 3, 224, 224])
Shape of positions[0]: torch.Size([32])
Content of positions[0][:10]: tensor([0, 0, 0, 0, 0, 0, 0, 0, 0, 0])
Content of positions[1][:10]: tensor([   0,  160,  320,  480,  640,  800,  960, 1120, 1280, 1440])
Shape of feature_emb: torch.Size([32, 197, 1024])
Shape of patch_emb: torch.Size([32, 1024, 14, 14])
100%|█████████▉| 394/395 [05:10<00:00,  1.27it/s]
Part 0 patch number: 12614
100%|██████████| 395/395 [05:11<00:00,  1.27it/s]
../demo/Tutorial_3_Consecutive_ROI_selection_breast/breast_cancer_g2/S2Omics_output/pickle_files/uni_embeddings_downsamp_10_part_0.pickle

Histology foundation model loaded!
    Foundation model name: uni
    Start extracting histology feature embeddings...
Image loaded from ../demo/Tutorial_3_Consecutive_ROI_selection_breast/breast_cancer_g3/he.jpg
../demo/Tutorial_3_Consecutive_ROI_selection_breast/breast_cancer_g3/S2Omics_output/pickle_files/num_patches.pickle
  0%|          | 0/395 [00:00<?, ?it/s]
Batch 0:
Shape of patches: torch.Size([32, 3, 224, 224])
Shape of positions[0]: torch.Size([32])
Content of positions[0][:10]: tensor([0, 0, 0, 0, 0, 0, 0, 0, 0, 0])
Content of positions[1][:10]: tensor([   0,  160,  320,  480,  640,  800,  960, 1120, 1280, 1440])
Shape of feature_emb: torch.Size([32, 197, 1024])
Shape of patch_emb: torch.Size([32, 1024, 14, 14])
100%|█████████▉| 394/395 [05:08<00:00,  1.28it/s]
Part 0 patch number: 12614
100%|██████████| 395/395 [05:09<00:00,  1.28it/s]
../demo/Tutorial_3_Consecutive_ROI_selection_breast/breast_cancer_g3/S2Omics_output/pickle_files/uni_embeddings_downsamp_10_part_0.pickle
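The numbers in the Step 3 log follow from simple grid arithmetic.

Down-sampling:

- With down_samp_step=10, keeping superpixels whose row and column indices are both multiples of 10 leaves 103 × 119 = 12,257 of the 1024 × 1184 superpixels of section g1. That grid size is inferred from the QC log.
- This matches "Part 0 patch number: 12257" and, at batch_size=32, the 384 batches in the progress bar.
- It is a 1:98.9 reduction, which is where "roughly 1:100" comes from.
- Section g2 (1056 × 1184 superpixels) gives 12,614 patches and 395 batches.
- The pixel coordinates 0, 160, 320, ... in positions[1] are every 10th superpixel times 16 pixels.

Token reshaping:

- The second helper turns a ViT output of shape (32, 197, 1024) into (32, 1024, 14, 14).
- 197 = 1 class token + 14 × 14 patch tokens, and each 16-pixel token covers exactly one superpixel.

Both helpers are hypothetical illustrations. Treating token 0 as the class token is an assumption.

```python
import math

import numpy as np

PX_PER_SUPERPIXEL = 16  # 8-micron superpixels at 0.5 microns per pixel


def sampled_superpixels(n_rows, n_cols, down_samp_step):
    """Superpixels kept when both indices are divisible by down_samp_step."""
    rows = range(0, n_rows, down_samp_step)
    cols = range(0, n_cols, down_samp_step)
    return [(r, c) for r in rows for c in cols]


def tokens_to_grid(feature_emb):
    """Turn ViT token embeddings (B, 1 + S*S, D) into a (B, D, S, S) map.

    Assumption: token 0 is the class token and the remaining tokens are the
    16x16-pixel patches of the 224x224 input in row-major order.
    """
    b, n_tokens, d = feature_emb.shape
    side = math.isqrt(n_tokens - 1)
    patches = feature_emb[:, 1:, :].reshape(b, side, side, d)
    return patches.transpose(0, 3, 1, 2)


pts = sampled_superpixels(1024, 1184, 10)
print(len(pts))                                     # 12257
print([c * PX_PER_SUPERPIXEL for _, c in pts[:3]])  # [0, 160, 320]
emb = np.zeros((32, 197, 1024), dtype=np.float32)
print(tokens_to_grid(emb).shape)                    # (32, 1024, 14, 14)
```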

Step 4: Joint histology segmentation

[4]:
from s2omics.multiple_sections.p4_get_histology_segmentation import get_joint_histology_segmentation

# Cluster the superpixels of all three sections jointly, so each histology cluster has the same label in every section
get_joint_histology_segmentation(prefix_list, save_folder_list,
                                 foundation_model='uni',
                                 down_samp_step=10,
                                 clustering_method='kmeans',
                                 n_clusters=20)
Pickle loaded from ../demo/Tutorial_3_Consecutive_ROI_selection_breast/breast_cancer_g1/S2Omics_output/pickle_files/shapes.pickle
Pickle loaded from ../demo/Tutorial_3_Consecutive_ROI_selection_breast/breast_cancer_g1/S2Omics_output/pickle_files/qc_preserve_indicator.pickle
Loading histology feature embeddings for image 0...
Pickle loaded from ../demo/Tutorial_3_Consecutive_ROI_selection_breast/breast_cancer_g1/S2Omics_output/pickle_files/uni_embeddings_downsamp_10_part_0.pickle
Sucessfully loaded and normalized all histology feature embeddings!
Pickle loaded from ../demo/Tutorial_3_Consecutive_ROI_selection_breast/breast_cancer_g2/S2Omics_output/pickle_files/shapes.pickle
Pickle loaded from ../demo/Tutorial_3_Consecutive_ROI_selection_breast/breast_cancer_g2/S2Omics_output/pickle_files/qc_preserve_indicator.pickle
Loading histology feature embeddings for image 1...
Pickle loaded from ../demo/Tutorial_3_Consecutive_ROI_selection_breast/breast_cancer_g2/S2Omics_output/pickle_files/uni_embeddings_downsamp_10_part_0.pickle
Sucessfully loaded and normalized all histology feature embeddings!
Pickle loaded from ../demo/Tutorial_3_Consecutive_ROI_selection_breast/breast_cancer_g3/S2Omics_output/pickle_files/shapes.pickle
Pickle loaded from ../demo/Tutorial_3_Consecutive_ROI_selection_breast/breast_cancer_g3/S2Omics_output/pickle_files/qc_preserve_indicator.pickle
Loading histology feature embeddings for image 2...
Pickle loaded from ../demo/Tutorial_3_Consecutive_ROI_selection_breast/breast_cancer_g3/S2Omics_output/pickle_files/uni_embeddings_downsamp_10_part_0.pickle
Sucessfully loaded and normalized all histology feature embeddings!
2025-10-15 17:43:06,918 - harmonypy - INFO - Computing initial centroids with sklearn.KMeans...
2025-10-15 17:43:17,014 - harmonypy - INFO - sklearn.KMeans initialization complete.
2025-10-15 17:43:17,074 - harmonypy - INFO - Iteration 1 of 10
2025-10-15 17:43:20,632 - harmonypy - INFO - Iteration 2 of 10
2025-10-15 17:43:24,161 - harmonypy - INFO - Iteration 3 of 10
2025-10-15 17:43:27,401 - harmonypy - INFO - Iteration 4 of 10
2025-10-15 17:43:28,962 - harmonypy - INFO - Iteration 5 of 10
2025-10-15 17:43:30,433 - harmonypy - INFO - Iteration 6 of 10
2025-10-15 17:43:33,569 - harmonypy - INFO - Converged after 6 iterations
Start segmenting the histology image, clustering method: kmeans
../demo/Tutorial_3_Consecutive_ROI_selection_breast/breast_cancer_g1/S2Omics_output/pickle_files/cluster_image.pickle
Segmentation image is stored at: ../demo/Tutorial_3_Consecutive_ROI_selection_breast/breast_cancer_g1/S2Omics_output/image_files/cluster_image_num_clusters_20.jpg
../demo/Tutorial_3_Consecutive_ROI_selection_breast/breast_cancer_g2/S2Omics_output/pickle_files/cluster_image.pickle
Segmentation image is stored at: ../demo/Tutorial_3_Consecutive_ROI_selection_breast/breast_cancer_g2/S2Omics_output/image_files/cluster_image_num_clusters_20.jpg
../demo/Tutorial_3_Consecutive_ROI_selection_breast/breast_cancer_g3/S2Omics_output/pickle_files/cluster_image.pickle
Segmentation image is stored at: ../demo/Tutorial_3_Consecutive_ROI_selection_breast/breast_cancer_g3/S2Omics_output/image_files/cluster_image_num_clusters_20.jpg
[Figures: joint histology segmentation images (20 clusters) for breast_cancer_g1, breast_cancer_g2 and breast_cancer_g3]
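Clustering the three sections jointly, rather than one at a time, is what gives a histology cluster the same label in every section. The log shows harmonypy (Harmony batch correction) running on the pooled embeddings before k-means.

The sketch below omits the Harmony step and uses a plain NumPy k-means with farthest-point initialization. It is a stand-in, not S2Omics' implementation. The z-scoring and the function name joint_kmeans are assumptions.

```python
import numpy as np


def joint_kmeans(embeddings_per_section, n_clusters, n_iter=100, seed=0):
    """Cluster pooled superpixel embeddings, then split labels per section.

    Hypothetical stand-in: no Harmony correction, plain Lloyd's k-means.
    """
    sizes = [e.shape[0] for e in embeddings_per_section]
    x = np.vstack(embeddings_per_section).astype(float)
    # Assumption: z-score every feature over the pooled superpixels.
    x = (x - x.mean(axis=0)) / (x.std(axis=0) + 1e-8)

    # Farthest-point initialization: a random first center, then the
    # point farthest from all centers chosen so far.
    rng = np.random.default_rng(seed)
    centers = [x[rng.integers(len(x))]]
    for _ in range(1, n_clusters):
        d2 = np.min([((x - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(x[np.argmax(d2)])
    centers = np.array(centers)

    for _ in range(n_iter):
        # Squared distances via |x|^2 - 2 x.c + |c|^2 (no n x k x d array).
        d2 = ((x ** 2).sum(axis=1)[:, None] - 2 * x @ centers.T
              + (centers ** 2).sum(axis=1)[None, :])
        labels = d2.argmin(axis=1)
        # Move each center to the mean of its members; keep empty ones.
        new_centers = np.array([
            x[labels == k].mean(axis=0) if np.any(labels == k) else centers[k]
            for k in range(n_clusters)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers

    # Section i gets the labels of its own superpixels, in input order.
    return np.split(labels, np.cumsum(sizes)[:-1])


# Synthetic data: two tissue-like groups, split evenly across three sections.
rng = np.random.default_rng(1)
a = rng.normal(0.0, 0.1, size=(30, 8))
b = rng.normal(5.0, 0.1, size=(30, 8))
sections = [np.vstack([a[i * 10:(i + 1) * 10], b[i * 10:(i + 1) * 10]])
            for i in range(3)]
labels = joint_kmeans(sections, n_clusters=2)
print([l.tolist() for l in labels])
```

Because all sections share one set of centers, the first 10 superpixels of every section (group a) receive the same label. If each section were clustered separately, the label numbers could be permuted from section to section.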

Step 5: Select the best ROI for the spatial omics experiment

[5]:
from s2omics.multiple_sections.p5_roi_selection_rectangle import roi_selection_for_multiple_sections

# fusion_weights: weights of the three component scores (scale, coverage and balance), default = [0.33,0.33,0.33]; they should sum to 1 and are normalized if they do not
# emphasize_clusters, discard_clusters: prior knowledge of histology clusters of interest and of no interest, default = [], []
# prior_preference: the larger this value, the more S2Omics focuses on the emphasized histology clusters, default = 1
roi_selection_for_multiple_sections(prefix_list, save_folder_list,
                                    down_samp_step=10,
                                    roi_size=[1.5,1.5],
                                    rotation_seg=6,
                                    num_roi=1, # 0 means the number of ROIs is determined automatically
                                    fusion_weights=[0.33,0.33,0.33],
                                    emphasize_clusters=[], discard_clusters=[],
                                    prior_preference=1)
Pickle loaded from ../demo/Tutorial_3_Consecutive_ROI_selection_breast/breast_cancer_g1/S2Omics_output/pickle_files/shapes.pickle
Pickle loaded from ../demo/Tutorial_3_Consecutive_ROI_selection_breast/breast_cancer_g1/S2Omics_output/pickle_files/cluster_image.pickle
Pickle loaded from ../demo/Tutorial_3_Consecutive_ROI_selection_breast/breast_cancer_g2/S2Omics_output/pickle_files/shapes.pickle
Pickle loaded from ../demo/Tutorial_3_Consecutive_ROI_selection_breast/breast_cancer_g2/S2Omics_output/pickle_files/cluster_image.pickle
Pickle loaded from ../demo/Tutorial_3_Consecutive_ROI_selection_breast/breast_cancer_g3/S2Omics_output/pickle_files/shapes.pickle
Pickle loaded from ../demo/Tutorial_3_Consecutive_ROI_selection_breast/breast_cancer_g3/S2Omics_output/pickle_files/cluster_image.pickle
[(103, 119), (106, 119), (106, 119)]
Sampling ROI candidates...
100%|██████████| 3600/3600 [00:01<00:00, 3033.46it/s]
Current best ROI: [[[45, 31], [63, 31], [63, 49], [45, 49]]]
    roi score: 0.7813556906243225
    scale score: 0.5218310997663257
    valid score: 0.9984555975339682
    balance score: 0.915561715045757
Current number of ROIs is 1.
Find the best 1 ROI(s) with:
    ROI score: 0.7813556906243225
    Scale score: 0.5218310997663257
    Coverage score: 0.9984555975339682
    Balance score: 0.915561715045757

../demo/Tutorial_3_Consecutive_ROI_selection_breast/breast_cancer_g1/S2Omics_output/roi_selection_detailed_output/circle_roi_size_1.5_1.5/prior_preference_1/best_roi.pickle
Best ROI on histology segmentation image is stored at ../demo/Tutorial_3_Consecutive_ROI_selection_breast/breast_cancer_g1/S2Omics_output/main_output/best_roi_on_histology_segmentations.jpg
[Figure: best ROI shown on the histology segmentation images]
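The Step 5 log supports two readings. Both are inferred from the log, and the helper names in the sketch below are hypothetical.

ROI score. The printed value is consistent with a weighted geometric mean of the scale, coverage and balance scores, with fusion_weights normalized to sum to 1:

(0.5218 × 0.9985 × 0.9156)^(1/3) ≈ 0.7814

The "valid score" printed during the search and the "Coverage score" in the summary carry the same value.

ROI size. Each side of the best ROI spans 18 down-sampled grid units (45 to 63 and 31 to 49). This matches 1.5 mm if roi_size is in millimeters and one grid unit is 10 superpixels of 8 microns: 1500 / 80 = 18.75, rounded down to 18.

The sketch reproduces both numbers.

```python
import math


def fuse_scores(scores, weights):
    """Weighted geometric mean; weights are normalized to sum to 1.

    Inferred from the log: reproduces the printed ROI score.
    """
    total = sum(weights)
    return math.prod(s ** (w / total) for s, w in zip(scores, weights))


def roi_side_in_grid(side_mm, superpixel_um=8.0, down_samp_step=10):
    """ROI side in down-sampled grid units, assuming roi_size is in mm."""
    return math.floor(side_mm * 1000 / (superpixel_um * down_samp_step))


scale, coverage, balance = 0.5218310997663257, 0.9984555975339682, 0.915561715045757
print(fuse_scores([scale, coverage, balance], [0.33, 0.33, 0.33]))  # ~0.78136
print(roi_side_in_grid(1.5))  # 18
```

A geometric mean lets a single very low component score, for example poor coverage, pull the fused score toward zero. An arithmetic mean would instead let the other two scores compensate.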