Usage
=====

Overview
--------

S2Omics is organized as a modular pipeline:

1. **Histology Preprocessing (p1)** – Scale and pad raw H&E-stained images.
2. **Superpixel Quality Control (p2)** – Split the image into superpixels and remove low-quality tiles.
3. **Feature Extraction (p3)** – Apply a foundation model to obtain deep embeddings of image patches.
4. **Histology Segmentation (p4)** – Cluster the embeddings into histology-based morphological clusters.
5. **Cluster Merging (p5)** – Merge highly similar clusters to reduce over-segmentation.
6. **ROI Selection (p6)** – Automatically select optimal regions of interest (ROIs) under the given constraints.
7. **Label Broadcasting (p7)** – Project cell type annotations from ROI-scale spatial omics data to the entire slide.

Installation
------------

We recommend Python 3.11 or later.

.. code-block:: bash

   git clone https://github.com/ddb-qiwang/S2Omics.git
   cd S2Omics
   conda create -n S2Omics python=3.11
   conda activate S2Omics
   pip install -r requirements.txt
   # If your GCC version is very old, install this file instead:
   # pip install -r requirements_old_gcc.txt

Demo Data and Models
--------------------

Download the demo data and model checkpoints from Google Drive:
https://drive.google.com/drive/folders/1z1nk0sF_e25LKMyHxJVMtROFjuWet2G_?usp=sharing

Place both the ``checkpoints`` and ``demo`` folders in the S2Omics root directory.

Running the Full ROI Selection Pipeline
---------------------------------------

Use ``run_roi_selection.sh`` to execute all steps p1–p6. For example, the demo script below selects one rectangular 6.5 mm × 6.5 mm ROI at a downsampling step of 10:

.. code-block:: bash

   chmod +x run_*
   ./run_roi_selection_demo.sh

Typical output:

.. image:: images/best_roi_on_histology_segmentations_scaled.jpg
   :alt: Best ROI example
   :width: 60%
   :align: center

This runs:

1. **p1_histology_preprocess.py**

   - Input: a folder containing ``he-raw.jpg`` and ``pixel-size-raw.txt``.
   - Output: the rescaled ``he-scaled.jpg`` and the padded ``he.jpg``.

2. **p2_superpixel_quality_control.py**

   - Splits ``he.jpg`` into superpixels of size ``patch_size`` (default 16×16 px).
   - Applies density filtering and texture analysis to remove low-quality tiles.

3. **p3_feature_extraction.py**

   - Loads a foundation model (UNI, Virchow, or GigaPath).
   - Extracts two-level embeddings: global (224×224) and local (patch-level).

4. **p4_get_histology_segmentation.py**

   - Clusters the PCA-reduced embeddings into morphological clusters.
   - Supported methods include k-means, fuzzy c-means, Louvain, and Leiden.

5. **p5_merge_over_clusters.py**

   - Merges highly similar clusters (based on hierarchical linkage) until the target cluster count is reached.

6. **p6_roi_selection_rectangle.py** / **p6_roi_selection_circle.py**

   - Uses a search and scoring system (scale, coverage, balance) to select ROIs, either automatically or for a given ``num_roi``.

Running Cell Label Broadcasting
-------------------------------

Prerequisite: ROI-level cell type annotations from spatial omics data (``annotation_file.csv``; see Input File Formats).

Example:

.. code-block:: bash

   ./run_label_broadcasting_demo.sh

Output example:

.. image:: images/S2Omics_whole_slide_prediction_scaled.jpg
   :alt: Whole slide cell type prediction
   :width: 60%
   :align: center

This runs p1–p7. The additional step, **p7_cell_label_broadcasting.py**, loads histology features for the ROI profiled by spatial omics and for the whole-slide image, trains a small autoencoder-based classifier on the ROI annotations, and predicts a cell type for every valid superpixel across the slide.

Input File Formats
------------------

- **he-raw.jpg** – raw histology image.
- **pixel-size-raw.txt** – microns per pixel of the raw image.
- **pixel-size.txt** – target microns per pixel after rescaling.
- **annotation_file.csv** (optional) – required only for label broadcasting. Required columns: ``super_pixel_x``, ``super_pixel_y``, ``annotation``.
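The two pixel-size files determine how much p1 rescales the raw image before padding it. The sketch below illustrates that arithmetic under stated assumptions: the scale factor is taken to be the ratio of raw to target microns per pixel, and padding is assumed (hypothetically) to round each side up to a multiple of the default 16 px ``patch_size``. The function names are illustrative and not part of S2Omics, and the real p1 script may round or pad differently.

```python
import math
from pathlib import Path


def read_pixel_size(path):
    """Read the single microns-per-pixel value stored in a file like pixel-size-raw.txt."""
    return float(Path(path).read_text().strip())


def scaled_shape(raw_shape, pixel_size_raw, pixel_size_target):
    """Return (height, width) after resampling to the target resolution.

    Each raw pixel spans pixel_size_raw microns and each target pixel spans
    pixel_size_target microns, so every side scales by their ratio.
    """
    factor = pixel_size_raw / pixel_size_target
    return tuple(round(side * factor) for side in raw_shape)


def padded_shape(shape, patch_size=16):
    """Round each side up to a multiple of patch_size (assumed padding rule)."""
    return tuple(math.ceil(side / patch_size) * patch_size for side in shape)


# Illustrative numbers only: a 10000 x 12000 px scan at 0.25 um/px, rescaled to 0.5 um/px.
scaled = scaled_shape((10000, 12000), pixel_size_raw=0.25, pixel_size_target=0.5)
print(scaled)                # (5000, 6000)
print(padded_shape(scaled))  # (5008, 6000)
```

Under this assumed rule, padding to a multiple of the patch size lets the padded image split into a whole number of superpixels in p2.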
Example ``annotation_file.csv`` contents (in addition to the required columns, this example includes a ``barcode`` column):

+---------+---------------+---------------+------------------------------+
| barcode | super_pixel_x | super_pixel_y | annotation                   |
+=========+===============+===============+==============================+
| s_xxx   | 267           | 1254          | Myofibroblasts               |
+---------+---------------+---------------+------------------------------+
| s_xxx   | 270           | 1254          | Epithelial cells (Malignant) |
+---------+---------------+---------------+------------------------------+
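Before running label broadcasting, it can help to check that an annotation file has the required columns. The following is a minimal validation sketch using only the Python standard library. It assumes that ``super_pixel_x`` and ``super_pixel_y`` hold integer coordinates, as in the example above. The helper ``load_annotations`` is hypothetical and is not how p7 reads the file.

```python
import csv
import io
from collections import Counter

REQUIRED_COLUMNS = ("super_pixel_x", "super_pixel_y", "annotation")


def load_annotations(file_obj):
    """Parse annotation_file.csv rows into {(x, y): cell_type}.

    Hypothetical helper: checks the required columns, ignores extra columns
    such as barcode, and rejects a superpixel listed twice with conflicting labels.
    Assumes super_pixel_x / super_pixel_y are integer coordinates.
    """
    reader = csv.DictReader(file_obj)
    missing = [c for c in REQUIRED_COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"missing required columns: {missing}")

    labels = {}
    for row in reader:
        key = (int(row["super_pixel_x"]), int(row["super_pixel_y"]))
        label = row["annotation"].strip()
        # A repeated superpixel is fine only if it carries the same label.
        if labels.get(key, label) != label:
            raise ValueError(f"conflicting labels for superpixel {key}")
        labels[key] = label
    return labels


# Usage with the two example rows from the table above.
example = io.StringIO(
    "barcode,super_pixel_x,super_pixel_y,annotation\n"
    "s_xxx,267,1254,Myofibroblasts\n"
    "s_xxx,270,1254,Epithelial cells (Malignant)\n"
)
labels = load_annotations(example)
print(labels[(267, 1254)])  # Myofibroblasts
print(Counter(labels.values()))  # number of annotated superpixels per cell type
```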