Supplementary website for "Identification of spatially associated subpopulations by combining scRNAseq and sequential fluorescence in situ hybridization data" (Doi: 10.1038/nbt.4260)


Running HMRF analysis locally: software installation and usage guide


Install the R smfishhmrf package, which has the core HMRF implementation.
install_bitbucket("qzhudfci/smfishhmrf-r", ref="master")
Install the Python smfishHmrf package, which is the Python interface for using smfishHmrf.
pip3 install --user smfishHmrf
Note: if upgrading from a previous installation:
pip3 install --user --upgrade --no-cache-dir --no-deps smfishHmrf


Input data

Download the input data files here. In general, we need 3 files:

fcortex.coordinates.txt (eg.) - Physical coordinates of single cells. Space-separated. Each row is a cell. Columns: 1. Cell index (starts at 1); 2. field (default 0 if single field); 3,4. X,Y coordinates.

fcortex.expression.txt (eg.) - Single-cell gene expression. Space-separated. Each row is a cell. Column 1: Cell index. Remaining columns: Genes (order of genes specified in genes file below). Values: gene expression. We already performed log2 expression, followed by row-wise, column-wise z-scoring in this file.

genes (eg.) - Gene names. Each row is a gene. Gene in nth row must corresponds to (n+1)th column in expression file. Thus this file specifies the gene names for the expression file.

Steps in Python

Import relevant libraries.
import sys
import math
import os
import numpy as np
import pandas as pd
import smfishHmrf.reader as reader
from smfishHmrf.HMRFInstance import HMRFInstance
from smfishHmrf.DatasetMatrix import DatasetMatrix, DatasetMatrixSingleField
import smfishHmrf.visualize as visualize
import smfishHmrf.spatial as spatial
Run the following code:
directory = "workdir"
# read input files
genes=reader.read_genes("%s/genes" % directory)
Xcen,field=reader.read_coordinates("%s/fcortex.coordinates.txt" % directory)
expr=reader.read_expression_matrix("%s/fcortex.expression.txt" % directory)

# create a DatasetMatrixSingleField instance with the input files
this_dset = DatasetMatrixSingleField(expr, genes, None, Xcen)

# compute neighbor graph (first test various distance cutoff: 0.3%, 0.5%, 1% of lowest pairwise distances)
this_dset.test_adjacency_list([0.3, 0.5, 1], metric="euclidean")
# use cutoff of 0.3% (gives ~5 neighbors/cell)
this_dset.calc_neighbor_graph(0.3, metric="euclidean")

# read the HMRF genes (69-gene). Then subset the data to these genes.
new_genes = reader.read_genes("%s/HMRF.genes" % directory)
new_dset = this_dset.subset_genes(new_genes)

outdir = "spatial.jul20"

# This runs HMRF for a fixed K, multiple beta settings.
# K=9, beta=(0,0.5,30) which means start_beta=0, beta_incr=0.5, num_betas_to_run=30. 
this_hmrf = HMRFInstance("cortex", outdir, new_dset, 9, (0, 0.5, 30), tolerance=1e-20)
this_hmrf.init(nstart=1000, seed=-1)

visualize.domain(this_hmrf, 9, 9.0, dot_size=45, size_factor=10, outfile="visualize.beta.%.1f.png" % 9.0)


This website is created by Arpan Sarkar, Qian Zhu in Guo-Cheng Yuan's lab.