clustermap
The clustermap function is very similar to the heatmap function. The main difference is that clustermap also computes a hierarchical clustering of the rows and the columns of the data, reorders them accordingly and draws the resulting dendrograms next to the heat map.
import numpy as np
import seaborn as sns
# Data simulation
np.random.seed(2)
data = np.random.rand(6, 6)
sns.clustermap(data)
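To see how the clustering reordered the data, the object returned by clustermap can be inspected. The following is a minimal sketch, assuming a seaborn version where the returned ClusterGrid exposes dendrogram_row.reordered_ind and dendrogram_col.reordered_ind. The Agg backend line is only there so the example runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; not needed in a notebook
import numpy as np
import seaborn as sns

np.random.seed(2)
data = np.random.rand(6, 6)

g = sns.clustermap(data)

# Order in which the original rows and columns appear in the clustered plot
row_order = g.dendrogram_row.reordered_ind
col_order = g.dendrogram_col.reordered_ind

# Applying the same ordering to the original array
reordered = data[np.ix_(row_order, col_order)]
print(row_order, col_order)
```

Each order is a permutation of the original indices, so no rows or columns are dropped, only rearranged.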
Standardize the data
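Note that you can standardize the data by rows (0) or by columns (1) with the standard_scale argument of the function. Standardizing here means subtracting the minimum of each row or column and dividing by its range, so the resulting values lie between 0 and 1.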
import numpy as np
import seaborn as sns
# Data simulation
np.random.seed(2)
data = np.random.rand(6, 6)
sns.clustermap(data, standard_scale = 1)
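As a rough illustration of what this rescaling does, the sketch below applies a min-max transformation per column with NumPy only. It assumes standard_scale = 1 corresponds to this computation, as described in the seaborn documentation, and is not the library's own code.

```python
import numpy as np

np.random.seed(2)
data = np.random.rand(6, 6)

# Rescale each column to the [0, 1] range (axis=0 reduces over the rows)
col_min = data.min(axis=0)
col_max = data.max(axis=0)
scaled = (data - col_min) / (col_max - col_min)

print(scaled.min(axis=0))  # smallest value of every column
print(scaled.max(axis=0))  # largest value of every column
```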
Normalize the data
Similarly, you can normalize the data within rows (0) or within columns (1) with the z_score argument, which subtracts the mean and divides by the standard deviation of each row or column.
import numpy as np
import seaborn as sns
# Data simulation
np.random.seed(2)
data = np.random.rand(6, 6)
sns.clustermap(data, z_score = 1)
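The z-score transformation can be sketched by hand with NumPy. This is an illustration for columns only, and it assumes the sample standard deviation (ddof = 1), which is the pandas default. The exact convention used internally may differ between versions.

```python
import numpy as np

np.random.seed(2)
data = np.random.rand(6, 6)

# Center each column on its mean and divide by its standard deviation
# (ddof=1 is an assumption of this sketch)
col_mean = data.mean(axis=0)
col_std = data.std(axis=0, ddof=1)
z = (data - col_mean) / col_std

print(z.mean(axis=0).round(6))
```

After the transformation every column has a mean of approximately zero and unit standard deviation, so columns measured on different scales contribute comparably to the clustering.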
Figure size
Note that you can customize the size of the figure by passing a tuple to figsize, where the first value is the width and the second the height of the figure, both in inches. In addition, you can set the size of the dendrograms relative to the whole plot with dendrogram_ratio.
import numpy as np
import seaborn as sns
# Data simulation
np.random.seed(2)
data = np.random.rand(6, 6)
sns.clustermap(data,
               figsize = (8, 6),        # Figure size (width, height)
               dendrogram_ratio = 0.1)  # Size proportion of the dendrograms
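According to the seaborn documentation, dendrogram_ratio also accepts a pair of values for the row and column dendrograms. The sketch below uses that form and reads the figure size back from the heat map axes. The 0.1 and 0.2 proportions are arbitrary values chosen for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; not needed in a notebook
import numpy as np
import seaborn as sns

np.random.seed(2)
data = np.random.rand(6, 6)

g = sns.clustermap(data,
                   figsize=(8, 6),
                   dendrogram_ratio=(0.1, 0.2))  # (row, column) proportions

# Read the figure size back, in inches
width, height = g.ax_heatmap.figure.get_size_inches()
print(width, height)
```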
The method used to compute the hierarchical clustering can be selected with the method argument. Possible options are "single", "complete", "average" (default), "weighted", "centroid", "median" and "ward". The following blocks of code show a couple of examples. See the SciPy documentation of scipy.cluster.hierarchy.linkage for additional details on each method.
“Single” clustering method
import numpy as np
import seaborn as sns
# Data simulation
np.random.seed(2)
data = np.random.rand(6, 6)
sns.clustermap(data, method = "single")
Ward clustering method
import numpy as np
import seaborn as sns
# Data simulation
np.random.seed(2)
data = np.random.rand(6, 6)
sns.clustermap(data, method = "ward")
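To compare linkage methods without drawing anything, the merge distances can be computed directly with SciPy, which seaborn relies on for the clustering. This is a sketch on the rows of the same simulated data, not a reproduction of what clustermap does internally.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage

np.random.seed(2)
data = np.random.rand(6, 6)

linkages = {}
for method in ["single", "average", "ward"]:
    # Each row of Z describes one merge:
    # (cluster a, cluster b, merge distance, size of the new cluster)
    Z = linkage(data, method=method, metric="euclidean")
    linkages[method] = Z
    print(method, Z[:, 2].round(3))
```

With 6 observations there are 5 merges per method. The third column shows how the merge heights, and therefore the dendrogram shape, change with the method.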
The distance metric, set with the metric argument, is used to compute the pairwise distances between observations. Possible metrics are "braycurtis", "canberra", "chebyshev", "cityblock", "correlation", "cosine", "dice", "euclidean" (default), "hamming", "jaccard", "jensenshannon", "kulsinski", "mahalanobis", "matching", "minkowski", "rogerstanimoto", "russellrao", "seuclidean", "sokalmichener", "sokalsneath", "sqeuclidean" and "yule", although the exact set depends on the installed SciPy version. You will find more information about each metric in the SciPy documentation of scipy.spatial.distance.pdist.
Canberra metric
import numpy as np
import seaborn as sns
# Data simulation
np.random.seed(2)
data = np.random.rand(6, 6)
sns.clustermap(data, metric = "canberra")
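As an illustration of what a metric computes, the sketch below obtains the pairwise Canberra distances between rows with SciPy and checks one of them against the definition, the sum of |u - v| / (|u| + |v|) over the coordinates.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

np.random.seed(2)
data = np.random.rand(6, 6)

# Pairwise Canberra distances between the rows, as a square matrix
dist = squareform(pdist(data, metric="canberra"))

# The same distance for rows 0 and 1, written out from the definition
u, v = data[0], data[1]
manual = np.sum(np.abs(u - v) / (np.abs(u) + np.abs(v)))

print(dist[0, 1], manual)
```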
Color palette
The cmap argument can be used to change the color palette of the clustering heat map.
import numpy as np
import seaborn as sns
# Data simulation
np.random.seed(2)
data = np.random.rand(6, 6)
sns.clustermap(data, cmap = "vlag")
Limits of the color range
Note that you can change the limits of the color range with vmin and vmax.
import numpy as np
import seaborn as sns
# Data simulation
np.random.seed(2)
data = np.random.rand(6, 6)
sns.clustermap(data, cmap = "mako",
               vmin = -1, vmax = 1)
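The limits that were applied can be read back from the plot. This sketch assumes the heat map is the first collection on the ax_heatmap axes, which may vary across seaborn versions. Because the simulated values lie between 0 and 1, limits of -1 and 1 leave the lower half of the palette unused.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; not needed in a notebook
import numpy as np
import seaborn as sns

np.random.seed(2)
data = np.random.rand(6, 6)

g = sns.clustermap(data, cmap="mako", vmin=-1, vmax=1)

# Assumption: the heat map mesh is the first collection on the heat map axes
mesh = g.ax_heatmap.collections[0]
low, high = mesh.get_clim()
print(low, high)
```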