Ridgeline plots (joy plots) in matplotlib with joypy

Library

joypy

Author

Leonardo Taccari

Sample data

The joypy library takes a pandas data frame as input, with one or more numerical variables and, optionally, a categorical variable that defines groups. The following block of code creates a sample data frame with this structure.

import matplotlib.pyplot as plt
import pandas as pd
import numpy as np; np.random.seed(2)
import random; random.seed(2)
import joypy

# Sample data
df = pd.DataFrame({'var1': np.random.normal(70, 100, 500),
                   'var2': np.random.normal(250, 100, 500),
                   'group': random.choices(["G1", "G2", "G3", "G4", "G5"], k = 500)})
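
Before plotting, it can help to confirm which columns pandas treats as numerical, since those are the ones that end up as ridges by default. The following is a minimal sketch that uses only pandas and NumPy; the names numeric_cols and other_cols are illustrative and are not part of joypy.

```python
import random

import numpy as np
import pandas as pd

np.random.seed(2)
random.seed(2)

# Same sample data as above
df = pd.DataFrame({'var1': np.random.normal(70, 100, 500),
                   'var2': np.random.normal(250, 100, 500),
                   'group': random.choices(["G1", "G2", "G3", "G4", "G5"], k = 500)})

# Columns holding numbers (candidates for density ridges)
numeric_cols = df.select_dtypes(include = "number").columns.tolist()

# Remaining columns (here, the categorical grouping variable)
other_cols = df.select_dtypes(exclude = "number").columns.tolist()

print(numeric_cols)  # ['var1', 'var2']
print(other_cols)    # ['group']
```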

Ridgeline plots with the joyplot function

By default, if you pass a pandas data frame as input, the joyplot function will create a ridgeline plot of the numerical variables. That is, it will show stacked density charts, one for each numerical variable of the data frame.

import matplotlib.pyplot as plt
import pandas as pd
import numpy as np; np.random.seed(2)
import random; random.seed(2)
import joypy

# Sample data
df = pd.DataFrame({'var1': np.random.normal(70, 100, 500),
                   'var2': np.random.normal(250, 100, 500),
                   'group': random.choices(["G1", "G2", "G3", "G4", "G5"], k = 500)})

fig, ax = joypy.joyplot(df)

# plt.show()

Joy plot in Python with joypy

Ridgeline plot by group

However, you might also want to create a joy plot of a single variable divided by group. In this scenario there will be one density per group, each representing the distribution of the variable within that group. For this purpose you will need to specify the name of the categorical variable with by and the name of the numerical variable (or variables) with column.

import matplotlib.pyplot as plt
import pandas as pd
import numpy as np; np.random.seed(2)
import random; random.seed(2)
import joypy

# Sample data
df = pd.DataFrame({'var1': np.random.normal(70, 100, 500),
                   'var2': np.random.normal(250, 100, 500),
                   'group': random.choices(["G1", "G2", "G3", "G4", "G5"], k = 500)})

fig, ax = joypy.joyplot(df, by = "group", column = "var1")

# plt.show()

Ridgeline plot in matplotlib with joypy
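
To cross-check what the ridges by group show, you can compute a numerical summary of the same variable for each group. This is a small sketch with pandas only, independent of joypy; the name summary is illustrative.

```python
import random

import numpy as np
import pandas as pd

np.random.seed(2)
random.seed(2)

# Same sample data as above
df = pd.DataFrame({'var1': np.random.normal(70, 100, 500),
                   'var2': np.random.normal(250, 100, 500),
                   'group': random.choices(["G1", "G2", "G3", "G4", "G5"], k = 500)})

# One row per group: number of observations, mean and standard deviation of var1
summary = df.groupby("group")["var1"].agg(["count", "mean", "std"])

print(summary)
```

Groups with few observations tend to produce less reliable density estimates, so the count column is worth checking before reading too much into the shape of a ridge.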

Ridgeline plot for each variable and group

The last alternative is to create a ridgeline plot that displays the density for each variable and group, so each group will have as many densities as numerical variables. You can achieve this by specifying all the desired numerical variables with column (e.g. column = ["var1", "var2"]) or by using only by, as in the example below.

import matplotlib.pyplot as plt
import pandas as pd
import numpy as np; np.random.seed(2)
import random; random.seed(2)
import joypy

# Sample data
df = pd.DataFrame({'var1': np.random.normal(70, 100, 500),
                   'var2': np.random.normal(250, 100, 500),
                   'group': random.choices(["G1", "G2", "G3", "G4", "G5"], k = 500)})

fig, ax = joypy.joyplot(df, by = "group")

# plt.show()

Ridgeline plot of several variables in Python

Title of the plot

The joyplot function also provides other arguments to customize the visual appearance of the plots. For instance, with title you can add a title to the figure.

import matplotlib.pyplot as plt
import pandas as pd
import numpy as np; np.random.seed(2)
import random; random.seed(2)
import joypy

# Sample data
df = pd.DataFrame({'var1': np.random.normal(70, 100, 500),
                   'var2': np.random.normal(250, 100, 500),
                   'group': random.choices(["G1", "G2", "G3", "G4", "G5"], k = 500)})

fig, ax = joypy.joyplot(df, title = "Title of the joy plot")

# plt.show()

The joypy Python library

Joy plot with legend

If you want a legend that identifies the different variables, set legend = True to add one automatically.

import matplotlib.pyplot as plt
import pandas as pd
import numpy as np; np.random.seed(2)
import random; random.seed(2)
import joypy

# Sample data
df = pd.DataFrame({'var1': np.random.normal(70, 100, 500),
                   'var2': np.random.normal(250, 100, 500),
                   'group': random.choices(["G1", "G2", "G3", "G4", "G5"], k = 500)})

fig, ax = joypy.joyplot(df, by = "group", legend = True)

# plt.show()

Adding a legend to a Python ridgeline plot

Joy plot with histograms

Ridgeline plots can also display histograms instead of density estimates. Set hist = True and, optionally, specify the number of bins with bins, which defaults to 10.

import matplotlib.pyplot as plt
import pandas as pd
import numpy as np; np.random.seed(2)
import random; random.seed(2)
import joypy

# Sample data
df = pd.DataFrame({'var1': np.random.normal(70, 100, 500),
                   'var2': np.random.normal(250, 100, 500),
                   'group': random.choices(["G1", "G2", "G3", "G4", "G5"], k = 500)})

fig, ax = joypy.joyplot(df, by = "group", column = "var1",
                        hist = True, bins = 50)

# plt.show()

Ridgeline plot with histograms in matplotlib

Adding a grid

The grid argument defaults to False and can be used to add a vertical grid when set to True. A grid makes it easier to compare the distributions of the different groups.

import matplotlib.pyplot as plt
import pandas as pd
import numpy as np; np.random.seed(2)
import random; random.seed(2)
import joypy

# Sample data
df = pd.DataFrame({'var1': np.random.normal(70, 100, 500),
                   'var2': np.random.normal(250, 100, 500),
                   'group': random.choices(["G1", "G2", "G3", "G4", "G5"], k = 500)})

fig, ax = joypy.joyplot(df, by = "group", column = "var1", grid = True)

# plt.show()

Adding a grid to a Python joyplot

Type of density

The kind argument sets the type of curve drawn for each ridge. Possible values are "kde" (the default), "counts", "normalized_counts" and "values". The example below uses "counts".

import matplotlib.pyplot as plt
import pandas as pd
import numpy as np; np.random.seed(2)
import random; random.seed(2)
import joypy

# Sample data
df = pd.DataFrame({'var1': np.random.normal(70, 100, 500),
                   'var2': np.random.normal(250, 100, 500),
                   'group': random.choices(["G1", "G2", "G3", "G4", "G5"], k = 500)})

fig, ax = joypy.joyplot(df, by = "group", column = "var1",
                        kind = "counts")

# plt.show()

Kernel density estimate of a ridgeline plot in matplotlib
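
As a rough illustration of how binned counts differ from normalized counts, the sketch below computes both with NumPy. It is an assumption about what these two modes represent conceptually, not a copy of joypy's internal code, and the variable names are illustrative.

```python
import numpy as np

np.random.seed(2)
values = np.random.normal(70, 100, 500)

# Raw counts: number of observations in each of 10 equal-width bins
counts, edges = np.histogram(values, bins = 10)

# Normalized counts: share of all observations that falls in each bin
normalized = counts / counts.sum()

# Bin centers, convenient as x positions when drawing a line through the bins
centers = (edges[:-1] + edges[1:]) / 2

print(counts)
print(normalized.round(3))
```

Raw counts depend on the size of each group, while normalized counts put groups of different sizes on a comparable scale.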

Border and fill colors

joypy also provides several ways to customize the visual appearance of the plots. If you want to change the default blue color of the densities you can specify a new color with color.

Color of the densities of a ridgeline plot in Python

import matplotlib.pyplot as plt
import pandas as pd
import numpy as np; np.random.seed(2)
import random; random.seed(2)
import joypy

# Sample data
df = pd.DataFrame({'var1': np.random.normal(70, 100, 500),
                   'var2': np.random.normal(250, 100, 500),
                   'group': random.choices(["G1", "G2", "G3", "G4", "G5"], k = 500)})

fig, ax = joypy.joyplot(df, by = "group", column = "var1", color = "darkseagreen")

# plt.show()

Different color for each group

Note that you can also pass an array of colors to the color argument, with as many colors as groups.

Different color for each group of a ridgeline plot in matplotlib

import matplotlib.pyplot as plt
import pandas as pd
import numpy as np; np.random.seed(2)
import random; random.seed(2)
import joypy

# Sample data
df = pd.DataFrame({'var1': np.random.normal(70, 100, 500),
                   'var2': np.random.normal(250, 100, 500),
                   'group': random.choices(["G1", "G2", "G3", "G4", "G5"], k = 500)})
colors = ["#FDAE61", "#FEE08B", "#FFFFBF", "#E6F598", "#ABDDA4"]

fig, ax = joypy.joyplot(df, by = "group", column = "var1", color = colors)

# plt.show()

Using color palettes

An alternative to the previous approach is to use one of the predefined matplotlib color palettes with colormap. Note that you will need to import cm from matplotlib with from matplotlib import cm.

Using a matplotlib color palette to change the color of the ridges of a joyplot

import matplotlib.pyplot as plt
from matplotlib import cm
import pandas as pd
import numpy as np; np.random.seed(2)
import random; random.seed(2)
import joypy

# Sample data
df = pd.DataFrame({'var1': np.random.normal(70, 100, 500),
                   'var2': np.random.normal(250, 100, 500),
                   'group': random.choices(["G1", "G2", "G3", "G4", "G5"], k = 500)})

fig, ax = joypy.joyplot(df, by = "group", column = "var1", colormap = cm.Pastel1)

# plt.show()
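
If you prefer to control exactly which colors of a palette are used, one option is to sample the palette yourself and build a list of colors, one per group. The sketch below does this with matplotlib only; n_groups and colors are illustrative names, and the resulting list is intended for the color argument described in the previous section.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so the snippet also runs without a display
import matplotlib.pyplot as plt
from matplotlib.colors import to_hex
import numpy as np

n_groups = 5  # assumed number of groups, as in the sample data

# Look up the palette by name and sample it at evenly spaced positions
cmap = plt.get_cmap("Pastel1")
colors = [to_hex(cmap(x)) for x in np.linspace(0, 1, n_groups)]

print(colors)
```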

Color transparency

Sometimes the densities overlap each other. If this happens, you can set fade = True so that each density is more opaque than the previous one (from top to bottom), which makes overlapping densities easier to see.

Change the opacity of the color of the density ridges with joypy

import matplotlib.pyplot as plt
import pandas as pd
import numpy as np; np.random.seed(2)
import random; random.seed(2)
import joypy

# Sample data
df = pd.DataFrame({'var1': np.random.normal(70, 100, 500),
                   'var2': np.random.normal(250, 100, 500),
                   'group': random.choices(["G1", "G2", "G3", "G4", "G5"], k = 500)})

fig, ax = joypy.joyplot(df, by = "group", column = "var1", fade = True)

# plt.show()

Color of the lines

The density lines are black by default, but with linecolor you can choose the color you prefer. In the following example we are setting it to white.

Color of the density lines of a ridgeline plot made with joypy

import matplotlib.pyplot as plt
import pandas as pd
import numpy as np; np.random.seed(2)
import random; random.seed(2)
import joypy

# Sample data
df = pd.DataFrame({'var1': np.random.normal(70, 100, 500),
                   'var2': np.random.normal(250, 100, 500),
                   'group': random.choices(["G1", "G2", "G3", "G4", "G5"], k = 500)})

fig, ax = joypy.joyplot(df, by = "group", column = "var1", linecolor = "white")

# plt.show()

Remove the fill color

Note that you can also remove the filled area of the densities and leave just the line by setting fill = False.

Remove the fill color of the densities of the Python joy plot

import matplotlib.pyplot as plt
import pandas as pd
import numpy as np; np.random.seed(2)
import random; random.seed(2)
import joypy

# Sample data
df = pd.DataFrame({'var1': np.random.normal(70, 100, 500),
                   'var2': np.random.normal(250, 100, 500),
                   'group': random.choices(["G1", "G2", "G3", "G4", "G5"], k = 500)})

fig, ax = joypy.joyplot(df, by = "group", column = "var1", fill = False)

# plt.show()

Background color

Finally, you can also customize the background color of the plot with background, which defaults to None.

Change the background color of a joypy plot

import matplotlib.pyplot as plt
import pandas as pd
import numpy as np; np.random.seed(2)
import random; random.seed(2)
import joypy

# Sample data
df = pd.DataFrame({'var1': np.random.normal(70, 100, 500),
                   'var2': np.random.normal(250, 100, 500),
                   'group': random.choices(["G1", "G2", "G3", "G4", "G5"], k = 500)})

fig, ax = joypy.joyplot(df, background = "lavender")

# plt.show()