Scatter plot with regression line in seaborn

There are two functions in seaborn to create a scatter plot with a regression line: regplot and lmplot. Although these functions are very similar, they have some differences. lmplot creates its own figure (square by default), requires the data argument and can fit separate regressions for groups of observations, while regplot draws on a single set of axes, doesn't require the data argument and has no grouping options, among other differences.
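One practical consequence of these differences is the object each function returns. The following is a minimal sketch, assuming a small simulated data frame (the names demo, returned_ax and grid are only illustrative): regplot draws on the Axes you give it and returns that Axes, while lmplot builds a new figure and returns a FacetGrid.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Small illustrative data set (not the tutorial data)
rng = np.random.RandomState(0)
demo = pd.DataFrame({"x": rng.uniform(0, 1, 50)})
demo["y"] = 5 * demo["x"] + rng.normal(0, 2, size=50)

# regplot is axes-level: it draws on an existing Axes and returns it
fig, ax = plt.subplots(figsize=(6, 3))
returned_ax = sns.regplot(x="x", y="y", data=demo, ax=ax)

# lmplot is figure-level: it creates its own figure and returns a FacetGrid
grid = sns.lmplot(x="x", y="y", data=demo)

print(type(returned_ax).__name__)
print(type(grid).__name__)
```

This is why regplot is usually the better fit when you want to place the plot inside a figure you already manage, and lmplot when you want seaborn to handle grouping and faceting for you.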

Sample data

The data below will be used in the examples of this tutorial. Note that the lmplot function requires a pandas data frame passed to the data argument, while regplot can also be used with plain arrays, without setting the data argument.

import numpy as np
import pandas as pd

# Seeded random number generator, so the data is reproducible
rng = np.random.RandomState(0)

# Data simulation
x = rng.uniform(0, 1, 300)
y = 5 * x + rng.normal(0, 2, size = 300)
x = x + rng.uniform(-0.2, 0.2, 300) # Noise added to x after y is computed
group = rng.choice(["A", "B"], size = 300) # Drawn from rng so the groups are reproducible too

# Data set
df = {'x': x, 'y': y, 'group': group}

# Pandas data frame
df = pd.DataFrame(data = df)

Single regression model with regplot

To create a scatter plot with a regression line in seaborn, pass your data to the regplot function. Note that the points, the fitted line and its confidence band are all drawn in blue by default.

import seaborn as sns

sns.regplot(x = x, y = y)

# Equivalent to:
sns.regplot(x = "x", y = "y", data = df)

Scatter plot in seaborn with regression line
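regplot draws the fitted line but does not return its coefficients. If you need the slope and intercept, one option is to compute them yourself. The following is a minimal sketch, assuming the default straight-line fit, which is an ordinary least squares regression and can be reproduced with np.polyfit of degree 1:

```python
import numpy as np

# Same simulation as the sample data (x and y only)
rng = np.random.RandomState(0)
x = rng.uniform(0, 1, 300)
y = 5 * x + rng.normal(0, 2, size=300)
x = x + rng.uniform(-0.2, 0.2, 300)

# Degree-1 polynomial fit: returns the slope first, then the intercept
slope, intercept = np.polyfit(x, y, deg=1)

print(f"y = {intercept:.2f} + {slope:.2f} * x")
```

The values are estimated from the noisy x, so the slope will generally not match the 5 used in the simulation.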

Different colors for points and line

If you need to modify the default colors, pass dictionaries to the scatter_kws and line_kws arguments: scatter_kws styles the points and line_kws styles the regression line. The confidence band takes the color of the line, as shown in the following example.

import seaborn as sns

sns.regplot(x = x, y = y,
            scatter_kws = {"color": "black", "alpha": 0.5},
            line_kws = {"color": "red"})

Different colors for points and line in seaborn regplot

Confidence interval level

Note that the confidence interval is drawn at the 95% level by default. You can use the ci argument to change the level of confidence, or remove the interval by setting the argument to None.

import seaborn as sns

sns.regplot(x = x, y = y,
            scatter_kws = {"color": "black", "alpha": 0.5},
            line_kws = {"color": "red"},
            ci = 99) # 99% level

Setting the confidence interval level of a regression line in seaborn regplot
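seaborn estimates this interval by bootstrapping: the data is resampled many times, the line is refitted each time, and percentiles of the refitted lines form the band. The following is a rough sketch of that idea with numpy, not seaborn's actual implementation. The bootstrap_band helper and its arguments are hypothetical names used only for illustration.

```python
import numpy as np

rng = np.random.RandomState(0)
x = rng.uniform(0, 1, 300)
y = 5 * x + rng.normal(0, 2, size=300)


def bootstrap_band(x, y, grid, ci=95, n_boot=1000, seed=0):
    """Percentile bootstrap band for a straight-line fit evaluated on grid."""
    boot_rng = np.random.RandomState(seed)
    n = len(x)
    fits = np.empty((n_boot, len(grid)))
    for i in range(n_boot):
        idx = boot_rng.randint(0, n, size=n)  # resample rows with replacement
        slope, intercept = np.polyfit(x[idx], y[idx], deg=1)
        fits[i] = intercept + slope * grid
    tail = (100 - ci) / 2
    lower, upper = np.percentile(fits, [tail, 100 - tail], axis=0)
    return lower, upper


grid = np.linspace(x.min(), x.max(), 100)
lo95, hi95 = bootstrap_band(x, y, grid, ci=95)
lo99, hi99 = bootstrap_band(x, y, grid, ci=99)

# Average width of each band: the 99% band is the wider one
print((hi95 - lo95).mean(), (hi99 - lo99).mean())
```

Raising ci from 95 to 99 keeps more of the resampled lines inside the band, which is why the shaded area gets wider.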

Regression lines by group with lmplot

Regression lines based on a categorical variable with lmplot in seaborn

The lmplot function can fit a separate regression line for each level of a categorical variable. You just need to pass the name of the variable to the hue argument of the function. Note that this function requires the data argument with a pandas data frame as input.

import seaborn as sns

sns.lmplot(x = "x", y = "y",
           hue = "group", data = df)

seaborn lmplot function
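With hue, each group gets its own fit. If you also want the numbers behind each line, you can fit the groups yourself. The following is a minimal sketch, assuming a data frame built like the sample data and a straight-line least squares fit per group (the coefs dictionary is just an illustrative name):

```python
import numpy as np
import pandas as pd

# Data frame built like the sample data
rng = np.random.RandomState(0)
x = rng.uniform(0, 1, 300)
y = 5 * x + rng.normal(0, 2, size=300)
x = x + rng.uniform(-0.2, 0.2, 300)
group = rng.choice(["A", "B"], size=300)
df = pd.DataFrame({"x": x, "y": y, "group": group})

# One straight-line fit per level of the grouping column;
# np.polyfit returns the slope first, then the intercept
coefs = {
    name: np.polyfit(part["x"], part["y"], deg=1)
    for name, part in df.groupby("group")
}

for name, (slope, intercept) in coefs.items():
    print(f"group {name}: y = {intercept:.2f} + {slope:.2f} * x")
```

Since both groups come from the same simulation, the two lines should be similar and differ only because of sampling noise.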

Different marker for each group

The markers argument allows customizing the shape of the symbols of the plot. Pass a list with one marker per level of the hue variable, as shown below.

import seaborn as sns

sns.lmplot(x = "x", y = "y",
           hue = "group", markers = ["s", "x"],
           data = df)

Scatter plot with two regression lines in Python with seaborn

Color palette

Note that you can override the default color palette with the palette argument of the function.

import seaborn as sns

sns.lmplot(x = "x", y = "y",
           hue = "group", markers = ["s", "x"],
           palette = "Set2",
           data = df)

Scatter plot in Python with regression lines in two columns

Plot across different columns

In the previous plots both regression lines were displayed on the same axes. If you prefer plotting each group in its own panel, you can pass the categorical variable to the col argument of the function.

import seaborn as sns

sns.lmplot(x = "x", y = "y",
           col = "group", hue = "group",
           data = df)
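
Because lmplot returns a FacetGrid, you can keep working with the panels after the plot is created. The following is a minimal sketch, assuming a data frame built like the sample data. The height and aspect values and the title template are illustrative choices, not recommendations.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import numpy as np
import pandas as pd
import seaborn as sns

# Data frame built like the sample data
rng = np.random.RandomState(0)
x = rng.uniform(0, 1, 300)
y = 5 * x + rng.normal(0, 2, size=300)
x = x + rng.uniform(-0.2, 0.2, 300)
group = rng.choice(["A", "B"], size=300)
df = pd.DataFrame({"x": x, "y": y, "group": group})

# height is the height of each panel in inches, aspect is its width/height ratio
g = sns.lmplot(x="x", y="y",
               col="group", hue="group",
               height=4, aspect=1.2,
               data=df)

# The returned FacetGrid exposes the panels and some helper methods
g.set_titles("Group {col_name}")
g.set_axis_labels("x value", "y value")
print(g.axes.shape)  # one row of panels, one column per group
```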