Messages About Removed Rows

Lets-Plot prints a short message above a plot when rows are dropped while the plot is being prepared. These messages help you notice when a plot was built from fewer rows than you supplied.

Rows can be removed by:

  • sampling — reducing an oversized dataset before drawing;
  • statistics — filtering out non-finite values before a statistical transform;
  • geoms — skipping rows with missing values.
In [8]:
import numpy as np
import pandas as pd

from lets_plot import *

LetsPlot.setup_html()
In [9]:
df = pd.DataFrame({
    "x": [1, 2, np.nan, 4, 5, np.nan, 7, 8],
    "y": [2, np.nan, 3, 4, 5, np.nan, 7, 8],
})

df_ridges = pd.DataFrame({
    "x": [1, 2, np.nan, 4, 1, 2, 3, np.nan],
    "y": [0, 0, 0, 0, 1, 1, 1, 1],
})

np.random.seed(42)
big_df = pd.DataFrame({
    "x": np.random.normal(size=75_000),
    "y": np.random.normal(size=75_000),
})

Sampling

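When a layer sets sampling=..., the message reports how many rows each sampling step dropped. The cell below chains two steps with +: sampling_random(500, seed=42) and sampling_systematic(100).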

In [3]:
ggplot(big_df, aes("x", "y")) + \
    geom_point(sampling=sampling_random(500, seed=42) + sampling_systematic(100))
Out[3]:
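
The row counts involved can be checked by hand. The sketch below uses only the Python standard library and does not call Lets-Plot. It assumes the random step keeps exactly 500 rows and that the systematic step keeps every fifth row of those; the step size is an assumption for illustration, and the library may choose its step differently.

```python
import random

total_rows = 75_000                 # size of big_df in this notebook
rows = list(range(total_rows))      # stand-in for the row indices

# Step 1: a random sample of 500 rows, analogous to sampling_random(500, seed=42).
rng = random.Random(42)
after_random = rng.sample(rows, 500)
dropped_by_random = total_rows - len(after_random)

# Step 2: a systematic sample that aims for 100 rows.
# Assumption: a fixed step of round(500 / 100) = 5, i.e. keep every fifth row.
step = round(len(after_random) / 100)
after_systematic = after_random[::step]
dropped_by_systematic = len(after_random) - len(after_systematic)

print(dropped_by_random, dropped_by_systematic)  # 74500 400
```

Under these assumptions the first step accounts for almost all of the removed rows, which is why a per-step count is more informative than a single total.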

Statistics

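A stat drops rows with non-finite values (NaN or infinite) before it computes anything. The message shows how many rows the stat removed. In the cell below, geom_area_ridges() computes its ridges from x, and the x column of df_ridges contains NaN values.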

In [4]:
ggplot(df_ridges, aes("x", "y")) + geom_area_ridges()
Out[4]:
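
To see how many rows are at stake, the non-finite x values in df_ridges can be counted directly. This is a minimal sketch in plain Python that copies the column values from the setup cell. It is not how Lets-Plot counts internally, and the exact wording of the message is not reproduced here.

```python
import math

nan = float("nan")

# Same values as the df_ridges columns defined earlier in this notebook.
ridges_x = [1, 2, nan, 4, 1, 2, 3, nan]
ridges_y = [0, 0, 0, 0, 1, 1, 1, 1]

# Count non-finite x values separately for each ridge (each distinct y).
non_finite_per_ridge = {}
for x, y in zip(ridges_x, ridges_y):
    non_finite_per_ridge.setdefault(y, 0)
    if not math.isfinite(x):
        non_finite_per_ridge[y] += 1

total_non_finite = sum(non_finite_per_ridge.values())

print(non_finite_per_ridge, total_non_finite)  # {0: 1, 1: 1} 2
```

Each ridge loses one of its four rows, so two of the eight rows cannot contribute to the density estimate.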

Geoms

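A layer with stat='identity' reports rows that have a missing positional value, such as x or y. geom_point() uses stat='identity' by default, so the cell below triggers this message.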

In [5]:
ggplot(df, aes("x", "y")) + geom_point()
Out[5]:
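
The affected rows in df can be identified without plotting. The sketch below copies the two columns from the setup cell and lists the rows where x, y, or both are missing. The helper is_missing is a hypothetical name introduced for this illustration and is not part of Lets-Plot.

```python
import math

nan = float("nan")

# Same values as the df columns defined earlier in this notebook.
xs = [1, 2, nan, 4, 5, nan, 7, 8]
ys = [2, nan, 3, 4, 5, nan, 7, 8]


def is_missing(value):
    """Return True for None or NaN (hypothetical helper for this sketch)."""
    return value is None or math.isnan(value)


# A point needs both coordinates, so a row is incomplete if either one is missing.
incomplete_rows = [
    i for i, (x, y) in enumerate(zip(xs, ys))
    if is_missing(x) or is_missing(y)
]

print(incomplete_rows, len(incomplete_rows))  # [1, 2, 5] 3
```

Row 5 is missing both coordinates but still counts once, so three of the eight rows cannot be drawn as points.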

Hiding the messages

Set na_rm=True on a layer to silence its message, or use theme(plot_message=element_blank()) to hide all messages on the plot.

In [6]:
ggplot(df, aes("x", "y")) + geom_point(na_rm=True)
Out[6]:
In [7]:
ggplot(df_ridges, aes("x", "y")) + geom_area_ridges() + \
    theme(plot_message=element_blank())
Out[7]: