Messages About Removed Rows¶
Lets-Plot prints a short message above a plot when rows are dropped during its preparation. These messages help you notice when a plot was built from less data than you expected.
Rows can be removed by:
- sampling: reducing an oversized dataset before drawing;
- statistics: filtering out non-finite values before a statistical transform;
- geoms: skipping rows with missing values.
In [8]:
import numpy as np
import pandas as pd
from lets_plot import *
LetsPlot.setup_html()
In [9]:
# Points with missing x and/or y values.
df = pd.DataFrame({
    "x": [1, 2, np.nan, 4, 5, np.nan, 7, 8],
    "y": [2, np.nan, 3, 4, 5, np.nan, 7, 8],
})

# Two ridges (y = 0 and y = 1), each with one missing x value.
df_ridges = pd.DataFrame({
    "x": [1, 2, np.nan, 4, 1, 2, 3, np.nan],
    "y": [0, 0, 0, 0, 1, 1, 1, 1],
})

# Large dataset used to demonstrate sampling.
np.random.seed(42)
big_df = pd.DataFrame({
    "x": np.random.normal(size=75_000),
    "y": np.random.normal(size=75_000),
})
Sampling¶
When a layer uses sampling=..., the message reports how many rows each sampling step dropped.
In [3]:
ggplot(big_df, aes("x", "y")) + \
    geom_point(sampling=sampling_random(500, seed=42) + sampling_systematic(100))
Out[3]:
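The per-step counts can be reasoned about with simple arithmetic. The sketch below is plain Python and does not call Lets-Plot; it assumes the two steps run left to right and that each step returns exactly the requested number of rows. The `random.sample` call and the slice are stand-ins for `sampling_random` and `sampling_systematic`, not the library's implementation, so the exact numbers in the real message may differ.

```python
import random

total_rows = 75_000
rows = list(range(total_rows))  # stand-in for the row indices of big_df

# Step 1: stand-in for sampling_random(500, seed=42).
rng = random.Random(42)
after_random = rng.sample(rows, 500)

# Step 2: stand-in for sampling_systematic(100): keep every 5th remaining row.
step = len(after_random) // 100
after_systematic = after_random[::step]

dropped_by_random = total_rows - len(after_random)
dropped_by_systematic = len(after_random) - len(after_systematic)
print(dropped_by_random, dropped_by_systematic)  # prints: 74500 400
```

Under these assumptions the first step accounts for almost all of the removed rows, which is why the message lists each step separately rather than a single total.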
Statistics¶
Statistical transforms drop rows with non-finite values (NaN or infinity) before computing, and the message shows how many rows the stat removed.
In [4]:
ggplot(df_ridges, aes("x", "y")) + geom_area_ridges()
Out[4]:
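To cross-check a count like this, the non-finite values can be tallied by hand. The snippet below is a minimal sketch in plain Python that repeats the `x` values of `df_ridges`; it assumes that only `x` matters for this stat and does not reproduce Lets-Plot's internal filtering.

```python
import math

nan = float("nan")

# Same x values as df_ridges in this notebook.
x = [1, 2, nan, 4, 1, 2, 3, nan]

# math.isfinite() is False for NaN and for positive or negative infinity.
non_finite_rows = [i for i, v in enumerate(x) if not math.isfinite(v)]
print(len(non_finite_rows), "of", len(x), "rows have a non-finite x")
```

Here rows 2 and 7 are flagged, one in each ridge.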
Geoms¶
A layer with stat='identity' (the default for geom_point()) reports rows in which a positional aesthetic, such as x or y, is missing.
In [5]:
ggplot(df, aes("x", "y")) + geom_point()
Out[5]:
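A row is counted once even if both of its coordinates are missing. As a hedged illustration, the plain-Python sketch below repeats the `x` and `y` values of `df` and lists the rows where at least one of them is NaN; it is a manual cross-check, not the logic Lets-Plot runs.

```python
import math

nan = float("nan")

# Same values as df in this notebook.
x = [1, 2, nan, 4, 5, nan, 7, 8]
y = [2, nan, 3, 4, 5, nan, 7, 8]

# A row counts as missing if either coordinate is NaN.
missing_rows = [
    i for i, (xv, yv) in enumerate(zip(x, y))
    if math.isnan(xv) or math.isnan(yv)
]
print(missing_rows)  # prints: [1, 2, 5]
```

Row 5 lacks both `x` and `y` but appears only once, so three of the eight rows are affected.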
Hiding the messages¶
Set na_rm=True on a layer to silence its message, or use theme(plot_message=element_blank()) to hide all messages on the plot.
In [6]:
ggplot(df, aes("x", "y")) + geom_point(na_rm=True)
Out[6]:
In [7]:
ggplot(df_ridges, aes("x", "y")) + geom_area_ridges() + \
    theme(plot_message=element_blank())
Out[7]: