Preserving Data–Statistic Bijection in Lets-Plot-Kotlin

Some statistical geometries in Lets-Plot-Kotlin, such as geomSina() and geomQQ(), compute their own statistical data while still keeping a one-to-one correspondence with the original input data points. Previously, this correspondence was not preserved in the mapping: if you mapped an aesthetic (e.g., color) to a column from the original dataset, all points could end up with a single aggregated value instead of their own per-row values.

Now, Lets-Plot-Kotlin preserves the bijection between data and statistics for such geometries. This means you can safely map aesthetics to variables from the original dataset, and they will be correctly aligned with the statistical output.
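To make the idea of a preserved bijection concrete, here is a minimal plain-Kotlin sketch. It is not how Lets-Plot implements its stats; the types Car and StatRow, the functions rowPreservingStat and colorValues, and the sample values are all hypothetical names and data introduced for illustration. The sketch assumes a stat that emits exactly one output row per input row and remembers the index of the row it came from, so that any original column can be looked up per point.

```kotlin
// Hypothetical input row; not a Lets-Plot type.
data class Car(val drv: String, val hwy: Int, val displ: Double)

// Hypothetical stat output: one row per input row.
// `sourceIndex` records which input row produced this stat row.
data class StatRow(val sourceIndex: Int, val x: String, val y: Int, val groupSize: Int)

// A toy row-preserving "stat": it computes a per-group quantity (the group size)
// but still emits exactly one output row for every input row.
fun rowPreservingStat(cars: List<Car>): List<StatRow> {
    val groupSizes = cars.groupingBy { it.drv }.eachCount()
    return cars.mapIndexed { i, car ->
        StatRow(sourceIndex = i, x = car.drv, y = car.hwy, groupSize = groupSizes.getValue(car.drv))
    }
}

// Because the index is preserved, an original column (here `displ`)
// can be resolved per stat row instead of being aggregated.
fun colorValues(stat: List<StatRow>, cars: List<Car>): List<Double> =
    stat.map { cars[it.sourceIndex].displ }

fun main() {
    // Illustrative values only.
    val cars = listOf(
        Car("f", 29, 1.8),
        Car("f", 31, 2.0),
        Car("4", 26, 2.8),
    )
    val stat = rowPreservingStat(cars)
    println(colorValues(stat, cars)) // one displ value per plotted point
}
```

The design point is that the stat output carries enough information (here, an explicit index) to join back to the input. Without it, a per-point aesthetic such as color could only be derived from a group-level summary.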

In [1]:
%useLatestDescriptors
%use dataframe
%use lets-plot
In [2]:
LetsPlot.getInfo()
Out[2]:
Lets-Plot Kotlin API v.4.12.0. Frontend: Notebook with dynamically loaded JS. Lets-Plot JS v.4.8.1.
Outputs: Web (HTML+JS), Kotlin Notebook (Swing), Static SVG (hidden)
In [3]:
val url = "https://raw.githubusercontent.com/JetBrains/lets-plot-docs/refs/heads/master/data/mpg.csv"
val df = DataFrame.readCSV(url)
val data = df.toMap()
println("${df.rowsCount()} x ${df.columnsCount()}")
df.head()
234 x 12
Out[3]:

DataFrame: rowsCount = 5, columnsCount = 12

untitled  manufacturer  model  displ     year  cyl  trans       drv  cty  hwy  fl  class
1         audi          a4     1,800000  1999  4    auto(l5)    f    18   29   p   compact
2         audi          a4     1,800000  1999  4    manual(m5)  f    21   29   p   compact
3         audi          a4     2,000000  2008  4    manual(m6)  f    20   31   p   compact
4         audi          a4     2,000000  2008  4    auto(av)    f    21   30   p   compact
5         audi          a4     2,800000  1999  6    auto(l5)    f    16   26   p   compact

Map Columns to the Aesthetics

Sina Stat

In [4]:
letsPlot(data) { x = "drv"; y = "hwy" } +
    geomViolin() +
    geomSina(seed = 42) {
        color = "displ"
        size = "cyl"
    } +
    scaleSize(range = 2.0 to 4.0)
Out[4]:
[Plot: sina points over violins; x axis: drv (f, 4, r); y axis: hwy; color legend: displ; size legend: cyl]

Q-Q Stat

In [5]:
letsPlot(data) +
    geomQQ {
        sample = "hwy"
        color = "displ"
        size = "cyl"
    } +
    scaleSize(range = 3.0 to 6.0)
Out[5]:
[Plot: Q-Q plot; x axis: theoretical; y axis: sample; color legend: displ; size legend: cyl]

Show Column Values in Tooltips

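For the statistics above, tooltips can display not only the mapped values but also any column from the original dataframe. In the example below, a reference such as @cty points to a dataframe column, while @..theoretical.. and @..sample.. point to variables computed by the stat.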

In [6]:
letsPlot(data) { sample = "hwy" } +
    geomQQLine(color = "teal") +
    geomQQ(
        size = 3.0,
        shape = 21,
        color = "black",
        fill = "gold",
        alpha = 0.5,
        tooltips = layerTooltips()
            .title("@manufacturer @model")
            .line("theoretical|@..theoretical..")
            .line("highway mileage (sample)|@..sample..")
            .line("city mileage|@cty")
            .line("engine displacement in liters|@displ")
            .line("year of manufacturing|@year")
            .line("number of cylinders|@cyl")
            .line("type of transmission|@trans")
            .line("drive type|@drv")
            .line("fuel type|@fl")
            .line("vehicle class|@class")
            .format("year", "d")
            .minWidth(300)
            .anchor("bottom_right")
    ) +
    ggsize(1000, 600)
Out[6]:
[Plot: Q-Q plot with Q-Q line; x axis: theoretical; y axis: sample]