6. Advanced plotting

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

import seaborn as sns

We have seen already two options to plot data: we can use the "raw" Matplotlib which in principle allows one to create any possible plot, however with lots of code, and we saw the simpler internal Pandas solution. While the latter solution is very practical to quickly look through data, it is rather cumbersome to realise more complex plots.

Here we look at another type of plotting resting on the concepts of the grammar of graphics. This approach allows to create complex plots where data can be simply split in a plot into color, shapes etc. without having to do a grouping operation in beforehand. We will mainly look at Seaborn, and finish with an example with Plotnine, the port to Python of ggplot.

Importing data

We come back here to the dataset of swiss towns. To make the dataset more interestig we add to it some categorical data. First we attempt to add the main language for each town. It is a good example of the type of data wranglig one ofen has to do by combining information from different sources.

In [2]:
#load table indicating to which canton each town belongs
cantons = pd.read_excel('Datasets/be-b-00.04-osv-01.xls',sheet_name=1)[['KTKZ','ORTNAME']]
In [3]:
#load general table with infos on towns
towns = pd.read_excel('Datasets/2018.xls', skiprows=list(range(5))+list(range(6,9)),
                      skipfooter=34, index_col='Commune',na_values=['*','X'])
towns = towns.reset_index()
In [4]:
#merge tables using the town name. This adds the canton abbreviation to the main table 
towns_canton = pd.merge(towns, cantons, left_on='Commune', right_on='ORTNAME',how = 'inner')
In [5]:
#load data indicating languages of each canton
language = pd.read_excel('Datasets/je-f-01.08.01.02.xlsx',skiprows=[0,2,3,4],skipfooter=11)
languages = language[['Allemand (ou suisse allemand)','Français (ou patois romand)',
         'Italien (ou dialecte tessinois/italien des grisons)']]
languages = languages.apply(pd.to_numeric, errors='coerce')
#check which language has majority in each canton
languages['language'] = np.argmax(languages.values.astype(float),axis=1)
code={0:'German', 1:'French', 2:'Italian'}
languages['Language'] = languages.language.apply(lambda x: code[x])
languages['canton'] = language['Unnamed: 0']
languages = languages[['canton','Language']]

#load table matching canton name to abbreviation
cantons_abbrev = pd.read_excel('Datasets/cantons_abbrev.xlsx')
#add full canton name to table by merging on abbreviation
canton_language = pd.merge(languages, cantons_abbrev,on='canton')
In [6]:
#add language by merging on canton abbreviation
towns_language = pd.merge(towns_canton, canton_language, left_on='KTKZ', right_on='abbrev')
In [7]:
towns_language['town_type'] = towns_language['Surface agricole en %'].apply(lambda x: 'Land' if x<50 else 'City')
In [8]:
#Create a new party column and a new party score column
parties = pd.melt(towns_language,id_vars=['Commune'], value_vars=['UDC','PS','PDC'], 
                  var_name= 'Party', value_name='Party score')
towns_language = pd.merge(parties, towns_language, on='Commune')

towns_language
Out[8]:
Commune Party Party score Code commune Habitants Variation en % Densité de la population par km² Etrangers en % 0-19 ans 20-64 ans ... PBD PST/Sol. PES Petits partis de droite KTKZ ORTNAME canton Language abbrev town_type
0 Aeugst am Albis UDC 30.929249 1 1977 8.388158 249.936789 13.100658 20.586748 62.822458 ... 2.617442 0.167638 7.075094 4.888178 ZH Aeugst am Albis Zurich German ZH City
1 Aeugst am Albis PS 18.645940 1 1977 8.388158 249.936789 13.100658 20.586748 62.822458 ... 2.617442 0.167638 7.075094 4.888178 ZH Aeugst am Albis Zurich German ZH City
2 Aeugst am Albis PDC 2.076428 1 1977 8.388158 249.936789 13.100658 20.586748 62.822458 ... 2.617442 0.167638 7.075094 4.888178 ZH Aeugst am Albis Zurich German ZH City
3 Affoltern am Albis UDC 33.785785 2 11900 7.294203 1123.701605 27.848740 20.285714 62.201681 ... 4.164299 0.190049 6.211047 1.768197 ZH Affoltern am Albis Zurich German ZH Land
4 Affoltern am Albis PS 19.080314 2 11900 7.294203 1123.701605 27.848740 20.285714 62.201681 ... 4.164299 0.190049 6.211047 1.768197 ZH Affoltern am Albis Zurich German ZH Land
5 Affoltern am Albis PDC 4.585387 2 11900 7.294203 1123.701605 27.848740 20.285714 62.201681 ... 4.164299 0.190049 6.211047 1.768197 ZH Affoltern am Albis Zurich German ZH Land
6 Bonstetten UDC 29.100156 3 5435 5.349874 731.493943 14.149034 23.808648 60.717571 ... 3.803108 0.112518 6.661066 1.915807 ZH Bonstetten Zurich German ZH City
7 Bonstetten PS 20.403265 3 5435 5.349874 731.493943 14.149034 23.808648 60.717571 ... 3.803108 0.112518 6.661066 1.915807 ZH Bonstetten Zurich German ZH City
8 Bonstetten PDC 3.378541 3 5435 5.349874 731.493943 14.149034 23.808648 60.717571 ... 3.803108 0.112518 6.661066 1.915807 ZH Bonstetten Zurich German ZH City
9 Hausen am Albis UDC 34.937369 4 3571 6.279762 262.573529 14.533744 22.738729 60.403248 ... 4.656087 0.193911 8.021665 1.825436 ZH Hausen am Albis Zurich German ZH City
10 Hausen am Albis PS 19.393305 4 3571 6.279762 262.573529 14.533744 22.738729 60.403248 ... 4.656087 0.193911 8.021665 1.825436 ZH Hausen am Albis Zurich German ZH City
11 Hausen am Albis PDC 2.881915 4 3571 6.279762 262.573529 14.533744 22.738729 60.403248 ... 4.656087 0.193911 8.021665 1.825436 ZH Hausen am Albis Zurich German ZH City
12 Hedingen UDC 30.114599 5 3687 8.123167 564.624809 14.971522 22.484405 62.110117 ... 3.768864 0.227988 6.466387 1.840045 ZH Hedingen Zurich German ZH Land
13 Hedingen PS 22.478008 5 3687 8.123167 564.624809 14.971522 22.484405 62.110117 ... 3.768864 0.227988 6.466387 1.840045 ZH Hedingen Zurich German ZH Land
14 Hedingen PDC 3.918166 5 3687 8.123167 564.624809 14.971522 22.484405 62.110117 ... 3.768864 0.227988 6.466387 1.840045 ZH Hedingen Zurich German ZH Land
15 Kappel am Albis UDC 48.615099 6 1110 20.915033 140.151515 18.018018 26.486486 60.180180 ... 5.134268 0.312447 4.382706 2.769802 ZH Kappel am Albis Zurich German ZH City
16 Kappel am Albis PS 10.285425 6 1110 20.915033 140.151515 18.018018 26.486486 60.180180 ... 5.134268 0.312447 4.382706 2.769802 ZH Kappel am Albis Zurich German ZH City
17 Kappel am Albis PDC 2.744469 6 1110 20.915033 140.151515 18.018018 26.486486 60.180180 ... 5.134268 0.312447 4.382706 2.769802 ZH Kappel am Albis Zurich German ZH City
18 Knonau UDC 32.876136 7 2168 20.444444 335.085008 17.158672 24.077491 60.885609 ... 5.944968 0.008415 5.801919 3.437395 ZH Knonau Zurich German ZH City
19 Knonau PS 18.436553 7 2168 20.444444 335.085008 17.158672 24.077491 60.885609 ... 5.944968 0.008415 5.801919 3.437395 ZH Knonau Zurich German ZH City
20 Knonau PDC 3.126052 7 2168 20.444444 335.085008 17.158672 24.077491 60.885609 ... 5.944968 0.008415 5.801919 3.437395 ZH Knonau Zurich German ZH City
21 Maschwanden UDC 43.383446 8 626 1.623377 133.475480 12.140575 23.162939 59.744409 ... 3.452833 0.644309 5.170990 2.114654 ZH Maschwanden Zurich German ZH City
22 Maschwanden PS 22.732529 8 626 1.623377 133.475480 12.140575 23.162939 59.744409 ... 3.452833 0.644309 5.170990 2.114654 ZH Maschwanden Zurich German ZH City
23 Maschwanden PDC 3.502396 8 626 1.623377 133.475480 12.140575 23.162939 59.744409 ... 3.452833 0.644309 5.170990 2.114654 ZH Maschwanden Zurich German ZH City
24 Mettmenstetten UDC 35.671015 9 4861 14.565166 373.062164 14.873483 22.341082 60.851677 ... 4.352704 0.133017 5.059457 3.569025 ZH Mettmenstetten Zurich German ZH City
25 Mettmenstetten PS 18.800282 9 4861 14.565166 373.062164 14.873483 22.341082 60.851677 ... 4.352704 0.133017 5.059457 3.569025 ZH Mettmenstetten Zurich German ZH City
26 Mettmenstetten PDC 3.649155 9 4861 14.565166 373.062164 14.873483 22.341082 60.851677 ... 4.352704 0.133017 5.059457 3.569025 ZH Mettmenstetten Zurich German ZH City
27 Obfelden UDC 36.174029 10 5131 9.496372 680.503979 20.015591 23.055935 60.202690 ... 4.358116 0.086903 3.705416 2.982453 ZH Obfelden Zurich German ZH Land
28 Obfelden PS 16.922138 10 5131 9.496372 680.503979 20.015591 23.055935 60.202690 ... 4.358116 0.086903 3.705416 2.982453 ZH Obfelden Zurich German ZH Land
29 Obfelden PDC 6.488176 10 5131 9.496372 680.503979 20.015591 23.055935 60.202690 ... 4.358116 0.086903 3.705416 2.982453 ZH Obfelden Zurich German ZH Land
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
6318 Muriaux UDC 14.609053 6753 504 3.703704 29.857820 7.341270 19.841270 60.515873 ... NaN 4.320988 8.641975 NaN JU Muriaux Jura French JU City
6319 Muriaux PS 8.641975 6753 504 3.703704 29.857820 7.341270 19.841270 60.515873 ... NaN 4.320988 8.641975 NaN JU Muriaux Jura French JU City
6320 Muriaux PDC 20.370370 6753 504 3.703704 29.857820 7.341270 19.841270 60.515873 ... NaN 4.320988 8.641975 NaN JU Muriaux Jura French JU City
6321 Le Noirmont UDC 8.346334 6754 1845 10.877404 90.485532 15.880759 21.680217 61.517615 ... NaN 2.964119 7.332293 NaN JU Le Noirmont Jura French JU City
6322 Le Noirmont PS 25.663027 6754 1845 10.877404 90.485532 15.880759 21.680217 61.517615 ... NaN 2.964119 7.332293 NaN JU Le Noirmont Jura French JU City
6323 Le Noirmont PDC 15.834633 6754 1845 10.877404 90.485532 15.880759 21.680217 61.517615 ... NaN 2.964119 7.332293 NaN JU Le Noirmont Jura French JU City
6324 Saignelégier UDC 9.322820 6757 2556 2.240000 80.707294 8.020344 21.244131 60.015649 ... NaN 5.287570 9.137291 NaN JU Saignelégier Jura French JU Land
6325 Saignelégier PS 24.768089 6757 2556 2.240000 80.707294 8.020344 21.244131 60.015649 ... NaN 5.287570 9.137291 NaN JU Saignelégier Jura French JU Land
6326 Saignelégier PDC 25.278293 6757 2556 2.240000 80.707294 8.020344 21.244131 60.015649 ... NaN 5.287570 9.137291 NaN JU Saignelégier Jura French JU Land
6327 Soubey UDC 21.739130 6759 134 -8.843537 9.933284 2.238806 14.925373 56.716418 ... NaN 0.000000 8.695652 NaN JU Soubey Jura French JU Land
6328 Soubey PS 15.217391 6759 134 -8.843537 9.933284 2.238806 14.925373 56.716418 ... NaN 0.000000 8.695652 NaN JU Soubey Jura French JU Land
6329 Soubey PDC 42.028986 6759 134 -8.843537 9.933284 2.238806 14.925373 56.716418 ... NaN 0.000000 8.695652 NaN JU Soubey Jura French JU Land
6330 Alle UDC 8.108108 6771 1817 7.769870 171.415094 10.621904 24.766098 54.980737 ... NaN 1.719902 4.176904 NaN JU Alle Jura French JU City
6331 Alle PS 14.557740 6771 1817 7.769870 171.415094 10.621904 24.766098 54.980737 ... NaN 1.719902 4.176904 NaN JU Alle Jura French JU City
6332 Alle PDC 48.341523 6771 1817 7.769870 171.415094 10.621904 24.766098 54.980737 ... NaN 1.719902 4.176904 NaN JU Alle Jura French JU City
6333 Beurnevésin UDC 16.279070 6773 127 -8.633094 24.950884 6.299213 12.598425 48.818898 ... NaN 3.100775 6.201550 NaN JU Beurnevésin Jura French JU City
6334 Beurnevésin PS 6.976744 6773 127 -8.633094 24.950884 6.299213 12.598425 48.818898 ... NaN 3.100775 6.201550 NaN JU Beurnevésin Jura French JU City
6335 Beurnevésin PDC 48.062016 6773 127 -8.633094 24.950884 6.299213 12.598425 48.818898 ... NaN 3.100775 6.201550 NaN JU Beurnevésin Jura French JU City
6336 Boncourt UDC 8.318099 6774 1205 -7.307692 133.740289 10.539419 17.925311 52.614108 ... NaN 0.548446 2.285192 NaN JU Boncourt Jura French JU Land
6337 Boncourt PS 22.760512 6774 1205 -7.307692 133.740289 10.539419 17.925311 52.614108 ... NaN 0.548446 2.285192 NaN JU Boncourt Jura French JU Land
6338 Boncourt PDC 44.972578 6774 1205 -7.307692 133.740289 10.539419 17.925311 52.614108 ... NaN 0.548446 2.285192 NaN JU Boncourt Jura French JU Land
6339 Bonfol UDC 21.452703 6775 665 -2.349486 48.969072 9.172932 15.789474 53.984962 ... NaN 2.364865 6.418919 NaN JU Bonfol Jura French JU Land
6340 Bonfol PS 15.371622 6775 665 -2.349486 48.969072 9.172932 15.789474 53.984962 ... NaN 2.364865 6.418919 NaN JU Bonfol Jura French JU Land
6341 Bonfol PDC 31.418919 6775 665 -2.349486 48.969072 9.172932 15.789474 53.984962 ... NaN 2.364865 6.418919 NaN JU Bonfol Jura French JU Land
6342 Bure UDC 5.503356 6778 685 3.162651 50.036523 6.131387 19.854015 54.306569 ... NaN 0.671141 1.208054 NaN JU Bure Jura French JU Land
6343 Bure PS 8.859060 6778 685 3.162651 50.036523 6.131387 19.854015 54.306569 ... NaN 0.671141 1.208054 NaN JU Bure Jura French JU Land
6344 Bure PDC 49.395973 6778 685 3.162651 50.036523 6.131387 19.854015 54.306569 ... NaN 0.671141 1.208054 NaN JU Bure Jura French JU Land
6345 Coeuve UDC 8.194444 6781 732 8.124077 62.994836 3.551913 25.136612 56.147541 ... NaN 2.222222 5.000000 NaN JU Coeuve Jura French JU City
6346 Coeuve PS 24.722222 6781 732 8.124077 62.994836 3.551913 25.136612 56.147541 ... NaN 2.222222 5.000000 NaN JU Coeuve Jura French JU City
6347 Coeuve PDC 36.527778 6781 732 8.124077 62.994836 3.551913 25.136612 56.147541 ... NaN 2.222222 5.000000 NaN JU Coeuve Jura French JU City

6348 rows × 51 columns

Basic plotting

We finally have a table with mostly numerical information but also two categorical data: language and town type (land or city). With Seaborn we can now easily make all sorts of plots. For example what are the average scores of the different parties:

In [9]:
sns.barplot(data = towns_language, y='Party score', x = 'Party');
/usr/local/lib/python3.5/dist-packages/scipy/stats/stats.py:1713: FutureWarning: Using a non-tuple sequence for multidimensional indexing is deprecated; use `arr[tuple(seq)]` instead of `arr[seq]`. In the future this will be interpreted as an array index, `arr[np.array(seq)]`, which will result either in an error or a different result.
  return np.add.reduce(sorted[indexer] * weights, axis=axis) / sumval

Do land towns vote more for the right-wing party ?

In [10]:
g = sns.scatterplot(data = towns_language, y='UDC', x = 'Surface agricole en %', s = 10, alpha = 0.5);
g.set_xlim([0,100]);

Using categories as "aesthetics"

The greate advantage of using these packages is that they allow to include categories as "aesthetics" of the plot. For example we looked before at average party scores. But are they different between language regions ? We can just specify that the hue (color) should be mapped to the town language:

In [11]:
sns.barplot(data = towns_language, y='Party score', x = 'Party', hue = 'Language');
/usr/local/lib/python3.5/dist-packages/scipy/stats/stats.py:1713: FutureWarning: Using a non-tuple sequence for multidimensional indexing is deprecated; use `arr[tuple(seq)]` instead of `arr[seq]`. In the future this will be interpreted as an array index, `arr[np.array(seq)]`, which will result either in an error or a different result.
  return np.add.reduce(sorted[indexer] * weights, axis=axis) / sumval

Similarly with scatter plots. Is the relation between land and voting on the right language dependent ?

In [12]:
g = sns.scatterplot(data = towns_language, y='UDC', x = 'Surface agricole en %', hue = 'Language',
                    s = 10, alpha = 0.5);
g.set_xlim([0,100]);

Statistics

We see difference in the last plot, but it is still to clearly see the relation. Luckiliy these packages allow us to either create summary statistics or to fit the data:

In [13]:
g = sns.lmplot(data = towns_language, x = 'Surface agricole en %', y='UDC', hue = 'Language', scatter=True,
              scatter_kws={'alpha': 0.1});
g.ax.set_xlim([0,100]);
/usr/local/lib/python3.5/dist-packages/scipy/stats/stats.py:1713: FutureWarning: Using a non-tuple sequence for multidimensional indexing is deprecated; use `arr[tuple(seq)]` instead of `arr[seq]`. In the future this will be interpreted as an array index, `arr[np.array(seq)]`, which will result either in an error or a different result.
  return np.add.reduce(sorted[indexer] * weights, axis=axis) / sumval

Now we can also do the same exercise for all parties. Does the relation hold?

In [14]:
g = sns.lmplot(data = towns_language, x = 'Surface agricole en %', y='Party score', 
               hue = 'Party', scatter=True,
              scatter_kws={'alpha': 0.1});
g.ax.set_xlim([0,100]);
/usr/local/lib/python3.5/dist-packages/scipy/stats/stats.py:1713: FutureWarning: Using a non-tuple sequence for multidimensional indexing is deprecated; use `arr[tuple(seq)]` instead of `arr[seq]`. In the future this will be interpreted as an array index, `arr[np.array(seq)]`, which will result either in an error or a different result.
  return np.add.reduce(sorted[indexer] * weights, axis=axis) / sumval

Adding eve more information

We can recover from some other place (Poste) the coordinates of each town. Again by merging we can add that information to our main table:

In [15]:
coords = pd.read_csv('Datasets/plz_verzeichnis_v2.csv', sep=';')[['ORTBEZ18','Geokoordinaten']]
coords['lat'] = coords.Geokoordinaten.apply(lambda x: float(x.split(', ')[0]) if type(x)==str else np.nan)
coords['long'] = coords.Geokoordinaten.apply(lambda x: float(x.split(', ')[1]) if type(x)==str else np.nan)
In [16]:
towns_language = pd.merge(towns_language,coords, left_on='Commune', right_on='ORTBEZ18')

So now we can in addition look at the geography of these parameters. For example, who votes for the right-wing party ?

In [17]:
fix, ax = plt.subplots(figsize = (12,8))
sns.scatterplot(data = towns_language, x= 'long', y = 'lat', hue='UDC', style = 'Language', palette='Reds');
In [18]:
# MZ: if used to ggplot -> use 'plotnine' package
# same grammar as ggplot