Plotting data, part 1

with ggplot2

Aurélien Ginolhac

University of Luxembourg

Friday, the 2nd of May, 2025

ggplot2

About this lecture

Learning objectives

  • Learn the basic grammar of graphics
  • Understand how it is implemented in ggplot2
    • Input data structures as data.frame/tibble
    • Mapping columns to display features (aesthetics)
    • Types of graphics (geometries)
    • Multiple and repeating graphics (facets)
    • Transforming plots (scales)
    • Using different coordinate systems
    • Customizing graphs with themes
  • Make quick exploratory plots of your multidimensional data.

Introduction

ggplot2

  • Stands for grammar of graphics plot version 2
  • Inspired by Leland Wilkinson's work on the grammar of graphics (2005).

Idea: split a graph into layers

  • Such as axes, curve(s), labels.
  • 3 elements are required: data, aesthetics and at least one geometry (\(\geqslant 1\))

ggplot2

ggplot2 layers

Basic example

Data

A B C D
2 3 4 a
1 2 1 a
4 5 15 b
9 10 80 b

Aesthetics

  • x = A
  • y = C
  • shape = D

Geometry

  • dot / point

Scaling

\(x = \frac{A - \min(A)}{\max(A) - \min(A)} \times \text{width}\)

\(y = \frac{C - \min(C)}{\max(C) - \min(C)} \times \text{height}\)

Result

Data scaled

x y shape
25 11 circle
0 0 circle
75 53 square
200 300 square
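The scaling arithmetic can be checked in a few lines of R. This is only a sketch: width = 200 and height = 300 are assumed values, chosen because they reproduce the scaled table, and rescale_to() is a hypothetical helper that mimics the formula (ggplot2 performs this step itself).

```r
# Toy data from the "Basic example" slide
df <- data.frame(A = c(2, 1, 4, 9),
                 B = c(3, 2, 5, 10),
                 C = c(4, 1, 15, 80),
                 D = c("a", "a", "b", "b"))

# Hypothetical helper: linear rescaling of v onto [0, size]
rescale_to <- function(v, size) {
  (v - min(v)) / (max(v) - min(v)) * size
}

# width = 200 and height = 300 are assumptions matching the scaled table
scaled <- data.frame(x = rescale_to(df$A, 200),
                     y = round(rescale_to(df$C, 300)),
                     shape = ifelse(df$D == "a", "circle", "square"))
scaled
```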

What if we want to split circles and squares into separate panels?

Faceting

Split by shape

Redundancy

  • shape and facets provide the same information.
  • The shape aesthetic is then free to map another variable.
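A minimal sketch of faceting the toy data with facet_wrap(). Keeping shape = D alongside the facets reproduces the redundancy described above: each panel only ever contains one shape.

```r
library(ggplot2)

# Same toy data as in the basic example
df <- data.frame(A = c(2, 1, 4, 9),
                 C = c(4, 1, 15, 80),
                 D = c("a", "a", "b", "b"))

# One panel per level of D; shape is still mapped to D, hence redundant
p <- ggplot(df, aes(x = A, y = C)) +
  geom_point(aes(shape = D), size = 3) +
  facet_wrap(vars(D))
p
```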

Faceting can also carry a message

Combine layers but not with pipes!

Warning

ggplot2 layers are combined with +, not %>% nor |>.

This introduces a break in the workflow (ggplot1 would have been fine).

ggplot lines combined with +

library(ggplot2)
swiss |>
  ggplot(aes(x = Education, 
             y = Examination)) +
  geom_point() +
  scale_colour_brewer()

Otherwise, the error message is explicit

swiss |>
  ggplot(aes(x = Education, 
             y = Examination)) |>
  geom_point() +
  scale_colour_brewer()
Error in `geom_point()`:
! `mapping` must be created by `aes()`.
ℹ Did you use `%>%` or `|>` instead of `+`?

Build a plot by layers

swiss
             Fertility Agriculture Examination Education Catholic
Courtelary        80.2        17.0          15        12     9.96
Delemont          83.1        45.1           6         9    84.84
Franches-Mnt      92.5        39.7           5         5    93.40
Moutier           85.8        36.5          12         7    33.77
Neuveville        76.9        43.5          17        15     5.16
Porrentruy        76.1        35.3           9         7    90.57
Broye             83.8        70.2          16         7    92.85
Glane             92.4        67.8          14         8    97.16
Gruyere           82.4        53.3          12         7    97.67
Sarine            82.9        45.2          16        13    91.38
Veveyse           87.1        64.5          14         6    98.61
Aigle             64.1        62.0          21        12     8.52
Aubonne           66.9        67.5          14         7     2.27
Avenches          68.9        60.7          19        12     4.43
Cossonay          61.7        69.3          22         5     2.82
Echallens         68.3        72.6          18         2    24.20
Grandson          71.7        34.0          17         8     3.30
Lausanne          55.7        19.4          26        28    12.11
La Vallee         54.3        15.2          31        20     2.15
Lavaux            65.1        73.0          19         9     2.84
Morges            65.5        59.8          22        10     5.23
Moudon            65.0        55.1          14         3     4.52
Nyone             56.6        50.9          22        12    15.14
Orbe              57.4        54.1          20         6     4.20
Oron              72.5        71.2          12         1     2.40
Payerne           74.2        58.1          14         8     5.23
Paysd'enhaut      72.0        63.5           6         3     2.56
Rolle             60.5        60.8          16        10     7.72
Vevey             58.3        26.8          25        19    18.46
Yverdon           65.4        49.5          15         8     6.10
Conthey           75.5        85.9           3         2    99.71
Entremont         69.3        84.9           7         6    99.68
Herens            77.3        89.7           5         2   100.00
Martigwy          70.5        78.2          12         6    98.96
Monthey           79.4        64.9           7         3    98.22
St Maurice        65.0        75.9           9         9    99.06
Sierre            92.2        84.6           3         3    99.46
Sion              79.3        63.1          13        13    96.83
Boudry            70.4        38.4          26        12     5.62
La Chauxdfnd      65.7         7.7          29        11    13.79
Le Locle          72.7        16.7          22        13    11.22
Neuchatel         64.4        17.6          35        32    16.92
Val de Ruz        77.6        37.6          15         7     4.97
ValdeTravers      67.6        18.7          25         7     8.65
V. De Geneve      35.0         1.2          37        53    42.34
Rive Droite       44.7        46.6          16        29    50.43
Rive Gauche       42.8        27.7          22        29    58.33
             Infant.Mortality
Courtelary               22.2
Delemont                 22.2
Franches-Mnt             20.2
Moutier                  20.3
Neuveville               20.6
Porrentruy               26.6
Broye                    23.6
Glane                    24.9
Gruyere                  21.0
Sarine                   24.4
Veveyse                  24.5
Aigle                    16.5
Aubonne                  19.1
Avenches                 22.7
Cossonay                 18.7
Echallens                21.2
Grandson                 20.0
Lausanne                 20.2
La Vallee                10.8
Lavaux                   20.0
Morges                   18.0
Moudon                   22.4
Nyone                    16.7
Orbe                     15.3
Oron                     21.0
Payerne                  23.8
Paysd'enhaut             18.0
Rolle                    16.3
Vevey                    20.9
Yverdon                  22.5
Conthey                  15.1
Entremont                19.8
Herens                   18.3
Martigwy                 19.4
Monthey                  20.2
St Maurice               17.8
Sierre                   16.3
Sion                     18.1
Boudry                   20.3
La Chauxdfnd             20.5
Le Locle                 18.9
Neuchatel                23.0
Val de Ruz               20.0
ValdeTravers             19.5
V. De Geneve             18.0
Rive Droite              18.2
Rive Gauche              19.3

Build a plot by layers

swiss |>
  ggplot()

Build a plot by layers

swiss |>
  ggplot() +
  aes(x = Education)

Build a plot by layers

swiss |>
  ggplot() +
  aes(x = Education) +
  aes(y = Examination)

Build a plot by layers

swiss |>
  ggplot() +
  aes(x = Education) +
  aes(y = Examination) +
  geom_point()

Build a plot by layers

swiss |>
  ggplot() +
  aes(x = Education) +
  aes(y = Examination) +
  geom_point() +
  theme_bw(18)
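Since each + returns a new plot object, intermediate steps can be stored and extended. The sketch below inspects the layers field of the plot object, an internal field used here only for illustration: ggplot2 keeps geometry/statistic layers there, while aesthetics and themes are stored elsewhere in the object.

```r
library(ggplot2)

# Data and aesthetics only: a valid ggplot object, but nothing is drawn yet
p <- ggplot(swiss) +
  aes(x = Education, y = Examination)

# Adding a geometry appends one layer to the plot object
p_points <- p + geom_point()

# A theme changes the appearance but does not add a layer
p_themed <- p_points + theme_bw(18)

length(p$layers)
length(p_themed$layers)
```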

Palmer penguins

Install it with:

install.packages("palmerpenguins")

A more interesting example

library(palmerpenguins)

A more interesting example

library(palmerpenguins)
penguins
# A tibble: 344 × 8
   species island    bill_length_mm bill_depth_mm flipper_length_mm body_mass_g
   <fct>   <fct>              <dbl>         <dbl>             <int>       <int>
 1 Adelie  Torgersen           39.1          18.7               181        3750
 2 Adelie  Torgersen           39.5          17.4               186        3800
 3 Adelie  Torgersen           40.3          18                 195        3250
 4 Adelie  Torgersen           NA            NA                  NA          NA
 5 Adelie  Torgersen           36.7          19.3               193        3450
 6 Adelie  Torgersen           39.3          20.6               190        3650
 7 Adelie  Torgersen           38.9          17.8               181        3625
 8 Adelie  Torgersen           39.2          19.6               195        4675
 9 Adelie  Torgersen           34.1          18.1               193        3475
10 Adelie  Torgersen           42            20.2               190        4250
# ℹ 334 more rows
# ℹ 2 more variables: sex <fct>, year <int>

A more interesting example

library(palmerpenguins)
penguins |>
  ggplot()

A more interesting example

library(palmerpenguins)
penguins |>
  ggplot() +
  aes(x = flipper_length_mm,
      y = body_mass_g)

A more interesting example

library(palmerpenguins)
penguins |>
  ggplot() +
  aes(x = flipper_length_mm,
      y = body_mass_g) +
  aes(color = sex)

A more interesting example

library(palmerpenguins)
penguins |>
  ggplot() +
  aes(x = flipper_length_mm,
      y = body_mass_g) +
  aes(color = sex) +
  geom_point()

A more interesting example

library(palmerpenguins)
penguins |>
  ggplot() +
  aes(x = flipper_length_mm,
      y = body_mass_g) +
  aes(color = sex) +
  geom_point() +
  theme_bw(base_family = "Roboto Condensed", base_size = 13)

A more interesting example

library(palmerpenguins)
penguins |>
  ggplot() +
  aes(x = flipper_length_mm,
      y = body_mass_g) +
  aes(color = sex) +
  geom_point() +
  theme_bw(base_family = "Roboto Condensed", base_size = 13) +
  scale_color_manual(values = c("darkorange", "cyan4"), na.translate = FALSE)

A more interesting example

library(palmerpenguins)
penguins |>
  ggplot() +
  aes(x = flipper_length_mm,
      y = body_mass_g) +
  aes(color = sex) +
  geom_point() +
  theme_bw(base_family = "Roboto Condensed", base_size = 13) +
  scale_color_manual(values = c("darkorange", "cyan4"), na.translate = FALSE) +
  labs(title = "Penguin flipper and body mass",
       caption = "Horst AM, Hill AP, Gorman KB (2020)",
       subtitle = "Dimensions for male/female Adelie, Chinstrap and Gentoo Penguins at Palmer Station LTER")

A more interesting example

library(palmerpenguins)
penguins |>
  ggplot() +
  aes(x = flipper_length_mm,
      y = body_mass_g) +
  aes(color = sex) +
  geom_point() +
  theme_bw(base_family = "Roboto Condensed", base_size = 13) +
  scale_color_manual(values = c("darkorange", "cyan4"), na.translate = FALSE) +
  labs(title = "Penguin flipper and body mass",
       caption = "Horst AM, Hill AP, Gorman KB (2020)",
       subtitle = "Dimensions for male/female Adelie, Chinstrap and Gentoo Penguins at Palmer Station LTER") +
  theme(plot.subtitle = element_text(size = 13))

A more interesting example

library(palmerpenguins)
penguins |>
  ggplot() +
  aes(x = flipper_length_mm,
      y = body_mass_g) +
  aes(color = sex) +
  geom_point() +
  theme_bw(base_family = "Roboto Condensed", base_size = 13) +
  scale_color_manual(values = c("darkorange", "cyan4"), na.translate = FALSE) +
  labs(title = "Penguin flipper and body mass",
       caption = "Horst AM, Hill AP, Gorman KB (2020)",
       subtitle = "Dimensions for male/female Adelie, Chinstrap and Gentoo Penguins at Palmer Station LTER") +
  theme(plot.subtitle = element_text(size = 13)) +
  labs(x = "Flipper length (mm)",
       y = "Body mass (g)",
       color = "Penguin sex")

A more interesting example

library(palmerpenguins)
penguins |>
  ggplot() +
  aes(x = flipper_length_mm,
      y = body_mass_g) +
  aes(color = sex) +
  geom_point() +
  theme_bw(base_family = "Roboto Condensed", base_size = 13) +
  scale_color_manual(values = c("darkorange", "cyan4"), na.translate = FALSE) +
  labs(title = "Penguin flipper and body mass",
       caption = "Horst AM, Hill AP, Gorman KB (2020)",
       subtitle = "Dimensions for male/female Adelie, Chinstrap and Gentoo Penguins at Palmer Station LTER") +
  theme(plot.subtitle = element_text(size = 13)) +
  labs(x = "Flipper length (mm)",
       y = "Body mass (g)",
       color = "Penguin sex") +
  theme(legend.position = "bottom",
        legend.background = element_rect(fill = "white", color = NA))

A more interesting example

library(palmerpenguins)
penguins |>
  ggplot() +
  aes(x = flipper_length_mm,
      y = body_mass_g) +
  aes(color = sex) +
  geom_point() +
  theme_bw(base_family = "Roboto Condensed", base_size = 13) +
  scale_color_manual(values = c("darkorange", "cyan4"), na.translate = FALSE) +
  labs(title = "Penguin flipper and body mass",
       caption = "Horst AM, Hill AP, Gorman KB (2020)",
       subtitle = "Dimensions for male/female Adelie, Chinstrap and Gentoo Penguins at Palmer Station LTER") +
  theme(plot.subtitle = element_text(size = 13)) +
  labs(x = "Flipper length (mm)",
       y = "Body mass (g)",
       color = "Penguin sex") +
  theme(legend.position = "bottom",
        legend.background = element_rect(fill = "white", color = NA)) +
  theme(plot.caption = element_text(hjust = 0, face = "italic"))

A more interesting example

library(palmerpenguins)
penguins |>
  ggplot() +
  aes(x = flipper_length_mm,
      y = body_mass_g) +
  aes(color = sex) +
  geom_point() +
  theme_bw(base_family = "Roboto Condensed", base_size = 13) +
  scale_color_manual(values = c("darkorange", "cyan4"), na.translate = FALSE) +
  labs(title = "Penguin flipper and body mass",
       caption = "Horst AM, Hill AP, Gorman KB (2020)",
       subtitle = "Dimensions for male/female Adelie, Chinstrap and Gentoo Penguins at Palmer Station LTER") +
  theme(plot.subtitle = element_text(size = 13)) +
  labs(x = "Flipper length (mm)",
       y = "Body mass (g)",
       color = "Penguin sex") +
  theme(legend.position = "bottom",
        legend.background = element_rect(fill = "white", color = NA)) +
  theme(plot.caption = element_text(hjust = 0, face = "italic")) +
  theme(plot.caption.position = "plot")

A more interesting example

library(palmerpenguins)
penguins |>
  ggplot() +
  aes(x = flipper_length_mm,
      y = body_mass_g) +
  aes(color = sex) +
  geom_point() +
  theme_bw(base_family = "Roboto Condensed", base_size = 13) +
  scale_color_manual(values = c("darkorange", "cyan4"), na.translate = FALSE) +
  labs(title = "Penguin flipper and body mass",
       caption = "Horst AM, Hill AP, Gorman KB (2020)",
       subtitle = "Dimensions for male/female Adelie, Chinstrap and Gentoo Penguins at Palmer Station LTER") +
  theme(plot.subtitle = element_text(size = 13)) +
  labs(x = "Flipper length (mm)",
       y = "Body mass (g)",
       color = "Penguin sex") +
  theme(legend.position = "bottom",
        legend.background = element_rect(fill = "white", color = NA)) +
  theme(plot.caption = element_text(hjust = 0, face = "italic")) +
  theme(plot.caption.position = "plot") +
  facet_wrap(vars(species))

A more interesting example

library(palmerpenguins)
penguins |>
  ggplot() +
  aes(x = flipper_length_mm,
      y = body_mass_g) +
  aes(color = sex) +
  geom_point() +
  theme_bw(base_family = "Roboto Condensed", base_size = 13) +
  scale_color_manual(values = c("darkorange", "cyan4"), na.translate = FALSE) +
  labs(title = "Penguin flipper and body mass",
       caption = "Horst AM, Hill AP, Gorman KB (2020)",
       subtitle = "Dimensions for male/female Adelie, Chinstrap and Gentoo Penguins at Palmer Station LTER") +
  theme(plot.subtitle = element_text(size = 13)) +
  labs(x = "Flipper length (mm)",
       y = "Body mass (g)",
       color = "Penguin sex") +
  theme(legend.position = "bottom",
        legend.background = element_rect(fill = "white", color = NA)) +
  theme(plot.caption = element_text(hjust = 0, face = "italic")) +
  theme(plot.caption.position = "plot") +
  facet_wrap(vars(species)) +
  scale_x_continuous(guide = guide_axis(n.dodge = 2))

A more interesting example

library(palmerpenguins)
penguins |>
  ggplot() +
  aes(x = flipper_length_mm,
      y = body_mass_g) +
  aes(color = sex) +
  geom_point() +
  theme_bw(base_family = "Roboto Condensed", base_size = 13) +
  scale_color_manual(values = c("darkorange", "cyan4"), na.translate = FALSE) +
  labs(title = "Penguin flipper and body mass",
       caption = "Horst AM, Hill AP, Gorman KB (2020)",
       subtitle = "Dimensions for male/female Adelie, Chinstrap and Gentoo Penguins at Palmer Station LTER") +
  theme(plot.subtitle = element_text(size = 13)) +
  labs(x = "Flipper length (mm)",
       y = "Body mass (g)",
       color = "Penguin sex") +
  theme(legend.position = "bottom",
        legend.background = element_rect(fill = "white", color = NA)) +
  theme(plot.caption = element_text(hjust = 0, face = "italic")) +
  theme(plot.caption.position = "plot") +
  facet_wrap(vars(species)) +
  scale_x_continuous(guide = guide_axis(n.dodge = 2)) +
  scale_y_continuous(labels = scales::label_comma())

Geometric objects define the plot type to be drawn

geom_point()

geom_violin()

geom_line()

geom_histogram()

geom_bar()

geom_density()

Tip

Have a look at the cheatsheet or the documentation for more possibilities.

Core layers

Other layers

They are always present; the plot works because they have sensible defaults:

  • Theme is theme_grey
  • Coordinate system is Cartesian
  • Statistic is identity (for geom_point() and many other geometries)
  • Facets are disabled
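These defaults can be spotted in a minimal plot object. A sketch that inspects internal fields (p$coordinates, p$facet, p$layers), which are not a stable API; p_explicit simply spells out the same defaults by hand.

```r
library(ggplot2)
library(palmerpenguins)

# Only data, aesthetics and one geometry are given
p <- ggplot(penguins, aes(x = bill_length_mm, y = bill_depth_mm)) +
  geom_point()

# Spelling out the defaults explicitly should give the same plot
p_explicit <- ggplot(penguins, aes(x = bill_length_mm, y = bill_depth_mm)) +
  geom_point(stat = "identity") +
  coord_cartesian() +
  facet_null() +
  theme_grey()

# The implicit layers are already present in the object
class(p$coordinates)
class(p$facet)
class(p$layers[[1]]$stat)
```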

3 layers are enough

  • Data
  • Aesthetics mapping data to plot components
  • At least one geometry

Your first plot

library(palmerpenguins)

Your first plot

library(palmerpenguins)
library(ggplot2)

Your first plot

library(palmerpenguins)
library(ggplot2)
ggplot(data = penguins)

Your first plot

library(palmerpenguins)
library(ggplot2)
ggplot(data = penguins) +
  geom_point(mapping = aes(x = bill_length_mm,
                           y = bill_depth_mm,
                           colour = species))

Mapping aesthetics

Requirements

  • aes() maps data columns (variables) to aesthetics
  • Specific geometries (geom) have different expectations:
    • univariate: a single x (or y for flipped axes)
    • bivariate: both x and y, as in a scatterplot
  • Continuous or discrete variables
    • Continuous for color ➡️ gradient
    • Discrete for color ➡️ qualitative
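The difference shows up in the colours ggplot2 computes. A sketch using layer_data(), which returns the data of a built layer: a factor gives one colour per level, while a numeric column gives a gradient with many distinct colours.

```r
library(ggplot2)
library(palmerpenguins)

# Discrete variable (factor) mapped to colour: qualitative palette
p_discrete <- ggplot(penguins, aes(x = bill_length_mm, y = bill_depth_mm,
                                   colour = species)) +
  geom_point()

# Continuous variable (numeric) mapped to colour: gradient
p_continuous <- ggplot(penguins, aes(x = bill_length_mm, y = bill_depth_mm,
                                     colour = body_mass_g)) +
  geom_point()

# Number of distinct colours actually computed for the dots
length(unique(layer_data(p_discrete)$colour))
length(unique(layer_data(p_continuous)$colour))
```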

geom_point() requires both x and y coordinates

ggplot(penguins) +
  geom_point(aes(x = bill_length_mm,
                 y = bill_depth_mm))

  • Same as previous slide
  • Without colour mapping

Unmapped parameters

  • geom_point() accepts additional arguments, such as colour
  • Set them to a fixed value, outside aes()
ggplot(penguins) +
  geom_point(aes(x = bill_length_mm,
                 y = bill_depth_mm),
             colour = "black")

Important

Parameters set outside aes() are applied to all data.

Mapped parameters

Require two conditions:

  1. Being defined inside the aesthetics aes()
  2. Referring to a column of the data (here, a mistake):
ggplot(penguins) +
  geom_point(aes(x = bill_length_mm,
                 y = bill_depth_mm,
                 colour = country))
Error in FUN(X[[i]], ...): object 'country' not found
  • Passing the unknown column as a string has a different effect:
ggplot(penguins) +
  geom_point(aes(x = bill_length_mm,
                 y = bill_depth_mm,
                 colour = "country"))

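The string is treated as a constant category: all dots get the same colour and the legend shows "country". This is hardly useful, but we shall see an application later. Stick to the 2 mapping rules: inside aes(), and referring to a valid table column.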
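The contrast between a fixed colour and a constant string inside aes() can be checked with layer_data(). A sketch: the fixed colour is used as-is, while the constant string becomes a one-level discrete variable coloured by the default palette.

```r
library(ggplot2)
library(palmerpenguins)

# Fixed colour, outside aes(): every dot is black, no legend
p_fixed <- ggplot(penguins) +
  geom_point(aes(x = bill_length_mm, y = bill_depth_mm),
             colour = "black")

# Constant string inside aes(): a one-level discrete variable,
# so all dots share the first colour of the default palette and a legend appears
p_constant <- ggplot(penguins) +
  geom_point(aes(x = bill_length_mm, y = bill_depth_mm,
                 colour = "country"))

unique(layer_data(p_fixed)$colour)
unique(layer_data(p_constant)$colour)
```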

Mapping aesthetics correctly

Inside aes() and referring to a data column

ggplot(penguins) +
  geom_bar(aes(y = species,
               fill = sex))

species and sex are 2 valid columns in penguins

Advantages:

  • The legend 🟥/🟦 for free
  • Missing data are highlighted in grey ⬜️
  • Using y axis for categories eases reading

Why is mapping a bare expression, rather than a string, useful?

Fair question

  • Could we pass an expression?
  • Which penguins are above 4 kg?
  • Use body_mass_g > 4000, which returns a logical (TRUE/FALSE), to find out
ggplot(penguins) +
  geom_bar(aes(y = species,
               fill = body_mass_g > 4000))

The expression is evaluated in the context of penguins. Gentoo are clearly heavier than the 2 other species.
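Evaluating the expression inside aes() should be equivalent to computing the column beforehand. A sketch comparing both with layer_data(); heavy is a hypothetical column name, and only the legend title differs between the two plots.

```r
library(ggplot2)
library(dplyr)
library(palmerpenguins)

# The expression is evaluated inside aes(), using the columns of penguins
p_expr <- ggplot(penguins) +
  geom_bar(aes(y = species, fill = body_mass_g > 4000))

# Equivalent: compute the logical column first, then map it
p_column <- penguins |>
  mutate(heavy = body_mass_g > 4000) |>
  ggplot() +
  geom_bar(aes(y = species, fill = heavy))
```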

Inheritance of arguments across layers

Compare the two following plots (a great example of Simpson's paradox):

ggplot(penguins,
       aes(x = bill_length_mm,
           y = bill_depth_mm)) +
  geom_point(aes(colour = species)) +
  geom_smooth(method = "lm", formula = 'y ~ x')

ggplot(penguins,
       aes(x = bill_length_mm,
           y = bill_depth_mm,
           colour = species)) +
  geom_point() +
  geom_smooth(method = "lm", formula = 'y ~ x')

Important

  • aesthetics in ggplot() are passed on to all geometries.
  • aesthetics in geom_*() are layer-specific (and can override inherited ones)
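The paradox can be confirmed numerically by fitting the same linear models as geom_smooth(method = "lm"): once on all penguins (first plot, where colour is only mapped in geom_point()) and once per species (second plot, where colour is inherited by geom_smooth()).

```r
library(palmerpenguins)

# Slope of bill depth against bill length, all species pooled
pooled <- coef(lm(bill_depth_mm ~ bill_length_mm,
                  data = penguins))[["bill_length_mm"]]

# The same slope fitted separately within each species
per_species <- sapply(split(penguins, penguins$species), function(d) {
  coef(lm(bill_depth_mm ~ bill_length_mm, data = d))[["bill_length_mm"]]
})

pooled       # negative: one line for all penguins
per_species  # positive within Adelie, Chinstrap and Gentoo
```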

Try it

  • Map the island variable to the shape aesthetic for both dots and linear models
  • All dots (circles / triangles / squares) with:
    • A size of 5
    • A transparency of 30% (alpha = 0.7)

Aim plot:

Time: 5 minutes

Answer:

ggplot(penguins,
       aes(x = bill_length_mm,
           y = bill_depth_mm,
           shape = island,
           colour = species)) +
  geom_point(size = 5, alpha = 0.7) +
  geom_smooth(method = "lm",
              formula = 'y ~ x')

Joining observations

library(tibble)
library(tidyr)
set.seed(212)
# tidyr::crossing() generates all combinations of x and g
tib <- tibble(crossing(x = letters[1:4], 
                       g = factor(1:2)), 
              y = rnorm(8))

Suppose we want to connect dots by g

This should be the job of geom_line()

tib
# A tibble: 8 × 3
  x     g          y
  <chr> <fct>  <dbl>
1 a     1     -0.239
2 a     2      0.677
3 b     1     -2.44 
4 b     2      1.24 
5 c     1     -0.327
6 c     2      0.154
7 d     1      1.04 
8 d     2     -0.780

Invisible aesthetic: grouping

Naive

ggplot(tib, aes(x, y, colour = g)) +
  geom_line() + 
  geom_point(size = 4)

Correct

ggplot(tib, aes(x, y, colour = g)) +
  geom_line(aes(group = g)) +
  geom_point(size = 4)

Forcing connection

ggplot(tib, aes(x, y, colour = g)) +
  geom_line(aes(group = 1)) +
  geom_point(size = 4)
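The grouping ggplot2 computes can be inspected with layer_data(). By default, groups are the interaction of all discrete variables in the layer (here x and colour), so the naive version gets 8 groups of a single point each and no line can be drawn. A sketch rebuilding tib so that the block is self-contained:

```r
library(ggplot2)
library(tibble)
library(tidyr)

set.seed(212)
tib <- tibble(crossing(x = letters[1:4], g = factor(1:2)),
              y = rnorm(8))

# Naive: x (character) and colour (factor) are both discrete,
# so each of the 8 x/g combinations becomes its own group
p_naive <- ggplot(tib, aes(x, y, colour = g)) + geom_line()
# Explicit grouping by g: 2 groups, hence 2 lines
p_group <- ggplot(tib, aes(x, y, colour = g)) + geom_line(aes(group = g))
# Constant group: all points in a single group, hence 1 line
p_one <- ggplot(tib, aes(x, y, colour = g)) + geom_line(aes(group = 1))

n_groups <- function(p) length(unique(layer_data(p)$group))
sapply(list(naive = p_naive, group = p_group, one = p_one), n_groups)
```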

Labels

ggplot(penguins,
       aes(x = bill_length_mm,
           y = bill_depth_mm,
           shape = island,
           colour = species)) +
  geom_point() +
  geom_smooth(method = "lm",
              formula = 'y ~ x') +
  labs(title = "Bill ratios of Palmer penguins",
       caption = "Horst AM, Hill AP, Gorman KB (2020)",
       subtitle = "Split per species / island",
       shape = "Islands",
       x = "culmen length (mm)",
       y = "culmen depth (mm)")

Statistics / geometries are interchangeable

ggplot(penguins) +
  geom_bar(aes(y = species))

ggplot(penguins) +
  stat_count(aes(y = species))

Warning

  • Using geom_*() feels more natural since it describes what is drawn
  • But it is just a preference
  • Most code in the wild uses geom_*()

Let ggplot2 do the stat for you

stat = "count" could be omitted since it is the default

ggplot(penguins, aes(x = species)) +
  geom_bar(stat = "count")

stat_count acts on the mapped var like dplyr::count()

count(penguins, species)
# A tibble: 3 × 2
  species       n
  <fct>     <int>
1 Adelie      152
2 Chinstrap    68
3 Gentoo      124
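The counts behind the bars can be compared with dplyr::count(). A sketch using layer_data(): geom_bar() and stat_count() should produce the same layer, with one count per species.

```r
library(ggplot2)
library(dplyr)
library(palmerpenguins)

# geom_bar() and stat_count() build the same layer
p_geom <- ggplot(penguins, aes(x = species)) + geom_bar()
p_stat <- ggplot(penguins, aes(x = species)) + stat_count()

# Counts computed by ggplot2, one row per bar
layer_data(p_geom)$count
# Counts computed with dplyr, for comparison
count(penguins, species)$n
```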

Or do it yourself, but with geom_col()

If you give counts, change the stat

count(penguins, species) |>
  ggplot(aes(x = species, y = n)) +
  geom_bar(stat = "identity")

geom_col() uses stat = "identity" by default

count(penguins, species) |>
  ggplot(aes(x = species, y = n)) +
  geom_col()

Computed statistics allow further computation, like proportions

Classic counting

ggplot(penguins, aes(y = species)) +
  geom_bar(aes(x = after_stat(count)))

  • Now compute proportions
  • Bonus: get x scale in % using scales
ggplot(penguins, aes(y = species)) +
  geom_bar(aes(x = after_stat(count / sum(count)))) +
  scale_x_continuous(labels = scales::label_percent())

Flexibility in the aesthetics for flipping axes

geom_bar() requires x OR y

penguins |>
  # horizontal brings readability
  ggplot(aes(y = species)) +
  geom_bar()

Cleanup plot

penguins |>
  ggplot(aes(y = species)) +
  geom_bar() +
  labs(y = NULL) +
  scale_x_continuous(expand = expansion(mult = c(0, .1)))

No extra space on the left, but keep 10% on the right

Annoying to see those 3 bars not sorted by count

Reorder the categorical variable (forcats)

Using the function fct_infreq()

penguins |>
  ggplot(aes(y = fct_infreq(species))) +
  geom_bar() +
  scale_x_continuous(expand = expansion(mult = c(0, .1))) +
  labs(title = "Palmer penguins species",
       y = NULL) +
  theme_minimal(14) +
  # nice trick from T. Pedersen
  theme(panel.ontop = TRUE,
        # better to hide the horizontal grid lines
        panel.grid.major.y = element_blank())
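fct_infreq() only reorders the factor levels. Since a discrete y axis places the first level at the bottom, the most frequent species ends up at the bottom; wrapping it in forcats::fct_rev() would put it at the top instead. A sketch checking the level orders:

```r
library(forcats)
library(palmerpenguins)

# Levels in their original (alphabetical) order
levels(penguins$species)
# Levels reordered by decreasing frequency
levels(fct_infreq(penguins$species))
# Reversed order, e.g. to put the largest bar at the top of a y axis
levels(fct_rev(fct_infreq(penguins$species)))
```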

Geometries catalogue

Histograms

penguins |>
  ggplot(aes(x = body_mass_g,
             fill = species)) +
  geom_histogram(bins = 35,
                 alpha = 0.6, 
                 position = "identity")

  • Without bins, ggplot2 uses 30 bins and prints a message asking for a better value
  • The default position is "stack". Here we overlay the histograms with "identity" and use transparency

Density plots

penguins |>
  ggplot(aes(x = body_mass_g,
             fill = species,
             colour = species)) +
  geom_density(alpha = 0.6)

  • Use both colour and fill mapped to the same variable for cosmetic purposes

Barplots: bar positions

Default: position = "stack"

ggplot(penguins) +
  geom_bar(aes(y = species, 
               fill = island))

Dodging island: side by side

ggplot(penguins) +
  geom_bar(aes(y = species, fill = island),
           position = "dodge")

But the total width per species is preserved: Adelie, split over 3 islands, gets narrower bars

Preserve single bar (same width)

ggplot(penguins) +
  geom_bar(aes(y = species,
               fill = island),
           position = position_dodge2(preserve = "single")) +
  labs(y = NULL)
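The bar widths can be compared through layer_data(). A sketch using vertical bars (x = species) so that widths are simply xmax - xmin; with "dodge", Adelie's 3 bars share the width of a single Chinstrap or Gentoo bar, while preserve = "single" gives every bar the same width.

```r
library(ggplot2)
library(palmerpenguins)

# Vertical bars here, so widths are read along x (xmax - xmin)
p_dodge <- ggplot(penguins) +
  geom_bar(aes(x = species, fill = island), position = "dodge")
p_single <- ggplot(penguins) +
  geom_bar(aes(x = species, fill = island),
           position = position_dodge2(preserve = "single"))

bar_widths <- function(p) with(layer_data(p), round(xmax - xmin, 6))
bar_widths(p_dodge)   # Adelie bars are narrower than the Chinstrap/Gentoo ones
bar_widths(p_single)  # all bars share the same width
```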

Stacked barchart for proportions

penguins |>
  drop_na(sex) |>  # from tidyr
  ggplot() +
  geom_bar(aes(y = species,
               fill = sex),
           position = "fill") +
  geom_vline(xintercept = 0.5, 
             linetype = "dashed",
             colour = "grey30") +
  scale_x_continuous(labels = scales::label_percent(),
                     position = "top",
                     expand = c(0, 0)) +
  labs(x = NULL, y = NULL) +
  theme_classic(16) # larger font sizes

  • Makes comparing sex ratios much easier

Boxplot, a continuous y by a categorical x

ggplot(penguins) +
  geom_boxplot(aes(y = body_mass_g,
                   x = species))

geom_boxplot() infers that:

  • body_mass_g is continuous
  • species is categorical/discrete
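geom_boxplot() relies on stat_boxplot(), which computes quartiles per group. A sketch checking with layer_data() that the middle line matches the median body mass of each species (missing values are dropped, with a warning).

```r
library(ggplot2)
library(palmerpenguins)

p <- ggplot(penguins) +
  geom_boxplot(aes(y = body_mass_g, x = species))

# Statistics computed by stat_boxplot(), one row per species
box_stats <- layer_data(p)[, c("lower", "middle", "upper")]
box_stats

# The middle line should be the median of each species
medians <- tapply(penguins$body_mass_g, penguins$species, median, na.rm = TRUE)
medians
```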

Boxplot, dodging by default

Filter out missing sex values to avoid an NA category

penguins |>  # alternative to tidyr::drop_na()
  filter(!is.na(sex)) |>
  ggplot() +
  geom_boxplot(aes(y = body_mass_g,
                   x = species,
                   fill = sex))

Better: violin and jitter

Show the data

penguins |>
  filter(!is.na(sex)) |>
  # define aes here for both geometries
  ggplot(aes(y = body_mass_g,
             x = species,
             fill = sex,
             # for violin contours and dots
             colour = sex)) +
  # very transparent filling
  geom_violin(alpha = 0.1, trim = FALSE) +
  geom_point(position = position_jitterdodge(dodge.width = 0.9),
             alpha = 0.5,
             # don't need dots in legend
             show.legend = FALSE)

Even better: beeswarm

ggplot extension ggbeeswarm

library(ggbeeswarm)
penguins |>
  filter(!is.na(sex)) |>
  ggplot(aes(y = body_mass_g,
             x = species,
             colour = sex)) +
  geom_quasirandom(dodge.width = 1) +
  theme_bw(14)

Coding mistake

What is wrong with the code below?

(Hint: think about inherited aesthetics)

penguins |>
  ggplot() +
  geom_point(aes(x = bill_length_mm, 
                 y = body_mass_g)) +
  geom_smooth(method = "lm")
Error in `geom_smooth()`:
! Problem while computing stat.
ℹ Error occurred in the 2nd layer.
Caused by error in `compute_layer()`:
! `stat_smooth()` requires the following missing aesthetics: x and y.

Inheritance of aesthetics in main ggplot()

penguins |>
  ggplot(aes(x = bill_length_mm, 
             y = body_mass_g)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE)

Control the dots plotting order

ggplot2 draws dots in the order they appear in the input data: the last rows end up on top

tibble(x = LETTERS[1:3],
       y = x) |>
  ggplot(aes(x, y)) +
  geom_point(aes(colour = x),
             show.legend = FALSE,
             size = 100, alpha = 0.9) +
  scale_color_brewer(palette = "Dark2") +
  theme_classic(20)

tibble(x = LETTERS[1:3], y = x) |>
  arrange(desc(x)) |>
  ggplot(aes(x, y)) +
  geom_point(aes(colour = x),
             show.legend = FALSE,
             size = 100, alpha = 0.9) +
  scale_color_brewer(palette = "Dark2") +
  theme_classic(20)
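A practical use of this ordering is to bring one category to the front. A sketch, assuming we want Gentoo drawn on top: sorting on a logical puts FALSE rows first and TRUE rows last. penguins_gentoo_last is a hypothetical name.

```r
library(ggplot2)
library(dplyr)
library(palmerpenguins)

# FALSE sorts before TRUE, so Gentoo rows move to the end of the table
# and are therefore drawn last, on top of the other species
penguins_gentoo_last <- penguins |>
  arrange(species == "Gentoo")

p <- ggplot(penguins_gentoo_last,
            aes(x = bill_length_mm, y = body_mass_g, colour = species)) +
  geom_point(size = 3, alpha = 0.8)
p
```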

Before we stop

You learned to:

  • Apprehend graphics as a language
  • Embrace the layer system
  • Link data columns to aesthetics
  • Discover geometries

Further reading 📚

Acknowledgments 🙏 👏

Thank you for your attention!