Introduction to Data Science

with R

Aurélien Ginolhac, DHML

University of Luxembourg

Monday, the 23rd of February, 2026

Hello!

What you can do now:

  • Check for material at the main site

https://basv53.uni.lu

  • Install R, RStudio and packages

setup

Check your install

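library(tidyverse)
# Read the swiss dataset from the course site
read_csv("https://biostat2.uni.lu/practicals/data/swiss.csv",
         show_col_types = FALSE) |>
  # filter_out() drops the rows where the condition is TRUE
  filter_out(Fertility > 80) |>
  # Stack Fertility and Agriculture into a measurement / value pair
  pivot_longer(cols = c(Fertility, Agriculture),
               names_to = "measurement",
               values_to = "value") |>
  # Scatter plot with one loess trend per measurement
  ggplot(aes(x = value, y = Education, colour = measurement)) +
  geom_point() +
  geom_smooth(method = "loess", formula = y ~ x, alpha = 0.2) +
  theme_bw(14)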

Overview

This course provides an introduction to R and the tidyverse, one of its dialects.

  • Brief computer history, programming and hardware notions
  • Biology track: data analysis is mandatory
  • Focusing on loading and cleaning data for exploratory visualizations

Lectures

  • Slides, formal lecture
  • Quick exercises inserted
  • Unprepared live demo

Practicals

  • Detailed exercises
  • Solutions hidden/revealed

Projects

  • Different projects, in teams of 2 or 3
  • Due date: June
  • 10 min defense
  • No slides to prepare: Quarto HTML

This course comprises ~30 hours (2 ECTS)

1 ECTS

  • Written and practical exam, qmd file
  • 2 hours
  • All documents allowed
  • Internet allowed
  • Only your laptop allowed (no tablet)
  • Communication with others forbidden
  • AI allowed if tool specified and prompt included
  • Retake exam is an oral exam

1 ECTS

  • Homework project in a group
  • AI allowed if tool specified and prompt included
  • Wide range of subjects:
    • Gene expression in yeast cells
    • Spotify song characteristics
    • Temperatures from ice core records
    • Voyager Golden Record images
    • CO2 worldwide emissions
    • One year of a photovoltaic installation, in 5 min steps
    • Human genome gene structures
    • Colorectal cancer microarray

Internet access allowed. Watch out for time!

Why allowed?

Hello my name is Joby, I have a PhD in Physics and I work for NASA and I just had to look up the equation for the volume of a sphere

— Joby Hollis 🏳️ 🌈🇪🇺 (@Jobium) 3 September 2018

Downside!

  • Time vanishes fast if you aren’t prepared
  • AI does NOT run your code

Teacher

Aurélien Ginolhac

This website

Entirely built with Quarto and hosted on the Uni/LCSB GitLab

A data science course, not a computer science course

  • Computer parts
  • Programming basics, exemplified in R
    • Data types
    • Data structures
    • Subsetting
    • Control flow: for and if
  • Data wrangling
    • Import data
    • Manipulating
    • Visualizing
  • Literate programming
    • Quarto
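A minimal sketch of the programming basics listed above, in base R only. The objects `years` and `record` are hypothetical illustrations, with values borrowed from the first names dataset below; they are not course material.

```r
# Data types: every atomic vector has a single type
typeof(3.14)     # "double"
typeof(2L)       # "integer"
typeof("AADAM")  # "character"
typeof(TRUE)     # "logical"

# Data structures: a vector holds one type, a list can mix types
years <- c(2009, 2014, 2015)
record <- list(prenom = "AADAM", periode = years)

# Subsetting: [ keeps the container, [[ and $ extract one element
years[2]            # 2014
record["prenom"]    # still a list, of length 1
record[["prenom"]]  # "AADAM"
record$periode[1]   # 2009

# Control flow: a for loop with an if / else inside
for (y in years) {
  if (y >= 2014) {
    message(y, " is 2014 or later")
  } else {
    message(y, " is before 2014")
  }
}
```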

Literate programming, separate content from formatting

HTML

PDF

Live Demo: French first names over 125 years

Dataset: baby names in France over 125 years (data from the French statistics institute INSEE)

https://www.insee.fr/fr/statistiques/8595130

Flat file

First lines of the 711,070 lines:

sexe;prenom;periode;valeur
1;AABAN;2018;5
1;AADAM;2009;5
1;AADAM;2014;5
1;AADAM;2015;5
1;AADAM;2016;5
1;AADAM;2017;5
1;AADAM;2018;5
1;AADAM;2020;5
1;AADAM;2022;5
[...]
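The flat file uses `;` as field separator, so `read_csv()` would return a single column. A sketch with readr's `read_delim()`, assuming readr 2.0 or later (which accepts literal text wrapped in `I()`), parsing only the lines shown above. The name `sample_prenoms` is hypothetical, and this is not necessarily how `prenoms` was built for the demo.

```r
library(readr)

# First lines of the INSEE flat file, as shown above
flat_lines <- "sexe;prenom;periode;valeur
1;AABAN;2018;5
1;AADAM;2009;5
1;AADAM;2014;5
1;AADAM;2015;5
1;AADAM;2016;5
1;AADAM;2017;5
1;AADAM;2018;5
1;AADAM;2020;5
1;AADAM;2022;5
"

# delim = ";" splits the fields; I() marks the string as literal data
sample_prenoms <- read_delim(I(flat_lines), delim = ";",
                             show_col_types = FALSE)
sample_prenoms
```

For the full download, the same call should apply with the local file path in place of `I(flat_lines)`; the column types guessed here (`dbl`, `chr`, `dbl`, `dbl`) match the tibble printed below.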

Dataset loaded in R

prenoms
# A tibble: 711,069 × 4
    sexe prenom periode valeur
   <dbl> <chr>    <dbl>  <dbl>
 1     1 AABAN     2018      5
 2     1 AADAM     2009      5
 3     1 AADAM     2014      5
 4     1 AADAM     2015      5
 5     1 AADAM     2016      5
 6     1 AADAM     2017      5
 7     1 AADAM     2018      5
 8     1 AADAM     2020      5
 9     1 AADAM     2022      5
10     1 AADAM     2024      5
# ℹ 711,059 more rows
# ℹ Use `print(n = ...)` to see more rows

Data analysis

Questions?

  • Number of babies born per year
  • Births per month across recent years
  • Ratio Male / Female
  • Evolution of your first names
  • Dynamics of novelty
  • Novelty versus Saints
  • First names given to both sexes
  • Most popular first names per decade
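The first question, the number of babies born per year, amounts to summing `valeur` within each `periode`. A sketch with dplyr's `count()` and its `wt` argument, run on the ten rows printed above only, so the totals just check the code. The names `sample_prenoms` and `births_per_year` are hypothetical; on the full data the pipeline would start from `prenoms` instead.

```r
library(dplyr)
library(tibble)

# The ten rows printed above, typed in by hand
sample_prenoms <- tribble(
  ~sexe, ~prenom, ~periode, ~valeur,
  1,     "AABAN", 2018,     5,
  1,     "AADAM", 2009,     5,
  1,     "AADAM", 2014,     5,
  1,     "AADAM", 2015,     5,
  1,     "AADAM", 2016,     5,
  1,     "AADAM", 2017,     5,
  1,     "AADAM", 2018,     5,
  1,     "AADAM", 2020,     5,
  1,     "AADAM", 2022,     5,
  1,     "AADAM", 2024,     5
)

# wt = valeur sums the counts instead of counting rows
births_per_year <- sample_prenoms |>
  count(periode, wt = valeur, name = "births")
births_per_year
```

`summarise(births = sum(valeur), .by = periode)` would give the same totals; `count()` is simply the shorter form for a weighted tally.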