Introduction to BASV53

with R

Aurélien Ginolhac, DLSM

University of Luxembourg

Tuesday, the 25th of February, 2025

Hello!

What you can do now:

  • Check the course material on the main site

https://basv53.uni.lu

  • Install R, RStudio and packages

setup

Check your install

library(tidyverse)
read_csv("https://biostat2.uni.lu/practicals/data/swiss.csv",
         show_col_types = FALSE) |>
  pivot_longer(cols = c(Fertility, Agriculture),
               names_to = "measurement", 
               values_to = "value") |>
  ggplot(aes(x = value, y = Education, colour = measurement)) +
  geom_point() +
  geom_smooth(method = "loess", formula = y ~ x) +
  theme_bw(14)
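If the check above fails because a package is not installed, a small helper can list what is missing. This is a minimal base R sketch: `missing_pkgs()` is a hypothetical name, not a function from any package, and the package list is assumed from the packages used in these slides.

```r
# Hypothetical helper, not from any package: return the packages in `pkgs`
# that cannot be loaded on this machine.
missing_pkgs <- function(pkgs) {
  # requireNamespace() returns FALSE (instead of failing) when a package is absent
  ok <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!ok]
}

# Packages used in these slides (assumed list)
missing_pkgs(c("tidyverse", "remotes", "prenoms"))
```

CRAN packages such as tidyverse and remotes can then be installed with install.packages(), while prenoms comes from GitHub through remotes::install_github().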

Overview

This course provides an introduction to R and the tidyverse, one of its dialects.

  • Brief computer history, programming and hardware notions
  • Biology track: data analysis is mandatory
  • Focus on loading and cleaning data for exploratory visualizations

Lectures

  • Slides, formal lecture
  • Quick exercises inserted
  • Unprepared live demo

Practicals

  • Detailed exercises
  • Solutions hidden/revealed

Projects

  • Different projects, in teams of 2 or 3
  • Due date: June
  • 10 min defense
  • No slides to prepare: Quarto HTML

This course comprises ~30 hours (2 ECTS)

1 ECTS

  • Written and practical exam, qmd file
  • 2 hours
  • All documents allowed
  • Internet allowed
  • AI allowed if tool specified and prompt included

1 ECTS

  • Homework project in a group
  • AI allowed if tool specified and prompt included
  • Wide range of subjects:
    • Gene expression in yeast cells
    • Spotify song characteristics
    • Temperatures from ice records
    • Voyager Golden Record images
    • CO2 worldwide emissions
    • Human genome gene structures
    • Colorectal cancer microarray

Internet access allowed. Watch out for time!

Why allowed?

Hello my name is Joby, I have a PhD in Physics and I work for NASA and I just had to look up the equation for the volume of a sphere

— Joby Hollis 🏳️ 🌈🇪🇺 (@Jobium) 3 September 2018

Downside!

  • Time vanishes fast if you aren’t prepared
  • AI does NOT run the code it suggests

Teacher

Aurélien Ginolhac

  • Bioinformatician
  • Work: Department of Life Sciences and Medicine
    University of Luxembourg
    Campus Belval - 6 avenue du Swing
    KT4 – Office 403 L-4367 Belvaux
  • +352 46 66 44 6560
  • Email: aurelien.ginolhac@uni.lu

This website

Entirely built with Quarto and hosted on the LCSB GitLab

Highlights, more data science than pure computing

  • Computer parts
  • Programming basics, exemplified in R
    • Data types
    • Data structures
    • Sub-setting
    • Control flow: for and if
  • Data wrangling
    • Import data
    • Manipulating
    • Visualizing
  • Literate programming
    • Quarto
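To give a flavour of the programming basics listed above, here is a minimal base R sketch on toy values; the variable names are illustrative only and do not come from the course material.

```r
# Data types and structures
x <- c(3L, 7L, 12L)          # an integer vector: one data type, one structure
typeof(x)                    # "integer"
is_even <- x %% 2L == 0L     # a logical vector of the same length
lst <- list(name = "Ada", n = 4L)   # a list can mix types
lst$name                            # "Ada"

# Sub-setting
x[is_even]                   # logical sub-setting keeps 12
x[2]                         # positional sub-setting keeps 7

# Control flow: a for loop with an if/else inside
labels <- character(length(x))
for (i in seq_along(x)) {
  if (x[i] > 5L) {
    labels[i] <- "large"
  } else {
    labels[i] <- "small"
  }
}
labels                       # "small" "large" "large"
```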

Literate programming, separate content from formatting

HTML

PDF

Live demo: French first names over 120 years

The dataset was gathered by the French company ThinkR and compiles the first names given to babies in France over 120 years (data from INSEE, the French statistics institute).

Dataset

The prenoms package is installed from GitHub with remotes. remotes is itself a package that is not part of base R; install it first with:

install.packages("remotes")
remotes::install_github("ThinkR-open/prenoms")

Loading a package is a call to library():

library(prenoms)
prenoms_france
# A tibble: 648,370 × 5
    year sex   name          n      prop
   <dbl> <chr> <chr>     <int>     <dbl>
 1  1900 F     Abeline       3 0.0000127
 2  1900 F     Abelle        3 0.0000127
 3  1900 F     Ada           4 0.0000170
 4  1900 F     Adelaide    194 0.000822 
 5  1900 F     Adelheid      3 0.0000127
 6  1900 F     Adelia       12 0.0000509
 7  1900 F     Adelie        3 0.0000127
 8  1900 F     Adelina      50 0.000212 
 9  1900 F     Adeline     224 0.000949 
10  1900 F     Adelphine    19 0.0000805
# ℹ 648,360 more rows
# ℹ Use `print(n = ...)` to see more rows

Data analysis

Questions?

  • Number of babies born per year
  • Births per month across recent years
  • Ratio Male / Female
  • Evolution of your first name
  • Dynamics of novelty
  • Novelty versus Saints
  • First names given to both sexes
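The first questions boil down to grouping rows and summing the n column. A minimal sketch in base R, so it runs without extra packages, on a made-up toy data frame that only mimics the year, sex, name and n columns of prenoms_france; the counts are invented, and males are assumed to be coded "M" as females are coded "F".

```r
# Made-up toy data with the same columns as prenoms_france (year, sex, name, n)
toy <- data.frame(
  year = c(1900, 1900, 1900, 1901, 1901, 1901),
  sex  = c("F", "F", "M", "F", "M", "M"),
  name = c("Ada", "Adeline", "Louis", "Ada", "Louis", "Paul"),
  n    = c(4L, 6L, 12L, 5L, 8L, 7L)
)

# Number of babies born per year: sum n within each year
per_year <- aggregate(n ~ year, data = toy, FUN = sum)
per_year$n   # 22 20

# Ratio Male / Female: cross-tabulate summed counts by year and sex, then divide
by_sex <- xtabs(n ~ year + sex, data = toy)
ratio_mf <- by_sex[, "M"] / by_sex[, "F"]
ratio_mf     # 1900: 1.2, 1901: 3

# Evolution of one first name: keep its rows, ordered by year
ada <- subset(toy, name == "Ada")
ada[order(ada$year), c("year", "n")]
```

With the tidyverse loaded, the per-year total on the real data could be written as count(prenoms_france, year, wt = n).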