Reproducible Research

Published

May 5, 2025

Under construction

Reproducibility and Replicability in Research

Use version control (git or svn)

  • Do not track model development in git: the many exploratory runs are messy and clutter the git history.
    • Use rsync if needed
    • Track the Rmd-file for the report
    • The Rmd-file records which runs are the key models, so they are tracked through it. The model files themselves stay in the “messy” folder.
      • base_model <- run25.mod
      • covariate_model <- run63.mod
      • final_model <- run67.mod
      • simulation_model <- run68.mod
    • Runrecord
      • runno
      • based on
      • OFV
      • dOFV
      • Condition number (CN)
  • Do not track produced PDFs in git
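The run record fields listed above could be kept in a small table. The following is a minimal R sketch, assuming dOFV is the OFV of a run minus the OFV of the run it is based on. The column names and all numbers are hypothetical placeholders, not results from a real analysis.

```r
# Hypothetical run record; every value below is a made-up placeholder.
runRecord <- data.frame(
  runno = c(25, 63, 67),
  basedOn = c(NA, 25, 63),          # parent run; NA for the first model
  ofv = c(1250.4, 1231.9, 1228.3),
  conditionNumber = c(45, 120, 98)
)

# Look up the OFV of each parent run; runs without a parent get NA.
parentOfv <- runRecord$ofv[match(runRecord$basedOn, runRecord$runno)]

# dOFV: change in OFV relative to the parent run (negative = improvement).
runRecord$dOfv <- runRecord$ofv - parentOfv

print(runRecord)
```

Storing the table as a tracked text file (for example CSV) keeps the model history under version control without tracking every exploratory model file.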

File/folder naming

  • Name files/folders using only A-Z, a-z, 0-9, -, _.
    • Start folder names with a number for sorting purposes.
  • In general, use kebab-case for naming (easier to read than snake_case).
    • If there are multiple parts to a name (e.g., a description, a date, and an author), use underscores (as in snake_case) to separate the parts, and kebab-case within each part (e.g., descriptive-name_2025-01-08_viktor-rognas.ext).
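The naming rules above can be checked and applied programmatically. The following is a minimal R sketch; isValidName and makeFileName are hypothetical helpers introduced here for illustration, and the sketch assumes that the file extension (and its dot) is exempt from the character rule.

```r
# Hypothetical helper: TRUE if the name (without extension) uses only
# A-Z, a-z, 0-9, "-" and "_".
isValidName <- function(fileName) {
  baseName <- tools::file_path_sans_ext(basename(fileName))
  grepl("^[A-Za-z0-9_-]+$", baseName)
}

# Hypothetical helper: kebab-case within parts, underscores between parts.
makeFileName <- function(description, date, author, extension) {
  toKebabCase <- function(text) {
    # Lower-case the text and collapse every run of other characters to "-".
    gsub("[^a-z0-9]+", "-", tolower(trimws(text)))
  }
  parts <- c(
    toKebabCase(description),
    format(as.Date(date), "%Y-%m-%d"),
    toKebabCase(author)
  )
  paste0(paste(parts, collapse = "_"), ".", extension)
}

fileName <- makeFileName("Descriptive name", "2025-01-08", "Viktor Rognas", "ext")
print(fileName)
print(isValidName(fileName))
print(isValidName("my file.csv"))
```

Note that toKebabCase replaces non-ASCII letters with a hyphen, so names with such letters may need to be transliterated first.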

Folder structure:

project/
  - README.md       # Project description
  - input/
    - data/         # All input data files
      - raw_data/   # Untouched original data files
        - raw_data.csv
      - dat1.csv
      - dat2.csv
  - R/              # R-scripts
    - dat1.R
    - dat2.R
  - NONMEM/
    - model/        # Model files
      - pk/
        - run001.mod
      - pd/
        - run002.mod
  - output/         # Results
    - report/
      - 1a/
        - .tex
        - .pdf
      - 1b/
        - .tex
        - .pdf
      - 1/
        - .tex
        - .pdf
    - presentation/ # Communication
      - slides.pptx
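The folder layout above could be created with a short script so that every project starts from the same skeleton. This is a minimal R sketch; createProjectSkeleton is a hypothetical helper, and it only creates the folders and a placeholder README.md, not the data, model, or report files.

```r
# Hypothetical helper: create the folder skeleton under `root`.
createProjectSkeleton <- function(root) {
  folders <- c(
    "input/data/raw_data",
    "R",
    "NONMEM/model/pk",
    "NONMEM/model/pd",
    "output/report",
    "output/presentation"
  )
  for (folder in file.path(root, folders)) {
    # recursive = TRUE also creates missing parent folders.
    dir.create(folder, recursive = TRUE, showWarnings = FALSE)
  }
  readmePath <- file.path(root, "README.md")
  if (!file.exists(readmePath)) {
    writeLines("# Project description", readmePath)
  }
  invisible(file.path(root, folders))
}

# Example usage in a temporary location.
projectRoot <- file.path(tempdir(), "project")
createProjectSkeleton(projectRoot)
print(list.dirs(projectRoot, full.names = FALSE))
```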

Coding: language agnostic

Function naming

Strive to use verbs for function names: to, add, remove, do, get, make, take, find, use, call, try, have, has, give, ask, go, put, let, help, move, turn, run, hold, write, read, include, set, change, watch, stop, start, create, open, close, save, build, wait, require, kill, pull, push, pass, stay, etc…

Use (lower) camelCase for self-defined functions that are not to be exported outside your project. This makes a clear distinction between self-defined and imported functions.

Class names, on the other hand, should use PascalCase.

# Class

Parameter <- R6Class("Parameter", ....)

# Variable

parameterToDelete <- ...

# Method and function

performSimulation <- function (...)

Don’t hesitate to choose lengthy names for test functions.

Unlike regular functions, long names are less problematic for test functions because:

  • They are not visible or accessible to the users
  • They are not called repeatedly throughout the codebase
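As an illustration of a deliberately long test-function name, here is a minimal base R sketch. Both getPercentile and the test function are hypothetical examples, and stopifnot stands in for whatever testing framework the project uses.

```r
# Hypothetical function under test.
getPercentile <- function(values, probability = 0.5) {
  unname(quantile(values, probs = probability))
}

# The long name states the scenario and the expected outcome.
testGetPercentileReturnsMedianWhenProbabilityIsOneHalf <- function() {
  stopifnot(getPercentile(c(1, 2, 3, 4, 5)) == 3)
  invisible(TRUE)
}

testGetPercentileReturnsMedianWhenProbabilityIsOneHalf()
```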

Variable naming

Variable names should be nouns.

True constant variables should use ALL_CAPS Casing.

# Constant variables

DEFAULT_PERCENTILE <- 0.5

Names for Boolean variables or functions should make clear what true and false mean. This can be done using prefixes (is, has, can, etc.).

# not great
if (child) {
  if (parentSupervision) {
    watchHorrorMovie <- TRUE
  }
}

# better
if (isChild) {
  if (hasParentSupervision) {
    canWatchHorrorMovie <- TRUE
  }
}

Use positive terms for Booleans since they are easier to process.

# double negation - difficult
isFirewallDisabled <- FALSE

# better
isFirewallEnabled <- TRUE

Coding: language specific

R

  • Script all plots.
  • Script the report in Quarto and record the session information in it, for example with:
    • R.version
    • rstudioapi::versionInfo()
    • .packages()
    • devtools::session_info(pkgs = "attached")
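The session information can also be written to a text file next to the report. The following is a minimal sketch using only base R and the utils package; writeSessionInfo is a hypothetical helper, and utils::sessionInfo() is used here in place of devtools::session_info() to avoid an extra dependency.

```r
# Hypothetical helper: save the R version and session details to a file.
writeSessionInfo <- function(path) {
  info <- c(
    R.version.string,
    "",
    utils::capture.output(utils::sessionInfo())
  )
  writeLines(info, path)
  invisible(path)
}

# Example usage with a temporary file.
sessionInfoPath <- tempfile(fileext = ".txt")
writeSessionInfo(sessionInfoPath)
cat(readLines(sessionInfoPath), sep = "\n")
```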

NONMEM

When using Monte-Carlo estimation methods (e.g., SAEM, IMP, or FOCE with MCETA), always specify the SEED option and RANMETHOD=P. It is also recommended to set the full RANMETHOD option according to the method:

  • For SAEM and IMP: RANMETHOD=3S2P
  • For MCETA: RANMETHOD=4P ($SIMULATION uses this method by default)
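One way to follow the SEED and RANMETHOD advice consistently is a small check over the control stream before a run is started. The following is a minimal R sketch; findUnseededEstimations is a hypothetical helper, the control stream lines are illustrative (the NBURN, ISAMPLE, MCETA, and SEED values are placeholders), and the sketch assumes each $ESTIMATION record sits on a single line.

```r
# Hypothetical helper: return the indices of $ESTIMATION lines that use a
# Monte-Carlo method but lack SEED or a RANMETHOD value ending in P.
findUnseededEstimations <- function(controlStreamLines) {
  matches <- function(pattern) {
    grepl(pattern, controlStreamLines, ignore.case = TRUE, perl = TRUE)
  }
  isEstimation <- matches("^\\s*\\$EST")
  usesMonteCarlo <- matches("METHOD\\s*=\\s*(SAEM|IMP)|MCETA\\s*=")
  hasSeed <- matches("\\bSEED\\s*=")
  hasRanmethodP <- matches("RANMETHOD\\s*=\\s*[0-9A-Z]*P\\b")
  which(isEstimation & usesMonteCarlo & !(hasSeed & hasRanmethodP))
}

# Illustrative control stream lines; option values are placeholders.
controlStreamLines <- c(
  "$PROBLEM example",
  "$ESTIMATION METHOD=SAEM NBURN=500 SEED=12345 RANMETHOD=3S2P",
  "$ESTIMATION METHOD=IMP ISAMPLE=1000",
  "$ESTIMATION METHOD=COND INTER MCETA=100 SEED=12345 RANMETHOD=4P"
)

flaggedLines <- findUnseededEstimations(controlStreamLines)
print(controlStreamLines[flaggedLines])
```

Records that span several lines would need to be joined before the check, which this sketch does not attempt.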