Reproducible Research
- Reproducibility is defined as obtaining consistent results using the same data and code as the original study (synonymous with computational reproducibility).
- Replicability means obtaining consistent results across studies aimed at answering the same scientific question using new data or other new computational methods.
Reproducibility and Replicability in Research
Use version control (git or svn)
- Do not track model development in git; it is too messy and clutters the git history.
- Use rsync if needed
- Track the Rmd-file for the report
- This tracks the models. The models are still in the “messy” folder.
- base_model <- run25.mod
- covariate_model <- run63.mod
- final_model <- run67.mod
- simulation_model <- run68.mod
- Runrecord
- runno
- based on
- OFV
- dOFV
- Condition number (CN)
- Use
- Do not track produced PDFs in git
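The run record fields listed above can be kept as a small table. The following is a minimal sketch in R, assuming that dOFV means the OFV of a run minus the OFV of the run it is based on. The run numbers are taken from the model list above; the "based on" links, the OFV values, and the names `runRecord` and `getDeltaOfv` are made up for illustration.

```r
# Hypothetical run record; the OFV values are invented for illustration only.
runRecord <- data.frame(
  runno   = c(25, 63, 67),
  basedOn = c(NA, 25, 63),
  OFV     = c(1250.4, 1231.9, 1228.3)
)

# dOFV: OFV of a run minus the OFV of the run it is based on.
# Runs without a parent (basedOn is NA) get NA.
getDeltaOfv <- function(record) {
  parentOfv <- record$OFV[match(record$basedOn, record$runno)]
  record$OFV - parentOfv
}

runRecord$dOFV <- getDeltaOfv(runRecord)
print(runRecord)
```

A condition number column could be added in the same way once the values are available from the estimation output.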
File/folder naming
- Name files/folders using only A-Z, a-z, 0-9, -, _.
- Start folder names with a number for sorting purposes.
- In general, use kebab-case for naming (easier to read than snake_case).
- If there are multiple parts to a name (e.g., a description, a date, and an author), use snake_case to separate between parts, and kebab-case within the parts (e.g., descriptive-name_2025-01-08_viktor-rognas.ext)
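The naming rules above can be checked automatically. The following is a rough sketch in R, not a definitive implementation: `isValidName` and `getNameParts` are hypothetical helper names, and the sketch assumes that a single dot followed by a file extension is allowed in addition to the listed characters.

```r
# TRUE when a name uses only A-Z, a-z, 0-9, "-" and "_",
# optionally followed by one file extension (assumption).
isValidName <- function(name) {
  grepl("^[A-Za-z0-9_-]+(\\.[A-Za-z0-9]+)?$", name)
}

# Split a name into its underscore-separated parts, ignoring the extension.
getNameParts <- function(name) {
  stem <- sub("\\.[A-Za-z0-9]+$", "", name)
  strsplit(stem, "_", fixed = TRUE)[[1]]
}

isValidName("descriptive-name_2025-01-08_viktor-rognas.ext")  # TRUE
isValidName("my report (final).pdf")                           # FALSE
getNameParts("descriptive-name_2025-01-08_viktor-rognas.ext")
# "descriptive-name" "2025-01-08" "viktor-rognas"
```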
Folder structure:
project/
- README.md # Project description
- input/
- data/ # All input data files
- raw_data/ # Untouched original data files
- raw_data.csv
- dat1.csv
- dat2.csv
- R/ # R-scripts
- dat1.R
- dat2.R
- NONMEM/
- model/ # Model files
- pk/
- run001.mod
- pd/
- run002.mod
- output/ # Results
- report/
- 1a/
- .tex
- .pdf
- 1b/
- .tex
- .pdf
- 1/
- .tex
- .pdf
- presentation/ # Communication
- slides.pptx
Coding: language agnostic
Function naming
Strive to use verbs for function names: to, add, remove, do, get, make, take, find, use, call, try, have, has, give, ask, go, put, let, help, move, turn, run, hold, write, read, include, set, change, watch, stop, start, create, open, close, save, build, wait, require, kill, pull, push, pass, stay, etc.
Use (lower) camelCase for self-defined functions that are not to be exported outside your project.
Class names, on the other hand, should use PascalCase.
This makes a clear distinction between self-defined and imported functions.
# Class
Parameter <- R6Class("Parameter", ....)
# Variable
parameterToDelete <- ...
# Method and function
performSimulation <- function (...)
Long names are less problematic for test functions than for regular functions because:
- They are not visible or accessible to the users
- They are not called repeatedly throughout the codebase
Variable naming
Variable names should be nouns.
True constant variables should use ALL_CAPS Casing.
Names for Boolean variables or functions should make clear what true and false mean. This can be done using prefixes (is, has, can, etc.).
# not great
if (child) {
if (parentSupervision) {
watchHorrorMovie <- TRUE
}
}
# better
if (isChild) {
if (hasParentSupervision) {
canWatchHorrorMovie <- TRUE
}
}
Use positive terms for Booleans since they are easier to process.
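A small illustration of the point about positive terms, in the same style as the example above. The variable names are hypothetical.

```r
# not great: the negated name leads to a double negative
isNotConverged <- FALSE
if (!isNotConverged) {
  canReportModel <- TRUE
}

# better: the positive name reads directly
isConverged <- TRUE
if (isConverged) {
  canReportModel <- TRUE
}
```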
Coding: language specific
R
- Script all plots.
- Script the report with Quarto.
R.version
rstudioapi::versionInfo()
.packages()
devtools::session_info(pkgs = "attached")
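The version information above is most useful when it is stored next to the results. The following is a minimal sketch using only base R; `saveSessionInfo` and the file name `session-info.txt` are hypothetical, and the temporary directory is used here only so that the example runs anywhere.

```r
# Write the current session information (R version, platform,
# attached packages) to a plain text file.
saveSessionInfo <- function(path) {
  writeLines(capture.output(print(utils::sessionInfo())), path)
  invisible(path)
}

sessionFile <- file.path(tempdir(), "session-info.txt")
saveSessionInfo(sessionFile)
```

In a real project the path would point into the project folder instead, so that the file can be tracked together with the report.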
NONMEM
When using Monte-Carlo estimation methods (e.g., SAEM, IMP, or FOCE with MCETA), always specify the SEED option and RANMETHOD=P. Also, it is recommended to specify the RANMETHOD option accordingly:
- For SAEM and IMP: RANMETHOD=3S2P
- For MCETA: RANMETHOD=4P ($SIMULATION uses this method by default)