NONMEM model quality assurance

Published

July 21, 2025

Under construction

Interpreting the results, model validation

Grok NONMEM

.cnv means “convergence”; contains convergence-testing statistics.
.ets means “ETA samples”; contains randomly sampled ETAs (less prone to shrinkage than the EBEs).
.ext means “extra”; contains the parameter estimates and OFV iteration by iteration, plus the final estimates and standard errors.
.phi contains the individual parameters φᵢ = μᵢ + ηᵢ (collected as the vector φᵢ = (φᵢ₁, …, φᵢₙ)) and the individual OFV (iOFV); see the read-out sketch below.
.phm means “.phi-file for mixture models”.
.shk means “shrinkage”; contains the same composite shrinkage data as the .lst file.
.shm means “shrinkage map”; contains information about which ETAs were excluded from the ETA shrinkage assessment.

  • PHC = Var(φᵢ), the variance of the individual φ estimates
  • For non-MU-referenced parameters: φᵢ = ηᵢ
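
A minimal read-out sketch, assuming pandas and a hypothetical run1.phi:

```python
import pandas as pd

# .phi files start with a "TABLE NO. ..." banner, then a header row.
# Columns: SUBJECT_NO, ID, the individual parameters (ETA(...) or PHI(...)
# depending on MU referencing), their variances (ETC/PHC), and OBJ = iOFV.
phi = pd.read_csv("run1.phi", skiprows=1, sep=r"\s+")  # hypothetical file name

iofv = phi.set_index("ID")["OBJ"]                # iOFV per subject
print(iofv.sort_values(ascending=False).head())  # most poorly fitted subjects
```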

QA

Tip

The “best” model isn’t necessarily the one with the lowest objective function.

  • ETA shrinkage (SD scale) < 20%? If so, EBE-based diagnostics can be used.

  • EPS shrinkage (SD scale) < 20%? If so, IPRED vs DV plots can be used.

  • Condition number (CN) – $COVARIANCE PRINT=E prints the eigenvalues (a computation sketch follows this list)

    • Calculated differently in different software
      • In PsN it is calculated by dividing the largest eigenvalue by the smallest
      • Gabrielsson and Weiner calculate it as log(largest/smallest)
    • Different guidelines for ill-conditioning
      • If CN < 10^p, where p is the number of estimable parameters, it is considered acceptable
      • There are also references that point to CN < 10^6
      • < 1000 (seems to apply more to linear models, or to PK models with three parameters (CL, V, KA))
  • NRD = Number of Required Digits
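
A sketch of the PsN-style calculation, assuming numpy/pandas and a hypothetical run1.cov written by $COVARIANCE; the covariance matrix is rescaled to a correlation matrix before taking the eigenvalue ratio:

```python
import numpy as np
import pandas as pd

# Read the covariance matrix of the estimates (hypothetical run name);
# banner line first, then a header row, one row per parameter.
cov = pd.read_csv("run1.cov", skiprows=1, sep=r"\s+").set_index("NAME")
m = cov.to_numpy(dtype=float)

# Drop fixed parameters (all-zero rows/columns) before rescaling.
keep = m.diagonal() > 0
m = m[np.ix_(keep, keep)]

# Rescale to a correlation matrix, then take the eigenvalue ratio
# (PsN-style CN); log10 of it gives the Gabrielsson-and-Weiner flavour.
sd = np.sqrt(np.diag(m))
corr = m / np.outer(sd, sd)
eig = np.linalg.eigvalsh(corr)
cn = eig.max() / eig.min()
print(f"CN = {cn:.1f} (log10: {np.log10(cn):.2f})")
```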

Significant improvement

Bias

Driving individuals

Compare iOFV values in the .phi-file between runs.

  1. Sharkplot: ΔOFV (= OFV_reduced − OFV_full) vs. the number of subjects removed (see the sketch below)
  2. You should be able to remove 5 subjects without losing significance.
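
A sketch of the underlying computation, assuming hypothetical full.phi/reduced.phi file names and the .phi layout described above:

```python
import pandas as pd

def iofv(path):
    # The OBJ column of a .phi file holds the individual OFV.
    return pd.read_csv(path, skiprows=1, sep=r"\s+").set_index("ID")["OBJ"]

# Per-subject dOFV contributions, largest supporters of the full model first.
d_iofv = (iofv("reduced.phi") - iofv("full.phi")).sort_values(ascending=False)

# Remove the top contributors one at a time and track the remaining dOFV;
# significance (1 df, p < 0.05) is lost once it falls below 3.84.
dofv = d_iofv.sum()
for n_removed, contribution in enumerate(d_iofv):
    if dofv < 3.84:
        print(f"significance lost after removing {n_removed} subjects")
        break
    dofv -= contribution
else:
    print("significance retained throughout")
```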

Outlying observations

Do a sensitivity analysis for outliers (e.g. |CWRES| > 5; a flagging sketch follows the list below).

  1. Re-estimate the model with the outliers removed
  2. If the parameter estimates change -> remove the outliers
  3. If the parameter estimates don’t change -> keep the outliers
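
A minimal flagging sketch, assuming pandas and a hypothetical sdtab1 table that contains a CWRES column:

```python
import pandas as pd

# sdtab1: hypothetical $TABLE output containing ID, TIME, DV and CWRES.
tab = pd.read_csv("sdtab1", skiprows=1, sep=r"\s+")

outliers = tab[tab["CWRES"].abs() > 5]  # flag |CWRES| > 5
print(outliers[["ID", "TIME", "DV", "CWRES"]])

# Save an exclusion list for building the outlier-free re-estimation dataset.
outliers[["ID", "TIME"]].to_csv("outliers.csv", index=False)
```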

Parameter uncertainty

  • RSE% considered precise: < 30% for THETAs, < 50% for ETAs (see the sketch below)
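
A sketch of computing RSE% from the .ext file (hypothetical run1.ext; assumes the covariance step ran so the standard-error row is present):

```python
import pandas as pd

ext = pd.read_csv("run1.ext", skiprows=1, sep=r"\s+")  # hypothetical run name

# Special ITERATION codes in the .ext file: -1000000000 holds the final
# estimates, -1000000001 the standard errors (only present if $COV ran).
est = ext[ext["ITERATION"] == -1000000000].iloc[0]
se = ext[ext["ITERATION"] == -1000000001].iloc[0]

# Skip bookkeeping columns and fixed/zero parameters.
params = [c for c in ext.columns if c not in ("ITERATION", "OBJ") and est[c] != 0]
rse = (100 * se[params].abs() / est[params].abs()).round(1)
print(rse)  # compare THETAs against 30%, OMEGAs/SIGMAs against 50%
```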

Don’t confuse parameter precision estimates with a measure of the quality of the model fit. If a model minimizes, the final estimates are the best-fit parameters, however precisely they are known.

We have three different methods because different models require different approaches.

  • $COVARIANCE is the fastest, but doesn’t work on all models, especially complex ones where the parameter uncertainty is not normally distributed (the covariance step assumes asymptotic normality).
    • Default: the sandwich estimator (R⁻¹SR⁻¹)
    • MATRIX=S: use the S matrix (cross-product of gradients) alone
    • MATRIX=R: use the R matrix (the Hessian) alone
  • SIR (PsN) is faster than the bootstrap, but it can also sometimes give skewed distributions.
  • The bootstrap takes the longest, can be heavily influenced by the sampling scheme employed, and can inflate the confidence intervals of parameters that rely on a small number of subjects in the analysis.

There is no reason the three should match each other, because they make different assumptions and use different approaches.

VPCs

The VPC’s central metric is the prediction of data percentiles. If you focus on the difference between, e.g., the 5th and 95th percentiles of the simulated data, you have a prediction interval, as Bill states. If you instead focus on a single percentile, but consider the imprecision with which it is derived (often shown as a shaded area), then, like other metrics of imprecision, it is a confidence interval. A VPC therefore shows both a prediction interval (PI) for the data and confidence intervals (CIs) for the percentiles.
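
A toy sketch of the distinction, using numpy only and made-up lognormal “simulations” for a single time bin:

```python
import numpy as np

rng = np.random.default_rng(1)
# Made-up "simulated" DV for one time bin: n_sim replicates of n_obs
# observations each (in a real VPC these come from simulating the model).
n_sim, n_obs = 1000, 80
sim = rng.lognormal(mean=1.0, sigma=0.4, size=(n_sim, n_obs))

# Prediction interval: the spread of the simulated *data* (5th-95th
# percentile within each replicate), summarized over replicates.
pi = np.percentile(sim, [5, 95], axis=1).mean(axis=1)

# Confidence interval: the imprecision of a single percentile (here the
# 5th) *across* replicates -- the shaded band around that percentile line.
p5 = np.percentile(sim, 5, axis=1)
ci = np.percentile(p5, [2.5, 97.5])

print(f"90% PI for the data:           {pi[0]:.2f} to {pi[1]:.2f}")
print(f"95% CI for the 5th percentile: {ci[0]:.2f} to {ci[1]:.2f}")
```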

Notes
  • Mixture prior (slide 31): an informative prior plus a flat prior as a “what if we are wrong” component. When the prior and the data conflict, the mixture up-weights the data and downplays the history.
  • Two-stage approach: apply a model to each individual’s data separately.
    • Requires much data per individual.
    • Gives OK mean parameters (i.e. the typical individual).
    • But IIV is inflated (and different individuals may end up with different models).
  • NONMEM: apply one model to all individuals’ data simultaneously.

References