NONMEM model quality assurance
Interpreting the results, model validation
Grok NONMEM
.cnv
means “convergence”, contains convergence-testing statistics.
.ets
means “ETA samples”; contains randomly sampled ETAs (less prone to shrinkage than EBEs)
.ext
means “extra”; contains the parameter estimates and OFV for each estimation iteration, plus the final estimates
.phi
contains individual parameters φ = (φ1, …, φn), where φi = μi + ηi (MU-referenced), and iOFV (see the sketch after this list)
- PHC = Var(φ)
- non-MU-referenced parameters: φi = ηi
.phm
means “.phi-file for mixture models”
.shk
means “shrinkage”; contains the same composite shrinkage data as the .lst file
.shm
means “shrinkage map”; contains information on which ETAs were excluded from the ETA shrinkage assessment.
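A minimal sketch of pulling the individual parameters and iOFV out of a .phi file, assuming the usual layout (one “TABLE NO. …” banner line, then a whitespace-separated header; the run name is hypothetical):

    # Read a NONMEM .phi file: skip the banner line, then parse the table.
    # OBJ holds each individual's contribution to the OFV (iOFV).
    import pandas as pd

    phi = pd.read_csv("run1.phi", skiprows=1, sep=r"\s+")
    etas = phi.filter(regex=r"^ETA")       # EBE / ETA columns
    iofv = phi.set_index("ID")["OBJ"]      # iOFV per individual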
QA
The “best” model isn’t necessarily the one with the lowest objective function.
ETA shrinkage (SD) < 20%? (then we can use EBE-based diagnostics; see the sketch below)
EPS shrinkage (SD) < 20%? (then we can look at IPRED vs DV)
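A minimal sketch of the SD-based ETA shrinkage computation, assuming etas holds the EBEs for one ETA (e.g. from the .phi file) and omega is that ETA’s estimated variance:

    # SD-based shrinkage: 1 - SD(EBE) / sqrt(OMEGA).
    # Near 0: EBEs retain the estimated variability; near 1: heavy shrinkage.
    import numpy as np

    def eta_shrinkage_sd(etas, omega):
        return 1 - np.std(etas, ddof=1) / np.sqrt(omega)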
Condition number (CN)
$COVARIANCE PRINT=E (prints the eigenvalues)
- Calculated differently in different software
- In PsN it is calculated by dividing the largest eigenvalue by the smallest (sketch below)
- Gabrielsson and Weiner calculate it as log(largest/smallest)
- Different guidelines for ill-conditioning
- If CN < 10^p, where p = number of estimable parameters, it’s considered fine
- There are also references that point to CN < 10^6
- < 1000 (seems to relate more to linear models, or PK models with 3 parameters (CL, V, KA))
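A minimal sketch of the PsN-style calculation, assuming corr is the correlation matrix of the estimates (the matrix whose eigenvalues $COVARIANCE PRINT=E reports):

    # Condition number as the ratio of the largest to the smallest eigenvalue.
    import numpy as np

    def condition_number(corr):
        eig = np.linalg.eigvalsh(np.asarray(corr))   # eigenvalues, ascending
        return eig[-1] / eig[0]

    # Gabrielsson & Weiner variant: np.log10(condition_number(corr))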
NRD = Number of Required Digits
- Don’t directly associate OFVs with p-values (Wählby et al. 2001)
- Regard a drop in OFV of ~10 as meaningful (Byon et al. 2013)
- Box-Cox-transformed ETA: a dOFV of -10.827 corresponds to p < 0.001 at 1 df (see the sketch below)
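A quick check on these cut-offs: a sketch mapping a dOFV between nested models to its nominal chi-square p-value (df = number of added parameters):

    # dOFV is asymptotically chi-square distributed, but per Wählby et al. 2001
    # the nominal and actual significance levels can differ in practice.
    from scipy.stats import chi2

    print(chi2.sf(10.827, df=1))   # ~0.001
    print(chi2.sf(10.0, df=1))     # ~0.0016, the "drop of ~10" rule of thumb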
Bias
Driving (influential) individuals
Compare iOFV values in the .phi-file between runs.
- Shark plot: ΔOFV (= OFV_full - OFV_reduced) vs. the number of subjects removed
- Should be able to remove 5 subjects without losing significance (see the sketch below).
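A minimal sketch of ranking individuals by their iOFV change between two runs (file names are hypothetical; .phi layout as above):

    # Compare individual OFV contributions (OBJ) between a full and a reduced
    # run to see which individuals drive the overall dOFV.
    import pandas as pd

    def iofv(path):
        phi = pd.read_csv(path, skiprows=1, sep=r"\s+")
        return phi.set_index("ID")["OBJ"]

    d_iofv = iofv("run1.phi") - iofv("run2.phi")
    print(d_iofv.sort_values().head(10))     # the most "driving" individuals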
Outlying observations
Do a sensitivity analysis for outliers (|CWRES| > 5); see the sketch after this list.
- Re-estimate model with the outliers removed
- Parameter estimates change -> Remove outliers
- Parameter estimates don’t change -> Keep outliers
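A minimal sketch of flagging candidate outliers from an output table, assuming a hypothetical sdtab file with ID, TIME, DV, and CWRES columns:

    # Flag observations with |CWRES| > 5 as candidates for the sensitivity run.
    import pandas as pd

    tab = pd.read_csv("sdtab1", skiprows=1, sep=r"\s+")
    outliers = tab[tab["CWRES"].abs() > 5]
    print(outliers[["ID", "TIME", "DV", "CWRES"]])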
Parameter uncertainty
- RSE% considered precise: < 30% for THETAs, < 50% for ETAs
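A minimal sketch of the RSE% computation (the numbers are made up):

    # RSE% = 100 * SE / |estimate|; compare against the rules of thumb above.
    def rse_pct(estimate, se):
        return 100 * se / abs(estimate)

    print(rse_pct(2.5, 0.4))   # 16 -> a "precise" THETA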
Don’t confuse parameter precision estimates with the quality of the model fit. If a model minimizes successfully, those are the best-fit parameters.
We have three different methods because different models require different methods.
- $COVARIANCE is the fastest, but it doesn’t work on all models, especially complex ones where the parameter uncertainty distribution is not normal.
- Default: the sandwich estimator (R⁻¹SR⁻¹)
MATRIX=S (use the inverse of the S matrix, the cross-product of gradients)
MATRIX=R (use the inverse of the R matrix, the Hessian)
- SIR (PsN) is faster than the bootstrap, and unlike $COVARIANCE it can also yield skewed (non-normal) uncertainty distributions.
- The bootstrap takes the longest; it can be strongly influenced by the sampling scheme employed and can inflate the confidence intervals of parameters that rely on a small number of subjects (see the sketch below).
There is no reason the three methods should match each other: they make different assumptions and use different approaches.
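A minimal sketch of the subject-level resampling underlying a bootstrap (each resampled dataset would then be re-estimated, e.g. via PsN; the file and column names are assumptions):

    # Nonparametric bootstrap at the subject level: sample IDs with replacement
    # and rebuild the dataset. Few subjects in a stratum -> unstable resamples,
    # which is how small-N parameters get inflated confidence intervals.
    # (Real implementations also renumber the duplicated IDs.)
    import pandas as pd

    data = pd.read_csv("data.csv")                 # hypothetical dataset
    ids = pd.Series(data["ID"].unique())
    draw = ids.sample(n=len(ids), replace=True, random_state=1)
    boot = pd.concat([data[data["ID"] == i] for i in draw], ignore_index=True)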
VPCs
The VPC’s central metric is the prediction of data percentiles. If you focus on the spread between e.g. the 5th and 95th percentiles of the simulated data, you have a prediction interval (PI), as Bill states. If you focus on an individual percentile, but consider the imprecision with which it is derived (often shown as a shaded area), then, like other metrics of imprecision, it is a confidence interval (CI).
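A minimal sketch of the PI/CI distinction, assuming sims is an array of simulated observations for one bin with shape (n_replicates, n_obs); the data here are made up:

    # Prediction interval: percentiles of the pooled simulated observations.
    # Confidence interval: spread of one percentile across simulation replicates.
    import numpy as np

    rng = np.random.default_rng(1)
    sims = rng.lognormal(mean=0.0, sigma=0.5, size=(500, 40))

    pi_90 = np.percentile(sims, [5, 95])            # 90% prediction interval
    p5_per_rep = np.percentile(sims, 5, axis=1)     # 5th percentile per replicate
    ci_p5 = np.percentile(p5_per_rep, [2.5, 97.5])  # 95% CI of the 5th percentile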
- Mixture prior (slide 31): informative prior + flat prior (“what if we are wrong”). When the prior and the data conflict, the mixture up-weights the data and downplays the historical information.
- Two-stage approach: Apply a model to each individual’s data
- Requires a lot of data per individual.
- OK mean parameters (i.e. the typical individual)
- But IIV is inflated (also, different individuals may end up with different models); see the sketch below
- NONMEM: Apply a model to all individuals’ data simultaneously
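A small numerical sketch of why the two-stage approach inflates IIV: the variance of per-individual estimates is the true between-subject variance plus estimation noise (all numbers are made up):

    # Two-stage: Var(individual estimates) = Var(true ETA) + Var(estimation error),
    # so the apparent IIV is inflated whenever the per-subject fits are noisy.
    import numpy as np

    rng = np.random.default_rng(0)
    true_eta = rng.normal(0.0, np.sqrt(0.1), size=1000)  # true IIV variance 0.1
    est_error = rng.normal(0.0, 0.2, size=1000)          # assumed fitting noise
    two_stage = true_eta + est_error                     # "per-individual" estimates

    print(np.var(true_eta), np.var(two_stage))           # ~0.10 vs ~0.14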