The treatment of uncertainty in STRmix™
John Buckleton
As a general philosophy, we have attempted to identify all major sources of uncertainty relating to the interpretation of profiles and the assignment of the likelihood ratio (LR) within STRmix™. Whenever judgement is exercised in a modelling decision, we favor the choice that tends to reduce the value of the LR, an approach we call conservative.
The table below lists, in approximately diminishing order, what we believe the major sources of uncertainty to be, together with the actions we have taken or that are available in STRmix™.
Source |
Description and/or Magnitude |
Action |
Relatedness |
The presence of a non-excluded close relative may reduce the LR by many orders of magnitude. The size of the effect depends on the profile and the relationship. |
STRmix™ offers the Unified LR option. Two parameters are input: the average number of children per family and the population size. The conservative direction is a higher average number of children and a smaller population size [1-7]. The unified LR is output in the STRmix™ report alongside the LRs for unrelated individuals, siblings, parent-child and many other relationships.
The unified LR is close to linear in the ratio of population size to average number of children. |
Number of contributors |
The assignment of an incorrect number of contributors has a variable effect on the LR, depending on whether the person of interest (POI) aligns with one of the major contributors or with the minor contributor. Assigning one too many contributors has little effect on the LR for the major contributors but lowers the LR for the smallest contributor. Assigning one too few contributors has little effect on the LR for the major contributors but leads to a false exclusion of the smallest contributor [8, 9]. |
STRmix™ V2.6 offers the user the option of inputting a range for the number of contributors [10]. V2.6 is validated only for a range of 1, for example 3 or 4 contributors. Allowing this range carries a very considerable run-time cost. For versions before V2.6 the uncertainty in the number of contributors has to be managed manually (for example, by running two options). This approach is taught in STRmix™ training courses. |
Population genetic model |
Three population genetic models have been in use in the US. The performance of all three has been assessed [11]. The option offered in STRmix™ is the most conservative of the three. We have also examined the effect of a breakdown in the assumptions of this model [12] and found it to be negligible. The model has been tested against a considerable body of real data [13-16]. |
STRmix™ offers the Balding and Nichols approach [3]. |
MCMC |
The Markov chain Monte Carlo (MCMC) process gives a different value for the point estimate every time the same interpretation is run. The level of variation depends on the profile itself, and no single general statement about it is adequate. |
STRmix™ offers an assessment of the variation likely from the MCMC process. The user may choose to output a one-sided or two-sided interval for any realistic α value. The interval is obtained by shrinking the correlated sample to the effective sample size (ESS). |
Value for the coancestry coefficient |
The value of the coancestry coefficient affects the LR in a direct but complex non-linear manner that is different for every case. |
STRmix™ offers the option either to input a point value, usually selected at the high end of the plausible range, or to draw multiple random values for this parameter from a beta distribution. Plausible parameters for the beta distribution have been developed from data [17] that would tend to emphasise divergence and hence high values. The user may choose to view a one-sided or two-sided probability interval for any reasonable α value. |
Allele frequency data |
Probability estimates from allele frequency data have sampling uncertainty. |
STRmix™ offers the highest posterior density (HPD) method for assessing this uncertainty. A Dirichlet prior is used, and two have been trialled over time: the k-dimensional Dirichlet D(1, …, 1) and D(1/k, …, 1/k), where k is under the control of the user but is recommended to be the number of allele classes. See [18] for a comparison of these and other options. The overall distribution of the LR from this source of variability is developed by numerical resampling. |
Value for the allele variance |
The value for the allele variance affects the likelihood used in the MCMC Metropolis-Hastings process. The effect of a change can be in either direction. We have undertaken sensitivity analyses changing these values on many occasions. |
STRmix™ does not fix the allele variance. It is optimised as part of the MCMC process with a gamma prior. The parameters of the gamma are set from empirical data but this simply sets the prior.
Kelly et al. [19] ran a large trial using four different sets of plausible parameters developed from four different laboratories and found negligible differences. |
Value for the stutter expectation and variances |
The values for the stutter expectation and variance affect the likelihood used in the MCMC Metropolis-Hastings process, but in a fairly minor way [20]. The effect of a change can be in either direction. Two models have been used over time and no discernible change in performance occurred. There is a known issue with stutter for the vWA 14 allele, which appears to be bimodal in expectation. |
STRmix™ does not fix the stutter variance. It is optimised as part of the MCMC process with a gamma prior. The parameters of the gamma are set from empirical data but this simply sets the prior.
Kelly et al. [19] ran a large trial using four different sets of plausible parameters developed from four different laboratories and found negligible differences. |
Degradation modelling |
Degradation affects the expected allele height. We have trialled two models: linear and exponential. Empirical data support the exponential model. The effect of the difference between the two is case dependent. |
STRmix™ offers an exponential degradation option [21]. The parameter of the exponent is optimised during the MCMC process. |
General (this is out of place if this is diminishing order) |
There are factors outside the probabilistic genotyping (PG) software that have a much larger effect. See Bright et al. [22]. |
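The actions listed for the population genetic model, the coancestry coefficient and the allele frequency data can be illustrated together for the simplest case, a single-source profile at one locus. The Python sketch below is not the STRmix™ implementation. It applies the Balding and Nichols formulae [3], draws allele frequencies from a Dirichlet posterior built on a D(1/k, …, 1/k) prior, draws the coancestry coefficient from a beta distribution, and reports a lower quantile of the resampled LRs. The allele counts, the beta parameters (2, 98), the use of a simple quantile in place of a highest posterior density interval, and all function names are assumptions made for illustration.

```python
import random


def bn_genotype_prob(freqs, a, b, theta):
    """Balding-Nichols probability that a second person has genotype (a, b),
    given that one person with that genotype has already been observed."""
    denom = (1 + theta) * (1 + 2 * theta)
    if a == b:  # homozygote
        p = freqs[a]
        return (2 * theta + (1 - theta) * p) * (3 * theta + (1 - theta) * p) / denom
    # heterozygote
    return 2 * (theta + (1 - theta) * freqs[a]) * (theta + (1 - theta) * freqs[b]) / denom


def sample_posterior_freqs(counts, rng):
    """One draw from the Dirichlet posterior with a D(1/k, ..., 1/k) prior,
    where k is the number of allele classes in the database."""
    k = len(counts)
    # A Dirichlet draw is a set of independent gamma draws, normalised to sum to 1.
    draws = {allele: rng.gammavariate(n + 1.0 / k, 1.0) for allele, n in counts.items()}
    total = sum(draws.values())
    return {allele: g / total for allele, g in draws.items()}


def resampled_lrs(counts, a, b, theta_alpha, theta_beta, n_draws=2000, seed=1):
    """Sorted single-source LRs (1 / genotype probability), resampling both the
    allele frequencies and the coancestry coefficient on every draw."""
    rng = random.Random(seed)
    lrs = []
    for _ in range(n_draws):
        freqs = sample_posterior_freqs(counts, rng)
        theta = rng.betavariate(theta_alpha, theta_beta)
        lrs.append(1.0 / bn_genotype_prob(freqs, a, b, theta))
    return sorted(lrs)


def lower_bound(sorted_lrs, alpha=0.01):
    """One-sided lower bound: the alpha quantile of the resampled LRs."""
    return sorted_lrs[int(alpha * len(sorted_lrs))]


# Hypothetical allele counts at one locus (illustration only, not real data).
counts = {"14": 30, "15": 50, "16": 80, "17": 40}
lrs = resampled_lrs(counts, "14", "17", theta_alpha=2.0, theta_beta=98.0)
print("lower 1% bound:", lower_bound(lrs))
print("median LR:", lrs[len(lrs) // 2])
```

Reporting the lower bound rather than the median is in keeping with the conservative philosophy described above, since for alleles that are not very common a higher coancestry coefficient gives a lower LR.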
References
[1] Balding D, Steele C. Weight-of-evidence for forensic DNA profiles. 2nd ed. Chichester: John Wiley and Sons; 2015.
[2] Balding DJ. Weight-of-evidence for forensic DNA profiles. Chichester: John Wiley and Sons; 2005.
[3] Balding DJ, Nichols RA. DNA profile match probability calculation: how to allow for population stratification, relatedness, database selection and single bands. Forensic Science International. 1994;64:125-40.
[4] Bright J-A, Curran JM, Buckleton JS. Relatedness calculations for linked loci incorporating subpopulation effects. Forensic Science International: Genetics. 2013;7:380-3.
[5] Buckleton J, Triggs C. Relatedness and DNA: are we taking it seriously enough? Forensic Science International. 2005;152:115-9.
[6] Buckleton J, Bright J-A, Taylor D. Forensic DNA evidence interpretation. 2nd ed. Florida, USA: CRC Press; 2016.
[7] Taylor D, Bright J-A, Buckleton J. Considering relatives when assessing the evidential strength of mixed DNA profiles. Forensic Science International: Genetics. 2014;13:259-63.
[8] Bright J-A, Richards R, Kruijver M, Kelly H, McGovern C, Magee A, et al. Internal validation of STRmix™ – A multi laboratory response to PCAST. Forensic Science International: Genetics. 2018;34:11-24.
[9] Bright J-A, Curran JM, Buckleton JS. The effect of the uncertainty in the number of contributors to mixed DNA profiles on profile interpretation. Forensic Science International: Genetics. 2014;12:208-14.
[10] Taylor D, Bright J-A, Buckleton J. Interpreting forensic DNA profiling evidence without specifying the number of contributors. Forensic Science International: Genetics. 2014;13:269-80.
[11] Curran JM, Buckleton JS, Triggs CM. What is the magnitude of the subpopulation effect? Forensic Science International. 2003;135:1-8.
[12] Buckleton J, Curran J, Walsh S. How reliable is the sub-population model in DNA testimony? Forensic Science International. 2006;157:144-8.
[13] Tvedebrink T, Eriksen PS, Curran JM, Mogensen HS, Morling N. Analysis of matches and partial-matches in a Danish STR data set. Forensic Science International: Genetics. 2012;6:387-92.
[14] Lauc G, Dzijan S, Marjanovic D, Walsh S, Curran J, Buckleton J. Empirical support for the reliability of DNA interpretation in Croatia. Forensic Science International: Genetics. 2008;3:50-3.
[15] Curran J, Walsh SJ, Buckleton JS. Empirical support for the reliability of DNA evidence interpretation in Australia and New Zealand. Australian Journal of Forensic Sciences. 2008;40:99-108.
[16] Curran JM, Walsh SJ, Buckleton JS. Empirical testing of estimated DNA frequencies. Forensic Science International: Genetics. 2007;1:267-72.
[17] Buckleton J, Curran J, Goudet J, Taylor D, Thiery A, Weir BS. Population-specific F_{ST} values for forensic STR markers: A worldwide survey. Forensic Science International: Genetics. 2016;23:91-100.
[18] Triggs CM, Curran JM. The sensitivity of the Bayesian HPD method to the choice of prior. Science & Justice. 2006;46:169-78.
[19] Kelly H, Bright J-A, Kruijver M, Cooper S, Taylor D, Duke K, et al. A sensitivity analysis to determine the robustness of STRmix™ with respect to laboratory calibration. Forensic Science International: Genetics. 2018;35:113-22.
[20] Bright J-A, Curran JM, Buckleton JS. Investigation into the performance of different models for predicting stutter. Forensic Science International: Genetics. 2013;7:422-7.
[21] Bright J-A, Taylor D, Curran JM, Buckleton JS. Degradation of forensic DNA profiles. Australian Journal of Forensic Sciences. 2013;45:445-9.
[22] Bright J-A, Stevenson KE, Curran JM, Buckleton JS. The variability in likelihood ratios due to different mechanisms. Forensic Science International: Genetics. 2015;14:187-90.