When is a ‘Bayesian’ not a ‘Bayesian’?

Several of the posts on this blog relate to the logical approach to evidence evaluation; aka, the coherent logical approach, or the likelihood-ratio (LR) approach. In my opinion, it is the best way to evaluate evidence for forensic purposes no matter what type of evidence is being discussed. I say “best” because it is simple, logically sound, and relatively straightforward to apply in forensic work. It helps to promote transparency through the application of a thorough and complete evaluation process (all points I have explained in other posts).

The reality is, however, that this approach is still not well understood by forensic practitioners, nor by members of the legal profession.

I hope that in time, and with education, that will change. Several workshops I have presented have been aimed at helping examiners understand what it really means, how it works, the philosophical basis behind the approach as well as the need for and benefit of doing things that particular way. It really does work to the benefit of both the examiner and their ultimate client, the court.

One recurring issue at these workshops relates to the very basic and fundamental concept of what the term “Bayesian” means. For various reasons, but mainly just misunderstanding, many people in the forensic document examination community hold the term “Bayesian” in negative regard. When the word ‘Bayes’, or any of its many derivations, comes up in conversation, eyes glaze over while heads sag ever so slightly. And those are the positive people in the crowd.

I find such reactions understandable, but unfortunate. The fact is that an understanding of the term is beneficial for anyone interested in how it might be applied in a forensic evidence context, whether or not one chooses to do so. Indeed, for me the answer to the question posed above (when is a Bayesian not a Bayesian?) lies in knowing how the overall Bayesian philosophy and theorem (or rule) differ from the more constrained and limited logical approach to evidence evaluation. These two are not the same or even close to equivalent.

Some Terminology and History

The key to the matter rests in terminology and the degree to which one chooses to limit the meaning of a term or word with the key word being “Bayesian”. Let’s start by considering what that word means and from where it comes. ‘Bayes’ refers, of course, to the Reverend Thomas Bayes (1701–1761) whose unpublished manuscript outlining how to use new evidence to update belief in a logically coherent manner (more formally, the study of inverse probability or reasoning backward from effect to cause) was read posthumously to the Royal Society of London for Improving Natural Knowledge by his colleague and friend, Richard Price. Bayes’ thesis remained relatively unknown until independently rediscovered and developed much further by Pierre-Simon Laplace.¹

Laplace published his own thoughts in 1774 and provided the modern formulation of the algorithm in his 1812 text entitled Théorie analytique des probabilités. The historical record makes it clear that Laplace did a lot of hard work to make these concepts a reality, something people could use and apply in the real world. It can be argued that Laplace deserves most of the credit for what we refer to as “Bayes’ Theorem” today. Nonetheless, the overall approach and the basic theorem are named after Bayes.

In one sense the word “Bayesian” can be used when referring to anything that applies or relates to “Bayes’ Theorem”, or the logical reasoning embodied in it. So let’s take a closer look at the Theorem and how it might be used in general.

When perusing the literature one will encounter many different formulations of Bayes’ Theorem, all of which are mathematically equivalent. For example, each of the following might be used depending upon the situation in which it is applied:
$$p(A|B)=\frac{p(B|A)p(A)}{p(B)} $$
or
$$p(A|B)=\frac{p(B|A)p(A)}{p(B|A)p(A)+p(B|A')p(A')} $$
or, even,
$$p(H_{i}|A)=\frac{p(A|H_{i})p(H_{i})}{p(A)} \text{; where } p(A)=\sum p(A|H_{i}) \cdot p(H_{i}) $$
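To see that these formulations really are equivalent, here is a minimal Python sketch that evaluates all three on the same inputs. The function names and the probability values are my own hypothetical choices for illustration; they do not come from any casework or dataset.

```python
import math


def posterior_basic(p_b_given_a, p_a, p_b):
    """First form: p(A|B) = p(B|A) p(A) / p(B)."""
    return p_b_given_a * p_a / p_b


def posterior_expanded(p_b_given_a, p_b_given_not_a, p_a):
    """Second form: p(B) is expanded over A and its complement A'."""
    p_not_a = 1.0 - p_a
    p_b = p_b_given_a * p_a + p_b_given_not_a * p_not_a
    return p_b_given_a * p_a / p_b


def posterior_multi(likelihoods, priors, i):
    """Third form: p(H_i|A), with p(A) summed over all hypotheses."""
    p_a = sum(lik * pri for lik, pri in zip(likelihoods, priors))
    return likelihoods[i] * priors[i] / p_a


# Hypothetical inputs, chosen only for illustration.
P_A = 0.3
P_B_GIVEN_A = 0.8
P_B_GIVEN_NOT_A = 0.1
P_B = P_B_GIVEN_A * P_A + P_B_GIVEN_NOT_A * (1.0 - P_A)

first = posterior_basic(P_B_GIVEN_A, P_A, P_B)
second = posterior_expanded(P_B_GIVEN_A, P_B_GIVEN_NOT_A, P_A)
third = posterior_multi([P_B_GIVEN_A, P_B_GIVEN_NOT_A], [P_A, 1.0 - P_A], 0)

# All three values agree up to floating-point rounding.
print(first, second, third)
print(math.isclose(first, second) and math.isclose(second, third))
```

The third function treats A and A' as a two-item list of hypotheses, which is why it reduces to the second form here.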

Our interest pertains to forensic science applications wherein the equation of greatest concern is the so-called “odds form” shown below.² Using formal mathematical symbols the odds form of the equation, with the ‘components’ labelled, looks like this:
$$\underbrace{\dfrac{p(H_{1}|E,\textit{I})}{p(H_{2}|E,\textit{I})}}_{\text{Posterior Odds}}= \underbrace{\dfrac{p(E|H_{1},\textit{I})}{p(E|H_{2},\textit{I})}}_{\text{Likelihood Ratio}} \cdot \underbrace{\dfrac{p(H_{1}|\textit{I})}{p(H_{2}|\textit{I} )}}_{\text{Prior Odds}} $$

This shows how the likelihood-ratio (LR) component serves as a multiplicative factor acting on the prior odds of the propositions to produce posterior odds of the propositions. All of these components appear in the form of ‘conditional’ probabilities that include a conditioning factor, I, which refers to omnipresent framework information. That factor is important and always exists, but it is often omitted when this equation appears in the literature, usually to simplify the equation for readability.
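For readers who wonder where the odds form comes from, a short derivation may help. It is simply the basic theorem written once for each proposition, keeping the framework information I in every term:
$$p(H_{1}|E,\textit{I})=\frac{p(E|H_{1},\textit{I})\,p(H_{1}|\textit{I})}{p(E|\textit{I})} \qquad \text{and} \qquad p(H_{2}|E,\textit{I})=\frac{p(E|H_{2},\textit{I})\,p(H_{2}|\textit{I})}{p(E|\textit{I})} $$
Dividing the first expression by the second, the common denominator cancels:
$$\frac{p(H_{1}|E,\textit{I})}{p(H_{2}|E,\textit{I})}=\frac{p(E|H_{1},\textit{I})}{p(E|H_{2},\textit{I})} \cdot \frac{p(H_{1}|\textit{I})}{p(H_{2}|\textit{I})} $$
The term that disappears, the overall probability of the evidence, is the one that would otherwise require a sum over every possible proposition. That is one reason the odds form is convenient when only two competing propositions are being compared.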

Much of the time people who talk about a “Bayesian approach” are referring to the full theorem. The theorem, as shown above, is very useful because it represents a complete and logically coherent decision-making process or, perhaps more accurately, a belief-updating process. The theorem tells us how belief about a set of propositions (encoded as prior odds) can be updated to some new belief (posterior odds) when new information is added to the mix (via the likelihood-ratio). The theorem invokes probability to address issues relating to uncertainty and conditionality. This might be used by any individual who is making a decision or when developing AI software that mimics human decision-making (assuming, of course, that people actually think this way — an assertion rarely shown to be true). Or for any number of other purposes along those lines.
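As a rough illustration of that belief-updating step, the sketch below applies the odds form once. The function names and the numbers are hypothetical, and the conversion between odds and probability assumes the two propositions are treated as exhaustive (an assumption that does not always hold in forensic work, as discussed in the footnotes).

```python
def probability_to_odds(p):
    """Convert a probability p (0 <= p < 1) to odds in favour."""
    return p / (1.0 - p)


def odds_to_probability(odds):
    """Convert odds in favour back to a probability.

    Assumption: the two propositions are treated as exhaustive.
    """
    return odds / (1.0 + odds)


def update_odds(prior_odds, likelihood_ratio):
    """Odds form of the theorem: posterior odds = LR * prior odds."""
    return likelihood_ratio * prior_odds


# Hypothetical prior belief of 0.2 in H1, i.e. prior odds of 1 to 4.
prior_odds = probability_to_odds(0.2)

# Hypothetical likelihood ratio of 20 for the new evidence.
posterior_odds = update_odds(prior_odds, likelihood_ratio=20.0)
posterior_probability = odds_to_probability(posterior_odds)

print(prior_odds, posterior_odds, posterior_probability)
```

With these made-up inputs the odds move from 1 to 4 against H1 to 5 to 1 in favour of it. The evidence did not change; the result depends entirely on where the decision-maker started.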

Roles in the Reasoning Process

Perhaps the greatest value of the theorem lies in how it assists us in clarifying the roles played in a decision-making process when different parties are involved in that process. That is, after all, the essence of the ‘trial’ process in our justice system.

Let’s look at the theorem to see how it relates to the judicial decision-making process. Please note that I am not suggesting such decision-making actually conforms to this approach — only that we can explore the trial process using the theorem as a logically sound guide or method.

Most people agree that, in a judicial setting, it is the trier-of-fact who makes decisions about what did or did not happen in a contested matter. Put simply, they are the party responsible for making or formulating any ‘ultimate decision’ about what is true or not true in a given scenario or situation. Thus, it is their belief that matters when it comes to actual decision-making. They must base that ultimate decision on information provided to them by witnesses — who speak either to fact (lay witnesses) or to opinion (experts). It should be noted that they also base it on their own personal beliefs, biases and approach to reasoning (an aspect of the situation that is often overlooked in these discussions, even though it is always present).

The whole process is guided by the judge who is assisted by counsel representing different sides in the matter.³ The role of counsel is essentially to debate the matter before the court. The nature of that debate differs in adversarial systems (where two counsel present arguments to an independent trier) versus inquisitorial systems (where the trier(s) interrogates witnesses directly to explore various potential explanations and counsel help their client understand legal issues while assisting the court in its explorations). But in either system the trier is given information relevant to the matter at hand which they use when forming a final belief about the outcome of the trial. In adversarial systems the onus is on the prosecution to present a theory, hypothesis, or explanation for the evidence — one that meets the legal requirements of the charge being made. That position will be rebutted by the defence in some manner. Sometimes they propose an alternative theory about what happened or they may argue that the evidence is inadequate or flawed.⁴

In this context, each position or argument can be considered to be a competing ‘proposition’.⁵

Evidence has relevance, value and meaning to the court only if it can help differentiate between the competing propositions by making one or the other more likely than it would be in the absence of that evidence.

A critical concept to understand is that it is the ‘belief’ that exists in the mind of the trier(s) that matters in the end; belief about the competing propositions. The trier’s belief will vary through the course of a hearing as information is presented and added to the mix. That information, in the form of evidence, will differentially support one or the other proposition. At any given point in the proceedings and in terms of the theorem, the internal (and unknown) belief of the trier forms the prior odds of the propositions being modified as new information (evidence) is presented by the next witness, or as counsel present their arguments and thoughts about the matter.
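The way belief shifts over the course of a hearing can be sketched as repeated application of the odds form, with each posterior becoming the prior for the next item of evidence. This is only a model, not a claim about how triers actually reason. The starting odds and the likelihood ratios below are hypothetical, and the simple multiplication assumes each LR has been assessed conditional on the evidence that came before it.

```python
def update_sequentially(prior_odds, likelihood_ratios):
    """Return the odds after each item of evidence, starting with the prior.

    Assumption: each likelihood ratio is conditional on the evidence
    already heard, so the values can be multiplied one after another.
    """
    history = [prior_odds]
    odds = prior_odds
    for lr in likelihood_ratios:
        odds = lr * odds          # this posterior is the next prior
        history.append(odds)
    return history


# Hypothetical hearing: prior odds of 1 to 10, then three witnesses.
# An LR above 1 supports H1, an LR below 1 supports H2.
belief = update_sequentially(0.1, [10.0, 0.5, 8.0])

for step, odds in enumerate(belief):
    print(f"after {step} item(s) of evidence: odds = {odds:g}")
```

Note that the second hypothetical witness moves belief back toward the alternative proposition, which is exactly the differential support described above.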

Thus, a “Bayesian approach” might be helpful as a guide to decision-making in a court of law. I say ‘might’ because the trier cannot be constrained in how they reach their decision (other than in terms of relevant legal constraints). The reality is that most people are not “Bayesian” in the way they naturally reason.⁶

In my opinion it would be improper for any examiner to even suggest that a court try to apply Bayes’ Theorem in their deliberations. At the same time, one can only hope that any judicial process will be fair, balanced and logically sound whether done in accordance with this theorem, or using some other approach. The theorem is simply a good way to achieve those goals. That said, I know that is an unrealistic expectation.

In general, juries are instructed to evaluate the evidence “in a rational way” or something similar after which they are left to their own devices.⁷

But the fact that the trier may or may not apply this approach should not change the way in which the examiner’s evidence is provided. It is important to ensure, as much as possible, that testimony be given in a way that is clear and transparent, balanced, logical and robust, so that the meaning of the evidence is clear to the trier; that it be neither over- nor under-valued; that it not be double-counted; and that it be presented in an unbiased and fair manner. The same requirements apply no matter how the trier ultimately uses the information they are given.

The Examiner and the Likelihood-ratio

Having explored the roles of the trier and counsel let’s turn to the forensic examiner who, in a court of law, serves as an expert witness. An expert witness has a clear responsibility to provide the trier with relevant information pertaining to the evidence that falls within a limited and well-defined domain or scope. They do not, as a rule, have access to information about the case other than that which falls in their domain. And they certainly will never have an understanding of the state of belief in the mind of the trier(s); i.e., the actual prior odds. As a result, if they are reasoning logically in some manner that conforms to Bayes’ Theorem, they cannot legitimately express an opinion that speaks to the propositions directly; i.e., the resulting posterior odds. If that is their goal, the best they might do is try to apply some estimation of prior odds, e.g., using relevant base-rate information, or work with some standard form of prior odds, e.g., 1:1 or equal priors. But those approaches are rarely, if ever, warranted and always represent a poor approximation of reality.⁸
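A small numerical sketch shows why the missing prior odds matter so much. The same likelihood ratio is combined with three different sets of prior odds; every value here is hypothetical, including the labels I have attached to the imagined triers.

```python
def posterior_odds(prior_odds, likelihood_ratio):
    """Odds form of the theorem: posterior odds = LR * prior odds."""
    return likelihood_ratio * prior_odds


# Hypothetical likelihood ratio reported by the examiner.
LR = 100.0

# Hypothetical prior odds; the examiner cannot know which (if any) applies.
hypothetical_priors = {
    "sceptical trier (1 to 1000)": 1.0 / 1000.0,
    "equal priors (1 to 1)": 1.0,
    "already persuaded (10 to 1)": 10.0,
}

results = {
    label: posterior_odds(prior, LR)
    for label, prior in hypothetical_priors.items()
}

for label, value in results.items():
    print(f"{label}: posterior odds = {value:g}")
```

The posterior odds range from 1 to 10 against the first proposition to 1000 to 1 in favour of it, even though the evidence and its LR are identical in each case. An examiner who states a posterior has silently picked one of these rows.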

This may seem like a serious impediment, but it is not.

Expert testimony can and should be limited to expressing some form of the “likelihood-ratio component” of the reasoning process. In other words, experts should speak to the evidence only, and not the propositions. More specifically, they should address the probability of observing the evidence under each of the competing propositions (and in the relevant context, or framework). Alternatively, and equivalently, they may speak to the support provided by the evidence for each of the competing propositions. That information is, literally, what the trier needs from the examiner, and it is precisely what the expert should be providing to the court. It is worth noting that examiners routinely evaluate this type of information in the course of their examination, though most do so unwittingly. It is equally important to note that the concept of the likelihood-ratio is not solely a part of Bayes’ Theorem. It exists as an independent concept and functions perfectly well without invoking Bayes’ Theorem at all.⁹
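In its simplest numeric form the examiner's contribution looks like the sketch below: two probabilities of the evidence, one under each proposition, and their ratio. The probabilities are hypothetical personal assignments made up for illustration, and the function names are mine. Notice that no prior odds appear anywhere.

```python
import math


def likelihood_ratio(p_e_given_h1, p_e_given_h2):
    """LR = p(E|H1, I) / p(E|H2, I); the framework I is implicit."""
    if not 0.0 <= p_e_given_h1 <= 1.0:
        raise ValueError("p(E|H1) must lie between 0 and 1")
    if not 0.0 < p_e_given_h2 <= 1.0:
        raise ValueError("p(E|H2) must be greater than 0 and at most 1")
    return p_e_given_h1 / p_e_given_h2


def direction_of_support(lr):
    """Report which proposition the evidence supports, if either."""
    if lr > 1.0:
        return "H1"
    if lr < 1.0:
        return "H2"
    return "neither"


# Hypothetical assignments: the observations are judged quite probable
# if H1 is true and improbable if H2 is true.
lr = likelihood_ratio(0.8, 0.02)
log10_lr = math.log10(lr)  # a log scale is sometimes easier to compare

print(f"LR = {lr:.1f}, log10(LR) = {log10_lr:.2f}, "
      f"supports {direction_of_support(lr)}")
```

The output speaks only to the evidence given the propositions. What the trier does with it, and from what starting point, is left to the trier.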

At any rate, all of this is a normal aspect of an examiner’s regular evaluation process. However, at present, most examiners are taught to express their beliefs in terms of the propositions and, as a result, they end up “going too far” by extending the result beyond what is warranted.¹⁰ Sadly, the somewhat flawed nature of that approach is unknown to most examiners, or to many other parties in the legal system.

Simply restricting one’s evaluation to the likelihood-ratio eliminates concerns relating to the formulation or use of either prior or posterior odds of the propositions. Those can safely be left with the trier to consider and evaluate however they choose to do so.

When the “Bayesian Approach” first appeared in forensic science literature it was presented and discussed in terms of the full Bayes’ Theorem; hence, the approach was referred to as being “Bayesian” in nature. This was the case even though the authors did not intend that the theorem be applied in its entirety by an expert. Robertson and Vignaux recognized the confusion in the situation and suggested the use of an alternative term, the “likelihood-ratio approach”, to put the focus on the critical aspect of the evaluation process. Unfortunately, the term “likelihood-ratio” is problematic in its own way. It has a specific meaning (at least for mathematicians and statisticians), and the approach I’m describing does not conform perfectly with that meaning. For example, in order to express a ‘true’ likelihood-ratio one requires numeric data, ideally data that are empirically derived.

That particular point isn’t a huge problem since it is perfectly legitimate to work with subjective and personal probabilities (which is ultimately the basis of the present approach used by examiners) obtained through elicitation (which is generally not done), rather than probabilities derived through empirical research or study.¹¹

Although I recommend that examiners express their belief by assigning numeric values to those probabilities, I also feel that forcing the issue would be unduly restrictive. I think of the theorem primarily as a model for logical reasoning, whether or not it is driven by explicit numeric data. In that context, the term “likelihood-ratio” is used simply to refer to the updating element of the theorem; the information that alters prior belief to form posterior belief. In any event, some take exception to the phrase “likelihood-ratio approach”, believing it to be a poor description of what is being done.

Various authors prefer to use the “logical approach” for evidence evaluation.¹² I like that terminology because it focuses on the key aspect of the situation. This is a system of logical reasoning that incorporates conditional probabilities to deal with uncertainty; a system that is both logically coherent and logically sound. Hence, I personally prefer to use the term “the logical approach” in discussions though I confess to slipping and using “the likelihood-ratio approach” all too often as well.

Being a practising forensic document examiner I am also very aware that most examiners truly believe that they are already being completely logical in what they do now. While they are undoubtedly making every effort to that end, what they generally do not realise is that their reasoning can be a bit flawed and incomplete, lacking coherence in its application.¹³

To make this point I often add the word “coherent” resulting in the term, the “coherent logical approach for evidence evaluation”, even though the wording is redundant.

Summary Thoughts

Now, let us re-consider the original question “When is a Bayesian not a Bayesian?” As I stated at the beginning of this essay, the answer lies in the distinction between applying the theorem in its full form and restricting oneself to evaluation and expression of only the likelihood-ratio component from the theorem. While the “coherent logical approach” (aka, “logical approach” or “likelihood-ratio (LR) approach”) is fundamentally ‘Bayesian’ at its core, being derived from that Theorem, it does not apply the full Theorem, per se.

An examiner can quite easily evaluate and discuss the likelihood-ratio without concern for the other elements found in the theorem. Hence, someone using this approach is not a ‘Bayesian’ in the sense of applying the theorem in full. At the same time there can be little doubt that someone using the logical approach should understand what “Bayes’ Theorem” means and how it works. And at the most fundamental level they would likely endorse this approach to logic and reasoning for any decision-maker, or even just for the updating of belief. Such people understand the correct and proper role to be played by an expert: the evaluation of the evidence given the propositions, not the propositions given the evidence. And, as is required in most legal systems, they are happy to leave the decision making to the party with that responsibility, the trier-of-fact.

So, when is a Bayesian not a Bayesian? When that person is someone like myself. I do not refer to myself as being a “Bayesian” (though I know that others do) because I make no attempt to assess all of the components in the Theorem, nor do I apply the Theorem in my work. I believe very strongly that forensic experts should focus on evaluating and explaining the likelihood-ratio when they try to provide the court with information about the value of their evidence in terms of the matter at hand.¹⁴ The only time I mention Bayes Theorem is to explain how and why I think the LR approach is most appropriate for our work. That particular discussion is generally reserved for colleagues interested in how this approach works (and those reading this blog). Thus, to me, a Bayesian is not a Bayesian when they apply the (coherent) logical approach to evidence evaluation and report the results accordingly.

Footnotes

¹ Laplace worked independently and apparently without knowledge of the early work done by Bayes. For more information see Pierre Simon Laplace on Probability and Statistics, courtesy of Richard J. Pulskamp.
² In general, ‘odds’ are a ratio of probabilities for two competing conditional events or propositions that comprise an exhaustive and mutually-exclusive set of possibilities. Thus, to a mathematical purist the formulation in a forensic context may not be truly “odds” insofar as the competing proposition set is not always exhaustive in nature. This aspect is discussed elsewhere and, while it might be ‘nice’ to have an exhaustive set, the only requirement for competing propositions is that they be mutually exclusive. Nonetheless, the basic logic and reasoning remain sound and applicable whether or not the proposition set is exhaustive and complete.
³ Of course, there are many instances where no separate jury is involved and the Court must serve both roles.
⁴ Of course, there is no requirement in general that the defence provide a clear counter-argument. The onus is on the prosecution to ‘prove’ their case and the defence need only create “reasonable doubt” in the mind of the trier. But, even in instances where no literal counter-argument is made, there will be at least one implicit ‘negative’ alternative side to the matter. It should be noted that for the purposes of evaluating evidence simply negating the main proposition to produce an alternative is not the best approach. Nonetheless, it can be used if a clear alternative is not provided. In addition, these comments are made in the context of a criminal trial. In terms of expert evidence and its presentation, civil disputes are similar, with the key difference being that the trier must apply a different standard of proof when making their final decision. But that element functions at the level of the trier and not the individual witness.
⁵ I have written about the nature of propositions and their critical importance in the evaluation process in another post that you can read here.
⁶ There is research that suggests people do not function in accordance with Bayes’ Theorem when making decisions on their own. In other words, one might say that people are not inherently Bayesian in nature. That’s fair enough but it is not a valid argument in favour of having an expert function in an illogical manner (as is often the case in our more traditional approach). An argument is sometimes made that an expert should not try to express their findings in the form of a likelihood-ratio simply because jurors do not evaluate evidence in accordance with the overall theorem. However, whether jurors do or do not conform to the logic inherent in the theorem (and it really isn’t clear that they do) is irrelevant. What really matters is trying to ensure that the forensic expert does their job, which is to evaluate the evidence in terms of the propositions, taking into account other relevant information. Expressing an opinion in the form of a likelihood-ratio makes perfect sense as it directly reflects the evaluation process itself.
⁷ Jury instructions vary a lot, but there are model instructions available from bodies like the National Judicial Institute of the Canadian Judicial Council. Such instructions often direct jurors to, for example, “consider the evidence and make your decision without sympathy, prejudice or fear. You must not be influenced by public opinion. Your duty as jurors is to assess the evidence impartially.”, “It is your duty to consult with one another and to try to reach a just verdict according to the law…”, “Approach your duties in a rational way…”, “To make your decision consider carefully, and with an open mind, all the evidence presented during the trial.”, or simply “use your collective common sense” (emphases added).
⁸ The adoption of equal prior odds is not an appropriate choice in most situations. I discussed this topic at length in another blog post you can read here.
⁹ See DH Kaye, Likelihoodism, Bayesianism, and a Pair of Shoes. Jurimetrics 2012 for further discussion.
¹⁰ “Going too far” in this context means transposing the conditional without directly addressing the issue of prior odds. Additionally, some conclusion scales incorporate ‘definite’ conclusions of identity or elimination which extend things even further by applying some personal threshold relating to acceptable error or uncertainty in the matter. Both of these are very problematic issues which can be easily avoided if the examiner restricts their evaluation and opinion to the probability of the evidence given the propositions.
¹¹ For a very good discussion of probability and belief see Dennis V. Lindley’s Understanding Uncertainty (Wiley-Interscience, 2007). In a fairly short text Prof. Lindley explains in a very accessible way that all belief is probabilistic in nature and that all probability is conditional, subjective and personal. These points are all critical to understanding how information is interpreted and communicated.
¹² I don’t know who it was that first coined this phrase, but it may have been Franco Taroni or Alex Biedermann from the Université de Lausanne. Alternatively, it may go back further to Robertson and Vignaux’s 1995 text, Interpreting Evidence, John Wiley & Sons, Chichester.
¹³ “Coherence” here means both having systematic and logical consistency and reasoning that is in accordance with the three rules of probability (see Lindley).
¹⁴ As you might have guessed, the best descriptor for someone like me would be “likelihood-ist” or something along those lines.

R. B. Ostrum, FDE
