Like many document examiners I consider Huber and Headrick’s 1999 textbook, Handwriting Identification: Facts and Fundamentals, to be a seminal work.21
In my opinion, it is the best textbook written to date on the topic of handwriting identification. The authors provide a comprehensive overview as well as some less conventional perspectives on select concepts and topics. In general, I tend to agree with their position on many things. A bit of disclosure is needed here: I was trained in the RCMP laboratory system, the same system in which Huber and Headrick were senior examiners and very influential. Hence, I tend to be biased towards their point of view.
That does not, however, mean that I think their textbook is perfect. While it is well written and manages to present a plethora of topics in reasonable depth, some parts are incomplete or misleading, particularly when we take into account developments that have happened since it was written.
One area of particular interest to me relates to the evaluation of evidence; specifically evaluation done using a coherent logical (or likelihood-ratio) approach.22 I have posted elsewhere on the topic so I’m not going to re-hash the background or details any more than necessary.
This post will look at the topic of ‘Bayesian concepts’ as discussed by Huber and Headrick in their textbook. These concepts fall under the general topic of statistical inference found in their Chapter 4, “The Premises for the Identification of Handwriting”. The sub-section of interest is #21 where the authors attempt to answer the question, “What Part Does Statistical Inference Play in the Identification Process?” Much of their answer in that sub-section relates to Bayesian philosophy, in general, and the application of the logical approach to evidence evaluation. However, while they introduce some things reasonably well, the discussion is ultimately very flawed and very much in need of correction. Or, at least, clarification.
Before delving into specifics from the text it is important to review the concept of inference. The overall evaluation process used by forensic examiners clearly involves a number of inferential processes. There are three formal types of logical inference: deduction, induction, and abduction. Any of these types of reasoning may be used by forensic experts but, when dealing with unknown and uncertain material in casework, the most common form is inductive reasoning. The authors clearly agree since they wrote “The argument for the identification of a handwriting is an inductive argument.” and “Deduction is a matter of recognizing valid logical forms, but induction is a matter of weighing evidence.”23
Inference, in general, is the act or process of deriving a logical conclusion from premises known or assumed to be true. The term “statistical inference” used by the authors is a bit more specific in that it refers to the use of statistics/mathematics to draw conclusions in the presence of uncertainty; that is, the use of quantitative or qualitative (categorical) data to inform the process.
I like the authors’ explanation of how ‘statistics’, in general, may enhance the reasoning process: “When one brings common sense to bear upon a problem, a mixture of experience and intuition is used. Inferential statistics employ a similar process, substituting data for experience and formula for intuition. Hence, in practise, statistical methods require us to do, in a more formal and rigorous way, the things that are done, informally, countless times each day.” This is true… to a degree. It is not so much the use of statistics that leads to a more formal and rigorous method, but rather the application of probabilistic logic appropriate to the problem at hand.
Indeed, my first concern with the authors’ presentation relates to the explicit invocation of statistics as a necessary part of the inferential process, with an implicit emphasis on ‘hard’ data. I suspect the idea that statistics are necessary came from their realization that our work is probabilistic in nature. It is true that probability is unavoidable and, indeed, necessary to do our work properly; probability is the mechanism by which we deal with the uncertainty omnipresent in everything we do. And it is true that empirically-derived statistics can be very useful, even beneficial, when dealing with probability issues. However, there is no actual requirement for either numbers or statistics in order to use proper probabilistic logic. There is much benefit to be derived from a logical approach to the evaluation of evidence (for FDE work or any other forensic discipline).
The key point to remember is that proper inferential reasoning must follow a logically correct framework that incorporates appropriate probabilistic concepts. A good model for this is one based on Bayes Theorem which describes a process of the updating of belief that moves from some prior state of understanding to a new state of understanding via the incorporation of new information in the form of a likelihood-ratio (LR). In my opinion, it is best to think about this as a system of logic that incorporates probabilistic reasoning to address the omnipresent element of uncertainty, rather than as a mathematical or statistical formulation.24
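For readers who like to see the structure laid out, the odds form of the theorem (standard notation, not anything specific to Huber and Headrick) makes the pieces and their roles explicit:

$$
\underbrace{\frac{\Pr(H_1 \mid E)}{\Pr(H_2 \mid E)}}_{\text{posterior odds}} \;=\; \underbrace{\frac{\Pr(E \mid H_1)}{\Pr(E \mid H_2)}}_{\text{likelihood ratio (LR)}} \;\times\; \underbrace{\frac{\Pr(H_1)}{\Pr(H_2)}}_{\text{prior odds}}
$$

Here $H_{1}$ and $H_{2}$ are the competing propositions and $E$ is the evidence, with everything implicitly conditioned on the relevant background information.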
Trials and other legal proceedings are formal systems that involve the updating of belief; specifically, the belief held by the trier-of-fact. As such, these fora usually involve multiple parties. Beyond providing a useful framework for reasoning in general, Bayes theorem also helps us by clarifying the roles of different parties when several are involved in the decision-making process. This is the key to proper application of the ‘theorem’ in any forensic domain because, in the end, that is the nature of every judicial hearing.
Probabilistic reasoning involves a number of concepts like conditionality and the subjective nature of all knowledge and information. While ‘hard’ (i.e., empirically-derived) statistics can be very useful to inform our evaluation, they are not required in any strict or absolute sense. Personal knowledge (i.e., that of the examiner) is perfectly valid for this purpose and the theorem accommodates any type or form of information.25
Another problem with the authors’ presentation stems from confusion between Bayes Theorem and the likelihood-ratio.26 The latter, while it is an element in the theorem, stands alone and can be evaluated by examiners without ever worrying about application of the full theorem. This is important because Bayes Theorem in its full form is not generally useful to us in practice or casework. Many people have difficulty understanding this but it is critical to proper reasoning. The theorem certainly has tremendous value in showing the manner in which belief can, and perhaps should, be updated through the addition of relevant information, in accordance with formal probabilistic logic. As noted above, it also has value in showing how such things may happen in the context of a trial or court. Finally, in doing so, the theorem helps to clarify the roles and responsibilities of each party involved in an investigation or trial.
But the fact of the matter is that an examiner is only capable of assessing a very discrete and limited aspect of the theorem — specifically, the likelihood-ratio or something equivalent to it. The LR reflects the strength of the evidence in terms of at least two competing propositions. In the numeric application of the theorem, it acts as a multiplicative factor that is applied to the prior odds (prior belief about the propositions) to generate some new posterior odds (posterior belief about the propositions). In the non-numeric sense, the decision-maker must apply the LR to their personal belief about the situation, thus leading to a new belief. The cognitive process(es) involved in doing this are unclear but, ideally, they would conform to the logical requirement and be essentially ‘multiplicative’ in nature.27
Issues relating to prior odds generally fall outside the domain of the expert. This idea should not bother examiners. After all, most will say they don’t need to know about other evidence or information relating to the case (which is the main basis for prior odds in this context). Granted, there will be times when the expert has knowledge or information that relates to the prior odds in some specific way — for example, background base-rate information about the frequency with which some type of evidence may occur in a population. But even in those instances the expert should still generally refrain from trying to incorporate that information into their own evaluation process.
In my opinion, it is better in those situations to inform the trier-of-fact of that information (which generally relates to background frequency data) so that they can adjust their own priors before being given the information specific to the likelihood-ratio; that is, the expert’s opinion about the value of the evidence in terms of the propositions. This approach serves two purposes: it limits the examiner’s testimony to the evidence and what it means in terms of the propositions, and it helps avoid the possibility of double-counting evidence, once by the examiner and again by the trier.
It is important to remember which party is the actual decision-maker in a trial. It is not the expert; it is the trier-of-fact, be that a judge or jury. Alternatively, at the investigative level of proceedings it is the investigator or a lawyer. The role of the expert is solely to evaluate the evidence and then explain the meaning and significance of that evidence in terms relating to the propositions of interest.
The authors acknowledged the rather long history of the logical approach when they wrote:
A number of papers have been written in recent years relative to the application of statistics, particularly the Bayesian theorem, to handwriting examination, few of which have been very helpful to writing examiners. The subject is not that new, however. An excellent account of earlier attempts to apply the theorem to writing cases, of which Bertillon’s was one of the first (1898), will be found in the study by Taroni, Champod, and Margot (1997). It is not our intention to devote excessive time and space to a review of the Bayes theory, other than to say that it allows the examiner or the triers of facts to take into account the relevant population of persons that circumstances and other evidence circumscribes in some practical fashion as encompassing the potential authors of a questioned writing. Thus, if a finding is not definitive, i.e., it is a qualified opinion, these other factors may provide sufficient information to render the finding more definitive than it is.
These are reasonably fair comments but they don’t go far enough. In the ‘early days’ of forensic science, when there was no such science, per se, practitioners came from other domains and the earliest writings show that many, if not all, used some form of this approach. At some point very early on, there was a shift away from this approach for forensic science; at least, in much of the world including North America, the UK, Canada, Australia and so on.
The authors’ comment that “[Bayes Theorem] allows the examiner or the triers of facts to take into account the relevant population of persons that circumstances and other evidence circumscribes in some practical fashion as encompassing the potential authors of a questioned writing” is very confusing. In a certain sense the statement is true but what the authors do not say is that any evaluation, whether it uses Bayes Theorem or something else, should be doing this. This is, therefore, not unique to Bayes Theorem. The theorem simply helps to make it clear that such information is a necessary element of any evaluation based on proper logic. Of course, it also makes clear how this information comes into play in the process.
Their comment, “if a finding is not definitive, i.e., it is a qualified opinion, these other factors may provide sufficient information to render the finding more definitive than it is” is incorrect in two respects. First, ‘definitive’ findings are nonsensical in a Bayesian framework or approach so that is a moot point.28
Indeed, all conclusions are qualified and conditional by their very nature. So the evaluative process is not at all aimed at rendering the findings “more definitive” as they suggest — rather it is aimed at ensuring that proper and justified weight is given to the evidence in terms of its support for one or the other competing proposition. As such, the approach has value for any and all evaluations. It is true, however, that the approach may be more helpful when evidence has limitations making the evaluation difficult or challenging.
In discussing the likelihood-ratio the authors reference Souder’s discussion of it from 1935 as follows:
Souder was one of the few and perhaps the first handwriting examiner to employ the likelihood ratio, that is a progeny of the Bayes theory, to assess the evidence in the identification of writing:
“…in handwriting we do not have to push the tests until we get a fraction represented by unity divided by the population of the world. Obviously, the denominator can always be reduced to those who can write and further to those having the capacity to produce the work in question. In a special case, it may be possible to prove that one of three individuals must have produced the document. Our report, even though it shows a mathematical probability of only one in one hundred, would then irresistibly establish the conclusion.”
The characterization of the likelihood-ratio as “a progeny of the Bayes theory” is wrong; the LR is not an ‘off-spring’ or ‘derivation’ of the theory. It is simply one of several components in the odds-form of Bayes theorem. While it may be considered a fundamental part of the theorem, it is not a ‘progeny’ of the theory by any means.
In the quoted passage Souder discusses the concept of the relevant population defined under the alternative proposition (i.e., relating to the denominator of the LR). It is noted that we do not need to address the “population of the world”, as is often assumed. But there is a suggestion that “it may be possible” to reduce that population (the example being to “one of three individuals”), a situation where the evidence “would then irresistibly establish the conclusion”.
The issue I have with this is subtle, but very significant. The evaluation process is not one where the examiner ‘tries’ in any way to reduce the population to make the evidence more meaningful. The relevant population is determined by the specifics that apply under the alternative proposition; as such, it is ‘pre-set’ and cannot be changed by the examiner unless the proposition is modified in some manner or other. The size of the relevant population of potential writers may be very small, moderate, huge or whatever. It is whatever it is. Now, it is true that the smaller the relevant population might be, the more likely it is that the evidence will have value in differentiating between the propositions, BUT that is coincidental. What matters for the evaluation is the probability of observing the evidence if the alternative proposition is true (e.g., if someone from the population of potential writers, whatever size that population might be, and not the suspect, wrote the questioned signature). If that probability is low, there is little support for the alternative proposition; if high, there is strong support for it. I will expand on this later when I talk more about the ‘relevant population’ for any given alternative proposition.
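One way to see why the size of that population matters only indirectly is to write the denominator out in standard probability notation (my formulation, not the authors’): it is the probability of the evidence averaged over whoever falls within the relevant population, however many or few that may be.

$$
\Pr(E \mid H_2) \;=\; \sum_{w \in \mathcal{P}} \Pr(E \mid w \text{ wrote it}) \, \Pr(w \text{ wrote it} \mid H_2)
$$

where $\mathcal{P}$ is the relevant population of potential writers defined by the alternative proposition.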
This makes the final sentence “Our report, even though it shows a mathematical probability of only one in one hundred, would then irresistibly establish the conclusion” a huge concern. This is a very familiar example of a ‘limited population’; in this case, only 3 potential writers/suspects. Is it reasonable to say that a denominator value of 1/100 would ‘irresistibly establish the conclusion’ of identity?
Without delving deeply into details but assuming unity in the numerator, a denominator value of 1/100 produces an LR value of 100 (1 over 1/100). What this means is that the evidence provides a 100x increase in support for the proposition outlined in the numerator over the one in the denominator. That is a substantial increase in support, particularly when applied to prior odds of 1:2 (one suspect out of three possible writers). However, even in that restricted case scenario, I don’t think most people would consider such an outcome to be definitive or conclusive.
It certainly improves the odds in favour of the main proposition over the alternative; indeed, the trier’s belief about the matter (ignoring anything else they might know) should change from 1:2 to 50:1.29 Yes, those are much better odds but are they sufficient to be deemed “conclusive”? Personally speaking, I don’t think so.
That is especially true when you realise the trier may have some reason, unknown to the examiner, to strongly favour the alternative in the first place. Say, for example, their prior was not 1:2 but, based on other information the trier was given, it was closer to 1:100. Now the evidence simply serves to counter-balance that other information because the posterior odds become 1:1, completely indeterminate.
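To make that arithmetic concrete, here is a minimal sketch of the odds-form update using the same hypothetical numbers as above (the function is mine, purely for illustration):

```python
def posterior_odds(prior_odds: float, likelihood_ratio: float) -> float:
    """Odds-form Bayes update: posterior odds = LR x prior odds."""
    return likelihood_ratio * prior_odds

lr = 1 / (1 / 100)  # unity numerator over a denominator of 1/100 gives an LR of 100

# Restricted population of three writers: prior odds of 1:2 for the suspect.
print(posterior_odds(1 / 2, lr))    # 50.0 -> posterior odds of roughly 50:1

# A trier whose other information puts the prior closer to 1:100.
print(posterior_odds(1 / 100, lr))  # 1.0 -> posterior odds of 1:1, indeterminate
```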
The appropriate final outcome depends on what other information might come into play which brings us nicely back to the fact that the decision-making occurs in the mind of the trier, not the expert. In fairness to Souder, I still need to go back to see this excerpt in proper context but thus far I have been unable to obtain that article.
The authors are quite correct in noting:
Once rebuffed and rejected, the Bayesian approach now has strong advocates for its use in forensic science. Aitken wrote that “the Bayesian approach (is) the best measure we have for assessing the value of evidence.” Gaudette, without identifying its Bayesian basis, wrote on the “Evaluation of Associative Physical Evidence.” Good, a prolific writer on the topic, referred to it as the “weight of evidence.” See also Hill for a general review.
This is very true; more so today than in 1999. There are strong advocates for this approach, but very few suggest the use of the theorem in its full form. Rather, they promote a focus on the likelihood-ratio, or something equivalent to it. Unfortunately, Huber and Headrick again turn to Hilton for a definition and further discussion of the likelihood-ratio:30
Alford, perhaps unwittingly, initiated the use of the Bayes Theorem and the likelihood ratio in handwriting case work in 1965. On the strength of a paper read at the meeting of the American Academy of Forensic Science by Olkin,31 Hilton explained,32 that the likelihood ratio statistic is the ratio of: the probability calculated on the basis of the similarities, under the assumption of identity, to the probability calculated on the basis of dissimilarities, under the assumption of nonidentity. Accordingly, the probability of identity in a population of five persons, on the strength of what Hilton calls the joint probability of three writing features, would be 1/5 to the power of 3, or 1/125. Then the probability of nonidentity would be 4/5 to the power of 3, or 64/125, (approximately 1/2). The ratio of ‘identity’ to ‘nonidentity’ in this case is 1/125 divided by 64/125 and equals 1/64. It is considered to be a measure of the likelihood of ‘chance coincidence’, (not our words). Hence, the smaller that this fraction is, or if you prefer, the larger the denominator relative to a numerator of 1, the less likely is coincidence and the stronger is the identification.
The likelihood ratio is a statistical means of testing a calculated value derived from a statistical sample. Relative to handwriting examinations, it is the means of determining whether the probability of identity and the probability of nonidentity are significantly different. In other contexts, we frequently use the term odds. We invert the likelihood ratio (that we determined above in our example to be 1/64), and say that the odds favouring the identification of this subject are 64 to 1. Readers should note that it is the likelihood ratio that is inverted to produce the odds, not the joint probability of a number of similarities, that in our example was 1/125.
I used the word ‘unfortunately’ earlier because the discussion here is badly flawed. As is the case with Souder, I have not been able to obtain the Olkin reference from 1958 so I’m not sure whether it was Olkin who got it wrong in his presentation or Hilton who misunderstood what was being said in the first place (and the nature of Alford’s involvement in all of this is also unclear to me).
To be fair to Hilton, the presentation by Huber and Headrick does not help matters. Not at all. Hilton’s original article provided more information than this excerpt suggests but much of it was incorrect or at least misapplied (for a complete review of Hilton’s article please see this blog post). The following comments focus on the above quote and address specific issues in it.
The authors say, “the likelihood ratio statistic is the ratio of: the probability calculated on the basis of the similarities, under the assumption of identity, to the probability calculated on the basis of dissimilarities, under the assumption of non-identity.”
This is wrong. The likelihood ratio is the probability of observing the complete set of features (both similarities and dissimilarities) given the main proposition divided by the probability of observing the same, complete set of features (both similarities and dissimilarities) given the alternative proposition.33 It is very important to understand that it is the totality of the evidence that matters. One must not focus on similarities in the numerator and dissimilarities in the denominator. To suggest this is the case is incorrect. I should note that this is not what Hilton wrote — rather, it is a mis-interpretation by Huber and Headrick.
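Written out in standard notation, with $E$ denoting the complete set of observed features (similarities and dissimilarities alike):

$$
LR \;=\; \frac{\Pr(E \mid H_1)}{\Pr(E \mid H_2)}
$$

The same $E$ appears in both the numerator and the denominator; only the conditioning proposition changes.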
The authors then present some nonsensical numeric values that, I assume, were intended to clarify the process.34 They do not clarify anything. As noted earlier the likelihood ratio does not have to be based on quantitative values. The LR is often defined and explained in terms of numeric values (not these values) but there is no ‘need’ for numbers, per se. If someone wishes to invoke numeric or quantified data, I have no problem with it. One could derive numeric values from empirical studies or, more likely, through elicitation of personal subjective probabilities converted to numeric values. The latter is a perfectly valid approach to the issue.35 So, ‘numbers’ can be used. But it is critically important to realize there is no need for metrics or statistics at all. One can apply the concepts embodied in the approach without numbers, with little to be gained by insisting on the artificial use of them.
Now, whenever numeric data is used it is essential to provide information about how it was obtained as well as the limits that apply to it. In this instance the authors quote Hilton and provide a (meaningless) joint probability value to present a ‘probability of identity’ from which a ‘probability of non-identity’ is obtained based on the assumption these two values are mutually exclusive and exhaustive.
The values are then used to produce a ratio which they say “is considered to be a measure of the likelihood of ‘chance coincidence’, (not our words). Hence, the smaller that this fraction is, or if you prefer, the larger the denominator relative to a numerator of 1, the less likely is coincidence and the stronger is the identification.” This comes directly from Hilton’s article and it is wrong. The concept of ‘chance coincidence’ is fine, but applies only to the denominator of the likelihood ratio and, even in that context, has only limited value in the overall evaluation. The end of the sentence is better since a smaller value in the denominator does indicate stronger support for the main proposition over the alternative. However, I dislike the terminology “the stronger is the identification”.
The authors wrote that “Relative to handwriting examinations, [the LR] is the means of determining whether the probability of identity and the probability of nonidentity are significantly different”. The likelihood ratio is definitely not a means for comparing the ‘probability of identity’ with the ‘probability of non-identity’. This is a big issue both for this text and in Hilton’s article. It exemplifies, almost perfectly, “transposition of the conditional”.36
The terms ‘identity’ and ‘non-identity’ in this context refer literally to the propositions of interest; that is, the competing propositions which must be assessed by the trier. Whenever one speaks about the probability of propositions, one is referring to either the prior odds or the posterior odds. Therefore, by definition, those values cannot be the likelihood ratio. The key point here is that the likelihood ratio serves to inform the trier about the value of the evidence and the weight it should be given as the trier evaluates the “probability of identity” versus the “probability of nonidentity”.
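Put compactly, the distinction is this:

$$
\underbrace{\frac{\Pr(E \mid H_1)}{\Pr(E \mid H_2)}}_{\text{likelihood ratio (examiner)}} \;\neq\; \underbrace{\frac{\Pr(H_1 \mid E)}{\Pr(H_2 \mid E)}}_{\text{posterior odds (trier)}}
$$

Treating the quantity on the left as though it were the quantity on the right is precisely the transposition of the conditional.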
I sum up my criticism of this section by simply saying that neither Mr. Hilton nor Messrs Huber and Headrick had any real understanding of the likelihood-ratio, what it means or how it can/should be applied in our work.
As noted earlier, the authors also touch on the concept of the ‘relevant population’ which is, of course, critical to the evaluation process. Indeed, in many situations it is the ‘relevant population’ under the alternative proposition that is of greatest concern. That population is, quite literally, the subset of persons who fall within the parameters defined by the alternative proposition; in other words, the group of people who might be considered as viable sources when the culprit/perpetrator is not the suspect.
It is perhaps worth noting that this idea comes into play regardless of the method used for assessment purposes. More traditionally, the default population was “everyone else” as embodied in the idea that an identification was made ‘to the exclusion of all other writers’. Aside from being impractical such an approach is unnecessary and ill-advised. The degree to which the relevant population can be limited will vary from case to case and it may well remain poorly or vaguely defined.
The authors wrote:
Others have written on the consideration to be given to the question of relevant populations that may strengthen the findings of a writing examination. Kingston’s view was that it was the role of the judge or jury, not the writing examiner, to apply the modification to the handwriting evidence that other evidence may justify. The question arises, however, as to who would be the most competent for the task: the judge? the jury? or the writing examiner?
But, to return to the question posed: statistical inference has a vital role to play in writing identification, greater, perhaps, than many examiners recognize.
This quote shows further misunderstanding of the process. It is not a “question of relevant populations that may strengthen the findings of a writing examination”. The population of interest is determined by the propositions under consideration. And those propositions will ideally reflect the arguments being made in court.
The issue of the propositions is very important to the evaluation process. I have discussed this at length elsewhere. Ideally, the propositions should align with the arguments to be made by counsel for either side — something that almost never happens in casework. In reality, in lieu of being provided propositions, the examiner must set their own and declare them clearly when reporting their opinion. The LR, which should be the basis for that opinion, may change dramatically when different sets of propositions are used.
One of the critical elements the trier (and the lawyers) must consider is whether or not appropriate propositions were considered by the expert. If they were not, then different ones must be given to the expert who can then re-evaluate the evidence and produce a new weighting (i.e., a new LR). This is necessary because, in part, changing the alternative proposition may result in consideration of a different population of interest.
It is important to realise that this is ‘nothing new’ in terms of an expert giving testimony. Lawyers always pose ‘alternative explanations’ or theories to ‘explain’ the evidence to favour one position or the other. This is a normal and expected part of a trial process. Other explanations are simply different alternative propositions. In most instances, a well-prepared examiner will have evaluated the evidence under all such alternatives in anticipation of being asked about them, even if that evaluation is not reported directly. But, if that hasn’t been done, the examiner will have to do the evaluation on-the-fly or ask for time to make such an assessment.
The sentence about Kingston’s view is, however, correct. Issues that relate to the prior (or posterior) odds are within the scope of the trier, not the examiner. In addition, the effect of the LR on the prior odds (and the posterior odds that result) must occur in the mind of the trier. So the follow-up question is moot. They ask “who would be the most competent for the task: the judge? the jury? or the writing examiner?” There is only one party with this role in the proceedings and it is not the examiner.
This issue comes up in another section in the text, #32 (Do Numerals or Symbols and Other Nonalphabetic Characters Play a Part in the Writing Identification Process?), with a further reference to Bayes, again mentioning Alford:37
Alford provides us with a classic example of the case in which the discrimination between the numerals of a small population of writers is possible. In this case, other circumstances narrowed the population of possible writers to four persons whose habits in executing the numerals “0” to “9” were sufficiently distinctive from each other to permit proper conclusions of authorship to be drawn. This was the first application of the Bayes Theorem and the likelihood ratio to writing examination that we are aware of.
The idea of limiting the possible population of writers to a small number often comes up in discussions of Bayes Theorem, in terms of the impact of the evidence and how decision-makers may make use of the information. Hence, it is worth a bit of extended discussion.
The size of the relevant population of potential writers can be incorporated into the reasoning process in a couple of different ways.
First, consider an approach where the information about a limited set of potential writers becomes part of the propositions. We might consider something like the following:
- $H_{1}$ = the suspect wrote the questioned signature
- $H_{2}$ = one of the other 3 writers wrote the questioned signature
In this approach the evaluation of the evidence, $E$, must take into account the smaller population under $H_{2}$ when assessing the LR. In practical terms, this would not be that difficult to do assuming one has adequate samples from each of the possible writers. In particular, the situation is greatly simplified because the examiner need not (and should not) give any consideration to any other writer outside the group of 4!
In a very real sense, the examiner is looking to see whether the suspect’s samples are ‘closest’ to the questioned sample within that limited group and, of course, assessing how much more (or less) the evidence supports that belief than the belief it was one of the others. With proper sampling the LR that results should be extremely large, or small, in support of the belief the suspect is the writer rather than someone else.
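To illustrate, and only to illustrate, what such a closed-set evaluation could look like if one chose to attach numbers to it, here is a minimal sketch. Every probability below is hypothetical; it simply stands in for whatever assessment (subjective or empirically informed) the examiner might make of the observed features.

```python
# Hypothetical probabilities of observing the questioned-writing features, E,
# if each candidate in the closed set of four had written it. Illustrative only.
p_e_given_writer = {
    "suspect": 0.30,
    "writer_2": 0.002,
    "writer_3": 0.001,
    "writer_4": 0.003,
}

# H1: the suspect wrote the questioned signature.
p_e_h1 = p_e_given_writer["suspect"]

# H2: one of the other three writers wrote it. With no reason to favour any of
# them, P(E | H2) is the simple average of their individual probabilities.
others = ["writer_2", "writer_3", "writer_4"]
p_e_h2 = sum(p_e_given_writer[w] for w in others) / len(others)

print(f"LR = {p_e_h1 / p_e_h2:.0f}")  # LR = 150 with these made-up values
```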
At this point I can hear the screams of anguish coming from document examiners all around the world. Most examiners will say this type of thing isn’t even possible, let alone appropriate or correct. They are wrong.
This approach is, of course, predicated on the assumption that the limited population actually contains the true writer, which may or may not be reasonable. If it does, then this approach is fine. However, if that is not the case (and the ‘real’ author is not present in the group of 4), what will happen? Assuming that the examiner does a good job with the comparison, the LR should come out with a value relatively close to 1, meaning that the evidence does not favour either of these particular propositions. It would, in fact, be a situation where both the numerator and denominator of the LR attain very low, but approximately equal, values.
The main point to remember is that the trier must assess and approve of the propositions that were used by the examiner — hence, the necessity of clearly spelling out exactly which proposition set was actually used. If the trier agrees that the proposition set was valid, then that is what the examiner should be using.38
The purpose of any expert is to tell the court the value of their evidence in terms of the arguments being made by the parties (i.e., the propositions being argued). If these propositions are, in fact, the points being argued, then such an assessment is completely warranted. Even required of the expert, if they are doing their job properly.
Most examiners find this distasteful, at best. They will argue that the expert should always consider the possibility of ‘another’ possible writer, basically on principle. That is not true, but that is what they believe.
Now, having said the above in order to emphasize the role of the expert, I will explain why I don’t like the ‘restricted population’ approach. My argument is based on very different reasons that are both compelling and logical.
I argue that information about a small population of potential writers can be, and should be, considered as part of the prior odds. That is, before any consideration is given to the evidence in the handwriting samples, the odds are already 1:3 (a 1 in 4 chance) that the suspect is the writer (and, of course, the same odds apply to each of the other three possible writers). Note that this statement relates to the propositions and has nothing to do with the evidence being considered by the examiner. It is a function of the scenario and not the evidence. When the trier is informed of this they can assess the validity of the assertion and, if acceptable, modify their pre-existing belief to accommodate that reality. In other words, it becomes part of the prior odds for the propositions that exist before hearing the FDE evidence.
The examiner examines the evidence and determines which of the two propositions is supported and by how much. Then, having done so, they would testify about the likelihood-ratio so that the trier could modify those prior odds to form a new opinion about the propositions.
This is, in my opinion, a much better way to handle the matter. There is nothing wrong with this approach and, most importantly, it is much less prone to confusion or misunderstanding. At the same time, it is critical to understand that when using this approach the propositions must not include the smaller population as a factor in their wording, which brings us to the propositions that should be used. Appropriate propositions would be written something along the lines of:
- $H_{1}$ = the suspect wrote the questioned signature
- $H_{2}$ = someone other than the suspect wrote the questioned signature
Note in particular that $H_{2}$ is no longer constrained to the group of four writers. So, in a certain sense, this evaluation will be more ‘difficult’ than the earlier version. The essential difference derives from the fact that the population of potential writers under the new, alternative hypothesis is much larger than the group of four. It should also be noted that the ‘other 3 writers’ that were previously the focus are still captured under the present definition of $H_{2}$; they fall under the definition of “someone other than the suspect”. It is just that the population of interest under the alternative proposition is now much larger. The end result is likely to be an LR value somewhat closer to one than the result derived using the other set of propositions. At the same time, it is important to note that this is equivalent to what examiners are doing most of the time now. Indeed, this particular set of competing propositions is the most common one used in authorship evaluations. Therefore, no examiner should find this very problematic.
Why is this point so important? Because we must be careful that the effect of working with a limited population is not ‘double-counted’. That information is obviously very important and needs to be taken into account by the trier. If the trier accepts the argument that there are only four possible writers (which they may or may not), they will have that information in their head and apply it to the case, as-is. This should be done before the document examiner evidence is given to them.39 Then the examiner can appear and present their evidence, speaking to the value of the handwriting evidence and its effect on the trier’s belief about the propositions. Basically, in this approach, the observed evidence would generate an LR which could then be used to modify the prior odds of 1:3, making it either more, less or equally likely that the suspect is the writer.
If, on the other hand, the examiner uses the first approach and incorporates that information into their evaluation, there is a real and significant chance that the court will, perhaps unwittingly, apply the expert’s opinion on top of the other evidence, adding it to the fact that there is only a limited population. By adding the information in that manner it would be double-counted.40 This is a serious, but easily avoided, problem. The solution: keep this type of information completely separate when presenting it to the trier.
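As a purely numeric illustration of the recommended handling (the LR value below is made up; the update function is just the odds form discussed earlier):

```python
def posterior_odds(prior_odds: float, likelihood_ratio: float) -> float:
    """Odds-form update performed, conceptually, by the trier-of-fact."""
    return likelihood_ratio * prior_odds

# Step 1 (trier): accept the 'only four possible writers' argument and set the
# prior odds that the suspect is the writer at 1:3.
prior = 1 / 3

# Step 2 (examiner): report the strength of the handwriting evidence alone,
# evaluated under the open-set propositions. Hypothetical value.
lr_handwriting = 40

print(posterior_odds(prior, lr_handwriting))  # ~13.3, i.e. posterior odds of about 13:1

# Had the restricted population also been folded into the examiner's LR, applying
# it again at Step 1 would count the same information twice.
```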
As I said at the outset I consider Huber and Headrick’s text to be the best we have, notwithstanding some issues in parts of it. I hope this post will serve to address a few of those issues.
Footnotes
- I am referring specifically to the first edition. Huber and Headrick. Handwriting Identification: Facts and Fundamentals (CRC Press, 1999). ISBN: 084931285X, 9780849312854. I have not yet had an opportunity to review the second edition.
- Or evaluative reporting. This approach has a lot of different names.
- At the same time, I should note that the process used by most examiners right now might perhaps be described as a form of “inference to the best explanation” simply because examiners extend the inferential process to the expression of an opinion about propositions, rather than solely speaking to the evidence. However, in my opinion any approach that leads to the expression of belief or probability of the propositions, including IBE, is unjustified and inappropriate. The latter is not the position of these authors.
- As Lindley puts it “…probability is the unique extension of logic…” Our objective, when reasoning, should be the proper application of logic and probability. That is, we must be “coherent” in our reasoning which requires adherence to the three basic rules of probability: Convexity, Addition and Multiplication (see Lindley, pp. 64-66).
- Any ardent mathematician or statistician who takes exception to this statement should review De Finetti’s “Theory of Probability” (in 2 volumes, Wiley and Sons, 1974 and 1975). Alternatively, a more recent and accessible option is Dennis Lindley’s Understanding Uncertainty (Wiley-Interscience, 2007/2014).
- In fairness, this is a problem for many writers when they discuss this topic. It is not at all surprising for a text published in 1999.
- Of course, the decision-maker in our situation is the trier-of-fact (judge/jury). And the courts have made it relatively clear that such evaluations should be done in whatever manner the trier likes or prefers. Whether or not such evaluations are logical or sensible is really not important, even if that were ideal or preferred. In any event, the expert should always strive to provide logically correct, accurate and valid information to the court.
- Aside from the fact that all of our information is conditional, definitive conclusions require a ‘leap of faith’ or the application of some additional decision heuristic or criteria that extends beyond the posterior odds of the propositions that Bayes Theorem produces. Please note that this statement does not mean that full exclusions (or eliminations) are not possible using this approach. As a rule, however, complete and utter exclusions are not based on the evaluative/comparative process; rather they are based on the pre-existing ‘impossibility’ of one of the proposed propositions. That is, prior probability can be equated to zero for one of the propositions.
- In this particular situation, before the evidence was considered there was a 1 in 3 chance that the suspect, rather than one of the other two, was the writer; that is, prior odds of 1:2 (favouring the alternative over the main). The evidence increases the odds in favour of the main proposition by a factor of 100, giving posterior odds of 100:2, or 50:1, in favour of the main proposition over the alternative.
- For the original see Hilton, Ordway, “The Relationship of Mathematical Probability to the Handwriting Identification Problem”. Proceedings of Seminar No. 5, Roy A. Huber ed. (Ottawa: Queen’s Printer, 1958), pp 121-130. Alternatively, see the IJFDE reprint from 1995.
- Original reference in text is Olkin, Ingram, “The Evaluation of Physical Evidence and the Identity Problem by Means of Statistical Probabilities”. Presented at the meeting of the American Academy of Forensic Sciences (Cleveland, February 1958).
- Original reference in text is Hilton, see footnote 9 above.
- I would have no issue using the term “under the assumption of identity” for the main proposition or using “under the assumption of nonidentity” for the alternative proposition, at least in this particular instance.
- These values come directly from Hilton’s article but they are no clearer or understandable in that context.
- Arguably, all probabilities are subjective even those based on empirical data, at least to some degree. Thus, the issue becomes a question of how empirically-derived information becomes incorporated into the personal belief of the examiner.
- AKA, the Prosecutor’s Fallacy.
- Alford, Edwin A., “Identification through Comparison of Numbers.” Identification News, 1965, July; pp 13-14.
- There are many ways this type of evaluation might be done and various sets of propositions that might be considered.
- I suggest this would ideally be handled before the document examiner evidence is brought in mainly because the limitation of the population will be based on some information or evidence other than the handwriting evidence; for example, motive or opportunity (access to the document) or other similar factors. In other words, it won't be based on evidence from the document examiner at all.
- There is, of course, no way to know if this would actually ‘double’ the effect. The term simply means elements of the evidence are considered more than once in the evaluation process.