‘Noisy’ expert judgements

Economics Nobel Prize winner Daniel Kahneman has released his latest book, Noise: A Flaw in Human Judgment, co-written with Olivier Sibony and Cass Sunstein.

The book is about the enormous (and undesirable) variability – ‘noise’ – in the judgements that experts reach when evaluating the same information, where you would otherwise expect them to reach the same judgement.

Common examples of noisy judgements include: variability in doctors’ diagnoses of identical patients with the same condition, differences in sentences handed down by judges to people who have committed the same crime, and divergences in professors’ grades awarded for the same students’ work.

Noisy judgements are also present in: hiring decisions; social services decisions, e.g. child custody interventions; the awarding of university scholarships, research grants, etc; performance appraisals; strategic decision-making; policy-making; investing; project management; competition judging; economic forecasting; etc.

In all of these examples (and others), when people are exercising their judgement they are informally integrating diverse bits of information into an overall assessment of the decision at hand, where some judgements are evaluative and others are predictive in nature.

Noise is present in all judgements and arises because of differences between individuals (‘judges’) in their expertise, intelligence, preferences, personality, mood, effort, ‘triggers’, etc. According to Kahneman, Sibony and Sunstein: “If there is more than one way to see anything, people will vary in how they see it.”

However, most people are oblivious to the inherent randomness in their judgements and resulting decisions, and so noise usually goes undetected. Moreover, when noise is detected it often comes as a shock (especially to people who consider themselves to be ‘experts’!). “Wherever there is judgment there is noise, and more of it than you think.”

Although variability reflecting differences in personal preferences is to be celebrated in creative or personal circles – such as writing a song or choosing a new coat – too much variability in professional settings can be dangerous, especially when the outcome is critical (e.g. risking patients being misdiagnosed and defendants unfairly sentenced).

Arriving at consistent decisions is especially desirable for decisions that are repeated. Doing otherwise – in effect, making decisions arbitrarily or capriciously – is both inefficient (wasteful of resources) and unfair. Decisions should not depend on the ‘good’ or ‘bad’ luck associated with who makes the decision: in effect, turning decision-making into a lottery.

For example, it seems reasonable to expect that when patients present with the same symptoms they should receive very similar, if not the same, diagnoses (preferably, the ‘right’ one!). The notion that a given patient’s diagnosis (and treatment) depends on the lottery of which doctor they see, and that it might have been different had they seen another doctor, is disturbing.

‘Noise auditing’

Kahneman and his co-authors recommend that decision-makers and organizations conduct ‘noise audits’, to make them aware of the extent of the problem and to acknowledge it.

A noise audit involves presenting case scenarios containing the same information to decision-makers, who are asked to make their judgements individually – e.g. diagnosing patients or sentencing defendants. These judgements are then compared to gauge their variability.
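As a rough illustration of the comparison step, the sketch below measures how far judgements of the same case spread across judges. The data, the judge names and the `noise_by_case` helper are all hypothetical, and the standard deviation is only one possible measure of variability; it is not taken from the book or from 1000minds.

```python
from statistics import pstdev

# Hypothetical audit data: each judge's sentence (in months) for the same four cases.
judgements = {
    "judge_a": [12, 36, 6, 24],
    "judge_b": [18, 30, 6, 48],
    "judge_c": [6, 42, 6, 12],
}

def noise_by_case(judgements):
    """Return the population standard deviation of the judgements for each case."""
    per_case = zip(*judgements.values())  # regroup the values case by case
    return [pstdev(case) for case in per_case]

spread = noise_by_case(judgements)
for case_number, sd in enumerate(spread, start=1):
    print(f"case {case_number}: spread = {sd:.1f} months")
```

A spread of zero means every judge gave the same answer for that case; the larger the spread, the more the outcome depends on which judge happened to be assigned.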

An example of a noise audit supported by 1000minds from a recent study is discussed below.

‘Decision hygiene’ strategies

With the objective of decreasing judgemental noise and increasing ‘decision hygiene’ – more valid and reliable decision-making – Kahneman and his co-authors have several key recommendations:

  • Ensure decision-makers are as knowledgeable and competent in the decision-making application as practically possible.
  • Specify explicit criteria and weights representing their relative importance – i.e. ‘algorithms’ or ‘formulas’ (e.g. using 1000minds) – to be used in support of decision-making.
  • When involving multiple decision-makers, have them express their preferences independently – to avoid groupthink, bandwagon effects, etc – and then combine them.
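The second and third recommendations can be sketched together as a simple additive scoring model. This is a minimal sketch under stated assumptions: the criteria, weights and ratings are invented, averaging is just one way of combining independently expressed preferences, and the code does not represent how 1000minds itself elicits or combines weights.

```python
# Hypothetical weights, elicited independently from two decision-makers.
weights_by_person = [
    {"pain": 0.5, "function": 0.3, "duration": 0.2},
    {"pain": 0.3, "function": 0.5, "duration": 0.2},
]

def combine_weights(weights_by_person):
    """Average each criterion's weight across decision-makers."""
    criteria = weights_by_person[0].keys()
    n = len(weights_by_person)
    return {c: sum(w[c] for w in weights_by_person) / n for c in criteria}

def score(ratings, weights):
    """Weighted sum of a case's ratings (each rating on a 0-100 scale)."""
    return sum(weights[c] * ratings[c] for c in weights)

group_weights = combine_weights(weights_by_person)

# Hypothetical case, rated on each criterion.
case = {"pain": 80, "function": 50, "duration": 20}
total = score(case, group_weights)
print(f"total score: {total:.1f}")
```

Because the same weights are applied to every case, two cases with identical ratings always receive identical scores, whoever runs the calculation.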

Reviews and interviews

In which Daniel Kahneman, Olivier Sibony and others discuss the ideas in the book.

Example of a ‘noise audit’

1000minds has been used in an extraordinarily wide range of applications to decrease judgemental ‘noise’ and increase ‘decision hygiene’. Common examples include prioritizing patients for treatment, crimes for investigation, and research questions and grant applications for funding, as well as project management.

Many applications begin with a ‘noise audit’ whereby decision-makers are asked to participate in a 1000minds ranking survey in which they rank case scenarios containing the same information.

This audit/survey usually dramatically demonstrates the need for a new and improved decision-making process based on explicit criteria and weights (as recommended by Kahneman and his co-authors, as discussed earlier). Most decision-makers aren’t aware of how much they can differ in their judgements – that they are often highly idiosyncratic.

In an example from the field of disease classification, Mahmoudian et al. (2021) report on a 1000minds ranking survey involving 34 experts in symptomatic early-stage knee osteoarthritis (OA). The experts comprised 14 orthopaedic surgeons, 13 rheumatologists, 2 general practitioners, 2 sports medicine specialists and 3 physical therapists.

Each expert was presented, in random order, with 20 patient case scenarios and asked to rank them, based on their clinical experience, as to how likely they would classify them as early-stage knee OA patients: from 1st = most likely to 20th = least likely. The case scenarios included the patients’ clinical signs and symptoms as well as socio-demographic characteristics such as age, gender and social circumstances.

Noise audit results

The results from the ranking survey are presented in Table 1 and Figure 1 below. In the table, the number in each cell is the number of participants who ranked the patient case scenario (horizontal axis) in each rank position (vertical axis). For example, 11 of the 34 experts ranked ‘Bob’ 1st (i.e. “most likely to have early-stage knee OA”), 5 experts ranked him 2nd, 3 ranked him 3rd, 5 ranked him 4th and 1 ranked him 5th.
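A count table of this kind can be built from the raw rankings in a few lines. The sketch below is illustrative only: it borrows three case names from the study, but the rankings themselves are invented and much smaller than the real survey, and `rank_counts` is a hypothetical helper.

```python
from collections import Counter

# Invented rankings: each expert lists the cases from most to least likely.
rankings = [
    ["Bob", "John", "Rose"],
    ["Bob", "Rose", "John"],
    ["John", "Bob", "Rose"],
]

def rank_counts(rankings):
    """Count how many experts placed each case in each rank position (1 = top)."""
    counts = Counter()
    for ranking in rankings:
        for position, case in enumerate(ranking, start=1):
            counts[(case, position)] += 1
    return counts

counts = rank_counts(rankings)
for case in ["Bob", "John", "Rose"]:
    row = [counts[(case, position)] for position in (1, 2, 3)]
    print(case, row)
```

Each printed row gives, for one case, the number of experts who ranked it 1st, 2nd and 3rd; a row concentrated in one position indicates agreement, while a row spread across positions indicates noise.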

As can be seen in the table and figure, agreement over the rankings for the two most ‘extreme’ patients, ‘Bob’ and ‘Rose’ – on average, 1st and 20th respectively – is higher than for the ‘middle-ranked’ patients. For example, ‘John’ received an almost full range of rankings, from 2nd to 20th – meaning some experts thought he was very likely to have early-stage knee OA, whereas others thought he was unlikely to.

Overall, the distribution of the experts’ rankings indicates a high degree of disagreement for most cases – in other words, the experts’ judgements are very ‘noisy’.
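One common way to summarize agreement across a whole set of rankings in a single number is Kendall's coefficient of concordance (W), which runs from 0 (no agreement) to 1 (identical rankings). The source does not say this statistic was used in the study; the sketch below is offered only as an illustration, uses invented ranks and assumes no tied ranks.

```python
def kendalls_w(rank_matrix):
    """Kendall's W for a list of per-judge rank lists (no ties assumed)."""
    m = len(rank_matrix)      # number of judges
    n = len(rank_matrix[0])   # number of cases
    # Sum of the ranks each case received across all judges.
    rank_sums = [sum(judge[i] for judge in rank_matrix) for i in range(n)]
    mean_sum = m * (n + 1) / 2
    s = sum((r - mean_sum) ** 2 for r in rank_sums)
    return 12 * s / (m ** 2 * (n ** 3 - n))

# Invented ranks: one row per judge, one column per case.
ranks = [
    [1, 2, 3],
    [1, 3, 2],
    [2, 1, 3],
]
w = kendalls_w(ranks)
print(f"Kendall's W = {w:.2f}")
```

A value well below 1, as in this invented example, would point to substantial disagreement among the judges.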

Based on these results, the authors concluded that what is needed – to reduce the noise and increase decision hygiene – are explicit criteria for classifying patients with early-stage knee OA. They are using 1000minds to specify these criteria and determine weights representing their relative importance.

Table 1: Participants’ ranks for the 20 patient case scenarios (n=34)

Figure 1: Participants’ rankings of the 20 patient case scenarios (n=34) – each participant’s ranking is indicated by a colored shape


D Kahneman, O Sibony & C Sunstein (2021), Noise: A Flaw in Human Judgment, Little, Brown Spark.

A Mahmoudian, S Lohmander, M Englund, P Hansen, F Luyten & International Early-stage Knee OA Classification Criteria Expert Panel (2021), “Lack of agreement in experts’ classification of patients with early-stage knee osteoarthritis”, abstract, 2021 OARSI Virtual World Congress on Osteoarthritis: OARSI Connect ’21, Osteoarthritis and Cartilage 29, S299-S300.

See also

1000minds webinar: Group decision-making – How to denoise decisions & make consistent judgements