Introduction
“It is important that we join with colleagues in other disciplines
to develop measures of the outcome of surgery... We should attempt
to measure our success in these ways. Many of the technical decisions
in clinical surgical management are based on the weighing of different
outcomes, and we should ensure that these incorporate patient values.
Patient opinions are important in determining these relative values.”
(Devlin 1990)
This quotation shows the importance of outcome measurement and
of the 'patient-centred approach' that is such a prominent part of
the NHS Plan published in 2000; the importance of this area was
recognised a decade before the plan's publication!
Surgical practice in the last few years has moved towards objective
assessment and accountability in the context of clinical governance.
Since the introduction of the NHS Plan and the establishment of CHI,
NICE, NCAA etc., it is ever more important to show that one is following
accepted evidence-based practice and also that one is striving to
perform to nationally accepted standards of practice. The focus
of clinical effectiveness and quality improvement now lies with
the individual patient, which defines the patient-centred approach.
To achieve these ideals one has to implement accepted, proven practice
and also keep patients informed of their options. The 'doctor knows
best' approach to patient care is no longer acceptable.
The job of the clinician (especially the surgeon) in this age is to provide
patients with the information they need in a format they understand
so that they can make their own decisions on treatment. This is the guiding
principle of informed consent. How can a patient give informed consent
without being given an estimate of operative risk? Traditionally,
when risk has been given, it has been based on unadjusted
data from observed outcomes reported in studies. This of course
does not take into account the individual patient's risk
based on their co-morbid risk factors - this is where risk adjustment
in surgery comes into its own.
Operative mortality will vary between secondary care units for
multiple reasons, the most relevant and important being case-mix,
co-morbid disease and type of presentation. Despite considerable
recent media interest, sub-optimal surgical care is not the only
reason for varying mortality rates. Risk stratification by the use
of mortality prediction models has the potential to compensate
for the above factors and therefore allows a better means of comparing
performance between hospitals. This is not a new concept; Florence
Nightingale made note of this over a hundred years ago:
“in the first place, different hospitals receive very
different proportions of the same class of diseases. The ages
in one hospital may differ considerably from the ages in another.
And the state of the cases on admission may differ very much in
each hospital. These elements affect considerably the result of
treatment altogether apart from the sanitary state of hospitals”
Prediction Systems
ASA Grading
The ASA grading divides patients into one of
five categories based on their general medical history and examination
without requiring any specific tests. It is simple and has been
widely used since 1963 when it was first proposed. It is very effective
and, when age is also taken into account, there is an additive
predictive effect. The drawback of ASA is that it is subjective
and therefore open to manipulation. The following table shows how
mortality varies with ASA grade, both in general and in large bowel
obstruction due to colorectal cancer.
| ASA Grade | Definition | Mortality (%) - in general | Mortality (%) - large bowel obstruction due to colorectal cancer |
|---|---|---|---|
| I | Normal healthy individual | 0.05 | 2.6 |
| II | Mild systemic disease that does not limit activity | 0.4 | 7.6 |
| III | Severe systemic disease that limits activity but is not incapacitating | 4.5 | 23.9 |
| IV | Incapacitating systemic disease which is constantly life-threatening | 25 | 42 |
| V | Moribund, not expected to survive 24 hours with or without surgery | 50 | 66.7 |
APACHE Scoring
The APACHE (Acute Physiology And Chronic Health Evaluation) scoring
systems are used almost exclusively in the intensive therapy setting.
APACHE has now gone through three versions. The original APACHE (introduced
1981) used 34 physiological variables, taking the worst value in
the first 24 hours of admission to ITU. APACHE II (introduced 1985)
simplified this to an acute physiological score from 12 physiological
variables, added to scores derived for age and chronic health.
The APACHE systems can be used to provide information on the risk
of death for a group of patients suffering from a specific disease
category that may require admission to an intensive care unit; they
cannot be used as predictors of the risk of death in individual
patients.
The APACHE II system is widely used in the UK but it is designed principally
for the acutely ill. Its use in the elective surgical patient is
questionable. The physiological variables used are listed below:
- Temperature - core
- Mean arterial pressure
- Heart rate
- Respiratory rate - ventilated or non-ventilated
- Oxygenation
  - FIO2 ≥ 0.5: record A-aDO2
  - FIO2 < 0.5: record PaO2
- Arterial pH
- Serum sodium
- Serum potassium
- Serum creatinine
- Haematocrit
- White blood cell count
- Glasgow coma score
APACHE III was introduced in 1991 to address some
of the flaws of APACHE II. It is based upon data from 40 hospitals
and over 17,000 patients. Although APACHE III resembles APACHE II,
it includes new variables such as prior treatment location and the
disease requiring ICU admission. In APACHE III scoring, the patient's
age and chronic health history are worth up to 47 points. Within
24 hours of ICU admission, 17 physiologic variables are measured
and may add up to a maximum of an additional 252 points. The resulting
total score, in combination with prior treatment location and principal
ICU diagnosis, is entered into a logistic regression equation. The
equation (which is proprietary) provides a predicted mortality.
A unique feature of APACHE III is that it uses daily updates of
clinical information to provide a refinement of predicted mortality.
SAPS (Simplified Acute Physiology Score) is a derivation of the
APACHE score, using 14 of the original 34 variables to predict death,
and is comparable to APACHE II. The score is assigned after 24 hours
of ICU admission. The most recent version, SAPS II, is a revision
that combines physiological variables with the type of
admission (elective or emergency; medical or surgical) and chronic
health points. It derives its score from the following 17 variables:
- Twelve physiologic variables
- Age
- Type of admission
- Three underlying disease variables (acquired immune deficiency
syndrome, metastatic cancer, and hematologic malignancy)
The resulting SAPS II score is then entered into a published mathematical
formula whose solution gives the numerical value of the predicted
hospital mortality. SAPS II is based upon data from 8,500 patients
and has been validated on a sample of 4,500 patients.
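The conversion from a SAPS II score to a predicted hospital mortality can be sketched as below. This is a minimal illustration only: the coefficients are those usually quoted from the original SAPS II publication and are not given in the text above, so they should be checked against that source before any real use. The function name is hypothetical.

```python
import math


def saps2_predicted_mortality(score: float) -> float:
    """Convert a SAPS II score into a predicted hospital mortality (0 to 1).

    The coefficients are assumed from the published SAPS II equation and
    are not taken from this article; verify them before relying on them.
    """
    if score < 0:
        raise ValueError("SAPS II score cannot be negative")
    # Logistic regression: the score enters both linearly and as a log term.
    logit = -7.7631 + 0.0737 * score + 0.9971 * math.log(score + 1)
    return 1.0 / (1.0 + math.exp(-logit))


for example_score in (20, 40, 60):
    risk = saps2_predicted_mortality(example_score)
    print(f"SAPS II {example_score}: predicted mortality {risk:.1%}")
```

As with APACHE, the output is best read as the expected death rate in a group of similar patients rather than as a prognosis for one individual.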
Relevant References:
1. Cowen, JS, Kelley, MA. Predicting intensive care unit outcome:
Errors and bias in using predictive scoring systems. Crit Care Clin
1994; 10:53.
2. Escarce, JJ, Kelley, MA. Admission source to the medical intensive
care unit predicts hospital death independent of APACHE II score.
JAMA 1990; 264:2389
3. Knaus, WA, Wagner, DP, Draper, EA, et al. The APACHE III prognostic
system: Risk prediction of hospital mortality for critically ill
hospitalized adults. Chest 1991; 100:1619.
POSSUM Scoring Systems
Background information on POSSUM can be found elsewhere on this site.
Veterans Affairs Surgical Risk Study
The Veterans Affairs (VA) Surgical Risk Study is probably the largest
and most contemporary risk adjustment programme which has been implemented
in the US. The study was conducted in 44 Veterans Affairs Medical
Centres and included 87,078 major non-cardiac operations performed
under general, spinal or epidural anaesthesia between 1991 and 1993.
The main outcome measures were 30-day operative mortality and operative
morbidity. The investigators used logistic regression analysis to
provide risk-adjustment models for all operations and for eight surgical
specialities, and compared surgical performance using observed-to-expected
mortality and morbidity ratios.
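The observed-to-expected (O/E) ratio mentioned above can be illustrated with a short sketch. The `Case` record and the example figures are hypothetical and are not VA data; in practice the predicted risks would come from a fitted risk-adjustment model.

```python
from dataclasses import dataclass


@dataclass
class Case:
    predicted_risk: float  # model-predicted probability of death (0 to 1)
    died: bool             # observed 30-day outcome


def observed_to_expected(cases):
    """Return observed deaths divided by the sum of predicted risks."""
    observed = sum(1 for case in cases if case.died)
    expected = sum(case.predicted_risk for case in cases)
    if expected == 0:
        raise ValueError("expected deaths must be greater than zero")
    return observed / expected


# Hypothetical unit: eight low-risk survivors and two high-risk deaths.
unit_cases = [Case(0.1, False)] * 8 + [Case(0.6, True)] * 2
print(f"O/E ratio: {observed_to_expected(unit_cases):.2f}")
```

A ratio close to 1 suggests that the unit performs as the model predicts for its case-mix; values above 1 indicate more deaths than expected and values below 1 fewer.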
Patient risk factors predictive of operative mortality in general
surgery included serum albumin, ASA grade, emergency operation,
disseminated cancer, age, presence of ascites, urea, γGT, functional
status and platelets. In total, the VA group identified 26 pre-operative
variables for predicting operative mortality and 28 pre-operative
variables associated with post-operative morbidity in general surgery.
Considerable variability in unadjusted mortality rates for all operations
was observed across the 44 hospitals (1.2-5.4%). The major limitation
of the VA study is that the patient population was largely middle
aged to elderly men, who were generally socioeconomically disadvantaged
and had previously served in the military. Such models may not be
generalisable to women or to non-VA populations, and there are no
studies that utilise the VA models in the United Kingdom. The table
below gives mortality and morbidity risks for various surgical categories:
| Type of Surgery | Mortality (%) | Morbidity (%) |
|---|---|---|
| General | 5.6 | 24.4 |
| Orthopedics (spine, musculoskeletal) | 1.8 | 11.7 |
| Urology (urinary system) | 0.7 | 8.5 |
| Peripheral Vascular (blood vessels) | 4.6 | 29.6 |
| Neurosurgery (nervous system) | 2.4 | 14.2 |
| Otolaryngology (ear nose throat) | 2.9 | 15.7 |
| Thoracic (chest, non cardiac) | 5.9 | 23.5 |
| Plastic (cosmetic, reconstruction etc) | 1.3 | 15.9 |
| Average | 3.1 | 17.4 |
Relevant References:
Khuri SF, Daley J, Henderson W et al. Risk adjustment of the postoperative
mortality rate for the comparative assessment of the quality of
surgical care: Results of the National Veterans Affairs surgical
risk study. J American College of Surgeons. 1997.
Other Risk Factors
The Influence of Age
It is known that the rate of mortality increases almost exponentially
with age through most of the adult age range but this tends to slow
down at very old ages. A possible explanation for this is the selective
survival of healthier individuals to older ages. Age is one variable
which is recorded in most cases, although the physiology of aging
is poorly understood. The figure below, based on UK ONS data, shows
how the population is getting older and life expectancy is increasing:

In surgery older patients are more likely to have worse clinical
outcomes than younger patients. The example below shows ACPGBI data
relating to mortality by age for obstructing colorectal cancers:
| Age Range (yrs) | Mortality (%) |
|---|---|
| <30 | 20.0 |
| 30-39 | 0 |
| 40-49 | 0 |
| 50-59 | 5.6 |
| 60-69 | 8.1 |
| 70-79 | 16.5 |
| 80-89 | 26.5 |
| >89 | 34.9 |
A special class of patients is the very elderly (see age group
>89 above). Many believe that these patients are physiologically
different from younger patients, possibly due to lack of any physiological
reserve, and they need to be addressed as a separate sub-group.
Operative Urgency
It is vital, especially when using the prediction models on this
website, that one understands what is meant by operative urgency.
The following are the UK NCEPOD definitions:
- Elective Surgery: Carried out at a time to suit the patient and surgeon
- Urgent Surgery: Carried out within 24 hrs of admission
- Emergency Surgery: Carried out within 2 hrs of admission or in conjunction with resuscitation
Whether surgery is performed as an emergency is an important factor
in explaining post-operative mortality and long-term survival. The
precise definition of emergency surgery is critical and the NCEPOD
classification above is the most commonly used. It is important
to note that many patients who have an emergency admission do not
have emergency surgery and their risk of dying from surgery approximates
to that of an elective case. The post-operative mortality of a true
emergency case was about one and a half times that of an elective/scheduled
operation in the ACPGBI Malignant Large Bowel Obstruction audit (20.0% vs.
12.9%) - see below:
| NCEPOD | Mortality (%) |
|---|---|
| Elective | 12.8 |
| Urgent | 17.2 |
| Emergency | 20.0 |
It is therefore important that emergency should refer to surgery
rather than the mode of admission.
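The relative mortality implied by the NCEPOD table above can be checked with a few lines of arithmetic. This is only a sketch using the tabulated percentages as they stand (with elective surgery as the baseline); the variable and function names are illustrative.

```python
# Post-operative mortality (%) by NCEPOD category, copied from the table above.
mortality_pct = {"Elective": 12.8, "Urgent": 17.2, "Emergency": 20.0}


def relative_risk(category, baseline="Elective"):
    """Crude (unadjusted) mortality relative to the baseline category."""
    return mortality_pct[category] / mortality_pct[baseline]


for category in mortality_pct:
    print(f"{category}: {relative_risk(category):.2f} x elective mortality")
```

These are crude ratios with no adjustment for age, ASA grade or stage, so they describe the audit population rather than the independent effect of urgency.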
Malignancy
It has long been known that malignancy increases mortality in surgery.
For example, in colorectal surgery Dukes’ staging is used to
stage most bowel cancers. Dukes’ stage D has been defined
as any metastatic disease in the abdomen alone, or any systemic
or residual local disease. In order to overcome the shortcomings
of the Dukes’ classification, the TNM systems are increasingly
being applied to stage colorectal cancer patients. Mortality by
Dukes’ stage from the ACPGBI MLBO Audit is shown below:
| Dukes’ Stage | Mortality (%) |
|---|---|
| A | 8.7 |
| B | 11.3 |
| C | 12.3 |
| D | 26.7 |
The TNM staging system is now in its 5th revision (1997), containing
rules of classification and staging that correspond with those of
the 5th edition of the American Joint Committee on Cancer, Cancer
Staging Manual. The TNM classification system describes the anatomic
extent of cancer. It is based on the fact that the choice of treatment
and the chance of survival are related to the extent of the
tumour at the primary site (T), the presence or absence of tumour
at the regional lymph nodes (N), and the presence of metastasis
beyond the regional lymph nodes (M). Tumour staging can be classified
prior to treatment, i.e. clinical staging (cTNM), and after resection,
i.e. pathological staging (pTNM). With regard to short-term outcomes
(30-day operative mortality), Dukes’ A, B or C do not usually
make a significant contribution to the risk estimate whereas Dukes’
D is an independent predictor of outcome in colorectal cancer surgery.
The Operating Surgeon
In the ACPGBI MLBO Audit there was no difference in outcome between
Consultant Surgeons and Trainee Surgeons:
| Grade | Mortality (%) |
|---|---|
| Consultant | 16.4 |
| Trainee | 13.5 |
| Other | 16.9 |
Data on 5-yr survival post resection depending on grade of trainee
also show no significant difference:

In conclusion appropriately trained and supervised surgeons will
have comparable results and therefore this factor does not appear
in the models.
Hierarchical regression models
Hierarchical models are models specifically geared toward the statistical
analysis of data that have a hierarchical or clustered structure.
Such data arise routinely in medical research and clinical practice
with patients nested within clinicians or hospitals.
Older approaches tend simply to ignore the hierarchical structure
of the data, disaggregating all the
data to the lowest level and then applying standard analysis
methods. The hierarchical regression model is known in the research
literature under a variety of names such as ‘multilevel model’,
‘random coefficient model’ or ‘variance component
model’. These models use different levels of hierarchy: for
example, individual patient-related risk factors (subscript
i) are placed at the lowest level, the “patient level”;
hospital-related explanatory variables (subscript
j) are placed at the 2nd level; and regional data (subscript
k) are entered at the highest, “3rd level”, as seen
in the diagram below:

Conceptually the model can be viewed as a hierarchical system of
regression equations as shown above. The hierarchical nature of
the analysis allows for the possibility that patients from the same
hospital may have more similar outcomes than patients chosen at
random from different units. Using this approach we can explicitly
model the variation between regions or centres and produce individual
regression lines for each unit and region. This is the model on which the
ACPGBI Colorectal Cancer Model is based.
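The three-level structure described above can be sketched as a random-intercept logistic model, in which the hospital and the region each shift the baseline risk. This is an illustration only: the intercept, slope and effect sizes are invented for the example and are not coefficients from the ACPGBI model, and the function name is hypothetical.

```python
import math


def predicted_mortality(patient_score, hospital_effect, region_effect,
                        intercept=-3.0, slope=1.0):
    """Risk of death for patient i in hospital j in region k.

    All numbers are illustrative assumptions, not fitted coefficients.
    """
    # Level 1: the patient's own risk factors (subscript i).
    # Level 2: a hospital-specific shift in baseline risk (subscript j).
    # Level 3: a region-specific shift in baseline risk (subscript k).
    logit = intercept + slope * patient_score + hospital_effect + region_effect
    return 1.0 / (1.0 + math.exp(-logit))


# Two patients with identical risk factors, treated in different hospitals
# of the same region, receive different predicted risks.
risk_hospital_a = predicted_mortality(1.0, hospital_effect=0.4, region_effect=0.1)
risk_hospital_b = predicted_mortality(1.0, hospital_effect=-0.4, region_effect=0.1)
print(f"Hospital A: {risk_hospital_a:.3f}, Hospital B: {risk_hospital_b:.3f}")
```

In a fitted multilevel model the hospital and region effects are estimated from the data as draws from a distribution, which is what allows the variation between units to be modelled explicitly.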
Artificial Neural Networks
Artificial Neural Networks (ANNs) are systems loosely modeled on
the human brain. Biological neural networks are much more complicated
than the mathematical models we use for ANNs. The field goes by
many names, such as connectionism, parallel distributed processing,
neuro-computing, natural intelligent systems, machine learning algorithms,
and artificial neural networks. It is an attempt to simulate within
specialized hardware or sophisticated software, the multiple layers
of simple processing elements called neurons. Each neuron is linked
to certain of its neighbours with varying coefficients of connectivity
that represent the strengths of these connections. Learning is accomplished
by adjusting these strengths to cause the overall network to output
appropriate results. Neural networks, with their remarkable ability
to derive meaning from complicated or imprecise data, can be used
to extract patterns and detect trends that are too complex to be
noticed by either humans or other computer techniques. A trained
neural network can be thought of as an "expert" in the
category of information it has been given to analyze.
ANNs are an abstract simulation of a real nervous system that contains
a collection of neuron units communicating with each other via axon
connections. Such a model bears a strong resemblance to axons and
dendrites in a nervous system as seen in the diagram below:

The smart computer, good robot and evil android have been staple
ingredients for just about every science fiction film, from the
1950s until the present day. And yet, despite the real-life efforts
of top scientists and renowned academics, we’ve barely begun
to figure out how to make intelligent machines or even simulations
via computer software. ANNs are collections of mathematical models
that emulate some of the observed properties of biological nervous
systems and draw on the analogies of adaptive biological learning.
ANN models contain layers of simple computing nodes or processing
elements (PEs) that operate as non-linear summing devices (see above
diagram). These nodes are heavily interconnected by weighted connection
lines, and the weights are adjusted when the data are presented
to the network during a “training” process. Successful
training can result in ANNs that perform tasks such as predicting
an output value, approximating a function and recognising patterns
in large datasets.
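The idea of a node acting as a non-linear summing device can be sketched as follows. This is a minimal illustration assuming a sigmoid activation and one hidden layer; the weights are arbitrary hypothetical values rather than the result of any training.

```python
import math


def sigmoid(x):
    """Squash any real number into the range 0 to 1."""
    return 1.0 / (1.0 + math.exp(-x))


def node_output(inputs, weights, bias):
    """One processing element: weighted sum of inputs, then a non-linearity."""
    total = bias + sum(w * x for w, x in zip(weights, inputs))
    return sigmoid(total)


def forward(inputs, hidden_layer, output_node):
    """Pass the inputs through a hidden layer and a single output node."""
    hidden = [node_output(inputs, weights, bias) for weights, bias in hidden_layer]
    out_weights, out_bias = output_node
    return node_output(hidden, out_weights, out_bias)


# Hypothetical network: 2 inputs -> 2 hidden nodes -> 1 output.
hidden_layer = [([0.5, -0.3], 0.1), ([-0.8, 0.6], 0.0)]
output_node = ([1.2, -0.7], -0.2)
print(f"Network output: {forward([1.0, 0.0], hidden_layer, output_node):.3f}")
```

Training consists of adjusting the weights and biases so that outputs such as this one move closer to the known outcomes in the training data.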
Although ANNs have been around since the late 1950s, it wasn't
until the mid-1980s that algorithms became sophisticated enough
for general applications. There are many different types of neural
networks, each having its own characteristics. There is no single
ANN which is optimal for all problems, and in the last few years
the literature on the use of ANNs in biomedical sciences has grown
exponentially. Many examples have been published of the application
of ANNs in medical diagnosis (e.g. myocardial infarction, appendicitis),
imaging (chest radiographs, breast US and mammography), pathology
screening (Papanicolaou smears, breast FNAC), waveform analysis
(electrocardiographic, electromyographic) and prediction of outcome
in groups such as cancer patients, critically ill patients and trauma patients.
There are many different types of ANN. Some of the more
popular include the multilayer perceptron (MLP), which is generally
trained with the backpropagation of error algorithm, learning vector
quantization and radial basis function networks. Back-Propagated Delta
Rule Networks (BP) (sometimes known as multi-layer perceptrons
(MLPs)) and Radial Basis Function Networks (RBF) are both well-known
developments of the Delta rule for single layer networks (itself
a development of the Perceptron Learning Rule). Some ANNs are classified
as feedforward while others are recurrent (i.e., implement feedback)
depending on how data is processed through the network. Another
way of classifying ANN types is by their method of learning (or
training), as some ANNs employ supervised training while others
are referred to as unsupervised or self-organizing. Supervised training
is analogous to a student guided by an instructor. Unsupervised
algorithms essentially perform clustering of the data into similar
groups based on the measured attributes or features serving as inputs
to the algorithms. This is analogous to a student who derives the
lesson totally on his or her own. ANNs can be implemented in software
or in specialized hardware.
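Supervised training can be illustrated with the Perceptron Learning Rule mentioned above, applied to a single node. This is a minimal sketch: the logical AND task, the learning rate and the number of epochs are assumptions chosen to keep the example small, and the function names are hypothetical.

```python
def step(activation):
    """Threshold activation: fire (1) only if the weighted sum is positive."""
    return 1 if activation > 0 else 0


def predict(weights, bias, inputs):
    return step(bias + sum(w * x for w, x in zip(weights, inputs)))


def train_perceptron(examples, learning_rate=0.1, epochs=50):
    """Adjust weights after each example in proportion to the error made."""
    weights = [0.0] * len(examples[0][0])
    bias = 0.0
    for _ in range(epochs):
        for inputs, target in examples:
            # The "instructor" supplies the target; the error drives learning.
            error = target - predict(weights, bias, inputs)
            weights = [w + learning_rate * error * x
                       for w, x in zip(weights, inputs)]
            bias += learning_rate * error
    return weights, bias


# Training data for logical AND: output 1 only when both inputs are 1.
AND_EXAMPLES = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
weights, bias = train_perceptron(AND_EXAMPLES)
print([predict(weights, bias, inputs) for inputs, _ in AND_EXAMPLES])
```

A single node of this kind can only learn tasks that are linearly separable; multi-layer networks trained by backpropagation extend the same error-driven idea to more complex patterns.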
Neural networks cannot do anything that cannot be done using traditional
computing techniques, but they can do some things which would otherwise
be very difficult. In particular, they can form a model from their
training data (or possibly input data) alone.