Introduction

“It is important that we join with colleagues in other disciplines to develop measures of the outcome of surgery... We should attempt to measure our success in these ways. Many of the technical decisions in clinical surgical management are based on the weighing of different outcomes, and we should ensure that these incorporate patient values. Patient opinions are important in determining these relative values.” (Devlin, 1990)

This quotation highlights the importance of outcome measurement and of the 'patient centred approach' that is central to the NHS Plan published in 2000; the importance of this area was recognised a decade before the Plan's publication!

Surgical practice has in recent years moved towards objective assessment and accountability in the context of clinical governance. Since the introduction of the NHS Plan and the establishment of CHI, NICE, the NCAA and similar bodies, it is ever more important to show that one is following accepted evidence based practice and striving to perform to nationally accepted standards. The focus of clinical effectiveness and quality improvement now lies with the individual patient, and it is this that defines the patient centred approach.

To achieve these ideals one has to implement accepted, proven practice and keep patients informed of their options. The 'doctor knows best' approach to patient care is no longer acceptable. The job of the clinician (especially the surgeon) is to provide patients with the information they need, in a format they understand, so that they can make their own decisions about treatment. This is the guiding principle of informed consent. How can consent be truly informed without an estimate of operative risk? Traditionally, when risk has been quoted, it has been based on unadjusted data from observed outcomes reported in studies. This does not take into account the individual patient's risk based on his or her co-morbid risk factors; this is where risk adjustment in surgery comes into its own.

Operative mortality varies between secondary care units for multiple reasons, with case-mix, co-morbid disease and type of presentation being the most relevant and important; despite considerable recent media interest, sub-optimal surgical care is not the only explanation for varying mortality rates. Risk stratification using mortality prediction models has the potential to compensate for these factors and therefore allows a better means of comparing performance between hospitals. This is not a new concept; Florence Nightingale noted it over a hundred years ago:

“in the first place, different hospitals receive very different proportions of the same class of diseases. The ages in one hospital may differ considerably from the ages in another. And the state of the cases on admission may differ very much in each hospital. These elements affect considerably the result of treatment altogether apart from the sanitary state of hospitals”

Prediction Systems

ASA Grading

The ASA grading facilitates the division of patients into one of five categories based on their general medical history and examination, without requiring any specific tests. It is simple and has been widely used since 1963, when it was first proposed. It is very effective and, when age is also taken into account, there is an additive predictive effect. The drawback of ASA grading is that it is subjective and therefore open to manipulation. The following table shows how mortality varies with ASA grade in two settings: general surgical practice and large bowel obstruction due to colorectal cancer.

ASA Grade | Definition | Mortality (%), general | Mortality (%), large bowel obstruction due to colorectal cancer
I | Normal healthy individual | 0.05 | 2.6
II | Mild systemic disease that does not limit activity | 0.4 | 7.6
III | Severe systemic disease that limits activity but is not incapacitating | 4.5 | 23.9
IV | Incapacitating systemic disease which is constantly life-threatening | 25 | 42
V | Moribund, not expected to survive 24 hours with or without surgery | 50 | 66.7
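
The grade-specific figures above can be held as a simple lookup, which is sometimes all that is needed for a rough quote of risk. The Python sketch below is purely illustrative: the dictionary, function name and the idea of switching between the two columns are assumptions for this example, not part of any published scoring system.

```python
# Illustrative only: observed mortality (%) by ASA grade, taken from the table above.
# ASA grading carries no formula of its own; these are audit figures, not individual predictions.
ASA_MORTALITY = {
    #  grade: (general surgery %, malignant large bowel obstruction %)
    "I":   (0.05, 2.6),
    "II":  (0.4,  7.6),
    "III": (4.5,  23.9),
    "IV":  (25.0, 42.0),
    "V":   (50.0, 66.7),
}

def asa_mortality(grade: str, obstructed_colorectal_cancer: bool = False) -> float:
    """Return the observed mortality (%) for a given ASA grade."""
    general, mlbo = ASA_MORTALITY[grade.upper()]
    return mlbo if obstructed_colorectal_cancer else general

print(asa_mortality("III"))        # 4.5
print(asa_mortality("III", True))  # 23.9
```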

 

APACHE Scoring

Acute Physiology And Chronic Health Evaluation. The APACHE scoring systems are used almost exclusively in the intensive therapy setting, and APACHE has now gone through three versions. The original APACHE (introduced in 1981) used 34 physiological variables, taking the worst value recorded in the first 24 hours of admission to the ITU. APACHE II (introduced in 1985) simplified this to an acute physiology score derived from 12 physiological variables, added to scores for age and chronic health.

The APACHE systems can be used to provide information on the risk of death for a group of patients suffering from a specific disease category that may require admission to an intensive care unit; they cannot be used as predictors of the risk of death in individual patients.

The APACHE II system is widely used in the UK but is designed principally for the acutely ill, and its use in the elective surgical patient is questionable. The 12 physiological variables used are listed below:

        1. Temperature - core
        2. Mean arterial pressure
        3. Heart rate
        4. Respiratory rate - ventilated or non-ventilated
        5. Oxygenation - if FiO2 ≥ 0.5 record A-aDO2; if FiO2 < 0.5 record PaO2
        6. Arterial pH
        7. Serum sodium
        8. Serum potassium
        9. Serum creatinine
        10. Haematocrit
        11. White blood cell count
        12. Glasgow coma score
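
A minimal sketch of how the APACHE II components described above combine is given below. It assumes the per-variable points (0-4 for the worst value in the first 24 hours) have already been read off the original APACHE II charts, which are not reproduced here; the function and variable names are illustrative only.

```python
def apache_ii_total(physiology_points, age_points, chronic_health_points):
    """
    Illustrative composition of an APACHE II score:
      total = acute physiology score (sum of the 12 variables above, each scored
              0-4 on the worst value in the first 24 h of ITU admission)
            + age points + chronic health points.
    The per-variable point bands come from the original APACHE II charts.
    """
    if len(physiology_points) != 12:
        raise ValueError("APACHE II uses 12 physiological variables")
    aps = sum(physiology_points)  # acute physiology score
    return aps + age_points + chronic_health_points

# Example: 12 variables scoring 14 points in total, 3 age points,
# 2 chronic health points -> APACHE II score of 19.
print(apache_ii_total([2, 1, 3, 0, 1, 0, 2, 1, 0, 2, 1, 1], 3, 2))
```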

APACHE III (Knaus et al, 1991) was introduced to address some of the flaws of APACHE II. It is based upon data from 40 hospitals and over 17,000 patients. Although APACHE III resembles APACHE II, it includes new variables such as prior treatment location and the disease requiring ICU admission. In APACHE III scoring, the patient's age and chronic health history are worth up to 47 points. Within 24 hours of ICU admission, 17 physiological variables are measured and may add a maximum of a further 252 points. The resulting total score, in combination with prior treatment location and principal ICU diagnosis, is entered into a logistic regression equation. The equation (which is proprietary) provides a predicted mortality. A unique feature of APACHE III is that it uses daily updates of clinical information to refine the predicted mortality.
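
Because the APACHE III equation is proprietary, only the general logistic form can be illustrated. The sketch below shows how a score, prior treatment location and a diagnosis weight might enter such an equation; the coefficients and variable coding are invented placeholders, not the published model.

```python
import math

def logistic_mortality(intercept, coefficients, covariates):
    """
    Generic logistic-regression mortality prediction of the kind used by APACHE III:
        logit = intercept + sum(coefficient_i * covariate_i)
        predicted mortality = 1 / (1 + exp(-logit))
    The real APACHE III coefficients are proprietary; the values used below are placeholders.
    """
    logit = intercept + sum(coefficients[name] * value for name, value in covariates.items())
    return 1.0 / (1.0 + math.exp(-logit))

# Purely illustrative coefficients and coding (NOT the published APACHE III equation):
coeffs = {"apache_iii_score": 0.05, "admitted_from_ward": 0.6, "diagnosis_weight": 1.0}
patient = {"apache_iii_score": 70, "admitted_from_ward": 1, "diagnosis_weight": -0.3}
print(f"Predicted mortality: {logistic_mortality(-6.0, coeffs, patient):.1%}")  # about 10%
```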

SAPS (Simplified Acute Physiology Score) is a derivation of the APACHE score, using 14 of the original 34 variables to predict death, and is comparable to APACHE II. The score is assigned after 24 hours of ICU admission. The most recent version, SAPS II, is a revision that derives its score from the following 17 variables:

    • Twelve physiologic variables
    • Age
    • Type of admission (elective or emergency; medical or surgical)
    • Three underlying disease variables (acquired immune deficiency syndrome, metastatic cancer, and hematologic malignancy)

The resulting SAPS II score is then entered into a published mathematical formula to give the predicted hospital mortality. SAPS II is based upon data from 8,500 patients and has been validated on a sample of 4,500 patients.
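
The published mathematical formula referred to above is a logistic transformation of the SAPS II score. The sketch below implements the conversion as it is commonly quoted; the coefficients should be checked against the original SAPS II publication before any real use.

```python
import math

def saps_ii_mortality(score):
    """
    Convert a SAPS II score to predicted hospital mortality using the widely
    quoted published logistic formula:
        logit = -7.7631 + 0.0737*score + 0.9971*ln(score + 1)
        mortality = e^logit / (1 + e^logit)
    Coefficients as commonly cited for SAPS II; verify against the original paper.
    """
    logit = -7.7631 + 0.0737 * score + 0.9971 * math.log(score + 1)
    return math.exp(logit) / (1.0 + math.exp(logit))

print(f"{saps_ii_mortality(40):.1%}")  # roughly 25% for a score of 40
```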

Relevant References:
1. Cowen, JS, Kelley, MA. Predicting intensive care unit outcome: Errors and bias in using predictive scoring systems. Crit Care Clin 1994; 10:53.
2. Escarce, JJ, Kelley, MA. Admission source to the medical intensive care unit predicts hospital death independent of APACHE II score. JAMA 1990; 264:2389
3. Knaus, WA, Wagner, DP, Draper, EA, et al. The APACHE III prognostic system: Risk prediction of hospital mortality for critically ill hospitalized adults. Chest 1991; 100:1619.

 

POSSUM Scoring Systems

Background information on POSSUM can be found elsewhere on this site.

 

Veterans Affairs Surgical Risk Study

The Veterans Affairs (VA) Surgical Risk Study is probably the largest and most contemporary risk adjustment programme implemented in the US. The study was conducted in 44 Veterans Affairs medical centres and included 87,078 major non-cardiac operations performed under general, spinal or epidural anaesthesia between 1991 and 1993. The main outcome measures were 30-day operative mortality and operative morbidity. The investigators used logistic regression analysis to provide risk-adjustment models for all operations and for each of eight surgical specialities, and compared surgical performance using observed-to-expected mortality and morbidity ratios.
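
The observed-to-expected (O/E) ratio used by the VA investigators is straightforward to compute once each patient has a model-predicted risk: expected deaths are the sum of predicted probabilities within a hospital, and observed deaths are simply counted. The sketch below illustrates this; the hospitals and figures are invented for illustration.

```python
# Illustrative observed-to-expected (O/E) mortality ratios, as used to compare
# hospitals after risk adjustment. The records below are invented for illustration.
patients = [
    # (hospital, died (1/0), model-predicted probability of death)
    ("Hospital A", 0, 0.02), ("Hospital A", 1, 0.30), ("Hospital A", 0, 0.08),
    ("Hospital B", 1, 0.05), ("Hospital B", 0, 0.04), ("Hospital B", 1, 0.20),
]

def oe_ratios(records):
    totals = {}
    for hospital, died, predicted in records:
        observed, expected = totals.get(hospital, (0.0, 0.0))
        totals[hospital] = (observed + died, expected + predicted)
    # O/E > 1 suggests more deaths than the case-mix predicts; < 1 suggests fewer.
    return {hospital: obs / exp for hospital, (obs, exp) in totals.items()}

print(oe_ratios(patients))  # e.g. {'Hospital A': ~2.5, 'Hospital B': ~6.9}
```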

Patient risk factors predictive of operative mortality in general surgery included serum albumin, ASA grade, emergency operation, disseminated cancer, age, presence of ascites, urea, γGT, functional status and platelets. In total, the VA group identified 26 pre-operative variables predicting operative mortality and 28 pre-operative variables associated with post-operative morbidity in general surgery. Considerable variability in unadjusted mortality rates for all operations was observed across the 44 hospitals (1.2-5.4%). The major limitation of the VA study is that the patient population was largely middle-aged to elderly men who were generally socioeconomically disadvantaged and had previously served in the military. Such models may not be generalisable to women or to non-VA populations, and there are no studies that apply the VA models in the United Kingdom. The table below gives mortality and morbidity risks for the various surgical categories:

 

Type of Surgery | Mortality (%) | Morbidity (%)
General | 5.6 | 24.4
Orthopedics (spine, musculoskeletal) | 1.8 | 11.7
Urology (urinary system) | 0.7 | 8.5
Peripheral vascular (blood vessels) | 4.6 | 29.6
Neurosurgery (nervous system) | 2.4 | 14.2
Otolaryngology (ear, nose and throat) | 2.9 | 15.7
Thoracic (chest, non-cardiac) | 5.9 | 23.5
Plastic (cosmetic, reconstruction etc.) | 1.3 | 15.9
Average | 3.1 | 17.4

 

Relevant References:
Khuri SF, Daley J, Henderson W et al. Risk adjustment of the postoperative mortality rate for the comparative assessment of the quality of surgical care: Results of the National Veterans Affairs surgical risk study. J American College of Surgeons. 1997.

 

Other Risk Factors

The Influence of Age

It is known that the rate of mortality increases almost exponentially with age through most of the adult age range, but this tends to slow at very old ages; a possible explanation is the selective survival of healthier individuals to older ages. Age is one variable that is recorded in most cases, although the physiology of ageing is poorly understood. The figure below, based on UK ONS data, shows how the population is getting older and life expectancy is increasing:

[Figure: UK ONS data showing the ageing population and increasing life expectancy]

In surgery, older patients are more likely to have worse clinical outcomes than younger patients. The example below shows ACPGBI data relating mortality to age for obstructing colorectal cancers:

Age Range (yrs) | Mortality (%)
<30 | 20.0
30-39 | 0
40-49 | 0
50-59 | 5.6
60-69 | 8.1
70-79 | 16.5
80-89 | 26.5
>89 | 34.9

A special class of patients is the very elderly (see the >89 age group above). Many believe that these patients are physiologically different from younger patients, possibly because they lack any physiological reserve, and that they need to be addressed as a separate sub-group.

 

Operative Urgency

It is vital, especially when using the prediction models on this website, that one understands what is meant by operative urgency. The following are the UK NCEPOD definitions:

  • Elective Surgery: Carried out at a time to suit the patient and surgeon
  • Urgent Surgery: Carried out within 24 hours of admission
  • Emergency Surgery: Carried out within 2 hours of admission or in conjunction with resuscitation

Whether surgery is performed as an emergency is an important factor in explaining post-operative mortality and long-term survival, so the precise definition of emergency surgery is critical; the NCEPOD classification above is the most commonly used. It is important to note that many patients who have an emergency admission do not have emergency surgery, and their risk of dying from surgery approximates to that of an elective case. In the ACPGBI Malignant Large Bowel Obstruction audit, the post-operative mortality of a true emergency case was substantially higher than that of an elective/scheduled operation (20.0% vs. 12.9%), as shown below:

NCEPOD Category | Mortality (%)
Elective | 12.8
Urgent | 17.2
Emergency | 20.0

 

It is therefore important that 'emergency' should refer to the surgery rather than to the mode of admission.

 

Malignancy

It has long been known that malignancy increases the mortality of surgery. In colorectal surgery, for example, Dukes' staging is used to stage most bowel cancers; Dukes' stage D has been defined as any metastatic disease in the abdomen alone, or any systemic or residual local disease. In order to overcome the shortcomings of the Dukes' classification, the TNM system is increasingly being applied to stage colorectal cancer patients. Mortality by Dukes' stage from the ACPGBI MLBO Audit is shown below:

Dukes' Stage | Mortality (%)
A | 8.7
B | 11.3
C | 12.3
D | 26.7

The TNM staging system is now in its 5th revision (1997), containing rules of classification and staging that correspond with those of the 5th edition of the American Joint Committee on Cancer, Cancer Staging Manual. The TNM classification system describes the anatomical extent of cancer. It is based on the fact that the choice of treatment and the chance of survival are related to the extent of the tumour at the primary site (T), the presence or absence of tumour in the regional lymph nodes (N), and the presence of metastasis beyond the regional lymph nodes (M). Tumours can be staged prior to treatment, i.e. clinical staging (cTNM), or after resection, i.e. pathological staging (pTNM). With regard to short-term outcomes (30-day operative mortality), Dukes' A, B or C do not usually make a significant contribution to the risk estimate, whereas Dukes' D is an independent predictor of outcome in colorectal cancer surgery.
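
Since only Dukes' stage D contributes materially to the 30-day estimate, a risk model can encode staging as a single indicator variable. The sketch below shows this coding; the baseline and coefficient values are invented placeholders rather than figures from any published model.

```python
import math

def thirty_day_risk(baseline_logit, dukes_stage, dukes_d_coefficient=1.0):
    """
    Illustrative handling of Dukes' stage in a 30-day mortality model:
    stages A, B and C contribute nothing, stage D enters as a single indicator
    variable. The baseline and coefficient below are placeholders.
    """
    logit = baseline_logit + (dukes_d_coefficient if dukes_stage.upper() == "D" else 0.0)
    return 1.0 / (1.0 + math.exp(-logit))

print(f"Dukes' B: {thirty_day_risk(-2.0, 'B'):.1%}")  # ~11.9%
print(f"Dukes' D: {thirty_day_risk(-2.0, 'D'):.1%}")  # ~26.9%
```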

 

The Operating Surgeon

In the ACPGBI MLBO Audit there was no difference in outcome between consultant surgeons and trainee surgeons:

Grade | Mortality (%)
Consultant | 16.4
Trainee | 13.5
Other | 16.9

Data on 5-year survival after resection by grade of operating surgeon also show no significant difference:

[Figure: 5-year survival after resection by grade of operating surgeon]

In conclusion, appropriately trained and supervised surgeons have comparable results, and this factor therefore does not appear in the models.

 

Hierarchical regression models

Hierarchical models are models specifically geared toward the statistical analysis of data that have a hierarchical or clustered structure. Such data arise routinely in medical research and clinical practice with patients nested within clinicians or hospitals.

Older approaches tend simply to ignore the hierarchical structure of the data, performing the analysis by disaggregating all the data to the lowest level and then applying standard analysis methods. The hierarchical regression model is known in the research literature under a variety of names, such as 'multilevel model', 'random coefficient model' or 'variance component model'. These models use different levels of hierarchy: individual patient-related risk factors (subscript i) are placed at the lowest level, the 'patient level'; hospital-related explanatory variables (subscript j) are placed at the second level; and regional data (subscript k) are entered at the highest, third level, as seen in the diagram below:

[Figure: schematic of a three-level hierarchical regression model - patients (i) within hospitals (j) within regions (k)]

 

Conceptually the model can be viewed as a hierarchical system of regression equations as shown above. The hierarchical nature of the analysis allows for the possibility that patients from the same hospital may have more similar outcomes than patients chosen at random from different units. Using this approach we can explicitly model the variation between regions or centres and produce individual regression lines for each unit and region. This is the model the ACPGBI Colorectal Cancer Model is based on.
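
As a concrete sketch, a two-level random-intercept logistic model for operative mortality can be written as below, using the patient (i) and hospital (j) subscripts described above; this is a generic multilevel formulation rather than the published ACPGBI equation.

```latex
% Patient i nested within hospital j; x are patient-level risk factors.
\operatorname{logit}\,\Pr(\text{death}_{ij}=1)
  = \beta_0 + \beta_1 x_{1ij} + \dots + \beta_p x_{pij} + u_j,
  \qquad u_j \sim N(0,\sigma_u^2)
% A third, regional level (subscript k) adds a further random intercept
% v_k \sim N(0,\sigma_v^2) to the linear predictor.
```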

 

Artificial Neural Networks

Artificial Neural Networks (ANNs) are systems loosely modelled on the human brain; biological neural networks are much more complicated than the mathematical models used for ANNs. The field goes by many names, such as connectionism, parallel distributed processing, neuro-computing, natural intelligent systems, machine learning algorithms, and artificial neural networks. It is an attempt to simulate, within specialised hardware or sophisticated software, multiple layers of simple processing elements called neurons. Each neuron is linked to some of its neighbours with varying coefficients of connectivity that represent the strengths of these connections; learning is accomplished by adjusting these strengths so that the overall network outputs appropriate results. Neural networks, with their remarkable ability to derive meaning from complicated or imprecise data, can be used to extract patterns and detect trends that are too complex to be noticed by either humans or other computer techniques. A trained neural network can be thought of as an 'expert' in the category of information it has been given to analyse.

ANNs are an abstract simulation of a real nervous system, containing a collection of neuron units that communicate with each other via axon connections. Such a model bears a strong resemblance to the axons and dendrites of a nervous system, as seen in the diagram below:

[Figure: schematic of an artificial neural network - layers of interconnected nodes]

The smart computer, good robot and evil android have been staple ingredients for just about every science fiction film, from the 1950s until the present day. And yet, despite the real-life efforts of top scientists and renowned academics, we’ve barely begun to figure out how to make intelligent machines or even simulations via computer software. ANNs are collections of mathematical models that emulate some of the observed properties of biological nervous systems and draw on the analogies of adaptive biological learning. ANN models contain layers of simple computing nodes or processing elements (PE) that operate as non-linear summing devices (see above diagram). These nodes are heavily interconnected by weighted connection lines, and the weights are adjusted when the data are presented to the network during a “training” process. Successful training can result in ANNs that perform tasks such as predicting an output value, approximating a function and recognising patterns in large datasets.
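
The behaviour described above - nodes acting as non-linear summing devices on weighted connections, with the weights adjusted during training - can be illustrated with a very small numpy sketch of a single-hidden-layer network. The layer sizes, weights and data are arbitrary; this is not any particular published network.

```python
import numpy as np

rng = np.random.default_rng(0)

# A tiny one-hidden-layer network: 4 inputs -> 3 hidden nodes -> 1 output.
# Each node is a non-linear summing device: a weighted sum of its inputs passed through a sigmoid.
W1 = rng.normal(size=(4, 3))   # input-to-hidden connection weights
W2 = rng.normal(size=(3, 1))   # hidden-to-output connection weights

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x):
    hidden = sigmoid(x @ W1)       # hidden layer activations
    output = sigmoid(hidden @ W2)  # network output, e.g. a predicted risk
    return hidden, output

# One crude "training" step: nudge the output-layer weights down the error
# gradient so the prediction for this example moves towards its known outcome.
x, target = np.array([0.2, 1.5, -0.3, 0.8]), 1.0
hidden, output = forward(x)
error = output - target
W2 -= 0.1 * np.outer(hidden, error * output * (1 - output))  # gradient step on W2 only

print("prediction before/after one update:", output, forward(x)[1])
```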

Although ANNs have been around since the late 1950s, it was not until the mid-1980s that algorithms became sophisticated enough for general applications. There are many different types of neural network, each having its own characteristics; no single ANN is optimal for all problems, and in the last few years the literature on the use of ANNs in the biomedical sciences has grown exponentially. Extensive examples exist of the application of ANNs to medical diagnosis (e.g. myocardial infarction, appendicitis), imaging (chest radiographs, breast ultrasound and mammography), pathology screening (Papanicolaou smears, breast fine-needle aspiration cytology), waveform analysis (electrocardiography, electromyography) and prediction of outcome in, for example, cancer patients, critically ill patients and trauma patients.

There are many types of ANN. Some of the more popular include the multilayer perceptron (MLP), which is generally trained with the backpropagation-of-error algorithm, learning vector quantisation and radial basis function networks. Back-propagated delta rule networks (sometimes known as multi-layer perceptrons, MLPs) and radial basis function networks (RBFs) are both well-known developments of the delta rule for single-layer networks (itself a development of the perceptron learning rule). Some ANNs are classified as feedforward while others are recurrent (i.e. they implement feedback), depending on how data are processed through the network. Another way of classifying ANN types is by their method of learning (or training): some ANNs employ supervised training while others are referred to as unsupervised or self-organising. Supervised training is analogous to a student guided by an instructor. Unsupervised algorithms essentially perform clustering of the data into similar groups based on the measured attributes or features serving as inputs, analogous to a student who derives the lesson entirely on his or her own. ANNs can be implemented in software or in specialised hardware.

Neural networks cannot do anything that cannot be done using traditional computing techniques, but they can do some things which would otherwise be very difficult. In particular, they can form a model from their training data (or possibly input data) alone.