Scientific Data Documentation
OPCS, English Mortality File, 1986-1990

Description of the Data Files

Numbers of deaths

 The OPCS historic deaths file contains numbers of deaths in England
 and Wales by year, sex, age and ICD cause.  The international
 classification of diseases was introduced in 1900 and has been used
 in the tabulation of mortality in England and Wales since 1911.
 During the period 1901-1910, when the first revision of the ICD was
 used in some countries, a comparably modern (but unnumbered)
 classification was used in England and Wales.  In
 all there have been nine revisions of the international
 classification.  Each of these provides a coding frame for cause of
 death which differs, to varying degrees, from the previous
 revision.  The years for which each ICD revision was used in the
 tabulation of England and Wales mortality data (or a corresponding
 local equivalent) are shown in Table 1.  It should be emphasized
 that no attempt has been made to include on the data file deaths
 prior to 1901, although deaths were tabulated for all of England
 and Wales from July 1837 using an earlier series of national coding

 While changes in ICD revision have influenced the way in which the
 data are stored, the contents of each year's data have been
 constrained by a second pragmatic consideration.  Material on
 individual deaths, coded for tabulation purposes, is only available
 as far back as 1959, when computerized record keeping was
 introduced by OPCS for statistical tabulations of mortality.  Prior
 to that date, no regular series of coded or unpublished tabular
 material was retained in any form.

 In consequence, numbers of deaths for the years 1901-1958 have been
 obtained by simply transcribing, on to computer, figures published
 in the Registrar General's Annual Reviews.  For the years 1921
 onwards the figures used were those in Table 17 of the Annual
 Review; in earlier years figures were copied from an equivalent
 unnumbered table in the annual abstracts.  This method of data
 extraction placed self-evident limitations on the degree of
 disaggregation of the data transcribed on to tape; namely that, in
 respect of age and cause groupings, the tapes cannot provide more
 detail than was published at the time.

 By contrast, data on the historic death tape for years after 1958
 were compiled from computer tapes used in the production of annual
 reference volumes.  This provided greater flexibility in the choice
 of cause groupings and it was possible to store the data held on
 file according to individual four digit ICD codes, irrespective of
 whether these appeared in routine publications at the time.

 These two methods of constructing the database also required
 different validation techniques and, correspondingly, different
 means of resolving quality check queries.  The details of these are
 discussed in sections 5-7.

 As indicated above, the choice of age groupings held on tape was
 determined by the availability of published material in earlier
 years; in later years a standard quinquennial age grouping was used
 (sub-dividing deaths under one and those at ages one to four and
 aggregating those at 85 and over).  The details of the age groups
 available in each ICD revision are shown in Table 2.

 As far as the other variables on file (year of death and sex) are
 concerned, the data are held for individual years and for each sex
 separately.

Population Estimates

 From time to time OPCS (or, formerly, the General Register Office)
 revises its estimates of the population at risk of dying in England
 and Wales in each year.  This is done so as to incorporate
 information which becomes available after the publication of
 initial estimates in annual reference volumes.  The major sources
 of additional information are the counts of population at
 subsequent censuses. To accompany the historic deaths tape, a tape
 of comparable population estimates has been prepared from the most
 recently revised figures released by OPCS.

 This tape contains population estimates by sex, year and age in the
 same format as the death tapes.  Information is held for each year
 and each sex separately.  Age groupings for each year correspond to
 those used for the storage of death data for that year.

 The source of these population estimates for each year is shown in Table 3.

Structure of the Files

 Each count held on the death file is stored in a separate record,
 referenced by the cause, sex, age and year to which it refers.
 Where no deaths occurred in a particular year for a specific cause,
 sex and age combination, no record is held on file.  All counts
 held on file are thus positive integers.  The data for each ICD
 revision are sorted by year, then by
 cause, then by sex and finally by age group.  The layout of each
 record is given in Table 4.
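
 Because cells with no deaths have no record, any program reading
 the file has to treat a missing record as a count of zero.  The
 following is a minimal sketch of that convention in Python.  The
 function name, the dictionary layout and the sample record are
 illustrative assumptions and are not taken from the file.

```python
from typing import Dict, Tuple

# Key: (year, cause code, sex, age code); value: number of deaths.
Key = Tuple[int, str, int, str]


def deaths_for(counts: Dict[Key, int], year: int, cause: str,
               sex: int, age_code: str) -> int:
    """Return the count for one cell, treating an absent record as zero."""
    return counts.get((year, cause, sex, age_code), 0)


# Hypothetical record for illustration only (not real data).
counts: Dict[Key, int] = {(1967, "1620", 1, "14"): 25}

print(deaths_for(counts, 1967, "1620", 1, "14"))  # cell with a record
print(deaths_for(counts, 1967, "1620", 2, "01"))  # cell with no record
```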

 The population data are held in an identical format to that used
 for numbers of deaths. However, the cause variable is set to zero
 on all records, so that the file contains one record for each year,
 sex and age combination, and is sorted in the latter order.

 In the analysis of any particular set of mortality data, a pivotal
 role is frequently played by national death rates by age, sex and
 cause.  For example, the analysis of cause specific time trends and
 their correlates generally draws directly upon data of this sort;
 instances of this type of application in the literature include
 studies of lung cancer, stomach cancer, heart disease and suicide.
 At a broader level, international comparisons utilize the rates of
 several nations in order to make meaningful inferences about
 possible causal associations (such as the role of diet, alcohol or
 smoking in particular diseases).  By contrast, local mortality
 studies call upon national rates to provide a reference set of
 background mortality levels against which local experience can be
 measured.  In this context, the term `local' can usefully be
 broadened to encompass the study of any subset or sub-division of
 the national population; the application of standard national death
 rates to subsets of the population provides one of the most
 frequently used techniques in such analyses.  Topics covered
 include geographic comparisons, occupational comparisons and the
 prospective follow-up of a wide variety of possible risk groups.
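
 One common way of applying national rates to a subset of the
 population is to compare the deaths observed in the subset with the
 number expected if national age specific rates had applied.  The
 sketch below illustrates that calculation; this documentation does
 not prescribe a formula, so the function names, the scaling of the
 ratio to 100 and the sample figures are all assumptions.

```python
from typing import Mapping


def expected_deaths(national_rates: Mapping[str, float],
                    local_population: Mapping[str, float]) -> float:
    """Sum over age groups of national rate x local population at risk."""
    return sum(national_rates[age] * persons
               for age, persons in local_population.items())


def observed_to_expected_ratio(observed: float,
                               national_rates: Mapping[str, float],
                               local_population: Mapping[str, float]) -> float:
    """Observed deaths as a percentage of expected deaths."""
    expected = expected_deaths(national_rates, local_population)
    if expected <= 0:
        raise ValueError("expected deaths must be positive")
    return 100.0 * observed / expected


# Hypothetical figures keyed by age code (rates are deaths per person).
rates = {"07": 0.001, "08": 0.002}
local = {"07": 10_000, "08": 5_000}
print(observed_to_expected_ratio(30, rates, local))
```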

 Each of these applications requires a tedious set of calculations
 to be performed. As access to computers becomes more widespread
 these calculations are almost invariably automated.  However, the
 extent to which this can be done is dependent on the availability
 of national rates on computer.

 At an international level, WHO showed considerable foresight in
 setting up a database in the mid seventies which contained death
 rates for a number of countries by age, sex and a short list of
 causes.  This data file extends back to 1950 and has been
 distributed to a number of institutions throughout the world.

 Despite the boldness of this concept, the file has two limitations
 as far as applications at a national and sub-national level are
 concerned.  First, the range and specificity of the available causes
 are limited to what was readily comparable over time and between
 countries.  Secondly, the time span covered by the data is limited
 to recent history; many developed countries published comparable
 mortality statistics for a number of years before 1950.  So as to
 extend the range of analyses that could be performed directly on
 computer, a number of similar but more detailed files were set
 up at various institutions throughout the world.

 OPCS, in reviewing both its own needs and the contribution which it
 could make to the greater availability of national death rates on
 computer, decided in 1979 that, rather than devoting scarce
 analytic resources to constructing another synthetic database, it
 would use its unique position to construct a database comprising
 only the basic building bricks for constructing any aggregate
 database.  In this instance the basic components of the database
 comprise numbers of deaths, held to the lowest level to which cause
 was routinely coded.  For recent years, this comprises four digit
 codes laid down in the International Classification of Diseases
 (ICD) operational at the time the death was registered.  The
 calculation of rates is made possible with this set of data by the
 provision of a comparable tape of estimates of population at risk.
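
 As an illustration of how the two tapes combine, the sketch below
 divides death counts by the matching population estimates.  The
 key layout, the function name and the scaling to a rate per
 100,000 are assumptions made for the example; cells with no death
 record are taken as zero deaths.

```python
from typing import Dict, Tuple

CellKey = Tuple[int, int, str]  # (year, sex, age code)


def death_rates_per_100k(deaths: Dict[CellKey, int],
                         population: Dict[CellKey, int]) -> Dict[CellKey, float]:
    """Return deaths per 100,000 population for every population cell."""
    rates: Dict[CellKey, float] = {}
    for key, persons in population.items():
        if persons <= 0:
            raise ValueError(f"non-positive population for {key}")
        rates[key] = 100_000 * deaths.get(key, 0) / persons
    return rates


# Hypothetical figures for illustration only.
deaths = {(1967, 1, "14"): 50}
population = {(1967, 1, "14"): 200_000, (1967, 2, "14"): 250_000}
print(death_rates_per_100k(deaths, population))
```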


 Individual years are held on file with each represented by
 the final two digits of the year number; thus 1967 appears as 67.


 Figures appear separately for each sex on the file; the code for
 males is 1 and that for females is 2.


 For most years, data are held in the following age groups: under
 one, one to four, five year age groups from five to 84 and then all
 aged 85 and over.  Exceptions to this rule are the years 1901-1910
 and those covered by the third, fourth and part of the fifth
 revisions of the ICD (1921-1941).  During the former period, at
 ages 25-84 figures are stored in ten year age groups (rather than
 five year groups) while, for the latter period, figures at ages 80
 and over are not sub-divided on the deaths tape for any of the 21
 years and are not sub-divided on the population tape for the years
 1921-1939.  In other respects, data by age in these time periods are
 stored identically to those in other time periods.
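
 The rules above can be expressed as a small lookup.  The sketch
 below returns the age group labels expected for a given year,
 following the description in this section; the function name and
 the label strings are assumptions, and the coded values actually
 held on tape are those of Table 5.

```python
from typing import List


def age_groups(year: int, population: bool = False) -> List[str]:
    """Age group labels for one year of the deaths or population tape."""
    if year < 1901:
        raise ValueError("the file holds no data before 1901")
    young = ["Under 1", "1-4"]
    if year <= 1910:
        # Five year groups to 20-24, then ten year groups from 25 to 84.
        middle = [f"{a}-{a + 4}" for a in range(5, 25, 5)]
        middle += [f"{a}-{a + 9}" for a in range(25, 85, 10)]
        return young + middle + ["85+"]
    # Ages 80 and over are undivided for deaths in 1921-1941 and for
    # population in 1921-1939.
    last_80_plus_year = 1939 if population else 1941
    if 1921 <= year <= last_80_plus_year:
        return young + [f"{a}-{a + 4}" for a in range(5, 80, 5)] + ["80+"]
    return young + [f"{a}-{a + 4}" for a in range(5, 85, 5)] + ["85+"]


print(age_groups(1905))
print(age_groups(1941)[-1], age_groups(1941, population=True)[-1])
```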

 Reference values used on the data tape for each age group are held
 in coded form.  The codes used for each of the 26 possible age
 groupings appearing on the file are shown in Table 5.

Coding Scheme

 Since the introduction of the sixth revision of the ICD, cause
 codes have been represented in a broadly similar manner in each
 revision of the classification.  The first three digits of the four
 digit code indicate the cause group to which the death is assigned;
 the fourth digit is then utilized, for some but not all cause
 groups, to provide further amplification of this basic
 classification. The first three digits are always represented as a
 number between 000 and 999.  OPCS convention in publications is to
 represent the last digit by either a number when the sub-division
 is shown or by a hyphen when no sub-division is shown in the ICD.
 For the convenience of those producing computer tabulations, on the
 historic deaths file for ICD revisions six to nine, hyphens have been
 replaced by zeros.  The computer codes are thus purely numeric, in the
 range 0000-9999.
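
 The conversion from the published form of a code to the form held
 on file can be sketched as follows.  The function name is an
 assumption, and the input is assumed to be the four character
 published form (three digits followed by a digit or a hyphen).

```python
def to_computer_code(published: str) -> str:
    """Convert a published code such as '162-' to the numeric file form."""
    code = published.strip().replace("-", "0")
    if len(code) != 4 or not code.isdigit():
        raise ValueError(f"not a four character cause code: {published!r}")
    return code


print(to_computer_code("162-"))  # no sub-division shown in the ICD
print(to_computer_code("1621"))  # sub-division shown
```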

 According to these four revisions, deaths which are not due to
 natural causes may be assigned two codes; one covering the external
 cause of injury and the other the nature of injury.  To avoid the
 possibility that deaths may be double counted, only the counts for
 external causes are included on the historic deaths file.  Codes in
 the range 8000-9999 thus refer to causes listed in the ICD volumes
 under the chapter covering external causes of injury.
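
 A test for external causes under ICD revisions six to nine follows
 directly from the code range given above.  The function name in
 this sketch is an assumption, and the test does not apply to the
 earlier revisions, which are numbered differently.

```python
def is_external_cause(computer_code: str) -> bool:
    """True if a revision 6-9 computer code lies in the range 8000-9999."""
    if len(computer_code) != 4 or not computer_code.isdigit():
        raise ValueError(f"not a four digit code: {computer_code!r}")
    return 8000 <= int(computer_code) <= 9999


print(is_external_cause("8100"))
print(is_external_cause("1620"))
```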

 Causes of death according to the second, third, fourth and fifth
 revisions of the ICD were classified to fewer than 200
 groups, numbered from one upwards.  Some of these groups were then
 sub-divided as in later revisions using, in these instances, a
 combination of alphabetic and numeric subscripts.  In order to
 achieve comparability with later revisions, the major cause
 grouping (belonging to the list of approximately 200 causes) has
 been transcribed directly as a three digit numeric code and the
 various possible sub-divisions numbered from one upwards.  To
 provide a purely numeric fourth digit the absence of any sub-
 division is indicated by a zero.  The entire four digit computer
 code for ICD revisions two to five is thus purely numeric.

 A chart, giving approximate conversions between ICD codes and
 computer codes, is given in Table 6.  One potential difficulty
 created by this method of conversion is that the technique only
 works for causes with nine or fewer sub-divisions.  In only one
 instance did this actually prove to be a problem.  In the fifth
 revision of the ICD, there were 14 sub-divisions of code 157,
 congenital malformations.  A purely pragmatic solution to this was
 achieved by using 1571-1585 (excluding 1580) to represent these
 sub-divisions; where congenital debility, 158, should have been
 allocated a code 1580, it was instead allocated a code 1589, to
 avoid overlap of the two causes.  Tabulation of mortality from
 congenital anomalies in the years 1940-49 should therefore be
 undertaken with extreme caution.
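
 The numbering rule for revisions two to five, including the
 exception for codes 157 and 158 in the fifth revision, can be
 sketched as below.  The function name and arguments are
 assumptions.  In particular, the sketch assumes that the 14
 sub-divisions of code 157 were allocated to 1571-1579 and 1581-1585
 in ascending order; the text above gives only the range used, so
 Table 6 remains the authority for individual codes.

```python
def early_revision_code(group: int, subdivision: int = 0,
                        revision: int = 2) -> str:
    """Four digit computer code for ICD revisions two to five.

    group is the cause group number; subdivision is the ordinal
    position of the sub-division, or 0 where there is none.
    """
    if revision == 5 and group == 157:
        if not 1 <= subdivision <= 14:
            raise ValueError("code 157 has sub-divisions 1 to 14")
        # Assumed ordering: skip 1580 so that 10 -> 1581, ..., 14 -> 1585.
        offset = subdivision if subdivision <= 9 else subdivision + 1
        return f"{1570 + offset:04d}"
    if revision == 5 and group == 158:
        return "1589"  # displaced from 1580 to avoid overlap with 157
    if not 0 <= subdivision <= 9:
        raise ValueError("at most nine sub-divisions can be represented")
    return f"{group:03d}{subdivision}"


print(early_revision_code(45))
print(early_revision_code(157, 10, revision=5))
```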

 During the period 1901-1910, an unnumbered list of causes was used
 in England and Wales in place of the new International
 Classification which had not yet been widely adopted.  On the
 historic mortality data file, the lowest level of disaggregation of
 cause has simply been numbered from 001-191, as shown in Table 6.1.

 In general a fourth digit of zero has been used to avoid
 unnecessary complexity.  However one (little used) category,
 "other specified diseases", has been given a code 1741 to enable it
 to be distinguished both from identifiable specified cause
 groupings and from the (then rather large) group of vague or
 unspecified causes of death.

Missing values

 A cautionary note concerns the years 1901-1958, when data
 were obtained from the Annual Review Volumes.  Where the ICD
 provided a sub-division of a cause but none was published in one or
 more years or where the sub-division was only partially utilized,
 deaths in those years have been arbitrarily assigned to one of the
 unused sub-divisions of that cause.
 The details of codes and years affected are shown in Table 7.

 Using some simple methods of checking, an effort has been made to
 ensure that the data are internally consistent and are in close
 agreement with the figures published at the time.  At this distance
 removed in time, it has not always been possible to achieve exact
 agreement, as indicated below.

 In terms of quality, three time periods may be distinguished.  From
 1968, it was possible to utilize computer summaries already created
 for routine tabulation purposes.  The only checks considered
 necessary for these years were to ensure that every four digit cause
 had been carried across and to make spot checks on the
 values carried across.  These checks have been made and have
 revealed only one problem in so far as the final database is
 concerned.  During the course of the eighth revision, changes were
 made to the OPCS implementation of the ICD. These involved the
 introduction of additional fourth digit subdivisions in 1975-6.
 These changes are described in Table 8 and call for caution when
 attempting detailed fourth digit tabulations covering 1968-1978.

 Between 1959 and 1967, it was necessary to re-tabulate counts of
 deaths from computer records of individual deaths.  These computer
 tapes are held in archived form and have been subject to re-copying
 as old tapes have deteriorated and to reformatting as old computers
 have been replaced.  A certain amount of data corruption and loss
 must be considered inevitable in these circumstances.  Table 9
 shows the magnitude of discrepancies between figures on the
 database and those published at the time by age, sex and year.
 These are considered tolerable for statistical purposes.  At a
 cause specific level, similar error rates are found.

 A somewhat different method of validation was employed in the
 checking of manually transcribed figures for 1901-1958.  For each
 year, the computer summed age specific figures for each four digit
 cause and checked these against the corresponding published figure
 for all ages.  At the same time, cause specific figures for each
 age group were summed to produce totals for each ICD chapter.
 These were then checked against corresponding published figures.
 By this dual checking procedure errors were pinpointed and
 corrected.  Consequently, on the final tape each cause contributes
 the correct number of deaths at all ages and figures by age are
 correct for each chapter.  Despite the cross-checking that was
 undertaken, it is possible that some mutually compensating errors
 are present within the sub-matrices defined in
 this way.  However, it is unlikely that any such errors are of
 sufficient magnitude to distort any statistical analysis performed
 using these data.
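
 The dual checking procedure described above can be sketched as
 follows for a single year and sex.  This is an illustration, not
 the program used by OPCS; the function name, the data structures
 and the sample figures are assumptions.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def check_transcription(cells: Dict[Tuple[str, str], int],
                        chapter_of: Dict[str, str],
                        published_cause_totals: Dict[str, int],
                        published_chapter_age_totals: Dict[Tuple[str, str], int]
                        ) -> List[tuple]:
    """Compare row and column sums of transcribed cells with published totals.

    cells maps (cause code, age code) to a count.  Returns a list of
    (kind, key, computed, published) tuples, one per discrepancy.
    """
    cause_sums: Dict[str, int] = defaultdict(int)
    chapter_age_sums: Dict[Tuple[str, str], int] = defaultdict(int)
    for (cause, age), count in cells.items():
        cause_sums[cause] += count                          # all ages, by cause
        chapter_age_sums[(chapter_of[cause], age)] += count  # by chapter and age
    problems = []
    for cause, total in published_cause_totals.items():
        if cause_sums[cause] != total:
            problems.append(("cause", cause, cause_sums[cause], total))
    for key, total in published_chapter_age_totals.items():
        if chapter_age_sums[key] != total:
            problems.append(("chapter-age", key, chapter_age_sums[key], total))
    return problems


# Hypothetical figures for illustration only.
cells = {("0010", "01"): 3, ("0010", "02"): 2, ("0020", "01"): 4}
chapter_of = {"0010": "I", "0020": "I"}
print(check_transcription(cells, chapter_of,
                          {"0010": 5, "0020": 4},
                          {("I", "01"): 7, ("I", "02"): 2}))
```

 As the text notes, a check of this kind cannot detect errors that
 cancel out within both a cause total and a chapter-by-age total.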

 A minor problem was revealed for the years 1934, 1938 and 1939 in
 the course of checking the data.  Some 265 deaths in a colliery
 disaster in 1934 were not registered until 1938-39.  The figures
 relating to these deaths were excluded from Table 17 in both the
 year of occurrence and of registration.  For the sake of
 completeness it was decided to include them on the historic
 mortality data file, but to allocate them, contrary to normal
 practice, to the year of occurrence (namely 1934).  They have been
 allocated cause code 1942.

 Inevitably in a checking procedure such as that described above,
 occasional printing errors are uncovered in published figures.
 Where this has occurred, figures on the historic deaths file have
 been adjusted to achieve internal numerical consistency
 rather than agreement with evidently incorrect published figures.
 However, no attempt has been made to reclassify deaths for any
 reason.  Thus changes in coding rules from year to year or any
 error in the compilation of cause statistics in isolated years all
 appear on this tape, in full agreement with figures published at
 the time.

List of Tables
 Table    Title

   1      Years for which each ICD revision was
          implemented in England and Wales

   2      Age groups available for the periods covered by
          each ICD revision

   3      Sources from which population estimates
          computer tape was derived

   4      Computer record layout

   5      Computer codes for age groups

   6      Computer codes for cause of death in 1st Revision ICD

Table 1

 Table 1 Years for which each ICD revision was implemented in England & Wales

               ICD Revision        Years

                    1+             1901-10
                    2*             1911-20
                    3*             1921-30
                    4*             1931-39
                    5*             1940-49
                    6              1950-57
                    7              1958-67
                    8              1968-78
                    9              1979-

 + An unnumbered list was used in England and Wales rather than the
   international classification during this period
 * As amended for use in England and Wales
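
 Since the coding frame depends on the revision in force, programs
 reading the file usually need to map a year to its revision.  A
 minimal sketch of Table 1 as a lookup follows; the names are
 assumptions, and the ninth revision is treated as open ended, as in
 the table.

```python
from typing import List, Optional, Tuple

# (revision, first year, last year); None marks an open-ended period.
REVISION_YEARS: List[Tuple[int, int, Optional[int]]] = [
    (1, 1901, 1910),  # unnumbered list used in England and Wales
    (2, 1911, 1920),
    (3, 1921, 1930),
    (4, 1931, 1939),
    (5, 1940, 1949),
    (6, 1950, 1957),
    (7, 1958, 1967),
    (8, 1968, 1978),
    (9, 1979, None),
]


def icd_revision(year: int) -> int:
    """Return the ICD revision used for tabulation in a given year."""
    for revision, first, last in REVISION_YEARS:
        if year >= first and (last is None or year <= last):
            return revision
    raise ValueError(f"no revision listed for {year}")


print(icd_revision(1949), icd_revision(1950))
```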

Table 2

 Table 2 Age groups available for the periods covered by each ICD revision

 ICD Revision                 Age groups

     1                        Under  1, 1-4, 5-9, 10-14, 15-19,
                              20-24, 25-34, 35-44, 45-54, 55-64,
                              65-74, 75-84,  85+

     2                        Under 1, 1-4, 5-9 and 5 year age
                              groups to 80-84, 85+

     3,4                      Under 1, 1-4, 5-9 and 5 year age
                              groups to 75-79, 80+

     5-9+                     Under* 1, 1-4, 5-9 and 5 year age
                              groups to 80-84, 85+

 + For the years 1940-41 age groups used for mortality are as for
   ICD revisions 3 and 4; that is to say, for these two years the
   highest available age group is 80+ rather than 85+.

 * Neonatal deaths by cause are not included on the tape from 1984
   onwards.  The all-cause total for neonatal deaths is shown at
   ICD 0000.

Table 3

 Table 3 Sources from which population estimates computer tape was derived

 Years          Source of data

 1901-1910     Population based on final 1911 Census data using the
               method described in the 73rd Report of the Registrar
               General, 1910, but with revised factors and with
               an adjustment for the leap years 1904 and 1908.

               Figures for age `under 1 year' have been estimated
               by taking births minus deaths aged under 1 year for
               two years and dividing by two.

 1911-1920     The Registrar General's Decennial Supplement 1921,
               pt III, Table 1, page 8.  During the period
               1915-1920, for males, figures used correspond to the
               Civilian population.

 1921-1930     The Registrar General's Decennial Supplement 1931,
               pt III, Table 1, page 2.

 1931-1939     The Registrar General's Statistical Review of
               England and Wales for the years 1938 and 1939 Text,
               Table XCIV, page 156.  Figures used correspond to
               mid-year estimates for the period 1931-1938 and to
               the male and female population in 1939.

               Figures for age `under 1 year' have been estimated
               from Annual Statistical Reviews.

 1940-1945     The Registrar General's Statistical Review of
               England and Wales for the six years 1940-1945, Text,
               Volume II Civil, Table III, page 10.  Civilian
               populations for males and females were used (where
               these were distinguished from total populations).
               Figures for age `under 1 year' have been estimated
               from Annual Statistical Reviews.

 1946-1950     The Registrar General's Statistical Review of
               England and Wales for the five years 1946-50 Text,
               Civil, Table III, pages 10-11.  Figures used
               correspond to the mean Civilian population for 1946,
               Civilian populations for the period 1947-1949 and
               the Home population for 1950.

               Figures for age `under 1 year' have been estimated
               from Annual Statistical Reviews.

 1950-1952     The Registrar General's Statistical Review, 1952
               Text, Table 1, page 6.  These figures were revised
               using 1951 Census results.

 1953-1959     The Registrar General's Statistical Reviews annually
               1953-1959, pt II, Table A2, Home population.

 1960          The Registrar General's Quarterly Return for
               England and Wales, Quarter ended 31st March 1966,
               Appendix J, page 46.  These figures were revised
               using 1961 Census results.

 1961-1980     OPCS Monitor Series PP1 84/1 issued 10 January 1984,
               Tables 1(a) and 1(b).  These figures include
               revisions based on 1981 Census results.

 1981-1983     OPCS Monitor Series PP1 84/3 issued May 1984, Tables
               1-3 (final estimates).

Table 4.1

 TABLE 4.1                                              IBM/ICL 2900


 START       SIZE    TYPE         DESCRIPTION            RANGE
 POSITION                                                OF CODES

  1            7     character    Number of deaths       0000001-0012649
  8            4     character    ICD code               0000-9999
 12            2     character    Year                   01-83
 14            1     character    Sex                    1, 2
 15            2     character    Age code               01-26
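
 The layout in Table 4.1 can be read with simple string slicing.
 The sketch below is one way of doing so; the class and function
 names and the sample record are assumptions.  It also assumes that
 every year on the file falls in the twentieth century and, for the
 block reader, that Python's "cp037" EBCDIC codec is a suitable
 choice for the tape's character set.

```python
from typing import List, NamedTuple

RECORD_LENGTH = 16  # bytes per record, as given for the tape format


class DeathRecord(NamedTuple):
    deaths: int
    cause: str     # four digit computer code, kept as text
    year: int
    sex: int       # 1 = male, 2 = female
    age_code: str  # coded age group, see Table 5


def parse_death_record(line: str) -> DeathRecord:
    """Split one 16 character record using the Table 4.1 positions."""
    if len(line) < RECORD_LENGTH:
        raise ValueError(f"record shorter than {RECORD_LENGTH} characters")
    return DeathRecord(
        deaths=int(line[0:7]),           # positions 1-7
        cause=line[7:11],                # positions 8-11
        year=1900 + int(line[11:13]),    # positions 12-13, e.g. 67 -> 1967
        sex=int(line[13:14]),            # position 14
        age_code=line[14:16],            # positions 15-16
    )


def records_from_block(block: bytes, encoding: str = "cp037") -> List[DeathRecord]:
    """Decode an EBCDIC block and parse each complete 16 byte record."""
    text = block.decode(encoding)
    return [parse_death_record(text[i:i + RECORD_LENGTH])
            for i in range(0, len(text) - RECORD_LENGTH + 1, RECORD_LENGTH)]


# Hypothetical record: 25 deaths, cause 1620, 1967, males, age code 14.
print(parse_death_record("0000025162067114"))
```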
Table 4.2

 TABLE 4.2                                             IBM/ICL 2900


 START      SIZE    TYPE        DESCRIPTION        RANGE
 POSITION                                          OF CODES

  1          5      character   Number of deaths   0000001-3096687
  6          4      character   Filler             0000
 10          2      character   Year               01-83
 12          1      character   Sex                1, 2
 13          2      character   Age code           01-26
Table 4.3, IBM

 TABLE 4.3                                            IBM


 Tape width                          0.5 inch
 Record length                       16 bytes
 Block size                          2400 bytes
 Header labels                       Unlabelled
 Packing density                     1600 b.p.i
 Data representation standard        EBCDIC
 Recording mode                      Phase encoded
 Parity                              Odd
 Interblock gap                      0.6 inch
 Number of tracks                    9
Table 4.3, ICL 2900

 TABLE 4.3                                              ICL 2900


 Tape width                          0.5 inch
 Record length                       16 bytes
 Block size                          8192 bytes
 Header labels                       Standard ICL 2900
 Packing density                     1600 b.p.i
 Data representation standard        EBCDIC
 Recording mode                      Phase encoded
 Parity                              Odd
 Interblock gap                      0.6 inch
 Number of tracks                    9
Table 5

  Table 5                  Computer codes for age groups

   Age          Code                       Age         Code

   Under 1       01                        45-49       11
   1-4           02                        50-54       12
   5-9           03                        55-59       13
  10-14          04                        60-64       14
  15-19          05                        65-69       15
  20-24          06                        70-74       16
  25-29          07                        75-79       17
  30-34          08                        80-84       18
  35-39          09                        80+ **      19
  40-44          10                        85+         20

 *  On ICD 1st revision these 10-year age groups are used.

 ** On ICD 3rd, 4th revision, the highest age group is 80+; for
    deaths, this also holds true for 5th revision years 1940-41.
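
 The codes listed in Table 5 can be held as a simple mapping.  The
 sketch below covers only the twenty codes shown above; any other
 code in the range 01-26 is reported as unlisted rather than
 guessed.  The names used are assumptions.

```python
from typing import Dict

AGE_CODES: Dict[str, str] = {"01": "Under 1", "02": "1-4"}
# Codes 03 to 18 are the five year groups 5-9 up to 80-84.
for index, start in enumerate(range(5, 85, 5)):
    AGE_CODES[f"{index + 3:02d}"] = f"{start}-{start + 4}"
AGE_CODES["19"] = "80+"
AGE_CODES["20"] = "85+"


def age_label(code: str) -> str:
    """Return the age group for a code, or flag it as not listed here."""
    return AGE_CODES.get(code, f"unlisted code {code}")


print(age_label("11"), age_label("18"), age_label("23"))
```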
Table 6

 Computer codes for cause of death in each ICD revision

 Unnumbered cause list used in England and Wales 1901-1910 (1st revision)

 CAUSE OF DEATH                                         CODE

 Small-pox:-Vaccinated                                  0010
    "    "  Not vaccinated                              0020
    "    "    Doubtful                                  0030
 Cow-pox and other effects of vaccination               0040
 Chicken-pox                                            0050
 Measles (Morbilli)                                     0060
 German measles                                         0070
 Scarlet fever                                          0080
 Typhus                                                 0090
 Plague                                                 0100
 Relapsing fever                                        0110
 Influenza                                              0120
 Whooping cough                                         0130
 Mumps                                                  0140
 Diphtheria                                             0150
 Cerebro-spinal fever                                   0160
 Pyrexia (origin uncertain)                             0170
 Enteric fever                                          0180
 Asiatic cholera                                        0190
 Diarrhoea due to food                                  0200
 Infective enteritis, Epidemic diarrhoea                0210
 Diarrhoea (not otherwise defined)                      0220
 Dysentery                                              0230
 Tetanus                                                0240
 Malaria                                                0250
 Rabies, Hydrophobia                                    0260
 Glanders                                               0270
 Anthrax (Splenic fever)                                0280
 Syphilis                                               0290
 Gonorrhoea                                             0300
 Phlegmasia alba dolens                                 0310
 Puerperal septicaemia, Puerperal septic intoxication   0320
   "  "    pyaemia                                      0330
   "  "    fever (not otherwise defined)                0340
 Infective endocarditis                                 0350
 Pneumonia:-Lobar                                       0360
   "  "      Epidemic                                   0370
   "  "      Broncho                                    0380
   "  "      Not defined                                0390
 Erysipelas                                             0400
 Septicaemia, Septic intoxication (not puerperal)       0410
 Pyaemia (not puerperal)                                0420
 Phlegmon, Carbuncle (not anthrax)                      0430
 Phagedaena                                             0440
 Other infective processes                              0450
 Pulmonary tuberculosis (tuberculous phthisis)          0460
 Phthisis (not otherwise defined)                       0470
 Tuberculous meningitis                                 0480
   "  "      peritonitis                                0490
 Tabes mesenterica                                      0500
 Lupus                                                  0510
 Tubercle of other organs                               0520
 General tuberculosis                                   0530
 Scrofula                                               0540
 Parasitic diseases                                     0550
 Starvation                                             0560
 Scurvy                                                 0570
 Alcoholism, Delirium tremens                           0580

