Scientific Data Documentation
OPCS, English Mortality File, 1986-1990

ABSTRACT

Description of the Data Files

Numbers of deaths

The OPCS historic deaths file contains numbers of deaths in England and Wales by year, sex, age and ICD cause. The International Classification of Diseases (ICD) was introduced in 1900 and has been used in the tabulation of mortality in England and Wales since 1911. During the period 1901-1910, when the first revision of the ICD was used in some countries, a comparably modern (but unnumbered) classification was used in England and Wales. In all there have been nine revisions of the international classification. Each provides a coding frame for cause of death that differs, to varying degrees, from the previous revision. The years for which each ICD revision (or a corresponding local equivalent) was used in the tabulation of England and Wales mortality data are shown in Table 1. No attempt has been made to include deaths before 1901 on the data file, although deaths were tabulated for all of England and Wales from July 1837 using an earlier series of national coding frames.

While changes in ICD revision have influenced the way in which the data are stored, the contents of each year's data have been constrained by a second, pragmatic consideration. Material on individual deaths, coded for tabulation purposes, is only available as far back as 1959, when OPCS introduced computerized record keeping for statistical tabulations of mortality. Before that date, no regular series of coded or unpublished tabular material was retained in any form. Consequently, numbers of deaths for 1901-1958 were obtained simply by transcribing onto computer the figures published in the Registrar General's Annual Reviews. For 1921 onwards the figures used were those in Table 17 of the Annual Review; for earlier years figures were copied from an equivalent unnumbered table in the annual abstracts.
This method of data extraction placed self-evident limits on the degree of disaggregation of the data transcribed onto tape: for age and cause groupings, the tapes cannot provide more detail than was published at the time. By contrast, data on the historic deaths tape for years after 1958 were compiled from the computer tapes used to produce the annual reference volumes. This allowed greater flexibility in the choice of cause groupings, and the data could be stored by individual four-digit ICD code, whether or not these codes appeared in routine publications at the time. The two methods of constructing the database also required different validation techniques and, correspondingly, different means of resolving quality-check queries; these are discussed in sections 5-7.

As indicated above, the age groupings held on tape for earlier years were determined by the published material available; for later years a standard quinquennial grouping was used, with deaths under one year and at ages one to four shown separately and those at 85 and over aggregated. The age groups available in each ICD revision are shown in Table 2. The other variables on file, year of death and sex, are held for individual years and for each sex separately.

Population Estimates

From time to time OPCS (or, formerly, the General Register Office) revises its estimates of the population at risk of dying in England and Wales in each year, to incorporate information that becomes available after the initial estimates are published in the annual reference volumes. The main source of additional information is the population count from subsequent censuses. To accompany the historic deaths tape, a tape of comparable population estimates has been prepared from the most recently revised figures released by OPCS.
This tape contains population estimates by sex, year and age in the same format as the death tapes. Information is held for each year and each sex separately. Age groupings for each year correspond to those used for the storage of death data for that year. The source of the population estimates for each year is shown in Table 3.

Structure of the Files

Each count on the death file is stored in a separate record, referenced by the cause, sex, age and year to which it refers. Where no deaths occurred in a particular year for a specific combination of cause, sex and age, no record is held on file; all counts held on file are thus positive integers. The data for each ICD revision are sorted by year, then by cause, then by sex and finally by age group. The layout of each record is given in Table 4.

The population data are held in the same format as the numbers of deaths, except that the cause variable is set to zero on all records. The file thus contains one record for each combination of year, sex and age, sorted by year, then sex, then age.

BACKGROUND

In the analysis of any particular set of mortality data, national death rates by age, sex and cause frequently play a pivotal role. For example, analyses of cause-specific time trends and their correlates generally draw directly on data of this sort; examples in the literature include studies of lung cancer, stomach cancer, heart disease and suicide. At a broader level, international comparisons use the rates of several nations to make meaningful inferences about possible causal associations (such as the role of diet, alcohol or smoking in particular diseases). By contrast, local mortality studies call on national rates to provide a reference set of background mortality levels against which local experience can be measured.
In this context, the term 'local' can usefully be broadened to cover the study of any subset or sub-division of the national population; applying standard national death rates to subsets of the population is one of the most frequently used techniques in such analyses. Topics covered include geographic comparisons, occupational comparisons and the prospective follow-up of a wide variety of possible risk groups.

Each of these applications requires a tedious set of calculations. As access to computers becomes more widespread, these calculations are almost invariably automated; however, the extent to which this can be done depends on the availability of national rates on computer. At an international level, WHO showed considerable foresight in setting up a database in the mid-1970s containing death rates for a number of countries by age, sex and a short list of causes. This data file extends back to 1950 and has been distributed to a number of institutions throughout the world. Despite the boldness of this concept, the file has two limitations for applications at national and sub-national level. First, the range and specificity of the available causes are limited to what was readily comparable over time and between countries. Secondly, the time span covered by the data is limited to recent history, although many developed countries published comparable mortality statistics for a number of years before 1950. To extend the range of analyses that could be performed directly on computer, a number of similar but more detailed files were set up at various institutions throughout the world.
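Applying national rates to a sub-population, as described above, is usually done by indirect standardization: national age-specific rates are multiplied by the sub-population's person-years in each age group to give expected deaths, and observed deaths are compared with these as a standardized mortality ratio (SMR). The Python sketch below illustrates only the arithmetic; the function names and all figures are hypothetical, and it assumes the national rates and the local person-years are keyed by the same age codes (those of Table 5).

```python
"""Sketch of indirect standardization with national age-specific rates.

Function names and figures are hypothetical illustrations, not file values.
"""


def expected_deaths(national_rates, local_population):
    """Sum over age groups of national rate x local person-years.

    national_rates: {age code: deaths per person-year}
    local_population: {age code: person-years in the sub-population}
    Every local age group must have a national rate (else KeyError).
    """
    return sum(
        national_rates[age] * persons
        for age, persons in local_population.items()
    )


def smr(observed, national_rates, local_population):
    """Standardized mortality ratio: 100 x observed / expected deaths."""
    return 100.0 * observed / expected_deaths(national_rates, local_population)


# Hypothetical rates and person-years for age codes 14-16 (60-74).
national = {14: 0.010, 15: 0.020, 16: 0.040}
local = {14: 1000, 15: 500, 16: 250}
print(expected_deaths(national, local))    # 10 + 10 + 10 expected deaths
print(round(smr(36, national, local), 1))  # 36 observed vs 30 expected
```

An SMR above 100 indicates more deaths in the sub-population than national rates would predict for its age structure.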
In reviewing both its own needs and the contribution it could make to the wider availability of national death rates on computer, OPCS decided in 1979 that, rather than devote scarce analytic resources to constructing another synthetic database, it would use its unique position to construct a database comprising only the basic building bricks from which any aggregate database could be built. Here, the basic components of the database are numbers of deaths, held at the lowest level to which cause was routinely coded. For recent years, this means the four-digit codes laid down in the International Classification of Diseases (ICD) in operation when the death was registered. Calculation of rates is made possible by a comparable tape of estimates of the population at risk.

VARIABLES

Year

Individual years are held on file, each represented by the final two digits of the year; thus 1967 appears as 67.

Sex

Figures appear separately for each sex; the code for males is 1 and for females 2.

Age

For most years, data are held in the following age groups: under one, one to four, five-year age groups from 5 to 84, and 85 and over. The exceptions are the years 1901-1910 and those covered by the third, fourth and part of the fifth revisions of the ICD (1921-1941). In the former period, figures at ages 25-84 are stored in ten-year rather than five-year age groups. In the latter period, figures at ages 80 and over are not sub-divided on the deaths tape for any of the 21 years, and are not sub-divided on the population tape for the years 1921-1939. In other respects, data by age in these periods are stored in the same way as in other periods. Age groups are held on the data tape in coded form.
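Because the file is held as fixed-width records (Table 4.1) with two-digit years, coded sex and coded age groups, and omits records with zero deaths, computing age-specific rates means decoding records, summing over causes and matching deaths to population by year, sex and age. The Python sketch below assumes each 16-character record has already been converted from EBCDIC to text, and that population records share the deaths layout with the cause field set to 0000, as described under Structure of the Files; all function names are hypothetical. It also merges the 1940-41 population codes 18 (80-84) and 20 (85+) into 19 (80+), since deaths for those years are held only as 80+.

```python
from collections import defaultdict

# 0-based slices for the Table 4.1 fields (1-based starts 1, 8, 12, 14, 15).
# Hypothetical helpers; assumes records are already decoded from EBCDIC.
COUNT, CAUSE, YEAR, SEX, AGE = (
    slice(0, 7), slice(7, 11), slice(11, 13), slice(13, 14), slice(14, 16)
)


def parse_record(record):
    """Decode one 16-character record into (year, cause, sex, age, count)."""
    year = 1900 + int(record[YEAR])  # two-digit year; file covers 1901-1983
    cause = int(record[CAUSE])       # 0000 on population records
    sex = int(record[SEX])           # 1 = male, 2 = female
    age = int(record[AGE])           # age code from Table 5
    return year, cause, sex, age, int(record[COUNT])


def tabulate(records):
    """Sum counts over causes by (year, sex, age)."""
    totals = defaultdict(int)
    for record in records:
        year, _cause, sex, age, count = parse_record(record)
        totals[(year, sex, age)] += count
    return totals


def collapse_80_plus(population):
    """For 1940-41, merge population codes 18 (80-84) and 20 (85+) into 19."""
    merged = defaultdict(int)
    for (year, sex, age), persons in population.items():
        if year in (1940, 1941) and age in (18, 20):
            age = 19
        merged[(year, sex, age)] += persons
    return merged


def age_specific_rates(deaths, population):
    """Deaths per person; a missing death record means zero deaths."""
    return {
        key: deaths.get(key, 0) / persons
        for key, persons in population.items()
        if persons > 0
    }


# Hypothetical records: 12 male deaths aged 80+ in 1940 (cause 1530), and a
# male population of 1000 aged 80-84 plus 500 aged 85+ in 1940.
deaths = tabulate(["0000012153040119"])
population = collapse_80_plus(tabulate(["0001000000040118", "0000500000040120"]))
rates = age_specific_rates(deaths, population)
print(rates[(1940, 1, 19)])  # 12 deaths / 1500 persons
```

Using `deaths.get(key, 0)` reflects the file convention that absent records stand for zero deaths, so every population cell receives a rate.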
The codes used for each of the 26 possible age groupings appearing on the file are shown in Table 5.

CAUSE OF DEATH CODES

Coding Scheme

Since the introduction of the sixth revision of the ICD, cause codes have been represented in a broadly similar manner in each revision of the classification. The first three digits of the four-digit code indicate the cause group to which the death is assigned; the fourth digit is then used, for some but not all cause groups, to sub-divide this basic classification further. The first three digits are always represented as a number between 000 and 999. OPCS convention in publications is to show the last digit as a number when a sub-division is shown, or as a hyphen when no sub-division is shown in the ICD. For the convenience of those producing computer tabulations, hyphens have been replaced by zeros on the historic deaths file for ICD revisions six to nine. The computer codes are thus purely numeric, in the range 0000-9999.

Under these four revisions, deaths not due to natural causes may be assigned two codes: one for the external cause of injury and the other for the nature of injury. To avoid double counting, only the counts for external causes are included on the historic deaths file. Codes in the range 8000-9999 thus refer to causes listed in the ICD volumes under the chapter covering external causes of injury.

Under the second, third, fourth and fifth revisions of the ICD, causes of death were classified to fewer than 200 groups, numbered from one upwards. Some of these groups were then sub-divided, as in later revisions, using in these instances a combination of alphabetic and numeric subscripts.
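For revisions six to nine, the conversion from published notation to computer code described above can be sketched in Python as follows. The accepted input forms ("153.-", "153-", "153.9", "1539") are an assumption about how published codes might be keyed in, and the function names are hypothetical; the rules applied are those stated in the text: a hyphen becomes a zero, the first three digits give the cause group, and 8000-9999 is the external-causes chapter.

```python
import re

# Hypothetical input format: three digits, an optional point, then a
# fourth digit or a hyphen (assumed forms of a published ICD 6-9 code).
PUBLISHED_CODE = re.compile(r"(\d{3})\.?([0-9-])")


def to_computer_code(published):
    """Convert a published ICD 6-9 code to the file's four-digit number."""
    match = PUBLISHED_CODE.fullmatch(published.strip())
    if match is None:
        raise ValueError(f"unrecognised ICD code: {published!r}")
    group, fourth = match.groups()
    # A hyphen (no sub-division in the ICD) is stored as zero on the file.
    return int(group) * 10 + (0 if fourth == "-" else int(fourth))


def cause_group(code):
    """The first three digits identify the cause group."""
    return code // 10


def is_external_cause(code):
    """Revisions 6-9 only: 8000-9999 is the external-causes chapter."""
    return 8000 <= code <= 9999


code = to_computer_code("153.-")
print(f"{code:04d}", cause_group(code), is_external_cause(code))  # 1530 153 False
```

Formatting with `:04d` restores the leading zeros of low codes, so that, for example, "001.-" is written as 0010 to match the four-character field on the file.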
For revisions two to five, in order to achieve comparability with later revisions, the major cause grouping (from the list of approximately 200 causes) has been transcribed directly as a three-digit numeric code and the various possible sub-divisions numbered from one upwards. To provide a purely numeric fourth digit, the absence of any sub-division is indicated by a zero. The entire four-digit computer code for ICD revisions two to five is thus purely numeric. A chart giving approximate conversions between ICD codes and computer codes is given in Table 6.

One potential difficulty with this method of conversion is that it only works for causes with nine or fewer sub-divisions. This proved to be a problem in only one instance. In the fifth revision of the ICD, there were 14 sub-divisions of code 157, congenital malformations. A purely pragmatic solution was adopted, using 1571-1585 (excluding 1580) to represent these sub-divisions; congenital debility, 158, which should have been allocated code 1580, was instead allocated code 1589 to avoid overlap between the two causes. Tabulations of mortality from congenital anomalies for 1940-49 should therefore be undertaken with extreme caution.

During the period 1901-1910, an unnumbered list of causes was used in England and Wales in place of the new International Classification, which had not yet been widely adopted. On the historic mortality data file, the lowest level of disaggregation of cause has simply been numbered from 001 to 191, as shown in Table 6.1. In general a fourth digit of zero has been used to avoid unnecessary complexity. However, one (little used) category, "other specified diseases", has been given code 1741 so that it can be distinguished both from identifiable specified cause groupings and from the (then rather large) group of vague or unspecified causes of death.
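For revisions two to five, building the four-digit code, including the fifth-revision exception for cause 157, might be sketched in Python as below. The function names are hypothetical, and mapping the n-th sub-division of 157 to 1571-1579 and then 1581-1585 in published order is an assumption consistent with, but not spelled out by, the description above.

```python
def computer_code(cause, subdivision=0):
    """Four-digit code for ICD revisions 2-5 (hypothetical helper).

    cause: number of the major cause group (list of about 200 causes)
    subdivision: 1, 2, ... for sub-divisions, 0 when there is none
    """
    if not 0 <= subdivision <= 9:
        raise ValueError("one fourth digit allows at most nine sub-divisions")
    return cause * 10 + subdivision


def fifth_revision_157(subdivision):
    """Code for the n-th of the 14 sub-divisions of 5th-revision cause 157.

    Assumption: sub-divisions 1-9 map to 1571-1579 and 10-14 to
    1581-1585, skipping 1580.
    """
    if not 1 <= subdivision <= 14:
        raise ValueError("cause 157 has 14 sub-divisions")
    return 1570 + subdivision if subdivision <= 9 else 1571 + subdivision


# Congenital debility (158) has no sub-division but is stored as 1589, not 1580.
CONGENITAL_DEBILITY_1940_49 = 1589

print(computer_code(1), fifth_revision_157(9), fifth_revision_157(10))  # 10 1579 1581
```

Raising an error beyond nine sub-divisions makes the single known exception explicit rather than letting it silently spill into the next cause's codes.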
Missing values

A cautionary note concerns the years 1901-1958, when data were obtained from the Annual Review volumes. Where the ICD provided a sub-division of a cause but none was published in one or more years, or where the sub-division was only partially used, deaths in those years have been arbitrarily assigned to one of the unused sub-divisions of that cause. The codes and years affected are shown in Table 7.

CONSISTENCY CHECKS ON THE DATA

Using some simple checking methods, an effort has been made to ensure that the data are internally consistent and in close agreement with the figures published at the time. At this distance in time, it has not always been possible to achieve exact agreement, as indicated below. In terms of quality, three time periods may be distinguished.

From 1968, it was possible to use computer summaries already created for routine tabulation purposes. The only checks considered necessary for these years were to ensure that every four-digit cause had been carried across and to spot-check the values carried across. These checks have been made and have revealed only one problem as far as the final database is concerned. During the eighth revision, changes were made to the OPCS implementation of the ICD, introducing additional fourth-digit sub-divisions in 1975-76. These changes are described in Table 8 and call for caution in any detailed fourth-digit tabulation covering 1968-1978.

Between 1959 and 1967, it was necessary to re-tabulate counts of deaths from computer records of individual deaths. These computer tapes are held in archived form and have been re-copied as old tapes deteriorated and reformatted as old computers were replaced. A certain amount of data corruption and loss must be considered inevitable in these circumstances.
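Given the arbitrary assignment of unpublished sub-divisions for 1901-1958 and the fourth-digit changes of 1975-76, one cautious approach for tabulations across years is to sum counts over the fourth digit and work with three-digit cause groups. The Python sketch below does this for a dictionary keyed by (year, cause, sex, age); the function names and data layout are hypothetical. It handles the 1940-49 congenital codes separately, since there 1581-1585 belong to cause 157 and 1589 to cause 158, so a plain division by ten would misplace them.

```python
from collections import defaultdict


def three_digit_group(year, cause):
    """Cause group for a four-digit computer code, allowing for 1940-49."""
    if 1940 <= year <= 1949:  # 5th revision: 14 sub-divisions of cause 157
        if 1571 <= cause <= 1585 and cause != 1580:
            return 157
        if cause == 1589:  # congenital debility, stored as 1589 not 1580
            return 158
    return cause // 10


def collapse_fourth_digit(counts):
    """Sum {(year, cause, sex, age): deaths} over the fourth digit."""
    grouped = defaultdict(int)
    for (year, cause, sex, age), deaths in counts.items():
        grouped[(year, three_digit_group(year, cause), sex, age)] += deaths
    return dict(grouped)


# Hypothetical counts for two 8th-revision and two 5th-revision codes.
counts = {
    (1975, 1621, 1, 14): 5,
    (1975, 1629, 1, 14): 3,
    (1945, 1583, 2, 1): 2,
    (1945, 1589, 2, 1): 4,
}
print(collapse_fourth_digit(counts))
```

Working at the three-digit level gives up detail, but those totals are unaffected by where within a cause the deaths were placed.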
For 1959-1967, Table 9 shows the magnitude of discrepancies, by age, sex and year, between figures on the database and those published at the time. These are considered tolerable for statistical purposes. Similar error rates are found at the cause-specific level.

A somewhat different method of validation was used to check the manually transcribed figures for 1901-1958. For each year, the age-specific figures for each four-digit cause were summed by computer and checked against the corresponding published figure for all ages. At the same time, the cause-specific figures in each age group were summed to produce totals for each ICD chapter, and these were checked against the corresponding published figures. This dual checking procedure pinpointed errors, which were then corrected. Consequently, on the final tape each cause contributes the correct number of deaths at all ages, and the figures by age are correct for each chapter. Despite this cross-checking, some mutually compensating errors may remain within the sub-matrices defined in this way. However, any such errors are unlikely to be large enough to distort a statistical analysis based on these data.

A minor problem was revealed for the years 1934, 1938 and 1939 in the course of checking the data. Some 265 deaths in a colliery disaster in 1934 were not registered until 1938-39. The figures relating to these deaths were excluded from Table 17 in both the year of occurrence and the year of registration. For completeness, it was decided to include them on the historic mortality data file but to allocate them, contrary to normal practice, to the year of occurrence (1934), under cause code 1942.

Inevitably, a checking procedure such as that described above occasionally uncovers printing errors in published figures.
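The dual check described above can be sketched as comparing two sets of marginal totals against published figures: cause totals over all ages, and chapter totals for each age group. The Python sketch below works on one year and one sex; its names, the `chapter_of` mapping and the toy figures are hypothetical. A cause flagged by the first comparison and an age flagged by the second together locate the suspect figure, while errors that cancel in both margins pass unnoticed, which is the residual risk noted in the text.

```python
from collections import defaultdict


def dual_check(counts, cause_totals, chapter_age_totals, chapter_of):
    """Compare transcribed counts with two sets of published totals.

    counts: {(cause, age): deaths} for one year and one sex
    cause_totals: {cause: published deaths at all ages}
    chapter_age_totals: {(chapter, age): published deaths}
    chapter_of: function giving the ICD chapter of a cause code
    Returns (causes whose totals disagree, (chapter, age) cells that disagree).
    """
    by_cause = defaultdict(int)
    by_chapter_age = defaultdict(int)
    for (cause, age), deaths in counts.items():
        by_cause[cause] += deaths
        by_chapter_age[(chapter_of(cause), age)] += deaths

    bad_causes = sorted(
        c for c in by_cause.keys() | cause_totals.keys()
        if by_cause.get(c, 0) != cause_totals.get(c, 0)
    )
    bad_cells = sorted(
        k for k in by_chapter_age.keys() | chapter_age_totals.keys()
        if by_chapter_age.get(k, 0) != chapter_age_totals.get(k, 0)
    )
    return bad_causes, bad_cells


# Toy figures: causes 10 and 20 both in chapter 1; one count typed as 6, not 5.
published_causes = {10: 8, 20: 2}
published_cells = {(1, 1): 7, (1, 2): 3}
typed = {(10, 1): 6, (10, 2): 3, (20, 1): 2}
print(dual_check(typed, published_causes, published_cells, lambda cause: 1))
# ([10], [(1, 1)]): the error lies in cause 10 at age code 1
```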
Where such printing errors have been found, figures on the historic deaths file have been adjusted to achieve internal numerical consistency rather than agreement with evidently incorrect published figures. However, no attempt has been made to reclassify deaths for any reason. Thus changes in coding rules from year to year, and any errors in the compilation of cause statistics in isolated years, all appear on this tape in full agreement with the figures published at the time.

TABLES

List of Tables

Table  Title
1      Years for which each ICD revision was implemented in England and Wales
2      Age groups available for the periods covered by each ICD revision
3      Sources from which the population estimates computer tape was derived
4      Computer record layout
5      Computer codes for age groups
6      Computer codes for cause of death in 1st revision ICD

Table 1  Years for which each ICD revision was implemented in England & Wales

ICD revision   Years
1+             1901-10
2*             1911-20
3*             1921-30
4*             1931-39
5*             1940-49
6              1950-57
7              1958-67
8              1968-78
9              1979-

+ An unnumbered list was used in England and Wales rather than the international classification during this period.
* As amended for use in England and Wales.

Table 2  Age groups available for the periods covered by each ICD revision

ICD revision   Age groups
1              Under 1, 1-4, 5-9, 10-14, 15-19, 20-24, 25-34, 35-44, 45-54, 55-64, 65-74, 75-84, 85+
2              Under 1, 1-4, 5-9 and 5-year age groups to 80-84, 85+
3, 4           Under 1, 1-4, 5-9 and 5-year age groups to 75-79, 80+
5-9+           Under 1*, 1-4, 5-9 and 5-year age groups to 80-84, 85+

+ For the years 1940-41 the age groups used for mortality are as for ICD revisions 3 and 4; that is, for these two years the highest available age group is 80+ rather than 85+.
* Neonatal deaths by cause are not included on the tape from 1984 onwards. All-cause neonatal deaths are shown at ICD 0000.
Table 3  Sources from which the population estimates computer tape was derived

Years       Source of data

1901-1910   Population based on final 1911 Census data using the method described in the 73rd Report of the Registrar General, 1910, but with revised factors and with an adjustment for the leap years 1904 and 1908. Figures for age 'under 1 year' have been estimated by taking births minus deaths aged under 1 year for two years and dividing by two.

1911-1920   The Registrar General's Decennial Supplement 1921, pt III, Table 1, page 8. For 1915-1920, the figures used for males correspond to the civilian population.

1921-1930   The Registrar General's Decennial Supplement 1931, pt III, Table 1, page 2.

1931-1939   The Registrar General's Statistical Review of England and Wales for the years 1938 and 1939, Text, Table XCIV, page 156. Figures used correspond to mid-year estimates for 1931-1938 and to the male and female population in 1939. Figures for age 'under 1 year' have been estimated from Annual Statistical Reviews.

1940-1945   The Registrar General's Statistical Review of England and Wales for the six years 1940-1945, Text, Volume II Civil, Table III, page 10. Civilian populations for males and females were used (where these were distinguished from total populations). Figures for age 'under 1 year' have been estimated from Annual Statistical Reviews.

1946-1950   The Registrar General's Statistical Review of England and Wales for the five years 1946-50, Text, Civil, Table III, pages 10-11. Figures used correspond to the mean civilian population for 1946, civilian populations for 1947-1949 and the home population for 1950. Figures for age 'under 1 year' have been estimated from Annual Statistical Reviews.

1950-1952   The Registrar General's Statistical Review, 1952, Text, Table 1, page 6. These figures were revised using 1951 Census results.

1953-1959   The Registrar General's Statistical Reviews, annually 1953-1959, pt II, Table A2, home population.
1960        The Registrar General's Quarterly Return for England and Wales, Quarter ended 31st March 1966, Appendix J, page 46. These figures were revised using 1961 Census results.

1961-1980   OPCS Monitor Series PP1 84/1 issued 10 January 1984, Tables 1(a) and 1(b). These figures include revisions based on 1981 Census results.

1981-1983   OPCS Monitor Series PP1 84/3 issued May 1984, Tables 1-3 (final estimates).

Table 4.1  IBM/ICL 2900 historic mortality data tape record description

Start position  Size  Type       Description       Range of codes
1               7     character  Number of deaths  0000001-0012649
8               4     character  ICD code          0000-9999
12              2     character  Year              01-83
14              1     character  Sex               1, 2
15              2     character  Age code          01-26

Table 4.2  IBM/ICL 2900 historic population data tape record description

Start position  Size  Type       Description       Range of codes
1               5     character  Number of deaths  0000001-3096687
6               4     character  Filler            0000
10              2     character  Year              01-83
12              1     character  Sex               1, 2
13              2     character  Age code          01-26

Table 4.3 (IBM)  IBM historic data files tape description

Tape width            0.5 inch
Record length         16 bytes
Block size            2400 bytes
Header labels         Unlabelled
Packing density       1600 b.p.i.
Data representation   Standard EBCDIC
Recording mode        Phase encoded
Parity                Odd
Interblock gap        0.6 inch
Number of tracks      9

Table 4.3 (ICL 2900)  ICL 2900 historic data files tape description

Tape width            0.5 inch
Record length         16 bytes
Block size            8192 bytes
Header labels         Standard ICL 2900
Packing density       1600 b.p.i.
Data representation   Standard EBCDIC
Recording mode        Phase encoded
Parity                Odd
Interblock gap        0.6 inch
Number of tracks      9

Table 5  Computer codes for age groups

Age       Code     Age       Code
Under 1   01       45-49     11
1-4       02       50-54     12
5-9       03       55-59     13
10-14     04       60-64     14
15-19     05       65-69     15
20-24     06       70-74     16
25-29     07       75-79     17
30-34     08       80-84     18
35-39     09       80+**     19
40-44     10       85+       20

* On ICD 1st revision these 10-year age groups are used.
** On ICD 3rd and 4th revisions, the highest age group is 80+; for deaths, this also holds for the 5th-revision years 1940-41.

Table 6  Computer codes for cause of death in each ICD revision

Unnumbered cause list used in England and Wales 1901-1910 (1st revision)

Cause of death                                           Computer code
Small-pox: vaccinated                                    0010
Small-pox: not vaccinated                                0020
Small-pox: doubtful                                      0030
Cow-pox and other effects of vaccination                 0040
Chicken-pox                                              0050
Measles (Morbilli)                                       0060
German measles                                           0070
Scarlet fever                                            0080
Typhus                                                   0090
Plague                                                   0100
Relapsing fever                                          0110
Influenza                                                0120
Whooping cough                                           0130
Mumps                                                    0140
Diphtheria                                               0150
Cerebro-spinal fever                                     0160
Pyrexia (origin uncertain)                               0170
Enteric fever                                            0180
Asiatic cholera                                          0190
Diarrhoea due to food                                    0200
Infective enteritis, Epidemic diarrhoea                  0210
Diarrhoea (not otherwise defined)                        0220
Dysentery                                                0230
Tetanus                                                  0240
Malaria                                                  0250
Rabies, Hydrophobia                                      0260
Glanders                                                 0270
Anthrax (Splenic fever)                                  0280
Syphilis                                                 0290
Gonorrhoea                                               0300
Phlegmasia alba dolens                                   0310
Puerperal septicaemia, Puerperal septic intoxication     0320
Puerperal pyaemia                                        0330
Puerperal fever (not otherwise defined)                  0340
Infective endocarditis                                   0350
Pneumonia: lobar                                         0360
Pneumonia: epidemic                                      0370
Pneumonia: broncho                                       0380
Pneumonia: not defined                                   0390
Erysipelas                                               0400
Septicaemia, Septic intoxication (not puerperal)         0410
Pyaemia (not puerperal)                                  0420
Phlegmon, Carbuncle (not anthrax)                        0430
Phagedaena                                               0440
Other infective processes                                0450
Pulmonary tuberculosis (tuberculous phthisis)            0460
Phthisis (not otherwise defined)                         0470
Tuberculous meningitis                                   0480
Tuberculous peritonitis                                  0490
Tabes mesenterica                                        0500
Lupus                                                    0510
Tubercle of other organs                                 0520
General tuberculosis                                     0530
Scrofula                                                 0540
Parasitic diseases                                       0550
Starvation                                               0560
Scurvy                                                   0570
Alcoholism, Delirium tremens                             0580
This page last reviewed: Thursday, January 28, 2016