Scientific Data Documentation
Public Information Data (1991)
AIDS: Public Information Data (1991)
ABSTRACT
Summary
Public health surveillance represents an ongoing and regular collection,
analysis, interpretation, and application of health data for disease
prevention and control. AIDS surveillance, like other national surveillance
efforts, depends on health-care providers and the state and local health
departments and, thus, requires a balance between information needs and
practical limitations. AIDS surveillance in the United States has achieved a
high degree of completeness relative to other notifiable diseases. In
addition, the surveillance system has been modified as understanding of AIDS
and HIV infection has grown. Users of the AIDS Public Information Data Set
should be familiar with the characteristics of public-health surveillance in
general as well as with the evolution of AIDS surveillance.
General Information
The AIDS Public Information Data Set is created twice a year by the Division
of HIV/AIDS, National Center for Infectious Diseases, Centers for Disease
Control (CDC) and consists of a data file containing 44 variables extracted
from CDC's national AIDS surveillance data base and a documentation file
which contains cross tabulations of 8 of these variables. The documentation
file contains one set of tables for the entire United States, one set for
each state, and one set for each Metropolitan Statistical Area (MSA) with
500,000 or more population.
This data set is distributed using software called SETS, developed by CDC's
National Center for Health Statistics. SETS is menu-driven and allows you
to create cross tabulations without using a statistical software package
such as SAS or dBASE. It also incorporates the metropolitan area and state
tables previously distributed on microfiche. Users who want to continue
using these data with a statistical or data management package must first
load SETS, and then use the SETS export feature to create an ASCII data
file. See Appendix A or the online documentation for more information.
This manual describes the data set. It is divided into three sections and
two appendices. On-line help screens provide additional documentation for
SETS.
Section 1, AIDS Surveillance in the United States, describes the data
collection process and the effect changes in this process may have on data
analysis and interpretation. The section reviews the source of AIDS
surveillance data and describes which patients are included in the Centers
for Disease Control (CDC) definition for AIDS. It also discusses reporting
delays and reporting completeness.
Section 2, Data File Variables and Coding Schemes, lists the variables
included on the data file and describes each variable's coding scheme.
Section 3, MSA and State Tables, describes the frequency tables and cross
tabulations included on the documentation file.
Appendix A: Loading the SETS Software, describes how to load and run SETS on
your computer. It also suggests computer hardware and software you can use
to analyze the data.
Appendix B: Metropolitan Statistical Areas lists the MSAs included in the
data set.
Assurance of Confidentiality
The data and documentation files on the enclosed diskettes contain information
abstracted from acquired immunodeficiency syndrome (AIDS) case reports
received by CDC. These data have been reported voluntarily to CDC by state
and local health departments, and are protected under the Assurance of
Confidentiality (Sections 306 and 308(d) of the Public Health Service Act,
42 U.S.C. 242k and 242m(d)), which prevents disclosure of any information
that could be used to directly or indirectly identify patients or
establishments. The statistical data contained in the AIDS Public Information
Data Set are being released for public use in accordance with the Assurance
and do not identify patients directly, nor do they contain information that
can identify patients indirectly.
BACKGROUND
In 1981, after early reports of Pneumocystis carinii pneumonia, Kaposi's
sarcoma, and other opportunistic infections in young homosexual men in
Los Angeles, New York City, and San Francisco, CDC began surveillance for a
newly recognized constellation of diseases, now termed the acquired
immunodeficiency syndrome (AIDS). CDC developed a surveillance case definition
for this syndrome and initially received case reports directly from health-
care providers and state and local health departments. As the epidemic became
more widespread, state and local health departments began to assume the
responsibility for AIDS surveillance, and by 1985 all states had regulations
requiring physicians and other health-care providers to report AIDS cases
directly to state or local health departments. These health departments then
share the reports with CDC, which produces the national AIDS surveillance
data set.
The goals of AIDS surveillance have been to monitor both trends in AIDS cases
and the scope of severe morbidity due to infection with the human
immunodeficiency virus (HIV). Advances in the understanding of the
epidemiology and manifestations of HIV infection and changing diagnostic
practices, however, present multiple challenges to those analyzing and
interpreting the AIDS surveillance data. The following are a few examples:
- A wide variety of persons are at risk for AIDS, including homosexual or
bisexual men, intravenous drug users, transfusion or tissue transplant
recipients, heterosexual partners of infected persons (including persons
born in "Pattern-II" countries - certain Caribbean and central African
countries where heterosexual transmission predominates), children born to
infected mothers, and persons with mucous membrane or percutaneous exposure
to blood or body fluids of infected persons (e.g., health-care workers).
Because homosexual/bisexual males comprise such a large proportion of the
total number of AIDS cases, trends in this subgroup will overshadow those
in other groups unless the data are examined separately. Analysis of data,
without regard to specific subgroups, may conceal information or lead to
misinterpretation of the data.
- The etiologic agent of AIDS, HIV, has been identified, and diagnostic tests
for infection with this virus have been developed. As a result, the
surveillance of AIDS, initially dependent on the presence of certain
indicator diseases specific for the infection, has been expanded to include
additional diseases (perhaps less specific for HIV infection) in the
presence of laboratory evidence for infection. Introduction of these
diagnoses has affected trends in AIDS cases.
- Diagnostic practices have changed over time and vary geographically. AIDS is
now a common diagnosis in many hospitals and clinics, and definitive
diagnostic tests for manifestations of HIV infection (e.g., Pneumocystis
carinii pneumonia or esophageal candidiasis) may not be done. HIV testing
is not performed on all patients. Geographic variations in diagnostic
practices and changes over time could markedly affect trends in AIDS
surveillance.
DESCRIPTION OF POPULATION
Source of AIDS Surveillance Data
CDC maintains national surveillance of AIDS through the receipt of AIDS case
reports submitted by individual state and local health departments. Health
departments either submit the case report forms directly or they report cases
electronically through a CDC-developed microcomputer system. All 50 states,
the District of Columbia, and U.S. territories and possessions (including
Puerto Rico, the Virgin Islands, Guam, and certain Pacific islands) report
AIDS cases to CDC.
Although state and local health departments share AIDS surveillance data with
CDC, the responsibility and authority for AIDS surveillance rest with the
individual health departments. Like any reportable disease, the completeness
of AIDS reporting reflects the aggressiveness with which these health
departments solicit case reports. Health departments may depend on health-
care providers to know and comply with reporting requirements. Alternatively,
health departments may regularly contact and interact with health-care
facilities or individual providers to stimulate disease reporting.
CDC has developed guidelines to assist health departments in stimulating AIDS
case reporting and has encouraged them to take an active rather than passive
approach to AIDS surveillance. Through surveillance cooperative agreements
supported by CDC, health departments are encouraged to identify health-care
facilities that serve AIDS patients and work closely with these facilities to
promote reporting. They are also encouraged to send newsletters to health-
care providers and attend professional organization meetings, and to use
existing alternative data sources to identify AIDS cases, including death
certificates, laboratory reports, and tuberculosis and tumor registries.
States vary widely in the structure and organization of their surveillance
systems and, therefore, in the completeness of their case reporting.
Case Definition
AIDS surveillance does not encompass all manifestations of infection with HIV,
but only severe, life-threatening diseases highly specific for the infection,
as delineated in the CDC AIDS case definition. Before HIV was identified as
the etiologic agent for AIDS, CDC defined a case of AIDS as a disease, at
least moderately indicative of a defect in cell-mediated immunity, occurring
in a person with no known cause for diminished resistance to the disease.
Such diseases included Pneumocystis carinii pneumonia, Kaposi's sarcoma, and
many other serious opportunistic infections (see American Journal of Medicine,
March 1984, pages 493-500). With identification of HIV as the causative agent
for AIDS and the availability of laboratory tests to detect HIV antibody, the
case definition was expanded to reflect an increased understanding of HIV
infection. The case definition was revised in 1985 (see CDC's Morbidity and
Mortality Weekly Report, June 28, 1985, pages 373-375) and again in 1987 (see
Morbidity and Mortality Weekly Report, August 14, 1987, Supplement, pages
3S-15S). These revisions applied to persons with laboratory evidence for HIV
infection. Among diseases added in 1985 were disseminated histoplasmosis,
chronic isosporiasis, and certain non-Hodgkin's lymphomas. Among those added
in 1987 were extrapulmonary tuberculosis, HIV encephalopathy, and HIV wasting
syndrome. In children, recurrent, serious bacterial infections were also
added. In addition, the 1987 revision allowed certain indicator diseases to
be diagnosed on a presumptive rather than confirmed basis.
While the reported incidence of AIDS increased only 3 to 4 percent as a
result of the 1985 revision, roughly one fourth of all cases that were both
diagnosed and reported in the year following the 1987 revision met only the
additional criteria included in the 1987 revision. Furthermore, the
proportion of cases meeting only the new criteria was higher in Hispanics
and non-Hispanic blacks than in non-Hispanic whites, higher in heterosexual
intravenous drug users, and lower in men who have sex with men. Due to the
large number of cases meeting only the revised case definition and to the
inconsistent use of the revised case definition in different populations,
analyses of trends in AIDS cases must take these revisions into account.
VARIABLES AND THEIR CATEGORIES
Data File Variables and Coding Schemes
The data file included in the AIDS Public Information Data Set contains one
line of data for each AIDS case reported to CDC. Each line contains 62
columns. The columns contain 44 variables extracted from CDC's national AIDS
data set.
Column   Variable   Description
1        age        Age group at diagnosis of the first AIDS-indicator
                    opportunistic disease
2        sexclass   Sexual classification of patient
3        race       Race of patient
4        msa        Region of residence
5-8      dxdate     Month of diagnosis of first AIDS-indicative
                    opportunistic disease
9-12     repdate    Date when CDC first received information about the case
13       death      Vital status of the patient
14-17    deathqtr   Quarter of death for patients reported dead
18-19    ptgroup    Patient grouping by mode of exposure to HIV
20       nir        No Identified Risk. Status of investigations for patients
                    reported without known risk of exposure to HIV
21       multrisk   Indicates if patient had more than one risk of exposure
                    to HIV
22       birth      Country of birth
23       categ      Indicates which of the CDC AIDS case revisions the
                    patient meets
24       bact       Bacterial infections, multiple or recurrent (including
                    Salmonella septicemia). Applicable in pediatric cases
                    only.
25       burkl      Lymphoma, Burkitt's (or equivalent term)
26       candesop   Candidiasis, esophageal
27       candlung   Candidiasis, bronchi, trachea, or lungs
28       cmv        Cytomegalovirus disease (other than in liver, spleen, or
                    nodes); onset at > 1 month of age
29       cmvret     Cytomegalovirus retinitis (with loss of vision)
30       cocci      Coccidioidomycosis, disseminated or extrapulmonary
31       cryptoco   Cryptococcosis, extrapulmonary
32       cryptosp   Cryptosporidiosis, chronic intestinal
33       dementia   HIV encephalopathy
34       histo      Histoplasmosis, disseminated or extrapulmonary
35       HS         Herpes simplex: chronic ulcer(s) (>1 month duration);
                    or bronchitis, pneumonitis, or esophagitis
36       ibl        Lymphoma, immunoblastic (or equivalent term)
37       iso        Isosporiasis, chronic intestinal (> 1 month duration)
38       KS         Kaposi's sarcoma
39       lip        Lymphoid interstitial pneumonia and/or pulmonary
                    lymphoid hyperplasia. Applicable in pediatric cases only.
40       mavium     Mycobacterium avium complex or M.kansasii, disseminated
                    or extrapulmonary
41       myco       Mycobacterium, of other species or unidentified species,
                    disseminated or extrapulmonary
42       pc         Pneumocystis carinii pneumonia
43       plb        Lymphoma, primary in brain
44       pml        Progressive multifocal leukoencephalopathy
45       sals       Salmonella septicemia. Applicable in adult cases only.
46       tb         M.tuberculosis, disseminated or extrapulmonary
47       tp         Toxoplasmosis of brain, onset at > 1 month of age
48       wasting    Wasting syndrome due to HIV
49       s_bi       Sex with a bisexual man (women only)
50       s_iv       Sex with an IV drug user
51       s_other    Sex with a person with hemophilia, a person born in a
                    Pattern-II country, or a transfusion recipient
52       s_hiv      Sex with a person known to be infected with HIV or to
                    have AIDS
53-56    deathrep   Date when death was reported to CDC
57-62    adjwgt     Reporting delay adjustment weight
Each of these variables is coded numerically. For example, column 13
contains either "0" or "1". These numbers represent the variable death. The
number "0" in this column indicates that CDC has not received a death
notification for this case. A value of "1" indicates that CDC has been
notified that this patient died. The codes used in the AIDS Public
Information Data Set are printed below.
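If you export the data file from SETS as ASCII and want to read it with your
own program, the column layout above can be applied directly. The following
Python sketch is one possible way to do this, not part of SETS. It assumes
the exported file keeps the same 62-column layout, which you should verify
against your own export. The names LAYOUT and parse_record and the sample
line are illustrative only.

```python
# Column layout transcribed from the table above: (variable, first column,
# last column), with columns numbered from 1 as in this manual.
CORE = [
    ("age", 1, 1), ("sexclass", 2, 2), ("race", 3, 3), ("msa", 4, 4),
    ("dxdate", 5, 8), ("repdate", 9, 12), ("death", 13, 13),
    ("deathqtr", 14, 17), ("ptgroup", 18, 19), ("nir", 20, 20),
    ("multrisk", 21, 21), ("birth", 22, 22), ("categ", 23, 23),
]
# The 25 one-character disease indicators occupy columns 24 through 48.
DISEASES = [
    "bact", "burkl", "candesop", "candlung", "cmv", "cmvret", "cocci",
    "cryptoco", "cryptosp", "dementia", "histo", "HS", "ibl", "iso", "KS",
    "lip", "mavium", "myco", "pc", "plb", "pml", "sals", "tb", "tp",
    "wasting",
]
TAIL = [
    ("s_bi", 49, 49), ("s_iv", 50, 50), ("s_other", 51, 51),
    ("s_hiv", 52, 52), ("deathrep", 53, 56), ("adjwgt", 57, 62),
]
LAYOUT = CORE + [(name, 24 + i, 24 + i) for i, name in enumerate(DISEASES)] + TAIL


def parse_record(line):
    """Split one 62-column line into a dict of raw (string) field values."""
    line = line.rstrip("\r\n").ljust(62)  # pad short lines with blanks
    return {name: line[first - 1:last] for name, first, last in LAYOUT}


# A made-up record for illustration; it is not taken from the data set.
sample = ("5110" + "8903" + "8906" + "1" + "9001" + "01" + " " + "0" + "1"
          + "1" + "0" * 18 + "1" + "0" * 6 + "9999" + "9002" + "1.0000")
record = parse_record(sample)
```

Fields are kept as strings so that blanks and special codes such as "8199"
or "9999" are preserved until you decide how to treat them.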
Age (column 1)
This variable contains the patient's age when he or she was first diagnosed
with an AIDS-indicator disease.
0 = Less than 1 year old
1 = 1 to 12 years old
2 = 13 to 19 years old
3 = 20 to 24 years old
4 = 25 to 29 years old
5 = 30 to 34 years old
6 = 35 to 39 years old
7 = 40 to 44 years old
8 = 45 to 49 years old
9 = 50 years old or older
Sexclass (column 2)
Adult/adolescent males are classified according to their sexual orientation.
1 = Adult/adolescent homosexual male
2 = Adult/adolescent bisexual male
3 = Adult/adolescent heterosexual male or pediatric male
4 = Female (both adult/adolescent and pediatric)
Race (column 3)
1 = White (not Hispanic)
2 = Black (not Hispanic)
3 = Hispanic
9 = Asian/Pacific Islander, American Indian/Alaskan Native, or unknown
MSA (column 4)
Region of residence is identified for adult/adolescent patients who live in
MSAs with more than 1 million population, according to the 1990 census.
Residence is defined as place of residence at onset of illness suggestive of
AIDS. The MSA variable is coded as:
0 = Not in an MSA, Population less than 50,000.
1 = Northeast
Bergen-Passaic, N.J.; Boston, Mass.; Hartford, Conn.; Nassau-Suffolk,
N.Y.; New York, N.Y.; Newark, N.J.; or Rochester, N.Y.
2 = Central
Chicago, Ill.; Cincinnati, Ohio; Cleveland, Ohio; Columbus, Ohio; Denver,
Colo.; Detroit, Mich.; Indianapolis, Ind.; Kansas City, Mo.; Milwaukee,
Wis.; Minneapolis-Saint Paul, Minn.; or Saint Louis, Mo.
3 = West
Anaheim, Calif.; Los Angeles, Calif.; Oakland, Calif.; Phoenix, Ariz.;
Portland, Oreg.; Riverside-San Bernardino, Calif.; Sacramento, Calif.;
Salt Lake City, Utah; San Diego, Calif.; San Francisco, Calif.; San Jose,
Calif.; or Seattle, Wash.
4 = South
Atlanta, Ga.; Charlotte, N.C.; Dallas, Tex.; Fort Lauderdale, Fla.; Fort
Worth, Tex.; Houston, Tex.; Miami, Fla.; New Orleans, La.; Orlando, Fla.;
San Antonio, Tex.; San Juan, P.R.; or Tampa, Fla.
5 = Mid-Atlantic
Baltimore, Md.; Norfolk, Va.; Philadelphia, Pa.; Pittsburgh, Pa.; or
Washington, D.C.
9 = In an MSA with population less than 1 million, but greater than 50,000.
Dxdate (columns 5 through 8)
This variable contains the year and month in which the first AIDS-indicator
disease was diagnosed. Columns 5 and 6 contain the year; columns 7 and 8
contain the month. Cases diagnosed before 1982 are coded as "8199".
Repdate (columns 9 through 12)
This variable contains the year and month in which CDC received the case
report. Columns 9 and 10 contain the year; columns 11 and 12 contain the
month. Cases reported during 1981 are coded as "8199".
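The year-and-month coding used by dxdate and repdate can be decoded as
sketched below. This is an illustration only. It assumes that the two-digit
year always refers to the 1900s, which holds for a data set released in 1991,
and the function name decode_yymm is hypothetical.

```python
def decode_yymm(code):
    """Decode a 4-character YYMM value from dxdate or repdate.

    Returns (year, month). For the special code "8199" the month is None;
    for dxdate this means "diagnosed before 1982", for repdate it means
    "reported during 1981".
    """
    if len(code) != 4 or not code.isdigit():
        raise ValueError("expected four digits, got %r" % (code,))
    if code == "8199":
        return (1981, None)
    year = 1900 + int(code[:2])  # assumption: all years are 19xx
    month = int(code[2:])
    if not 1 <= month <= 12:
        raise ValueError("month out of range in %r" % (code,))
    return (year, month)
```

For example, decode_yymm("8903") gives (1989, 3), that is, March 1989.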
Death (column 13)
0 = CDC has not received a death notification for this case
1 = CDC has been notified that this patient died
Deathqtr (columns 14 through 17)
For patients whose death has been reported to CDC, this variable contains the
year and quarter of death. Columns 14 and 15 contain the year; columns 16
and 17 contain the quarter. For example, the value "8803" indicates that the
patient died in July, August, or September, 1988. Patients who are known to
have died, but whose date of death is unknown are coded as "9999."
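The quarter coding can be unpacked in the same spirit. The sketch below is
illustrative. The function name is hypothetical, years are assumed to fall
in the 1900s, and the function handles only values recorded for patients
whose death has been reported.

```python
def decode_deathqtr(code):
    """Decode a YYQQ value from deathqtr.

    Returns None for "9999" (date of death unknown); otherwise returns
    (year, quarter, (first_month, last_month)).
    """
    if len(code) != 4 or not code.isdigit():
        raise ValueError("expected four digits, got %r" % (code,))
    if code == "9999":
        return None
    year = 1900 + int(code[:2])  # assumption: all years are 19xx
    quarter = int(code[2:])
    if not 1 <= quarter <= 4:
        raise ValueError("quarter out of range in %r" % (code,))
    first_month = 3 * (quarter - 1) + 1
    return (year, quarter, (first_month, first_month + 2))
```

With the example above, decode_deathqtr("8803") gives (1988, 3, (7, 9)),
that is, July through September 1988.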
Ptgroup (columns 18 and 19)
For surveillance purposes, AIDS patients are grouped into a hierarchy of
exposure categories. Persons with more than one reported mode of exposure to
HIV are counted in the exposure category listed first in the hierarchy,
except for persons with a history of both homosexual/bisexual contact and
intravenous drug use. They are counted in a separate category. Persons with
multiple reported modes of exposure are indicated in the variable multrisk.
"Pattern II" is a term adopted by the World Health Organization, and refers
to countries with a distinctive pattern of HIV transmission. It is observed
in areas of central, eastern, and southern Africa and in some Caribbean
countries. In these countries, most of the reported cases occur in
heterosexuals; the male to female ratio is approximately 1 to 1; and
perinatal transmission is more common than in other areas. Intravenous drug
use and homosexual transmission either do not occur or occur at low levels.
"Other/undetermined" cases are in persons with no reported history of exposure
to HIV through any of the routes listed in the hierarchy of exposure
categories. Undetermined cases include persons who are currently under
investigation by local health department officials; persons whose exposure
history is incomplete because of death, refusal to be interviewed, or loss
to follow-up; and persons who were interviewed or for whom other follow-up
information was available and no exposure mode was identified.
Adult/adolescent exposure categories
1 = Male homosexual/bisexual contact
2 = Intravenous (IV) drug use (female and heterosexual male)
3 = Male homosexual/bisexual contact and IV drug use
4 = Hemophilia/coagulation disorder
5 = Heterosexual contact with a person with, or at increased risk for,
HIV infection
6 = Born in Pattern-II country
7 = Receipt of transfusion of blood, blood components, or tissue
8 = Other/undetermined
Pediatric exposure categories
9 = Hemophilia/coagulation disorder
10 = Mother with, or at risk for, HIV infection
11 = Receipt of transfusion of blood, blood components, or tissue
12 = Other/undetermined
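The hierarchy rule for adult/adolescent cases can be expressed as a short
function. This sketch is an illustration of the rule as described above, not
CDC's own classification program. It assumes that the order of the
adult/adolescent list is the order of the hierarchy, and it takes as input a
hypothetical set of reported exposure codes drawn from 1, 2, 4, 5, 6, and 7.

```python
# Single exposure modes in the order they are listed above (assumed to be
# the hierarchy order). Code 3 is the combined category and code 8 is
# other/undetermined, so neither is a reportable single mode here.
HIERARCHY = (1, 2, 4, 5, 6, 7)


def assign_ptgroup(modes):
    """Return the adult/adolescent ptgroup code for a set of exposure modes."""
    modes = set(modes)
    # Exception to the hierarchy: homosexual/bisexual contact together with
    # IV drug use is counted in its own category.
    if 1 in modes and 2 in modes:
        return 3
    for code in HIERARCHY:
        if code in modes:
            return code  # first listed category wins
    return 8  # no reported mode: other/undetermined
```

For example, a patient reported with both IV drug use and heterosexual
contact, assign_ptgroup({2, 5}), is counted in category 2.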
NIR (column 20)
NIR (no identified risk) is coded only for patients whose mode of exposure to
HIV is coded as undetermined in ptgroup.
1 = Patient currently under investigation
2 = Patient died, refused interview, or is lost to follow-up
3 = Patient investigation complete but no mode of exposure was identified
Multrisk (column 21)
Multrisk is coded only for adult/adolescent patients (13 years old or older)
and indicates if the patient has risk(s) of exposure to HIV other than the
one indicated by ptgroup.
0 = Patient's only mode of exposure to HIV is that indicated by ptgroup
1 = Patient has additional risk(s) of exposure
Birth (column 22)
1 = Patient was born in the United States or its dependencies and possessions,
or place of birth was not specified
2 = Patient was born in Pattern-II country
3 = Patient was born in a foreign country which is not Pattern II
Categ (column 23)
This variable reflects changes made over time to the CDC surveillance
definition for AIDS. Only cases meeting the current (1987) surveillance
definition are included in this data set. Categ indicates whether the patient
also met the pre-1985 or 1985 surveillance definition, and whether the
diagnosis, if it meets only the 1987 definition, was definitive or presumptive.
Cases that meet more than one of these surveillance definitions are classified
into the definition category listed first. For more information about the
1987 definition, see Morbidity and Mortality Weekly Report, August 14, 1987,
Supplement, pages 3S-15S.
1 = Case meets the pre-1985 surveillance definition
2 = Case meets the 1985 surveillance definition
3 = Case meets the 1987 surveillance definition and was diagnosed definitively
4 = Case meets the 1987 surveillance definition and was diagnosed
presumptively
AIDS-indicator opportunistic diseases (columns 24 through 48)
Columns 24 through 48 contain information about each of the AIDS-indicator
diseases listed on the AIDS confidential case report form. Each of these
variables is one character long and is coded as follows:
0 = AIDS-indicator opportunistic disease was not diagnosed
1 = AIDS-indicator opportunistic disease was diagnosed definitively
2 = AIDS-indicator opportunistic disease was diagnosed presumptively
Heterosexual risk information (columns 49 through 52)
These variables (s_bi, s_iv, s_other, and s_hiv) contain additional risk
information for patients infected heterosexually. All 4 variables are coded
as follows:
0 = no
1 = yes
9 = missing/unknown
The variable s_bi is coded only for women (for men, the variable contains a
blank). All 4 variables contain "9" (missing/unknown) for patients with
hemophilia, regardless of whether the risk information is in fact unknown.
This restriction is necessary in order to comply with the Assurance of
Confidentiality on page 5. Of the 1,535 AIDS cases reported through June
1991 among adults/adolescents with hemophilia, less than 3 percent also
reported heterosexual contact with a person at increased risk for AIDS or
HIV infection.
Deathrep (columns 53 through 56)
For patients whose death has been reported to CDC, this variable contains the
year and quarter when CDC received the report. Columns 53 and 54 contain the
year; columns 55 and 56 contain the quarter. For example, the value "8803"
indicates that the patient's death was reported to CDC in July, August, or
September, 1988.
CDC began collecting this variable in October 1987. Deaths reported to CDC
before October 1987 are coded as "8799".
Adjwgt (columns 57 through 62)
This variable contains an adjustment weight which, when used as a weighting
variable in a frequency tabulation, produces tabulations of AIDS cases that
are adjusted for delays in case reporting (see page 10 for a discussion of
delays in reporting). The weights are based on estimated reporting delay
distributions that take into account exposure, geographic, and demographic
variations in case reporting. The adjustment weights and the resulting
tabulations are not reliable for cases diagnosed during the most recent 3 to
6 months. Please note, this variable must not be used for tabulations
involving dates of report to CDC (repdate), the living status of a patient
(death), the date of death (deathqtr), or the date when a death was reported
to CDC (deathrep). It is reasonable to use this variable for tabulations
involving any other variable in the data set, including date of diagnosis.
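A weighted frequency tabulation of this kind might look like the following
sketch, which also refuses the four variables for which the weight must not
be used. It is an illustration only. It assumes each record is a dictionary
of field values and that the adjwgt field can be read as an ordinary decimal
number; check the format of the weight in your exported file before relying
on this. The function name weighted_counts is hypothetical.

```python
from collections import defaultdict

# Variables for which adjwgt must not be used, as stated above.
UNWEIGHTABLE = {"repdate", "death", "deathqtr", "deathrep"}


def weighted_counts(records, by):
    """Sum adjwgt within each value of the variable `by`."""
    if by in UNWEIGHTABLE:
        raise ValueError("adjwgt must not be used to tabulate %s" % by)
    totals = defaultdict(float)
    for rec in records:
        # Assumption: the weight is stored as a plain decimal string.
        totals[rec[by]] += float(rec["adjwgt"])
    return dict(totals)


# Made-up records for illustration only.
records = [
    {"race": "1", "adjwgt": "1.0000"},
    {"race": "1", "adjwgt": "1.2500"},
    {"race": "2", "adjwgt": "1.5000"},
]
adjusted = weighted_counts(records, "race")
```

An unweighted count of the same records would give 2 and 1; the weighted
totals are larger because each weight allows for cases not yet reported.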
METHODS
Case report form
Separate case report forms are used for pediatric patients (those less than
13 years of age at the time of diagnosis) and adult/adolescent patients
(those 13 years of age or older at the time of diagnosis). Although the forms
are very similar, the pediatric form includes risk factor information for the
child's mother. These forms are completed by the health-care provider or by
the AIDS surveillance staff in the local or state health department.
Names are retained by the state or local health department and are converted
to an alpha-numeric code called "soundex" for use by CDC. CDC does not
receive names of persons with AIDS. Because more than one state may report an
individual case, CDC screens reported cases by soundex code and date of
birth to cull duplicate reports.
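The idea behind this screening can be sketched as follows. The code uses the
standard American Soundex rules; this manual does not say which Soundex
variant the health departments use, so treat the encoding as an assumption.
The record fields (name, dob) and function names are hypothetical, and the
public data set itself contains neither names nor Soundex codes.

```python
from collections import defaultdict

# Standard American Soundex digit groups.
_GROUPS = {"1": "BFPV", "2": "CGJKQSXZ", "3": "DT", "4": "L", "5": "MN", "6": "R"}
_CODES = {ch: digit for digit, letters in _GROUPS.items() for ch in letters}


def soundex(name):
    """Return the 4-character American Soundex code for a name."""
    letters = [c for c in name.upper() if c.isalpha()]
    if not letters:
        return ""
    out = letters[0]
    prev = _CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "HW":
            continue  # H and W do not separate letters with the same code
        code = _CODES.get(ch, "")
        if code and code != prev:
            out += code
        prev = code  # vowels reset prev, so a repeated code is kept
    return (out + "000")[:4]


def possible_duplicates(reports):
    """Group reports that share a Soundex code and a date of birth."""
    groups = defaultdict(list)
    for rep in reports:
        groups[(soundex(rep["name"]), rep["dob"])].append(rep)
    return [group for group in groups.values() if len(group) > 1]
```

Reports grouped this way are only candidates for duplication; different
people can share a code and a birth date, so each group still needs review.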
The variables available on the AIDS data set are listed in section 2.
However, a few deserve special comment.
- Living status. Patients survive for a variable amount of time following the
diagnosis of AIDS. Because death usually occurs after the initial report
to CDC, case reports may not be updated to reflect the change in living
status. As a result, reporting of death among AIDS patients may be
incomplete.
- Exposure category. Some patients may have more than one mode of exposure to
HIV. For surveillance purposes, AIDS cases are counted only once in a
hierarchy of exposure categories (see section 2, pages 16 and 17). This
hierarchy is based on the most likely source of HIV infection. Persons with
more than one reported mode of exposure are counted in the category that
appears first in the hierarchy, except for persons with a history of both
male homosexual/bisexual contact and intravenous drug use. They are counted
in a separate category.
- Diseases indicative of AIDS. Patients may develop additional diseases
indicative of AIDS after their initial AIDS diagnosis. The case report form
may not be updated to reflect additional diseases. Therefore, proportions
of patients with the various AIDS-indicator diseases should be considered
minimal estimates.
- Date of diagnosis. CDC collects only one diagnosis date per patient, i.e.,
the date when he or she was initially diagnosed with an AIDS-indicator
disease. Patients who develop additional diseases do not receive additional
diagnosis dates. Therefore, for patients with multiple AIDS-indicator
diseases, you cannot determine which disease occurred first.
Special Case Investigations
Certain AIDS cases receive special follow-up by state and local health
departments. Investigations are frequently performed after the initial case
report to CDC. Case updates are incorporated into the data set as they
are available to CDC.
- No identified risk (NIR) patients. NIR patients are those reported without
any recognized mode of exposure to HIV. Approximately 3 percent of cases
are NIR patients at any one time. However, when additional information can
be obtained for these patients, approximately 75 percent are reclassified
into a known exposure category. For those not reclassified, the demographic
profile is more similar to that of other persons with AIDS than to the
general U.S. population.
- Health-care workers. Ninety-five percent of health-care workers with AIDS
are classified into a known exposure category. Of the health-care workers
with an undetermined mode of exposure to HIV, less than one third cannot be
reclassified after investigation.
Delay in Reporting
The timeliness of AIDS case reporting to CDC depends on several factors.
These include the volume of cases reported from a state or locality and the
availability of staff to complete case report forms. In many instances,
initial case report forms are incomplete and require additional follow-up by
state and local health department staff, including reviews of other record
systems and contact with health-care providers.
About 55 percent of all cases are reported to CDC within 3 months of the
date of diagnosis, but about 20 percent are reported more than a year after
diagnosis. Delays vary widely among exposure, geographic, racial/ethnic, and
age categories. They are substantially longer for pediatric cases and for
transfusion-associated cases in adults. Due to the delay, the number of
cases diagnosed during any period often exceeds the number reported during
that period. This is particularly important in examining trends over time,
since many cases in recent periods of time will not yet be reported.
To account for delays in the reporting of cases, a variable called adjwgt has
been added to the data set. This variable may be used to weight each case on
the data set and obtain adjusted case counts. For example, summing adjwgt
over all cases in the data set estimates the number of cases diagnosed
through the time period covered by the data set.
Early Reporting Dates
Before 1990, CDC occasionally received reports on patients before they met
the CDC AIDS case definition. If such patients were later diagnosed with
AIDS, the diagnosis date on their record (indicating when the patient first
met the CDC definition) would be after the report date (when CDC first
received information about the patient). Such records should be excluded
from certain analyses. CDC's AIDS surveillance data base no longer receives
reports on patients who do not meet the AIDS case definition.
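Records of this kind can be identified by comparing dxdate with repdate.
The sketch below is illustrative. It assumes each record is a dictionary
holding the 4-character dxdate and repdate values described in section 2,
and the function names are hypothetical. Because both values are four digits
with the year first, a plain string comparison orders them correctly.

```python
def diagnosed_after_report(rec):
    """True if the recorded diagnosis date is later than the report date."""
    return rec["dxdate"] > rec["repdate"]


def exclude_early_reports(records):
    """Drop records whose diagnosis date falls after the report date."""
    return [rec for rec in records if not diagnosed_after_report(rec)]


# Made-up records for illustration only.
cases = [
    {"dxdate": "8903", "repdate": "8906"},  # diagnosed, then reported
    {"dxdate": "8811", "repdate": "8807"},  # reported before diagnosis
    {"dxdate": "8199", "repdate": "8199"},  # special early-period code
]
kept = exclude_early_reports(cases)
```

Whether to exclude such records depends on the analysis; they matter most
when you study the interval between diagnosis and report.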
Follow-up of Reported AIDS Cases
AIDS case records maintained at CDC contain all information reported to date
from state or local health departments. As patients progress through their
illness, additional diseases and conditions may be reported, or the patient's
vital status may change. However, not all health departments have the
resources to routinely follow up patients for additional information,
including vital status. For this reason and because many patients move out of
the reporting health department's jurisdiction, CDC records do not always
contain all current information for each patient.
Non-reporting and Evaluation of AIDS Surveillance
Cases of AIDS may not be reported to CDC for a variety of reasons. The
diagnostic tests needed to confirm the AIDS diagnosis may not be performed,
or physicians and hospital personnel may fail to report cases to the health
department. Further, some patients with HIV disease may be ill or die from
diseases or conditions not included in the current AIDS surveillance
definition.
Both CDC and state and local health departments have commissioned a variety
of studies to evaluate the completeness of AIDS surveillance. Most evaluation
projects have used alternate data resources, such as death certificates,
hospital discharge records, and laboratory records. Individual records from
these alternate sources have then been matched against records in AIDS
surveillance data bases. Evaluation studies have varied in size and scope
(e.g., varying numbers of ICD-9 codes from death certificates or computerized
discharge records), geographic area covered, detection of both inpatient and
outpatient cases, and time frames. In general, estimates from these studies
suggest that the completeness of reporting varies considerably, from 56 to
100 percent. High-prevalence areas for AIDS appear to have more complete
reporting than low-prevalence areas.
TECHNICAL INFORMATION
Memory/Storage Requirements and Mediums
The AIDS Public Information Data Set contains large quantities of data and
requires significant computer resources for analysis. You need a 386-based
MS-DOS microcomputer with at least 30 megabytes of disk storage and a
high-density diskette drive. The new SETS interface allows you to display
simple statistics without additional software such as SAS, SPSS, BMDP, or
PRODAS. More complex analysis, however, will still require additional
software.
To transfer the data to another software package or to a mainframe computer
for analysis, you must first load SETS, then use the Export option to extract
the records and variables you wish to analyze. The Export option will create
an ASCII data file, which can then be processed by other software, or
transferred to your mainframe using software designed for this purpose.
Examples of transfer programs include Crosstalk and Hayes SmartCom.
Loading SETS
The AIDS Public Information Data Set consists of over 10 diskettes. To
install it onto your computer, insert diskette #1 in drive A and type the
following DOS commands:
a: <ENTER>
install <ENTER>
The first command changes the current drive to A, and the second command
begins the installation process. Please note that the first diskette is also
the last diskette, i.e., you will need to process it at the beginning and the
end of the installation procedure. You will need at least 30 minutes to
install this software.
Once you have installed SETS, type the command
sets <ENTER>
to run the program. SETS is a menu-driven program which can be mastered with
minimum effort.
Getting Help
You can access help from SETS in two ways: by pressing the <F1> key at any
time while you are running SETS, or by selecting the Browse feature. Once you
select Browse, select Documentation, then SETS Manual.
SETS Features
From the SETS main menu, you can select the following options:
BROWSE-to browse the documentation, MSA and state tables, and the main data
file. Browsing the main data file allows you to display the variable names
and value labels contained in the data file.
TABLE-to create and display cross tabulations of any of the variables in the
data set. Tabulations are displayed in a spreadsheet format which can be
saved and loaded into Lotus software.
SUBSET-to specify which variables and records should be included in
tabulations or in exported files.
DEFAULTS-to adjust the setting for export drives and directories, and for the
autosave feature and screen colors.
Creating Tables
After you install the SETS program, you can create a table by following these
steps:
1. From the SETS directory, type sets <ENTER>, to run the program.
2. When the program appears, press <ENTER> until the main menu appears.
3. Use the arrow keys to highlight "Table" and press <ENTER>.
4. At "Display" press <ENTER>.
5. At the screen, "What would you like to do," press the <ENTER> key to
select, "Create a new record subset."
6. Type "all" to select all records and press <ENTER>.
7. When the spreadsheet appears, press the <F2> key (edit).
8. Press the <F6> key for table expression assistance.
9. To create a table of SEXCLASS by RACE, begin by using the arrow keys to
highlight the variable SEXCLASS, then press <ENTER> to select this
variable.
10. Use the arrow keys to highlight the variable RACE, then press <ENTER>
to select it.
11. Press the <F10> key to accept these two variables.
12. The spreadsheet will reappear with the text, "Edit: SEXCLASS, RACE"
displayed at the bottom of the screen.
13. Type the text, "/labels" and press <ENTER>. Do not type the quotation
marks.
The SETS program will create the table. This process can take half an hour
or longer, depending on the speed of your machine. Additional detail on how
to create tables is provided in the on-line documentation.
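For readers who export records and tabulate them outside SETS, the same kind
of two-way table can be sketched in Python. This is only an illustration and
not how SETS works internally. The in-memory record layout (one dictionary
per record) and the code values shown are assumptions made for the example;
they are not real surveillance data.

```python
from collections import Counter


def crosstab(records, row_var, col_var):
    """Count records for each (row value, column value) pair."""
    return Counter((rec[row_var], rec[col_var]) for rec in records)


# Made-up records for illustration only; the code values are placeholders.
records = [
    {"SEXCLASS": "1", "RACE": "1"},
    {"SEXCLASS": "1", "RACE": "2"},
    {"SEXCLASS": "2", "RACE": "1"},
    {"SEXCLASS": "1", "RACE": "1"},
]

table = crosstab(records, "SEXCLASS", "RACE")

# Print one line per non-empty cell, sorted by row and column value.
for (row_value, col_value), count in sorted(table.items()):
    print(row_value, col_value, count)
```

A Counter returns 0 for combinations that never occur, so empty cells do not
need to be created in advance.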
MSA and State Tables
The microfiche contain frequency tables and cross tabulations of 8 variables
extracted from CDC's national AIDS surveillance data set. They contain one
set of tables for the entire United States, one set for each state, and one
set for each MSA. The variables are:
Variable   Description
age        Age group at diagnosis of the first AIDS-indicator disease
categ      Indicates which revision of the CDC AIDS case definition the
           patient meets
dth_hyr    Half-year of death for patients reported dead
dx_hyr     Half-year of diagnosis of first AIDS-indicator disease
ent_hyr    Half-year in which CDC first received information about the case
ptgrp      Patient grouping by mode of exposure to HIV
race       Race of patient
sex        Sex of patient
The values used for these variables are printed below.
Age
This variable contains the patient's age when he or she was first diagnosed
with an AIDS-indicator disease. Ages printed on the microfiche are grouped
as follows:
0 - 1
1 - 12
13 - 19
20 - 29
30 - 39
40 - 49
50 +
Categ
This variable reflects revisions made to the CDC surveillance definition for
AIDS. Only cases meeting the current (1987) surveillance definition are
included on the microfiche. Categ indicates whether the patient also meets
the pre-1985 or 1985 surveillance definition, and whether the diagnosis, if
it meets only the 1987 definition, was definitive or presumptive. Cases that
meet more than one of these surveillance definitions are classified into the
definition category listed first. For more information about the 1987
definition, see Morbidity and Mortality Weekly Report, August 14, 1987,
Supplement, pages 3S-15S.
1 = Case meets the pre-1985 surveillance definition
2 = Case meets the 1985 surveillance definition
3 = Case meets the 1987 surveillance definition and was diagnosed definitively
4 = Case meets the 1987 surveillance definition and was diagnosed
presumptively
Dth_hyr
For patients whose death has been reported to CDC, this variable contains
the half-year of death. The first two numbers indicate the year; the second
two indicate the first or second half of that year. For example, the value
"8802" indicates that the patient died in the second half of 1988. Patients
whose death has been reported to CDC, but whose date of death is unknown, are
coded as "9999".
Dx_hyr
This variable contains the half-year in which the first AIDS-indicator
disease was diagnosed. The first two numbers indicate the year; the second
two indicate the first or second half of that year.
Ent_hyr
This variable contains the half-year in which CDC received the case report.
The first two numbers indicate the year; the second two indicate the first
or second half of that year.
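The half-year coding shared by Dth_hyr, Dx_hyr, and Ent_hyr can be decoded
with a few lines of Python. The sketch below is illustrative only. The
function name is hypothetical, and it assumes that all years fall in the
1900s and that values arrive as 4-character strings. The "9999" code is
documented above only for Dth_hyr.

```python
def decode_half_year(code):
    """Split a half-year code such as "8802" into (year, half).

    Returns None for "9999", which marks a reported death whose date is
    unknown.
    """
    code = str(code).strip()
    if code == "9999":
        return None
    if len(code) != 4 or not code.isdigit():
        raise ValueError(f"not a 4-digit half-year code: {code!r}")
    year = 1900 + int(code[:2])  # assumption: two-digit years are 19xx
    half = int(code[2:])
    if half not in (1, 2):
        raise ValueError(f"half must be 01 or 02: {code!r}")
    return year, half


print(decode_half_year("8802"))  # (1988, 2)
```

How Dth_hyr is stored for patients not reported dead is not described here,
so the sketch raises an error for blank or malformed values instead of
guessing.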
Ptgrp
For surveillance purposes, AIDS patients are grouped into a hierarchy of
exposure categories. Persons with more than one reported mode of exposure to
HIV are counted in the exposure category listed first in the hierarchy,
except for persons with a history of both homosexual/bisexual contact and
intravenous drug use. They are counted in a separate category.
"Pattern II" is a term adopted by the World Health Organization, and refers
to countries with a distinctive pattern of HIV transmission. It is observed
in areas of central, eastern, and southern Africa and in some Caribbean
countries. In these countries, most of the reported cases occur in
heterosexuals; the male to female ratio is approximately 1 to 1; and perinatal
transmission is more common than in other areas. Intravenous drug use and
homosexual transmission either do not occur or occur at low levels.
"Other/undetermined" cases are in persons with no reported history of exposure
to HIV through any of the routes listed in the hierarchy of exposure
categories. Undetermined cases include persons who are currently under
investigation by local health department officials; persons whose exposure
history is incomplete because of death, refusal to be interviewed, or loss
to follow-up; and persons who were interviewed or for whom other follow-up
information was available and no exposure mode was identified.
01 = Male homosexual/bisexual contact
02 = Intravenous (IV) drug use (female and heterosexual male)
03 = Male homosexual/bisexual contact and IV drug use
04 = Hemophilia/coagulation disorder
05 = Heterosexual contact with a person with, or at increased risk for, HIV
infection
06 = Born in Pattern-II country
07 = Receipt of transfusion of blood, blood components, or tissue
08 = Adult/adolescent other/undetermined
09 = Pediatric hemophilia/coagulation disorder
10 = Mother with, or at risk for, HIV infection
11 = Pediatric receipt of transfusion of blood, blood components, or tissue
12 = Pediatric undetermined
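The hierarchy rule for adult/adolescent cases can be sketched in Python as
follows. This is an illustration of the rule as described above, not CDC's
actual assignment procedure. It assumes that the order of the code list is
the hierarchy order and that each reported mode is given as one of the
single-mode codes; the names used are hypothetical.

```python
# Single-mode adult/adolescent codes in the order listed above.
# Assumption: the listing order is the hierarchy order.
ADULT_HIERARCHY = ["01", "02", "04", "05", "06", "07"]


def assign_adult_ptgrp(reported):
    """Choose one Ptgrp code from the set of reported exposure modes."""
    reported = set(reported)
    unexpected = reported - set(ADULT_HIERARCHY)
    if unexpected:
        raise ValueError(f"not a single-mode adult code: {sorted(unexpected)}")
    # Exception to the hierarchy: both male homosexual/bisexual contact
    # and IV drug use are counted in a separate category.
    if {"01", "02"} <= reported:
        return "03"
    # Otherwise use the first reported mode in hierarchy order.
    for code in ADULT_HIERARCHY:
        if code in reported:
            return code
    # No reported mode: adult/adolescent other/undetermined.
    return "08"


print(assign_adult_ptgrp({"04", "07"}))  # 04
print(assign_adult_ptgrp({"01", "02"}))  # 03
```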
Race
1 = White (not Hispanic)
2 = Black (not Hispanic)
3 = Hispanic
4 = Asian/Pacific Islander
5 = American Indian/Alaskan Native
9 = Unknown
Sex
1 = Male
2 = Female
Locating Individual Tables
In accordance with CDC guidelines on protecting confidentiality and with an
agreement made with state and local health departments for release of these
data, entries whose value is 5 or less are not included in the tables. Only
MSAs with 500,000 or more population (according to 1991 census estimates)
are included on the microfiche.
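Users who build their own tables from exported records may want to apply
the same small-cell rule before sharing results. A minimal Python sketch,
assuming a table held as a dictionary that maps each cell to its count (the
function name and layout are hypothetical):

```python
def suppress_small_cells(table, threshold=5):
    """Return a copy of the table without entries of threshold or less."""
    return {cell: count for cell, count in table.items() if count > threshold}


# Made-up counts for illustration only.
counts = {("1", "1"): 40, ("1", "2"): 5, ("2", "1"): 6, ("2", "2"): 0}
print(suppress_small_cells(counts))  # {('1', '1'): 40, ('2', '1'): 6}
```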
The AIDS Public Information Data Set contains frequency tables of 8
variables, and every possible 2-way cross tabulation of those variables
for each state, each MSA with 500,000 or more population, and for the entire
United States. Tables for the entire United States also contain cross
tabulations of 2 additional variables, STATE and MSA.
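The number of tables in each state or MSA set follows from the 8 variables.
The short Python sketch below enumerates them, assuming that a cross
tabulation of A by B is counted once and not repeated as B by A:

```python
from itertools import combinations

# The 8 tabulated variables, in alphabetical order.
VARIABLES = ["age", "categ", "dth_hyr", "dx_hyr",
             "ent_hyr", "ptgrp", "race", "sex"]

one_way = list(VARIABLES)
two_way = list(combinations(VARIABLES, 2))

print(len(one_way), len(two_way))  # 8 28
```

Under that assumption each set holds 8 frequency tables and 28 two-way
cross tabulations.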
To access these tables, select the Browse feature on the SETS menu, then
select "Documentation." A menu will appear which divides the country into
9 geographic regions: New England, North Atlantic, Mid-Atlantic, South
Atlantic, Mid-West, Great Plains, South Central, Mountain, and Pacific. For
example, to access data for New York City, first select the North Atlantic
region. SETS will display a list of all states and MSAs in that region,
including New York City. To view the tables for any state or MSA in that
region, select the name of the state or MSA.
SETS will then display the first table for the state or MSA you have
selected. It will first display the 1-way frequency tables, 1 table per
screen, then the 2-way cross tabulations. Tables are displayed
alphabetically, beginning with AGE and progressing to RACE and SEX.
SETS allows you to search for individual table entries within each state
or MSA file. Press the <F6> key to begin the search. It will also allow you to
display or print a particular page in the file. SETS contains on-line
documentation that describes the search process in more detail.
SETS also allows global searches, i.e., you can search for tables in any of
the state or MSA files included in the data set, not just those contained in
the current state or MSA file. For example, if you are displaying data for
New York City, and want to compare them to data from Los Angeles, you can use
the global search function to search for the entry "Los Angeles." SETS would
then locate the first table in the Los Angeles file. To begin a global
search, press the <F9> key.
SAMPLE TABLE(s) OF INFORMATION
MSA Codes
Definitions for MSAs are issued by the Office of Management and Budget (OMB)
to be used in presentation of statistics by agencies of the federal
government. The metropolitan areas used on the AIDS Public Information Data
Set are the MSAs for all areas except the 6 New England states. For these
states, the New England County Metropolitan Areas (NECMA, also defined by OMB)
are used. Metropolitan areas are named for a central city in the MSA or NECMA
and may include several counties and cross state boundaries.
Code Metropolitan area
80 Akron, Ohio
160 Albany-Schenectady, N.Y.
200 Albuquerque, N.M.
240 Allentown, Pa.
360 Anaheim, Calif.
520 Atlanta, Ga.
640 Austin, Tex.
680 Bakersfield, Calif.
720 Baltimore, Md.
760 Baton Rouge, La.
875 Bergen-Passaic, N.J.
1000 Birmingham, Ala.
1123 Boston, Mass.
1163 Bridgeport, Conn.
1280 Buffalo, N.Y.
1440 Charleston, S.C.
1520 Charlotte, N.C.
1600 Chicago, Ill.
1640 Cincinnati, Ohio
1680 Cleveland, Ohio
1840 Columbus, Ohio
1920 Dallas, Tex.
2000 Dayton, Ohio
2080 Denver, Colo.
2160 Detroit, Mich.
2320 El Paso, Tex.
2680 Fort Lauderdale, Fla.
2800 Fort Worth, Tex.
2840 Fresno, Calif.
2960 Gary, Ind.
3000 Grand Rapids, Mich.
3120 Greensboro, N.C.
3160 Greenville, S.C.
3240 Harrisburg, Pa.
3283 Hartford, Conn.
3320 Honolulu, Hawaii
3360 Houston, Tex.
3480 Indianapolis, Ind.
3600 Jacksonville, Fla.
3640 Jersey City, N.J.
3760 Kansas City, Mo.
3840 Knoxville, Tenn.
4120 Las Vegas, Nev.
4400 Little Rock, Ark.
4480 Los Angeles, Calif.
4520 Louisville, Ky.
4920 Memphis, Tenn.
5000 Miami, Fla.
5015 Middlesex, N.J.
5080 Milwaukee, Wis.
5120 Minneapolis-Saint Paul, Minn.
5190 Monmouth-Ocean City, N.J.
5360 Nashville, Tenn.
5380 Nassau-Suffolk, N.Y.
5483 New Haven, Conn.
5560 New Orleans, La.
5600 New York, N.Y.
5640 Newark, N.J.
5720 Norfolk, Va.
5775 Oakland, Calif.
5880 Oklahoma City, Okla.
5920 Omaha, Nebr.
5960 Orlando, Fla.
6000 Oxnard-Ventura, Calif.
6160 Philadelphia, Pa.
6200 Phoenix, Ariz.
6280 Pittsburgh, Pa.
6440 Portland, Oreg.
6483 Providence, R.I.
6640 Raleigh-Durham, N.C.
6760 Richmond, Va.
6780 Riverside-San Bernardino, Calif.
6840 Rochester, N.Y.
6920 Sacramento, Calif.
7040 Saint Louis, Mo.
7160 Salt Lake City, Utah
7240 San Antonio, Tex.
7320 San Diego, Calif.
7360 San Francisco, Calif.
7400 San Jose, Calif.
7440 San Juan, P.R.
7560 Scranton, Pa.
7600 Seattle, Wash.
8003 Springfield, Mass.
8160 Syracuse, N.Y.
8200 Tacoma, Wash.
8280 Tampa-Saint Petersburg, Fla.
8400 Toledo, Ohio
8520 Tucson, Ariz.
8560 Tulsa, Okla.
8840 Washington, D.C.
8960 West Palm Beach, Fla.
9160 Wilmington, Del.
9243 Worcester, Mass.