ASSESSMENT OF THE DATA QUALITY

Coale and Li (1991) concluded that, among all developing countries, the Chinese Han population is probably the only one that has both large numbers of extremely old people and age reporting about as reliable as that in developed countries. A pilot study for our healthy longevity project reached very similar conclusions. That study validated the ages of Chinese Han centenarians by rigorously comparing demographic indices of age reporting with those of the Swedish, Japanese, French, and Italian oldest old populations. It concluded that age reporting among the Chinese Han oldest old is generally as good as that in Western countries up to age 105, and that age reporting after age 105 is somewhat questionable (Wang, Zeng, Jeune, and Vaupel, 1998). The “super-centenarians” who reported ages of 106 and above (156 in total in our sample) will be investigated later case by case and are not included in the analysis in this paper.

How good is the age reporting in our 1998 survey, which also included some minority ethnic groups (about 7 percent of the sample) living in the 22 Han-dominated provinces? One of the most effective ways to answer this question is to evaluate the age distribution of the interviewed centenarians. We found the age distribution of the centenarians interviewed in our 1998 survey to be very similar to that of Swedish centenarians; the curves for Chinese and Swedish male centenarians in particular are almost identical (see Figure 1 on page 7). This has led us to believe that age reporting in our 1998 survey is generally good.

The Han, Zhuang, Hui, Yao, Manchu, Korean, and Mongolian ethnic groups make up 92.8, 4.4, 1.3, 0.7, 0.3, 0.1, and 0.03 percent of the sample, respectively. Together these seven ethnic groups account for 99.7 percent of the sample, and their Whipple’s Index and Mayer’s Index values based on the 1982 and 1990 census data all indicate good or “very good” quality of age reporting (see Table 1). A comparison of the age distributions and sex ratios of Han and Swedish centenarians shows a high degree of similarity. When the centenarians from the six minority ethnic groups are compared with their Swedish counterparts, the differences are somewhat greater, but the data are still generally acceptable (see Table 2). Some ethnic minority groups living in the other nine provinces do not report age accurately (e.g. the Uygurs, who make up 47.5 percent of the total population of Xinjiang); they are not included in our sample.

Table 1 Ethnic Composition of the Chinese 1998 Survey and the Age Reporting Indices

| Ethnic group | Percent of the sample | Whipple’s Index, 1982 census | Whipple’s Index, 1990 census | Mayer’s Index, 1982 census | Mayer’s Index, 1990 census |
|---|---|---|---|---|---|
| Han | 92.8 | 101.5 | 100.5 | 1.48 | 2.85 |
| Zhuang | 4.44 | 100.1 | 102.1 | 2.79 | 2.25 |
| Hui | 1.33 | 101.4 | 102.4 | 1.81 | 2.71 |
| Yao | 0.66 | 101.1 | 101.1 | 3.58 | 2.28 |
| Manchu | 0.34 | 100.1 | 105.3 | 2.57 | 3.13 |
| Korean | 0.11 | 103.2 | 104.3 | 1.96 | 2.33 |
| Mongolian | 0.03 | 99.7 | 104.0 | 2.56 | 2.45 |

Whipple’s Index: <105 very good, 105-110 good, 110-125 so-so, >125 poor.
Mayer’s Index: <10 good, 10-20 so-so, >20 poor.
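As a rough illustration of how a digit-preference measure such as Whipple’s Index is obtained, the sketch below assumes the conventional definition (ages 23 to 62, preference for ages ending in 0 or 5), which the text itself does not spell out. The function name and the toy populations are hypothetical; the census tabulations behind Table 1 are not reproduced here.

```python
def whipples_index(pop_by_age):
    """Whipple's Index from a {single year of age: count} mapping.

    Assumes the conventional definition: ages 23-62, preference for
    terminal digits 0 and 5. A value of 100 means no heaping; 500 means
    everyone reported an age ending in 0 or 5.
    """
    ages = range(23, 63)  # 23..62 inclusive
    total = sum(pop_by_age.get(age, 0) for age in ages)
    if total == 0:
        raise ValueError("no population reported at ages 23-62")
    on_0_or_5 = sum(pop_by_age.get(age, 0) for age in ages if age % 5 == 0)
    # Ages ending in 0 or 5 are one fifth of the range, so compare the
    # heaped count with one fifth of the total.
    return 100.0 * on_0_or_5 / (total / 5.0)


# Toy (made-up) populations, not census data.
no_heaping = {age: 1000 for age in range(23, 63)}
some_heaping = {age: (1200 if age % 5 == 0 else 950) for age in range(23, 63)}

print(round(whipples_index(no_heaping), 1))    # 100.0
print(round(whipples_index(some_heaping), 1))  # 120.0
```

Under the thresholds in the notes to Table 1, the first toy population would rate “very good” and the second “so-so”.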

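Table 2 Age Distribution (Males and Females Combined) and Sex Ratio of Han and Minority Chinese Centenarians Interviewed in the 1998 Survey, as Compared to Swedish Centenarians

Columns 100 to 109 and Total give the age distribution in percent.

| Group | 100 | 101 | 102 | 103 | 104 | 105 | 106 | 107 | 108 | 109 | Total | Sex ratio, ages 100-104 | Sex ratio, ages 105-109 | Total # (ages 100-109) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Han | 38.61 | 24.49 | 14.57 | 8.94 | 4.92 | 3.13 | 2.14 | 1.61 | 0.85 | 0.76 | 100 | 26.19 | 21.02 | 2249 |
| Minority | 30.00 | 24.71 | 15.88 | 8.82 | 9.41 | 2.35 | 4.12 | 2.35 | 1.76 | 0.59 | 100 | 17.05 | 18.75 | 179 |
| Swedish | 44.64 | 25.40 | 14.29 | 7.72 | 4.07 | 2.25 | 1.04 | 0.41 | 0.14 | 0.04 | 100 | 24.53 | 19.34 | 5556 |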
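One simple way to put a number on how similar two age distributions are is the index of dissimilarity (half the sum of absolute differences between the two percentage distributions). This is not a measure reported by the authors; it is used here only as an illustrative assumption, applied to the percentages in Table 2. The variable and function names are hypothetical.

```python
# Percent of centenarians at each single year of age 100..109 (from Table 2).
han = [38.61, 24.49, 14.57, 8.94, 4.92, 3.13, 2.14, 1.61, 0.85, 0.76]
minority = [30.00, 24.71, 15.88, 8.82, 9.41, 2.35, 4.12, 2.35, 1.76, 0.59]
swedish = [44.64, 25.40, 14.29, 7.72, 4.07, 2.25, 1.04, 0.41, 0.14, 0.04]


def dissimilarity(p, q):
    """Index of dissimilarity between two percentage distributions.

    0 means identical distributions; 100 means no overlap at all.
    """
    if len(p) != len(q):
        raise ValueError("distributions must cover the same age groups")
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


han_vs_swedish = dissimilarity(han, swedish)
minority_vs_swedish = dissimilarity(minority, swedish)
print(round(han_vs_swedish, 2), round(minority_vs_swedish, 2))
```

With the Table 2 percentages, the Han distribution differs from the Swedish one by about 7 percentage points and the minority distribution by about 15, which is in line with the statement that the differences for the minority groups are somewhat greater.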

The reliability coefficients of the 10 categories of variables are rather good (see Table 3). For example, the ADL reliability coefficient is 0.88 in our 1998 survey, compared with 0.87 in the Duke Older American Resources and Services Program survey (Fillenbaum, 1988) and 0.89 in the Canadian 1991-92 elderly survey (Penning and Strain, 1994). Another way of measuring reliability is to conduct a factor analysis to see whether interviewees’ answers to questions of the same category are consistent (Stewart et al., 1992). If they are, the answers to questions of the same category should load on the same component, and the values of their coefficients should be close to each other. Our factor analysis shows that the consistency and reliability of the answers are rather good (see Table 4).

Table 3 Reliability Coefficients

| Category of variables | # of questions | RC (a) |
|---|---|---|
| 1. Orientation | 5 | 0.9395 |
| 2. Registration | 3 | 0.9886 |
| 3. Calculation | 5 | 0.9910 |
| 4. Recall | 5 | 0.9850 |
| 5. Language | 6 | 0.8923 |
| 6. Unable to answer those questions which are supposed to be answered by interviewees | 32 | 0.9827 |
| 7. Correctly answer those questions concerning orientation, registration, attention, calculation, recall and language which are supposed to be answered by interviewees | 23 | 0.9530 |
| 8. Upper extremities | 2 | 0.8086 |
| 9. Capacity of body movement (b) | 2 | 0.8452 |
| 10. ADL | 6 | 0.8790 |
| ADL (U.S.A.) (c) | 6 | 0.84 |
| ADL (Canada) (d) | 18 | 0.89 |

Notes: a. RC = Reliability Coefficient. We used Cronbach’s alpha to compute RC as follows:

$R_{tt}(\alpha) = 1 - \frac{MS_{within}}{MS_{respondents}} = \frac{MS_{respondents} - MS_{within}}{MS_{respondents}}$

where MS denotes mean squares. See Anita L. Stewart et al., 1992: 82.

b. Includes standing up from a chair and picking up a book from the floor.
c. See G.G. Fillenbaum, 1988: 24.
d. See Margaret J. Penning et al., 1994: S204.
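The mean-squares form of Cronbach’s alpha in note a can be sketched as follows. This is a minimal illustration, not the authors’ computation: it assumes that the “within” mean square is the residual mean square from a two-way (respondents by items) analysis of variance, and the function names and toy scores are hypothetical. A second function computes alpha from item variances as a cross-check, since the two forms should agree.

```python
from statistics import variance


def cronbach_alpha_anova(scores):
    """Alpha as 1 - MS_within / MS_respondents.

    `scores` is a list of rows, one row per respondent, one column per item.
    Assumption: MS_within is the residual mean square of a two-way ANOVA.
    """
    n, k = len(scores), len(scores[0])
    if n < 2 or k < 2 or any(len(row) != k for row in scores):
        raise ValueError("need a complete table with >= 2 respondents and >= 2 items")
    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_respondents = k * sum((m - grand) ** 2 for m in row_means)
    ss_items = n * sum((m - grand) ** 2 for m in col_means)
    ss_within = ss_total - ss_respondents - ss_items  # residual sum of squares
    ms_respondents = ss_respondents / (n - 1)
    ms_within = ss_within / ((n - 1) * (k - 1))
    if ms_respondents == 0:
        raise ValueError("respondents do not differ; alpha is undefined")
    return 1.0 - ms_within / ms_respondents


def cronbach_alpha_variance(scores):
    """Alpha from item variances and the variance of the total score."""
    k = len(scores[0])
    item_vars = [variance([row[j] for row in scores]) for j in range(k)]
    total_var = variance([sum(row) for row in scores])
    return k / (k - 1) * (1.0 - sum(item_vars) / total_var)


# Toy (made-up) answers of five respondents to three items.
toy_scores = [[1, 1, 2], [2, 2, 3], [3, 3, 3], [4, 5, 4], [5, 5, 5]]
print(round(cronbach_alpha_anova(toy_scores), 4))
print(round(cronbach_alpha_variance(toy_scores), 4))
```

Both functions should print the same value for the same table, which is a quick way to check that a mean-squares implementation is consistent with the more familiar variance form.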

Table 4 Factor Analysis - Pattern Matrix

Notes: 1. The estimated coefficients of items in the same category are all consistent, except those for the language-b item, which are not consistent with the other language items because the language-b item was not properly designed. It asked the interviewee to repeat the sentence “Grow melons and beans in front of the house and in back of the house”, and this sentence turned out to be simply too difficult for the oldest old persons to understand and repeat.
2. SPSS software (version 9.0, 1998) was used to conduct the principal component analysis. The rotation method was Oblimin with Kaiser normalization.

The rates of logically inconsistent answers seem reasonably low (see Table 5). The rates of “Don’t know” and “Missing” answers are also relatively small (see Table 6). These indicators have led us to believe that the data quality of our 1998 survey is generally good. However, we also realize that some problems exist in the data set. This is to be expected, given that it is the first large survey of oldest old people conducted in a developing country.

For example, the proportion of oldest old living in institutions might be over-estimated. We noticed that the proportions living in institutions based on the 1998 survey were substantially higher than the corresponding figures based on the 1990 census. This increase might be due to two main factors. First, because of rapid economic development and increased governmental and societal attention to social services for the elderly since 1990, there was a real increase in institutional facilities between 1990 and 1998. Second, as discussed earlier, for each centenarian in the randomly selected half of the counties and cities, we instructed our survey teams to try to interview one nearby octogenarian and one nearby nonagenarian of pre-specified age and sex. If a centenarian resided in or close to an institution, the matched octogenarian and nonagenarian were likely to be selected from that institution. The institutionalized oldest old might therefore be over-sampled.

Another example is that a substantially higher proportion of the oldest old, especially centenarians, were unable to answer the personality-related questions than other types of questions. This is because some illiterate oldest old, especially centenarians, could not understand the questions about personality, which ask respondents to compare themselves to a typical person with a specified disposition (e.g. always looking on the bright side of things).

We expect that further evaluation, accompanied by subsequent in-depth analysis, will provide information that permits a more critical examination of the data quality.

Table 5 Percentage of Oldest Old with Inconsistent Responses

| Items of inconsistent response | # | % |
|---|---|---|
| 1. Age at beginning smoking is greater than age at stopping smoking | 2 | 0.02 |
| 2. Didn’t smoke in the past, but gave age of beginning (or stopping) smoking | 11 | 0.12 |
| 3. Didn’t drink alcohol in the past, but gave age of beginning (or stopping) drinking | 35 | 0.39 |
| 4. Age of beginning exercising is greater than age of stopping exercising | 3 | 0.03 |
| 5. Didn’t exercise in the past, but gave age of beginning (or stopping) exercising | 36 | 0.40 |
| 6. Age of beginning doing physical labor is greater than age of stopping doing it | 1 | 0.01 |
| 7. Didn’t do physical labor in the past, but gave age of beginning (or stopping) doing it | 93 | 1.04 |
| 8. Age difference between mother’s age and the interviewee’s age at time of mother’s death is less than 12 years | 29 | 0.32 |
| 9. Age difference between father’s age and the interviewee’s age at time of father’s death is less than 12 years | 26 | 0.29 |
| 10. Reported “no sibling”, but gave sibling information, or vice versa | 7 | 0.08 |
| 11. Age difference between interviewee and his or her eldest child is less than 12 years | 42 | 0.47 |
| 12. Reported bedridden, but able to stand up from a chair | 114 | 1.28 |
| 13. Fully dependent (ADL), but able to turn around and stand up from a chair without help | 256 | 2.86 |
| 14. Unable to stand up from a chair, but doing housework and field work every day | 21 | 0.23 |
| 15. Unable to turn around by self, but doing housework and field work every day | 128 | 1.43 |
| 16. Reported bedridden, but doing housework and field work every day | 12 | 0.13 |
| 17. Some items reported by proxy, but interviewer chose “no one helped the interviewee to answer any question” (a) | 1181 | 13.21 |
| Oldest old with at least one of the inconsistencies listed above (not including the proxy inconsistency described in item 17) | 756 | 8.46 |

Note: a. This inconsistency might be caused by interviewers misunderstanding the question “Did anyone help the interviewee to answer any question?”, which was addressed to the interviewer. They might have taken it to refer only to those questions that must be answered by the interviewee.
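Logical checks of the kind listed in Table 5 can be expressed as simple rules applied to each interview record. The sketch below is only an illustration under stated assumptions: the field names are hypothetical (the survey’s actual variable names are not given here), only some of the Table 5 items are covered, and the start-before-stop rule is applied uniformly to all four activities for brevity.

```python
def find_inconsistencies(rec):
    """Return labels of the rules that one interview record violates.

    `rec` is a dict with hypothetical field names; missing fields are skipped.
    """
    flags = []

    # Rules like items 1-7: past activity versus reported start/stop ages.
    for activity in ("smoking", "drinking", "exercise", "labor"):
        ever = rec.get(f"ever_{activity}")
        start = rec.get(f"{activity}_start_age")
        stop = rec.get(f"{activity}_stop_age")
        if ever is False and (start is not None or stop is not None):
            flags.append(f"{activity}: never did it but gave a start or stop age")
        if start is not None and stop is not None and start > stop:
            flags.append(f"{activity}: start age greater than stop age")

    # Rules like items 8-9: parent less than 12 years older than interviewee.
    for parent in ("mother", "father"):
        parent_age = rec.get(f"{parent}_age_at_death")
        own_age = rec.get(f"own_age_at_{parent}_death")
        if parent_age is not None and own_age is not None:
            if parent_age - own_age < 12:
                flags.append(f"{parent}: less than 12 years older than interviewee")

    # Rule like item 11: eldest child less than 12 years younger.
    age, child_age = rec.get("age"), rec.get("eldest_child_age")
    if age is not None and child_age is not None and age - child_age < 12:
        flags.append("eldest child: less than 12 years younger than interviewee")

    # Rule like item 12: bedridden but able to stand up from a chair.
    if rec.get("bedridden") is True and rec.get("can_stand_from_chair") is True:
        flags.append("bedridden but able to stand up from a chair")

    return flags


# Made-up record that violates three rules.
example = {
    "age": 101,
    "ever_smoking": False,
    "smoking_start_age": 30,
    "eldest_child_age": 95,
    "bedridden": True,
    "can_stand_from_chair": True,
}
for flag in find_inconsistencies(example):
    print(flag)
```

Counting the records with a non-empty result, and dividing by the number of interviews, would give a rate comparable in spirit to the last row of Table 5.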

Table 6 Percentage of “Don’t know” and “Missing” Answers among all Questions Asked in the Chinese 1998 Healthy Longevity Survey

| Age | Males: “Don’t know” (%) | Males: “Missing” (%) | Males: Total (%) | Females: “Don’t know” (%) | Females: “Missing” (%) | Females: Total (%) |
|---|---|---|---|---|---|---|
| 80-89 | 1.18 | 1.64 | 2.82 | 1.33 | 1.24 | 2.57 |
| 90-99 | 1.51 | 1.07 | 2.58 | 1.85 | 0.87 | 2.72 |
| 100+ | 1.71 | 0.95 | 2.66 | 2.20 | 0.98 | 3.18 |

China Population and Development Research Center
12 Dahuisi Road, Haidian District, Beijing 100081
P.R. China
Email: info@cpirc.org