Sensitivity and specificity of neuropsychological tests for dementia and mild cognitive impairment in a sample of residential elderly in South Africa

A growing number of people are surviving into old age, with an associated increase in the prevalence of dementia.[1] It has been postulated that up to 80% of cases remain undiagnosed,[2] which has resulted in a call for dementia to be regarded as a global health priority.[3] The accurate detection of cognitive deficits due to dementia and mild cognitive impairment (MCI), a predementia stage, is important to distinguish these from normal age-associated cognitive deficits. Delaying the progression from MCI to dementia by even 1 year has been shown to result in significant cost savings.[4] The absence of reliable, universally acceptable biological and radiological markers for dementia necessitates the reliance on clinical assessments for a diagnosis,[5] supported by the assessment of cognitive disturbances using a range of screening tests. The use of neuropsychological screening tests allows for the assessment of specific cognitive domains, can distinguish age-related cognitive deficits from those due to MCI or dementia, and is superior to brief cognitive tools for which floor and ceiling effects threaten their validity.[6,7] There is currently no universally accepted battery of tests for MCI,[8,9] and the functioning of neuropsychological tests in low-resource residential settings in South Africa (SA) has not been widely tested. The objective of this study was to determine the sensitivity and specificity of a battery of neuropsychological tests in a sample of elderly persons living in a residential setting in SA.

engagement in the assessment procedures. The study comprised three stages of cognitive assessments, conducted sequentially.

In stage one, a random sample (n=302) of residents aged ≥60 years, its size based on a statistical calculation, was selected from all the homes. The Mini-Mental State Examination (MMSE),[10] a subjective memory rating scale,[11] and an informant questionnaire, the Deterioration Cognitive Observée (DECO),[12] were used to screen for dementia.[13] The sample size was calculated assuming a sensitivity and specificity of 85% for the MMSE[14] and a conservative estimate of 20% prevalence for dementia in residential homes, based on the reported range of 16%[15] to 75%.[16]

In stage two, 140 participants from stage one, including MMSE screen positives and a random selection of MMSE screen negatives, were clinically assessed by psychiatrists and categorised as having dementia, MCI or not being clinically cognitively impaired.[17] Diagnostic and Statistical Manual of Mental Disorders, 4th edition (text revised) criteria A and B for Alzheimer's and vascular dementia were used to assign a general diagnosis of dementia without reference to aetiology.[18] MCI was diagnosed using the criteria contained in the report of the International Working Group on mild cognitive impairment.[19]

In stage three, 140 participants from stage two were available and consented to the administration of a neuropsychological battery of tests. Of these, 2 participants died during the study and 20 either refused or were unavailable to participate. One person was unable to complete any of the neuropsychological tests and was excluded from the dataset. Of the remaining 117 participants, 9 had a diagnosis of dementia, 30 had MCI, and 78 did not meet diagnostic criteria and were classified as controls.
The overall study was approved by the Biomedical Research Ethics Committee of the University of KwaZulu-Natal, with residents providing written, informed consent to participate.
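The stage-one sample-size reasoning (an expected sensitivity and specificity of 85% and an assumed dementia prevalence of 20%) can be illustrated with a short sketch. It uses a common precision-based formula for diagnostic accuracy studies, which is an assumption here: the text does not state which formula or which precision the authors used. The function names and the 0.10 half-width are hypothetical, so the output is not expected to reproduce n=302.

```python
import math


def n_for_sensitivity(sens: float, prevalence: float, precision: float,
                      z: float = 1.96) -> int:
    """Total sample needed to estimate sensitivity to within +/- precision.

    The number of diseased participants required is z^2 * Se * (1 - Se) / d^2;
    dividing by prevalence converts that into a total sample size.
    """
    cases_needed = z ** 2 * sens * (1.0 - sens) / precision ** 2
    return math.ceil(cases_needed / prevalence)


def n_for_specificity(spec: float, prevalence: float, precision: float,
                      z: float = 1.96) -> int:
    """Total sample needed to estimate specificity to within +/- precision."""
    non_cases_needed = z ** 2 * spec * (1.0 - spec) / precision ** 2
    return math.ceil(non_cases_needed / (1.0 - prevalence))


# Values from the text: 85% sensitivity/specificity, 20% prevalence.
# ASSUMPTION: a 95% CI half-width of 0.10, which the text does not report.
n_sens = n_for_sensitivity(0.85, 0.20, 0.10)
n_spec = n_for_specificity(0.85, 0.20, 0.10)
print(n_sens, n_spec, max(n_sens, n_spec))
```

Because dementia cases are the minority at 20% prevalence, the sensitivity requirement drives the total; a narrower assumed precision increases the required sample accordingly.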

Neuropsychological tests
Eleven neuropsychological measures (including 29 tests) were administered by clinical psychologists in English in a single session at the participants' residences. The tests were: Rey auditory verbal learning tests (RAVLTs);[20] digit span[20] and digit symbol;[20] controlled oral word association test (COWAT F-A-S and Animal);[20] short story comprehension and recall (using an SA adaptation of the cowboy story, i.e. 'A Farmer from Transkei');[20] the token test (short version);[20] Rey complex figure (RCF);[20] trail making tests A and B (TMT-A and -B);[20] the clock drawing test[21] (the free-drawing version with the '10 past 11' time setting instruction using Rouleau's 10-point scoring system);[22] the Luria hand sequence;[20] and the Maze test.[20] The psychologists were blind to the participants' cognitive performances in the previous phases of the study.

Data analysis
Data were analysed using IBM SPSS (version 21.0) and MedCalc (version 12.5.0). Descriptive statistics were calculated for demographic variables, and mean neuropsychological scores and 95% confidence intervals (CIs) were calculated for each of the classified groups. Between-group comparisons were undertaken using non-parametric Kruskal-Wallis independent samples tests and Pearson's χ² tests or Fisher's exact tests where appropriate. Between-group tests were not analysed by race due to the low number of black participants (n=4). Receiver operating characteristic (ROC) curve analysis was conducted for each individual test to assess discrimination validity and diagnostic accuracy for dementia (n=9) v. non-dementia participants (controls and MCI cases; n=108), and for MCI participants (n=30) v. controls (n=78). For each test, the area under the curve (AUC) was calculated with 95% CIs. Swets's interpretation of AUC scores was used in this study: AUC = 0.5, non-informative; 0.5<AUC≤0.7, less accurate; 0.7<AUC≤0.9, moderately accurate; 0.9<AUC<1, highly accurate; and AUC = 1, a perfect test.[23] Optimal cut-off scores based on the Youden index, with the associated sensitivity and specificity values, were generated, and the significant tests were ranked for dementia and for MCI identification, respectively.
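The ROC steps described above (AUC, its interpretation bands, and a Youden-index cut-off) can be sketched in a few lines of plain Python. This is an illustration only, not the SPSS or MedCalc procedure used in the study: the scores are invented, the function names are hypothetical, and it assumes that lower scores indicate impairment so that a participant screens positive when the score falls below the cut-off.

```python
from typing import List, Tuple


def auc_mann_whitney(cases: List[float], controls: List[float]) -> float:
    """AUC as the share of case-control pairs in which the case scores lower.

    Ties count as half a pair. Lower scores are assumed to mean impairment.
    """
    wins = 0.0
    for c in cases:
        for k in controls:
            if c < k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(cases) * len(controls))


def swets_label(auc: float) -> str:
    """Map an AUC to the interpretation bands quoted in the text."""
    if auc >= 1.0:
        return "perfect"
    if auc > 0.9:
        return "highly accurate"
    if auc > 0.7:
        return "moderately accurate"
    if auc > 0.5:
        return "less accurate"
    return "non-informative"  # ASSUMPTION: values below 0.5 are treated the same


def youden_cutoff(cases: List[float],
                  controls: List[float]) -> Tuple[float, float, float, float]:
    """Return (cutoff, sensitivity, specificity, J) maximising J = Se + Sp - 1.

    A participant screens positive when score < cutoff.
    """
    candidates = sorted(set(cases) | set(controls))
    candidates.append(candidates[-1] + 1)  # cut-off that flags everyone
    best = None
    for cutoff in candidates:
        sens = sum(s < cutoff for s in cases) / len(cases)
        spec = sum(s >= cutoff for s in controls) / len(controls)
        j = sens + spec - 1.0
        if best is None or j > best[3]:
            best = (cutoff, sens, spec, j)
    return best


# Hypothetical scores for illustration only (not study data).
cases = [8, 10, 12, 13, 14, 16, 19, 22]
controls = [11, 15, 17, 18, 20, 21, 23, 24, 26, 28, 30, 33]

auc = auc_mann_whitney(cases, controls)
cutoff, sens, spec, j = youden_cutoff(cases, controls)
print(round(auc, 3), swets_label(auc), cutoff, round(sens, 3), round(spec, 3))
```

The pairwise AUC is equivalent to the area under the empirical ROC curve, which keeps the sketch free of external libraries; confidence intervals, as reported in the study, would need an additional method such as bootstrapping.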

Cognitive deficits by diagnostic groups and controls
Across the three classification categories, the differences in score means of the RAVLT (total), digit symbol (90 s), and COWAT Animal were highly significant, with p<0.001. With the exception of the digit span (forward), digit span total, COWAT A, narrative memory test (delayed recall), token test and the Luria hand sequence test, there were significant mean differences between all other tests in the three groups (Table 2). The mean score on most tests demonstrated a progressive declining pattern in cognitive performance from the control group to MCI to dementia subjects.

Discrimination validity: Dementia v. non-dementia participants
Using individual ROC curves to measure how well the tests discriminated between participants with dementia and without dementia (participants with MCI and controls), AUCs ranged from 0.519 for digit span (forward) to 0.828 for digit symbol (90 s). Fifteen of the 29 tests achieved a significant difference, with p<0.001 demonstrated on the RAVLT trial II (AUC = 0.753), RAVLT trial III (AUC = 0.775), RAVLT trial V (AUC = 0.812), RAVLT total (AUC = 0.805), RAVLT immediate recall (AUC = 0.770), RAVLT 20 min recall (AUC = 0.741), digit symbol (90 s) (AUC = 0.828), digit symbol (120 s) (AUC = 0.804) and RCF copy (AUC = 0.783) (Table 3). The 15 significant tests were ranked, based on optimum balance between sensitivity and specificity (Fig. 1). The most balanced test was the digit symbol, with a sensitivity of 87.5% and specificities of 80.6% at 90 s and 75.0% at 120 s, with cut-off scores of <15 and <20, respectively. Although the RAVLT trial III and the RAVLT total had the best discrimination sensitivity (100% at cut-off scores <7 and <34), this was at the expense of their specificities, which were 50.0% and 50.9%, respectively.
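The ranking by balance between sensitivity and specificity can be made concrete with the four pairs of values quoted above. The text does not say how balance was quantified, so the sketch below assumes one plausible reading: the smallest absolute gap between sensitivity and specificity, with the Youden index as a tie-breaker. The helper names are hypothetical.

```python
# Sensitivity and specificity (%) as reported in the text for dementia
# v. non-dementia participants.
reported = {
    "digit symbol (90 s)": (87.5, 80.6),
    "digit symbol (120 s)": (87.5, 75.0),
    "RAVLT trial III": (100.0, 50.0),
    "RAVLT total": (100.0, 50.9),
}


def balance_gap(sens: float, spec: float) -> float:
    """Absolute difference between sensitivity and specificity (percentage points)."""
    return abs(sens - spec)


def youden_j(sens: float, spec: float) -> float:
    """Youden index J = Se + Sp - 1, with inputs given as percentages."""
    return (sens + spec) / 100.0 - 1.0


# ASSUMPTION: "most balanced" means smallest gap; higher J breaks ties.
ranked = sorted(
    reported.items(),
    key=lambda item: (balance_gap(*item[1]), -youden_j(*item[1])),
)

for name, (sens, spec) in ranked:
    print(f"{name}: gap={balance_gap(sens, spec):.1f}, J={youden_j(sens, spec):.3f}")
```

Under this reading the digit symbol (90 s) ranks first, in line with the text, while the two RAVLT measures fall last because their perfect sensitivity is paired with specificities near 50%.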

Discrimination validity: MCI v. control participants
Using individual ROC curves to measure how well the tests discriminated between participants with MCI and controls, 17 of the 29 tests achieved a significant AUC (p<0.05) (Table 4). AUCs ranged from 0.621 to 0.754, with significance levels of p<0.001 on the digit symbol (90 s), COWAT F, COWAT Total, and COWAT Animal.
The 17 significant tests were ranked, based on optimum balance between sensitivity and specificity (Fig. 2). The most balanced subtest score, the RCF (delayed recall), had a sensitivity of 77.8% and specificity of 76.9%, at a cut-off score of ≤14. The highest sensitivity reported was 93.3%, which was for digit span (backwards) and digit symbol (120 s), at cut-off scores of ≤5 and ≤36, respectively, with the highest specificity being 83.3% (Maze total), at a cut-off score of >544.

Discussion
This research sought to determine the sensitivity and specificity of a battery of neuropsychological tests in elderly participants from a low-resource residential setting, who were diagnosed with MCI and dementia. While the tests used have been widely researched, to our knowledge, this is the first time that their diagnostic discriminability was evaluated in a heterogeneous elderly SA population. The findings are discussed in terms of screening for overall cognitive decline, dementia and MCI.

Screening for overall pathological cognitive decline
With the exception of recall on the narrative memory test and the token test, all the tests were able to significantly discriminate between controls and those with clinically significant cognitive impairment (dementia or MCI). The token test was found to be of little value, in keeping with previous research that found that ceiling effects limited the token test's utility.[24] The data from this research supported the conclusion that the clock drawing test has value for screening moderate-to-severe cognitive impairment, but is relatively poor at detecting milder forms of cognitive impairment.[25] The clock drawing test has been described as the ideal cognitive screening test owing to its ease of administration and scoring, its ability to assess a range of cognitive abilities and good psychometric properties.[26,27] While it has shown good correlation with the Mini-Mental State Examination (MMSE),[26] it has also shown moderate sensitivity and specificity in detecting executive dysfunction in patients who have a normal MMSE.[28] In this study, the clock drawing test's AUC of 0.732 supported its utility as a dementia assessment[27] and confirmed earlier findings of its ability to differentiate normal from pathological cognitive decline.[29] However, the sensitivity of 44.4% in this study was much lower than the mean of 85% reported in the literature for dementia screening, while the specificity of 91.7% compared favourably with the specificity of 85% reported in the literature.[26]

Screening for dementia
Memory disturbances are one of the most common cognitive complaints in the elderly, and can be attributed to either normal decline associated with ageing or dementia.[30] In this research, the RAVLTs demonstrated the best discrimination validity for dementia, thus confirming their utility in diagnosing the disorder. With the exception of trial I of the RAVLTs, all the RAVLT subtest measures displayed significance at p<0.001. These tests are brief tests of memory function that are easy to administer and sensitive to encoding, storage and retrieval of memory.[31] This finding is consistent with those of previous studies, showing that the RAVLTs are useful in distinguishing participants with dementia from those in a control group.[32] Some studies have also shown that the RAVLTs are useful in distinguishing normal participants from those with dementia associated with Alzheimer's disease and vascular dementia,[32,33] and that the RAVLTs are also able to predict the conversion to dementia in those individuals with subjective memory complaints[32] and MCI.[33]

However, the most balanced test was found to be the digit symbol (p<0.001), which measures attention and working memory. The mean scores on the digit symbol (90 s) in the study were lower than those reported by Hart et al.,[33] which may be owing to the lower mean education level of participants; education level is reported to affect performance on the digit symbol test. These findings suggest that the memory deficits evident in the RAVLT scores may be attributable to storage and retrieval difficulties, as the digit symbol test is more demanding of attention and concentration, consistent with the graded deterioration of cognitive functions in dementia.

Two other tests useful for screening of dementia and consistent with other studies were the RCF[34] and COWAT Animal.[35,36] The RCF findings were similar to a study that showed that it was able to distinguish patients with Alzheimer's disease and vascular dementia from those in the control group.[34] The COWAT semantic fluency ('animal') category displayed diagnostic significance for dementia (p=0.017), whereas phonological fluency (FAS) did not. The superiority of semantic fluency over other verbal fluency measures has been shown in SA[35] and elsewhere.[36] The animal category is also simple to administer and interpret, and could be valuable as a screening and/or diagnostic measure for use in low-income country settings. Although the TMT is a popular neuropsychological test and is included in most test batteries, in this study the TMT did not significantly discriminate dementia from non-dementia participants. However, this finding may be owing to the non-clinical sample in this study, as neuropsychological tests are considered to be less accurate in community samples.[29]

Screening for MCI
MCI is widely regarded as an intermediate stage between Alzheimer's disease and normal ageing, and has a heterogeneous cognitive profile.[37,38] It is therefore recommended that an extensive range of domains be covered when assessing for MCI, including language, memory, executive functions and attention.[9,38] Several tests displayed significant diagnostic accuracy for MCI, namely digit symbol (90 s), COWAT F, FAS Total and Animal, and the TMT-B, with the highest sensitivity reported for digit span (backwards), digit symbol (120 s), and Maze total. The most balanced test was the RCF (delayed recall). The findings showed a similarity to the Göteborg MCI study,[38] with significantly low scores on memory, attention and working memory, and visuospatial and executive functions.
Compared with the profile of tests that were significant for dementia, the language tests (COWAT F-A-S, Total and Animal) and the working memory/attention tests (digit span) showed little overlap, suggesting that in this sample at least, MCI and dementia may represent different clinical entities rather than MCI being a milder or earlier stage of dementia. Our findings therefore validate MCI as a distinct clinical entity that is distinguishable, with neuropsychological testing, from age-related and dementia-associated cognitive decline.

Study limitations
The lack of stratification of the sample by the demographic variables reviewed (race, gender, age, education), the small number of dementia cases and the specificity of the residential setting limited the generalisability of our findings. Though the sample reflected the population of the residential homes, it did not represent all race groups in the SA community adequately. In view of the confounding effects of culture on test performance, it is recommended that this research be repeated with a large sample of black African participants. As the ability to speak English was required for eligibility to participate in the study, the confounding effects of language cannot be excluded, as English was not the first language of all participants. Further, the discrimination capacity of the tests for dementia may have been diminished by the inclusion of MCI participants in the comparison group, which would have lowered the mean test scores of the non-dementia group and narrowed its difference from the dementia group.

Conclusion
A recommended neuropsychological test battery can be used effectively either to screen for or discriminate between early and later stages of cognitive impairment in the elderly. However, in line with Jacova's[7] evidence-based review, it is recommended that neuropsychological tests should not be used alone for diagnostic purposes; they should be part of a clinically integrative process, used selectively to aid in distinguishing normal age-related cognition from MCI and early dementia, for the differential diagnosis of cognitive impairment and possibly for assessing the risk of progression from MCI to dementia.[7]

Table 1. Demographic data per diagnostic group
The exceptions to the progressive decline in mean scores from the control group to MCI to dementia subjects were digit span total, with no difference in mean scores between MCI (12.2) and dementia (12.2) (p=0.066), and the COWAT group, where the mean dementia group scores were slightly better than the MCI group scores, with COWAT F (5.5 v. 4.6; p=0.010), COWAT A (3.8 v. 3.0; p=0.102), COWAT S (5.5 v. 5.4; p=0.011) and COWAT Total (14.4 v. 12.4; p=0.002). Similarly, for the RCF (copy), the MCI score (40.4) was better than that of the controls (30.7) (p=0.017).

Table 4. ROC analysis tests for MCI v. controls
ROC = receiver operating characteristic; AUC = area under the curve; CI = confidence interval; RAVLT = Rey auditory verbal learning tests; COWAT = controlled oral word association test; RCF = Rey complex figure; TMT = trail making test. * Significance level set at p<0.05 and 95% CIs.