
Appendix A
HRY DataBank Methodology
Formation of the HRY DataBank Work Group
CSAP, through its contractor The CDM Group, identified a work group made up of CSAP staff, program evaluators, and The CDM Group staff to design the HRY DataBank and develop its implementation. Together these members form the High-Risk Populations Work Group. The work group first met in January 1995 and has met quarterly since then to discuss document review, coding, and data entry procedures. The group also discusses HRY DataBank products and addresses issues related to future development of the database. The five work group evaluators are an integral part of the document review process, forming the core of the staff that performs evaluation review and coding.
Document Review
After reviewing hundreds of documents for content, project staff determined that the most important documents for the HRY DataBank were the initial grant application, special papers written in response to CSAP calls for findings, and final reports. In 1996, the Evaluation Status Reports (ESRs) were added to this list. Grantees were required to submit ESRs in their Continuing Applications as comprehensive interim evaluation reports.
Document Coding
The CDM Group staff and consultant evaluators review grantee documents and extract descriptive and evaluative information using two coding forms. One form was designed to collect descriptive data. A second form was designed to capture information about the evaluation design and methodology and to identify and categorize reported findings. The evaluation coding form is used by consultant evaluators.
Initially, all descriptive information was coded verbatim. Working from all of the verbatim information, senior coders defined the final coding schemas through a consensus procedure. Rules for categorizing and coding items in each report were formalized in a Coder’s Manual, which was distributed to everyone coding descriptive information. The manual was used both to train coders and as a reference document. In addition, to maximize consistency in coding methodology, a team of coders met regularly to review coding questions and experiences. The manual was updated as new coding categories became evident to the team.
Components of the HRY DataBank
Descriptive component
The HRY DataBank contains basic descriptive information on every HRY and PPWI program funded by CSAP. Descriptive data include the following:
- Identifier Data: Grant number, project name, grantee agency, city, state, and funding period.
- Target Population: Age, ethnicity, gender, and risk factors.
- Interventions: Activities and number of sites.
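As an illustration only, the descriptive fields listed above could be held in a record like the following Python sketch. The class name, field names, and types are assumptions made for this example; the source does not describe how the HRY DataBank actually stores these items.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DescriptiveRecord:
    """Hypothetical container for one program's descriptive data."""

    # Identifier data
    grant_number: str
    project_name: str
    grantee_agency: str
    city: str
    state: str
    funding_period: str
    # Target population (free-text categories; the real coding scheme is not given here)
    ages: List[str] = field(default_factory=list)
    ethnicities: List[str] = field(default_factory=list)
    genders: List[str] = field(default_factory=list)
    risk_factors: List[str] = field(default_factory=list)
    # Interventions
    activities: List[str] = field(default_factory=list)
    number_of_sites: int = 0


# Placeholder values, not taken from any real grant.
example = DescriptiveRecord(
    grant_number="0000",
    project_name="Example Project",
    grantee_agency="Example Agency",
    city="Example City",
    state="XX",
    funding_period="unknown",
    activities=["mentoring"],
    number_of_sites=2,
)
```

Keeping the three groups of fields (identifier, target population, interventions) in one record mirrors the list above and makes it easy to see which items a coder still has to fill in.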
Evaluation component
Although the data in the descriptive component provide an overview of project characteristics and can provide general information about individual grants, the overarching purpose of the HRY DataBank is to provide a systematic record of outcomes. The data related to outcomes give the HRY DataBank its potential as an analytic and/or planning tool. Not all programs were coded for findings. In general, evaluation coding was initiated only for the subset of programs that reported outcomes and documented their intervention implementation and evaluation research methods sufficiently for reviewer understanding. The program evaluation component of the HRY DataBank contains the following data on 147 different HRY initiatives:
- Evaluation Methodologies: Qualitative and quantitative study designs.
- Treatment and Comparison Groups: Sample size, level and treatment of attrition, presence and nature of comparison groups; method of assignment to treatment or comparison group; initial comparability of treatment and comparison groups; and method of correcting for noncomparability.
- Implementation of Intervention Activities: Dosage, outcome measures, and instruments.
- Findings Identification: Findings for HRY grants are arranged by CSAP domain (individual, peer, family, school, community, and society). In addition, a substance abuse domain is used to isolate findings directly related to changes in drug use, knowledge, and attitudes. When reported, results of statistical analyses are indicated.
- Findings Ratings: Each reported finding is rated for effectiveness and for level of confidence.
- Domain Ratings: A program may have findings in several domains. For each program, the domains in which findings have been reported are assigned ratings based on methodological rigor and overall level of effect.
Criteria for Review of Rigorous Evaluation Designs
Teams of two expert evaluation consultants reviewed and coded program evaluation status reports, findings papers, and final reports. To be viewed as producing credible results (“above the line” or ≥3 on a 5-point integrity Likert-type scale), quantitative studies were examined to ensure they possessed relatively rigorous research designs having the following characteristics:
- Experimental design in which participants were randomly assigned to an experimental or control group in which no services were offered, an alternative service to the experimental program was offered, or program services were offered only after a protracted delay.
Or
- Quasi-experimental design in which participants were not randomly assigned to a program treatment or to a comparison group in which they received no treatment. Quasi-experimental designs also include those designs that randomly assign blocks of participants to treatment and comparison conditions (e.g., schools) and cohort sequential designs.
And
- Pre- and posttests were given to both treatment and comparison groups, or change was assessed through the use of clinically meaningful measures.
- Reliable and valid instruments with cultural relevance were used to collect data on the treatment and comparison groups.
- Appropriate statistical procedures were consistently used in data analysis.
- No other internal design conditions (e.g., high or differential attrition, poor program implementation) or external conditions (e.g., a municipality implementing an intervention) affecting the treatment and/or comparison samples could reasonably explain the observed results.
To be viewed as producing credible results (“above the line”), qualitative studies were examined to ensure that:
- They clearly demonstrated evidence of a systematic, replicable, unbiased approach to data collection (e.g., evidence of standard procedures for conducting interviews and focus groups). Where observations were reported, evidence of a protocol or standard format for recording information was required.
- Measures used were reliable, possessing at least face validity, and/or had some standard against which to gauge change (e.g., clinically anchored measures or performance measures).
Quantitative or qualitative studies that satisfied these criteria were judged to be at least moderately rigorous and capable of producing quantitative data in which we have some confidence. These studies were coded completely. Findings from efforts that failed to meet these criteria were usually not coded for integrity or effectiveness because, by definition, the credibility of their reported results was suspect.
Assessing Reported Findings
To gauge both direction and magnitude of effect, pairs of reviewers extracted and rated each individual finding reported. Initially, test statistics accompanying a reported finding were translated into an estimated or “rough” effect size. To prevent incautious use of these estimates as true effect sizes, they were then translated into ratings on a 7-point Likert-type scale, with 1 indicating a highly significant or meaningful negative effect, 4 indicating no meaningful or significant effect, and 7 indicating a highly significant or meaningful positive effect. Effect size translation tables, along with conventions for scoring estimated effect sizes on the Likert-type scale, are presented in Appendix B. Because ratings of direction and magnitude are not exact estimates of effect size, the constructed metric used in the DataBank should serve only as a guide for program planners, policy analysts, and researchers looking through the database to identify promising, effective, or ineffective practices.
Quantitative and qualitative findings were also rated for integrity. This rating indicates the degree of confidence the reviewer has in the reported finding. The integrity ratings use a 5-point Likert-type scale on which 1 indicates no confidence and 5 indicates high confidence. In assessing the level of confidence in findings, it was important to consider both the study design from which the finding was extracted and the quality of implementation of that study design. For example, a true experimental study with high and differential attrition inspires little confidence. On the other hand, confidence may be relatively high for data gathered during the course of a well-executed ethnographic study, a rigorously implemented and analyzed set of focus groups, or a post-only study with a comparison using clinically significant or objective and comprehensive record data.
Domain Ratings
After individual findings related to program objectives were rated for effectiveness and level of confidence, they were grouped by domain, and overall ratings for the effects and level of confidence within each outcome domain were generated. The overall measure of effectiveness is somewhat subjective because individual raters may have differed to some degree in how they weighted the evidence presented. Level of confidence assessments tended to vary less because virtually all findings within a domain derived from the same research design.
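The grouping step can be pictured with a short Python sketch. The `Finding` type, the `group_by_domain` function, and the sample ratings are hypothetical and exist only for this example. The sketch stops at grouping on purpose: the overall domain ratings were reviewer judgments, and no formula for them is given here.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple

# The six CSAP domains plus the separate substance abuse domain.
DOMAINS = (
    "individual", "peer", "family", "school",
    "community", "society", "substance abuse",
)


class Finding(NamedTuple):
    """Hypothetical record for one rated finding."""
    domain: str
    effect: int      # 1-7 effectiveness rating
    integrity: int   # 1-5 level-of-confidence rating


def group_by_domain(findings: List[Finding]) -> Dict[str, List[Finding]]:
    """Collect rated findings under their outcome domain."""
    grouped: Dict[str, List[Finding]] = defaultdict(list)
    for finding in findings:
        if finding.domain not in DOMAINS:
            raise ValueError(f"unknown domain: {finding.domain}")
        grouped[finding.domain].append(finding)
    return dict(grouped)


# Made-up ratings for illustration.
sample = [
    Finding("family", 5, 4),
    Finding("school", 4, 3),
    Finding("family", 6, 4),
]
by_domain = group_by_domain(sample)
```

Each list in `by_domain` is what a reviewer would weigh when assigning that domain's overall effect and confidence ratings.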
Consensus Among Reviewers
Pairs of trained evaluator reviewers extracted both the descriptive research and findings information from appropriate program documents. The following criteria were used to determine if the paired reviewers’ ratings were unacceptably divergent:
- If an overall rigor/integrity rating by domain was scored a 2 (weak, at best some confidence) by one reviewer and scored a 3 (mixed, some weak, some strong characteristics) by the other reviewer or if their ratings were on opposite sides of the scale (e.g., 2/4).
- If their overall effect ratings by domain were 2 or more points apart (e.g., 5/7).
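The two divergence criteria above can be sketched as a small Python check. The function names are hypothetical, and "opposite sides of the scale" is read here as falling on either side of the midpoint (3) of the 5-point integrity scale, which is an assumption consistent with the 2/4 example.

```python
from typing import Tuple

INTEGRITY_MIDPOINT = 3  # assumed middle of the 1-5 integrity scale


def integrity_divergent(a: int, b: int) -> bool:
    """True if two domain integrity ratings are unacceptably divergent."""
    low, high = sorted((a, b))
    # One reviewer scored 2 and the other scored 3.
    two_versus_three = (low, high) == (2, 3)
    # Ratings on opposite sides of the scale, e.g. 2/4.
    opposite_sides = low < INTEGRITY_MIDPOINT < high
    # A 1/3 pair is not covered by either stated criterion, so this
    # literal reading leaves it unflagged.
    return two_versus_three or opposite_sides


def effect_divergent(a: int, b: int) -> bool:
    """True if two domain effect ratings are 2 or more points apart."""
    return abs(a - b) >= 2


def needs_consensus_discussion(
    integrity_pair: Tuple[int, int], effect_pair: Tuple[int, int]
) -> bool:
    """True if either criterion flags the paired reviewers' ratings."""
    return integrity_divergent(*integrity_pair) or effect_divergent(*effect_pair)
```

For instance, `needs_consensus_discussion((4, 4), (5, 7))` flags the pair because the effect ratings differ by two points even though the integrity ratings agree.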
If the evaluators disagreed, they contacted one another (usually by phone) to discuss the basis for their discrepant ratings. Each evaluator was provided a copy of the co-reviewer’s original coding sheets. After consensus was reached, project staff were notified so the acceptable ratings could be entered into the HRY DataBank. Overall, 37 projects were initially identified as meeting these criteria for inclusion.
Reviews for Quality of Implementation
Subsequent to this first set of reviews, if programs produced data in which reviewers felt at least moderately confident, a second set of reviews took place. Here, a pair of outside expert reviewers scrutinized the source documents for information on quality of implementation as well as the quality of the evaluation research reported. In addition, because this review focused on identifying those projects clearly demonstrating their effectiveness, the criteria for inclusion were set higher for research integrity (ratings ≥4), and the positivity of data was reported. Findings were examined for both consistency (across measures within a domain and across domains) and direction of effects. Again, consensus among reviewers was required before any final decision regarding the disposition of a project was made. Eight programs were identified as meeting these criteria for inclusion.
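The numeric side of the two review stages can be summarized in a rough Python sketch. It reads the thresholds as an integrity rating of 3 or higher for the first review and 4 or higher for the second. The function name, the stage labels, and the use of an effect rating above 4 (the "no effect" point on the 7-point scale) as a stand-in for positive results are all assumptions made for this example. The actual reviews also weighed implementation quality, consistency of findings, and reviewer consensus, none of which is modeled here.

```python
def screening_stage(integrity: int, effect: int) -> str:
    """Classify a domain rating pair by the numeric thresholds only."""
    if not 1 <= integrity <= 5:
        raise ValueError("integrity must be on the 1-5 scale")
    if not 1 <= effect <= 7:
        raise ValueError("effect must be on the 1-7 scale")

    if integrity < 3:
        # Below the line: results not considered credible.
        return "below the line"
    if integrity >= 4 and effect > 4:
        # Higher integrity bar plus a positive effect (assumed cutoff).
        return "meets second-review thresholds"
    # Credible, but short of the stricter second-review thresholds.
    return "above the line"
```

A rating pair such as integrity 3 and effect 6 would pass the first screen but not the stricter second one under these assumed cutoffs.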
Last Updated: March 4, 2002