TO: Doug Marker, Fish and Wildlife Division Director, Northwest Power Planning Council
FROM: ISRP
SUBJECT: Review of March 27, 2002 Draft Guidelines for Action Effectiveness Research Proposals for FCRPS Offsite Mitigation Habitat Measures by C. Paulsen, S. Katz, T. Hillman, A. Giorgi, C. Jordon, M. Newsom, and J. Geiselman
At your and BPA's request, the ISRP reviewed the March 27, 2002 Draft "Guidelines for Action Effectiveness Research Proposals for FCRPS Offsite Mitigation Habitat Measures" (Paulsen et al.), which BPA would like to reference in its letter for the mainstem and systemwide solicitation. The Action Agencies (Bonneville Power Administration, United States Army Corps of Engineers, and the Bureau of Reclamation) and the National Marine Fisheries Service (NMFS) have developed the proposed guidelines for sponsors and reviewers of action effectiveness research projects. For this review, the ISRP was asked specifically to address three questions, namely:
- Do the guidelines clearly identify the general approach and scope of research projects needed under BiOp RPA Action 183?
- Do the project-specific guidelines bound the projects at an appropriate level?
- Are the statistical design requirements for detectability of effects at an appropriate level for the action categories targeted?
The short answer to all three questions is "No," but the ISRP recognizes that the challenge of designing an adequate monitoring program is very great, and we also recognize the present document for evaluating effectiveness of actions as an important step in the right direction. Generally speaking, the ISRP was gratified to see this sort of attention being paid to planning for monitoring, and we hope that these discussions will continue until a fully adequate overall monitoring design is defined and detailed guidance and criteria are developed for monitoring proposals. In the interim, if the present document is released, we think its effects will be incrementally positive, but the document does not provide sufficient guidance to ensure generation of the right mix of monitoring proposals, nor does it provide a comprehensive set of criteria for reviewing such proposals.
The document makes clear that a carefully developed design will be needed in order to generate monitoring data capable of answering the check-in questions required by the BiOp, but the document does not provide that design or explain how such a design will arise. The ISRP is concerned that the premise of the document is that an adequate design may be arrived at by some unspecified, bottom-up process during the course of reviewing and funding many independent projects, whereas we believe that the design requirements, especially with respect to documenting effects on salmon survival, are not likely to be met without strong top-down articulation of a design and strong top-down coordination of its implementation.
It will prove frustrating to all concerned parties if the hope is that the review process by the ISRP will somehow, by itself, provide the top-down coordination. For the next round of reviewing, this would put the ISRP in the position of essentially rejecting all the proposals on the grounds that there is no overarching design. A proper overall design has to be developed first. It would be a good idea for the ISRP to review that design, once a draft is available. When an acceptable design has been agreed upon, then the ISRP can use it as one point of reference in evaluating individual action effectiveness monitoring proposals.
FOUR KINDS OF MONITORING UNDER CONSIDERATION
It must be understood that the contemplated monitoring is intended to provide data to answer four quite distinct kinds of questions, each of which imposes different requirements for data quality, design of controls, and the variables actually measured. It would be a mistake to attempt a "one size fits all" approach to guidance for the four respective data uses. The four data uses are:
- (1) Determination of salmon survival response to mitigation actions for BiOp check-in;
- (2) Determination of salmon stock status (abundances and survival rates) for BiOp check-in;
- (3) Determination of response of habitat variables to mitigation actions; and
- (4) General monitoring of status and trends in habitat variables.
Use (1) seems to be the highest priority from the perspective of the document under review, though use (2) will also be very important in the decision-making process at the BiOp check-in. Uses (1) and (2) are explicitly defined in the BiOp, and the required data quality should be evaluated with reference to the BiOp and in consultation with the office in NMFS that will eventually use the data to arrive at the determinations. Unfortunately, use (1) also presents the greatest difficulties for design, and probably the highest demands for data quality (as discussed below).
Use (3), as described in the present document, apparently is proposed as a surrogate for use (1) when direct answers to the question behind use (1) are not feasible. The motivation for this is not clearly explained. Page 1 states, "Because habitat actions may require time beyond the BiOp planning horizon to manifest fish survival effects, there is a need to establish cause-and-effect relationships between tributary actions and physical/environmental effects that may be more immediate." However, the implicit bottom line is still fish survival (and reproduction). The ISRP is aware of this problem and the conflict it creates. We have strongly and explicitly encouraged principal investigators to collect surrogate data on habitat variables that may respond more quickly to mitigation actions than the salmon populations themselves. We have done so under the working assumption that salmon will also respond to some of these habitat changes. We have also been explicit that salmon survival data will have to be collected to ground-truth the habitat/salmon response linkage. In the end, no amount of cause-and-effect study relating habitat actions to habitat responses will substitute for cause-and-effect studies relating actions to fish responses, or relating fish responses to habitat responses. Without good measurement of salmon survival, the exercise is empty for purposes of substituting use (3) for use (1).
Use (4) is not described at length in the document. This use is implicit, in that there will doubtless be some desire to use available monitoring data to ground truth some of the assumed maps of habitat quality in EDT, and to verify some of the expert opinion that is the basis for the assumed relationships between salmon production and habitat quality in EDT, and to calibrate the relation between habitat variables and salmon survival so as to validate use (3) for purposes of the BiOp check-in.
FURTHER COMMENTS ON DETERMINING SURVIVAL RESPONSE TO MITIGATION, AND DETERMINING STOCK ABUNDANCE AND SURVIVAL
This guidance document treats a data-gathering program that is directed primarily at a specific pair of questions, already articulated by another set of parties who will be the data users. The questions originate in the RPA section of the 2000 Hydro Biological Opinion, which committed to collecting data that would be the basis for determinations, at 5-year and 8-year check-ins, to reassess the status of the listed stocks and to verify that mitigation activities have had the intended effect. The key quantities, both for the stock status evaluation and for the verification of mitigation, are the stage-by-stage survival rates of the salmon.
Presumably, the resolution in survival rate estimation needed for the status evaluation, and the magnitude of the effect to be looked for in survival rate responses to mitigation activities, are spelled out (or at least implied) in the BiOp itself. The authors of the present document should consult with the appropriate office in NMFS to establish what resolution in survival rate estimation is needed to comply with the letter and/or spirit of the BiOp. This may be a situation analogous to the Habitat Conservation Plan for the mid-Columbia PUDs, which spelled out exactly the required (and very stringent) resolution for survival rate estimation to be in compliance.
The present document does not explain why the proposed blanket performance characteristics (20% type I error rate, 0.8 power, for detecting a 10% change in 5 years, referencing only the Oregon monitoring plan) should be the appropriate resolution for the specific needs of the BiOp check-in. How was this determined? As discussed below, it seems unlikely that these criteria are equally and simultaneously applicable to all indicator variables, including, for example, sediment particle size, water temperature, fish survival rates, and fish abundance statistics.
Once we establish what resolution the BiOp requires for measuring salmon survival rates for purposes of status evaluation at the check-in, calculating the design parameters needed to achieve that resolution (how many fish must be marked, when and where, and how high a detection efficiency in resighting is needed, when and where) is a familiar exercise for professional statisticians; a simple sketch of the kind of calculation involved follows below. Then we would ask simply whether there is a design calculated to deliver that resolution. The ISRP does not believe it reasonable to hope that this design calculation, carried out independently in a flurry of independent projects, will fortuitously converge on the correct overall design. We are fully convinced that an overall design, with the proper characteristics, must be drafted first for the entire system. This overall design would in effect constitute part of the specification of an RFP; individual project proponents could then submit their plans for satisfying the design requirements in their proposed piece of the data-gathering operation. Some capable group with statistical expertise (perhaps the Paulsen et al. team) should in fact produce the master design that the individual projects can follow.
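To illustrate what we mean by a familiar exercise, the following minimal sketch (in Python) computes the release size needed to achieve a target standard error for a single-reach survival estimate, assuming a known downstream resighting efficiency. All numbers are hypothetical illustrations, not design recommendations.

    import math

    def releases_needed(survival, detect_eff, target_se):
        # Model: detections ~ Binomial(N, survival * detect_eff), and the
        # survival estimate is detections / (N * detect_eff), so
        # Var(S_hat) = survival * (1 - survival * detect_eff) / (N * detect_eff).
        var_per_fish = survival * (1.0 - survival * detect_eff) / detect_eff
        return math.ceil(var_per_fish / target_se ** 2)

    # E.g., estimating a true reach survival of 0.5 to within SE = 0.02,
    # with a 0.4 resighting efficiency, requires about 2,500 marked fish.
    print(releases_needed(survival=0.5, detect_eff=0.4, target_se=0.02))

A real design must propagate such calculations across many reaches, release groups, and detection sites simultaneously, which is exactly why a single coordinated master design is needed.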
Once we establish what resolution the BiOp requires for measuring changes in salmon survival rates for purposes of verifying the effectiveness of mitigation actions, a much more difficult calculation must be undertaken to determine the design parameters that can deliver this resolution. The difficulty here is that the intention is to measure a change in response to a "treatment"; in other words, the bottom line is a comparison between a "treatment" and a "control." Because of the known large interannual variation in salmon survival rates, a simple before/after design will not provide an effective control. Therefore, there will be a pervasive need to establish actual control sites with concurrent measurements. This will not be easy either, since there are inevitable site-specific differences. Probably the most promising general design strategy will be to combine before/after measurements at treatment and control sites, using the "before" baseline measurements at all sites to factor out the site-specific differences, and then to compare the before/after (in time) change at the control sites with the before/after (treatment) change at the treatment sites. Even so, some care will have to go into the selection of control sites to ensure that there is not too much site-specific difference in responses to background temporal variation.
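The difference-of-differences logic of that strategy can be made concrete with a minimal sketch (all data invented; a real analysis would model site effects and interannual variation explicitly, for example with a mixed-effects model):

    from statistics import mean

    # Hypothetical site means of a survival index, before and after treatment.
    treatment_before = [0.42, 0.38, 0.45]
    treatment_after  = [0.51, 0.47, 0.55]
    control_before   = [0.40, 0.44, 0.39]
    control_after    = [0.43, 0.46, 0.41]

    # Change at treatment sites reflects the action plus background variation;
    # change at control sites reflects background variation alone.
    treatment_change = mean(treatment_after) - mean(treatment_before)
    control_change = mean(control_after) - mean(control_before)

    # The before/after change attributable to the action.
    baci_effect = treatment_change - control_change
    print(f"treatment change: {treatment_change:+.3f}")
    print(f"control change:   {control_change:+.3f}")
    print(f"estimated treatment effect: {baci_effect:+.3f}")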
The ISRP does not believe that it is reasonable to expect successful specifications for such a complex and sophisticated design to arise spontaneously from the collective of individual independent projects, guidance or no guidance. Some capable, authoritative, and knowledgeable group is really going to have to take responsibility for drafting an adequate design. Even with such a draft design in existence, it will be an institutional challenge, given the way the Fish and Wildlife Program operates, to ensure implementation. Ultimately, the success of the implementation will depend on the combined results of many sites and many projects, where project A may be making measurements that serve as a control for project B, etc. Considerable top-down planning and management is needed to make this happen. If a technically correct overall design is adopted as part of the specifications of the eventual solicitation, conformity to the design could be an important review criterion that the ISRP could apply to the individual proposals.
When some group is constituted to draft a design for the effectiveness studies, it would be a good thing if the group communicated with the NMFS parties who will be making the "check-in" determinations. Plans for data gathering always benefit from communication between the data providers and the data users.
FURTHER COMMENTS ON DETERMINING HABITAT RESPONSE TO MITIGATION
Use (3), as described in the present document, presents great difficulties with respect to defining the appropriate resolution of estimates. The idea is to measure responses of habitat variables to mitigation, where it is hoped that the habitat response is a predictor of eventual salmon response. So the design must consider both the habitat effect size that is likely to be sufficient to eventually cause a meaningful salmon response and the habitat effect size that is expected to result from the mitigation action.
In this context, it seems quite implausible that the proposed blanket performance characteristics stated in the present document (20% type I error rate, 0.8 power, for detecting a 10% change in 5 years) are really the correct resolution for this suite of needs. How could one decide that all variables, from sediment particle size, to water temperature, to fish survival rates, should be measured with the same level of resolution? After all, some variables are inherently more difficult and expensive to measure to a given level of resolution, and some variables will turn out to be much more important than others for answering the actual question. So the actual demands for resolution should be subjected to some cost-benefit analysis, variable by variable. Where is the paper trail of that thought process?
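A simple calculation illustrates the point. Under a two-sample normal approximation (a deliberate simplification), the sampling effort needed to meet the stated criteria scales with the square of each variable's coefficient of variation; the CV values below are invented for illustration only.

    from statistics import NormalDist

    ALPHA, POWER, REL_CHANGE = 0.20, 0.80, 0.10
    z = NormalDist().inv_cdf
    k = (z(1 - ALPHA / 2) + z(POWER)) ** 2  # ~4.5 for these settings

    # Hypothetical coefficients of variation for different indicators.
    cvs = {"water temperature": 0.05,
           "fish survival rate": 0.30,
           "sediment particle size": 0.60}

    for name, cv in cvs.items():
        n = 2 * k * (cv / REL_CHANGE) ** 2  # per-group sample size
        print(f"{name:25s} CV={cv:.2f}  n per group ~ {n:,.0f}")

Under these invented CVs, the required effort spans two orders of magnitude across indicators, which is exactly the kind of fact that a variable-by-variable cost-benefit analysis would surface.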
FURTHER COMMENTS ON MONITORING HABITAT STATUS AND TRENDS
Because use (4) is the least specific in its data demands, and is in some sense "exploratory," the proposed generic guidelines for resolution stated in the document may be a reasonable starting point, but only a starting point, with respect to this use. The proposed blanket performance characteristics are a 20% type I error rate and 0.8 power for detecting a 10% change in 5 years; the document references an Oregon monitoring plan as the source of this proposed resolution. In many respects, use (4) is directed more at "effectiveness research" than at "effectiveness monitoring." As such, it should be understood essentially as a Tier III enterprise in the classification scheme of the 2000 BiOp.
We would note that the proposed statement of performance (limits on type I and type II errors) is framed in terms of hypothesis testing, which is not the way the data actually will be used. The data will be used predominantly for estimating effect size, so it might be more natural to state (and justify) the desired resolution in those terms.
Most research analyses traditionally approached by classical statistical testing of a null hypothesis can be formulated alternatively in terms of estimation of effects and the accuracy and precision of the estimates. Mechanical application of classical hypothesis testing is prone to distort design priorities when the real interest is in the size of an effect (see Johnson 1999). It should be noted also that a study may be planned to achieve a certain expected precision for an anticipated effect size, but the realized precision will depend in part on aspects of an unknown future, such as the actual effect size and the actual behavior of confounding variables.
A design that is optimized for one indicator variable might not be optimal for other indicator variables that in fact are equally, or more, important. Effective guidance will have to give more consideration to the spectrum of questions that are being asked, the scientific hypotheses that are current candidates for answers, and the indicators that will be measured to judge among the scientific hypotheses.
Planning for research projects could be conducted in terms of the desired precision of estimates of the differences (or ratios) of key indicator variables on study (treatment) sites and reference sites. The ability to estimate the size of an effect, in this sense, may demand more stringent data quality standards than would be required just to detect a trend.
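As a minimal sketch of this estimation framing (all data invented), one would report the estimated treatment-versus-reference difference with a confidence interval rather than a bare reject/fail-to-reject decision:

    from math import sqrt
    from statistics import mean, stdev

    treatment = [3.1, 2.7, 3.4, 2.9, 3.3, 3.0]  # indicator on treatment sites
    reference = [2.4, 2.8, 2.2, 2.6, 2.5, 2.3]  # same indicator on reference sites

    diff = mean(treatment) - mean(reference)
    se = sqrt(stdev(treatment) ** 2 / len(treatment) +
              stdev(reference) ** 2 / len(reference))
    t95 = 2.23  # approximate t quantile for ~10 df; a real analysis would
                # compute the degrees of freedom properly

    print(f"estimated effect: {diff:.2f} +/- {t95 * se:.2f} (approx. 95% CI)")

The width of the interval, not a p-value, then becomes the natural statement of whether the data quality was adequate.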
Because use (4) has an exploratory component, the guidance should allow room for a variety of possible analytical models and research designs, such as bioequivalence testing, fitting models with a predictor variable selected along a gradient, and modern methods such as the Akaike Information Criterion (as described, for example, in Burnham and Anderson 1998) for selecting among alternative predictive models.
The value of gradient designs arises when adequate reference areas are not available. A gradient of conditions can then be represented in the study, from which a model may be fitted to the indicator variables as a function of a collection of predictor (independent) variables, as sketched below.
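As a concrete illustration of AIC-based selection among candidate models fitted along a gradient (all data and candidate models invented), the least-squares form of AIC from Burnham and Anderson (1998) can be applied directly:

    import math
    import numpy as np

    # Hypothetical gradient (e.g., percent fines) and indicator (e.g., fry density).
    x = np.array([5, 10, 15, 20, 25, 30, 35, 40, 45, 50], dtype=float)
    y = np.array([9.1, 8.7, 8.2, 7.0, 6.1, 4.8, 3.9, 3.5, 3.2, 3.1])

    def aic_polyfit(degree):
        # AIC = n * ln(RSS / n) + 2k for a least-squares polynomial fit.
        coeffs = np.polyfit(x, y, degree)
        rss = float(np.sum((y - np.polyval(coeffs, x)) ** 2))
        k = degree + 2  # polynomial coefficients plus the residual variance
        return len(x) * math.log(rss / len(x)) + 2 * k

    for degree, label in [(1, "linear"), (2, "quadratic")]:
        print(f"{label:9s} AIC = {aic_polyfit(degree):.1f}")
    # The model with the lower AIC is preferred; differences smaller than
    # about 2 indicate roughly equal support for the candidates.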
Bioequivalence testing corresponds to testing the reverse null hypothesis (Dixon and Garrett 1992, Erickson and McDonald 1995, McDonald and Erickson 1994, U.S. EPA 1988, 1989). In the current setting, this approach makes much more sense than standard classical testing of a null hypothesis. A brief introduction is given here. Successful application requires a thorough understanding of the relationships among indicator variables, classification variables, dependent variables, and independent variables.
Projects are being proposed to "fix" problems in tributary habitat. In this sense, they are not research projects with no effect on the environment; they are both management actions (at least for the study areas) and research, and their evaluation is both adaptive management and analysis of research results.
The reverse null hypothesis is: the study area is defective in one or more indicator variables (Table 3 in the draft guidelines) when compared to a reference area or standard value. The alternative hypothesis against which this is tested is: the study area is bioequivalent to the reference area or the standard. The fact that a project is proposed indicates a prima facie belief that the area under study is deficient in one or more of the indicator variables as listed in Table 3 of the document.
For example, the reverse null hypothesis for depth of fines might be: the depth of fines in the study area is greater than 120% of depth of fines in the reference area. This would be tested against the alternative: the depth of fines in the study area is less than 120% of depth of fines in the reference area.
In the example, the implicit definition is given that the study area is bioequivalent to the reference area with respect to this indicator variable if depth of fines is less than 120% of the depth of fines on the reference area. The management action under evaluation in the research project is judged to remedy the deficiency in the study area if the reverse null hypothesis can be rejected.
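For concreteness, the following minimal sketch implements a one-sided test of this reverse null hypothesis on the log scale, where the 120% criterion becomes an additive bound. All measurements are invented, and the 5% significance level and approximate normality of log depths are working assumptions.

    from math import exp, log, sqrt
    from statistics import mean, stdev

    study     = [2.6, 2.9, 2.4, 2.7, 2.5, 2.8]  # depth of fines, study area (cm)
    reference = [2.4, 2.5, 2.3, 2.6, 2.2, 2.5]  # depth of fines, reference area (cm)

    study_logs = [log(v) for v in study]
    ref_logs = [log(v) for v in reference]
    log_ratio = mean(study_logs) - mean(ref_logs)
    se = sqrt(stdev(study_logs) ** 2 / len(study_logs) +
              stdev(ref_logs) ** 2 / len(ref_logs))
    t_alpha = 1.81  # approximate one-sided t quantile, alpha = 0.05, ~10 df

    # Reject the reverse null (study area deficient) only if the upper
    # one-sided confidence bound on the study/reference ratio is below 1.2.
    upper_bound = exp(log_ratio + t_alpha * se)
    print(f"estimated ratio: {exp(log_ratio):.2f}, upper bound: {upper_bound:.2f}")
    if upper_bound < 1.2:
        print("reject reverse null: study area bioequivalent for this indicator")
    else:
        print("cannot reject reverse null: deficiency not yet remedied")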
The advantage of bioequivalence testing lies in adaptive management. A management action (or set of actions) in a research project is judged successful if the affected area is bioequivalent to the reference area at the end of the project. The burden of proof is placed on the principal investigators and the analysis, not on the action agencies. The action agencies do not have to interpret significance tests that may be statistically significant but of no practical importance, as against non-significant results on large effects that may be of extreme importance. Research projects would be judged successful if the treatments result in movement to, or toward, bioequivalence.
NOTES ON STYLE AND CLARITY
The present document suffers from a certain obscurity, owing both to an excess of jargon and to ambiguity about the intended audience. The purported audience is the community of researchers who will potentially submit proposals to the program but who are not statistical specialists. For that audience, the present document would probably be too theoretical and technical (i.e., not adequate as an instruction manual). Realistically, the present document is more of a white paper, whose actual audience is research planners, administrators, and reviewers such as the ISRP. As a white paper, the document describes some aspects of the nature of the problem and presents some ideas about a strategy to address it. In that context, the theoretical and technical tone is fairly appropriate, but some effort could still be made to improve clarity.
The document contains a good deal of marginally defined and undefined terminology. Bureaucratic language from the BiOp is used repeatedly without being restated in other, more common words. The document refers the reader to Hillman and Giorgi (2002) for help with terms, but this reference is not listed in the Literature Cited section. Is this an internal, unpublished document?
CONCLUSION
The ISRP concludes that the draft RME guidance document is a useful first step toward developing necessary guidelines for planners, investigators, and reviewers, although in its present form it is too narrow and insufficiently targeted to actual information needs to serve for universal evaluation of action effectiveness research proposals. Such guidance is sorely needed. Incremental revisions will help focus the document on its audience(s), objectives, and intended outcomes. We recommend that this draft be revised in two ways. First, revision as a scoping document for planners and administrators is needed to provide clear top-down guidance that actually stipulates overall design specifications to address the need for collecting data to answer the BiOp check-in questions about the effectiveness of mitigation actions on salmon survival. This document would focus on the first two questions posed to the ISRP, with abbreviated reference to the third question (and reference to a second document).
Second, revision as a more methodology-oriented document is needed, intended for use in a bottom-up fashion by researchers and technicians, in which guidance on alternative methods, statistical approaches, and statistical design requirements is given in detail. This document would focus on the third question posed to the ISRP and would present the scope and guidelines in abbreviated fashion (with reference to the first document). The ISRP recommends further discussion among the authors, potential researchers, and the community of data users before consensus on the specific methods is solidified for this second document. The first document is perhaps the most appropriate and serviceable as an accompaniment to a solicitation; the second would be made available as a reference for those preparing and reviewing proposals.
REFERENCES
Burnham, K.P., and D.R. Anderson. 1998. Model selection and inference: a practical information-theoretic approach. Springer-Verlag, New York.
Dixon, P.M., and K.A. Garrett. 1992. Statistical issues for field experimenters. Technical Report, Savannah River Laboratory, University of Georgia, Aiken, SC.
Erickson, W.P., and L.L. McDonald. 1995. Tests for bioequivalence of control media and test media in studies of toxicity. Environmental Toxicology and Chemistry 14:1247-1256.
Johnson, D.H. 1999. The insignificance of statistical significance testing. Journal of Wildlife Management 63:763-772.
McDonald, L.L., and W.P. Erickson. 1994. Testing for bioequivalence in field studies: has a disturbed site been adequately reclaimed? Pages 183-197 in D.J. Fletcher and B.F.J. Manly, editors. Statistics in Ecology and Environmental Monitoring. Otago Conference Series No. 2, University of Otago Press, Dunedin, New Zealand.
U.S. EPA. 1988. Guidance document for conducting terrestrial field studies. Ecological Effects Branch, Hazard Evaluation Division, Office of Pesticide Programs, U.S. Environmental Protection Agency, Washington, DC.
U.S. EPA. 1989. Methods for evaluating the attainment of cleanup standards. Volume 1: soils and solid media. Statistical Policy Branch (PM-223), Office of Policy, Planning, and Evaluation, U.S. Environmental Protection Agency, Washington, DC.