EXPERIMENTS IN INTERNATIONAL BENCHMARKING OF U.S. RESEARCH FIELDS
Marye Anne Fox and Robert M. White
As a nation, we support a large research enterprise whose funding must be allocated consistently and fairly. At present, there is no reliable tool for evaluating the quality of federally funded research programs and providing a basis for the subsequent allocation of funds to those programs. The National Academies Committee on Science, Engineering, and Public Policy (COSEPUP) has been testing the validity of international benchmarking, which compares the quality and impact of research in one country or region with world standards of excellence, as a possible tool for funding allocation. COSEPUP has run a series of experiments to test the use of benchmarking as a tool for understanding the relative world standing of US research in a field and the factors that are critical to US leadership in that field. This article describes the background behind benchmarking and the methodologies and results of COSEPUP's experiments in benchmarking three fields internationally: mathematics, materials science and engineering, and immunology.
In 1993, COSEPUP issued its report Science, Technology, and the Federal Government: National Goals for a New Era, which recommended that the federal government continue vigorous funding of basic research and seek to support basic research across the entire spectrum of scientific and technological investigation. Specifically, the report made two recommendations: first, that
"The United States should be among the world leaders in all major areas of science,"
and second, that
"The United States should maintain clear leadership in some major areas of science."
By following these recommendations, the U.S. would position itself among world leaders in all major fields of research and would be ready to apply and capitalize on research advances wherever they may occur.
Two years later, in 1995, a committee (of which I was a member) chaired by Frank Press, the former president of the National Academy of Sciences, stated that "to continue as a world leader, the United States should strive for clear leadership in the most promising areas of science and technology and those deemed most important to our national goals. In other major fields, the United States should perform on a par with other nations so that it is poised to pounce if future discoveries increase the importance of one of these fields."
The committee also considered how the federal government could gauge the overall health of the research enterprise and determine whether national funding is adequate and supportive of national research objectives. The committee wrote that US performance can be monitored with field-by-field peer assessments:
"...the establishment of independent panels consisting of researchers who work in a field, individuals who work in closely related fields, and research users who follow the field closely [can provide that kind of evaluation]. Some of these individuals should be outstanding foreign scientists in the field being examined."
The technique of comparative international assessment, or what we at COSEPUP came to call "international benchmarking," had been discussed at that time but had not been practiced. COSEPUP therefore decided to undertake a set of experiments to test the utility of international benchmarking in evaluating entire research fields.
The committee acknowledged that the quantitative indicators often used to assess research programs (for example, dollars spent, number of papers cited, and number of researchers supported) all provide valuable information but are not by themselves sufficient indicators of leadership. Such quantitative information is often difficult to obtain or to compare across national borders, and it illuminates only a portion of the research process.
COSEPUP decided that benchmarking should rely more prominently on the judgment of experts, on the premise that only leaders of a particular research field are in a position to judge leadership. COSEPUP charged each panel with answering three primary questions: about the relative world standing of US research in its field, about the factors critical to US performance there, and about the likely future position of the United States in the field.
The committee deliberately chose three fields that range in scope and subject matter. Of the three, mathematics is the closest to a traditional discipline, but it is broad in the sense that it provides a language and a set of tools used by other research fields. Immunology is not a disciplinary field in the traditional sense; it embraces many disciplines, including biochemistry, genetics, and microbiology. Materials science and engineering spans an even broader range of disciplines.
Determining how one country stands relative to another in science involves a great deal of subjective judgment. The first critical step in an international benchmarking process is the selection of panel members. Once selected, each panel divided its field into subfields. Each panel then used a variety of methods, described below, to assess its subfields.
The first method was the "virtual congress," in which each panel asked leading experts to identify the "best of the best" researchers in particular subfields, anywhere in the world. For example, members of the materials science and engineering panel asked colleagues in nine subfields, such as ceramics and polymers, to identify five or six current hot topics and eight to ten of the best researchers in the world. The information was used to construct tables characterizing the relative position of the United States in each subfield.
While the "virtual congress" may determine how the U.S. currently stands relative to other countries, additional factors must be considered to predict its future ranking. The materials science and engineering panel developed what it called "determinants of leadership": factors that indicate the level of research likely to occur in the future.
One such determinant is national imperatives: national objectives whose pursuit is likely to produce scientific leadership as a byproduct. The Cold War, for example, drove the development of materials for stealth aircraft.
Another method used by the panels was citation analysis, which is traditionally used to evaluate the international standing of a country's research in a field. Each panel used an analysis by the United Kingdom Office of Science and Technology to evaluate US research quality. This analysis included both the number of citations and the "relative citation impact," which compares a country's citation rate (the number of citations per year) for a particular field with the worldwide citation rate for that field. The latter measure takes into account the size of the US enterprise relative to that of other countries. (See the Web edition of this issue for tables containing some illustrative numerical results. [powerpoint format file])
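Relative citation impact is, in effect, a ratio of citation rates. As a minimal illustrative sketch (the function name and the numbers are hypothetical, not taken from the COSEPUP or UK analyses), it can be computed like this:

```python
def relative_citation_impact(country_citations, country_papers,
                             world_citations, world_papers):
    """Country's citations-per-paper in a field divided by the world's
    citations-per-paper in the same field. A value above 1.0 means the
    country's papers are cited more often than the world average."""
    return (country_citations / country_papers) / (world_citations / world_papers)

# Hypothetical counts for one field:
rci = relative_citation_impact(120_000, 30_000, 400_000, 150_000)
print(round(rci, 2))  # 1.5
```

Because both numerator and denominator are rates rather than raw counts, the measure is independent of enterprise size: a small country with highly cited papers can outrank a large one, which is exactly the size adjustment the panels wanted.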
The immunology panel used a method called "journal publication analysis." The panel identified four leading general journals and one top journal focused specifically on immunology, and panel members analyzed the tables of contents of each. In the general journals, they identified immunology papers and the laboratory nationality of the principal investigator; in all the journals, they identified subfields. That allowed a quantitative comparison between publications by US-based investigators and publications by investigators based elsewhere.
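The journal publication analysis reduces to tallying papers by the principal investigator's laboratory nationality and by subfield. A small sketch of that tally, using invented records standing in for entries read off the journals' tables of contents:

```python
from collections import Counter

# Each record: (journal, PI laboratory nationality, subfield) -- invented examples.
papers = [
    ("Nature",   "US",    "autoimmunity"),
    ("Nature",   "UK",    "tolerance"),
    ("Science",  "US",    "cytokines"),
    ("Immunity", "US",    "autoimmunity"),
    ("Immunity", "Japan", "tolerance"),
]

by_country = Counter(country for _, country, _ in papers)
by_subfield = Counter(subfield for _, _, subfield in papers)

# Share of papers from US-based laboratories:
us_share = by_country["US"] / len(papers)
print(by_country["US"], round(us_share, 2))  # 3 0.6
```

The same counts, broken out by subfield, support the comparison between US-based and foreign-based investigators that the panel reported.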
The panels found it difficult to obtain suitable, unbiased quantitative information that would allow comparisons of the major features of the scientific enterprise (e.g., education, funding, and government agencies) in different countries. Mathematics was an exception, and it illustrates the use of human resources as a consistent source of quantitative data. Attachment III, drawn from American Mathematical Society data, shows that the number and proportion of non-US PhD recipients in mathematics increased by 78% from 1985 to 1995. Furthermore, in every year since 1990, foreign students have received more than half the PhDs awarded in mathematics in the United States.
Each of the panels analyzed the key prizes given in its field. In mathematics, for example, the key international prizes are the Fields Medal and the Wolf Prize. The numbers of non-US and US recipients of these prizes were analyzed on the basis of where the recipients now conduct their research.
Another condition that can be quantified is representation among the plenary speakers at international conferences. Although conference organizers strive for geographic balance by inviting speakers from different countries, comparing the relative number of invited US speakers with the relative number of US publications yields a useful indicator of quality.
Each panel concluded that the US was at least among the world leaders in its field. However, each panel identified subfields in which the United States lagged behind the world leaders, and each identified key infrastructure concerns.
The mathematics panel found that although the United States was currently the world leader in mathematics, this leadership was dependent upon foreign talent that came to the United States particularly preceding and during World War II and following the collapse of the Soviet Union. The difficulty of attracting U.S. talent to mathematics today and the unpredictability of recruiting foreign talent led the panel to be concerned about the future leadership status of the United States in this field.
In materials science and engineering, the U.S. was among the few world leaders, but other countries are pursuing research in this field more aggressively because it is essential to economic growth. A particular concern is that major facilities in other countries are newer than those in the United States; neutron sources in Europe and Japan, for example, are 20-30 years younger than U.S. sources.
The immunology panel found that the U.S. was the world leader in immunology. Although U.S. financial investment in this field overshadows that of other countries, the U.S. was not the leader in all subfields. Of particular concern was clinical immunology: the restrictions of managed-care systems in the U.S., compared with those of other countries, make it more difficult to attract the patients needed for clinical studies.
COSEPUP Analysis of Benchmarking Experiments
After the panels' work was completed, COSEPUP independently evaluated the panel results along with comments from participants in a workshop that included White House, congressional, and agency staff, as well as representatives of disciplinary societies. The reviewers who took part in the National Academies' normal report-review process were of particular importance, as they are chosen to represent diverse industrial and academic backgrounds.
Based on these deliberations, COSEPUP found international benchmarking to be rapid and inexpensive compared with procedures that rely entirely on the assembly of a huge volume of quantitative information. In the words of one panel chair, "We were able to get 80% of the value in 20% of the time, for a far lower cost."
COSEPUP also found good correlation between findings produced by different indicators. For example, the qualitative judgments of the virtual congress were similar to the results of quantitative indicators, such as publications cited or papers delivered at international congresses. Lending credence to the technique was a parallel benchmarking experiment in mathematics conducted by the National Science Foundation that produced similar results to COSEPUP's math study, despite differences in panel makeup and the mandates of each organization.
The Government Performance and Results Act of 1993 (GPRA) emphasized the need for a method to assess the fruits of the federal government's investments in scientific and technological research. COSEPUP's GPRA report in 1999 concluded that the most effective means of evaluating federal research is expert review of the quality, relevance, and leadership of research programs. Some elements of benchmarking may indeed provide useful input to agency strategies for complying with GPRA.
In summary, COSEPUP concluded that these experiments should be regarded as an encouraging first step toward the design of an efficient and reasonably objective evaluation tool. Additional benchmarking exercises could lead to more effective assessment methods, better understanding of the factors that promote research excellence, and better decision-making by those who fund science and technological innovation.
Marye Anne Fox
Chancellor, North Carolina State University
Office of the Chancellor Box 7001/A Holladay Hall Raleigh, NC 27695-7001
(919) 515-2191 Fax: (919) 831-3545
Robert M. White
Laboratory for Advanced Materials Dept. of Materials Science and Engineering
Jack McCullough Bldg., Rm. 349, MC 4045 476 Lomita Mall
Stanford University, Stanford, CA 94305-4045
(650) 736-2152 Fax: (650) 736-1984
Committee on Science, Engineering, and Public Policy (COSEPUP). 1993. Science, Technology, and the Federal Government: National Goals for a New Era. Washington, DC: National Academy Press.
Committee on Science, Engineering, and Public Policy (COSEPUP). 1997. International Benchmarking of US Mathematics Research. Washington, DC: National Academy Press.
Committee on Science, Engineering, and Public Policy (COSEPUP). 1998. International Benchmarking of US Materials Science and Engineering Research. Washington, DC: National Academy Press.
Committee on Science, Engineering, and Public Policy (COSEPUP). 1999. International Benchmarking of US Immunology Research. Washington, DC: National Academy Press.
Committee on Science, Engineering, and Public Policy (COSEPUP). 2000. Experiments in International Benchmarking of US Research Fields. Washington, DC: National Academy Press.
Committee on Science, Engineering, and Public Policy (COSEPUP). 2000. Evaluating Federal Research Programs: Research and the Government Performance and Results Act. Washington, DC: National Academy Press.
National Research Council (NRC). 1995. Allocating Federal Funds for Science and Technology. Washington, DC: National Academy Press.
United Kingdom Office of Science and Technology. 1997. The Quality of the UK Science Base. London, UK: Department of Trade and Industry. March.