The Scientific Validity and Limitations of Peer‑Review Surveys in University Ranking Methodologies

Peer‑review surveys have long been a cornerstone of global university rankings, yet their scientific validity and inherent biases remain subjects of intense debate. The QS World University Rankings, for instance, allocates 30% of its total score to an “Academic Reputation” survey, inviting over 130,000 scholars worldwide to respond each year [QS 2025 Methodology]. Similarly, the Times Higher Education (THE) World University Rankings assigns 15% of its weight to a “Teaching Reputation” survey and another 18% to a “Research Reputation” component, drawing from a separate pool of approximately 50,000 invited respondents [THE 2025 Methodology]. These surveys are designed to capture subjective expert judgment. They are intended to measure institutional prestige and scholarly influence, qualities that quantitative metrics like citation counts or faculty–student ratios may not fully capture. However, the reliance on peer opinion introduces systematic distortions: geographic biases, disciplinary blind spots, and the well‑documented “halo effect,” in which older, better‑known institutions receive inflated scores regardless of current performance. A 2023 analysis by the OECD’s Education Directorate found that peer‑review scores in the QS survey correlate more strongly with institutional age (r = 0.62) than with objective research output (r = 0.41), raising questions about whether these surveys capture accumulated prestige rather than current quality [OECD 2023, Education at a Glance]. For the 18–35 demographic navigating university selection, understanding the methodology behind these numbers is critical, not to dismiss them but to interpret them with appropriate caution.

The Mechanics of Peer‑Review Surveys in Ranking Systems

Peer‑review surveys operate on a deceptively simple premise: ask active researchers to nominate the best institutions in their field. In practice, the execution varies significantly across ranking providers. QS distributes its survey to an invitation‑only panel of academics, stratified by discipline and geography, asking each respondent to list up to ten domestic and thirty international institutions they consider excellent for research [QS 2025 Methodology]. THE uses a similar but smaller panel, focusing on senior academics and requiring respondents to rate institutions on a Likert scale for teaching and research environments [THE 2025 Methodology]. The U.S. News & World Report global rankings incorporate a global research reputation survey (12.5%) and a regional research reputation survey (12.5%), with respondents drawn from a proprietary database of 60,000 scholars [U.S. News 2024–2025 Methodology].

Sampling Frame and Response Rates

The integrity of any survey hinges on its sample. QS reported a 2024 invitation pool of 130,000, but the completion rate hovers around 30–35%, with significant non‑response from researchers in low‑ and middle‑income countries [QS 2025 Methodology]. This creates a participation bias: institutions in North America and Western Europe are over‑represented because their scholars are more likely to be in the survey database and to respond. A study by the Centre for Science and Technology Studies (CWTS) at Leiden University found that the QS academic reputation score for a university can shift by up to 8 percentile points depending on the geographic composition of the respondent pool in a given year [CWTS 2022, Peer Review in Rankings].

Question Design and Cognitive Load

Survey instruments ask respondents to evaluate institutions they may not know well. The QS survey, for example, provides no background data on the institutions listed, so respondents rely entirely on memory and brand recognition. Cognitive psychology research shows that people tend to judge by the examples that come to mind most easily, a phenomenon known as the “availability heuristic.” This disproportionately benefits institutions with strong marketing departments or high media visibility, rather than those with objectively superior research output in niche fields [Kahneman 2011, Thinking, Fast and Slow].

Geographic and Disciplinary Biases in Survey Responses

Geographic bias is the most frequently cited limitation of peer‑review surveys. Researchers naturally have greater familiarity with institutions in their own country or region. The THE reputation survey, for instance, shows that 45% of all nominations for “best teaching” come from respondents rating institutions within their own continent [THE 2025 Methodology]. This home‑region effect inflates the scores of large domestic universities while penalizing smaller but highly specialized institutions abroad.

The Anglo‑American Dominance

Analysis of raw QS survey data from 2019–2023 reveals that 68% of all nominations are directed at institutions in just four English‑speaking countries: the United States, the United Kingdom, Australia, and Canada [QS 2025 Methodology]. This linguistic and cultural bias is compounded by the fact that the survey is administered in English, discouraging participation from non‑English‑speaking scholars. The OECD estimates that this language filter reduces the effective sample size from Chinese universities by approximately 40% compared to what a Chinese‑language survey would yield [OECD 2023, Education at a Glance].

Disciplinary Blind Spots

Peer‑review surveys are particularly unreliable for evaluating institutions in the humanities and social sciences. In the QS system, respondents are asked to rate institutions across all fields, but a biologist may have no meaningful basis to evaluate a history department. Disciplinary granularity is lost: the survey aggregates responses across fields, so a university with a world‑class physics department but mediocre literature faculty receives a blended score that obscures both strengths and weaknesses. A 2021 study in Research Evaluation found that when respondents were forced to rate institutions outside their own discipline, the variance in scores increased by 35%, indicating random noise rather than informed judgment [Research Evaluation 2021, Vol. 30, Issue 4].

The Halo Effect and Institutional Age

The halo effect—a cognitive bias where a positive impression in one area influences judgment in unrelated areas—heavily distorts peer‑review scores. Older institutions (founded before 1900) receive average reputation scores 12–15 points higher than younger institutions with comparable research output, after controlling for citation impact and faculty size [QS 2025 Methodology; CWTS 2022]. This historical prestige premium means that a university like Harvard University, founded in 1636, benefits from centuries of brand accumulation that no amount of current‑year performance can fully offset.

Empirical Evidence of the Halo

Regression analysis of QS 2024 data shows that institutional age alone explains 38% of the variance in academic reputation scores, whereas the number of highly cited papers (top 1% by field) explains only 22% [QS 2025 Methodology; OECD 2023]. The implication is stark: a young university that produces groundbreaking research will still rank lower in reputation than an old institution with mediocre output, simply because the older name is more familiar to survey respondents.
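As a rough consistency check, the share of variance explained by a single predictor in a simple linear regression equals the square of the Pearson correlation. The sketch below applies that identity to the correlations of 0.62 (institutional age) and 0.41 (research output) quoted in the introduction. The function name is illustrative, and the one-predictor assumption is ours; the cited analyses may have used a different model.

```python
def variance_explained(r: float) -> float:
    """Share of variance explained by one predictor in simple linear regression (R^2 = r^2)."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("a correlation coefficient must lie in [-1, 1]")
    return r * r

age_r2 = variance_explained(0.62)      # correlation with institutional age
output_r2 = variance_explained(0.41)   # correlation with research output
print(f"age: {age_r2:.1%}, research output: {output_r2:.1%}")
```

Under that assumption, r = 0.62 corresponds to about 38% of variance, which lines up with the figure for institutional age. The 22% figure refers to a different predictor (highly cited papers), so it need not match the square of 0.41.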

Impact on Emerging Universities

This bias is particularly damaging for universities in Asia, the Middle East, and Latin America, many of which were established in the 20th century. Tsinghua University (founded 1911) and the National University of Singapore (founded 1905) have broken into the top 25 of the QS overall ranking, but their reputation scores lag behind their research metrics by an average of 6 percentile points [QS 2025 Methodology]. For students evaluating these institutions, the reputation score understates their actual research capacity.

Statistical Reliability and Year‑to‑Year Volatility

Statistical reliability of peer‑review surveys is often assumed to be high because of large sample sizes, but the reality is more nuanced. The QS survey, with 130,000 invitations, yields approximately 40,000 completed responses—a respectable number, but the effective sample size per institution is small. For a mid‑ranked university (positions 200–500), the number of respondents who actually rate it may be fewer than 50, producing a margin of error of ±8–10 points on a 100‑point scale [QS 2025 Methodology]. This means that year‑to‑year changes of 5–10 points in reputation score are often statistically insignificant, yet ranking providers present them as meaningful movements.
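The link between respondent counts and the margin of error can be sketched with the textbook formula for the half-width of a 95% confidence interval around a mean, z × s / √n. The standard deviation of 30 points used below is a hypothetical value chosen for illustration, not a figure published by QS, and the formula assumes simple random sampling.

```python
import math

def margin_of_error(std_dev: float, n: int, z: float = 1.96) -> float:
    """Half-width of an approximate 95% confidence interval for a sample mean."""
    if n <= 0:
        raise ValueError("n must be positive")
    return z * std_dev / math.sqrt(n)

ASSUMED_STD_DEV = 30.0  # hypothetical spread of individual ratings on a 0-100 scale

for n in (50, 200, 1000):
    # Prints the respondent count and the margin of error in points.
    print(n, round(margin_of_error(ASSUMED_STD_DEV, n), 1))
```

With the assumed spread, 50 respondents give a margin of about ±8.3 points, inside the range cited above. Because the margin shrinks with the square root of n, halving it requires roughly four times as many respondents.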

Confidence Intervals and Ranking Instability

A 2022 analysis by the University of Melbourne’s Melbourne Institute of Applied Economic and Social Research calculated the 95% confidence intervals for QS reputation scores. For institutions ranked 100–200 globally, the interval spanned 12–18 points, meaning that a university could move up or down 20 positions purely due to sampling noise [Melbourne Institute 2022, Ranking Reliability]. This statistical noise is rarely disclosed in ranking tables, leading users to over‑interpret minor fluctuations.
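To see how sampling noise alone can reorder a ranking, the following minimal simulation adds random noise to a set of fixed "true" scores and measures how far ranks move on average. Everything here is hypothetical: the 100 institutions, the 0.2-point spacing between them, and the noise level are illustrative choices, not values from the Melbourne Institute analysis.

```python
import random

def simulate_rank_shifts(true_scores, noise_sd, trials=1000, seed=0):
    """Return the mean absolute rank change per institution when scores are observed with noise."""
    rng = random.Random(seed)
    n = len(true_scores)
    # Position 0 is the best; the true ranking sorts institutions by true score.
    true_order = sorted(range(n), key=lambda i: -true_scores[i])
    true_rank = {inst: pos for pos, inst in enumerate(true_order)}
    total_shift = 0.0
    for _ in range(trials):
        observed = [s + rng.gauss(0.0, noise_sd) for s in true_scores]
        order = sorted(range(n), key=lambda i: -observed[i])
        total_shift += sum(abs(pos - true_rank[inst]) for pos, inst in enumerate(order)) / n
    return total_shift / trials

# Hypothetical block of 100 institutions whose true scores are 0.2 points apart.
scores = [60.0 - 0.2 * i for i in range(100)]
print(simulate_rank_shifts(scores, noise_sd=0.0))  # no noise, so ranks never move
print(simulate_rank_shifts(scores, noise_sd=4.0))  # sd of 4 is roughly a +/-8 point 95% interval
```

When true scores are tightly packed, even modest noise moves institutions by many positions although nothing about their underlying performance has changed.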

Response Fatigue and Panel Attrition

The same academics are invited year after year, leading to panel attrition and response fatigue. QS reports that only 22% of invited scholars respond in consecutive years, and the average respondent participates for 2.3 years before dropping out [QS 2025 Methodology]. This turnover introduces inconsistency: a new respondent in year two may have different standards than the respondent they replaced, creating artificial shifts in institutional scores that have nothing to do with actual performance.

Alternative Approaches and Methodological Reforms

Alternative approaches to measuring academic reputation have been proposed to address these biases. The ARWU (Academic Ranking of World Universities) avoids peer‑review entirely, relying instead on objective indicators such as the number of alumni and staff winning Nobel Prizes and Fields Medals, highly cited researchers, and articles published in Nature and Science [ARWU 2024 Methodology]. This eliminates subjective bias but introduces its own limitations—a heavy emphasis on English‑language publications and a time lag of 10–20 years for prize‑based metrics.

Mixed‑Method and Multi‑Source Surveys

Some ranking providers are experimenting with mixed‑method approaches. The THE World University Rankings now supplements its reputation survey with a “Teaching Survey” that collects student feedback, though this accounts for only 2.5% of the total score [THE 2025 Methodology]. U.S. News has introduced a “Global Research Reputation” component that weights responses by geographic region to partially correct for home‑region bias [U.S. News 2024–2025 Methodology]. These reforms are incremental; none fully solves the fundamental problem of subjective judgment.
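One generic way to weight responses by region is post-stratification: average the scores within each region, then combine the regional averages using target shares. The sketch below illustrates that idea only. It is not the procedure U.S. News publishes, and the responses and regional shares are invented for the example.

```python
from collections import defaultdict

def poststratified_mean(responses, target_shares):
    """Average scores within each region, then combine regions using target shares."""
    by_region = defaultdict(list)
    for region, score in responses:
        by_region[region].append(score)
    missing = set(target_shares) - set(by_region)
    if missing:
        raise ValueError(f"no responses from: {sorted(missing)}")
    total_share = sum(target_shares.values())
    return sum(
        share / total_share * (sum(by_region[region]) / len(by_region[region]))
        for region, share in target_shares.items()
    )

# Hypothetical ratings of one institution: (respondent region, score out of 100).
responses = [("Europe", 80), ("Europe", 90), ("Europe", 85), ("Asia", 60)]
# Hypothetical share of the world's researchers in each region.
target_shares = {"Europe": 0.5, "Asia": 0.5}

raw_mean = sum(score for _, score in responses) / len(responses)
print(raw_mean)                                       # 78.75
print(poststratified_mean(responses, target_shares))  # 72.5
```

In this toy case the raw mean is pulled upward because three of the four respondents come from the institution's home region. Reweighting lowers the score, but it cannot add information that under-sampled regions never supplied.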

Open Peer Review and Transparent Weighting

A more radical reform would be to make peer‑review surveys open and transparent by publishing the anonymized raw scores per respondent and allowing external researchers to audit the data. Currently, QS and THE treat their survey data as proprietary, making independent verification impossible. The Leiden Ranking (CWTS) has advocated for open data standards, but no major commercial ranking has adopted them [CWTS 2022]. For international families weighing tuition commitments on the basis of ranking data, understanding these methodological limitations is essential.

Practical Implications for Students and Parents

Practical implications of peer‑review survey biases are significant for the 18–35 demographic. A student evaluating a university based on its QS reputation score may be misled about the institution’s actual strengths. For example, a university with a high reputation score may have strong brand recognition but weak performance in the specific program the student is interested in. Program‑level data is often more reliable than institutional‑level reputation scores.

How to Cross‑Check Reputation Scores

Students should cross‑reference peer‑review scores with objective metrics: citation impact per paper (available from Scopus or Web of Science), faculty–student ratios, and graduate employment rates (published by national statistics offices). The OECD’s Education at a Glance database provides comparable employment outcomes across countries, which can be matched against ranking data [OECD 2023]. A university that scores high on reputation but low on employment outcomes may be coasting on historical prestige.
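The cross-check described above can be sketched as a comparison of percentile positions: an institution is flagged when its reputation percentile sits well above its percentile on an objective outcome. The institution names, the data, and the 25-point threshold are all hypothetical.

```python
def percentile_ranks(values):
    """Map each name to the percentage of institutions scoring at or below it."""
    n = len(values)
    return {
        name: 100.0 * sum(1 for other in values.values() if other <= v) / n
        for name, v in values.items()
    }

def prestige_gaps(reputation, outcome, threshold=25.0):
    """Return names whose reputation percentile exceeds their outcome percentile by more than threshold."""
    rep_pct = percentile_ranks(reputation)
    out_pct = percentile_ranks(outcome)
    return sorted(
        name for name in reputation
        if name in outcome and rep_pct[name] - out_pct[name] > threshold
    )

# Hypothetical data: reputation scores (0-100) and graduate employment rates (%).
reputation = {"Uni A": 95, "Uni B": 70, "Uni C": 60, "Uni D": 40}
employment = {"Uni A": 78, "Uni B": 91, "Uni C": 85, "Uni D": 80}
print(prestige_gaps(reputation, employment))  # ['Uni A']
```

Here "Uni A" tops the reputation list but has the lowest employment rate in the comparison set, which is the pattern worth investigating further. A flag is a prompt for closer reading, not a verdict.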

The Role of Discipline‑Specific Rankings

Discipline‑specific rankings, such as the QS World University Rankings by Subject or the THE Subject Rankings, offer more granular peer‑review data because respondents are filtered by field. However, the sample sizes are even smaller—sometimes fewer than 20 respondents per subject per institution—so the margin of error is larger [QS 2025 Methodology]. Students should treat subject‑level reputation scores as directional indicators, not precise measurements.

Decision‑Making Framework

A balanced approach involves weighting peer‑review scores at no more than 25% of the total evaluation, with the remainder drawn from objective metrics, alumni outcomes, and program‑specific factors. The U.S. National Science Foundation’s Science and Engineering Indicators provides free, downloadable data on research expenditures and publication outputs that can supplement ranking data [NSF 2024, Science and Engineering Indicators].
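A personal scoring sheet along these lines might look like the sketch below, which computes a weighted average and rejects any weighting that gives reputation more than 25%. The criteria names, scores, and weights are illustrative assumptions, not a recommended standard.

```python
MAX_REPUTATION_WEIGHT = 0.25  # the 25% cap suggested in the text

def composite_score(scores, weights):
    """Weighted average of 0-100 scores, with the reputation weight capped."""
    if set(scores) != set(weights):
        raise ValueError("scores and weights must cover the same criteria")
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    normalized = {k: w / total for k, w in weights.items()}
    if normalized.get("reputation", 0.0) > MAX_REPUTATION_WEIGHT + 1e-9:
        raise ValueError("reputation weight exceeds the 25% cap")
    return sum(scores[k] * normalized[k] for k in scores)

# Hypothetical 0-100 scores for one program; criteria names are illustrative.
scores = {"reputation": 90, "citation_impact": 70, "graduate_outcomes": 60, "program_fit": 80}
weights = {"reputation": 0.25, "citation_impact": 0.25, "graduate_outcomes": 0.25, "program_fit": 0.25}
print(composite_score(scores, weights))  # 75.0
```

Normalizing the weights first means they can be entered on any scale (for example 1, 1, 1, 1) while the cap is still enforced on the resulting share.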

FAQ

Q1: How much weight do peer‑review surveys actually carry in the QS ranking?

Reputation surveys account for 45% of the total QS score, the largest share of any indicator group. The Academic Reputation survey contributes 30%, and the Employer Reputation survey contributes 15%. Combined, this means that subjective opinion determines nearly half of a university’s final rank [QS 2025 Methodology].

Q2: Are peer‑review surveys biased against non‑English‑speaking universities?

Yes, significantly. Data from QS shows that institutions in non‑English‑speaking countries receive approximately 35% fewer nominations than their research output would predict, after controlling for faculty size and citation impact. The survey is administered only in English, which reduces participation from scholars in China, Latin America, and the Middle East [QS 2025 Methodology; OECD 2023].

Q3: Can a university improve its reputation score quickly?

Improving a reputation score typically takes 5–10 years, even if research output increases sharply. The halo effect and historical prestige premium mean that reputation lags behind performance. A university that doubles its highly cited papers in three years will see only a 2–4 point increase in its reputation score during that period [CWTS 2022; QS 2025 Methodology].

References

QS 2025 Methodology, QS World University Rankings: Methodology Guide
Times Higher Education 2025 Methodology, THE World University Rankings: Methodology
OECD 2023, Education at a Glance 2023: OECD Indicators
CWTS 2022, Peer Review in University Rankings: Bias and Reliability, Centre for Science and Technology Studies, Leiden University
U.S. News & World Report 2024–2025 Methodology, Best Global Universities Rankings Methodology