Altmetrics in Humanities and Social Sciences

Kathleen Fitzpatrick and Rebecca Kennison

30 October 2017

INTRODUCTION

The spread of open digital forms of scholarly communication, combined with increasing institutional pressure to track research “impact,” has encouraged scholars and administrators in the humanities and social sciences to turn their attention to metrics that promise to help in the assessment of research outputs. However, significant concerns have been raised in recent years about the value of traditional metrics in such assessment. For instance, the journal impact factor — as its name would suggest — only measures the impact of a publication as a whole, not the significance of any individual piece of work that it contains. Similarly, citation metrics such as the h-index, while author specific, only reveal a single aspect of the impact a scholar’s work may have, failing to account for the ways that an article moves through digital scholarly networks today. Moreover, citation metrics’ focus on journal-based citations makes them particularly inapplicable in the fields within the humanities and social sciences that do not rely on peer-reviewed articles as the primary form of scholarly communication. (On the shortcomings of and potential damage done by traditional bibliometrics in the assessment of scholarship, see Burrows, 2012; de Rijcke and Rushforth, 2015; Gruber, 2014; Haustein and Larivière, 2015. On the specific problems with using such bibliometrics in humanities and social sciences, see Archambault et al., 2006; Nederhof, 2006; Nederhof et al., 1989; Pontille and Torny, 2010.)

As a result of the limitations of traditional bibliometrics, a number of alternative metrics systems for measuring research impact have recently gained popularity, especially in science, technology, engineering, and medicine, known collectively as the STEM fields (Hammarfelt, 2014; Hug, Ochsner, and Daniel, 2013; Kousha and Thelwall, 2016; Priem et al., 2010; Thelwall and Delgado, 2015). These so-called “altmetrics” attempt to account not merely for citations of published scholarship in journal-based articles but also for mentions of the work in popular news outlets, inbound links to the work from social media such as Twitter and Facebook, and capture of the work in social bookmarking and citation management systems such as Mendeley and Zotero; they also seek to track other factors that collectively indicate the ways that a publication moves across the Internet. While skeptics argue that social media attention does not equal quality, relevance, or impact (Bornmann, 2014; Scott, 2012), and while grave concerns exist regarding the potential uses and abuses of metrics in personnel reviews (Flaherty, 2016; Laudel and Gläser, 2006), promoters of such alternative metrics suggest that they provide new insights into the ways that scholarly work is disseminated by its creators and used by its audiences.

To assess the current state of altmetrics within humanities and social sciences disciplines, this study proposed to develop a taxonomy of the altmetrics tools and measures most widely used by or familiar to researchers and scholars, with the goal of determining the current level of acceptance within the academic community of altmetrics, especially in relation to decisions concerning tenure and promotion. Our sense, in beginning this study, was that we would meet with a fair degree of concern about the effects of applying metrics developed for the sciences to fields that operate with quite different structures through which work circulates. Our hope was that we might provide some guidance for department chairs and deans in humanities and social science fields as they encounter requests for analytic data at the university level.

Traditionally, a white paper such as this one would begin with a thorough literature review. We were dissuaded from doing so in this case by the experience of the Higher Education Funding Council for England (HEFCE), which in 2014–2015 commissioned a review of the current landscape for metrics in research evaluation. As part of that review, a team of researchers primarily affiliated with the Centre for Science and Technology Studies at Leiden University (Sarah de Rijcke, Paul F. Wouters, Alex D. Rushforth, Thomas P. Franssen, and Björn Hammarfelt) conducted an analysis of the extant literature on evaluation practices and the uses and misuses of metrics therein. They noted in the process the difficulties involved in accounting for the full breadth of the literature:

Providing a complete overview of the literature is not feasible for a couple of reasons. First of all, the literature is very diverse. Studies on evaluation systems, evaluation practices, and effects of indicator uses are published in different media, and the preferred outlets are not necessarily always international journals that are covered well by web-based citation databases. The hundreds of sources are spread out over books, edited volumes, articles, reports, and other forms of gray literature that are sometimes relatively inaccessible. Secondly, the relevant literature is scattered over a large number of social science fields, including sociology of science, innovation studies, library and information science, higher education studies, sociology of evaluation, evaluation studies, economics and business studies, medical sociology, science policy studies, research management and innovation, political science, and governance studies. A third hampering factor in presenting a complete overview is the epistemic nature of the evidence presented in the literature. The studies range from surveys and interviews on researchers’ perceptions of evaluations and formal policy analysis of principal–agent relationships to cultural critiques of the evaluation society and ethnographic studies of evaluation in action (an emerging body of work). The resulting heterogeneity of the evidence poses particular challenges in integrating the literature in a single review.

(de Rijcke et al., 2016, 161–162)

De Rijcke et al. opt in their report to present a qualitative literature review that maps the primary issues across the literature rather than striving for completeness. Needless to say, since their report was published, the quantity and diversity of the literature have only expanded, and our own reading of the relevant work supported the soundness of their decision. Moreover, the primary themes that their review uncovered — questions about the effects of the use of metrics on knowledge production and the consequences of their deployment in research assessment — were confirmed by both our reading and our primary research. As a result, we have opted to refer the interested reader to their study, rather than reinvent that particular wheel.

As de Rijcke et al. note, much of the extant literature connects the rise of new metrics for research impact, including so-called “altmetrics,” to an increase in demands for researcher accountability (see, e.g., studies of the impact of the Research Assessment Exercise/Research Excellence Framework on researchers, including Collini, 2012; Hoecht, 2006). The deployment of metrics in research assessment in many cases steers researchers to become more market-oriented, more instrumentalist, and more privatized (see Leisyte and Dee, 2012; Willmott, 2011). Moreover, assessment systems that affect researchers’ funding or reputations will tend to cause them to shift their goals to focus on the outcomes of the assessment, rather than the purposes of the research, or will otherwise encourage them to revise their processes so as to avoid risk (see Hicks, 2012). De Rijcke et al. express particular concern about these effects in the areas of the curriculum on which our own project most focuses; in particular, they cite studies indicating that the use of metrics in research assessment can hinder interdisciplinary research. They also note the extent to which the arts and humanities, as well as the book-oriented social sciences, suffer when scholars in these disciplines are evaluated based on metrics that have been designed for fields whose research outputs are entirely centered in journals.

These concerns were a primary driver behind the study that follows. We sought a more direct understanding of the state of altmetrics adoption and usage in the evaluation of research in humanities and social science fields, as well as an understanding of faculty and administrator perceptions of that usage. Where concerns about the uses of metrics in the humanities and social sciences remain, we also sought to begin exploring ways that scholars and administrators in these fields might better articulate the desired impact of their research.

METHODS

Phase 1 of this study included the above review of the literature and of targeted social media outlets. While there is a relatively significant body of literature related to altmetrics in the sciences, there are fewer studies to date looking at disciplines in the humanities and social sciences. To help fill this lacuna, this phase of the study focused on reviewing the work that is currently being done on altmetrics across the disciplines, as well as on gathering discussions of metrics in the humanities and social sciences by sampling social media outlets, including academic blogs and Twitter feeds. We also explored discussions of concerns about metrics in the higher education mainstream press and in professional publications, and we conducted a small focus group with humanities and social science deans designed to surface the questions and concerns they have about metrics and their uses. The aim of this phase of the study was to identify, summarize, and synthesize the current state of altmetrics within the academy; to develop an initial taxonomy of the types of metrics most commonly used or known, whether within STEM, humanities, or social sciences; and to derive from this investigation the questions to be explored in phases 2 and 3.

Phase 2 of the study was originally to consist of in-depth interviews with approximately 10–12 tenured and tenure-track faculty members and academic administrators from a representative sample of North American institutions, including liberal arts colleges and midsized and large public and private universities, evenly distributed between humanities and social science disciplines. These interviews, conducted via online questionnaire, were to be used to validate the taxonomy and further explore the issues and discussions taking place within different types of institutions and different disciplines surrounding the adoption of altmetrics. Results and analysis of the questionnaire were to be used to develop the survey instrument for the third phase of the study. However, our attempts to conduct these interviews were hampered by a surprisingly low response rate among the initially selected participants. We are uncertain whether this silence had to do with the busyness of the participants or their perception that the subject did not pertain to them. After several follow-ups and in the interest of moving the study forward, we made the decision instead to open up the survey, as part of phase 3, in order to capture the responses of as many interested scholars as possible.

Phase 3 involved an international online survey that sought to provide us with a broader perspective on the use and level of acceptance of altmetrics in the humanities and social sciences in cases of tenure and promotion. We also sought participation from respondents from a range of institution types and from as wide a geographical distribution as possible. We invited participation by reaching out to humanities and social science listservs and to social media networks to maximize our response rate. We then used results from this survey to refine the taxonomy of metrics and to provide further data on the issues, use, and acceptance of altmetrics within higher education in the humanities and social sciences within a primarily English-speaking educational environment.

Preliminary Interview

We began our investigation of the ways that metrics in general, and altmetrics in particular, are being used in the evaluation of research productivity in the humanities and social sciences by conducting a one-hour interview with a dean of social sciences at an elite large private research university (“Large Private U”). We expected to hear a recognizable story about the assessment-based pressures toward the quantitative engendered by contemporary university bureaucracies. Instead, this dean quickly reframed our work: this particular institution not only resists modes of assessment that are seen as being overly “bean-counting,” but has avoided developing internal metrics for assessing performance at a range of levels. The dean noted that at Large Private U even basic data are missing and that what data exist are siloed and inconsistently captured. Annual reviews, for instance, rely on narrative self-reports sent first to the chair, and then to the dean, in the form of a Word document. Large Private U’s course management system cannot generate reports containing usable data, and the dean’s financial reports are limited to endowment and gift accounts and faculty research accounts.

When we turned the conversation specifically to the use of metrics or analytics in personnel processes such as tenure and promotion reviews, the dean let us know that Large Private U does not subscribe to a metrics service such as Academic Analytics. Moreover, Large Private U does not require that any metrics, such as citation counts or h-indexes, be included in a tenure or promotion case. Instead, assessment is conducted overwhelmingly qualitatively, and largely in narrative form. Departments are asked to provide a “comparison set” of four or five other scholars in the field against whom external reviewers are instructed to compare a candidate’s work, though the dean noted that in many fields, particularly the humanities, reviewers refuse such comparisons.

When asked about the kinds of data that would be useful in these personnel processes, the dean noted a desire to be able to count everything, but more particularly an interest in where researchers publish, who is taking leaves, and who is getting grants; such information would help the dean determine parity among faculty members. As it is, however, such data are not tracked, and even where they are, the data are inconsistent. Large Private U attempts to track service performed by faculty, for example, but the information is reported in Word documents and summarized in spreadsheets and there is no university-wide view available. As a result, the same “good citizens” get tapped with service requests over and over, without commensurate reward, while other faculty members may do no service at all.

In discussing altmetrics in particular, the dean emphasized the importance of having an ethical apparatus in place before such counts could be considered within personnel reviews, pointing to evidence that scholars of color can experience a backlash on social media to a much greater degree than white scholars. The dean noted recent work in sociology that has suggested that the greatest predictor of “virality” is the density of social networks (Goel et al., 2012). If that is truly the case, the dean noted, significant questions arise about the utility of altmetrics in the assessment of small or specialized fields. Moreover, such research suggests that altmetrics may actually measure more about the networks themselves than about the quality of the material moving through them.
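
The notion of network density that the dean invokes can be made concrete with a small illustration. The sketch below (in Python, using entirely hypothetical figures of our own devising, not data from Goel et al. or from this study) computes the share of possible connections that actually exist in two imagined scholarly communities; the point is simply that the same piece of work could leave very different altmetric traces depending on the structure of the field through which it circulates, which is exactly the concern the dean raises.

    def network_density(num_people, num_ties):
        """Density of an undirected network: actual ties divided by possible ties."""
        possible_ties = num_people * (num_people - 1) / 2
        return num_ties / possible_ties

    # Hypothetical communities: a small, tightly knit subfield versus a large,
    # loosely connected discipline. Neither figure comes from the study.
    print(round(network_density(200, 6000), 4))      # 0.3015 -- dense subfield
    print(round(network_density(20000, 300000), 6))  # 0.0015 -- sparse large field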

Deans’ Focus Group

Based on the insights gained from this interview, we moved on to conduct a focus group discussion with three deans responsible for humanities and social science departments; one of the focus group participants was from a large public land-grant university (“Land-Grant U”), one was from a public regional comprehensive university (“Regional U”), and one was from a small private university (“Small Private U”). We asked each participant to begin by introducing themselves and saying a bit about their campus; we then asked a series of six questions designed to structure our discussion:

  1. What kinds of metrics does your institution gather on faculty research productivity and impact?
  2. How are those metrics used, especially within tenure and promotion processes?
  3. Are there metrics that you don’t gather that you wish you did?
  4. What kinds of information about research impact can you imagine being useful to you as you assess individual/departmental performance?
  5. What risks can you imagine in the gathering and use of such information?
  6. Are you familiar with altmetrics? What opportunities and challenges do you see with adding altmetrics to evaluation of research impact?

In our discussion, we discovered, unsurprisingly, that Land-Grant U was the most metrics-attuned of the three institutions. The university has an Academic Analytics subscription, though there is as yet no requirement to use the information provided in personnel processes; currently the information mostly seems to inform discussions between the provost and the deans. At Land-Grant U, each department operates guided by its own bylaws, which mandate the evaluation of research, teaching, and service both for annual review processes and for personnel milestones. In most departments, a committee produces a report for the chair, and the chair writes an evaluation based on that report. The same process is used for annual merit reviews and for promotion and tenure reviews, though the latter are more structured. The dean at Land-Grant U who participated in our discussion has recently instituted a third-year review during a tenure-track faculty member’s probationary period, in part as a means of asking departments to articulate their expectations for junior faculty productivity, given that the departmental level is where much of the challenge for junior faculty lies. Certain traditional expectations about what “counts” for annual evaluations, tenure, and promotion have become entrenched within many departments, this dean admits; while the mission of the land-grant institution leads the dean to want to encourage publicly engaged scholarship that blurs traditional categories, many faculty are hesitant. How to incentivize that category-blurring work is a key question for Land-Grant U.

Small Private U similarly relies on department guidelines for tenure and promotion in determining what information is gathered in review processes. Those guidelines mostly focus on loose, non-quantified expectations about refereed publications; no standardized metrics are gathered. Because of the size of the college in which humanities and social science departments are housed at Small Private U, the bulk of personnel evaluation processes falls to department chairs. Both annual merit reviews and tenure and promotion processes ask chairs to articulate the significance of faculty research productivity. A key challenge at Small Private U is that the annual review process is entirely based in departments, but many faculty have joint appointments or are working under other kinds of interdisciplinary memoranda of understanding. Most department chairs have lately become adept at dealing with this interdisciplinarity, noted this dean, but a few still require more guidance.

The most surprising early discovery in the focus group, perhaps, was that Regional U’s personnel processes are entirely defined by its collective bargaining agreement, which permits no measures of research productivity. The annual review form at Regional U includes sections for research, teaching, and service, but because many kinds of public scholarship resemble service, those two categories can be difficult to separate. When asked how the information, including metrics, that the various departments gather is used in personnel processes, the dean emphasized that the collective bargaining agreement does not allow for the collection of electronic information. Instead, all data must be delivered in a three-ring binder, a format that often prevents the work being submitted from being evaluated thoroughly. The dean at Regional U also expressed some concern about the failure of the collective bargaining agreement to spell out anything concerning new forms of scholarship, resulting in a significant gap between the ways that many faculty are working and the ways that review can be conducted; this is especially the case for digital humanities projects. The good news, the dean noted, is that all forms of publication “count,” regardless of venue or level of formality.

These observations by the dean from Regional U led to a conversation about the technologies of the review process, in particular the three-ring binders of Regional U versus the entirely digital portfolios of Small Private U and the kinds of scholarly artifacts that each fails to account for; while the former misrepresent work that needs to be navigated digitally, the latter cannot really contain books. Land-Grant U does not use either binders or digital portfolios, but instead relies on an electronic form that is filled out by the faculty member and accompanied by supplemental material; that form has sections for assessments by the department chair and the college dean, and it is transmitted forward through the university hierarchy (university-level review committee, associate vice provost, vice provost, provost). The Land-Grant U administration is apparently working on a revision to this form, sensing that the form and the process are out of date, with a mismatch between the kinds of information that are now needed and what the form actually asks for, but (as is not unusual in university processes) attempts to revise the form are somewhat contested. The Land-Grant U dean noted that faculty tend to find workarounds for the form’s restrictions; for example, the form does not fully accommodate new kinds of scholarship, but descriptions of that work and its significance can be captured in narrative instead.

Asked whether there were metrics or other forms of information that are not currently captured in these review processes that might be useful, the dean of Regional U noted that it would be helpful to have confirmation of faculty members’ off-campus participation in various events and initiatives; this information is not currently explicitly spelled out as one of the elements to be included in the review binder. The dean of Small Private U was interested in knowing more about the acceptance rates of journals and other related information about publication venues, particularly with respect to the standing of journals and publishers based outside the United States. The dean of Land-Grant U moved immediately to thinking about the difficulty of articulating the impact of scholarly work, noting that even when things are known about a publication venue (the stature of a press, for instance), there is no way of knowing how important a given project is. This dean went on to suggest that an entirely new approach to thinking about promotion and tenure evaluations was needed, which might begin with asking a faculty member to articulate early in their probationary period, and then in reviews annually, their approach to their own development as a scholar. That evolving document, which could include a structured work plan, could then be used as the basis for evaluation, rather than individual faculty being held to an often inapplicable “universal” standard. This approach would enable the process of the tenure review to shift from being primarily retrospective to being more projective in orientation; moreover, such a restructured process might also empower faculty, helping them to cultivate habits of thinking holistically about academic life and understanding research, teaching, and service as fully integrated.

Asked about the kinds of information about research impact that might be useful in the assessment of both individual and departmental performance, the deans noted some significant challenges, most notably that the forms of impact that often matter most in an academic life — such as leadership and collegiality — are hard (or at times risky) to quantify, and the things that can be quantified (like book sales figures, citations, and so forth) are not necessarily correlated with quality. (These themes were surfaced time and again throughout our study.) The deans noted their reliance on peer review as an indicator of quality, emphasizing that they want their faculty members to produce conscientious, responsible work but also work that can address a very small field and still be important. The dean from Land-Grant U raised questions about the problematic habits of scholarship that might be cultivated by fetishizing citation metrics.

All the deans were interested in the information about engagement that altmetrics might provide; in particular, the dean from Regional U noted that the institution’s public mission would be served by helping faculty make the case for their impact through information about the hits, comments, and downloads their work has received. The deans from Small Private U and Land-Grant U both noted that such information would be particularly useful for faculty members undergoing review, giving them more tools with which to articulate the impact of their work in the self-evaluation process. Land-Grant U has a project underway designed to help faculty create a curated presence for their work online, which (it is hoped) will enable them to tell a more compelling story about that work and its impact; the dean expected that the conversation about altmetrics on campus would be informed by the faculty’s engagement with that project. All three deans noted the importance of ensuring that faculty receive appropriate training and support in engaging with new forms of online scholarly communication and scholarly communities, but they also stressed the importance of establishing means of engagement that are not terribly time-consuming (as complaints about annual reporting requirements already abound).

When asked whether they saw potential concerns about altmetrics and their uses, the dean from Land-Grant U observed that no metrics, no matter how “objective,” are value-neutral, and that celebrating some metrics over others will inevitably wind up changing scholarly practices. The crucial thing, noted this dean, is to figure out what practices of scholarship enrich the work of scholars and then to figure out how those can be fostered. The dean from Small Private U pointed out that much of the assessment process must be conducted subjectively; departments must make a case for individual achievements that cannot be objectively verified. This dean cautioned us to be realistic about data-driven “fixes”; the complexities of academic lives will always require assessment practices that involve messy work. The other deans seconded that point, and they noted the importance of having deans talk with one another about these processes and the values that underwrite them.

Survey and Survey Takeaways

Based on the discussion with our focus group, we designed a slightly expanded online questionnaire, seeking narrative responses to the six questions listed above from a select group of faculty members at a range of institutional types and points in their careers. However, as noted in the Methods section, our call for participation failed to find traction among these faculty members; of 12 invited respondents, only three completed the questionnaire, and one of those responses was highly abbreviated. We do not know how to interpret the questionnaire’s failure: it could be that the request landed at a challenging moment in the academic calendar (the summer break), it could be that more potential participants meant to reply but forgot, or it could be (and this is where our surmise is centered) that many potential respondents simply did not have enough to say about the subject, whether because they did not feel that it applied to them or because they lacked sufficient background knowledge.

In order to move past that roadblock, we reformulated our questionnaire, transforming it into an open online survey instrument. Rather than seeking open-ended narrative responses, the revised questions provided structured answers, while leaving room for additional comment as desired. We opened this survey on 8 August 2017, announcing it widely through scholarly listservs and social media; we closed the survey after approximately one month.

Our online survey contained 16 questions designed to uncover current and desired practices in both metrics and altmetrics, primarily among humanities and social science administrators and faculty. The complete survey summary is appended to the PDF version of this white paper, which is found on Humanities Commons.

Of the 89 respondents who started the survey, 64 completed it. The vast majority (74.16%) were at doctorate-granting universities, followed very distantly by those at master’s colleges or universities (8.99%), baccalaureate colleges (6.74%), or associate’s colleges (or other two-year colleges) (1.12%); the balance (8.99%) were at a mix of funding agencies, research institutes, high schools, and libraries. The interest from individuals at research-intensive institutions is perhaps unsurprising, given the potential impact of metrics on the evaluation of their work. Similarly, the majority of respondents (53.93%) were at very large (10,000+ FTE students) institutions; 19.10% were at large (5,000-9,999 FTE students) institutions, 11.24% at medium-sized (2,000-4,999 FTE students) institutions, 6.74% at small (500-1,999 FTE students) institutions, and 8.99% at very small (<500 FTE students) institutions. Almost all respondents were researchers, whether faculty (64.03%), post-doctoral or graduate students (17.98%), or research fellows (1.12%); 13.48% were administrators; the balance (3.39%) were a mix of staff and librarians. Respondents skewed more male (55.55%) than female (40.74%); 3.71% identified as transgender or agender, or declined to answer. While we encouraged global participation, most respondents were from the United States and Canada (56.18%) or Western Europe (26.97%), with the balance from Australia and New Zealand (6.74%), Eastern Europe (4.49%), Latin America and the Caribbean (2.25%), Southern Asia (2.25%), and Southern Africa (1.12%); we received no responses from the Middle East, Central Asia, Eastern or Southeastern Asia, Northern or Sub-Saharan Africa, or Oceania (other than Australia or New Zealand).

Respondents were drawn from across a range of humanities and social science disciplines. (They were permitted to check all fields they felt applied, as well as to write in others, so totals exceed 100%.) Represented were literature or composition and rhetoric (23.60%), history (21.35%), interdisciplinary studies (16.85%), linguistics and languages (13.48%), philosophy (10.11%), anthropology (6.74%), archeology (6.74%), sociology (6.74%), ethnic and cultural studies (5.62%), visual arts (5.62%), religion (4.49%), organizational studies (2.25%), political science (2.25%), area studies (1.12%), economics (1.12%), geography (1.12%), and psychology (1.12%). Only gender/sexuality studies and performing arts were lacking representation. In addition, respondents included the following in their list of their disciplinary homes: library and information science (7.87%), digital humanities (5.62%), media studies and communications (5.62%), education (3.37%), law (3.37%), architecture and design (2.25%), and, yes, even STEM (3.37%).

In contrast with the responses that emerged from our in-person interview and deans’ focus group, the comments made both in our faculty questionnaire and in our anonymous survey expressed fairly uniform negativity toward (alt)metrics, ranging from caution to despair about how metrics and altmetrics are or could be used in the humanities and social sciences:

  • Are we measuring the right things? Several respondents cautioned against using metrics designed for STEM that do not take into account “cultural differences” (especially the “slow, steady uptake of good work”); that are not based on values (“It is important for the deepest values of scholarly research to inform the adoption of the metrics we use”); and that merely reinscribe the “skewing toward prestige, gender, and racial preferences” that are already problematic in the academy. (“In general, it seems to me that many of the objections to longer-standing citation metrics apply to altmetrics as well,” observed one respondent, speaking for many.)
  • How will those measurements be used? Respondents fear that altmetrics may be used “in equally terrible ways as the current research metrics” because “someone will always come up with some inane, petty way to twist the data.” They are concerned as well that “not all labour needs to be measured but it seems that if we don’t measure it it isn’t worth anything in the current system.” Several respondents pointed to currently unrewarded or undercounted work (e.g., “peer reviews done, PhDs delivered, editorships,” etc.) that might be counted more — and criticized a system that privileges publication over other kinds of research output, especially digital forms — but are worried that the stated goal of metrics (“equitable, objective, and predictable” measurements) has never been achieved and that measuring even more will only exacerbate the problem. Altmetrics in particular were seen as “too uneven to be fair.”
  • There are already too many metrics, without enough context or understanding of indicators. Quantity is easy to measure; quality is much more difficult. (“I want more holistic, situated, and contextual institutional policies and practices for hiring and promotion, not another metric.”) Several respondents expressed the desire for better understanding of the metrics we have rather than adding even more poorly understood metrics that would only provide even more “approximate measures of the intended target.” (“How much does, say, a site visit count actually measure engagement? How does one really measure engagement with a scholarly digital project?”)
  • “Metrics are not really about impact, but money.” Several respondents noted that metrics (especially bibliometrics) were developed by publishers to reinforce their own importance but are used by administrators to support hiring and firing on the basis of “data” that are skewed rather than objective. Respondents also wondered about the temptation for scholars to “game the system” in a similar way (“If corporations game their web-gathered metrics, would academics start doing that too?”).
  • More metrics create expanded requirements. “Publish-or-perish” and “productivity” requirements combined with metrics that measure quantity rather than quality have “flooded the bibliosphere” with mediocre publications, which in turn waste researchers’ and promotion and tenure committees’ time and energy, as they must wade through all this material to sort out the good from the mediocre, all in service of questionable ends (“I see the metrics identified … in this survey as mostly about generating more work for overworked colleagues and more opportunities for administrators to blather about ‘data-driven’ decisions”).

Some of the reasons for this skepticism could be seen in the contrast between what is currently being measured and what the respondents wished would be measured.

Not surprisingly, topping the list of what currently “counts” most is the quantity of traditional publications (e.g., books, articles, book chapters) produced (weighted importance on a 5-point scale: 4.72); not one of the 67 respondents who answered that question rated the importance of traditional publications lower than a 3. Grants were also very important in evaluation, both in terms of the amount of funding of the grant awards (4.20) and the number of awards granted (4.12). Of mid-range importance in terms of current evaluation are the rankings of particular publication or presentation venues (e.g., impact factor) (3.73) and information about citations in publications (e.g., bibliometrics, h-index) (3.20). (On this point, one respondent admitted, “We have a list of ‘quality’ publications; annual reviews are primarily based on number of outputs in these outlets.”) Of less importance are the quantity of invited talks (2.98), the quantity of conference presentations (2.74), information about citations or discussions in mainstream media (2.66), the quantity of other research outputs (e.g., datasets, digital projects) (2.55), the quantity of gray literature (e.g., working papers, technical reports) (2.43), and information about citations in policy documents (2.42). Still less important in terms of what “counts” in current evaluation practices were information about being linked, mentioned, or cited in blogs (1.87); the quantity of blog posts or other non-traditional publications (1.81); information about being linked, mentioned, or cited in online reference managers (e.g., Zotero, Mendeley) (1.81); information about inclusion on syllabi (e.g., Open Syllabus Project) (1.78); information about being linked, mentioned, or cited on social media (e.g., Twitter, Facebook, etc.) (1.75); information about being linked, mentioned, or cited in other online sources (e.g., Wikipedia) (1.74); and information about discussions and comments on post-publication peer-review platforms (e.g., PubPeer) (1.68).

In other words, what respondents reported as counting most in current research evaluations were traditional metrics: the number of publications and citations appearing in highly ranked journals or published by highly esteemed presses and the number and amount of grant awards. Less formal publications in still-recognizable formats (such as gray literature or digital projects) and coverage in mainstream media counted for more than did any of the primary sources for altmetrics scores — blogs, online reference managers, syllabi, social media, online resources such as Wikipedia, post-publication peer review platforms — which all rank at the bottom of the list in terms of their importance in current evaluation recognition and reward, although one respondent noted: “While many of the above are not considered as part of the assessment process, they often lead to outputs that can be counted, such as peer-reviewed articles or conference invitations.”
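
For readers unfamiliar with how such scores are typically derived, the sketch below (in Python) shows one common way a “weighted importance” figure of this kind can be computed: as the mean of the 1-to-5 ratings supplied by respondents. Both the computation and the response distribution shown are our own illustrative assumptions, not the survey platform’s actual method or the study’s raw data.

    def weighted_importance(ratings):
        """Mean of 1-5 Likert ratings, skipping unanswered items."""
        answered = [r for r in ratings if r is not None]
        return sum(answered) / len(answered)

    # Hypothetical distribution of 67 responses to the "traditional publications"
    # item (no one rated it below 3, as reported above).
    example_ratings = [5] * 52 + [4] * 12 + [3] * 3
    print(round(weighted_importance(example_ratings), 2))  # 4.73, close to the reported 4.72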

What metrics on faculty research productivity and impact do respondents wish their institution gathered or used that it presently does not? Topping that list was the quantity of other research outputs (e.g., datasets, digital projects), with 46.15% of the 65 respondents expressing interest in reporting that metric, followed very closely by wishing they could have more information about — and report as part of their evaluation — inclusion on syllabi (44.62%), information about being linked, mentioned, or cited on social media (41.54%), and the quantity of blog posts or other non-traditional publications they produce (40.00%). Several respondents questioned the validity of quantity or “productivity” as a metric and argued that a “qualitative assessment of actual impact” should matter more: “Quality is a far better assessment metric than quantity in most areas of scholarship.” There was disagreement, however, as to how to make that assessment, as respondents seemed to understand the problems inherent in impact factor or citation metrics as a proxy for quality or as indicative of reach: “In some fields too much emphasis is given to a short list of standard journals. This is lazy and has contributed to several negative impacts, including monopolistic business practices.” One respondent pointed in particular to the desire to know how work is taken up by the public (“I would like to know: who finds specific work useful and why, beyond scholars (where citation evidence already speaks to that question)”).

Interestingly, nearly one in five respondents had no problem with the metrics being used at their institution; 18.46% responded by saying, “I am satisfied with the metrics my institution currently gathers.”

When asked what kinds of information about research impact respondents could imagine being useful as they report on or assess individual or departmental performance, several kinds of information seemed attractive, a mix of both traditional metrics (inclusion in review articles or book reviews [76.92%], citation counts [64.62%], and downloads [63.08%]) and altmetrics (engagement indicators such as press coverage and social media discussions [61.54%] and public interest such as site visits, comments, and bookmarks [60.00%]). Considerably less attractive were inclusion in online bibliographic tools (e.g., Zotero) (47.69%) and pageviews (46.15%), with some questioning what such metrics truly tell us about engagement or quality (“To what extent do ‘viewed/discussed/saved’ serve as true measures of quality? Would these ‘traffic levels’ come to replace qualitative peer analysis?”). Economic impact rated last (23.08%), perhaps unsurprisingly. That said, considerable skepticism was raised as to how useful any of these measurements might be: “None of these metrics are particularly useful,” noted one respondent, “since most of them can be gamed, and others aren’t meaningful,” with several arguing that simply serving up numbers alone was problematic: “Evaluators should be willing to read,” said one, while another lamented, “There’s really no good measure for the kind of slow, steady uptake of good work.”

These qualms came to the fore when the respondents were asked about challenges and risks of using altmetrics as part of the evaluation process. Most were very concerned about misinterpretation of the numbers by administrators or other faculty (78.13%) and about the unevenness or unfairness in comparing small fields (such as classics) with large fields (such as English) (64.06%). Many were concerned, as expressed above, about the potential for gaming the system (60.94%). And many more (perhaps they were thinking of themselves!) noted that resistance by skeptical faculty (62.50%) would mean slow adoption of any kind of altmetrics. Although several respondents had already pointed out that some work takes a very long time to be recognized, considerably less concern was raised by the idea that altmetrics might privilege “early adoption” over the “long tail” (43.75%). Even less of a worry was the potential for scholars to be harmed in a social media environment (40.63%) — although that was foremost among the concerns raised in our initial interview with the dean from Large Private U and an area we feel is important for further study. Behind all these concerns, however, lies considerable skepticism about altmetrics as a useful tool at all: there is a poorly understood correlation between downloads (easy to measure) and “actual impact” (hard to discern), noted one respondent; another complained that “there’s the potential for nonsense about ‘data-driven decisions’ regarding hiring and firing” to be based on numbers that (said several) are “uneven,” “unreliable,” and “phenomenally bad” but that might nonetheless be subject to “formulaic interpretation” in comparison to STEM fields, which would in turn be “used to devalue our contributions [in humanities and social science fields] even further.”

Even where inclusion of altmetrics into a research evaluation portfolio was seen to provide some opportunities for humanities and social science scholars — by tracking engagement with a broader audience (70.31%), opening up opportunities to talk about impact outside the academy (e.g., with industry) (65.63%), telling a more nuanced story about research (60.94%), or telling a more immediate story (53.13%) — the negatives remain strong. “I’m skeptical,” admitted one respondent. “[I’m] [n]ot sure that altmetrics has to be part of this — [public scholarship] can happen quite well without altmetrics, as professors have always engaged with the public in various ways, well before altmetrics,” noted another. “There are … some dangers here in terms of quantity versus quality,” warned a third, while a fourth worried, “Few academics are well prepared critically to assess the kinds of ‘traffic’ data that we can currently gather from online engagement with scholarly resources, so it is easy to imagine false conclusions being drawn either way.”

DISCUSSION

The disparity in responses between the deans, to whom we spoke synchronously, and the researchers, whose responses were mediated by the questionnaire and the survey, may in part be due to that difference in mode of communication; because we could not press on the responses given in the survey by asking follow-up questions or requesting clarification, neither the respondents’ thought processes nor our own investigative strategies were able to evolve. But there is of course a preceding difference in perspective to be considered as well: the deans have a primary responsibility to the institutions and administrative processes they serve, and they are therefore more likely to understand the evaluations that metrics facilitate to be inevitable. The researchers, who are subject to those processes of evaluation but operate in primary service to their own individualized research areas, are far more likely to see the use of metrics as both a limitation and an imposition and thus to resist their premises. This kind of disparity between administrators and faculty would surface in a study of any number of processes and structures of the modern university, and so it does not necessarily tell us anything terribly specific about altmetrics per se.

That having been said, both the conversations we had with the deans and the information we gathered from the questionnaire and survey respondents lead us to note the importance of balancing those perspectives. On the one hand, it is clear that metrics will be applied to humanities and social science fields, irrespective of the desires of those working in those fields. On the other hand, research demonstrating the ways that existing metrics — including altmetrics — fail to account for the forms of impact that scholars in the humanities and social sciences consider important in the evaluation of their work presents an opening for those scholars to design and implement new metrics of their own. It is only in taking on such an active role in articulating and defining scholarly values that scholars might ensure that what academic assessment processes measure is worth measuring and that what is subsequently being rewarded is what should be.

Moreover, there is a question to be raised about measurement itself, and whether — especially in fields where work is deeply non-quantitative — it is the wrong approach to assessing scholarly work tout court. As Stefan Collini (2017) has recently noted, developing “a different way of judging value” in scholarly work requires beginning with the understanding that “it has to be judged and not measured ” (xii). That process of exercising judgment about scholarly work and its impact requires not only a close engagement with the work itself but also a knowledge of the community with which it interacts. For this reason, academic personnel processes have long relied upon the assessment conducted by specialists within the field, who are able both to present their readings of the work under review and to contextualize the work within its community of practice. In this, we see a mode of assessment that both the survey and the focus group highlighted as particularly important in qualitative fields, in which narrative itself becomes an analytic form. Both the deans and a large percentage of the survey respondents noted their interest in the ways that scholars might be empowered to “tell a story” about their research and the importance of doing so beyond the walls of the academy, and both groups similarly pointed to the need for evaluators to understand the context within which that story is being told.

All of which is to say: it is no accident that the altmetrics provider ImpactStory has chosen a narrative metaphor in naming their service, which “helps researchers explore and share the online impact of their research” (ImpactStory). The numbers that metrics provide can serve as elements of that story, as evidence supporting claims about its impact, but they cannot stand in for the narrative itself, or for the key bits of contextualization and interpretation that only narrative can provide. What would best serve the evaluation of scholarship in the humanities and social sciences is likely not more — or even more accurate — metrics, but instead better ways of describing what impact really looks like in those fields, how scholars create that impact, and how that impact might best be cultivated.

Two projects are currently at work on these issues, taking very different approaches to thinking about the need for richer, more qualitative understandings of the value of research in these fields. The Quality and Relevance in the Humanities (QRiH) project is intended to facilitate compliance with the Standard Evaluation Protocol (SEP) implemented in the Netherlands, but it does so by providing a tool that helps researchers to conduct a self-assessment that “takes the form of a narrative in which research efforts and results, including its societal impact, can be described in relation to one another” (QRiH: SEP evaluation). The instrument brings together the indicators required by the assessment panels with the contextualization and explication provided by the researchers themselves, thus potentially leading to a richer evaluation framework for scholarship that is not adequately represented by available metrics. QRiH was launched in June 2017; we hope that an evaluation of this project’s outcomes might help guide other organizations and institutions as they consider the ways that metrics are used in their own evaluation protocols.

Operating from a somewhat different perspective is the Humane Metrics in Humanities and Social Science (HuMetricsHSS) initiative. This initiative, begun in 2016 at the Triangle Scholarly Communication Institute and funded through the end of 2018 by the Andrew W. Mellon Foundation, is working to rethink the indicators of excellence in humanities and social science research from the ground up, attempting to “create and support a values-based framework for understanding and evaluating all aspects of the scholarly life well-lived and for promoting the nurturing of these values in scholarly practice” (HuMetricsHSS: About). The idea behind HuMetricsHSS is to “reverse engineer” evaluation by starting not (as is so often the case) with what can be measured technologically but rather by asking scholars themselves to identify the practices of scholarship that enrich their own work and that connect it to the broader public and then to develop metrics that can properly reflect and enhance those values. The initiative is conducting a series of workshops designed to interrogate the original values proposed by the research team — collegiality, community, equity, openness, and quality — and expand that list to encompass other core values in the academy and to begin to develop “indicators of excellence” for two scholarly objects (syllabi and annotations) that are not currently rewarded. The first workshop was held 5–7 October 2017. (For insight into the process and a thoughtful analysis of the outcomes of that workshop, see Trott.) Further workshops will be held in March and November 2018 and will invite further input by scholars into the process and practice of developing “humane metrics.”

Projects such as these may point the way toward more generative modes of assessment, relying as they do on analytics that do not simply turn a quantitative lens from the journal-level to the article-level or from a singular focus on citations to a broader examination of the ways that the products of scholarly research move through academic and public networks. Both of these shifts have been important in the development of richer, alternative metrics for understanding the impact of research. But neither really admits the more qualitative modes of evaluation that form the core methodologies of most humanities and many social science fields. Metrics, in whatever form, can only provide certain kinds of evidence, which must be contextualized and interpreted to have meaning, and that work of contextualization and interpretation requires, as Collini (2017) has noted, not only measurement but judgment. We would add that it also requires narrative, the ability to tell a story about the progress of a career, the values that underwrite it, and the goals that determine its own particular markers for success. QRiH and HuMetricsHSS, in different ways, present means of refocusing assessment practices on values and goals — on the qualitative and the narrative. But much work still needs to be done. It is our hope that many others will take up the challenge of developing approaches that move beyond merely what can be counted to instead reward what should truly count.

ACKNOWLEDGMENTS

This study was funded by an Open Society Foundations Information Program Grant, awarded to the Modern Language Association. We are grateful to all those who participated in this study: the deans who met with us in person, the faculty and administrators who completed our initial questionnaire, and those scholars from around the world who took the online survey and provided us with such thoughtful feedback.

REFERENCES

Archambault, Éric, Étienne Vignola-Gagne, Grégoire Côté, Vincent Larivière, and Yves Gingras. “Benchmarking scientific output in the social sciences and humanities: The limits of existing databases.” Scientometrics 68.3 (2006): 329–342.

Bornmann, Lutz. “Do altmetrics point to the broader impact of research? An overview of benefits and disadvantages of altmetrics.” Journal of Informetrics 8.4 (2014): 895–903. [Preprint]

Burrows, Roger. “Living with the h-index? Metric assemblages in the contemporary academy.” The Sociological Review 60.2 (2012): 355–372. [Preprint]

Collini, Stefan. Foreword. The Slow Professor: Challenging the Culture of Speed in the Academy. By Maggie Berg and Barbara K. Seeber. Toronto: U of Toronto P, 2017. ix–xiii. (ISBN: 978-1442645561)

Collini, Stefan. What Are Universities For? New York: Penguin, 2012. (ISBN: 978-1846144820)

de Rijcke, Sarah, and Alexander Rushforth. “To intervene, or not to intervene; is that the question? On the role of scientometrics in research evaluation.” Journal of the Association for Information Science and Technology 66.9 (2015): 1954–1958. (doi:10.1002/asi.23382)

de Rijcke, Sarah, Paul F. Wouters, Alex D. Rushforth, Thomas P. Franssen, and Björn Hammarfelt. “Evaluation practices and effects of indicator use — A literature review.” Research Evaluation 25.2 (2016): 161–169. (doi:10.1093/reseval/rvv038)

Flaherty, Colleen. “Refusing to be measured.” Inside Higher Ed, 11 May 2016. Web. Accessed 15 May 2016.

Goel, Sharad, Duncan Watts, and Daniel G. Goldstein. “The structure of online diffusion networks.” EC 2012: Proceedings of the 13th ACM Conference on Electronic Commerce (2012): 623–638.

Gruber, Thorsten. “Academic sell out: How an obsession with metrics and rankings is damaging academia.” Journal of Marketing for Higher Education 24.2 (2014): 165–177. [Preprint]

Hammarfelt, Björn. “Using altmetrics for assessing research impact in the humanities.” Scientometrics 101.2 (2014): 1419–1430. [Preprint]

Haustein, Stefanie, and Vincent Larivière. “The use of bibliometrics for assessing research: Possibilities, limitations and adverse effects.” Incentives and Performance: Governance of Research Organizations. Ed. Isabell Welpe, Jutta Wollersheim, Stefanie Ringelhan, and Margit Osterloh. New York: Springer, 2015. 121–139. [Preprint]

Hicks, Diana. “Performance-based university research funding systems.” Research Policy 41.2 (2012): 251–261. (doi:10.1016/j.respol.2011.09.007)

Hoecht, Andreas. “Quality assurance in UK higher education: Issues of trust, control, professional autonomy and accountability.” Higher Education 51.4 (2006): 541–563. (doi:10.1007/s10734-004-2533-2)

Hug, Sven E., Michael Ochsner, and Hans-Dieter Daniel. “Criteria for assessing research quality in the humanities: A Delphi study among scholars of English literature, German literature and art history.” Research Evaluation 22.5 (2013): 369–383.

HuMetricsHSS. About. Web. Accessed 20 Oct. 2017.

ImpactStory. About. Web. Accessed 20 Oct. 2017.

Kousha, Kayvan, and Mike Thelwall. “Can Amazon.com reviews help to assess the wider impacts of books?” Journal of the Association for Information Science and Technology 67.3 (2016): 566–581. [Preprint]

Laudel, Grit, and Jochen Gläser. “Tensions between evaluations and communication practices.” Journal of Higher Education Policy and Management 28.3 (2006): 289–295. [Preprint]

Leisyte, Liudvika, and Jay R. Dee. “Understanding academic work in a changing institutional environment.” Higher Education: Handbook of Theory and Research, Vol. 27. Ed. John C. Smart and Michael B. Paulsen. Dordrecht: Springer, 2012. 123–206. (doi:10.1007/978-94-007-2950-6_3)

Nederhof, Anton J. “Bibliometric monitoring of research performance in the social sciences and the humanities: A review.” Scientometrics 66.1 (2006): 81–100. (doi:10.1007/s11192-006-0007-2)

Nederhof, Anton J., Rolf A. Zwaan, Renger E. De Bruin, and P. J. Dekker. “Assessing the usefulness of bibliometric indicators for the humanities and the social and behavioural sciences: A comparative study.” Scientometrics 15.5 (1989): 423–435. (doi:10.1007/BF02017063)

Pontille, David, and Didier Torny. “The controversial policies of journal ratings: Evaluating social sciences and humanities.” Research Evaluation 19.5 (2010): 347–360. [Preprint]

Priem, Jason, Dario Taraborelli, Paul Groth, and Cameron Neylon. Altmetrics: A Manifesto, 26 Oct. 2010. Web. Accessed 15 May 2016.

Quality and Relevance in the Humanities (QRiH). SEP Evaluation. Web. Accessed 5 Oct. 2017.

Scott, Nick. “Altmetrics are the central way of measuring communication in the digital age but what do they miss?” Weblog post. The Impact Blog. The London School of Economics and Political Science, 17 Dec. 2012. Web. Accessed 15 May 2016.

Thelwall, Mike, and Maria M. Delgado. “Arts and humanities research evaluation: No metrics please, just data.” Journal of Documentation 71.4 (2015): 817–833. [Preprint]

Trott, Adriel. “HuMetricsHSS: Can (should) we develop humane metrics for the humanities?” Weblog post. The Trott Line. 8 Oct. 2017. Web. Accessed 27 Oct. 2017.

Willmott, Hugh. “Journal list fetishism and the perversion of scholarship: Reactivity and the ABS list.” Organization 18.4 (2011): 429–442. (doi:10.1177/1350508411403532)