There is a deluge of research papers; it is impossible to keep up. Everyone knows this and tackles it in their own way: plowing either systematically or randomly through as many new publications as they can, or giving up and just searching for papers when needed. I think it’s important to be aware of new work, and to encounter ideas and approaches that are outside of one’s normal field of view. Unexpected papers seed new ideas, and finding them is often exciting. I try to be systematic about this, and a few months ago I decided to keep track of how many announcements of new papers I get; I’d never actually counted before. As with other counting or time management exercises I’ve done, the results are somewhat dismal. I don’t reach any real conclusion from this exercise, but perhaps it will be useful to people trying to figure out how to manage their reading, or to new students unaware of the landscape of science.
There are a lot of papers
Before getting to my own counts, I’ll briefly point to the evidence that there are a lot of papers out there, in case anyone is unaware. Over 2.5 million scientific papers are published per year (source; see also this), and the number grows exponentially, with a growth rate of over 3% per year. By other measures the present growth rate is 8-9% per year, corresponding to a doubling time of total scientific output of about 9 years. Driving this is a massive increase in the number of scientists: by one estimate, 90% of all the scientists who ever lived are alive today.
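The link between an annual growth rate and a doubling time is simple arithmetic. Here is a minimal sketch, assuming steady growth compounded once per year (an idealization; the cited estimates may use a different model): solving (1 + r)^T = 2 for T gives T = ln 2 / ln(1 + r). The function name `doubling_time` is just an illustrative choice.

```python
import math


def doubling_time(annual_growth_rate: float) -> float:
    """Years for a quantity growing at a constant annual rate r to double.

    Assumes discrete yearly compounding, N(t) = N0 * (1 + r)**t,
    so (1 + r)**T = 2 gives T = ln 2 / ln(1 + r).
    """
    if annual_growth_rate <= 0:
        raise ValueError("growth rate must be positive")
    # log1p(r) computes ln(1 + r) accurately for small r
    return math.log(2) / math.log1p(annual_growth_rate)


# Growth rates mentioned in the text
for r in (0.03, 0.08, 0.09):
    print(f"{r:.0%} per year -> doubling time {doubling_time(r):.1f} years")
```

Under these assumptions, 8% per year gives a doubling time of about 9 years and 9% gives about 8, while the lower 3% figure corresponds to roughly 23 years.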
Perhaps this means that this is truly a golden age, but an unpleasant consequence is that keeping up with all this output is exhausting, impossible, or both. For some of the broad fields I’m involved in, here are some graphs of PubMed counts for articles published each year for a few relevant title/abstract searches:
The graph above shows “big” topics, like “gut microbiome” (5000 papers/year). Of course, most of these papers are not very relevant. Even if we decide to focus more narrowly on topics that intersect some of my lab’s projects, the numbers are large:
For example, we build and use light sheet fluorescence microscopes (> 150 papers/year) and are exploring immune system dynamics using zebrafish as a model organism (> 200 papers/year). These are just two of many subjects for which it would be good to be aware of insights made by others, just as I hope other people will be aware of our lab’s findings.
There are many ways to become aware of new papers: recommendations from colleagues, keyword alerts from various databases, and specific searches, for example. Keyword alerts and specific searches are great for diving into a topic, but they rarely lead to big surprises — one does, after all, know what one is searching for.
Random papers
Beyond narrow searches, one wants to be aware of new, unexpected, or unfamiliar things. There’s a lot of fascinating stuff out there, and running into it is also useful — it spurs new ideas for experiments and projects, and even new research directions. I’m happy that I’ve switched fields several times, and that I work on things like biophysical studies of the gut microbiome that are very underexplored, neither of which would have happened if I hadn’t read random things. (I similarly advocate for random library book reading.)
To come up with new ideas and to get a broad sense of what’s out there, an approach I use is to scan the Tables of Contents of various journals. I subscribe to the email-delivered forms of several of these. Of course, most of these papers are uninteresting, but there are the occasional gems. How many articles are there? I decided to keep track for two months.
From July 9 through September 10, I logged the number of articles in each of the emailed Tables of Contents I received, as well as the number of abstracts and the number of papers I actually read. The journals were mostly general ones (Science, Nature, PNAS (physical and biological science sections), eLife, Cell, Langmuir, Biophysical Journal, and Nature Methods) along with the preprint server bioRxiv (Biophysics, Ecology, Microbiology). There’s a bit of double-counting, since some of these send notices of accepted papers that are also later listed in the normal Tables of Contents. In total I read 2995 titles (about 1500/month); of these, I read 204 abstracts (about 100/month) and 97 full papers (about 50/month). (“Read,” in the context of full papers, means anything from reading the intro and figures to actually reading the whole paper.)
bioRxiv makes up 40% of the papers hitting my inbox, though only 25% of the abstracts I read. Nature Methods had the highest fraction of papers I actually read, 15% compared to the overall average of 3%.
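The monthly rates and fractions above follow directly from the three logged totals. A minimal sketch of that arithmetic, treating the July 9 to September 10 window as two months, as the text does; the `ReadingLog` class and its field names are hypothetical, introduced only for illustration:

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class ReadingLog:
    titles: int     # titles scanned in the emailed Tables of Contents
    abstracts: int  # abstracts actually read
    papers: int     # papers "read" (intro and figures, up to the full text)
    months: float   # length of the logging period in months

    def per_month(self) -> Dict[str, float]:
        # Average counts per month over the logging period
        return {
            "titles": self.titles / self.months,
            "abstracts": self.abstracts / self.months,
            "papers": self.papers / self.months,
        }

    def fractions(self) -> Dict[str, float]:
        # How much of each stage of the funnel survives to the next
        return {
            "abstracts/titles": self.abstracts / self.titles,
            "papers/titles": self.papers / self.titles,
            "papers/abstracts": self.papers / self.abstracts,
        }


log = ReadingLog(titles=2995, abstracts=204, papers=97, months=2.0)
for name, rate in log.per_month().items():
    print(f"{name}: {rate:.0f} per month")
for name, frac in log.fractions().items():
    print(f"{name}: {frac:.1%}")
```

The papers-to-titles fraction comes out to about 3.2%, matching the overall average quoted above, and the same totals imply that roughly half of the opened abstracts led to reading the paper.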
Lessons?
As mentioned, I don’t think there are any deep lessons from this exercise, though I’m glad I did it. I had a vague sense that there are a lot of papers in the journals I try to pay attention to, but I wouldn’t have accurately estimated beforehand how large the number is.
I didn’t keep track of time spent, but I would guess that reading the titles in the Tables of Contents takes several minutes per day. This isn’t much, but I find it takes a lot of focus, and since each abstract I click on takes up more time, I often find myself dreading the arrival of a new Table of Contents in the mail.
About reading entire papers: in addition to the “random” 50 or so per month, there are of course others that I specifically search for. In general I read fewer papers than I would like to, and fewer than I think I need to; there’s nowhere near enough time.
As noted, the flow of new papers is only getting larger, so keeping up will become even more impossible. Everyone I’ve met agrees with this: in countless conversations, every active scientist I’ve talked to has said that it’s impossible to keep up with the literature. As a result, we don’t. As this gets worse, articles will simply become background noise, like the conversations in a cafe that we tune out unless particular words catch our attention or we are listening for a specific voice. This isn’t good. Science succeeds because it builds on prior work, and unawareness of prior work will lead to a lot of redundant and wasted effort.
Can anything be done about this? I can imagine various scenarios:
- Reducing the number of scientists. I hold the unfashionable view that this would be good for science, which I should elaborate on sometime, but it’s not going to happen anytime soon.
- Reducing the number of papers, especially mediocre papers, that are published per scientist. Everyone agrees that this would be great, but since publication is used (for good reason) as a measure of productivity, this is also not going to happen anytime soon.
- Not really keeping up with, or evaluating, papers except those published by people one knows, or by famous / influential people. This is already happening, and it’s dismaying. One’s Twitter activity shouldn’t be the determinant of one’s impact. This also contributes to the increasing concentration of skills and resources at “top tier” places, strengthening a feedback loop.
- Accepting the notion that papers are “noise,” and that rarer, more substantial outputs like books are what one really studies and cares about. In principle review papers fill this role, though in some fields (like microbiome research) the ratio of review articles to primary articles is approaching 1.
Perhaps I’ll revisit this in a few years and comment on whether anything has changed!
— Raghuveer Parthasarathy, November 27, 2019
Today’s illustration
A few immune cells that went into a schematic illustration for a grant proposal. Yes, it’s pretty minimal, but unfortunately I haven’t drawn much recently.
For what it’s worth, I have attempted to combat this by dividing and conquering among members of my research group. We each have ~3 assigned journals and report on any interesting papers from them at biweekly literature group meetings. We’ve been maintaining this over a few years. This helps and is more tenable than me searching all of the journals. However, I do miss things of interest (as the students’ interest and perception of relevance is different from mine – though that also means I learn about papers I might have otherwise overlooked). Google Scholar is also pretty astute at picking some relevant papers from other journals. But I agree that the problem of vast literature is too much to handle!
Dividing the task of keeping track of journals among the group is a good idea! Maybe I’ll try it. I don’t think it would actually save me any appreciable amount of time — having people recap their findings would, I would guess, take longer than me reading the journal contents — but it could be educational for everyone involved.