Two interesting papers have recently come out on arXiv about citation and readership habits in the high-energy physics (HEP) community. The first to appear, arXiv:0906.5418 by me and two of my colleagues at CERN, discussed how many more citations HEP papers posted to arXiv receive than HEP papers that are not posted there. While this is, at least in part, attributable to selection effects, we found two striking pieces of data:
1) Papers on arXiv are cited before they are published in journals. In fact, 20 percent of the citations that articles receive in their first two years occur before publication.
2) Physicists using SPIRES to access the literature are presented with a choice of clicking on the arXiv version or the journal version. For one month we looked at how often users clicked on each of these links for articles that were both submitted to arXiv and published, and found that 80 percent of the clicks went to arXiv. Another large group of users goes to arXiv directly and never sees the SPIRES links at all, so when given a choice, more than 80 percent of readers in HEP prefer arXiv versions to published versions.
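The "more than 80 percent" step is simple arithmetic, and a rough sketch may make it concrete. The only number below that comes from our data is the 80 percent click share on SPIRES. The function name `arxiv_share` and the SPIRES-versus-direct splits are hypothetical, chosen purely for illustration, and the sketch assumes that everyone who goes to arXiv directly counts as reading the arXiv version.

```python
def arxiv_share(spires_fraction, spires_arxiv_clicks=0.80):
    """Estimate the overall share of reads that go to the arXiv version.

    spires_fraction: HYPOTHETICAL fraction of readers who arrive via SPIRES
        (the rest are assumed to go to arXiv directly).
    spires_arxiv_clicks: share of SPIRES clicks that went to arXiv
        (the 80 percent figure from the one-month sample).
    """
    if not 0.0 <= spires_fraction <= 1.0:
        raise ValueError("spires_fraction must be between 0 and 1")
    direct_fraction = 1.0 - spires_fraction
    # Assumption: direct arXiv users read the arXiv version 100% of the time.
    return spires_fraction * spires_arxiv_clicks + direct_fraction * 1.0


# Illustrative splits only; we did not measure the SPIRES/direct ratio.
for f in (1.0, 0.75, 0.5):
    print(f"SPIRES fraction {f:.2f} -> arXiv share {arxiv_share(f):.2f}")
```

Whatever the real split is, the result can only equal 80 percent if nobody uses arXiv directly. Any direct usage pushes it higher, which is all the claim above relies on.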
Together, these points make it clear that researchers in HEP don't use journals to communicate scientific ideas. They may notice a paper is published, and they certainly value the peer-review and other functions provided by the journals, but they don't communicate using the journals; instead they use arXiv, which is much faster.
The second paper, by Ginsparg and Haque, examines how the position of an article in the daily arXiv listings affects the number of citations it gets. Papers are listed on arXiv in the order they are submitted, with each new round of submissions starting at 4 p.m. EST daily. Since most physicists communicate via arXiv, and arXiv lists new submissions daily, the position of a paper on these lists affects its citation count: papers at the top of the list get more citations.
Ginsparg and Haque show that the boost is due to two effects. One is due purely to the higher visibility, and is about 50 percent, depending on the subfield. The other is a self-selection effect: researchers who know their paper is good want it to be first, so they will work hard to submit it at 4:00:01 p.m. to get it to the top. This effect is similar in size to the visibility effect.
So, by studying the systems that HEP uses to communicate (arXiv, SPIRES, and journals), we see that physicists in HEP are quite savvy about the communication tools they use and the ways those tools work. Researchers understand that work on arXiv is unpublished initially, but will eventually be peer-reviewed, and are willing to cite it during the interim period. They understand that the versions on arXiv are generally updated to match the final journal article, and prefer to use the consistent interface and access provided by arXiv. They understand that the arXiv daily listings are important and heavily read, so they are willing to go to some lengths to submit papers right at 4:00:01 p.m. to get them to the top of the lists.
What new features will these communication systems provide that HEP scientists will adopt? Stay tuned. I think there are more exciting things to come...