We investigated the similarities of pairs of articles that are cocited at the different cocitation levels of the journal, article, section, paragraph, sentence, and bracket. Our results indicate... Show moreWe investigated the similarities of pairs of articles that are cocited at the different cocitation levels of the journal, article, section, paragraph, sentence, and bracket. Our results indicate that textual similarity, intellectual overlap (shared references), author overlap (shared authors), proximity in publication time all rise monotonically as the cocitation level gets lower (from journal to bracket). While the main gain in similarity happens when moving from journal to article cocitation, all level changes entail an increase in similarity, especially section to paragraph and paragraph to sentence/bracket levels. We compared the results from four journals over the years 2010–2015: Cell, the European Journal of Operational Research, Physics Letters B, and Research Policy, with consistent general outcomes and some interesting differences. Our findings motivate the use of granular cocitation information as defined by meaningful units of text, with implications for, among others, the elaboration of maps of science and the retrieval of scholarly literature. Show less
Boyack, K.W.; Eck, N.J. van; Colavizza, G.; Waltman, L. 2017
We report characteristics of in-text citations in over five million full text articles from two large databases – the PubMed Central Open Access subset and Elsevier journals – as functions of time,... Show moreWe report characteristics of in-text citations in over five million full text articles from two large databases – the PubMed Central Open Access subset and Elsevier journals – as functions of time, textual progression, and scientific field. The purpose of this study is to understand the characteristics of in-text citations in a detailed way prior to pursuing other studies focused on answering more substantive research questions. As such, we have analyzed in-text citations in several ways and report many findings here. Perhaps most significantly, we find that there are large field-level differences that are reflected in position within the text, citation interval (or reference age), and citation counts of references. In general, the fields of Biomedical and Health Sciences, Life and Earth Sciences, and Physical Sciences and Engineering have similar reference distributions, although they vary in their specifics. The two remaining fields, Mathematics and Computer Science and Social Science and Humanities, have different reference distributions from the other three fields and between themselves. We also show that in all fields the numbers of sentences, references, and in-text mentions per article have increased over time, and that there are field-level and temporal differences in the numbers of in-text mentions per reference. A final finding is that references mentioned only once tend to be much more highly cited than those mentioned multiple times. Show less