Meso-level retrieval: IR-bibliometrics interplay and hybrid citation-words methods in scientific fields delineation

Scientometrics

Abstract

In this position paper, we comment on various approaches to the delineation of scientific fields or domains, a typical prerequisite for a wide class of bibliometric studies. There is growing evidence that this meso-level, which lies between the micro targets of typical IR and the large disciplines handled by macro-level bibliometric studies, benefits fully from hybrid approaches. Firstly, delineation tasks gain from combining the a priori thinking of traditional IR, which typically involves clearly targeted expectations, with the a posteriori thinking of bibliometric mapping, where decisions are built on an external structuring of the domain in a wider context. The combination of the two ways of thought is far from new, with IR increasingly building on bibliometric networks for query expansion, and bibliometrics building on IR for evaluating and refining its outcomes. Secondly, delineation benefits from the multi-network perspective, which gives different representations of scientific topics; these representations usually converge all the more when the objects are dense and well separated. Focusing on two basic networks, words and citations, we discuss various sequences or combinations of operations. Bibliometrics and IR, especially when properly combined in multi-network approaches, provide an efficient toolbox for domain delineation studies. It should be recalled, however, that the context of such studies is often loaded with policy stakes that call for cautious supervision and consultation processes.

Notes

  1. Amongst Thomson-Reuters nomenclatures, the "subject categories" of the SCI classification allow for overlaps, mostly in terms of journals or journal sections.

  2. For a typology of IR models and the perspective of the "cognitive actor", see Ingwersen and Järvelin (2005).

  3. Assume that articles B and A share a theoretical background, while C and A share a domain of application. In bibliographic coupling, B and C may both be attracted by A on quite different semantic aspects, although they have no epistemic relation to each other. The argument is already found in Martyn (1964). Even when mitigated by statistical aggregation, this effect expresses the cost associated with the statistical efficiency of bibliometric clustering. The use of hard clustering, simple and fast, worsens this limitation, whereas an overlapping technique might classify A once with B and once with C. IR scholars have warned against the holistic character of several mapping techniques, which is a source of noise, including for query expansion purposes.

  4. We are indebted to an anonymous referee for stressing this point.

  5. Even Latourian citations or negative citations do not add much noise to co-citation topics.

  6. High precision is expected from a strategy focused on strong forms (heavy intersections), with a check of the specific words/references of the native clusters c and w; high recall is expected from a strategy based on the full content of c and w clusters with strong overlaps. Intermediate strategies can focus on intuitive groupings of areas along the diagonal sequence.

  7. For example, by ruling out papers without a given number or proportion of specific references.
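
The coupling argument of note 3 and the reference-based filter of note 7 can be illustrated with a minimal Python sketch. Everything in it is hypothetical and chosen only for illustration: the reference lists, the labels `T*` (theoretical background) and `D*` (domain of application), the function names and the thresholds. It uses the raw count of shared references as coupling strength, which is one simple choice among several, and it is not the procedure of the paper.

```python
from itertools import combinations

# Hypothetical reference lists (illustrative only, not data from the paper).
# T* = theoretical background, D* = domain of application.
references = {
    "A": {"T1", "T2", "D1", "D2"},
    "B": {"T1", "T2", "T3"},
    "C": {"D1", "D2", "D3"},
}


def coupling_strength(refs_x, refs_y):
    """Raw bibliographic coupling: number of references two papers share."""
    return len(refs_x & refs_y)


def coupling_table(refs):
    """Coupling strength for every unordered pair of papers."""
    return {
        (x, y): coupling_strength(refs[x], refs[y])
        for x, y in combinations(sorted(refs), 2)
    }


def passes_reference_filter(refs, specific_refs, min_count=0, min_share=0.0):
    """Keep a paper only if enough of its references are field-specific.

    Both thresholds are assumptions; note 7 only mentions 'a given number
    or proportion of specific references'.
    """
    if not refs:
        return False
    hits = len(refs & specific_refs)
    return hits >= min_count and hits / len(refs) >= min_share


table = coupling_table(references)
print(table)  # {('A', 'B'): 2, ('A', 'C'): 2, ('B', 'C'): 0}

# Hypothetical set of references regarded as specific to the target field.
specific = {"T1", "T2", "T3"}
kept = [
    paper
    for paper in sorted(references)
    if passes_reference_filter(references[paper], specific, min_count=2, min_share=0.6)
]
print(kept)  # ['B']
```

In this toy case B and C are each coupled to A with strength 2 but share nothing with each other, so a hard clustering that attaches both to A would group papers without an epistemic relation. With the assumed thresholds, the filter keeps only B, since only half of A's references are in the specific set.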

References

  • Agrawal, R., Imieliński, T., & Swami, A. (1993). Mining association rules between sets of items in large databases. Proceedings of the 1993 ACM SIGMOD, 207.

  • Ahlgren, P., & Colliander, C. (2009). Document-document similarity approaches and science mapping: Experimental comparison of five approaches. Journal of Informetrics, 3(1), 49–63.

  • Archambault, E., Beauchesne, O. H., & Caruso, J. (2011). Towards a multilingual, comprehensive and open scientific journal ontology. In Proceedings of the 13th ISSI Conference, Durban, South Africa.

  • Barabasi, A. L., & Albert, R. (1999). Emergence of scaling in random networks. Science, 286(5439), 509–512.

  • Bassecoulard, E., & Zitt, M. (1999). Indicators in a research institute: A multi-level classification of scientific journals. Scientometrics, 44(3), 323–345.

  • Benzecri, J. P. (1973). La place de l’a priori. Encyclopedia Universalis, 17, Organum, 11–24.

  • Benzecri, J. P., et al. (1981). Pratique de l’analyse des données : Linguistique et lexicologie. Paris: Dunod.

  • Bergstrom, C. (2007). Eigenfactor: Measuring the value and prestige of scholarly journals. College & Research Libraries News, 68(5). www.ala.org/ala/acrl/acrlpubs/crlnews/backissues2007/may2007/eigenfactor.cfm.

  • Blair, D. C. (2003). Information retrieval and the philosophy of language. Annual Review of Information Science and Technology, 37, 3–50.

  • Blondel, V. D., Guillaume, J. L., Lambiotte, R., & Lefebvre, E. (2008). Fast unfolding of communities in large networks. Journal of Statistical Mechanics: Theory and Experiment, 2008(10), P10008.

  • Börner, K., Chen, C. M., & Boyack, K. W. (2003). Visualizing knowledge domains. Annual Review of Information Science and Technology, 37, 179–255.

  • Börner, K., Glänzel, W., Scharnhorst, A., & van den Besselaar, P. (2011). Modeling science: studying the structure and dynamics of science. Scientometrics, 89, 347–348.

  • Bornmann, L., & Daniel, H. D. (2008). What do citation counts measure? A review of studies on citing behavior. Journal of Documentation, 64(1), 45–80.

  • Boyack, K. W., & Klavans, R. (2010). Co-citation analysis, bibliographic coupling, and direct citation: Which citation approach represents the research front most accurately? JASIST, 61(12), 2389–2404.

  • Boyack, K., & Klavans, R. (2013). Creation of a highly detailed, dynamic, global model and map of science, forthcoming in JASIST. doi:10.1002/asi.22990.

  • Boyack, K., Small, H., & Klavans, R. (2013). Improving the accuracy of co-citation clustering using full text. JASIST, 64(9), 1759–1767.

  • Braam, R. R., Moed, H. F., & Van Raan, A. F. J. (1991). Mapping of science by combined co-citation and word analysis. I Structural aspects. JASIS, 42(4), 233–251.

  • Brin, S., & Page, L. (1998). The anatomy of a large-scale hypertextual Web search engine. Computer Networks and ISDN Systems, 30(1–7), 107–117.

  • Cadot, M., & Lelu, A. (2011). Combining Explicitness and Classifying Performance via MIDOVA Lossless Representation for Qualitative Datasets. International Journal on Advances in Software, 5(1–2), 1–16.

  • Callahan, A., Hockema, S., & Eysenbach, G. (2010). Contextual co-citation: Augmenting co-citation analysis and its applications. JASIST, 61(6), 1130–1143.

  • Callon, M., Courtial, J. P., Turner, W. A., & Bauin, S. (1983). From translations to problematic networks: An introduction to co-word analysis. Social Science Information, 22(2), 191–235.

  • Callon, M., Courtial, J. P., & Laville, F. (1991). Co-word analysis as a tool for describing the network of interactions between basic and technological research: The case of polymer chemistry. Scientometrics, 22(1), 155–205.

  • Carayol, N., & Roux, P. (2009). Knowledge flows and the geography of networks: A strategic model of small world formation. Journal of Economic Behavior & Organization, 71(2), 414–427.

  • Carpineto, C., & Romano, G. (2012). A survey of automatic query expansion in information retrieval. ACM-CSUR, 44(1), 1.

  • Chavalarias, D., & Cointet, J. P. (2013). Phylomemetic patterns in science evolution—The rise and fall of scientific fields. PLoS ONE, 8(2), e54847.

  • Chen, C. M. (2006). CiteSpace II: Detecting and visualizing emerging trends and transient patterns in scientific literature. JASIS, 57(3), 359–377.

  • Chen, C. M., Ibekwe-Sanjuan, F., & Hou, J. (2010). The structure and dynamics of co-citation clusters: A multiple-perspective co-citation analysis. JASIST, 61(7), 1386–1409.

  • Cronin, B. (1984). The citation process; The role and significance of citations in scientific communication (p. 103). London: Taylor Graham.

  • de Beaver, D., & Rosen, R. (1979). Studies in scientific collaboration. Part II. Scientific co-authorship, research productivity and visibility in the French Scientific Elite, 1799–1830. Scientometrics, 1(2), 133–149.

  • Deerwester, S., Dumais, S. T., Furnas, G. W., Landauer, T. K., & Harshman, R. (1990). Indexing by latent semantic analysis. JASIST, 41(6), 391–407.

  • Elkiss, A., Shen, S., Fader, A., Erkan, G., States, D., & Radev, D. (2008). Blind men and elephants: What do citation summaries tell us about a research article? JASIST, 59(1), 51–62.

  • Eom, Y. H., & Fortunato, S. (2011). Characterizing and modeling citation dynamics. PLoS ONE, 6(9), e24926. doi:10.1371/journal.pone.0024926.

  • Garfield, E. (1967). Primordial concepts, citation indexing and historio-bibliography. Journal of Library History, 2, 235–249.

  • Garfield, E., & Sher, I. H. (1993). KeyWords Plus(TM): Algorithmic derivative indexing. JASIST, 44(5), 298–299.

  • Garfield, E., Pudovkin, A. I., & Istomin, V. S. (2003). Why do we need algorithmic historiography? JASIST, 54(5), 400–412.

  • Gilbert, G. N. (1977). Referencing as persuasion. Social Studies of Science, 7, 113–122.

  • Gilbert, N. (1997). A simulation of the structure of academic science. Sociological Research Online, 2(2), 3. http://www.socresonline.org.uk/socresonline/2/2/3.html.

  • Glänzel, W., & Czerwon, H. J. (1996). A new methodological approach to bibliographic coupling and its application to the national, regional and institutional level. Scientometrics, 37(2), 195–221.

  • Glänzel, W., & Schubert, A. (2003). A new classification of the science fields and subfields designed for scientometric evaluation purposes. Scientometrics, 56(3), 357–367.

  • Gläser, J., Lange, S., Laudel, G., & Schimank, U. (2010). The Limits of Universality: How field-specific epistemic conditions affect authority relations and their consequences. In R. Whitley, J. Gläser, & L. Engwall (Eds.), Reconfiguring knowledge production: Changing authority relationships in the sciences and their consequences for intellectual innovation (pp. 291–324). Oxford: Oxford University Press.

  • Ingwersen, P. (1996). Cognitive perspectives of information retrieval interaction: Elements of a cognitive IR theory. Journal of Documentation, 57(6), 715–740.

  • Ingwersen, P., & Järvelin, K. (2005). The turn: Integration of information seeking and retrieval in context (p. 436). Berlin: Springer.

  • Janssens, F., Glänzel, W., & De Moor, B. (2008). A hybrid mapping of information science. Scientometrics, 75(3), 607–631.

  • Jardine, N., & van Rijsbergen, C. J. (1971). The use of hierarchical clustering in information retrieval. Information Storage and Retrieval, 7, 217–240.

  • Kessler, M. M. (1963). Bibliographic coupling between scientific papers. American Documentation, 14, 10–25.

  • Kostoff, R. N., delRio, J. A., Humenik, J. A., Garcia, E. O., & Ramirez, A. M. (2001). Citation mining: Integrating text mining and bibliometrics for research user profiling. JASIST, 52(13), 1148–1156.

  • Larivière, V., Archambault, E., & Gingras, Y. (2008). Long-term variations in the aging of scientific literature: from exponential growth to steady-state science (1900–2004). JASIST, 59(2), 288–296.

  • Larsen, B. (2002). Exploiting citation overlaps for information retrieval: Generating a boomerang effect from the network of scientific papers. Scientometrics, 54(2), 155–178.

  • Latour, B. (1987). Science in action: How to follow Scientists and Engineers through society. Cambridge: Harvard University Press.

  • Laurens, P., Zitt, M., & Bassecoulard, E. (2010). Delineation of the genomics field by hybrid citation-lexical methods: Interaction with experts and validation process. Scientometrics, 82(3), 647–662.

  • Lelu, A. (1994). Clusters and factors: Neural algorithms for a novel representation of huge and highly multidimensional data sets. In E. Diday & Y. Lechevallier (Eds.), New approaches in classification and data analysis (pp. 241–248). Berlin: Springer.

  • Leydesdorff, L., & Cozzens, S. E. (1993). The delineation of specialties in terms of journals using the dynamic journal set of the Science Citation Index. Scientometrics, 26, 133–154.

  • Liu, S., & Chen, C. M. (2013). The differences between latent topics in abstracts and citation contexts of citing papers. JASIST, 64(3), 627–639.

  • Liu, X., Yu, S., Janssens, F., Glänzel, W., Moreau, Y., & De Moor, B. (2010). Weighted hybrid clustering by combining text mining and bibliometrics on a large-scale journal database. JASIST, 61(6), 1105–1119.

  • Marshakova, I. V. (1973). Document coupling system based on references taken from Science Citation Index (in Russian). Nauchno-Tekhnicheskaya Informatsiya, Ser. 2 6.3.

  • Martyn, J. (1964). Bibliographic coupling. Journal of Documentation, 20(4), 236.

  • McCain, K. W. (1983). The author co-citation structure of macroeconomics. Scientometrics, 5(5), 277–289.

  • McCain, K. W. (1989). Descriptor and citation retrieval in the medical behavioral sciences literature: Retrieval overlaps and novelty distribution. JASIS, 40(2), 110–114.

  • Morris, S. A., Yen, G., Wu, Z., & Asnake, B. (2003). Time line visualization of research fronts. JASIST, 54(5), 413–422.

  • Mullins, N. C., Hargens, L. L., Hecht, P. K., & Kick, E. L. (1977). The group structure of co-citation clusters: A comparative study. American Sociological Review, 42, 552–562.

  • Mutschke, P., & Quan-Haase, A. (2001). Collaboration and cognitive structures in social science research fields: Towards socio-cognitive analysis in information systems. Scientometrics, 52(3), 487–502.

  • Mutschke, P., Mayr, P., Schaer, P., & Sure, Y. (2011). Science models as value-added services for scholarly information systems. Scientometrics, 89, 349–364.

  • Narin, F., Pinski, G., & Gee, H. H. (1976). Structure of the biomedical literature. Journal of the American Society for Information Science, 27(1), 25–45.

  • Narin, F., & Noma, E. (1985). Is technology becoming science? Scientometrics, 7(3), 369–381.

  • Noyons, E. C. M. (1999). Bibliometric mapping as a science policy and research management tool. Leiden: Leiden University DSWO Press.

  • Palacios-Huerta, I., & Volij, O. (2004). The measurement of intellectual influence. Econometrica, 72(3), 963–977.

  • Pao, M. L. (1993). Term and citation retrieval: A field study. Information Processing and Management, 29(1), 95–112.

  • Papadimitriou, C., Raghavan, P., Tamaki, H., & Vempala, S. (1998). Latent semantic indexing: A probabilistic analysis. In PODS: Proceedings of the 17th ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems, 159–168.

  • Pinski, G., & Narin, F. (1976). Citation influence for journal aggregates of scientific publications: Theory, with application to the literature of physics. Information Processing and Management, 12, 297–312.

  • Polanco, X., Grivel, L., & Royauté, J. (1995). How to do things with terms in informetrics: Terminological variation and stabilization as science watch indicators. In M. Koenig (Ed.), Proceedings of the 5th ISSI International Conference (River Forest, IL, June 7–10, 1995) (pp. 435–444). Medford, NJ: Learned Information.

  • Price, D. J. de Solla. (1965). Networks of scientific papers. Science, 149(3683), 510–515.

  • Price, D. J. de Solla. (1976). A general theory of bibliometric and other cumulative advantage processes. Journal of the American Society for Information Science, 27(5), 292–306.

  • Rafols, I., Porter, A. L., & Leydesdorff, L. (2010). Science overlay maps: A new tool for research policy and library management. JASIS, 61(9), 1871–1887.

  • Ritchie, A., Robertson, S., & Teufel, S. (2008). Comparing citation context for information retrieval. In CIKM ’08: Proceedings of the 17th ACM Conference on Information and Knowledge Management, 213–222.

  • Rocchio, J. (1971). Relevance feedback in information retrieval. In G. Salton (Ed.), The smart retrieval system: Experiments in automatic document processing (pp. 313–323). Englewood Cliffs, NJ: Prentice-Hall.

  • Ross, N. C. M., & Wolfram, D. (2000). End user searching on the Internet: An analysis of term pair topics submitted to the Excite search engine. JASIST, 51(10), 949–958.

  • Rosvall, M., & Bergstrom, C. (2008). Maps of information flows reveal structures in complex networks. PNAS, 105, 1118.

  • Roth, C., & Cointet, J. P. (2010). Social and semantic coevolution in Knowledge. Social Networks, 32(1), 16–29.

  • Salton, G., & Buckley, C. (1990). Improving retrieval performance by relevance feedback. JASIST, 41(4), 288–297.

  • Scharnhorst, A., Börner, K., & van den Besselaar, P. (Eds.). (2012). Models of science dynamics: Encounters between complexity theory and information sciences (Understanding Complex Systems). Berlin: Springer.

  • Small, H. (1973). Co-citation in the scientific literature: A new measure of the relationship between two documents. JASIS, 24(4), 265–269.

  • Small, H. (1980). Co-citation context analysis and the structure of paradigms. Journal of Documentation, 36(3), 183–196.

  • Small, H. (2011). Interpreting maps of science using citation context sentiments: A preliminary investigation. Scientometrics, 87(2), 373–388.

  • Teufel, S., Siddharthan, A., & Tidhar, D. (2006). Automatic classification of citation function. In EMNLP ’06: Proceedings of the 2006 Conference on Empirical Methods in Natural Language Processing.

  • van den Besselaar, P., & Heimeriks, G. (2006). Mapping research topics using word-reference co-occurrences: A method and an exploratory case study. Scientometrics, 68(3), 377–393.

  • Waltman, L., & van Eck, N. (2012). A new methodology for constructing a publication-level classification system of science. JASIS, 63(12), 2378–2392.

  • Watts, C., & Gilbert, N. (2011). Does cumulative advantage affect collective learning in science? An agent-based simulation. Scientometrics, 89(1), 437–463.

  • White, H. D., & Griffith, B. C. (1981). Author co-citation: A literature measure of intellectual structure. JASIS, 32(3), 163–172.

  • Zitt, M., & Bassecoulard, E. (1996). Reassessment of co-citation methods for science indicators: Effect of methods improving recall rates. Scientometrics, 37(2), 223–244.

  • Zitt, M., & Bassecoulard, E. (2006). Delineating complex scientific fields by an hybrid lexical-citation method: An application to nanosciences. Information Processing and Management, 42(6), 1513–1531.

  • Zitt, M., Ramanana-Rahary, S., & Bassecoulard, E. (2005). Relativity of citation performance and excellence measures: From cross-field to cross-scale effects of field-normalisation. Scientometrics, 63(2), 373–401.

  • Zitt, M., Lelu, A., & Bassecoulard, E. (2011). Hybrid citation-word representations in science mapping: Portolan charts of research fields? JASIST, 62(1), 19–39. doi:10.1002/asi.21440.

  • Zitt, M., & Small, H. (2008). Modifying the journal impact factor by fractional citation weighting: The audience factor. JASIST, 59(11), 1856–1860.

Acknowledgments

The author thanks Alain Lelu, Université de Franche-Comté and Loria, Nancy, Elise Bassecoulard, formerly Inra-Lereco, and anonymous referees for helpful remarks, and Patricia Laurens and Antoine Schoen, ESIEE, Marne la Vallée, for permission to use the genomics map from our previous joint work.

Author information

Correspondence to Michel Zitt.

About this article

Cite this article

Zitt, M. Meso-level retrieval: IR-bibliometrics interplay and hybrid citation-words methods in scientific fields delineation. Scientometrics 102, 2223–2245 (2015). https://doi.org/10.1007/s11192-014-1482-5

  • DOI: https://doi.org/10.1007/s11192-014-1482-5
