Full Project Details

ProQuest is a content aggregator and a research and learning hub for students, librarians, researchers, and literature specialists. The students on the AI ProQuest team will deliver an end-to-end system that reliably suggests relevant documents and helps users explore them. The system will use a knowledge graph built from citations and other data in ProQuest’s corpus, which contains over one billion documents.

Abstract:
ProQuest is a content aggregator and a research and learning hub for students, librarians, researchers, and literature specialists. This project will focus on creating an end-to-end system that processes a user’s search and recommends seminal papers and authors in that field. The system will do this by leveraging a knowledge graph built from citation mappings and other information in published literature. We will use both publicly available and ProQuest-proprietary datasets to achieve this goal. An example:

If a user searches for “Covid-19 transmission modes” in a medical database, can we leverage the knowledge graph to recommend the following?

    • relevant documents that are highly cited in the domain, and/or
    • relevant authors that are highly cited in the domain
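One possible way to approach this question is sketched below in Python. The sketch rests on several assumptions. PageRank, computed here by power iteration, stands in as a proxy for “highly cited”. A naive keyword match on titles stands in for relevance. An author’s score is simply the sum of the scores of their relevant papers. The function names (`pagerank`, `recommend`) and the toy data are hypothetical, and this is neither a ProQuest API nor the project’s chosen method.

```python
from collections import defaultdict


def pagerank(citations, damping=0.85, iterations=50):
    """Power-iteration PageRank over a citation graph.

    `citations` maps each paper ID to the list of paper IDs it cites.
    Papers that cite nothing ("dangling" nodes) spread their score uniformly.
    """
    nodes = set(citations)
    for cited in citations.values():
        nodes.update(cited)
    n = len(nodes)
    scores = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        dangling = sum(scores[node] for node in nodes if not citations.get(node))
        # Teleportation term plus the redistributed dangling mass.
        new_scores = {node: (1.0 - damping) / n + damping * dangling / n for node in nodes}
        for source, cited in citations.items():
            if not cited:
                continue
            share = damping * scores[source] / len(cited)
            for target in cited:
                new_scores[target] += share
        scores = new_scores
    return scores


def recommend(query_terms, titles, authors, citations, top_k=3):
    """Rank query-relevant papers by PageRank and authors by summed paper scores."""
    scores = pagerank(citations)
    terms = {term.lower() for term in query_terms}
    # Hypothetical relevance filter: any query term appears in the title.
    relevant = [p for p, title in titles.items() if terms & set(title.lower().split())]
    top_papers = sorted(relevant, key=lambda p: scores.get(p, 0.0), reverse=True)[:top_k]
    author_scores = defaultdict(float)
    for paper in relevant:
        for author in authors.get(paper, []):
            author_scores[author] += scores.get(paper, 0.0)
    top_authors = sorted(author_scores, key=author_scores.get, reverse=True)[:top_k]
    return top_papers, top_authors


# Hypothetical toy data; real input would come from the citation corpus.
titles = {
    "P1": "Airborne transmission of covid-19",
    "P2": "Covid-19 transmission in households",
    "P3": "Surface transmission of respiratory viruses",
    "P4": "Protein folding with deep learning",
}
citations = {"P2": ["P1", "P3"], "P3": ["P1"], "P4": []}
authors = {"P1": ["A. Smith"], "P2": ["B. Jones", "A. Smith"], "P3": ["C. Lee"], "P4": ["D. Kim"]}

papers, people = recommend(["covid-19", "transmission"], titles, authors, citations)
print(papers)  # ['P1', 'P3', 'P2']
print(people)  # ['A. Smith', 'C. Lee', 'B. Jones']
```

Raw citation counts would be a simpler alternative to PageRank. PageRank also rewards papers that are cited by other well-cited papers. For a corpus of over one billion documents, the in-memory dictionaries used here would presumably need to be replaced by a distributed graph-processing setup.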

Some exciting challenges in this project include the following:

  1. Normalizing author names (e.g., John.D vs. John Deere vs. D.John vs. Deere John)
  2. Short-text topic modelling
  3. Optimizing graph algorithms for massive graphs
  4. Addressing scalability concerns for large-scale deployment
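A common first step for the author-normalization challenge is blocking. Name variants are grouped by a cheap key, so that only names within the same block need to be compared. The Python sketch below uses a hypothetical heuristic with two parts. First, the block key is the sorted set of a name’s initials. Second, two names match if their parts line up in some order, where a single letter may match a full name part that starts with it. All function names are illustrative.

```python
import re
from collections import defaultdict
from itertools import permutations


def name_tokens(name):
    """Lowercase name parts, splitting on dots, commas and whitespace."""
    return [t for t in re.split(r"[.,\s]+", name.strip().lower()) if t]


def block_key(name):
    """Order-insensitive blocking key: the sorted first letters of the name parts."""
    return tuple(sorted(t[0] for t in name_tokens(name)))


def tokens_compatible(x, y):
    """Two parts match if equal, or if one is a single initial of the other."""
    if x == y:
        return True
    if len(x) == 1 and y.startswith(x):
        return True
    return len(y) == 1 and x.startswith(y)


def names_compatible(a, b):
    """True if some ordering of b's parts lines up with a's parts.

    Trying all permutations is fine here because names have few parts.
    """
    ta, tb = name_tokens(a), name_tokens(b)
    if len(ta) != len(tb):
        return False
    return any(all(tokens_compatible(x, y) for x, y in zip(ta, perm)) for perm in permutations(tb))


def cluster_names(names):
    """Greedy clustering: compare each name only against clusters in its own block."""
    blocks = defaultdict(list)  # block key -> list of clusters (lists of names)
    for name in names:
        clusters = blocks[block_key(name)]
        for cluster in clusters:
            if names_compatible(name, cluster[0]):
                cluster.append(name)
                break
        else:
            clusters.append([name])
    return [cluster for clusters in blocks.values() for cluster in clusters]


variants = ["John.D", "John Deere", "D.John", "Deere John", "Jane Doe"]
print(cluster_names(variants))
# [['John.D', 'John Deere', 'D.John', 'Deere John'], ['Jane Doe']]
```

Initials are inherently ambiguous. Under this heuristic, “J. Doe” would match both “John.D” and “Jane Doe”. In practice, additional signals such as co-authors, affiliations, or venues would likely be needed to tell such cases apart.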

Approaches we will consider include, but are not limited to:

  • Language Models
  • Transformers
  • Clustering
  • Recurrent Neural Networks
  • Conditional Random Fields (CRF)
  • Sequence-to-Sequence (Seq2Seq) models
  • Long Short-Term Memory (LSTM)
  • Combinatory Categorial Grammar (CCG)
  • Ensemble Techniques

The student team will deliver an end-to-end system/engine that can reliably suggest search queries to users and assist them in forming “better” ones.
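The same citation data could also drive query suggestions. The Python function below is a minimal sketch that proposes refinements by appending terms that co-occur with the query in matching titles. It assumes that titles are the only available text, and it uses a hypothetical weighting of 1 plus the document’s citation count. `suggest_refinements`, `STOPWORDS`, and the toy data are illustrative, not an existing interface.

```python
from collections import Counter

# Hypothetical stopword list; a real system would use a fuller one.
STOPWORDS = {"a", "an", "and", "for", "in", "of", "on", "the", "with"}


def suggest_refinements(query, titles, citation_counts, top_k=3):
    """Suggest query extensions from terms that co-occur with the query in
    matching titles, weighting each term by 1 + the document's citation count."""
    query_terms = set(query.lower().split())
    weights = Counter()
    for doc_id, title in titles.items():
        words = list(dict.fromkeys(title.lower().split()))  # dedupe, keep order
        if not query_terms.intersection(words):
            continue
        for word in words:
            if word not in query_terms and word not in STOPWORDS:
                weights[word] += 1 + citation_counts.get(doc_id, 0)
    return [f"{query} {word}" for word, _ in weights.most_common(top_k)]


# Hypothetical toy data.
titles = {
    "P1": "Airborne transmission of covid-19",
    "P2": "Covid-19 transmission in households",
    "P3": "Surface transmission of respiratory viruses",
    "P4": "Protein folding with deep learning",
}
citation_counts = {"P1": 120, "P2": 15, "P3": 40}
print(suggest_refinements("covid-19 transmission", titles, citation_counts))
```

Weighting by citations pushes suggestions toward the vocabulary of influential papers. A production system would more likely draw on abstracts, keywords, or topic models rather than raw title words.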
