ProQuest is a content aggregator and a research and learning hub for students, librarians, researchers, and literature specialists. The students on the AI ProQuest team will deliver an end-to-end system that reliably suggests relevant documents and helps users explore them, using a knowledge graph built from citations and other data in ProQuest’s corpus, which contains over one billion documents.
Abstract:
ProQuest is a content aggregator and a research and learning hub for students, librarians, researchers, and literature specialists. This project will focus on creating an end-to-end system that processes a user’s search query and recommends seminal papers and authors in the field the query concerns. The system achieves this by leveraging a knowledge graph built from citation mappings and other information in published literature. We will use both publicly available and ProQuest-proprietary datasets to achieve this goal. An example:
If a user searches for “Covid-19 transmission modes” in a medical database, can we leverage the knowledge graph to recommend the following?
- relevant documents that are highly cited in the domain
- relevant authors that are highly cited in the domain
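As a minimal sketch of the idea in this example, the Python below ranks documents and authors by citation count over a tiny in-memory citation graph. The toy corpus, the title-keyword relevance test, and the names `CORPUS`, `matches`, and `recommend` are all hypothetical illustrations, not ProQuest data or the project's actual design. A real system would likely use a stronger relevance model and graph measures richer than raw in-degree.

```python
from collections import Counter

# Hypothetical toy corpus: doc id -> (title, authors, ids of documents it cites).
CORPUS = {
    "d1": ("Airborne transmission of Covid-19", ["A. Rao"], []),
    "d2": ("Covid-19 transmission modes in households", ["B. Chen"], ["d1"]),
    "d3": ("Surface transmission of Covid-19", ["A. Rao", "C. Diaz"], ["d1", "d2"]),
    "d4": ("Graph algorithms at scale", ["D. Evans"], ["d3"]),
}


def matches(query, title):
    """Naive relevance test (an assumption): every query term appears in the title."""
    title_terms = set(title.lower().split())
    return all(term in title_terms for term in query.lower().split())


def recommend(query, corpus, top_k=3):
    """Return (document ids, author names), each ranked by citation count."""
    # In-degree of each document in the citation graph.
    cited_by = Counter(ref for _, _, refs in corpus.values() for ref in refs)

    relevant = [doc_id for doc_id, (title, _, _) in corpus.items()
                if matches(query, title)]

    # Most-cited relevant documents first; ties broken by id for determinism.
    docs = sorted(relevant, key=lambda d: (-cited_by[d], d))[:top_k]

    # An author's score is the total citations of their relevant documents.
    author_score = Counter()
    for doc_id in relevant:
        for author in corpus[doc_id][1]:
            author_score[author] += cited_by[doc_id]
    authors = sorted(author_score, key=lambda a: (-author_score[a], a))[:top_k]

    return docs, authors


if __name__ == "__main__":
    docs, authors = recommend("Covid-19 transmission", CORPUS)
    print("documents:", docs)
    print("authors:", authors)
```

Raw citation counts are used here only because they are the simplest signal a citation graph offers. The same interface could be backed by other centrality measures without changing the caller.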
Some exciting challenges in this project include the following:
- Normalizing author names (e.g., John.D vs. John Deere vs. D.John vs. Deere John)
- Short-text topic modelling
- Optimizing graph algorithms for massive graphs
- Addressing scalability concerns for large-scale deployment
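To make the author-normalization challenge concrete, here is a rough Python sketch of one possible first step: grouping name variants under an order-insensitive key, then checking pairs within a group for compatibility. The functions `tokens`, `blocking_key`, `part_matches`, `compatible`, and `group_candidates` are hypothetical names introduced for illustration. The key deliberately over-merges (any two names sharing the same initials collide), so it only generates candidates. A real pipeline would need further evidence such as co-authors, affiliations, or venues.

```python
import re
from collections import defaultdict
from itertools import permutations


def tokens(name):
    """Lowercase name parts, split on whitespace, periods, and commas."""
    return [t for t in re.split(r"[\s.,]+", name.lower()) if t]


def blocking_key(name):
    """Order-insensitive candidate key: the sorted initials of all name parts."""
    return tuple(sorted(t[0] for t in tokens(name)))


def part_matches(x, y):
    """Two parts match if they are equal or one is a one-letter initial of the other."""
    if x == y:
        return True
    short, long_ = sorted((x, y), key=len)
    return len(short) == 1 and long_.startswith(short)


def compatible(a, b):
    """True if the parts of a and b can be paired up, in any order, so every pair matches."""
    ta, tb = tokens(a), tokens(b)
    if len(ta) != len(tb):
        return False
    return any(all(part_matches(x, y) for x, y in zip(ta, perm))
               for perm in permutations(tb))


def group_candidates(names):
    """Bucket names by blocking key; each bucket holds possible variants of one author."""
    groups = defaultdict(list)
    for name in names:
        groups[blocking_key(name)].append(name)
    return dict(groups)


if __name__ == "__main__":
    variants = ["John.D", "John Deere", "D.John", "Deere John", "Jane Smith"]
    for key, members in group_candidates(variants).items():
        print(key, members)
```

Blocking first and comparing only within buckets avoids an all-pairs comparison, which matters at the scale of the corpus described above.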
Approaches we will consider may include (but will not be limited to):
- Language Models
- Transformers
- Clustering
- Recurrent Neural Networks
- Conditional Random Fields (CRF)
- Sequence-to-Sequence (Seq2Seq) models
- Long Short-Term Memory (LSTM)
- Combinatory Categorial Grammars (CCG)
- Ensemble Techniques
The student team will deliver an end-to-end system/engine that can reliably suggest “better” search queries and assist users in forming them.