ProQuest is a content aggregator and research and learning hub for students, librarians, researchers and literature specialists. The students on the AI ProQuest team will deliver an end-to-end system that can reliably suggest relevant documents and help users explore them, using a knowledge graph built from citations and other data in ProQuest’s corpus, which contains over one billion documents.
Abstract:
ProQuest is a content aggregator and research and learning hub for students, librarians, researchers and literature specialists. This project will focus on creating an end-to-end system that can process a user’s search and provide them with a recommendation of seminal papers and authors in that field. The system achieves this by leveraging a knowledge graph built from citation mappings and other information in published literature. We will use publicly available and ProQuest-proprietary datasets to help achieve this goal. An example:
If a user searches for “Covid-19 transmission modes” in a medical database, can we leverage the knowledge graph to recommend:
- relevant documents that are highly cited in the domain, and/or
- relevant authors that are highly cited in the domain?
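As a rough illustration of the example above, the following minimal sketch ranks documents and authors by citation counts within the set of documents that match a query. The toy corpus, the citation edges, the keyword-overlap notion of "in the domain", and the function name `recommend` are all hypothetical choices made for illustration; they are not ProQuest data or the project's actual method.

```python
from collections import Counter

# Hypothetical toy corpus: document id -> (title, authors). Not ProQuest data.
DOCS = {
    "d1": ("Airborne transmission of Covid-19", ["A. Rivera"]),
    "d2": ("Covid-19 transmission modes in households", ["B. Chen", "A. Rivera"]),
    "d3": ("Surface transmission of Covid-19", ["C. Okafor"]),
    "d4": ("Deep learning for image search", ["D. Patel"]),
}

# Hypothetical citation edges: (citing document, cited document).
CITATIONS = [("d2", "d1"), ("d3", "d1"), ("d3", "d2"), ("d4", "d1"), ("d4", "d2")]


def recommend(query, docs, citations, top_k=3):
    """Rank matching documents and their authors by in-domain citation counts."""
    terms = set(query.lower().split())
    # Assumption: a document is "in the domain" if its title shares a query term.
    in_domain = {
        doc_id
        for doc_id, (title, _authors) in docs.items()
        if terms & set(title.lower().split())
    }
    # Count only citations whose source and target are both in the domain.
    doc_counts = Counter({doc_id: 0 for doc_id in in_domain})
    for citing, cited in citations:
        if citing in in_domain and cited in in_domain:
            doc_counts[cited] += 1
    # An author's score is the sum of in-domain citations to their documents.
    author_counts = Counter()
    for doc_id, count in doc_counts.items():
        for author in docs[doc_id][1]:
            author_counts[author] += count
    return doc_counts.most_common(top_k), author_counts.most_common(top_k)


top_docs, top_authors = recommend("Covid-19 transmission modes", DOCS, CITATIONS)
print(top_docs)
print(top_authors)
```

Raw citation counts are only the simplest possible signal; at the scale of the real corpus, a graph-based centrality measure and a proper retrieval step would likely replace both the counting and the keyword match.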
Some exciting challenges in this project include the following:
- Normalizing author names (e.g., John.D vs John Deere vs D.John vs Deere John)
- Short text topic modelling
- Optimizing graph algorithms for massive graphs
- Addressing scalability concerns for large-scale deployment
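To make the author-normalization challenge concrete, here is a minimal sketch of one possible heuristic: split each name into tokens, ignore token order, and treat a single-letter token as an initial that matches any token starting with that letter. The function names (`tokens`, `token_match`, `same_author`) are hypothetical, and the heuristic is an assumption for illustration rather than the approach the team will take.

```python
import re
from itertools import permutations


def tokens(name):
    """Lowercase a name and split it on whitespace, dots and commas."""
    return [t for t in re.split(r"[\s.,]+", name.lower()) if t]


def token_match(a, b):
    """Tokens match if equal, or if one is a one-letter initial of the other."""
    if a == b:
        return True
    short, long_ = sorted((a, b), key=len)
    return len(short) == 1 and long_.startswith(short)


def same_author(name_a, name_b):
    """Order-insensitive comparison of two name strings (heuristic only)."""
    ta, tb = tokens(name_a), tokens(name_b)
    if not ta or len(ta) != len(tb):
        return False
    # Try every ordering of the second name's tokens against the first.
    return any(
        all(token_match(x, y) for x, y in zip(ta, perm))
        for perm in permutations(tb)
    )


variants = ["John.D", "John Deere", "D.John", "Deere John"]
print(all(same_author(variants[0], v) for v in variants[1:]))
print(same_author("John Deere", "Jane Deere"))
```

This heuristic deliberately errs toward merging, so it will also produce false positives (for instance, "J. Deere" and "D. Johnson" would match). A production system would presumably need extra evidence such as affiliations, co-authors or subject areas to disambiguate.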
Approaches we will consider may include (but will not be limited to):
- Language Models
- Transformers
- Clustering
- Recurrent Neural Networks
- Conditional Random Fields (CRF)
- Sequence-to-Sequence (Seq2Seq)
- Long Short-Term Memory (LSTM)
- Combinatory Categorial Grammars (CCG)
- Ensemble Techniques
The student team will deliver an end-to-end system/engine that can reliably suggest and assist users in forming “better” search queries.
More Information
Natural Language Processing (2 Students)
Specific Skills: Experience/Interest in Machine Learning – EECS 595/597/543; SI 561/760 (or equivalent self-learning). If applying for this role, highlight in your personal statement a project (outside of class assignments) in which you’ve used NLP.
Likely Majors: CS (ALL), EE (ALL), MATH, Any other with appropriate experience
Machine Learning / Data Science (2 Students)
Specific Skills: Experience/Interest in Machine Learning – EECS 445/453/492/505/545/551 (or equivalent self-learning). If applying for this role, highlight in your personal statement a project (outside of class assignments) in which you’ve used ML.
Likely Majors: CS (ALL), EE (ALL), MATH, DATA (ALL), MIDAS, SI (MS), Any other with appropriate experience
General Programming (2 Students)
Specific Skills: Solid programming experience – EECS 281 (or equivalent). Highlight your favourite project in your personal statement!
Likely Majors: CS (ALL), DATA (ALL)
Sponsor Mentors
Kevin Hastie
Director of Technology, ProQuest
Over 20 years of experience as a software engineer, architect, technology leader and A.I. advocate.
Executive Mentor
Roger Valade
VP of Engineering, ProQuest
Senior technology leader with extensive experience in enterprise and application architecture, software development and methodology (with an emphasis on agile), strategic planning, project and program management, offshoring in China and India, and change management. Former positions include VP, Technology for a $200M publishing company; VP, Technical Solutions for a J2EE consultancy; and Architect at General Motors. Has managed teams of up to 105 people and budgets of nearly $20M.
Faculty Mentor
Kerby Shedden, Ph.D.
Kerby Shedden is an Associate Professor of Biostatistics and an Associate Professor of Statistics in the College of Literature, Science and the Arts (LSA). He received a Ph.D. in Statistics from UCLA in 1999. His research focuses on developing and evaluating methods for analyzing high dimensional and complex data including dimension reduction, feature extraction, modeling, and inference. In addition to basic research he also collaborates on a number of life science projects in which complex data sets arise. This includes collaborations with members of the UM Cancer Center Cancer Genetics Program and Biostatistics Core, the UM College of Pharmacy, and the UM Addiction Research and Depression centers.
Course Substitutions: Honors, ChE Elective, CS MDE/Capstone, CE MDE, Data Science Capstone, EE MDE, IOE Senior Design
Internship/Summer Opportunity: Students will be guaranteed an interview for a 2021 internship. The interviews will take place between January 1 and March 2021.
Citizenship Requirements: This project is open to all students.
IP/NDA: Students will sign standard University of Michigan IP/NDA documents.