Sentiment Analysis and Machine Learning for Understanding Researcher Affect
This project will focus on creating a system that identifies the key emotions (e.g., discord, frustration, elation, surprise, boredom) that are most prominent in research discourse. The ProQuest Sentiment Analysis student team will deliver an end-to-end system that reliably extracts author sentiment from ProQuest content.
The Sponsor, ProQuest, is a content aggregator and a research and learning hub for students, librarians, instructors, and researchers. New methods for exploring and analyzing large volumes of text are changing the way our users and buyers access and analyze our content. Accurate affect analysis of historical and contemporary research is critical for two groups: 1) buyers who are identifying potentially valuable content for their collections, and 2) researchers and students who use sentiment vectors in their Machine Learning and NLP research.
To identify these emotions, the team will use publicly available and ProQuest-proprietary datasets, including Dissertations, eBooks, Journal Articles, and Newspapers. The task poses several Machine Learning and Natural Language Processing challenges, including:
● Identifying the key affective states in research debates and discourse.
● Mapping discrete affective states to valence and arousal vectors.
● Creatively designing machine learning features to exploit and augment results from underlying sentiment analysis methods.
● Developing techniques that are robust to varying content types.
● Addressing scalability concerns for large-scale unsupervised learning tasks.
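As a minimal sketch of the valence/arousal mapping challenge listed above, the Python below places each discrete emotion at a point in a two-dimensional valence/arousal space and collapses a passage's per-emotion scores (for example, classifier probabilities) into a single vector by taking a score-weighted average of those points. The coordinates in `EMOTION_VA`, the function name `to_valence_arousal`, and the input format are hypothetical assumptions for illustration only. They are not taken from a published lexicon or from ProQuest's system.

```python
from typing import Dict, Tuple

# HYPOTHETICAL placeholder coordinates (valence, arousal), each in [-1, 1].
# Illustrative only: a real system would take these from an annotated
# lexicon or learn them from labeled data.
EMOTION_VA: Dict[str, Tuple[float, float]] = {
    "elation": (0.8, 0.7),
    "surprise": (0.2, 0.8),
    "frustration": (-0.6, 0.6),
    "discord": (-0.5, 0.4),
    "boredom": (-0.3, -0.7),
}


def to_valence_arousal(emotion_scores: Dict[str, float]) -> Tuple[float, float]:
    """Collapse per-emotion scores into one (valence, arousal) vector.

    Computes the score-weighted average of each emotion's coordinates.
    Labels missing from EMOTION_VA and non-positive scores are ignored.
    """
    total = 0.0
    valence = 0.0
    arousal = 0.0
    for label, score in emotion_scores.items():
        if label not in EMOTION_VA or score <= 0:
            continue
        v, a = EMOTION_VA[label]
        valence += score * v
        arousal += score * a
        total += score
    if total == 0:
        # No recognized emotion: fall back to the neutral origin.
        return (0.0, 0.0)
    return (valence / total, arousal / total)


if __name__ == "__main__":
    # Hypothetical classifier output for a single passage.
    scores = {"frustration": 0.5, "discord": 0.3, "surprise": 0.2}
    print(to_valence_arousal(scores))
```

Because the result is a ratio of sums, the three running sums (total score, weighted valence, weighted arousal) could be computed separately per shard of a large corpus and added together before the final division, which is one possible way to keep this step scalable.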