Automatic Recognition of Historical Newspaper Content
ProQuest has recently acquired 80 million page images of historical newspapers that cover local and global news back through the eighteenth century. We now wish to automatically decompose those page images into their constituent headlines and article blocks to make them more useful for information retrieval. This team will create, train, and deploy a machine learning pipeline capable of performing that decomposition.
More Information: 2017-proquest-automatic-recognition-project
Students who successfully match to this project team will be required to sign the following two documents in January 2017:
- Skill level: All levels
- Students: 5-7
- Likely Majors: CS, DATA, ECE, EE, MICDE, MIDAS
- Course Substitutions: Honors, CSE-G, MIDAS, MICDE, Data Science Capstone, ECE Cognate
- IP & NDA Required? Yes
- Summer Opportunity: Interview Guaranteed
Machine Learning / Machine Vision (2-3 Students)
Machine vision and optical character recognition (OCR) skills; experience with image processing; experience with machine learning; familiarity with text analysis methods for document clustering
- Likely Majors: EE, ECE, Robotics, Data Science, MIDAS, MICDE
Programming (2-3 Students)
Collaborative programming and software development experience. Solutions may be written in any language, though Python is preferred for the machine learning components of the task.
- Likely Majors: CSE/CS-LSA
Faculty Mentor: Brent Griffin
Graduate Student Research Assistant, EECS
I have industry experience in many aspects of engineering. As an intern, I have performed quality assurance at Kawasaki, reliability engineering at Spirit AeroSystems, and product engineering at Nebraska Boiler. As a full-time employee, I have designed and developed hardware-in-the-loop flight simulators at Cessna Aircraft and biotechnology research instruments at LI-COR Biosciences.
Sponsor Mentor: Douglas Duhaime
Douglas Duhaime is a Text and Data Mining Product Manager at ProQuest. He came to ProQuest after spending several years as a doctoral researcher studying applications of data mining and machine learning within the domain of historical research.
Sponsor Mentor: Roger Valade
Vice President of Engineering
Roger Valade is the Vice President of Engineering at ProQuest. He came to ProQuest as a senior technology leader with extensive experience in enterprise and application architecture, software development and methodology (with an emphasis on agile), strategic planning, project and program management, offshoring in China and India, and change management.
For More Information About This Sponsor, Visit Their Website (ProQuest).