ProQuest has recently acquired 80 million page images of historical newspapers covering local and global news back through the eighteenth century. We now wish to automatically decompose those page images into their constituent headlines and article blocks to make them more useful for information retrieval. This team will create, train, and deploy a machine learning pipeline that performs this decomposition.
More Information: 2017-proquest-automatic-recognition-project
Students who successfully match to this project team will be required to sign the following two documents in January 2017: