The Statistics Online Computational Resource (SOCR) designs and disseminates educational materials, web services, and advanced methods and tools in probability, statistics, machine learning, health analytics, and “Big Data,” meaning very large datasets that are difficult to analyze and interpret meaningfully using classical probability and statistical methods. This team will:

(1) Enhance the SOCR analysis toolbox and visualization components, with an emphasis on Big Biomedical and Neuroscience Data. The toolbox will run in a web browser and improve the visual presentation and interpretation of Big Data. It will allow many more researchers (including students) to learn about, appreciate, contribute to, and apply complex analytics in their work, making it much easier to turn Big Data into “big results.”

(2) Develop advanced AI/ML data analytics to address specific biomedical, healthcare, neuroimaging-genetics, and other applications.

(3) Expand the novel Spacekime Analytics method for mathematical representation, statistical inference, and computational prediction of large longitudinal datasets.

Meeting time and location:

For academic credit, our MDP course is classified as a hybrid course, but it will mainly meet remotely, following the university's public health guidelines. Typically, the entire SOCR team meets in person once each semester (as public health guidelines allow) and remotely twice each month, usually on Fridays or Tuesdays from 8 to 9 am ET, via BlueJeans video conference (synchronous distance communication). Smaller project-specific teams meet weekly, and all progress, challenges, and interactions are coordinated asynchronously via G-Drive. Each sub-team arranges a convenient time to meet and work together following university guidelines. A two-term commitment will begin in January 2021.

Team organization:

Each SOCR sub-team is coached by an experienced graduate student who reports to the faculty PI. Sub-teams mostly focus on developing mathematical foundations, building particular algorithms, and designing statistical approaches for addressing applications. The sub-teams are flexibly structured to promote creativity, provide opportunities for student growth, and nurture team science. We have the following project sub-teams: SOCRAT, CBDA, DataSifter, Data Analytics, and Data Science Fundamentals (see the SOCR website). As students gain experience, they should expect increasing responsibility on assignments spanning multiple parts of the SOCR Lab.

More information

First-year undergraduates through master's students are welcome to apply, and all will be encouraged to stay on the team beyond the two-semester minimum. Leadership roles are available in the lab, and experienced students will be a natural fit for these positions as their knowledge grows over time.

Programming (3 Students)

Preferred Skills: HTML5, JavaScript, web-based functional development, intuitive UI/UX design; experience with Adobe Illustrator, Canvas, and/or R/Python a plus

Likely Majors: Computer Science (All), School of Information (SI), Any

Analytics (3 Students)

Preferred Skills: Statistical modeling, high-throughput data analytics, machine learning, R/Python

Likely Majors: Statistics, Data Science, MIDAS, Biostatistics, Bioinformatics, Math, Computer Science (All)

Methods (DataSifter & CBDA) (4 Students)

Preferred Skills: Technical mathematics background, R computing

Likely Majors: Statistics, Data Science, MIDAS, Biostatistics, Bioinformatics, Math, Computer Science (All)

Data Science Fundamentals (3 Students)

Preferred Skills: Students with strong mathematics and physics backgrounds and significant computational R programming skills. Strong motivation and interest in the graduate-level fundamentals of data science are necessary. Trainees will work directly with the PI. Students should be familiar with information measures (entropy, Kullback-Leibler (KL) divergence), ordinary and partial differential equations (ODEs/PDEs), and Dirac’s bra-ket operators. Please review the SOCR website.

Likely Majors: Physics, Math, Engineering

Apprentice Level (4 Students)

Preferred Skills: Interest in project material, willingness to develop skills. Open to first- and second-year undergraduate students ONLY.

Likely Majors: Computer Science (All), Statistics, Biostatistics, Bioinformatics, Math, Physics, Engineering, School of Information (SI)

Faculty Sponsor

Ivo Dinov
Professor, Computational Medicine and Bioinformatics, Health Behavior and Biological Sciences; and Associate Director for Education and Training, Michigan Institute for Data Science

Dr. Dinov is a professor of Computational Medicine and Bioinformatics, Health Behavior and Biological Sciences at the University of Michigan, the director of the Statistics Online Computational Resource (SOCR), and an associate director for Education and Training of the Michigan Institute for Data Science (MIDAS). Dr. Dinov develops advanced mathematical models for representation, scientific computing, statistical analysis, and interactive visualization of multidimensional, multimodal, and informatics biomedical data (Big Data). With expertise in human brain imaging, statistical computing, and high-throughput distributed data processing, he approaches biomedical and health science research from the perspective of Big Data applications in nursing informatics, multimodal biomedical image analysis, and distributed genomics computing.

Students: 17 to 30

Likely Majors: Bioinformatics, Biostatistics, Computer Science (All), Data Science, Engineering, MIDAS, Math, Physics, School of Information (SI), Statistics, Any

Summer Opportunity: Summer research fellowships may be available for qualifying students.

Citizenship Requirements: This project is open to all students on campus.

IP: Students who successfully match to this project team will be required to sign an Intellectual Property (IP) Agreement prior to participation in January 2021.

Course Substitutions: Honors