The Statistics Online Computational Resource (SOCR) analyzes “Big Data,” which are very large, heterogeneous, time-varying, multisource, and incomplete datasets that are difficult to interpret and model in meaningful ways using classical probability, statistical, or algorithmic approaches. The SOCR team designs and disseminates educational materials, web-services, and advanced data science methods and tools in probability, statistics, machine learning, and health analytics. This research team will:

(1) Enhance the SOCR analysis toolbox and visualization components with an emphasis on Big Biomedical and Neuroscience Data. The toolbox will be designed to run in a web browser and enhance the visual presentation and interpretation of Big Data. The creation of the toolbox will allow many more researchers (including students) to learn about, appreciate, contribute to, and apply complex analytics in their work, making Big Data much easier to turn into “big results.”

(2) Implement powerful, modern, and portable webapps (HTML5/JavaScript/R Shiny/R Markdown/Jupyter) that can be used to model various interesting processes, enable exploratory and quantitative data analyses, and facilitate the understanding of high-dimensional and complex information.

(3) Develop advanced AI/ML data analytics, e.g., compressive big data analytics, statistical obfuscation techniques, and Bayesian approaches to address specific biomedical, healthcare, neuroimaging-genetics, and other applications.

(4) Expand the novel Spacekime Analytics method for mathematical representation, statistical inference, and computational prediction of large longitudinal information.

More details are provided on the SOCR Research website.

Meeting time and location: For MDP academic credit, MDP-SOCR R&D courses (e.g., ENGR 255/355, NURS 995, ENG 455/599) are hybrid: we meet face-to-face occasionally, coordinate synchronously very often via the SOCR Zoom channel, and work asynchronously via cloud services. The entire SOCR team meets once each semester (in person, with Zoom streaming) and twice each month, typically on Fridays or Tuesdays from 8 am to 9 am ET, via the SOCR Zoom channel (distance synchronous communication). Smaller project-specific teams meet weekly, and all progress, challenges, and interactions are coordinated asynchronously via G-Drive. Each sub-team arranges a convenient time to meet and work together following university guidelines. Annual, two-term enrollment commitments begin each January.

Team organization

Each SOCR sub-team is coached by an experienced student who reports to the SOCR faculty and the PI. Sub-teams mostly focus on developing the mathematical foundations, building particular algorithms, and designing statistical approaches for addressing applications. The sub-teams are flexibly structured to promote creativity, provide opportunity for student growth, and nurture team science. We have the following project sub-teams: SOCRAT, CBDA, DataSifter, Data Analytics, Data Science Fundamentals, and Spacekime Analytics (see the SOCR website). As students develop skills and build confidence, they should expect increasing responsibility on assignments across multiple parts of the SOCR Lab.


First-year undergraduates through Master's students are welcome to apply, and all are encouraged to stay on the team for more than the two-semester minimum. Leadership roles are available in the lab, and experienced students become a natural fit for these positions as their knowledge grows over time.

Students apply to a specific role on the team as follows:

Programming (3 Students)

Preferred Skills: HTML5, JavaScript, web-based functional development, intuitive UI/UX design; experience with Adobe Illustrator, Canvas, and/or R/Python is a plus

Likely Majors: Computer Science (CSE/CS-LSA), School of Information (SI), ANY

Analytics (3 Students)

Preferred Skills: Statistical modeling, high-throughput data analytics, machine learning, R/Python

Likely Majors: Statistics, Data Science (SI), MIDAS, Biostatistics, Bioinformatics, Math, Computer Science (CSE/CS-LSA)

Methods (DataSifter & CBDA) (4 Students)

Preferred Skills: Technical math background, AI/ML, R-computing

Likely Majors: Statistics, Data Science (SI), MIDAS, Biostatistics, Bioinformatics, Math, Computer Science (CSE/CS-LSA)

Data Science Fundamentals (3 Students)

Preferred Skills: Strong mathematics and physics background and significant computational R-programming skills. Strong motivation and interest in graduate-level fundamentals of data science are necessary. Trainees will work directly with the PI. Students should be familiar with information measures, entropy, KL divergence, ODEs/PDEs, and Dirac’s bra-ket operators. Review the website.

Likely Majors: Physics, Math, or Engineering background

Apprentice Researcher (4 Students)

Requirements: Interest in project material, willingness to develop skills. Open to first- and second-year undergraduate students ONLY.

Likely Majors: Computer Science (CSE/CS-LSA), Statistics, Biostatistics, Bioinformatics, Math, Physics, Engineering, School of Information (SI)

Faculty Sponsor

Ivo Dinov
Professor, Computational Medicine and Bioinformatics, Health Behavior and Biological Sciences; and Associate Director, Michigan Precision Health, Education and Training Workgroup.

Dr. Dinov is a professor of Computational Medicine and Bioinformatics, Health Behavior and Biological Sciences at the University of Michigan, the director of the Statistics Online Computational Resource (SOCR), and an associate director at the Michigan Precision Health, Education and Training Workgroup. Dr. Dinov develops advanced mathematical models for representation, scientific computing, statistical analysis and interactive visualization of multi-dimensional, multimodal and informatics biomedical data (Big Data). With expertise in human brain imaging, statistical computing and high-throughput distributed data processing, he approaches biomedical and health science research from the perspective of team-science Big Data applications in informatics, multimodal biomedical image analysis, distributed genomics computing, complex-time representation of longitudinal data, spacekime analytics, and health analytics.

Number of Students: 17 – 30


Summer Opportunity: Some summer research internships may be available for qualifying students

Citizenship Requirements: This project is open to all students on campus

IP/NDA: Students who successfully match to this project team will be required to sign an Intellectual Property (IP) Agreement prior to starting participation in January.

Course Substitutions: Honors

More information is available at

How to Apply: Full MDP project list & application information can be found here: