
Details of Grant 

EPSRC Reference: EP/R014507/1
Title: Learning Sparse Features from 4D fMRI Data for Brain Disease Diagnosis
Principal Investigator: Lu, Dr H
Other Investigators:
Researcher Co-Investigators:
Project Partners:
University of Oxford
Department: Computer Science
Organisation: University of Sheffield
Scheme: First Grant - Revised 2009
Starts: 01 January 2018
Ends: 30 June 2019
Value (£): 100,730
EPSRC Research Topic Classifications:
Artificial Intelligence
EPSRC Industrial Sector Classifications:
Healthcare Information Technologies
Related Grants:
Panel History:
Panel Date | Panel Name | Outcome
05 Sep 2017 | EPSRC ICT Prioritisation Panel Sept 2017 | Announced
Summary on Grant Application Form
Machine learning gives computers the ability to learn from data to help solve real-world problems. With the growth of big data, machine learning methods have become increasingly important tools in a wide range of applications, including bioinformatics, computer vision, economics, and medicine. This project investigates machine learning for extracting useful information from fMRI data, to help clinicians make more accurate diagnoses of certain brain diseases and develop more effective treatments for them.

Currently, deep learning is the most popular machine learning method. However, deep networks have highly complex architectures and need vast amounts of data to learn a huge number of parameters. This causes difficulties when the number of available data examples (n) is very small compared with the number of features in each example (p), which is known as the "large p, small n" problem. Indeed, Geoff Hinton, often called the godfather of deep learning, said recently: "One problem we still haven't solved is getting neural nets to generalise well from small amounts of data".

Most existing solutions to the "large p, small n" problem represent data as vectors. As data dimensionality grows, such vector-based methods become inadequate for severe "large p, small n" problems, e.g., machine learning on fMRI data. fMRI data are sequences of 3D volumes, i.e., 4D data. They are noisy, big, and multidimensional, which makes comprehensive manual analysis infeasible and machine learning challenging. A typical whole-brain fMRI scan sequence has tens of millions of features (voxel measurements), with a file size of over 100 MB. For such data, even a simple linear basis needs tens of millions of parameters (deep learning needs far more), yet in practice, because scanning is costly, a particular fMRI study often has sequences for only a few dozen individuals.
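A back-of-the-envelope calculation makes these numbers concrete. The scan shape used here (a 64 × 64 × 32 voxel volume sampled at 200 time points), single-precision storage, and a study size of 50 individuals are hypothetical values chosen for illustration, not figures from a specific dataset.

```python
from math import prod

# Hypothetical whole-brain fMRI scan dimensions (assumed for illustration):
# a 64 x 64 x 32 voxel volume sampled at 200 time points.
VOLUME_SHAPE = (64, 64, 32)
NUM_TIMEPOINTS = 200
BYTES_PER_VALUE = 4  # assumes single-precision (float32) storage


def num_features(volume_shape, timepoints):
    """Number of voxel measurements p in one 4D scan sequence."""
    return prod(volume_shape) * timepoints


p = num_features(VOLUME_SHAPE, NUM_TIMEPOINTS)
size_mb = p * BYTES_PER_VALUE / 1e6  # uncompressed size in megabytes
n = 50  # hypothetical study size: "a few dozen individuals"

print(f"p = {p:,} features per scan")
print(f"uncompressed size = {size_mb:.1f} MB")
print(f"p / n = {p / n:,.0f}")
```

Under these assumptions, one scan sequence has about 26 million features and occupies roughly 105 MB uncompressed, while p exceeds n by a factor of several hundred thousand.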

Therefore, we aim to develop a new machine learning method for severe "large p, small n" cases involving multidimensional data such as whole-brain fMRI. We will take a tensor-based approach, where a tensor refers to a multidimensional array. Tensor-based methods have far fewer parameters than vector-based ones. For the typical whole-brain fMRI data described above, a tensor-based multilinear basis needs only a few hundred parameters, several orders of magnitude fewer than a vector-based linear basis. We will generalise state-of-the-art sparse feature learning methods from vector input to tensor input.
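The parameter gap can be illustrated with one common form of multilinear basis, the elementary multilinear projection (EMP), which maps a tensor to a scalar using one projection vector per mode. This is a sketch under assumptions: the 64 × 64 × 32 × 200 scan shape is hypothetical, and the project's actual tensor-based model may take a different form. The code also checks, on a small random tensor, that an EMP is a linear projection whose weight tensor is constrained to be the outer product of the mode vectors, which is where the saving in parameters comes from.

```python
import numpy as np
from math import prod


def linear_basis_params(shape):
    """One vector-based basis vector needs one weight per tensor element."""
    return prod(shape)


def emp_params(shape):
    """One elementary multilinear projection needs one vector per mode."""
    return sum(shape)


def emp_project(x, mode_vectors):
    """Map tensor x to a scalar by contracting each mode with its own vector."""
    y = x
    for u in mode_vectors:
        # Contract the current first mode with u; the remaining modes shift left.
        y = np.tensordot(u, y, axes=([0], [0]))
    return float(y)


# Hypothetical whole-brain fMRI tensor shape: 64 x 64 x 32 voxels, 200 time points.
fmri_shape = (64, 64, 32, 200)
print("linear basis parameters:", linear_basis_params(fmri_shape))
print("EMP parameters:", emp_params(fmri_shape))

# On a small random tensor, check that an EMP equals a linear projection whose
# weight tensor is the outer product of the mode vectors.
rng = np.random.default_rng(0)
small_shape = (4, 5, 3, 6)
x = rng.standard_normal(small_shape)
mode_vectors = [rng.standard_normal(dim) for dim in small_shape]
w = mode_vectors[0]
for u in mode_vectors[1:]:
    w = np.multiply.outer(w, u)
vector_result = float(np.dot(w.ravel(), x.ravel()))
print("EMP matches constrained linear projection:",
      np.isclose(emp_project(x, mode_vectors), vector_result))
```

For the assumed scan shape, one linear basis vector needs 26,214,400 weights, whereas one EMP needs 64 + 64 + 32 + 200 = 360, in line with the "few hundred" figure in the text.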

This will be the first study to learn sparse features directly from tensor representations of multidimensional data in a scalable and interpretable way. We will apply our algorithms to a large fMRI dataset on attention deficit hyperactivity disorder (ADHD) to accomplish two major tasks: prediction and interpretation. First, we will detect ADHD and classify its subtypes using a small number of automatically selected voxels. Second, in collaboration with a brain imaging expert, we will analyse the connectivity of the brain regions corresponding to the selected voxels to interpret the classification results, gain insights, and identify biomarkers that can assist clinicians in further diagnosis and treatment. Our results will be fully reproducible, as the dataset is in the public domain and our software will be released as open source. The success of this project will advance the state of the art in machine learning and provide a new enabling software tool for applications with severe "large p, small n" problems, such as medical imaging with high-cost scanners (e.g., MRI or 3D mammography machines) and translational bioinformatics with big genomic data.
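As a point of reference for the vector-based sparse feature learning that the project sets out to generalise, the sketch below fits an L1-penalised (lasso) linear model by iterative soft-thresholding (ISTA) on synthetic "large p, small n" data, so that only a few features end up with non-zero weights. It is not the project's tensor method; the data, the penalty `lam`, and the iteration count are assumptions chosen for illustration.

```python
import numpy as np


def soft_threshold(v, t):
    """Elementwise soft-thresholding: the proximal operator of t * ||v||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)


def lasso_ista(X, y, lam, num_iters=5000):
    """Minimise (1/(2n)) ||y - X w||^2 + lam * ||w||_1 by ISTA."""
    n, p = X.shape
    # Step size 1/L, where L is the Lipschitz constant of the smooth term's gradient.
    L = np.linalg.norm(X, ord=2) ** 2 / n
    w = np.zeros(p)
    for _ in range(num_iters):
        grad = X.T @ (X @ w - y) / n
        w = soft_threshold(w - grad / L, lam / L)
    return w


# Synthetic "large p, small n" data: p = 400 features, n = 80 examples, and
# only 3 features actually influence the response (all values are hypothetical).
rng = np.random.default_rng(0)
n, p = 80, 400
true_support = [10, 150, 300]
beta = np.zeros(p)
beta[true_support] = [5.0, -4.0, 3.0]
X = rng.standard_normal((n, p))
y = X @ beta + 0.1 * rng.standard_normal(n)

w = lasso_ista(X, y, lam=0.3)
selected = np.flatnonzero(w)
print("selected features:", selected.tolist())
```

The L1 penalty drives most weights exactly to zero, which is what makes the selected features (voxels, in the fMRI setting) few enough to inspect.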

Key Findings
This information can now be found on Gateway to Research (GtR) http://gtr.rcuk.ac.uk
Organisation Website: http://www.shef.ac.uk