Novel technique for advanced data processing and analysis

Statisticians at the National University of Singapore (NUS) have introduced a new technique that marks a significant step forward in capturing and analysing complex data patterns more effectively and accurately. This could pave the way for advancements in various fields of research such as single-cell RNA sequencing.

The team, led by Associate Professor Yao Zhigang together with Research Fellow Su Jiaji from the Department of Statistics and Data Science under the NUS Faculty of Science, pioneered a novel method for effectively estimating low-dimensional manifolds, a generalisation and abstraction of the notion of a curved surface, hidden within high-dimensional data. This approach not only achieves cutting-edge estimation accuracy and convergence rates but also enhances computational efficiency through the utilisation of deep Generative Adversarial Networks (GANs).

This research was done in collaboration with Professor Yau Shing-Tung at Tsinghua University. Their findings were published as a methodology paper in PNAS on 24 January 2024.

Overcoming the challenges of data analysis

The introduction of manifold fitting represents a significant advancement in data processing and analysis because it addresses the shortcomings of previous approaches.

Conventional approaches in data processing and analysis tend to oversimplify the representation of data, and hence often fail to capture the intricate, complex patterns present in high-dimensional data spaces, such as image databases, genomics, social media data, financial data, and data gathered via Internet of Things (IoT) sensor networks.

Manifold learning techniques have been developed to overcome these challenges by focusing on the intrinsic geometric structures of the data. However, existing manifold learning methods lack robustness and often give rise to inaccuracies and inefficiencies in data analysis. The NUS team came up with a novel technique to address this gap.

"By accurately fitting manifolds, we can reduce data dimensionality while preserving crucial information, including the underlying geometric structure. This represents a major leap in data analysis, enhancing both accuracy and efficiency. By providing a solution that overcomes the limitations of previous methods, our research paves the way for enhanced data analysis and offers valuable insights for diverse applications in the scientific community," said Assoc Prof Yao.

Applications in RNA sequencing and biodata analysis

The novel manifold fitting method has potential applications in areas such as RNA sequencing and the processing of biological data. Single-cell RNA sequencing (scRNA-seq) data is inherently noisy, with disruptions stemming from biological variability and technical inaccuracies that can skew gene expression analysis and complicate assessments of cell similarity, especially within diverse populations. Traditional methods, including advanced deep learning techniques, often falter in precisely delineating cell relationships amidst this pervasive noise. In response, the NUS researchers introduced an innovative pipeline framework aimed at refining clustering accuracy and enhancing data visualisation in scRNA-seq research.

Manifold fitting can also be integrated with deep learning to create a unified, low-dimensional representation of multi-modal biological data. This integration is expected to enhance the precision and effectiveness of disease prediction models, particularly for complex neurological conditions. By reducing data dimensionality while maintaining its essential features, the approach can offer a more holistic view of disease mechanisms, which would advance the field of personalised medicine.

Looking ahead, Assoc Prof Yao’s research team are continuing to develop the new framework to process more complex data, in collaboration with the research team at Tsinghua University.
