Somatic Mutation Analysis of Multiple Cancer Types Using TCGA Data

Liyang Liu
Keywords: Cancer, Data Analysis, Sparse Linear Regression, Gene Expression Data, Somatic Mutation Data


The rapid development of sequencing technology and the accumulation of massive biomedical data have created both the
opportunity and the need to study the mechanisms of cancer occurrence, development and metastasis. How to mine and
process these large datasets effectively has become a focus of scientific research and has attracted the attention of
a growing number of researchers and medical experts. Cancer cells must overcome the survival stress caused by chronic
hypoxia and/or the accumulation of reactive oxygen species (ROS). When ROS levels exceed the cell's antioxidant
capacity, the cell enters a state of oxidative stress, and when gene mutations arise that take over specific cellular
activities, the cancer cells deteriorate further. Oxidative stress and gene mutation are therefore closely related to
the occurrence and development of cancer and are important driving factors of cancer initiation. Electron flux is one
component that characterizes oxidative stress during cancer development: analyzing electron flux, that is, redox flux,
describes the redox state of the cell.
However, electron flux is much harder to quantify than metabolic flux. Quantitatively describing the level of
oxidative stress over the complex course of cancer occurrence and development is therefore a difficult problem in
cancer informatics, and solving it provides an important reference for choosing correct and effective methods of
prevention and treatment. To ensure that the results generalize, transcriptome data, gene mutation data and clinical
data for eight different cancer types from TCGA were selected to quantitatively describe and analyze the oxidative
stress level of cancer, and the high dimensionality, high redundancy and low information content that characterize
large-scale gene data analysis were effectively addressed. For the first time, this paper proposes quantifying the
oxidative stress level of cancer by using linear regression to predict the linear relationship between the expression
levels of a gene set and the mutations of those genes. The analysis yields a biologically meaningful gene set that
quantitatively describes the oxidative stress level of cancer, and the model is evaluated and validated. The
similarities and differences discovered among the cancer types, together with the corresponding cancer-marker gene
sets, are of practical significance for the prevention and treatment of cancer.
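The abstract does not spell out the regression model. As a minimal sketch, assuming that the "sparse linear regression" named in the keywords is an L1-penalized (lasso) model that regresses a per-sample mutation score on the expression of candidate genes, the pure-Python coordinate-descent solver below shows how the L1 penalty drives uninformative genes to exactly zero and leaves a small gene set. The data are synthetic, not TCGA, and the mutation score, the penalty weight `lam` and the helper names `standardize`, `soft_threshold` and `lasso_cd` are hypothetical illustrations, not the author's implementation.

```python
import random


def standardize(col):
    """Center a column and scale it to unit variance (population variance, /n)."""
    n = len(col)
    mean = sum(col) / n
    sd = (sum((v - mean) ** 2 for v in col) / n) ** 0.5 or 1.0
    return [(v - mean) / sd for v in col]


def soft_threshold(z, gamma):
    """Lasso shrinkage operator: sign(z) * max(|z| - gamma, 0)."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def lasso_cd(X_cols, y, lam, max_iter=500, tol=1e-8):
    """Cyclic coordinate descent for (1/2n) * ||y - X b||^2 + lam * ||b||_1.

    Assumes each column of X is standardized and y is centered (no intercept).
    """
    n, p = len(y), len(X_cols)
    b = [0.0] * p
    resid = list(y)  # residual y - X b; b starts at zero
    for _ in range(max_iter):
        max_step = 0.0
        for j, xj in enumerate(X_cols):
            # Correlation of gene j with its partial residual;
            # (1/n) * x_j^T x_j == 1 because the column is standardized.
            rho = sum(a * r for a, r in zip(xj, resid)) / n + b[j]
            new_bj = soft_threshold(rho, lam)
            step = new_bj - b[j]
            if step != 0.0:
                resid = [r - a * step for a, r in zip(xj, resid)]
                b[j] = new_bj
                max_step = max(max_step, abs(step))
        if max_step < tol:
            break
    return b


# Hypothetical synthetic stand-in for TCGA data: 200 samples, 20 genes, and a
# per-sample "mutation score" driven only by genes 0, 1 and 2. Real expression
# matrices have far more genes than samples; this toy keeps p < n for speed.
rng = random.Random(0)
n_samples, n_genes = 200, 20
raw = [[rng.gauss(0, 1) for _ in range(n_samples)] for _ in range(n_genes)]
true_coef = {0: 2.0, 1: -1.5, 2: 1.0}
score = [sum(c * raw[g][i] for g, c in true_coef.items()) + rng.gauss(0, 0.5)
         for i in range(n_samples)]

X_cols = [standardize(col) for col in raw]
y_mean = sum(score) / n_samples
y = [s - y_mean for s in score]

coef = lasso_cd(X_cols, y, lam=0.2)
selected = [g for g, c in enumerate(coef) if c != 0.0]
print("selected genes:", selected)
```

With these settings the solver should keep only the three genes that drive the simulated score. In a real analysis `lam` would be tuned, for example by cross-validation, and an established solver such as scikit-learn's `Lasso` would be a more practical choice than this hand-written loop.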