---
title: "PMAPscore"
author: "Junwei Han,Yalan He,Xiangmei Li"
date: "`r Sys.Date()`"
output:
html_document:
toc: true
toc_depth: 3
toc_float: true
number_sections: true
self_contained: yes
highlight: pygments
vignette: |
%\VignetteIndexEntry{PMAPscore}
%\VignetteEngine{knitr::rmarkdown}
%\VignetteEncoding{UTF-8}
---
```{r, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>"
)
```
```{r setup}
library(PMAPscore)
```
# Introduction
Somatic mutations play an important role in cancer development and progression, and mutational profiling is used far more commonly than other 'omics' analyses in clinical practice.And nowadays immunotherapy has become a powerful clinical strategy for treating cancer.Tumor mutation burden (TMB) is a potential biomarker in immunotherapy.However, it is still insufficient to only rely on TMB to predict the response of immunotherapy.This package attempts to develop a new pathway-based mutation accumulate perturbation score (PMAPscore) which can be used as an auxiliary factor of TMB to further improve the accuracy of immunotherapy response prediction,efficiently using somatic mutations files in from either cBioProtal sources or any in-house studies as long as the data is in MAF format. Besides, we develop a multiple machine learning method using sample PMAPscore profiles to identify dysregulated cancer-specific dysregulated signal pathways, which can be a biomarker of prognostic and predictive for cancer immunotherapy.
# MAF field requirements
MAF files contain many fields ranging from chromosome names to cosmic annotations. However, most of the analysis in PMAPscore uses the following fields.
- Mandatory fields:
**Hugo_Symbol,**
**Entrez_Gene_Id,**
**Variant_Classification,**
**Tumor_Sample_Barcode.**
Complete specification of MAF files can be found on [NCI GDC documentation page](https://docs.gdc.cancer.gov/Data/File_Formats/MAF_Format/).
# Installation
```{r echo = TRUE, results = 'hide',eval=FALSE}
install.packages("PMAPscore")
library(PMAPscore)
```
# Overview of the package
The **PMAPscore** package is a Bioinformatics tool to identify cancer-specific dysregulated signal pathway. And **PMAPscore** functions can be categorized into mainly Visualization and Analysis modules. Each of these functions and a short description is summarized as shown below:
1.Get the mutation status of all genes in each sample.
2.Calculate the mutation cumulative score of each pathway PMAPscore.
3.Identify cancer-specific dysregulated signal pathways.
4.Calculate the risk score of each sample and classify the samples according to the threshold.
5.Visualization results:
5.1 Plot patients' Kaplan-Meier Survival Curves.
5.2 Plot the ROC curve.
5.3 Plot patient-specific dysfunction pathways' waterfall plots.
5.4 Plot the histogram of the patients' drug response.
## Get the mutation status of all genes in each sample
We downloaded patients' mutation data from the cBioPortal database in MAF format. About the mutation status of a specific gene in a specific sample, we converted MAF format data into a mutation status matrix, in which every row represents the gene and every column represents the sample. In our study, we only extract the nonsilent somatic mutations (nonsense mutation, missense mutation, frame-shift indels, splice site, nonstop mutation, translation start site, inframe indels) in protein-coding regions.The function **get_mut_status** in the **PMAPscore** package can implement the above process. Take simulated melanoma data as an example, the command lines are as follows:
```{r out.width=24}
#load the mutation annotation file
data("maf_data")
#perform the function 'get_mut_status'
mut_status<-get_mut_status(maf_data=maf_data,nonsynonymous = TRUE)
#view the first five lines of mut_status matrix
mut_status[1:5,1:5]
```
## Calculate the perturbation score of each pathway PMAPscore.
According to the mutation status in each sample and the position of gene in the signal pathway,we obtain the PMAPscore of each pathway in a single sample by accumulating an additive sum of gene/protein relationship coefficients.The function **get_pfs_score** in the **PMAPscore** package can implement the above process. The calculation formula is as follows:
###gene perturbation's score:
$$GMPscore(g_{i})=△E(g_{i})+\sum_{i=1}^N β_{ij}\frac{GMPscore(g_{j})}{N_{ds(g_{j})}}\tag{1}$$
where $△E(g_{i})$ is the $i$ th gene's mutation status(1 is mutant and 0 is non-mutant); $β_{ij}$ is the relationship between genes gi and gj,if the gene j is directly interacted with the gene i according to the pathway description, $β_{ij} = 1$, else is 0; $N_{ds(g_{j})}$ is the number of genes downstream of the gene $j$.The $GMPscore(g_{i})$ denotes the gene $i$ mutation perturbation score.The $GMPscore(g_{j})$ denotes the mutation perturbation score of upstream gene $j$ of gene $i$.
###pathway mutation perturbation's score:
$$PMAPscore(p_{i})=\frac{\sum_{k=1}^NGMPscore(g_{k})}{N_{de}}$$
where$N$ is the total number of genes in pathway $i$;$GMPscore(g_{k})$ indicates mutation the perturbation score of genes in the pathway $i$;$N_{de}$ denotes the number of genes that have mutated in the pathway $i$;$PMAPscore(p_{i})$ indicates the perturbation score of pathway $i$.
```{r results='hide'}
#Method of obtaining data
data(mut_status,gene_Ucox_res,gene_symbol_Entrez)
#calculate the pfs_score of single sample
pfs_score<-get_pfs_score(mut_status[,1:2],percent=0.03,gene_Ucox_res,gene_symbol_Entrez)
#view the first five lines of pfs_score matrix
pfs_score[1:5,1:2]
```
## Identify dysregulated cancer-specific signal pathways.
In order to identify dysregulated cancer-specific signal pathways, we develop a multiple machine learning method using pfs_score. The multiple machine learning method consists of three sequential steps, as follows:
(1) Wilcoxon test was used to tentatively identify differentially PMAPscore pathways between death and alive sample groups.
(2) Univariate Cox regression was used to further select the prognosis-related features.
(3) Finally,Lasso regression model was also performed to select the potential predictivel features.
The function **get_final_signature** in the **PMAPscore** package can implement the above process.
```{r warning=FALSE}
#load pfs_score and survival data
data(pfs_score,sur)
# filter the survival-related pathways
final_signature<-get_final_signature(pfs_score,sur)
#view the final_character
final_signature
```
##Calculate the risk score of each sample and classify the samples according to the threshold.
The function **get_sample_classification** can get the risk score of a sample according to the mutation status of the genes contained in the sample, and classify the sample by the threshold we set.
```{r warning=TRUE, paged.print=TRUE}
#Load sample mutation data
data(mut_sam,gene_Ucox,symbol_Entrez,path_cox_data,sur,path_Ucox_mul,sig)
#Perform the function `get_sample_classification`
get_sam_cla(mut_sam,gene_Ucox,symbol_Entrez,path_cox_data,sur,path_Ucox_mul,sig,cut_off=-0.986)
#class_res
```
## Visualization results
(1) The function 'get_km_survival_curve' is used to draw Kaplan-Meier survival curves based on PMP-related risk score. The risk score is generated by the signature's PMPAscore and the coefficient of multivariate cox regression. The command lines are as follows:
```{r fig.height=6, fig.width=8,warning=FALSE,results='hold'}
#Load the data
data(km_data)
#Drawing Kaplan-Meier Survival Curves.
get_km_survival_curve(km_data,cut_point,TRAIN = TRUE,risk.table=TRUE)
```
(2) The function 'get_roc_curve' is used to plot a ROC curve.
```{r fig.height=6, fig.width=8, warning=FALSE, results='hold'}
#Get the data of ROC curve
data(roc_data)
#Drawing ROC Curves
get_roc_curve(roc_data,print.auc=TRUE,main="Objective Response")
```
(3) The function 'get_Oncoplots' takes output generated by read.maf and draws a GenePathwayOncoplots
```{r fig.height=6, fig.width=8,warning=FALSE,results='hold'}
#obtain the risksciore
data(km_data)
risk_score<-km_data$multiple_score
names(risk_score)<-rownames(km_data)
cut_off<-median(risk_score)
#load the data
data(final_signature,path_gene,mut_status,maffile)
#draw an GenePathwayOncoplots
get_Oncoplots(maffile,path_gene,mut_status,risk_score,cut_off,final_signature,"Gap junction")
```
(4) The function 'get_response_plot' is used to draw a histogram of the patient's drug response.The command lines are as follows:
```{r fig.height=6, fig.width=8,warning=FALSE,results='hold'}
#Load the data
data(km_data,response)
#Drawing the histogram.
get_response_plot(km_data,response,cut_point,TRAIN=TRUE)
```