Statistics for K-mer Based Splicing Analysis

It is well established that alternative splicing analysis plays a crucial role in identifying variation in RNA transcriptomes. In high-throughput short-read RNA sequencing, splicing analysis is challenging because aligning reads to the genome and transcriptome is both uncertain and computationally expensive. In this paper, we introduce a k-mer based statistical method for splicing event analysis. The k-mer based representation avoids time-consuming read alignment, and k-mers whose abundance differs significantly between controlled groups of samples are good indicators that certain types of splicing events exist. We explore statistical models including the t-test, DESeq, and the likelihood ratio test to identify statistically significant differential k-mers. We also develop a fast k-mer mapping method, used in place of Bowtie, to determine whether a k-mer from the read data can be matched to the genome or transcriptome.
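
To make the idea of alignment-free differential k-mer detection concrete, the following is a minimal Python sketch rather than the implementation used in this work. It counts k-mers directly from reads and scores each k-mer with Welch's t statistic between two sample groups. The function names (`count_kmers`, `welch_t`, `differential_kmers`), the toy reads, and the choice of Welch's variant of the t-test are illustrative assumptions; library-size normalization, p-value computation, and multiple-testing correction are omitted.

```python
from collections import Counter
from math import copysign, inf, sqrt
from statistics import mean, variance


def count_kmers(reads, k):
    """Count every length-k substring across the reads of one sample."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def welch_t(a, b):
    """Welch's t statistic for two groups (each needs at least 2 values)."""
    diff = mean(a) - mean(b)
    se = sqrt(variance(a) / len(a) + variance(b) / len(b))
    if se == 0:
        # No within-group variation: the statistic is degenerate.
        return 0.0 if diff == 0 else copysign(inf, diff)
    return diff / se


def differential_kmers(group_a, group_b, k):
    """Map each k-mer seen in either group to its t statistic.

    group_a and group_b are lists of samples; a sample is a list of reads.
    Raw counts are compared directly (an assumption made for brevity).
    """
    counts_a = [count_kmers(sample, k) for sample in group_a]
    counts_b = [count_kmers(sample, k) for sample in group_b]
    kmers = set().union(*counts_a, *counts_b)
    return {
        kmer: welch_t([c[kmer] for c in counts_a],
                      [c[kmer] for c in counts_b])
        for kmer in kmers
    }


# Hypothetical toy data: two samples per group, one read per sample.
control = [["ACGTACGT"], ["ACGTACGA"]]
treated = [["ACGTTTGT"], ["ACGTTTGA"]]
scores = differential_kmers(control, treated, 4)
for kmer, t in sorted(scores.items(), key=lambda kv: -abs(kv[1]))[:5]:
    print(kmer, t)
```

K-mers with large absolute scores are candidates for further inspection; in practice a count-based model such as DESeq may be better suited to the discrete, overdispersed nature of k-mer counts than a t-test on raw counts.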