1       Introduction

For an entire transcriptome study, RNA-seq based on next-generation DNA sequencing(NGS) is widely used for several advantages compared to microarray  (Wang, et al., 2009). Microarray has several limitations such as background effect which limits the accuracy of expression measurements, particularly for transcripts present in low abundance. On the other hand, RNA-Seq has considerable advantages for examining transcriptome fine structures such as the detection of novel transcripts, allele-specific expression, and splice junctions (Zhao, et al., 2014).
One of the aims for transcriptome study is to quantify the expression level of each transcript (Mantione, et al., 2014). Expression level in the early mouse embryo allows understanding development system (Sharov, et al., 2003).  Embryonic 9.5 days and 13.5 days were compared to find DEG in the different embryonic stages.

2       Methods

2.1 Collecting data and quality control

The 5 samples of RNA-seq data were downloaded from NCBI SRA database as summarized in the Table 1. The quality of each base of the paired end reads was visualized using fastQC. Using The raw reads were trimmed by filtering out adaptor-only nucleotides with 75bp minimum length, using Trimmomatic (ver 0.32; (Bolger, et al., 2014)).

Sample
Name
Read
Length
Read
Count
GC
Q20
Ratio
Q30
Ratio
SRS385801
101
37,903,432
47
0.99214
0.95068
SRS385800
101
31,195,373
48
0.99149
0.94682
SRS373441
101
27,153,928
48
0.9919
0.94854
SRS373442
101
36,245,323
48
0.98924
0.92878
SRS373443
101
50,201,130
47
0.99232
0.95056

Table 1. Raw data information

2.2    Alignment and counting reads

The resulting reads were aligned to the human genome (GRCm38.p5) using HISAT 0.1.5-beta (Kim, et al., 2015). Analysis for differential expression across the three conditions was performed by first obtaining read counts for each gene using HTSeq (Anders, et al., 2015) .Table2 shows the alignment rate.

SampleName
OverallAlignmentRate
Aligned Zero time
Aligned one time
Multiple Alignment
SRS385801
95.72%
4.28%
67.37%
28.35%
SRS385800
96.08%
3.92%
71.02%
25.06%
SRS373441
96.48%
3.52%
71.89%
24.59%
SRS373442
97.01%
2.99%
69.09%
27.93%
SRS373443
96.67%
3.33%
70.82%
25.84%
Table 2. Alignment rate

2.3    Differential Expressed gene(DEG) analysis


Differentially expressed genes were identified using edgeR Bioconductor package (Robinson, et al., 2010). This is based on Generalized linear model(GLM) which is used for the analysis of RNA-seq data by considering gene expression as a negative binomial. We used an FDR<0.05 significance cutoff for multiple testing adjustment (Benjamini and Hochberg, 1995).

2.4 Gene ontology (GO) and KEGG pathway analysis


GO database classifies genes and transcripts according to the three categories of Biological Process(BP), Cellular Component(CC) and Molecular Function(MF) provides information on the function of genes. To characterize the identified genes from DEG analysis, GO-based trend test was carried out through the Fisher’s exact test. A functional enrichment analysis was performed by the Database for Annotation, Visualization and Integrated Discovery (DAVID) (Huang, et al., 2007).
 The Kyoto Encyclopedia of Genes and Genomes (KEGG) are the major recognized pathway-related databases that record information on how genes are networked in various organisms. The functional implications of these modules were then examined by using DAVID to identify statistically significant KEGG Pathway terms.

Results

3.1 Statistical test for DEG



Multi-dimensional plot indicated that samples are distinguished by embryo date (Fig 1).  After checking the distance of samples, the statistical test for DEG was performed. The top 10 DEGs by FDR value were visualized (Fig 2).  ENSMUSG00000022037, whose gene symbol is Clu, showed higher abundance in 9.5 day compared to 13.5 day. ENSMUSG00000073294, whose gene symbol is AU022751, showed higher abundance in 13.5 day compared to 9.5 day. Total 1855 DEG were  selected by  FDR<0.05 significance cutoff.




Fig 1. Multi-dimensional plot of  five embryo samples.





Fig 2. Boxplot for relative abundance of expression



3.2 GO and KEGG PATHWAY analysis

Using the list of 1855 DEG, we investigated enriched function of DEG (Table 3). KEGG pathway showed the pathway related to the DEGs. Biosynthesis of antibiotics were significantly enriched in DEG. The formation of immune system is likely to be formed between 9.5 and 13.5 days. In biological process, the meiotic cell cycle was significantly enriched. Meiosis is a specialized type of cell division that reduces the chromosome number by half. Considering characteristic of embryo cell, these results reflect the biological function in development.


KEGG Term
Count
%
PValue
mmu01130:Biosynthesis of antibiotics
46
2.68
6.09E-10
mmu01200:Carbon metabolism
24
1.4
2.84E-05
mmu00010:Glycolysis / Gluconeogenesis
17
0.99
3.72E-05
mmu01100:Metabolic pathways
138
8.05
4.15E-05
mmu01230:Biosynthesis of amino acids
18
1.05
9.15E-05

GOTerm : Biological process
Count
%
PValue
GO:0051321~meiotic cell cycle
37
2.158693
2.36E-16
GO:0030154~cell differentiation
117
6.826138
1.57E-11
GO:0007140~male meiosis
15
0.875146
4.81E-09
GO:0007275~multicellular organism development
134
7.81797
6.97E-09
GO:0007283~spermatogenesis
67
3.908985
1.50E-08

Table 3. GO and KEGG PATHWAY analysis




References
Anders, S., Pyl, P.T. and Huber, W. (2015) HTSeq—a Python framework to work with high-throughput sequencing data, Bioinformatics, 31, 166-169.
Benjamini, Y. and Hochberg, Y. (1995) Controlling the false discovery rate: a practical and powerful approach to multiple testing, Journal of the royal statistical society. Series B (Methodological), 289-300.
Bolger, A.M., Lohse, M. and Usadel, B. (2014) Trimmomatic: a flexible trimmer for Illumina sequence data, Bioinformatics, 30, 2114-2120.
Huang, D.W., et al. (2007) The DAVID Gene Functional Classification Tool: a novel biological module-centric algorithm to functionally analyze large gene lists, Genome biology, 8, R183.
Kim, D., Langmead, B. and Salzberg, S.L. (2015) HISAT: a fast spliced aligner with low memory requirements, Nature methods, 12, 357-360.
Mantione, K.J., et al. (2014) Comparing bioinformatic gene expression profiling methods: microarray and RNA-Seq, Medical science monitor basic research, 20, 138.
Robinson, M.D., McCarthy, D.J. and Smyth, G.K. (2010) edgeR: a Bioconductor package for differential expression analysis of digital gene expression data, Bioinformatics, 26, 139-140.
Sharov, A.A., et al. (2003) Transcriptome analysis of mouse stem cells and early embryos, PLoS Biol, 1, e74.
Wang, Z., Gerstein, M. and Snyder, M. (2009) RNA-Seq: a revolutionary tool for transcriptomics, Nature Reviews Genetics, 10, 57-63.
Zhao, S., et al. (2014) Comparison of RNA-Seq and microarray in transcriptome profiling of activated T cells, PloS one, 9, e78644.