1. Blastn
Aligns nucleotide sequences (A, T, G, C) against a nucleotide sequence database to find similar sequences.
* -outfmt '6 qseqid qstart qend sseqid sstart send qcovs sseq bitscore'
This option writes the bit score as the last output column. A higher bit score indicates a more similar match, so the results can later be sorted by bit score.
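Sorting by bit score could look like the following minimal sketch. It assumes tab-separated lines in the column order of the -outfmt string above, so the bit score is the ninth column. The function name sort_by_bitscore and the sample rows are hypothetical and only illustrate the idea.

```python
def sort_by_bitscore(lines):
    """Return BLAST tabular rows sorted from highest to lowest bit score."""
    rows = [line.rstrip("\n").split("\t") for line in lines if line.strip()]
    # bitscore is the 9th column (index 8) in the -outfmt string used here.
    rows.sort(key=lambda row: float(row[8]), reverse=True)
    return rows


# Hypothetical rows: IDs, coordinates and scores are illustrative only.
sample = [
    "queryA\t1\t100\tsubjX\t5\t104\t100\tACGT\t150.0",
    "queryA\t1\t80\tsubjY\t1\t80\t80\tACGT\t310.5",
    "queryB\t10\t60\tsubjZ\t3\t53\t50\tACGT\t42.1",
]

for row in sort_by_bitscore(sample):
    print(row[0], row[3], row[8])
```

Converting the score with float() matters here: sorting the raw strings would compare them character by character rather than numerically.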
Commands:
makeblastdb -in Zm-B73-REFERENCE-NAM-5.0_Zm00001eb.1.cdna.fa -dbtype nucl
blastn -query /scratch/3.scATAC_flo/11.Bulk_ATACseq_Comparison/Maizev3/Zea_mays.AGPv3.22.cdna.all.fa -db /scratch/3.scATAC_flo/0.Reference/Zm-B73-REFERENCE-NAM-5.0_Zm00001eb.1.cdna.fa -outfmt '6 qseqid qstart qend sseqid sstart send qcovs sseq bitscore' -num_threads 40 -out /scratch/3.scATAC_flo/11.Bulk_ATACseq_Comparison/BlastResult.txt
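If the two commands are to be launched from Python, one possible sketch is shown below. It only uses the flags that appear in the commands above and assumes the BLAST+ executables are on the PATH. The function names and the file names in the usage lines are hypothetical.

```python
import subprocess

OUTFMT = "6 qseqid qstart qend sseqid sstart send qcovs sseq bitscore"


def build_blastn_cmds(query, reference, out, threads=40):
    """Build the makeblastdb and blastn argument lists (nothing is executed)."""
    makedb = ["makeblastdb", "-in", reference, "-dbtype", "nucl"]
    search = [
        "blastn",
        "-query", query,
        "-db", reference,
        "-outfmt", OUTFMT,  # one list element, equivalent to the quoted shell string
        "-num_threads", str(threads),
        "-out", out,
    ]
    return makedb, search


def run_blastn(query, reference, out, threads=40):
    """Run both steps in order; raises CalledProcessError if either one fails."""
    for cmd in build_blastn_cmds(query, reference, out, threads):
        subprocess.run(cmd, check=True)


# Hypothetical file names, shown only to illustrate the resulting commands.
makedb_cmd, blastn_cmd = build_blastn_cmds("query.fa", "reference.fa", "BlastResult.txt")
print(" ".join(makedb_cmd))
```

Passing the arguments as a list avoids shell quoting problems with the multi-word -outfmt value.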
Result file:
GRMZM2G356204_T01 1 2342 Zm00001eb210180_T001 90 2431 100 GGTTTTAGCCTCCTCCGATGCAGCCGCCTCGCCGCGCGCTTGTCACATCCCTCCTCCGCCTCCGCTCCTTCTCCTCCATTGCCTATCCTCATCCCTACCCACCCGCGCCCCTGCGACGCCACCAGTTCGTCGCCGACCCCACCACCTCCACAAACCGTGGTATCGTCGGGGGCATCGGCGGCGTCGGTTCCGGGAACGGGAACCTCTTGGACCCGACGCAGCTCCTCCGCGATGACCCGGTGGCAATCACCGCTTCCCTTTGGGTATCCTCCTTCCGCGCCGCCGCCTCCACCTCCAGCTCTTGTACCCCCACACCGCCGCAGCCACTCACCCCCTTCCTCTCTCGCCTGGAGCTGTGGGTGCTCGCCTACCAGAAGGCGTACGCTGACGAGACCGGCTCCTATCTGCCGCGCTCCTCCATCCCGGCCTCCACGCTCGCCTCCCTCCTAACGCTCCGCAACGCCGTACTCGACGGCCGCTTCCGCTTCGGCAACCGCCTCACCCCCATCCTCCAGTCCCCGCGCGCCGCCAACGCGCCGGACCCTGCCACCCTCTCCAAGCGCAAGCTCCGCGCCCTCCTTACAACCCCCGGCCCATCGCCCTTCCAGGACCGC
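To make the columns of a result line easier to work with, each line can be parsed into named fields. The sketch below assumes tab-separated output in the order given by the -outfmt string. The Hit type, the parse_hit function and the sample line are hypothetical.

```python
from collections import namedtuple

COLUMNS = "qseqid qstart qend sseqid sstart send qcovs sseq bitscore".split()
Hit = namedtuple("Hit", COLUMNS)


def parse_hit(line):
    """Parse one tabular BLAST line into a Hit with typed fields."""
    f = line.rstrip("\n").split("\t")
    return Hit(
        qseqid=f[0], qstart=int(f[1]), qend=int(f[2]),
        sseqid=f[3], sstart=int(f[4]), send=int(f[5]),
        qcovs=float(f[6]), sseq=f[7], bitscore=float(f[8]),
    )


# Hypothetical line: the IDs, coordinates and score are illustrative only.
hit = parse_hit("queryA\t1\t12\tsubjX\t90\t101\t100\tACGTACGTACGT\t23.3")
print(hit.qseqid, hit.sseqid, hit.bitscore)
print(hit.qend - hit.qstart + 1)  # length of the aligned query span
```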
2. Blastp
Aligns protein sequences against a protein sequence database to find similar sequences.
Commands:
makeblastdb -in ./11.Bulk_ATACseq_Comparison/Maizev5/Zm-B73-REFERENCE-NAM-5.0_Zm00001eb.1.protein.fa -dbtype prot
blastp -query ./11.Bulk_ATACseq_Comparison/Maizev3/Zea_mays.AGPv3.22.pep.all.fa -db ./11.Bulk_ATACseq_Comparison/Maizev5/Zm-B73-REFERENCE-NAM-5.0_Zm00001eb.1.protein.fa -outfmt '6 qseqid qstart qend sseqid sstart send qcovs sseq bitscore' -num_threads 40 -out /scratch/sb14489/3.scATAC_flo/11.Bulk_ATACseq_Comparison/BlastPResult.txt
GRMZM2G356204_P01 1 740 Zm00001eb210180_P001 1 740 100 MQPPRRALVTSLLRLRSFSSIAYPHPYPPAPLRRHQFVADPTTSTNRGIVGGIGGVGSGNGNLLDPTQLLRDDPVAITASLWVSSFRAAASTSSSCTPTPPQPLTPFLSRLELWVLAYQKAYADETGSYLPRSSIPASTLASLLTLRNAVLDGRFRFGNRLTPILQSPRAANAPDPATLSKRKLRALLTTPGPSPFQDRVV
3. Python code for 1:1 matching of the result file
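# Path is not defined in the original notes; set it to the directory prefix of the input/output files.
Path = ""

# Step 1: for each V3 query, collect its V5 hits keyed by bit score.
Dic = {}
with open(Path + "BlastPResult.txt", "r") as infile:
    for sLine in infile:
        sList = sLine.strip().split("\t")
        BitScore = float(sList[8])
        Q = sList[0]
        T = sList[3]
        Dic.setdefault(Q, {})
        Dic[Q][BitScore] = T  # hits with an identical bit score overwrite each other

# Step 2: keep the highest-scoring V5 hit for each V3 query.
# IDs are cut at "_" to drop the protein suffix (e.g. _P01, _P001).
# If several V3 genes share the same best V5 hit, the last one processed wins.
Dic2 = {}
for V3Gene in Dic:
    max_value = max(Dic[V3Gene].keys())
    V5Gene = Dic[V3Gene][max_value]
    Dic2[V5Gene.split("_")[0]] = V3Gene.split("_")[0]

# The original opened OneToOne_V3andV5.txt but never wrote to it.
# Writing the V5/V3 pairs here, one pair per line, is an assumption.
with open(Path + "OneToOne_V3andV5.txt", "w") as outfile:
    for V5Name, V3Name in Dic2.items():
        outfile.write(V5Name + "\t" + V3Name + "\n")

# Step 3: append the matched V3 gene name to the V5 gene name in the BED file.
# Matched lines keep only the first three columns plus the combined name.
with open("Zm-B73-REFERENCE-NAM-5.0_Zm00001eb.1_OnlyGene.bed", "r") as BedFile, \
        open(Path + "FinalFiles.txt", "w") as Out:
    for sLine in BedFile:
        sList = sLine.strip().split("\t")
        GeneName = sList[3]
        if GeneName in Dic2:
            Out.write("\t".join(sList[0:3]) + "\t" + GeneName + "/" + Dic2[GeneName] + "\n")
        else:
            Out.write(sLine)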
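The same best-hit logic can be tried on small in-memory data before running it on the real files. The sketch below is not the script above: it is a simplified variant with hypothetical function names (best_hits, annotate_bed) and made-up gene IDs. It also differs in one detail, which is that on a tie in bit score it keeps the first hit seen rather than the last.

```python
def best_hits(blast_lines):
    """Map each subject gene ID to the query gene ID whose best hit it is."""
    best = {}  # query ID -> (bit score, subject ID)
    for line in blast_lines:
        f = line.rstrip("\n").split("\t")
        query, subject, score = f[0], f[3], float(f[8])
        # Strict ">" keeps the first hit when two hits share a bit score.
        if query not in best or score > best[query][0]:
            best[query] = (score, subject)
    # Cut IDs at "_" to drop the protein suffix, as in the script above.
    return {subj.split("_")[0]: q.split("_")[0] for q, (_, subj) in best.items()}


def annotate_bed(bed_lines, v5_to_v3):
    """Rewrite column 4 as 'V5/V3' for genes that have a match."""
    out = []
    for line in bed_lines:
        f = line.rstrip("\n").split("\t")
        name = f[3]
        if name in v5_to_v3:
            out.append("\t".join(f[:3]) + "\t" + name + "/" + v5_to_v3[name])
        else:
            out.append(line.rstrip("\n"))
    return out


# Hypothetical data: gene IDs, coordinates and scores are illustrative only.
blast = [
    "oldA_P01\t1\t50\tnewX_P001\t1\t50\t100\tMKV\t95.0",
    "oldA_P01\t1\t30\tnewY_P001\t1\t30\t60\tMKV\t40.0",
    "oldB_P01\t1\t20\tnewY_P001\t5\t24\t80\tMKV\t61.2",
]
bed = ["chr1\t100\t900\tnewX", "chr1\t1500\t2400\tnewY", "chr2\t10\t300\tnewZ"]

mapping = best_hits(blast)
for bed_line in annotate_bed(bed, mapping):
    print(bed_line)
```

Tracking only the current best hit per query avoids building a dictionary of every score, which keeps memory use small on large result files.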