1. Blastn

Matches nucleotide sequence data (A, T, G, C) against nucleotide sequence data to find similar sequences.

* Adding -outfmt '6 qseqid qstart qend sseqid sstart send qcovs sseq bitscore' prints the bitscore as the last column of each row. A higher bitscore indicates a more similar alignment, so the results can later be sorted by bitscore.
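
As a minimal sketch of the sorting step mentioned above, the following Python orders rows in this nine-column format by bitscore, highest first. The function name `sort_by_bitscore` and the sample rows (IDs, coordinates, sequences, and scores) are hypothetical and only serve as an illustration.

```python
def sort_by_bitscore(lines):
    """Return tab-separated BLAST rows sorted by bitscore, highest first."""
    rows = [line.rstrip("\n").split("\t") for line in lines if line.strip()]
    # bitscore is the 9th field (index 8) in the -outfmt string used here
    return sorted(rows, key=lambda row: float(row[8]), reverse=True)


# Hypothetical rows: every value below is made up for illustration
example = [
    "q1\t1\t100\tsA\t1\t100\t100\tACGT\t150.0",
    "q1\t1\t90\tsB\t5\t94\t90\tACGA\t180.5",
    "q2\t1\t50\tsC\t1\t50\t100\tTTGA\t92.3",
]

for row in sort_by_bitscore(example):
    # Print query ID, subject ID, and bitscore
    print(row[0], row[3], row[8])
```

To sort a real result file, the lines of an opened file can be passed in place of `example`.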

Commands:
makeblastdb -in Zm-B73-REFERENCE-NAM-5.0_Zm00001eb.1.cdna.fa -dbtype nucl
blastn -query /scratch/3.scATAC_flo/11.Bulk_ATACseq_Comparison/Maizev3/Zea_mays.AGPv3.22.cdna.all.fa -db /scratch/3.scATAC_flo/0.Reference/Zm-B73-REFERENCE-NAM-5.0_Zm00001eb.1.cdna.fa -outfmt '6 qseqid qstart qend sseqid sstart send qcovs sseq bitscore' -num_threads 40 -out /scratch/3.scATAC_flo/11.Bulk_ATACseq_Comparison/BlastResult.txt

Result file:
GRMZM2G356204_T01       1       2342    Zm00001eb210180_T001    90      2431    100     GGTTTTAGCCTCCTCCGATGCAGCCGCCTCGCCGCGCGCTTGTCACATCCCTCCTCCGCCTCCGCTCCTTCTCCTCCATTGCCTATCCTCATCCCTACCCACCCGCGCCCCTGCGACGCCACCAGTTCGTCGCCGACCCCACCACCTCCACAAACCGTGGTATCGTCGGGGGCATCGGCGGCGTCGGTTCCGGGAACGGGAACCTCTTGGACCCGACGCAGCTCCTCCGCGATGACCCGGTGGCAATCACCGCTTCCCTTTGGGTATCCTCCTTCCGCGCCGCCGCCTCCACCTCCAGCTCTTGTACCCCCACACCGCCGCAGCCACTCACCCCCTTCCTCTCTCGCCTGGAGCTGTGGGTGCTCGCCTACCAGAAGGCGTACGCTGACGAGACCGGCTCCTATCTGCCGCGCTCCTCCATCCCGGCCTCCACGCTCGCCTCCCTCCTAACGCTCCGCAACGCCGTACTCGACGGCCGCTTCCGCTTCGGCAACCGCCTCACCCCCATCCTCCAGTCCCCGCGCGCCGCCAACGCGCCGGACCCTGCCACCCTCTCCAAGCGCAAGCTCCGCGCCCTCCTTACAACCCCCGGCCCATCGCCCTTCCAGGACCGC
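
To make the column order of a result row easier to follow, here is a small sketch that pairs each field with its name from the -outfmt string. The helper `parse_row` is hypothetical, and the sample row is shortened: its sequence and bitscore are placeholders, not real output.

```python
# Field names in the order given to -outfmt
FIELDS = "qseqid qstart qend sseqid sstart send qcovs sseq bitscore".split()


def parse_row(line):
    """Turn one tab-separated result row into a dict keyed by field name."""
    values = line.rstrip("\n").split("\t")
    if len(values) != len(FIELDS):
        raise ValueError("expected %d columns, got %d" % (len(FIELDS), len(values)))
    record = dict(zip(FIELDS, values))
    # Coordinates and query coverage are integers; bitscore may be fractional
    for key in ("qstart", "qend", "sstart", "send", "qcovs"):
        record[key] = int(record[key])
    record["bitscore"] = float(record["bitscore"])
    return record


# Placeholder sequence and bitscore; IDs and coordinates follow the row above
sample = "GRMZM2G356204_T01\t1\t2342\tZm00001eb210180_T001\t90\t2431\t100\tGGTTTTAGCC\t4000"
record = parse_row(sample)
print(record["qseqid"], record["sseqid"], record["qcovs"], record["bitscore"])
```

The column-count check makes a truncated row fail loudly instead of producing a wrong bitscore.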
 

2. Blastp
Matches protein sequence data against protein sequence data to find similar sequences.

Commands:
makeblastdb -in ./11.Bulk_ATACseq_Comparison/Maizev5/Zm-B73-REFERENCE-NAM-5.0_Zm00001eb.1.protein.fa -dbtype prot
blastp -query ./11.Bulk_ATACseq_Comparison/Maizev3/Zea_mays.AGPv3.22.pep.all.fa -db ./11.Bulk_ATACseq_Comparison/Maizev5/Zm-B73-REFERENCE-NAM-5.0_Zm00001eb.1.protein.fa -outfmt '6 qseqid qstart qend sseqid sstart send qcovs sseq bitscore' -num_threads 40 -out /scratch/sb14489/3.scATAC_flo/11.Bulk_ATACseq_Comparison/BlastPResult.txt
Result file:
GRMZM2G356204_P01       1       740     Zm00001eb210180_P001    1       740     100     MQPPRRALVTSLLRLRSFSSIAYPHPYPPAPLRRHQFVADPTTSTNRGIVGGIGGVGSGNGNLLDPTQLLRDDPVAITASLWVSSFRAAASTSSSCTPTPPQPLTPFLSRLELWVLAYQKAYADETGSYLPRSSIPASTLASLLTLRNAVLDGRFRFGNRLTPILQSPRAANAPDPATLSKRKLRALLTTPGPSPFQDRVV


3. Python code for 1:1 matching of the result file
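# Path was not defined in the original snippet; this value is assumed from the blastp -out path above
Path = "/scratch/sb14489/3.scATAC_flo/11.Bulk_ATACseq_Comparison/"

# For each V3 query, collect hits keyed by bitscore (9th column, index 8)
Dic = {}
infile = open(Path+"BlastPResult.txt","r")
for sLine in infile:
    sList = sLine.strip().split("\t")
    BitScore = float(sList[8])
    Q = sList[0]   # V3 protein ID (qseqid)
    T = sList[3]   # V5 protein ID (sseqid)
    Dic.setdefault(Q,{})
    Dic[Q][BitScore] = T   # on a tied bitscore, the later hit replaces the earlier one
infile.close()

# Keep the highest-bitscore V5 hit for each V3 query; gene IDs drop the "_P01"-style suffix
outfile = open(Path+"OneToOne_V3andV5.txt","w")
Dic2 = {}
for V3Gene in Dic:
    max_value = max(Dic[V3Gene].keys())
    V5Gene = Dic[V3Gene][max_value]
    Dic2[V5Gene.split("_")[0]] = V3Gene.split("_")[0]
    # The original opened this file but never wrote to it; the V3<TAB>V5 layout is an assumption
    outfile.write(V3Gene.split("_")[0]+"\t"+V5Gene.split("_")[0]+"\n")
outfile.close()

# Append the matched V3 gene name to the name column of the V5 gene BED file
Out = open(Path+"FinalFiles.txt","w")
BedFile = open("Zm-B73-REFERENCE-NAM-5.0_Zm00001eb.1_OnlyGene.bed","r")
for sLine in BedFile:
    sList = sLine.strip().split("\t")
    GeneName = sList[3]
    if GeneName in Dic2:
        Out.write("\t".join(sList[0:3])+"\t"+GeneName+"/"+Dic2[GeneName]+"\n")
    else:
        Out.write(sLine)
Out.close()
BedFile.close()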
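
One caveat with the script above: several V3 genes can share the same V5 gene as their best hit, and in that case Dic2 keeps only the last V3 gene seen. The sketch below lists such V5 genes so they can be checked by hand. The helper `find_shared_targets` and the gene IDs in the example are hypothetical.

```python
from collections import defaultdict


def find_shared_targets(best_hits):
    """best_hits maps a V3 gene to its best-hit V5 gene.

    Returns {V5 gene: sorted list of V3 genes} for every V5 gene
    that is the best hit of two or more V3 genes.
    """
    by_target = defaultdict(list)
    for v3_gene, v5_gene in best_hits.items():
        by_target[v5_gene].append(v3_gene)
    return {v5: sorted(v3s) for v5, v3s in by_target.items() if len(v3s) > 1}


# Made-up gene IDs for illustration only
best_hits = {
    "V3geneA": "V5gene1",
    "V3geneB": "V5gene1",
    "V3geneC": "V5gene2",
}
print(find_shared_targets(best_hits))
```

An empty result would mean every V5 gene is claimed by at most one V3 gene, so the mapping is one-to-one in that direction.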