Hi Ricardo,
Since you already know the basic procedure, I have not included explanatory images or instructions for uploading the files; I will just list the steps. If you have any further questions, call or email me and I will explain the details personally.
STEP 1: MATERIALS TO DOWNLOAD
First of all, download the hg38 resource bundle and place it in the folder where you are going to do the analysis.
Here you have the link.
Basically, you have to do the following:
To access the bundle on the FTP server, use the following login credentials in your favorite FTP client (for instance FileZilla).
Or you can simply open this link in your browser:
ftp://gsapubftp-anonymous@ftp.broadinstitute.org/bundle/
If you are asked for a password, leave it blank.
The bundle directory contains five subdirectories, one for each build of the human genome that we have resources for: b36, b37, hg18, hg19 and hg38 (aka GRCh38). Be aware that the hg38 resource set is provided as-is, and its contents may still be incomplete.
Go to the hg38 directory and download everything except the beta folder. Note that some of the files are compressed.
Then you are ready for the analysis.
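If you prefer to script the download instead of clicking through an FTP client, here is a minimal sketch in Python. The host and user name come from the link above; the remote path `bundle/hg38` and the idea that everything except `beta` is a plain file are assumptions on my side, and the server may not be reachable from every network.

```python
import os
from ftplib import FTP

HOST = "ftp.broadinstitute.org"   # host taken from the link in this email
USER = "gsapubftp-anonymous"      # user name taken from the link in this email
REMOTE_DIR = "bundle/hg38"        # assumed path of the hg38 folder on the server


def files_to_download(names):
    """Keep every listed entry except the beta folder."""
    return [n for n in names if n.rstrip("/").split("/")[-1] != "beta"]


def download_bundle(dest_dir="."):
    """Download the hg38 bundle into dest_dir (needs network access).

    Assumes every entry other than 'beta' is a regular file.
    """
    with FTP(HOST) as ftp:
        ftp.login(user=USER, passwd="")   # blank password
        ftp.cwd(REMOTE_DIR)
        for name in files_to_download(ftp.nlst()):
            with open(os.path.join(dest_dir, name), "wb") as out:
                ftp.retrbinary("RETR " + name, out.write)
```

Calling `download_bundle("my_analysis_folder")` would start the transfer; nothing is downloaded just by defining the functions.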
STEP 2: QUALITY CONTROL AND PREPROCESSING
You already know how to do this with the interfaces for FastQC (quality analysis) and for PRINSEQ and Cutadapt (preprocessing). It works as before.
STEP 3: MAPPING
If you want to do it with Bowtie, proceed as in previous cases, but use the reference genome you downloaded from the bundle:
Homo_sapiens_assembly38.fasta.gz
Homo_sapiens_assembly38.fasta.fai
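The reference comes compressed (.gz). In case the mapper or a later tool needs it uncompressed, this is a small sketch of how to decompress it with the Python standard library. The helper name `gunzip` is just for illustration.

```python
import gzip
import shutil


def gunzip(src, dest=None):
    """Decompress a .gz file and return the path of the decompressed copy."""
    if dest is None:
        # Drop the ".gz" suffix, e.g. genome.fasta.gz -> genome.fasta
        dest = src[:-3] if src.endswith(".gz") else src + ".out"
    with gzip.open(src, "rb") as fin, open(dest, "wb") as fout:
        shutil.copyfileobj(fin, fout)   # streams the data, so large files are fine
    return dest
```

For example, `gunzip("Homo_sapiens_assembly38.fasta.gz")` would write `Homo_sapiens_assembly38.fasta` in the same folder and leave the compressed file in place.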
POST PROCESSING STEPS
STEP 4: MARK DUPLICATES
Follow this path in VariantSeq:
Postprocessing -> Picard Tools -> mark duplicates
As an option, set create index = true.
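For reference, this is roughly what the step looks like if you run Picard by hand. It is only a sketch: the file names and the location of `picard.jar` are made up, and VariantSeq may build the command differently internally.

```python
def mark_duplicates_cmd(in_bam, out_bam, metrics_txt, picard_jar="picard.jar"):
    """Build a Picard MarkDuplicates command as an argument list."""
    return [
        "java", "-jar", picard_jar, "MarkDuplicates",
        "INPUT=" + in_bam,
        "OUTPUT=" + out_bam,
        "METRICS_FILE=" + metrics_txt,
        "CREATE_INDEX=true",   # counterpart of "create index = true" in the interface
    ]


# Hypothetical file names, for illustration only
cmd = mark_duplicates_cmd("sample.bam", "sample.dedup.bam", "sample.metrics.txt")
print(" ".join(cmd))
```

The list can be passed to `subprocess.run(cmd, check=True)` if you want to launch it from Python.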
STEP 5: REALIGN AROUND INDELS
Follow this path in VariantSeq:
Postprocessing -> GATK Tools -> Indel Local Realigment
In the “known sites files” field, enter these two VCF files from the bundle:
Mills_and_1000G_gold_standard.indels.hg38.vcf
1000G_phase1.snps.high_confidence.hg38.vcf
You do not need to set any options; let the tool run with its defaults.
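If you ever need to reproduce this step outside the interface, it would look more or less like the sketch here. I am assuming GATK 3 (indel realignment does not exist in GATK 4), where the step is done in two passes: RealignerTargetCreator and then IndelRealigner. The jar path and the file names are hypothetical.

```python
GATK = ["java", "-jar", "GenomeAnalysisTK.jar"]   # assumed location of the GATK 3 jar


def realign_cmds(ref, in_bam, known_vcfs, out_bam, targets="realign.intervals"):
    """Return the two GATK 3 commands used for local realignment around indels."""
    # One "-known <file>" pair per known-sites VCF
    known = [arg for vcf in known_vcfs for arg in ("-known", vcf)]
    find_targets = (GATK + ["-T", "RealignerTargetCreator", "-R", ref, "-I", in_bam]
                    + known + ["-o", targets])
    realign = (GATK + ["-T", "IndelRealigner", "-R", ref, "-I", in_bam,
                       "-targetIntervals", targets] + known + ["-o", out_bam])
    return find_targets, realign


find_targets, realign = realign_cmds(
    "Homo_sapiens_assembly38.fasta",
    "sample.dedup.bam",                     # hypothetical input
    ["Mills_and_1000G_gold_standard.indels.hg38.vcf",
     "1000G_phase1.snps.high_confidence.hg38.vcf"],
    "sample.realigned.bam",                 # hypothetical output
)
```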
STEP 6: BQSR
The same as in previous cases.
Path in VariantSeq:
Postprocessing -> GATK Tools -> BQSR
In the “Known sites files” field, you can enter these three VCF files from the bundle:
dbsnp_146.hg38.vcf
1000G_phase1.snps.high_confidence.hg38.vcf
hapmap_3.3.hg38.vcf.gz
Again, you do not need to set any options; let the tool run with its defaults.
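As a reference, base quality score recalibration by hand would be something like this sketch. Again I am assuming GATK 3 syntax (BaseRecalibrator followed by PrintReads); the jar path and BAM names are hypothetical and VariantSeq may use other defaults.

```python
GATK = ["java", "-jar", "GenomeAnalysisTK.jar"]   # assumed location of the GATK 3 jar


def bqsr_cmds(ref, in_bam, known_vcfs, out_bam, table="recal.table"):
    """Return the two GATK 3 commands for base quality score recalibration."""
    # One "-knownSites <file>" pair per known-sites VCF
    known = [arg for vcf in known_vcfs for arg in ("-knownSites", vcf)]
    build_table = (GATK + ["-T", "BaseRecalibrator", "-R", ref, "-I", in_bam]
                   + known + ["-o", table])
    apply_table = GATK + ["-T", "PrintReads", "-R", ref, "-I", in_bam,
                          "-BQSR", table, "-o", out_bam]
    return build_table, apply_table


build_table, apply_table = bqsr_cmds(
    "Homo_sapiens_assembly38.fasta",
    "sample.realigned.bam",                 # hypothetical input
    ["dbsnp_146.hg38.vcf",
     "1000G_phase1.snps.high_confidence.hg38.vcf",
     "hapmap_3.3.hg38.vcf.gz"],
    "sample.recal.bam",                     # hypothetical output
)
```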
CALLING STEPS
STEP 7: VARIANT CALLING
Path in VariantSeq:
Variant Calling -> Mutect2
Same as always, but read this web page first for details.
As you have neither tumor-normal pairs nor a panel of normals, select Tumor only mode.
In options, enter the bundle file wgs_calling_regions.hg38.interval_list (-L intervals).
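To make the -L option concrete, here is a sketch of a tumor-only call written by hand. I am assuming GATK 3 syntax (`-T MuTect2`, `-I:tumor`, `-o`); in GATK 4 the call is written differently (`gatk Mutect2 ... -O`), so check which version your installation uses. The BAM and output names are hypothetical.

```python
def mutect2_tumor_only_cmd(ref, tumor_bam, intervals, out_vcf,
                           gatk_jar="GenomeAnalysisTK.jar"):
    """Build a GATK 3 style MuTect2 command with no normal sample and no panel of normals."""
    return [
        "java", "-jar", gatk_jar, "-T", "MuTect2",
        "-R", ref,
        "-I:tumor", tumor_bam,   # only a tumor sample is given
        "-L", intervals,         # restrict calling to the bundle intervals
        "-o", out_vcf,
    ]


cmd = mutect2_tumor_only_cmd(
    "Homo_sapiens_assembly38.fasta",
    "sample.recal.bam",                        # hypothetical input
    "wgs_calling_regions.hg38.interval_list",
    "sample.mutect2.vcf",                      # hypothetical output
)
```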
STEP 8: RECALIBRATION OF VARIANTS BY QUALITY VALUE (VQSR)
Path in VariantSeq:
Variant Filtering -> Variant Quality Score Recalibration
There are three types of resources: training sites, truth sites and known sites. Enter each of the files below in the corresponding field of the interface:
dbsnp_146.hg38.vcf
1000G_phase1.snps.high_confidence.hg38.vcf
hapmap_3.3.hg38.vcf.gz
Mills_and_1000G_gold_standard.indels.hg38.vcf
1000G_omni2.5.hg38.vcf
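In case it helps to decide which file goes in which field, this sketch writes down one possible assignment. It follows the way the GATK documentation usually labels these resources, as far as I remember it, so treat it as an assumption and check it against the documentation for your GATK version. In particular, the Omni file is marked as truth in older GATK guidance and only as training in newer guidance.

```python
# Assumed role of each bundle file in VQSR (not confirmed for your GATK version)
RESOURCES = {
    "hapmap_3.3.hg38.vcf.gz": {"training", "truth"},
    "1000G_omni2.5.hg38.vcf": {"training", "truth"},   # truth only in older guidance
    "1000G_phase1.snps.high_confidence.hg38.vcf": {"training"},
    "dbsnp_146.hg38.vcf": {"known"},
    "Mills_and_1000G_gold_standard.indels.hg38.vcf": {"training", "truth"},  # indels
}


def files_for(role):
    """List the files to enter in the 'training', 'truth' or 'known' field."""
    return sorted(name for name, roles in RESOURCES.items() if role in roles)


for role in ("training", "truth", "known"):
    print(role, "->", ", ".join(files_for(role)))
```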
STEP 9: ANNOTATION OF VARIANTS
Path in VariantSeq:
Annotation -> VEP
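For completeness, a minimal sketch of what a VEP run looks like on the command line. It assumes a local VEP installation with a GRCh38 cache already downloaded; the file names are hypothetical, and the options VariantSeq passes may be different.

```python
def vep_cmd(in_vcf, out_vcf):
    """Build a basic VEP command that writes annotated variants as VCF."""
    return [
        "vep",
        "-i", in_vcf,
        "-o", out_vcf,
        "--cache",                 # use the locally installed annotation cache
        "--assembly", "GRCh38",    # same genome build as the hg38 bundle
        "--vcf",                   # write the output in VCF format
    ]


cmd = vep_cmd("sample.filtered.vcf", "sample.annotated.vcf")   # hypothetical names
print(" ".join(cmd))
```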