Category: Customer support

NGS

Check genomic DNA

DNA extracted from complex matrices is often hard to quantify because of contaminants such as organic compounds. Moreover, DNA from many matrices is difficult to amplify because of its low concentration, the presence of genomic DNA of different origins, or inhibiting compounds.

Save time and money! Check your DNA before the shipment.

We suggest using the following protocol, with a highly purified Taq Platinum HiFi (Invitrogen) free of E. coli DNA contaminants, and tailed primers.

You can use a different Taq polymerase, but it should be high-fidelity (HiFi) and free of bacterial DNA. You can also use non-tailed primers, but keep in mind that they amplify more efficiently than tailed ones.

Tailed primers for 16S rRNA V3-V4 (Takahashi et al. 2014):

Pro341F: 5′-TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGCCTACGGGNBGCASCAG-3′
Pro805R: 5′-GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGGACTACNVGGGTATCTAATCC-3′
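Both 16S primers contain degenerate positions (N, B, S, V), so each one is really a pool of sequences. The sketch below is only an illustration: it expands the locus-specific part of each primer (the part after the tail) using the standard IUPAC codes. The function name expand_degenerate is hypothetical and not part of our protocol.

```python
from itertools import product

# IUPAC nucleotide codes and the bases each one stands for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def expand_degenerate(primer):
    """Return every unambiguous sequence encoded by a degenerate primer."""
    choices = [IUPAC[base] for base in primer.upper()]
    return ["".join(combo) for combo in product(*choices)]


# Locus-specific parts of the 16S primers listed above (tails removed).
PRO341F = "CCTACGGGNBGCASCAG"
PRO805R = "GACTACNVGGGTATCTAATCC"

for name, seq in (("Pro341F", PRO341F), ("Pro805R", PRO805R)):
    variants = expand_degenerate(seq)
    print(f"{name}: {len(variants)} variants, first one {variants[0]}")
```

Pro341F expands to 24 sequences (4 x 3 x 2) and Pro805R to 12 (4 x 3).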

Tailed primers for ITS2 region (White et al. 1990):

ITS3f: 5′-TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGGCATCGATGAAGAACGCAGC-3′
ITS4r: 5′-GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGTCCTCCGCTTATTGATATGC-3′
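A tailed primer is the locus-specific primer with an extra sequence (the tail) added at its 5′ end. As a minimal sketch, the snippet below splits the tailed ITS2 primers into tail and locus-specific part, using the untailed ITS3f/ITS4r sequences given in the "Universal ITS primers" section. The helper name split_tail is hypothetical.

```python
# Tailed ITS2 primers exactly as listed above.
TAILED = {
    "ITS3f": "TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGGCATCGATGAAGAACGCAGC",
    "ITS4r": "GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGTCCTCCGCTTATTGATATGC",
}

# Locus-specific (untailed) ITS2 primers from White et al. (1990).
UNTAILED = {
    "ITS3f": "GCATCGATGAAGAACGCAGC",
    "ITS4r": "TCCTCCGCTTATTGATATGC",
}


def split_tail(tailed, untailed):
    """Return (tail, locus_specific) for a tailed primer."""
    if not tailed.endswith(untailed):
        raise ValueError("tailed primer does not end with the untailed sequence")
    return tailed[: -len(untailed)], untailed


for name in TAILED:
    tail, core = split_tail(TAILED[name], UNTAILED[name])
    print(f"{name}: tail {tail} ({len(tail)} nt) + primer {core} ({len(core)} nt)")
```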

MIX                         Volume (µl)
Buffer                      2.5
MgSO4 (50 mM)               1
dNTPs mix (10 mM)           0.5
Primer forward (10 µM)      1
Primer reverse (10 µM)      1
Taq                         0.2
DNA-free water              13.8
Genomic DNA (3-10 ng/µl)    5
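When several samples are checked at once, it is convenient to prepare a master mix without the template. The following is a minimal sketch that scales the per-reaction volumes listed above; the 10% pipetting excess and the function name master_mix are assumptions, not part of our protocol.

```python
# Per-reaction volumes (µl) of every component except the template.
PER_REACTION_UL = {
    "Buffer": 2.5,
    "MgSO4 (50 mM)": 1.0,
    "dNTPs mix (10 mM)": 0.5,
    "Primer forward (10 uM)": 1.0,
    "Primer reverse (10 uM)": 1.0,
    "Taq": 0.2,
    "DNA-free water": 13.8,
}
TEMPLATE_UL = 5.0  # genomic DNA, added to each tube separately


def master_mix(n_reactions, excess=0.10):
    """Scale the mix for n reactions plus a pipetting excess (assumed 10%)."""
    factor = n_reactions * (1 + excess)
    return {name: round(vol * factor, 2) for name, vol in PER_REACTION_UL.items()}


reaction_volume = sum(PER_REACTION_UL.values()) + TEMPLATE_UL
print(f"Final reaction volume: {reaction_volume:.1f} µl")
for name, vol in master_mix(8).items():
    print(f"{name:<24}{vol:>7.2f} µl")
```

Each tube then receives 20 µl of master mix and 5 µl of genomic DNA, for a final volume of 25 µl.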

The PCR cycle is the following:

1 cycle:     94°C for 1 min;
25 cycles:   94°C for 30 s, 55°C for 30 s, 68°C for 45 s;
1 cycle:     68°C for 7 min.

Load 5 µl of the resulting PCR product on a 1.5% agarose gel: the lowest accepted band concentration is 5 ng/µl.

[Figure PCR1_ok: example gel with the PCR bands and the primers labelled]

ATTENTION!!!

Using different PCR conditions could give you results that are not reproducible in our lab.

We suggest testing different DNA concentrations and sending us the sample that gives the best results.

 


NGS

Universal ITS primers

To identify members of the kingdom Fungi, we use the following primers from White et al. (1990):

ITS3f – GCATCGATGAAGAACGCAGC

ITS4r – TCCTCCGCTTATTGATATGC

These primers amplify the ITS2 region, which is about 300 bp long.

 

Note:
we also provide the primers for the ITS1 region from White et al. (1990), but this is not part of our standard service:

ITS1f – TCCGTAGGTGAACCTGCGG

ITS2r – GCTGCGTTCTTCATCGATGC


NGS

De novo bacterial genome: bioinformatic output

Our standard pipeline for genome analysis performs the following steps:

  1. Quality check and filtering of the raw data (FASTQ format)
  2. De novo assembly of the reads, producing the contigs (in FASTA format)
  3. Gene prediction: identification of putative coding regions (ORFs) and putative non-coding genes (rRNAs, tRNAs…), producing a tabular output in GFF format
  4. Automatic annotation of the predicted genes, delivered in GBK (GenBank), GFF and FASTA format.



NGS

Downloading data from Cloud Storage

MEGA.nz is an encrypted storage provider we use to deliver our NGS data to customers. At the end of the sequencing project, you’ll receive the download link to access your data directly from the web browser.

Before clicking the link, first install the free plug-in available for Mozilla Firefox and Google Chrome, the only browsers fully supported by the system (click on the appropriate link to download the plug-in).

Once the plug-in is installed, enter the link in the address bar of Firefox or Chrome to view the data available for download. The interface is user-friendly and allows you to select all or part of the data to download.

For an overview of the process, watch this video.

 


NGS

16S Standard Analysis output

In the standard BMR pipeline, R1 and R2 raw reads are merged with FLASH v.1.2.11 and quality filtered (Q>30). OTUs are picked at 97% similarity with the reference-based UCLUST algorithm, identified against the Greengenes database v.13_8, collected in the .biom file, and filtered at 0.005% abundance (Bokulich NA, et al., 2013) to eliminate spurious low-frequency OTUs.
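To make the 0.005% abundance filter concrete, here is a minimal sketch on a toy OTU table. It assumes the threshold is applied to the total count of each OTU across all samples, relative to the total number of reads in the table; the function name, OTU names and counts are hypothetical, and this is not the pipeline's actual code.

```python
def filter_low_abundance(otu_table, min_fraction=0.00005):
    """Drop OTUs whose total count is below min_fraction of all reads.

    otu_table maps an OTU id to its list of per-sample counts.
    min_fraction=0.00005 corresponds to the 0.005% threshold.
    """
    total_reads = sum(sum(counts) for counts in otu_table.values())
    cutoff = total_reads * min_fraction
    return {
        otu: counts
        for otu, counts in otu_table.items()
        if sum(counts) >= cutoff
    }


# Toy table with 100,000 reads in total, so the cutoff is 5 reads.
toy_table = {
    "OTU_1": [60000, 39000],
    "OTU_2": [500, 497],
    "OTU_3": [2, 1],  # 3 reads in total: removed as spurious
}

kept = filter_low_abundance(toy_table)
print(sorted(kept))
```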

We deliver an archive via the cloud storage provider Mega. The most important file in the folder is an .html file that opens an interactive web page with a sunburst chart, per-sample pie charts, and links to download the taxonomy table and the .fastq reads. An example of the output is available here.


NGS

16S/ITS Basic Offer output

We deliver raw reads in FASTQ format, using the cloud storage provider Mega. Raw reads are compressed in .gz format and divided into R1 and R2 archives (forward and reverse reads). Example:

ID12345_S53_L001_R1_001.fastq.gz
ID12345_S53_L001_R2_001.fastq.gz
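If you receive many samples, the forward and reverse files can be paired by name. The sketch below assumes every file follows the same pattern as the example above (sample name, S number, lane, R1 or R2, 001); the function name pair_reads is hypothetical.

```python
import re
from collections import defaultdict

# Pattern matching names such as ID12345_S53_L001_R1_001.fastq.gz
PATTERN = re.compile(
    r"^(?P<sample>.+)_S(?P<index>\d+)_L(?P<lane>\d{3})_R(?P<read>[12])_001\.fastq\.gz$"
)


def pair_reads(filenames):
    """Group R1/R2 file names by (sample, lane); unmatched names are skipped."""
    pairs = defaultdict(dict)
    for name in filenames:
        match = PATTERN.match(name)
        if match is None:
            continue
        key = (match["sample"], match["lane"])
        pairs[key]["R" + match["read"]] = name
    return dict(pairs)


files = [
    "ID12345_S53_L001_R1_001.fastq.gz",
    "ID12345_S53_L001_R2_001.fastq.gz",
]
for (sample, lane), reads in pair_reads(files).items():
    print(sample, lane, reads["R1"], reads["R2"])
```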

An example of one raw read taken from the ID12345_S53_L001_R1_001.fastq.gz archive (decompressed) is provided in the following lines.

@M02007:29:000000000-AGKFV:1:1101:12161:1439 1:N:0:53
 TCCTACGGGAGGCAGCAGTGGGGAATCTTGTGCAATGGGGGCAACCCTGCCACAGCGCCGCCGCGTGTGTGACTCCTTCCCTCTGTTCTTTCACCTCTTTCATTCTTTCATCCACCCTCTTCCCCAACCCCTTTCTCTATTTACCTTCCCTCCAGAGGACCCCCCCCCCCCCTCCGTCCCCGCCGCCCCTCTCATACTTCTTCTCCTTCCGTTATTCTGTTTCTCTTCTCTTTCAGCGCTCCTTGGCGCCCCCTCCAGTCAGATCTTTCCGTCCTCGCCTCTCCCCGGTCCTTTTCTCCTT
 +
 BCCC@F:C7FG+FFEGGFGFGDGGCEFEGCF,ED9FEGC,+8+,:CBE,,:,C,,C++8+6@+8+8+8,C,,,,,,,,9:49B,,,<<,<,,,:5,95,95,,,,,,,,,,,,,,,4++8,5<?=,+8++++,,78,7,88,,<,,9,8,33>6,,,+,++5+35***4**43<=*3*3;**1**/**12*0*+030**+*+*+*2+++0*02*).0+*++0+++0**+1**+)+)+((*.(.+*(()()((*(,(0(/*).))())))*0)7304,*,()((,0)()))(0*))).4)*/
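Each read takes four lines: the header, the sequence, a "+" separator and the per-base quality string. As a minimal sketch, the code below reads such records and converts the quality characters to Phred scores, assuming the usual Phred+33 encoding. The function names are hypothetical, and the demo uses a short made-up record rather than the read above.

```python
import gzip
import io


def read_fastq(handle):
    """Yield (header, sequence, quality) from a four-line-per-record FASTQ."""
    while True:
        header = handle.readline().strip()
        if not header:
            return
        sequence = handle.readline().strip()
        handle.readline()  # '+' separator line
        quality = handle.readline().strip()
        yield header, sequence, quality


def mean_quality(quality, offset=33):
    """Average Phred score of a quality string (Phred+33 assumed)."""
    scores = [ord(char) - offset for char in quality]
    return sum(scores) / len(scores)


def mean_qualities_from_gz(path):
    """Mean quality of every read in a .fastq.gz file (not called in the demo)."""
    with gzip.open(path, "rt") as handle:
        return [mean_quality(qual) for _, _, qual in read_fastq(handle)]


# Made-up record: 'I' = Q40, '5' = Q20, '+' = Q10.
demo = io.StringIO("@read1\nACGT\n+\nII5+\n")
records = list(read_fastq(demo))
print(records[0][0], mean_quality(records[0][2]))
```

For a delivered file, mean_qualities_from_gz("ID12345_S53_L001_R1_001.fastq.gz") would return one value per read.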

NGS

16S Advanced Analysis output

Overview

The Advanced pipeline uses the same pre-processing steps and OTU identification as the Qiime-based closed-reference analysis. Qiime 1.9.1 then performs the alpha- and beta-diversity analyses, building rarefaction plots, distance matrices, UPGMA trees and 3D PCoA plots, and providing statistical comparisons of the experimental groups. A partial example of the output is available here.

Methods

Advanced analysis is performed using the Qiime core_diversity_analyses.py script with default parameters.

Multivariate analysis, including the ANOSIM and Adonis tests, is done by running Qiime compare_categories.py. ANOSIM is a nonparametric statistical test that compares ranked beta-diversity distances between different groups and calculates a test statistic (R) and a p-value by permutation. Adonis is instead a permutational, ANOVA-like multivariate test that verifies whether the variation within a category is smaller or greater than the variation between categories. It calculates a pseudo F-value, a p-value, and an effect size (R^2). Significant p-values must be interpreted together with their R^2 values to infer biological meaning from the results. Analyses are performed among the required groups and on two distance metrics: weighted and unweighted UniFrac.
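To illustrate what ANOSIM measures, the following is a minimal pure-Python sketch of the R statistic and its permutation p-value on a toy distance matrix. It is not the Qiime implementation; the function names, the 999 permutations, the seed and the toy data are all assumptions.

```python
import random
from itertools import combinations


def average_ranks(values):
    """Rank values from 1 to n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def anosim_r(dist, groups):
    """R = (mean between-group rank - mean within-group rank) / (M / 2)."""
    pairs = list(combinations(range(len(groups)), 2))
    ranks = average_ranks([dist[i][j] for i, j in pairs])
    within = [r for r, (i, j) in zip(ranks, pairs) if groups[i] == groups[j]]
    between = [r for r, (i, j) in zip(ranks, pairs) if groups[i] != groups[j]]
    mean_within = sum(within) / len(within)
    mean_between = sum(between) / len(between)
    return (mean_between - mean_within) / (len(pairs) / 2)


def anosim(dist, groups, permutations=999, seed=0):
    """Return (R, p-value), shuffling the group labels to build the null."""
    rng = random.Random(seed)
    observed = anosim_r(dist, groups)
    shuffled = list(groups)
    hits = 0
    for _ in range(permutations):
        rng.shuffle(shuffled)
        if anosim_r(dist, shuffled) >= observed:
            hits += 1
    return observed, (hits + 1) / (permutations + 1)


# Toy data: six samples on a line, two well-separated groups.
positions = [0, 1, 2, 10, 11, 12]
labels = ["A", "A", "A", "B", "B", "B"]
toy_dist = [[abs(a - b) for b in positions] for a in positions]

r_stat, p_value = anosim(toy_dist, labels)
print(f"R = {r_stat:.2f}, p = {p_value:.3f}")
```

With perfectly separated groups, as in this toy example, R equals 1; values close to 0 indicate that samples are no more similar within groups than between them.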

metagenomeSeq is used to calculate the differential abundance of species between experimental groups. It is an R package, also implemented in Qiime 1.9.1, that performs:

  • OTU table normalization
  • Fisher’s test to evaluate the significance of the comparison
  • Calculation of the level of OTU-category correlation and of its significance, with p-value correction by False Discovery Rate (FDR).
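As an illustration of FDR correction, the sketch below implements the Benjamini-Hochberg procedure, the most common way to compute FDR-adjusted p-values. Whether the pipeline uses exactly this variant is an assumption, and the p-values are made up.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values, in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotonic.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


raw = [0.01, 0.04, 0.03, 0.005]
for p, q in zip(raw, benjamini_hochberg(raw)):
    print(f"p = {p:.3f} -> FDR = {q:.3f}")
```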

Requirements

To perform the Advanced Analysis, we need to know the experimental parameter used to group the samples, so that the groups can be compared statistically. The Treatment column should provide at least 2 groups containing at least 5 samples. One table (mapping file) counts as a single Advanced Analysis.

The following table is an example of what the customer should provide us by e-mail before sample shipment.

#SampleID  BarcodeSequence  LinkerPrimerSequence  Treatment  Reverseprimer  Description
88078                                             CTRL                      MicrobIT9
88085                                             HI10                      MicrobIT19
88092                                             HI20                      MicrobIT7
88098                                             HI30                      MicrobIT1

Using this mapping file, a pairwise analysis is performed among samples of the four groups: CTRL (88078-88082), HI10 (88083-88088), HI20 (88089-88094) and HI30 (88095-88098).
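Before sending the mapping file, you may want to check the group sizes yourself. The sketch below assumes a tab-separated file with a Treatment column and reads the requirement as "at least 2 groups with at least 5 samples each"; the function name and the sample IDs are hypothetical.

```python
import csv
import io
from collections import Counter


def check_mapping(text, column="Treatment", min_groups=2, min_samples=5):
    """Return (ok, group_sizes) for the text of a tab-separated mapping file."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    sizes = Counter(row[column] for row in reader if row[column])
    large_enough = [group for group, n in sizes.items() if n >= min_samples]
    return len(large_enough) >= min_groups, dict(sizes)


# Hypothetical mapping file: 5 control and 5 treated samples.
rows = [(f"S{i}", "CTRL" if i <= 5 else "TREATED") for i in range(1, 11)]
mapping_text = "#SampleID\tTreatment\n" + "\n".join(
    f"{sample}\t{group}" for sample, group in rows
)

ok, sizes = check_mapping(mapping_text)
print("Mapping file accepted:", ok, sizes)
```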

More information about Qiime 1.9.1 beta-diversity analysis is available here.

 


NGS

RNA-Seq standard analysis and example output

Differentially expressed genes

The standard RNA-Seq service is designed to identify Differentially Expressed (DE) genes and is performed on sets of biological replicates (minimum 3 per condition, 5 or more recommended). To perform the bioinformatic analysis we need:
  • A reference genome and annotation (see below)
  • A list of experimental conditions (e.g. “control” and “treated”) and the samples belonging to each.
For each sequenced sample we perform:
  1. FastQC, adaptor trimming: with this step, we evaluate the average quality of reads at each base position.
  2. Alignment against the reference genome: with this step, we map reads with HISAT2 to the reference genome, downloaded from Ensembl, producing a BAM file for each sample.
  3. Raw read counts: we count the number of reads mapped within each feature (annotation of the reference, from the same Ensembl list) using featureCounts or similar software.
  4. Collection of raw counts in a matrix
  5. Differential Expression Analysis: in this step, we use the edgeR package to evaluate whether a gene is significantly over- or under-expressed.
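Step 4 can be pictured as follows. This is a minimal sketch, not our pipeline code: it merges per-sample counts into one matrix with a row per gene and a column per sample. The function name is hypothetical, and the values are taken from the first two samples of the example matrix further down.

```python
def build_count_matrix(per_sample_counts):
    """Merge {sample: {gene: count}} into (sample_names, rows).

    Each row is [gene, count_in_sample_1, count_in_sample_2, ...];
    genes missing from a sample get a count of 0.
    """
    samples = sorted(per_sample_counts)
    genes = sorted({gene for counts in per_sample_counts.values() for gene in counts})
    rows = [
        [gene] + [per_sample_counts[sample].get(gene, 0) for sample in samples]
        for gene in genes
    ]
    return samples, rows


counts = {
    "ID1039508": {"ENSG00000000003": 679, "ENSG00000000419": 467},
    "ID1039509": {"ENSG00000000003": 448, "ENSG00000000419": 515},
}

samples, rows = build_count_matrix(counts)
print("\t".join(samples))
for row in rows:
    print("\t".join(str(value) for value in row))
```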

What can I choose to receive?

  • Raw data only – You’ll receive the sequencing output, i.e. the demultiplexed FASTQ files
  • Alignments and counts – Raw data plus steps 1-3 from the previous list
  • Differentially Expressed Genes – Raw data plus steps 1-5 from the previous list

What kind of files will I receive?

  • Sequencing data – Sequences produced with the Illumina NextSeq 500, in FASTQ format, usually compressed with GZip (included in all packages). These files are usually large, from a few to several gigabytes each. They are the raw output of the experiment and it’s very important to archive them safely. We highly recommend that you back them up, which means keeping two independent copies of the data. You can download an example from the ENCODE project: Reads in FASTQ format (8.0 GB)
  • Alignments – The output of the sequence alignment step is given in BAM format, a compressed version of the SAM format (included in the Alignments and counts and Differentially Expressed Genes packages). You can download an example from the ENCODE project: Alignment in BAM format.
  • Raw counts matrix – This is a simple text file containing a first column with all the genes included in the annotation, plus one column for each sequenced sample. It usually has about 30,000 lines. The first lines look like:

      ID1039508 ID1039509 ID039512 ID1039513 ID1039516
    ENSG00000000003 679 448 873 408 1138
    ENSG00000000005 0 0 0 0 0
    ENSG00000000419 467 515 621 365 587
    ENSG00000000457 260 211 263 164 245
    ENSG00000000460 60 55 40 35 78
    ENSG00000000938 0 0 2 0 1
  • edgeR output – The edgeR analysis produces three files (included only in the Differentially Expressed Genes package): under.txt and over.txt, the lists of underexpressed and overexpressed genes, and DE.txt, a full matrix reporting for each gene the log fold change between the two conditions (logFC), the log counts per million reads (logCPM), the p-value and the corrected p-value (FDR). Please refer to the edgeR Manual PDF for further details. The first lines of the DE.txt file look like:

      logFC logCPM PValue FDR
    ENSG00000275852 0.01352 -2.59210 0.85455 1
    ENSG00000261140 -0.28245 -1.38742 0.44214 1
    ENSG00000236211 -3.76650 -0.64187 2.22330e-21 5.46551e-19
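A table like DE.txt can be split into over- and under-expressed genes with a few lines of code. The sketch below assumes whitespace-separated columns, scientific notation written without spaces (e.g. 2.22330e-21) and an FDR cutoff of 0.05. The cutoff and the function name are assumptions, not necessarily the criteria used to build over.txt and under.txt.

```python
def split_de_genes(lines, fdr_cutoff=0.05):
    """Return (over, under) gene lists from the lines of a DE.txt-like table."""
    over, under = [], []
    for line in lines[1:]:  # the first line is the header
        gene, logfc, _logcpm, _pvalue, fdr = line.split()
        if float(fdr) >= fdr_cutoff:
            continue  # not significant after correction
        if float(logfc) > 0:
            over.append(gene)
        else:
            under.append(gene)
    return over, under


de_lines = [
    "logFC logCPM PValue FDR",
    "ENSG00000275852 0.01352 -2.59210 0.85455 1",
    "ENSG00000261140 -0.28245 -1.38742 0.44214 1",
    "ENSG00000236211 -3.76650 -0.64187 2.22330e-21 5.46551e-19",
]

over, under = split_de_genes(de_lines)
print("over-expressed:", over)
print("under-expressed:", under)
```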

What to do next

Once you receive the list of differentially expressed genes, you can analyze and annotate it with any tool that takes a list of gene IDs as input, such as the DAVID online tool. With DAVID you can, for example, annotate your results and perform Gene Set Enrichment Analysis (GSEA) to see whether a functional category is enriched in your list of over- or under-expressed genes compared with the background (the whole transcriptome).