CRISPRon(v1.0): CRISPR-Cas9 guide efficiency prediction

Help


The CRISPRon webserver is dedicated to the design of guide RNAs (gRNAs) for the CRISPR/Cas9 system. The CRISPRon model predicts the efficiency of CRISPR/Cas9 gRNAs in cleaving a target site on the DNA (on-target). Using the webserver, you can find possible CRISPR/Cas9 targets in an input sequence and obtain the predicted Cas9 cleavage efficiency of the matching gRNA (indel frequency obtained approx. 3 days after gRNA delivery).

Once potential gRNAs and corresponding target sites have been selected in CRISPRon, they can be submitted to our off-target assessment webserver, CRISPRoff, directly from the CRISPRon results page (visit the CRISPRoff webserver for more info on off-targets). The gRNA specificity score computed by CRISPRoff, which is a measure of how well the gRNA binds to its on-target considering possible genome-wide off-targets, can then be imported back into the CRISPRon results page for a final selection of the gRNAs. We recommend selecting gRNAs with maximum predicted efficiency and specificity. Note that the sequences reported as "targets" are the same as the gRNAs plus the PAM: targets are written here in 5'-3' direction, like the gRNAs, while the actual strand targeted by the gRNA is the opposite one (3'-5').

The webserver can be used with three different kinds of input: a genomic range, a gene name, or a custom input sequence. For the first two cases a target genome must be specified, while for the last one the given sequence may or may not map to a target genome.

Format for INPUT-1: Targeting a genomic region


First, select one of the listed genomes in the drop-down menu. If the genome you are interested in is not listed, please use the "Format for INPUT-3" described below. Then, type a genomic region in the dedicated form and press submit. The genomic region should be specified in the following format: chrN:start-end, where chrN is the identifier of the chromosome, while start and end are the first and last coordinates of the range, as in the example below.

chr5:73,164,226-73,170,298
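
For readers who prepare region strings in a script before pasting them into the form, the chrN:start-end format can be checked with a few lines of Python. This is only a sketch of one way to read the format and is not the parser used by the webserver; the name `parse_region` is hypothetical, and treating the thousands separators as optional is an assumption.

```python
import re

# Pattern for "chrN:start-end"; commas inside the coordinates are
# treated as optional thousands separators (assumption).
REGION_RE = re.compile(r"^(?P<chrom>[^:\s]+):(?P<start>[\d,]+)-(?P<end>[\d,]+)$")


def parse_region(text):
    """Split 'chrN:start-end' into (chrom, start, end) with integer coordinates."""
    match = REGION_RE.match(text.strip())
    if match is None:
        raise ValueError(f"not in chrN:start-end format: {text!r}")
    start = int(match.group("start").replace(",", ""))
    end = int(match.group("end").replace(",", ""))
    if start > end:
        raise ValueError("start must not be larger than end")
    return match.group("chrom"), start, end


print(parse_region("chr5:73,164,226-73,170,298"))
# → ('chr5', 73164226, 73170298)
```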

Format for INPUT-2: Targeting a gene


First, select one of the listed genomes in the drop down menu. Then, select a gene in the form by typing its name (a drop-down menu will appear after 3 characters), and press submit. In this mode, only exons (coding exons or UTRs) will be considered as targets. If you wish to target introns you must use one of the other input options. If either the genome or the gene is not listed, you should use the "Format for INPUT-3" and manually enter your target sequence as explained below. In addition to targets in exons, targets in an additional 300 nt sequence up-stream and down-stream of each transcript are also shown in the output. Input example:

SLC10A4

Format for INPUT-3: Targeting a custom input sequence


Paste in a target DNA sequence in plain or in fasta format as shown in the examples below and press submit. If your sequence is from one of the organisms listed in the drop down menu, it is recommended to select that organism in the form, even if your input sequence doesn't map perfectly to the target genome.
Example of a sequence in plain format:

ATCGTTGCGTACGGTACGTCCTGACGTAGGGCACGCTCGATCGAGTTCGGACCTGTAGGGATCGAGGCTTGTACGGACC
TCACGATCGATCCCGATCGGAATGC

Example of a sequence in fasta format:

>myseq_id
ATCGTTGCGTACGGTACGTCCTGACGTAGGGCACGCTCGATCGAGTTCGGACCTGTAGGGATCGAGGCTTGTACGGACC
TCACGATCGATCCCGATCGGAATGC
Hint: how to get a DNA sequence. A possible way to obtain the DNA sequence of a given region or gene is to search for it in the UCSC genome browser and then use View->DNA. If the region contains a mutation in your subject, edit the sequence accordingly before submitting it to CRISPRon (see "Examples of input/output" below).
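
The two input formats shown above carry the same information: a fasta record is the plain sequence preceded by a ">" header line. The sketch below reads either format into an identifier and a single sequence string. It is an illustration only, not the server's own input handling; the function name `read_sequence` is hypothetical and a single-record input is assumed.

```python
def read_sequence(text):
    """Return (identifier, sequence) from plain or single-record fasta text.

    The identifier is None for plain input. Line breaks inside the
    sequence are removed.
    """
    lines = [line.strip() for line in text.strip().splitlines() if line.strip()]
    seq_id = None
    if lines and lines[0].startswith(">"):
        seq_id = lines[0][1:].strip()  # header text without the leading ">"
        lines = lines[1:]
    return seq_id, "".join(lines)


plain = """ATCGTTGCGTACGGTACGTCCTGACGTAGGGCACGCTCGATCGAGTTCGGACCTGTAGGGATCGAGGCTTGTACGGACC
TCACGATCGATCCCGATCGGAATGC"""
fasta = ">myseq_id\n" + plain

plain_id, plain_seq = read_sequence(plain)
fasta_id, fasta_seq = read_sequence(fasta)
print(plain_id, fasta_id, plain_seq == fasta_seq)
# → None myseq_id True
```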

Input criteria. The input must contain only the following five canonical bases 'A,a', 'C,c', 'G,g', 'T,t' and 'U,u' or unknown bases 'N,n'. Targets that include unknown bases are omitted from the output. A prediction is made for a possible target site only if its full 30 nt context, made of 4 nt + target (20 nt) + PAM (NGG) + 3 nt, is present in the given sequence.
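
The input criteria can be illustrated with a short scan for 30 nt contexts. This is a sketch under stated assumptions, not the CRISPRon implementation: the function names are hypothetical, U is assumed to be read as T, both strands are scanned, any window containing N is skipped (a conservative reading of the rule on unknown bases), and the reported start is 0-based on the scanned strand.

```python
ALLOWED = set("ACGTUN")
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def normalize(seq):
    """Upper-case the input, reject unsupported characters, read U as T (assumption)."""
    seq = seq.upper()
    bad = set(seq) - ALLOWED
    if bad:
        raise ValueError(f"unsupported characters: {sorted(bad)}")
    return seq.replace("U", "T")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def candidate_sites(seq):
    """Yield (strand, start, context) for each 30 nt window laid out as
    4 nt + 20 nt target + NGG PAM + 3 nt."""
    seq = normalize(seq)
    for strand, strand_seq in (("+", seq), ("-", reverse_complement(seq))):
        for i in range(len(strand_seq) - 30 + 1):
            window = strand_seq[i:i + 30]
            # Window indices: 0-3 flank, 4-23 target, 24-26 PAM, 27-29 flank.
            if window[25:27] == "GG" and "N" not in window:
                yield strand, i, window


demo = "AAAA" + "ACGTACGTACGTACGTACGT" + "TGG" + "CCC"  # made-up 30 nt example
print(list(candidate_sites(demo)))
```

A sequence shorter than 30 nt yields no sites, which mirrors the minimum-length requirement stated above.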

Email and custom job name


It is not necessary that you provide an email address or a custom job name. If you do provide an email address we will inform you by email when the computation for your job finishes. The email will include the optional job name (if specified, "crispron" otherwise) and a link that you can use to access the results for your job. The results are stored on the server for 14 days; after that, your results and your email address are deleted from the server.

Examples of input/output


Here we show the input-output for a search done using a custom sequence mapping in part to the human genome assembly hg38. The output of the other input types is similar, but extra care is needed when the input sequence is different from the reference assembly.

Input

For this example we input a custom sequence, which consists of a portion of the IGF1R gene carrying the rs1409058783 G>A mutation at position chr15:98707586. The DNA sequence in a region of 100 nt to the left and right of the mutation (position 101 in the sequences below) was retrieved from the UCSC genome browser and manually edited as follows:

Wild-type sequence:
CTGTATTATTGTTTGGAAAATAGTTTAAAAATTATTTCCTTCTAACTGAGACGTTTACCCTCTTGTCTCCCTTCAGTCT
GCGGGCCAGGCATCGACATCCGCAACGACTATCAGCAGCTGAAGCGCCTGGAGAACTGCACGGTGATCGAGGGCTACC
TCCACATCCTGCTCATCTCCAAGGCCGAGGACTACCGCAGCTAC
Sequence carrying rs1409058783 G>A mutation:
CTGTATTATTGTTTGGAAAATAGTTTAAAAATTATTTCCTTCTAACTGAGACGTTTACCCTCTTGTCTCCCTTCAGTCT
GCGGGCCAGGCATCGACATCCACAACGACTATCAGCAGCTGAAGCGCCTGGAGAACTGCACGGTGATCGAGGGCTACC
TCCACATCCTGCTCATCTCCAAGGCCGAGGACTACCGCAGCTAC
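
The manual edit can also be scripted. With 100 nt of flanking sequence on each side, the mutated base sits at position 101 (1-based) of the retrieved sequence. The sketch below applies the G>A substitution to the wild-type sequence and checks the reference base first; the helper name `substitute` is hypothetical.

```python
def substitute(seq, position, ref, alt):
    """Replace the base at a 1-based position after checking the reference base."""
    index = position - 1
    if seq[index] != ref:
        raise ValueError(f"expected {ref} at position {position}, found {seq[index]}")
    return seq[:index] + alt + seq[index + 1:]


# Wild-type sequence from the example above, joined into one string.
wild_type = (
    "CTGTATTATTGTTTGGAAAATAGTTTAAAAATTATTTCCTTCTAACTGAGACGTTTACCCTCTTGTCTCCCTTCAGTCT"
    "GCGGGCCAGGCATCGACATCCGCAACGACTATCAGCAGCTGAAGCGCCTGGAGAACTGCACGGTGATCGAGGGCTACC"
    "TCCACATCCTGCTCATCTCCAAGGCCGAGGACTACCGCAGCTAC"
)

mutant = substitute(wild_type, 101, "G", "A")
print(mutant[95:106])  # the mutated base is the sixth character printed
```

Checking the reference base before editing guards against off-by-one errors in the position, which are easy to make when a sequence is cut out of a larger region.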

As explained above, the input is provided to the webserver by selecting the hg38 genome and pasting the target sequence in the appropriate field.

>hg38_dna
CTGTATTATTGTTTGGAAAATAGTTTAAAAATTATTTCCTTCTAACTGAGACGTTTACCCTCTTGTCTCCCTTCAGTCT
GCGGGCCAGGCATCGACATCCACAACGACTATCAGCAGCTGAAGCGCCTGGAGAACTGCACGGTGATCGAGGGCTACCT
CCACATCCTGCTCATCTCCAAGGCCGAGGACTACCGCAGCTAC

Output

The output consists of an interactive view implemented in the IGV (Integrative Genomics Viewer) browser and a table of results, which reports details for each predicted target that can be cleaved with efficiency >= 50%. The results can also be downloaded as a zip folder by clicking on "Download results". The folder will include the results table in csv format and bed files of the predicted targets. Note: you can open the csv table in Excel; just remember to save it as xlsx (Excel file) if you edit it with colors, font changes, filters etc. Instructions on how to navigate the results in the IGV interactive view and in the table view are given below. To see the results in the UCSC genome browser instead, click on "View in UCSC".

The IGV interactive view


An example of the IGV browser view is displayed in the figure below, where the genome view is at the top and the region view is at the bottom. The genome view is relative to where your custom sequence maps in the genome, while the region view is relative to your custom sequence. You can select one of the two views by clicking on the tabs located in the top left margin of the IGV view.

If the given sequence matches a region in the specified genome, the matching region is shown in the genome view. Note that this view shows only the reference sequence and does not include any mutations present in the sequence given as input. Mutations are instead visible in the region view, in which coordinates are relative to the input sequence. The region view does not contain annotations (transcripts, repeat-masked nucleotides, etc.).

Target regions, which represent potential gRNAs, are shown as horizontal colored bars. The efficiency is reflected in the color of the bar, which follows a color scale from yellow to blue. The directionality is shown by arrows within the bars. Additional information can be obtained by clicking on a bar. Each target has a unique identifier of the form m_xx or p_xx, where m and p indicate the strand (+ or -) and xx is the start position of the target within the given region.


CRISPRon output view

The table view


The table view shows the details of all "Best targets", which are those targets that can be cleaved with efficiency >= 50% (a table including all possible targets can be obtained in csv format by clicking on "Download results"). The table can be sorted by clicking on the header of a column. The direction of the sorting is then indicated by the triangles next to the column's name. By default, the table is sorted by target coordinates.
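
The same ">= 50%" selection can be reproduced offline on the downloaded csv table, which contains all possible targets. The sketch below uses made-up rows and hypothetical column names (`id`, `start`, `efficiency`); the header of the real csv file may differ and should be checked before reuse.

```python
import csv
import io

# Made-up rows with hypothetical column names, for illustration only.
SAMPLE_CSV = """id,start,efficiency
p_12,12,71.4
m_40,40,35.0
p_77,77,50.0
"""


def best_targets(handle, column="efficiency", threshold=50.0):
    """Return the rows whose efficiency is at or above the threshold,
    sorted from most to least efficient."""
    rows = [row for row in csv.DictReader(handle) if float(row[column]) >= threshold]
    return sorted(rows, key=lambda row: float(row[column]), reverse=True)


rows = best_targets(io.StringIO(SAMPLE_CSV))
print([row["id"] for row in rows])
# → ['p_12', 'p_77']
```

For a real download, pass an open file handle instead of the in-memory sample, for example `open(path, newline="")`.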
You can filter the table by typing a filter in the first row.

Click on the id of a target/gRNA to jump to its location in the IGV view! NB: the gRNA has the same sequence as the target, excluding the PAM. This is because targets in the table are written in 5'-3' direction, which is the same as the gRNA; the actual strand targeted by the gRNA is the opposite one (3'-5').
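
The relation between target, gRNA and targeted strand can be made concrete with a small sketch. It assumes a 23 nt target written 5'-3' (20 nt gRNA sequence followed by the 3 nt PAM); the function names and the example target are made up for illustration.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def split_target(target):
    """Split a 23 nt target (5'-3') into the 20 nt gRNA sequence and the 3 nt PAM."""
    if len(target) != 23:
        raise ValueError("expected a 23 nt target (20 nt + PAM)")
    return target[:20], target[20:]


def opposite_strand(seq):
    """Return the opposite strand read 3'-5', i.e. the base-by-base
    complement aligned under the input (not reversed)."""
    return seq.translate(COMPLEMENT)


target = "ACGTACGTACGTACGTACGTTGG"  # made-up example target
grna, pam = split_target(target)
print(grna, pam)
# → ACGTACGTACGTACGTACGT TGG
print(opposite_strand(grna))
# → TGCATGCATGCATGCATGCA
```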

CRISPRon output table

Obtaining specificity scores


By default, specificity scores are Un-computed. This is because computing the specificity of a gRNA is computationally very expensive: the full genome needs to be searched to find potential off-target sites, which then need to be evaluated. Therefore, the specificity is computed only for selected gRNAs (targets). Please follow these instructions to obtain the specificity scores (see also the figure below):