Supplementary data: Spatially conserved regulatory elements identified within human and mouse Cd247 gene using high-throughput sequencing data from the ENCODE project
Sachin Pundhir, Tine Dahlbæk Hannibal, Claus Heiner Bang-Berthelsen, Anne-Marie Karin Wegener, Flemming Pociot, Dan Holmberg and Jan Gorodkin
Outline
A) The gene - Cd247
B) Materials and methods
A) The gene - Cd247 
- Type 1 diabetes (T1D) is a complex disease caused by genetic and environmental factors.
- CTLA-4 is a negative regulator of T cell activation and constitutes one of the genes that have been associated with human and murine T1D.
- A recent study by Lundholm et al. (2010) demonstrated that a mutant allele of the Cd247 gene impairs T cell activation, resulting in deficient CTLA-4 expression.
- The Cd247 gene encodes a transmembrane protein that is important for assembly and expression of the TCR/CD3 complex on the surface of T lymphocytes.
- The Cd247 gene is transcribed from chr1:167718812-167807407 in mouse and encodes 5 alternatively spliced transcripts.
B) Materials and methods 
- In-silico:
All in-silico analyses were performed on the human genome assembly (hg19, Feb. 2009) and the mouse genome assembly (mm9, Jul. 2007) using publicly available ENCODE data tracks accessible through the UCSC genome browser, as described hereafter:
- GENCODE Gene Annotation Tracks in human (Biotype: sense_intronic).
- Long RNA-seq from ENCODE/CSHL track in mouse (display mode: dense).
- ENCODE Transcription Factor ChIP-seq Peaks and Signal based on Uniform processing pipeline track in human (display mode: dense; Select all cell lines for two factors; first CTCF and second POLR2A).
- Transcription Factor Binding Sites by ChIP-seq from ENCODE/LICR in mouse (display mode: dense; select all cell lines for two factors: first CTCF and second Pol2).
- ENCODE Histone Modification Tracks in human (display mode: dense; select "Chromatin State Segmentation by HMM from ENCODE/Broad" and "Histone Modifications by ChIP-seq from ENCODE/Stanford/Yale/USC/Harvard" sub-tracks).
- ENCODE Histone Modification Tracks in mouse (display mode: dense; select all cell lines across two factors; first H3K4me1 and second H3K4me3).
- Comparison of read count at the Cd247 gene and the ~28 kb region within its first intron across multiple mouse tissues:
Using the UCSC table browser, we downloaded long RNA-seq alignment tracks corresponding to adrenal, colon, duodenum, genfatpad, heart, kidney, lgint, liver, lung, magland, ovary, smint, spleen, stomach, subcfatpad, testis and thymus (both replicates 1 and 2) in SAM format. Next, using SAMtools, we computed the total read count that mapped to the positive strand of the Cd247 gene and of the ~28 kb region within its first intron.
- Enrichment of transcription factor binding sites (TFBS) in human:
Using the UCSC table browser, we selected the 'peakSeq' tracks:
- hub_4607_SydhGm12878Ctcf20StdAlnRep0peakSeq, and
- hub_4607_SydhGm12878Pol2lggmusAlnRep0peakSeq.
We chose 'all fields from selected table' as the output format and, after clicking 'get output', selected the region with the highest signal in each track (chr1:167411351-167411818 and chr1:167424041-167425330, respectively).
- Enrichment of transcription factor binding sites (TFBS) in mouse:
Using the UCSC table browser, we selected the 'peakSeq' tracks corresponding to:
- wgEncodeLicrTfbsThymusCtcfMAdult8wksC57bl6StdPk, and
- wgEncodeLicrTfbsThymusPol2MAdult8wksC57bl6StdPk.
Since the ENCODE/LICR tracks did not have a q-value associated with the TFBS, two independent tracks were also used:
- wgEncodePsuTfbsMelCtcfMImmortalC57bl6InputPk, corresponding to the first region (chr1:167783553-167784553), and
- wgEncodePsuTfbsMelPol24h8UImmortalC57bl6InputPk, corresponding to the second region (chr1:167774825-167775825).
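The selection of the highest-signal peak described for the TFBS tracks can be illustrated with a minimal Python sketch. It is not the procedure used in this study, which was carried out in the UCSC table browser. The sketch assumes the table browser output is tab-separated with a '#'-prefixed header line containing the columns chrom, chromStart, chromEnd and signalValue; the function name highest_signal_peak and the toy table (chromosomes chrT and chrU, invented signal values) are hypothetical.

```python
import csv


def highest_signal_peak(table_text, chrom, start, end, signal_column="signalValue"):
    """Return (chrom, chromStart, chromEnd, signal) for the strongest peak that
    overlaps chrom:start-end (1-based, inclusive), or None if no peak overlaps.

    Assumption: table coordinates are 0-based with an exclusive end, and the
    first non-empty line is a '#'-prefixed, tab-separated header.
    """
    lines = [ln for ln in table_text.splitlines() if ln.strip()]
    header = lines[0].lstrip("#").split("\t")
    best = None
    for row in csv.DictReader(lines[1:], fieldnames=header, delimiter="\t"):
        peak_start = int(row["chromStart"])
        peak_end = int(row["chromEnd"])
        # Skip peaks on other chromosomes or outside the query interval.
        if row["chrom"] != chrom or peak_start >= end or peak_end < start:
            continue
        signal = float(row[signal_column])
        if best is None or signal > best[3]:
            best = (row["chrom"], peak_start, peak_end, signal)
    return best


# Toy table with invented values, for illustration only (not ENCODE data).
TOY_ROWS = [
    ["#bin", "chrom", "chromStart", "chromEnd", "name", "score",
     "strand", "signalValue", "pValue", "qValue", "peak"],
    ["1", "chrT", "100", "200", ".", "500", ".", "12.5", "-1", "0.001", "-1"],
    ["1", "chrT", "300", "450", ".", "900", ".", "48.0", "-1", "0.0001", "-1"],
    ["1", "chrT", "5000", "5100", ".", "1000", ".", "99.0", "-1", "0.0001", "-1"],
    ["2", "chrU", "300", "450", ".", "1000", ".", "75.0", "-1", "0.0001", "-1"],
]
TOY_TABLE = "\n".join("\t".join(r) for r in TOY_ROWS)

if __name__ == "__main__":
    print(highest_signal_peak(TOY_TABLE, "chrT", 1, 1000))
```

Restricting the search to a query interval mirrors the fact that the tracks were examined at the genomic coordinates of the gene; a different signal column can be chosen through the signal_column argument if the table names it differently.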
The UCSC table browser was used to download or analyze the ENCODE data tracks corresponding to the genomic coordinates of the Cd247 gene.
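The strand-specific read count described in the comparison across mouse tissues was computed with SAMtools; the following Python sketch only illustrates the idea on SAM text and is not the command used in this study. It assumes that a read counts as mapping to the positive strand when FLAG bit 0x10 (reverse strand) is unset, and that a read overlaps a region when its reference span, including spliced gaps, intersects it. For paired-end stranded libraries the mate orientation would also matter, which this sketch ignores. The function names and the toy reads are hypothetical; the interval is the mouse Cd247 locus given in section A.

```python
import re

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
REF_CONSUMING = set("MDN=X")  # CIGAR operations that advance along the reference


def reference_span(cigar):
    """Number of reference bases covered by an alignment, from its CIGAR string."""
    if cigar == "*":
        return 1  # CIGAR unavailable: treat the read as a single position
    return sum(int(n) for n, op in CIGAR_OP.findall(cigar) if op in REF_CONSUMING)


def count_plus_strand_reads(sam_lines, chrom, start, end):
    """Count mapped forward-strand reads overlapping chrom:start-end (1-based, inclusive)."""
    count = 0
    for line in sam_lines:
        if line.startswith("@"):  # header line
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:  # not a complete alignment record
            continue
        flag, rname, pos, cigar = int(fields[1]), fields[2], int(fields[3]), fields[5]
        if flag & 0x4 or flag & 0x10:  # unmapped, or mapped to the reverse strand
            continue
        if rname != chrom:
            continue
        read_end = pos + reference_span(cigar) - 1
        if pos <= end and read_end >= start:
            count += 1
    return count


def toy_read(name, flag, rname, pos, cigar):
    """Build a minimal SAM record (toy data for illustration only)."""
    return "\t".join([name, str(flag), rname, str(pos), "255", cigar,
                      "*", "0", "0", "*", "*"])


TOY_SAM = [
    "@HD\tVN:1.0",
    toy_read("r1", 0, "chr1", 167718900, "50M"),         # forward, inside
    toy_read("r2", 16, "chr1", 167719000, "50M"),        # reverse strand
    toy_read("r3", 0, "chr1", 167718770, "50M"),         # forward, overlaps start
    toy_read("r4", 0, "chr2", 167718900, "50M"),         # other chromosome
    toy_read("r5", 4, "*", 0, "*"),                      # unmapped
    toy_read("r6", 0, "chr1", 167807500, "50M"),         # beyond the end
    toy_read("r7", 0, "chr1", 167718700, "20M200N30M"),  # spliced, spans start
]

if __name__ == "__main__":
    print(count_plus_strand_reads(TOY_SAM, "chr1", 167718812, 167807407))
```

The same function can be applied to a sub-interval, such as the ~28 kb intronic region, by passing that region's coordinates in place of the gene's.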