Table of Contents

Below are bibliography notes related to the use of RAD tags - (§), (§§) and (§§§) denotes papers of particular interest, and (Rev) denotes reviews. Please note that this bibliography is not exhaustive!

Miller (2007) - Genome Research (§§§)

Rapid and cost-effective polymorphism identification and genotyping using restriction site associated DNA (RAD) markers

  • Summary
    • Eric A. Johnson's group at the University of Oregon
    • Definition of RAD tags: "Restriction site associated DNA (RAD) tags are a genome-wide representation of every site of a particular restriction enzyme by short DNA tags"
    • Here, looking for polymorphism that disrupt restriction sites (like RFLP, AFLP) but for many more loci than the previous methods.
    • Not sequencing but use of a micro-array platform
    • "RAD markers […] allow high-throughput, high-resolution genotyping in both model and nonmodel systems"
  • Interest of RAD tags
    • Genetic mapping (recombination mapping, association mapping) needs "high-throughput, cost-effective methods to identify and genotype polymorphisms"
    • Previous methods:
      • allele-specific oligonucleotide microarrays (thousands of SNPs scored per hybridization)
      • direct hybridization of labelled genomic DNA to short oligonucleotide arrays
      • single feature polymorphisms (SFP) can be discovered and typed in a similar way
    • Those methods need costly preliminary steps (sequencing, oligonucleotide synthesis) and are not easily feasible for non-model organisms
  • Principle
    • Very good illustration in Figure 1.
    • Digest - link tags - shear DNA - purify tags - release tags - type tags
  • Demonstration
    • Mapping of a breakpoint on a Drosophila chromosome
    • Identification of the major lateral plate locus in three-spine stickleback
  • Discussion
    • Here RAD markers use the restriction site as the polymorphic marker (no sequencing)
    • Easy to produce microarray type of genotyping resource
    • Bulk genotyping is possible
    • If markers location is determined, genetic mapping is possible

Miller (2007) - Genome Biology (§)

RAD marker microarrays enable rapid mapping of zebrafish mutations

  • Developed a RAD marker microarray resource for the zebrafish
  • Bulk segregant approach to map mutations
  • Assay for individual RAD markers in pooled populations

Lewis (2007) - Genetics

Baird (2008) - PLoS One (§§§)

Rapid SNP Discovery and Genetic Mapping Using Sequenced RAD Markers

  • RAD tags associated with Illumina sequencing for SNP discovery
  • SNP identified by sequenced polymorphism and disrupted restriction sites
  • Marker density can be tuned by the choice of the enzymes (and their number)
  • Samples can be multiplexed by barcoding
  • Useful with or without a reference genome

Hohenlohe (2010) - PLoS Genetics (§§§)

Population Genomics of Parallel Adaptation in Threespine Stickleback using Sequenced RAD Tags

A major paper about sequenced RAD tags.

  • Use natural populations, not lab crosses
  • Genome scan of nucleotide diversity and differentiation in 5 natural populations of threespine stickleback
  • Illumina-sequenced RAD tags
  • Identify and type more than 45000 SNPs in each of 100 individuals
  • Genomic regions with balancing and divergent selection, consistent across populations
  • Identify the genes in regions of interest (regions of parallel differentiation across independent populations)
  • Methods:
    • Genome is available
    • Barcoded sample
    • Kernel smoothing and permutation testing

Hohenlohe (2011) - Molecular Ecology Resources (§)

Next-generation RAD sequencing identifies thousands of SNPs for assessing hybridization between rainbow and westslope cutthroat trout

  • Nonmodel organism (no genome available)
  • Salmonid fish, several whole-genome duplication
  • RAD sequencing to identify SNP loci with fixed allelic differences between introduced rainbow trout and native westslope cutthroat trout
  • Use 24 barcode-labelled individuals
  • Use Stacks for analysis (genotyping)

Emerson (2011) - PNAS (§§)

Resolving postglacial phylogeography using high-throughput sequencing

  • Fine-scale phylogenetic relationships between recently diverged (from a refugium after the recession of the Laurentide ice sheet 22000-19000 BP) populations of pitcher plant mosquito
  • Discussed also in Puritz 2012 ("Next-Generation Phylogeography: A Targeted Approach for Multilocus Sequencing of Non-Model Organisms") in which they use 454 sequencing:

    "However, few of [the NGS] technologies have been applied in phylogeographic studies, with the exception of Emerson et al. (42) that used restriction-site-associated DNA tags (RAD tags) (43,44) to determine the evolutionary relationship between recently diverged populations of pitcher plant mosquitos. This methodology can genotype multiple populations at thousands of SNP loci simultaneously, but have limited ability to survey large sample sizes within populations because of the cost. For example, Emerson et al. genotyped 21 different population at ,13,627 different SNPs but with only 6 individuals per population the authors were forced to restrict their analysis to 3,741 SNPs ‘‘that were fixed within at least two populations and were variable among populations’’ and generate one consensus sequence per locus per population (42). Our targeted 454 sequencing methodology offers a compromise with the ability to sequence a reasonable sample size (20 individuals) from one population for 10 different nDNA loci (,400 bp each) in 1/16th of a plate of 454 sequencing, at lower cost than a RAD tag analysis."

Etter (2011) - PLoS One

Local De Novo Assembly of RAD Paired-End Contigs Using Short Sequencing Reads

  • Use paired-end sequencing of RAD fragments to build contigs on the side which is cut by random shearing
  • Identify SNP and determine haplotype in threespine stickleback
  • Produce overlapping contigs of several hundred nucleotides in E. coli and threespine stickleback
  • A circularization step can allow a local assembly of up to 5 kb

Etter (2011) - Methods in Molecular Biology (Rev)

SNP Discovery and Genotyping for Evolutionary Genetics Using RAD Sequencing

  • Review chapter about RAD genotyping (RAD with Illumina sequencing) or RAD-seq

Davey (2011) - Briefings in Functional Genomics (Rev)

RADSeq: next-generation population genetics

  • Review about RAD-seq and its use for population genomics
  • Interest for non-model organisms, wild populations and ecological population genomics

Davey (2011) - Nature Reviews Genetics (Rev) (§)

Genome-wide genetic marker discovery and genotyping using next-generation sequencing

  • Review about NGS methods that use restriction enzyme digestion to reduce the genome complexity
  • Include reduced-representation sequencing using reduced-representation libraries (RRLs) or complexity reduction of polymorphic sequences (CRoPS)
  • Include RAD-seq
  • Include low coverage genotyping

O'Rourke (2011) - Genetics

Rapid Mapping and Identification of Mutations in Caenorhabditis elegans by Restriction Site-Associated DNA Mapping and Genomic Interval Pull-Down Sequencing

  • Forward genetic screens in C. elegans (association between phenotypes and mutated genes)
  • Use RAD-seq for rapid mapping of the mutations
  • Use genomic interval pull-down sequencing (GIPS) to capture and sequence large portions of a mutant genome (ca. Mb)

Pfender (2011) - Theor. Appl. Genet. (§)

Mapping with RAD (restriction-site associated DNA) markers to rapidly identify QTL for stem rust resistance in Lolium perenne

  • Use RAD-seq to produce maps and identify QTL (in association with simple sequence repeats (SSR) and sequence-tagged sites (STS))
  • Reads from parental plants (2) and from F1 (188)
  • High variability in the number of reads per F1 (14000 to 935000)

Chong (2011) - Bioinformatics

Chutimanitsakun (2011) - BMC Genomics

Construction and application for QTL analysis of a Restriction Site Associated DNA (RAD) linkage map in barley

  • Use RAD protocol to identify markers, build a linkage map and identify QTL
  • Low number of RAD markers (530) but obtain results comparable with high density EST-based SNP map

Rowe (2011) - Molecular Ecology (Rev)

RAD in the realm of next-generation sequencing technologies

  • Review of RAD as a form of genotyping-by-sequencing
  • Applications in genetic mapping, QTL, phylogeography, population genomics

Flight (2012) - Integrative and Comparative Biology (§)

Genetic Variation in the Acorn Barnacle from Allozymes to Population Genomics

  • Use allozymes, microsatellites, mtDNA and RADseq
  • RADseq on pooled samples (20 individuals per pool) of 3 populations (2 US, 1 UK)
  • Use SOAPdenovo assembly
  • Fst outlier analysis (using unbiased estimates and empirical distribution, with the 1% tail of the distribution being the outliers)

Hohenlohe (2012) - Phil. Trans. R. Soc. B (§§)

Extensive linkage disequilibrium and parallel adaptive divergence across threespine stickleback genomes

  • Study patterns of local and long-distance linkage desequilibrium (LD) across oceanic and freshwater populations of threespine stickleback
  • Look for association between LD and signatures of divergent selection
  • Assess the role of recombination rate variation in generating LD patterns
  • Reuse RAD-seq data from Hohenlohe 2010 (PLoS Genetics) and also generate new data with a modified RAD-seq protocol on 87 F2 individuals and the parents and F1 individuals
  • Align raw reads to reference genome with Bowtie
  • Use likelihood methods to infer the genotype at each nucleotide

Houston (2012) - BMC Genomics (§)

Characterisation of QTL-linked and genome-wide restriction site-associated DNA (RAD) markers in farmed Atlantic salmon

  • RAD-seq on two Atlantic salmon families
  • Offspring were classified into resistant and susceptible to a disease, and for each family DNA from the parents and 7 offsprings of each phenotype was sequenced by RAD-seq in multiplexed pools
  • Use of pedigreed samples allowed to distinguish segregating SNPs from putative paralogous sequences (recent genome duplication in salmonids)
  • 50 SNPs linked to QTL and a subset used for high-throughput genotyping across large commercial populations of disease-challenged fry

Peterson (2012) - PLoS One (§§§)

Double Digest RADseq: An Inexpensive Method for De Novo SNP Discovery and Genotyping in Model and Non-Model Species

  • Introduce the double-digest method
  • Combinatorial barcoding for larger number of individuals per multiplex
  • Detailed pipeline of reads analysis
  • Also compare the observed size distribution with in silico digestion and statistical models of size selection

Rubin (2012) - PLoS One (§§)

Inferring Phylogenies from RAD Sequence Data

  • Simulate RAD-seq data collection for interspecific phylogeny reconstruction in Drosophila, mammals and yeast
  • Develop a workflow to test if informative data can be gained from RAD-seq and used for accurate reconstruction of "known" phylogenies
  • RAD-seq seems promising for phylogenies in younger clades (e.g. 60 My vs 100 My) in which enough orthologous restriction sites are retained across species

Bruneaux, Johnston (2012) - Molecular Ecology (§)

Molecular evolutionary and population genomic analysis of the nine-spined stickleback using a modified restriction-site-associated DNA tag approach

  • Double digest RAD-seq on pooled samples of ninespine sticklebacks
  • Mapping back to the related threespine stickleback genome
  • Issues with low coverage per population, resulting in data being pooled per ecotype
  • Kernel smoothing of differentiation values along the genome
  • Permutation testing and gene ontology enrichment test

Miller (2012) - Molecular Ecology

A conserved haplotype controls parallel adaptation in geographically distant salmonid populations

  • Laboratory crosses and NGS to study the genetic architecture of parallel adaptation of rapid development rates in two salmon populations
  • Find parallel genetic mechanism and conservation of haplotype
  • Use custom Perl scripts to process the data and Novoalign for alignment

Bourgeois (2013) - Molecular Ecology Resources

Mass production of SNP markers in a nonmodel passerine bird through RAD sequencing and contig mapping to the zebra finch genome

  • RAD-seq with Illumina HiSeq2000 on nonmodel passerine bird (paired ends)
  • 6 pools of 18-25 individuals on one lane
  • 600000 contigs, of which 386000 mapped back to zebra finch genome
  • 80000 SNPs mapped

Gautier (2013) - Molecular Ecology (§§§)

Estimation of population allele frequencies from next-generation sequencing data: pool-versus individual-based genotyping

  • Mathematical analysis showing that next-generation sequencing methods applied on DNA pools from diploid individuals give at least the same accuracy as individual-based analyses for SNPs
  • Remains true when unequal contributions of each individual to the final pool
  • Introduce effective pool size and Bayesian model to estimate it
  • Application provided to assess the accuracy of allele frequency estimates
  • Illustration with theoretical examples and real data sets
  • "[NGS of DNA pools] allows comparison of genome-wide patterns of genetic variation for large numbers of individuals in multiple populations"

DeFaveri (2013) - Molecular Ecology Resources (§§)

Characterizing genic and nongenic molecular markers: comparison of microsatellites and SNPs

  • Compares patterns of genetic diversity and divergence on various geographical scales for SNP vs microsatellites and genic vs non-genic regions
  • Marker type and location affec estimates of diversity and divergence
  • Between-lineage divergence higher with SNP but within-lineage divergence similar for both marker types
  • Broad-scale population structure resolved by both markers, but only microsatellites can delimit fine-scale population-structuring

Hecht (2013) - Molecular Ecology (§§)

Genome-wide association reveals genetic basis for the propensity to migrate in wild populations of rainbow and steelhead trout

  • RAD-seq in two wild populations of migratory steelhead and resident rainbow trout
  • genome-wide association study using 189 individuals, uniquely barcoded and distributed in five libraries, single reads, 100 bp
  • 504 SNP markers associated with propensity to migrate both within and between populations

Catchen (2013) - Molecular Ecology (§)

The population structure and recent colonization history of Oregon threespine stickleback determined using restriction-site associated DNA-sequencing

  • Study with populations which predate the last glacial maximum
  • Use the STRUCTURE program and reconstruct phylogeographic history
  • Results support the a break between coastal and inland populations, introgressive hybridization in coastal populations and a recent expansion in central Oregon
  • "Our results exhibit the power of next-generation sequencing genomic approaches such as RAD-seq to identify both historical population structure and recent colonization history"

Hale (2013) - Genes Genomes Genetics (§)

Evaluating Adaptive Divergence Between Migratory and Nonmigratory Ecotypes of a Salmonid Fish, Oncorhynchus mykiss

  • Association population genomics and association approaches enables to reveal the genetic basis of variable phenotypic traits, including in nonmodel species, if population structure does not overlap with the phenotypic differences (cf. Susan and Tutku paper about sea-wintering in Atlantic salmon)
  • RAD-seq on rainbow trout to study the divergence between migratory and non-migratory ecotypes
  • Can map some SNP markers using previous linkage maps that used RAD tags
  • Some SNPs fall into regions which contain QTL for migratory-related traits
  • Also use annotation of genome regions linked to the significant SNPs

Catchen STACKS (2013) - Molecular Ecology (§§§)

Stacks: an analysis tool set for population genomics

  • STACKS package: genotype-by-sequencing, SNP-by-SNP analysis, sliding window across a genome

Orsini (2013) - Molecular Ecolgy (Rev) (§)

Evolutionary Ecological Genomics

Review about evolutionary ecological genomics, initiated at the ESEB meeting 2011. It mentions RAD among other methods.

Hess (2013) - Molecular Ecology (§§)

Population genomics of Pacific lamprey: adaptive variation in a highly dispersive species

  • No strict homing behaviour but genetic heterogeneity and morphological and behavioural diversity
  • RAD-seq from 518 individuals (7 libraries of 96 individually barcoded samples)
  • 4439 SNPs, 162 potentially adaptive as detected by outlier tests (formed four groups of linked loci)
  • MATSAM software to associate them with various traits or geography
  • some SNPs aligned with gene region using the genome browser available for sea lamprey
  • Introduction: present the issues with Fst and population structure, refer to Bierne 2011, Le Corre and Kremer 2012, Excoffier and Lischer 2010, Narum and Hess 2011.
  • "Genomic scans using highly dispersive species may be one way to overcome this challenge, because minimal population structure provides a less noisy neutral back- ground of genetic divergence upon which adaptive markers can be readily distinguished as outliers (Pe′rez- Figueroa et al. 2010)."

Davey (2013) - Molecular Ecology (§)

Special features of RAD Sequencing data: implications for genotyping

  • RAD-seq associated with sequencing-by-synthesis method: stochastic count data, requires sensitive analysis to develop and genotype markers accurately
  • Exist several sources of bias not addressed by current genotyping tools (fragment bias, restriction site heterozygosity and PCR GC cnotent bias)
  • Study the effect of those biases on current analysis tools
  • Biases must be taken into account, but most RAD loci can be accurately genotyped by existing tools

McCormack (2013) - Molecular Phylogenetics and Evolution (Rev) (§)

Applications of next-generation sequencing to phylogeography and phylogenetics

  • Discuss methods of genome complexity reduction for phylogeography and phylogenetics
  • Discuss which tools to use and how to analyze this kind of data

Nevado (2014) - Molecular Ecology (§)

Resequencing studies of nonmodel organisms using closely related reference genomes: optimal experimental designs and bioinformatics approaches for population genomics

  • Guidelines for estimating variability and neutrality tests from NGS data on nonmodel species
  • Compare performances of genotype calling, genotype-haplotype calling and direct estimation
  • In the end, "[s]tudies without species-specific reference genome should thus aim for low read depth and avoid genotype calling whenever individual genotypes are not essential. Otherwise, aiming for moderate to high depth at the expense of number of individuals, and using genotype–haplotype calling, is recommended."

Hemmer-Hansen (2014) - Biological Bulletin (Rev)

Population Genomics of Marine Fishes: Next-Generation Prospects and Challenges

  • Review about the methods and challenges for population genomics in fishes

Schweyen (2014) - Biological Bulletin (§)

Detection and Removal of PCR Duplicates in Population Genomic ddRAD Studies by Addition of a Degenerate Base Region (DBR) in Sequencing Adapters

  • Present the issue of PCR duplicates in the ddRAD protocol (in paired-ends but single digest, the PCR duplicates can be detected with the 2nd read)
  • Propose an approach with a degenerate base region in the sequencing adapter to identify and remove the PCR duplicates
  • In a test, 34% of a ddRAD library were PCR duplicates distributed over 20% of the loci
  • Disproportionate amount of PCR duplicates in 5% of the loci
  • Not a problem for general parameter inference, but outlier loci detetion could be improved by ddRAD

Stuart (2014) - Science (§)

Rapid evolution of a native species following invasion by a congener

  • Detailed section in Methods (supplementary material) about the RAD-seq part
  • 6 nt barcodes to multiplex 192 individuals per lane (total 384 individuals), single-end sequencing Illumina HiSeq
  • Demulitplex the reads, filter barcode and correct restriction site with STACKS
  • Align reads against a reference genome with Bowtie2 (discard reads which map to more than one location)
  • Diploid genotypes called using a maximum likelihood approach (Catchen 2013 Stacks, Hohenlohe 2010 parallel evolution sticklebacks)
  • Neighbor-joining phylogenetic network with SplitsTree4
  • Pairwise Fst between populations

Puritz (2014) - PeerJ (§§)

dDocent: a RADseq, variant-calling pipeline designed for population genomics of non-model organisms

  • Pipeline for variant calling using paired-end RAD-seq data
  • Relies on existing software (one version on FreeBayes, one on GATK)

Andrews and Puritz paper and responses (2014) - Molecular Ecology (Rev) (§§§)

A set of three articles about the different RAD approaches in Molecular Ecology:

  • Andrews' original paper, Recent novel approaches for population genomics data analysis
    • Review about NGS and population genomics
    • Section about RAD-seq (in which they mention the issue with PCR duplicates)
    • Interesting section about genotype calling
    • "Luikart quoted Steve O’Brien (St. Petersburg State University) who recently said that >90% of the cost of WGS is the bioinformatics, emphasizing the need for scripting and programming skills."
  • Puritz's response, Demystifying the RAD fad
    • Disagree with the view that the original RAD protocol (Miller 2007, Baird 2008) minimize the impact of PCR duplicates compared to other RAD methods
    • Present additional biases in RAD-seq that should be considered when choosing a protocol
    • Review pros and cons of the different RAD protocols
  • Andrews' reply, Trade-offs and utility of alternative RADseq methods: Reply to Puritz et al.
    • Agree with Puritz that researchers must understand thoroughly the different RAD protocols and analysis before choosing one
    • Address some key points they find unclear or potentially misleading in their evaluation of techniques

Heliconius Genome Consortium (2012) - Nature (§)

Butterfly genome reveals promiscuous exchange of mimicry adaptations among species

  • Study hybridization and introgression in butterfly
  • RAD tags sequencing for linkage mapping to assign and order genome sequences onto the chromosomes
  • RAD resequencing to build a robust phylogenetic tree

Keller (2013) - Molecular Ecology

TODO read

Parchman (2013) - Molecular Ecology

TODO read

Heffelfinger (2014) - BMC Genomics (Rev) (§)

Flexible and scalable genotyping-by-sequencing strategies for population studies

  • Review methods of genome complexity reduction by restriction enzymes to perform genotyping-by-sequencing
  • Modify the protocol by using blunt-end enzymes so that Illumina adaptors can be used directly, dual barcoding system for multiplexing and bead-based library preparation for in-solution size selection without the need for columns and gels
  • "the sequenced portion of the genome was adaptable by selecting enzymes based on motif length, complexity, and methylation sensitivity"

Wu (2010) - BMC Genomics

SNP discovery by high-throughput sequencing in soybean

  • Pooled DNA fragment reduced representation library
  • Solexa short read sequencing to identify SNPs