A single-nucleotide polymorphism-based - Semantic Scholar


Published in: Molecular Ecology Resources; 15 (2015), 2, pp. 295–305. https://dx.doi.org/10.1111/1755-0998.12307

A single-nucleotide polymorphism-based approach for rapid and cost-effective genetic wolf monitoring in Europe based on noninvasively collected samples

ROBERT H. S. KRAUS,* BRIDGETT VONHOLDT,† BERARDINO COCCHIARARO,* VERENA HARMS,*‡ HELMUT BAYERL,§ RALPH KÜHN,§¶ DANIEL W. FÖRSTER,** JÖRNS FICKEL,** CHRISTIAN ROOS†† and CARSTEN NOWAK*

*Conservation Genetics Group, Senckenberg Research Institute and Natural History Museum Frankfurt, D-63571 Gelnhausen, Germany, †Department of Ecology and Evolutionary Biology, Princeton University, Princeton, NJ 08544, USA, ‡Senckenberg Museum of Natural History Görlitz, PF 300154, 02806 Görlitz, Germany, §Molecular Zoology Unit, Research Department Animal Sciences, Technische Universität München, Hans-Carl-von-Carlowitz-Platz 2, D-85354 Freising, Germany, ¶Department of Fish, Wildlife and Conservation Ecology and Molecular Biology Program, New Mexico State University, Box 30003, MSC 4901, Las Cruces, NM 88003-8003, USA, **Department of Evolutionary Genetics, Leibniz-Institute for Zoo and Wildlife Research, Alfred-Kowalke-Str. 17, D-10315 Berlin, Germany, ††Gene Bank of Primates and Primate Genetics Laboratory, German Primate Center, Leibniz Institute for Primate Research, Kellnerweg 4, D-37077 Göttingen, Germany

Abstract

Noninvasive genetics based on microsatellite markers has become an indispensable tool for wildlife monitoring and conservation research over the past decades. However, microsatellites have several drawbacks, such as the lack of standardisation between laboratories and high error rates. Here, we propose an alternative single-nucleotide polymorphism (SNP)-based marker system for noninvasively collected samples, which promises to solve these problems. Using nanofluidic SNP genotyping technology (Fluidigm), we genotyped 158 wolf samples (tissue, scats, hairs, urine) for 192 SNP loci selected from the Affymetrix v2 Canine SNP Array. We carefully selected an optimised final set of 96 SNPs (discarding the poorer-performing half) based on assay performance and reliability. In this SNP set, missing data rates were <10% and the genotyping error rate was ~1%, which improves genotyping accuracy by nearly an order of magnitude compared with published data for other marker types. Our approach provides a tool for rapid and cost-effective genotyping of noninvasively collected wildlife samples. The ability to standardise genotype scoring, combined with low error rates, promises to be a major technological advance and could establish SNPs as a standard marker for future wildlife monitoring.

Keywords: Canis lupus, conservation, genetic monitoring, scat sampling, single-nucleotide polymorphism, single-nucleotide polymorphism chip, wildlife management

Correspondence: Carsten Nowak, Fax: 0049 (0) 6051 61954 3118; E-mail: [email protected]

Konstanzer Online-Publikations-System (KOPS) URL: http://nbn-resolving.de/urn:nbn:de:bsz:352-0-317440

Introduction

The return of large carnivore species (Enserink & Vogel 2006) to previously unoccupied habitats requires efficient monitoring to provide the data necessary for effective conservation and wildlife management. However, reliable data on species occurrences and densities are difficult to obtain because of the rarity and elusiveness of these species (Guschanski et al. 2009; Kery et al. 2011). For this reason, molecular genotyping of noninvasively collected samples such as hair or scat material, usually performed with great success by microsatellite analysis, is often applied to assist traditional monitoring (Linnell et al. 2007). Microsatellites are arrays of short tandem repeats (STRs) of 1–6-bp-long DNA sequence motifs. The number of repeats in these arrays is often highly variable among individuals (Selkoe & Toonen 2006), resulting in high numbers of alleles per microsatellite locus. The statistical power and resolution of genotyping that can be achieved with few markers but many alleles made microsatellites the marker of choice for the majority of studies in population genetics and wildlife monitoring over the past decades (Schlötterer 2004; Selkoe & Toonen 2006). These properties have made microsatellite genotyping a success, particularly in wildlife and conservation genetics, but microsatellites have significant drawbacks as well. Their mutation mode is complex and unclear (Schlötterer et al. 1998), and PCR artefacts such as stutter bands complicate automated allele calling. Noninvasively collected samples are particularly prone to high error rates (Taberlet et al. 1999), which are usually countered by applying a multiple-tubes approach (Navidi et al. 1992) or, in forensic applications, by meticulously reviewing raw fragment-length data together with specific allelic ladders (Hellmann et al. 2006, 2007). The former strategy is expensive and labour-intensive, while the latter is particularly time-consuming. Moreover, initial automation of microsatellite genotyping and standardisation between laboratories require substantial effort in nonmodel organisms. This severely complicates collaboration between working groups and cross-boundary wildlife management, especially for species that disperse over large ranges, and has led to a fragmentation of monitoring activities across Europe for many species. Single-nucleotide polymorphisms (SNPs) have gained attention as population genetic markers (Schlötterer 2004). SNPs are positions in the genome of an organism that constitute a stable polymorphism between individuals of a species, with the minor allele segregating at a frequency of at least 1% (Brookes 1999). The use of SNPs in ecology, evolution and conservation has long been proposed because of their advantages over microsatellites, such as known and predictable mutation modes and their high abundance throughout the genome (Morin et al. 2004). SNP-based genetic data can be easily standardised and do not depend on the laboratory or technology used. Thus, unlike most microsatellite data sets, SNP data can be readily incorporated into shared genetic databases.
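The SNP definition above (minor allele at a frequency of at least 1%) can be checked directly from genotype calls. The following is a minimal Python sketch, assuming genotypes are stored as two-character strings such as "AG" with None for missing calls; the function names `minor_allele_frequency` and `is_snp` are hypothetical and not part of any published pipeline.

```python
from collections import Counter


def minor_allele_frequency(genotypes):
    """Minor allele frequency at one biallelic locus.

    `genotypes` holds two-character strings such as "AA", "AG" or "GG";
    missing calls are None and are ignored.
    """
    counts = Counter(allele for g in genotypes if g is not None for allele in g)
    total = sum(counts.values())
    if total == 0 or len(counts) > 2:
        raise ValueError("expected at least one call at a biallelic locus")
    if len(counts) == 1:
        return 0.0  # monomorphic in this sample
    return min(counts.values()) / total


def is_snp(genotypes, min_maf=0.01):
    """True if the minor allele reaches the 1% threshold (Brookes 1999)."""
    return minor_allele_frequency(genotypes) >= min_maf
```

For example, 45 "AA", 4 "AG" and 1 "GG" calls give 6 minor alleles out of 100, a frequency of 0.06, so the locus qualifies as a SNP.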
A key limitation to the routine application of SNPs in wildlife genetics is the lack of genotyping technologies optimised for noninvasively collected samples. By their nature, SNP loci carry fewer alleles than microsatellites. How many additional markers are needed to offset their lower statistical power depends on many complex factors, such as the number and population frequencies of alleles per locus, but a two- to sixfold increase in the number of markers is required (Gärke et al. 2012); the required increase also depends on the question being investigated and does not scale linearly (Schopen et al. 2008). This amounts to 50–100 SNPs to provide statistical power similar to that of the 10–20 microsatellites routinely used in noninvasive wildlife monitoring studies. Significant improvements in SNP genotyping, such as multiplexing and automation (Chen & Sullivan 2003; Black & Vontas 2007), have enabled wildlife biologists to use several hundred SNPs for studying wild populations (Willing et al. 2010; Jonker et al. 2013; Kraus et al. 2013). Multiplexed SNP genotyping mostly involves large-scale SNP chips, which are either specifically developed for the target species (van Bers et al. 2012) or derived from genomic resources of related model or domesticated species (Ramos et al. 2009; Pertoldi et al. 2010; vonHoldt et al. 2011; Ogden et al. 2012; Hoffman et al. 2013). Alternatively, smaller SNP panels can be genotyped in many more individuals with technologies such as Illumina Bead Arrays (Fan et al. 2006) or Sequenom MassARRAY iPLEX (Bray et al. 2001). All of these technologies require high-quality and/or high-quantity DNA samples. However, noninvasively collected samples, such as hair, scat or urine, provide DNA of particularly low quality (due to degradation) and low quantity. Such samples have so far been genotyped at each SNP separately (Morin & McCarthy 2007), so genotyping the necessary number of SNPs required substantial manual effort and consumed large quantities of DNA. Whole genome amplification has been discussed as one possible solution to the problem of low DNA quantity (Kittler et al. 2002; Lasken & Egholm 2003), but such methods have been shown to be heavily biased towards amplification of longer fragments (Bergen et al. 2005) and can be prohibitively expensive. This bias is particularly worrisome for noninvasively collected material because it favours amplification of nondegraded, that is, nontarget DNA, such as bacterial DNA present in the sample or exogenous contamination with human DNA. The limited amount of target DNA extracted from noninvasively collected samples requires economical use of DNA. Nanoscale genetic analyses on microfluidic platforms (Senapati et al. 2009) have been developed to scale down the required amounts of both expensive chemistry and precious DNA. Wang et al. (2009) introduced a platform that reduces PCR reaction volumes to 6 nL while also offering a high level of automation.
Combining such nanofluidic PCR with single-plex SNP genotyping promises a cost-effective, robust and fast approach that conserves sample material and will probably prove valuable for work with noninvasively collected samples. In this study, we developed a 96-SNP panel for the grey wolf (Canis lupus) and provide genotyping protocols for noninvasively collected samples on the nanofluidic Dynamic Array Chip technology by Fluidigm Corp. (San Francisco, USA). Large-scale dog genomic resources (Lindblad-Toh et al. 2005; Boyko et al. 2010) have previously been applied to several wild canid species (e.g. vonHoldt et al. 2011). However, beyond the work of Seddon et al. (2005), which involved only tissue samples and a small panel of 24 SNPs genotyped in single-plex, no advances have been made in the development of SNP markers for noninvasive wolf monitoring. Thus, for wolves, but also in general, the aim of our study was to provide guidelines on how to mine SNP markers from published resources, to develop assays and protocols for genotyping 96 high-quality SNP markers simultaneously, and to evaluate genotyping performance with respect to missing data and error susceptibility.
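Why biallelic loci carry less information per marker than microsatellites can be illustrated with expected heterozygosity, 1 − Σp_i². This sketch uses it only as a simple proxy for per-locus resolving power, which is our assumption; the studies cited above use other power measures. A SNP can never exceed 0.5, whereas a hypothetical microsatellite with ten equally frequent alleles reaches 0.9.

```python
def expected_heterozygosity(freqs):
    """Expected heterozygosity 1 - sum(p_i^2) for one locus under HWE."""
    if abs(sum(freqs) - 1.0) > 1e-9:
        raise ValueError("allele frequencies must sum to 1")
    return 1.0 - sum(p * p for p in freqs)


# Best case for a biallelic SNP: two equally frequent alleles.
snp_best = expected_heterozygosity([0.5, 0.5])
# Hypothetical microsatellite with ten equally frequent alleles.
str_locus = expected_heterozygosity([0.1] * 10)
print(f"SNP maximum: {snp_best:.2f}, 10-allele STR: {str_locus:.2f}")
```

The gap between these two ceilings is one intuitive reason why several SNPs are needed to match the information of a single highly polymorphic microsatellite.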

Materials and methods

Wolf samples and DNA isolation

DNA was extracted from several sample types: tissue (N = 25), blood (N = 14), scat (N = 87), saliva (freshly collected, N = 11, and collected from wounds of kills, N = 1), urine/snow mixture (N = 10), urine stains with oestrus blood (N = 4) and hair (N = 6). Some samples were genotyped in duplicate or triplicate as internal controls: three tissue samples, five blood samples, nine scat samples, three saliva samples, one kill/saliva sample, three urine samples and one oestrus blood/urine sample. For a list of samples, see Supplementary File ‘sample list.xlsx’ on the Dryad data repository. Because of routine genetic wolf monitoring in Germany (Harms et al. 2011), the individual identities and sexes of all samples were known, and in many cases, familial relationships had been established by pedigree reconstruction (V. Harms, unpublished data). Although mainly German samples were available to us, we also included samples from Italy (N = 8), Poland (N = 4), Slovakia (N = 1) and Hungary (N = 1) so that as many alleles as possible would be represented in our data set. To test for cross-species amplification, we included potential wolf prey species in our analyses, because their DNA is expected to be present in wolf faeces or as contaminant DNA in saliva samples from wolf kills: Eurasian beaver Castor fiber (tissue; N = 4), wild boar Sus scrofa (tissue; N = 3), fox Vulpes vulpes (hair; N = 2), goat Capra aegagrus hircus (hair; N = 2), roe deer Capreolus capreolus (hair; N = 2), sheep Ovis aries (tissue; N = 2), wildcat Felis silvestris (hair; N = 2), and one each of cattle Bos primigenius taurus (hair), edible dormouse Glis glis (tissue), European hare Lepus europaeus (tissue), mouflon Ovis aries orientalis (tissue) and raccoon Procyon lotor (tissue). All DNA extractions were carried out using Qiagen kits (Hilden, Germany) and QIAcube 230V robotics, according to the manufacturer's instructions.

DNA from noninvasively collected samples was isolated in a laboratory room dedicated to processing noninvasively collected sample material (Taberlet et al. 1999). We extracted DNA from tissue (frozen and/or in ethanol) and from blood on FTA cards (Smith & Burgoyne 2004) with the Qiagen DNeasy Blood & Tissue Kit and diluted the DNA to 5 ng/µL, as measured on a Nanodrop ND-1000 (Thermo Scientific, Waltham, MA, USA), for further analyses. Scat samples were stored in 96% ethanol at room temperature, typically for 1–2 weeks, until DNA isolation with the QIAamp DNA Stool Mini Kit, and were kept at −20 °C for long-term storage. We processed saliva and hairs (single hairs or pooled tufts), stored dry at room temperature, with the QIAamp DNA Investigator Kit. Urine was collected from snow and transported to the laboratory frozen, and DNA was extracted according to Hausknecht et al. (2007).

SNP selection from Affymetrix data

Single-nucleotide polymorphisms were initially selected from data on 79 European grey wolves genotyped for 60 584 SNPs on the Affymetrix v2 Canine SNP Array, based on the canfam2 dog genome assembly (vonHoldt et al. 2011) (Table S1, Supporting information; unfortunately, these 79 samples were not available to us for later stages of this study, that is, for Fluidigm genotyping with our 96 SNP panel). Treating all wolves as one population, we used PLINK (Purcell et al. 2007) to filter for segregating SNPs that had at least 10% observed heterozygosity and showed no significant deviation from Hardy–Weinberg equilibrium (HWE), which resulted in 28 369 SNPs. From these, we excluded SNPs with genotype correlations of r² > 0.2 (also in PLINK), reducing the list to 17 299 SNPs (see Supplemental Methods in vonHoldt et al. 2011). Next, we excluded SNPs within known or hypothetical gene boundaries according to the canfam2 dog genome assembly and annotation (Lindblad-Toh et al. 2005) and picked 667 SNPs such that inter-SNP distances were at least 500 kb (final average 2.9 Mb).
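The heterozygosity, HWE and spacing filters described above can be sketched in Python. This is not the PLINK implementation: PLINK's HWE filter is based on an exact test, whereas this sketch swaps in the simpler 1-df chi-square approximation, and the significance level alpha = 0.05 is our assumption because the threshold is not stated. The LD pruning step (r² > 0.2) and the gene-boundary exclusion are omitted. All function names are hypothetical.

```python
import math


def hwe_chisq_pvalue(n_aa, n_ab, n_bb):
    """Chi-square (1 df) test of Hardy-Weinberg proportions.

    A simplification: PLINK uses an exact test for its HWE filter.
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)
    q = 1 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Survival function of a chi-square with 1 df: erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(chi2 / 2))


def passes_filters(n_aa, n_ab, n_bb, min_het=0.10, alpha=0.05):
    """Observed heterozygosity >= 10% and no significant HWE deviation."""
    n = n_aa + n_ab + n_bb
    if n == 0 or n_ab / n < min_het:
        return False
    return hwe_chisq_pvalue(n_aa, n_ab, n_bb) >= alpha


def thin_by_distance(snps, min_gap_bp=500_000):
    """Greedily keep SNPs at least `min_gap_bp` apart on each chromosome.

    `snps` is an iterable of (snp_id, chromosome, position_bp) tuples.
    """
    kept, last_pos = [], {}
    for snp_id, chrom, pos in sorted(snps, key=lambda s: (s[1], s[2])):
        if chrom not in last_pos or pos - last_pos[chrom] >= min_gap_bp:
            kept.append(snp_id)
            last_pos[chrom] = pos
    return kept
```

For example, genotype counts of 40/20/40 pass the heterozygosity filter (20%) but fail the HWE test because of the strong heterozygote deficit, while `thin_by_distance` keeps the first SNP on each chromosome and then every SNP lying at least 500 kb beyond the last one kept.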

Assay development and genotyping

From the 667 SNP candidates, we selected 192 that were distributed across as many chromosomes as possible and designed two sets of 96 Fluidigm SNPtype assays to match the technical layout of Fluidigm's technology for genotyping biallelic SNPs on the ‘96.96 Dynamic Array Chip for Genotyping’ (http://www.fluidigm.com). Similar to Amplifluor genotyping (see Rickert et al. 2004; Morin & McCarthy 2007), one pair of primers first amplifies the locus containing the SNP, followed by an allele-specific internal PCR in which each allele-specific primer is fluorescently labelled. Before the genotyping reaction, all 96 SNP loci are preamplified in a so-called specific target amplification (STA) reaction, using 1.25 µL of template DNA (20 ng/µL by manufacturer recommendation) and all 96 locus-specific primers in the same PCR. For details of the method, see Nussberger et al. (2014). Each of the two sets was genotyped on IFCs (integrated fluidic circuits) containing the above-mentioned samples and 11 nontemplate controls (NTCs). These IFCs harbour nanoscale PCR reaction chambers with reaction volumes of 6 nL, into which the user loads 96 samples and 96 SNP assays. The dispersion of 96 assays by 96 samples into 9216 nano-PCR chambers is performed on Fluidigm IFC controllers (Wang et al. 2009).

We modified the original genotyping protocol to accommodate the low DNA quality and quantity of noninvasively collected samples. Initial STA products were diluted 1:10 instead of the 1:100 recommended by the manufacturer. The number of PCR cycles was extended from 38 (recommended by the manufacturer; hereafter ‘c1’) to 42 (‘c2’), 46 (‘c3’) and 50 cycles (‘c4’); these four protocols are hereafter referred to as ‘genotyping treatments’. All NTCs showing significant fluorescence were invalidated manually before the clustering algorithm was applied, a strategy advised by Fluidigm. Loci for which all NTCs had to be excluded were set to missing data in all samples. Note that NTCs regularly display fluorescence signals in the absence of template DNA on the Fluidigm system. This is not of particular concern: in the presence of template DNA, PCR competition will favour the matching target. Additionally, contamination can be excluded because these NTCs do not consistently yield genotypes across all SNP assays (Beatrice Nussberger and Fluidigm Support Service, personal communication). With careful scrutiny and exclusion of samples that fail at a large proportion of loci, it should be possible to filter out most, if not all, spurious genotypes. Eventually, visual inspection of genotype clustering across the four genotyping treatments favoured treatment ‘c2’ with regard to the trade-offs between cluster tightness, missing data and error frequency. Thus, c2 was later used for calculations in the ‘final’ 96 SNP set (see Results for details).
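The size of one 96.96 IFC and the bookkeeping rule for no-template controls can be expressed as a short Python sketch. The data layout (a dict of per-locus calls) and the function name `apply_ntc_rule` are hypothetical; in the study, NTCs were invalidated manually in the Fluidigm software, and this code only mirrors the rule that a locus whose NTCs were all excluded is set to missing in every sample.

```python
N_SAMPLES = 96
N_ASSAYS = 96
N_CHAMBERS = N_SAMPLES * N_ASSAYS  # 9216 nano-PCR chambers per 96.96 IFC


def apply_ntc_rule(calls, ntc_ids, invalidated):
    """Set a locus to missing in all samples if every NTC at that locus was excluded.

    calls:       {locus: {sample_id: genotype string or None}}
    ntc_ids:     set of no-template-control IDs on the chip
    invalidated: {locus: set of NTC IDs removed for showing fluorescence}
    """
    cleaned = {}
    for locus, by_sample in calls.items():
        # Only apply the rule when the chip actually carries NTCs.
        if ntc_ids and ntc_ids <= invalidated.get(locus, set()):
            cleaned[locus] = {sample: None for sample in by_sample}
        else:
            cleaned[locus] = dict(by_sample)
    return cleaned
```

Returning a new dictionary rather than editing `calls` in place keeps the raw calls available for comparison, for instance when checking how many loci were lost to the NTC rule in each genotyping treatment.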

Genotype evaluation and error calculation

Across all four genotyping treatments and for every SNP, we curated cluster plots manually in a combined analysis of all 277 samples in the Fluidigm SNP Genotyping Analysis Software v3.1.2. Genotypes were exported in table format for further manual evaluation and genotyping error calculation. Three measures of genotyping consistency were employed: (i) general assay performance, (ii) genotyping consistency across dilution series and (iii) genotyping consistency across samples of the same individual.

(i) General assay performance: for each SNP, we counted the number of samples that lacked an assigned genotype (missing data). Further, we counted how often a prey species sample was assigned a wolf SNP genotype (cross-species testing).

(ii) Consistency across dilution series: we chose 23 samples of ‘good-quality’ DNA obtained from tissue or blood (hereafter ‘reference samples’) to determine rates of missing data and genotyping consistency across four DNA concentrations: 5 (reference sample), 2, 0.5 and 0.2 ng/µL. First, mean missing data counts for all loci across the 23 reference samples were calculated for all four concentrations. Second, genotypes of the three dilutions were compared with the 5 ng/µL reference sample, and errors were scored either as allelic dropout (i.e. an allele present in the reference sample is absent in the dilution) or as false allele (i.e. an allele present in the dilution is absent in the reference). Differences in rates of missing data or errors between dilution steps were tested for statistical significance with the wilcox.exact() test from the R package ‘exactRankTests’ (R Development Core Team 2009), because the data contained ties and nearly all data sets had a non-normal distribution [Shapiro–Wilk test in R, function shapiro.test()].

(iii) Genotyping consistency across samples of the same individual: for 38 wolves, we had multiple samples comprising one ‘high-quality reference sample’ and at least one noninvasively collected sample. Samples were considered ‘failed’ for a specific genotyping treatment when >25% of SNPs showed missing data. Failed samples were excluded from this analysis to avoid sample bias. Similarly, SNP assays were considered ‘failed’ when >50% of the nonexcluded samples (after applying the 25% criterion above) showed missing data; these were also not included in evaluating genotyping error (to avoid assay bias). Allelic differences between a reference sample and its noninvasively collected counterparts were scored as allelic dropout or false allele.
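Measures (ii) and (iii) rest on two simple operations: comparing the allele sets of a reference genotype and a replicate, and applying missing-data thresholds to exclude samples (>25%) and SNP assays (>50%). The following is a minimal Python sketch, assuming genotypes are two-character strings with None for missing; the function names are hypothetical. Scoring a homozygous mismatch (e.g. AA vs. GG) as both a dropout and a false allele is our reading of the definitions above, not something the text specifies.

```python
def score_errors(reference, replicate):
    """Return (dropout, false_allele) flags for one replicate genotype.

    dropout:      an allele in the reference is absent from the replicate
    false allele: an allele in the replicate is absent from the reference
    """
    if reference is None or replicate is None:
        return (False, False)  # missing calls are not scored as errors
    ref, rep = set(reference), set(replicate)
    return (bool(ref - rep), bool(rep - ref))


def failed_samples(calls, max_missing=0.25):
    """Samples with >25% missing SNPs; calls = {sample: [genotype or None, ...]}."""
    return {s for s, g in calls.items()
            if sum(x is None for x in g) / len(g) > max_missing}


def failed_snps(calls, keep_samples, max_missing=0.50):
    """Indices of SNPs missing in >50% of the retained samples."""
    n_snps = len(next(iter(calls.values())))
    failed = set()
    for i in range(n_snps):
        values = [calls[s][i] for s in keep_samples]
        if values and sum(v is None for v in values) / len(values) > max_missing:
            failed.add(i)
    return failed
```

Sample filtering is applied before SNP filtering, mirroring the order described in measure (iii), so that a few very poor samples cannot cause otherwise reliable assays to be discarded.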

Selection of the final 96 SNP set

To reach our target of 96 reliable SNPs for the assay, we first excluded from the initial 192 SNP set all SNPs with missing data in >10% of the wolf samples. Next, we excluded SNPs with two or more genotyping errors in the genotyping consistency check (measure iii). Combined, these steps removed 69 of the 192 SNPs. To trim the remaining 123 SNPs down to 96 (the number of SNPs that can be processed on the IFC), we removed the 27 SNPs with the visually lowest quality based on the SNP scatter plots of the genotyping software. Additionally, we tested the performance of this final set of 96 SNPs for individual identification. As a preliminary assessment, we estimated the probability of identity (PID) and the probability of identity among siblings (PIDsib). We searched our data on German wolves (to avoid inflating statistical power by including distant populations) for one sample per individual, choosing the sample with the least missing data when multiple samples were available. We then removed samples with missing data at more than 25% of loci (see above). PID and PIDsib were calculated in GenAlEx 6.501 (Peakall & Smouse 2006).
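The sample selection used for the PID estimates (one sample per individual with the least missing data, then dropping samples with >25% missing loci) might look like the following Python sketch. The tuple layout and the name `pick_representatives` are hypothetical, and ties are resolved by keeping the first sample encountered, which is an arbitrary choice.

```python
def pick_representatives(samples, max_missing=0.25):
    """One sample per individual: the one with the fewest missing calls.

    samples: list of (individual_id, sample_id, genotypes), None = missing.
    Representatives with more than `max_missing` missing loci are dropped.
    """
    best = {}
    for ind, sid, geno in samples:
        n_missing = sum(g is None for g in geno)
        # Strict "<" keeps the first sample when two are equally complete.
        if ind not in best or n_missing < best[ind][1]:
            best[ind] = (sid, n_missing, len(geno))
    return {ind: sid for ind, (sid, n_missing, n_loci) in best.items()
            if n_missing / n_loci <= max_missing}
```

For example, an individual with a complete tissue genotype and a patchy scat genotype is represented by the tissue sample, while an individual whose only sample lacks half of its loci is excluded altogether.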

Results

General assay performance (i)

Tables with raw genotype output are available via the Dryad data repository. Missing data rates per SNP were similar among genotyping treatments c1–c4 (Table 1): a median of 23 missing genotypes per SNP for c1 (1st quartile 17.75, 3rd quartile 32), 17 (Q1: 13, Q3: 24.25) for c2, 15 (Q1: 12, Q3: 21) for c3 and 17 (Q1: 11.75, Q3: 23) for c4. Although not statistically significant, probably because of the small sample size and large spread, there was an apparent trend towards less missing data from c1 to c2, whereas the further decrease from c2 to c3 and c4 appeared marginal (Fig. 1). In c3, 155 SNPs had missing data in fewer than 10% of the wolf samples (‘well-performing loci’), followed by 150 SNPs in c2, 149 in c4 and 122 in c1. Cross-species testing revealed genotype calls in potential prey species. There was no obvious pattern as to which taxon had the highest cross-amplification success or under which assay conditions cross-amplification was lowest. In the absence of wolf DNA, between 22 (c4) and 53 (c1) SNPs produced a genotype in <10% of the 22 prey species samples tested. However, the mean proportion of successfully amplifying SNPs per prey sample was only 32%. Therefore, if no wolf DNA is present in a scat sample, prey DNA can sometimes give rise to spurious multilocus genotypes, but these genotypes are incomplete and can therefore easily be detected and removed.

Table 1 Amounts of missing data across all SNPs for each treatment. Given are medians with their 1st (Q1) and 3rd (Q3) quartiles. Cf. Fig. 1 for a graphical representation in boxplots.

Treatment        Q1      Median   Q3
c1               17.75   23       32
c2               13      17       24.25
c3               12      15       21
c4               11.75   17       23
c2: final set*   12      15       19

*Values for the final set of 96 selected SNPs; genotyping condition c2.

Fig. 1 Missing data comparison among genotyping treatments (192 SNPs) and the final 96 SNP set (measure i). Box-and-whisker plots display the counts of missing data per SNP across all samples (y axis); that is, a small fraction of SNPs has missing data for most samples (open circles displayed individually at higher values of missing data), while the bulk of SNP assays is not displayed individually because they lie within the boxes of the plot, with values well below a count of 50. c1 to c4 correspond to genotyping treatments as defined in the Methods section. Data points falling within the whiskers of the plots are not displayed. The final set of 96 SNPs was evaluated under cycling condition c2.

Genotyping consistency across dilution series (ii)

Variation in missing data and genotyping error rates across loci, dilution series and genotyping treatments was considerable. Although statistical tests for differences between successive dilutions were not significant (Fig. 2), there was a trend towards more missing data and genotyping errors at higher dilutions. When comparing genotyping treatments, we also found no evidence that increasing cycle numbers affected the rate of missing data or genotyping errors (Fig. 2).
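The per-SNP summaries in Table 1 and the count of ‘well-performing loci’ (missing data in <10% of wolf samples) can be sketched with the Python standard library. We do not know which quartile convention was used in the study; `statistics.quantiles(..., method="inclusive")` interpolates linearly between order statistics, like R's default `quantile()` (type 7), which can yield fractional quartiles such as 17.75. The function names are hypothetical.

```python
import statistics


def missing_summary(missing_per_snp):
    """(Q1, median, Q3) of per-SNP counts of samples with missing genotypes."""
    # method="inclusive" interpolates linearly between order statistics,
    # the same rule as R's default quantile() (type 7).
    q1, median, q3 = statistics.quantiles(missing_per_snp, n=4, method="inclusive")
    return q1, median, q3


def well_performing(missing_per_snp, n_wolf_samples, max_fraction=0.10):
    """Number of SNPs with missing data in fewer than 10% of wolf samples."""
    return sum(m / n_wolf_samples < max_fraction for m in missing_per_snp)
```

Applied to the per-SNP missing-data counts of one genotyping treatment, `missing_summary` gives one row of Table 1 and `well_performing` gives the corresponding number of well-performing loci.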

Genotyping consistency across samples of the same individual (iii)

Across genotyping treatments, ~10% of the samples had to be discarded according to the 25% missing data criterion (see Methods). The best-performing genotyping treatment was c2 (89.5% of samples usable; Table 2). Excluding genotypes of unusable samples (25% missing data criterion) and unusable SNPs (50% SNP criterion, see Methods), ~75% of all possible genotypes (38 individuals × 192 SNPs = 7296 genotypes) were called (max. 76.5% for c2). Error rates ranged between 3 and 3.5% (summary and details in Table 2).

Fig. 2 Genotyping performance in dilution series (measure ii). For each PCR genotyping treatment c1–c4 as defined in the Methods section and the final set of 96 SNPs (evaluated under treatment c2), we show means of missing data counts across all SNPs with standard deviations in the top panel. The darkest bars are the 5 ng/µL reference samples; lighter shades of grey represent, from left to right, dilutions of the reference sample to 2, 0.5 and 0.2 ng/µL. In the middle and bottom panels, error rates are displayed between the reference sample and each of the three dilutions.

Table 2 Genotyping consistency across samples of the same individual (measure iii)

Treatment        Usable samples (%)*   Genotypes†       Dropout (%)   False allele (%)   Combined (%)
c1               86.84                 5503 (75.42%)    2.36          0.64               3.00
c2               89.47                 5580 (76.48%)    2.17          1.11               3.28
c3               89.47                 5492 (75.27%)    2.37          1.15               3.51
c4               89.47                 5266 (72.18%)    2.05          1.16               3.21
c2: final set‡   92.11                 3301 (90.49%)    0.85          0.21               1.06

*Samples retained under the 25% missing data criterion.
†That is, all possible genotypes (38 samples × 192 SNPs = 7296 genotypes) minus those that do not count according to the 25% sample and 50% SNP criteria (see Methods for details). NB: for the final set, the possible genotypes are 38 × 96 SNPs = 3648 genotypes.
‡Values for the final set of 96 selected SNPs; genotyping condition c2.
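The percentages in Table 2 follow from simple ratios, which the sketch below recomputes. The counts of usable samples (33, 34 and 35 of 38) are our back-calculation from the reported percentages, not values stated in the text; the function names are hypothetical.

```python
def usable_fraction(n_usable, n_individuals=38):
    """Percentage of the 38 samples that passed the 25% missing data criterion."""
    return 100 * n_usable / n_individuals


def called_fraction(n_called, n_individuals=38, n_snps=192):
    """Called genotypes as a percentage of all possible genotypes."""
    return 100 * n_called / (n_individuals * n_snps)


for label, usable, called, snps in [("c1", 33, 5503, 192), ("c2", 34, 5580, 192),
                                    ("final", 35, 3301, 96)]:
    print(label, round(usable_fraction(usable), 2),
          round(called_fraction(called, n_snps=snps), 2))
```

Note that the denominator changes from 7296 to 3648 possible genotypes for the final set, because only 96 SNPs are scored there.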

Selection of the final genotyping treatment and the core 96 SNP set

Overall, the genotyping treatments performed quite similarly, but c2 showed slightly lower rates of missing data than c1. Because the differences between the c2, c3 and c4 treatments were marginal, and because c2 required the smallest change to the original Fluidigm protocol and entailed the lowest risk of creating PCR artefacts, we used the c2 treatment to compare the full 192 SNP set with the final 96 SNP set. Missing data (measure i) in the final 96 SNP set (median 15, Q1 12, Q3 19) were slightly lower than in the 192 SNP set but showed a strongly reduced spread (Fig. 1). We were able to select SNPs such that all 96 loci had <10% missing data in the wolf samples. Accordingly, missing data and error rates in the dilution series (measure ii) were much lower for the final 96 SNP set than in any of the four genotyping treatments of the 192 SNP set (Fig. 2). The final error measures (measure iii) were consistently lower than in the 192 SNP set, while more samples were usable and more genotypes could be called. Dropout errors were detected in only 0.85% of all genotypes and false alleles in 0.21%, resulting in a total error rate of 1.06% in our final 96 SNP set (Table 2). A list of SNP names and assay configurations is available as Supplementary File ‘plate layout 96 SNPs.xlsx’ in the Dryad data repository. As a preliminary test, statistical power was assessed with genotypes of 13 German wolves. The full set of 96 loci discriminates individuals with PID = 6.97 × 10⁻²⁰ and PIDsib = 1.32 × 10⁻¹⁰. A probability of identity below 1 in 10 000 was already reached with a combination of 25 loci for PID and 47 loci for PIDsib.
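PID and PIDsib were computed in GenAlEx. For orientation, the following Python sketch uses the widely used per-locus formulas for unrelated individuals, PID = Σp_i⁴ + Σ_{i<j}(2p_ip_j)², and for full siblings, PIDsib = 0.25 + 0.5Σp_i² + 0.5(Σp_i²)² − 0.25Σp_i⁴; we assume, but have not verified, that they match the GenAlEx implementation. Multiplying across loci assumes independence, and the order in which `loci_needed` accumulates loci (most informative first) is also our assumption, since the text does not state how the 25- and 47-locus combinations were formed.

```python
from itertools import combinations
from math import prod


def pid(freqs):
    """Per-locus probability of identity for unrelated individuals."""
    return (sum(p ** 4 for p in freqs)
            + sum((2 * p * q) ** 2 for p, q in combinations(freqs, 2)))


def pid_sib(freqs):
    """Per-locus probability of identity among full siblings."""
    s2 = sum(p ** 2 for p in freqs)
    s4 = sum(p ** 4 for p in freqs)
    return 0.25 + 0.5 * s2 + 0.5 * s2 ** 2 - 0.25 * s4


def multilocus(values):
    """Product of per-locus values, assuming independent loci."""
    return prod(values)


def loci_needed(per_locus, threshold=1e-4):
    """Smallest number of loci (most informative first) whose product < threshold."""
    running = 1.0
    for k, value in enumerate(sorted(per_locus), start=1):
        running *= value
        if running < threshold:
            return k
    return None
```

Even an ideal SNP with two equally frequent alleles has a per-locus PID of 0.375 and PIDsib of about 0.59, which is why tens of loci are needed to reach 1 in 10 000, and why PIDsib requires roughly twice as many loci as PID.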

Discussion

Genetic wildlife monitoring (Schwartz et al. 2007; Luikart et al. 2010) is routinely used in addition to traditional conservation and management programmes (Barea-Azcón et al. 2007; Gula et al. 2009; Hausknecht et al. 2010). However, it usually requires significant sampling effort and the analysis of many samples, and is therefore often both time- and cost-intensive (De Barba et al. 2010; Steyer et al. 2013). Noninvasive sampling has many pitfalls, including the need for replicated genotyping to overcome low success rates and genotyping errors despite the limited quantity of adequate sample material (Taberlet et al. 1999). These issues have not been overcome in the two decades since the first implementations of ‘noninvasive genetics’ in the early 1990s (Höss et al. 1992; Taberlet & Bouvet 1992; Morin et al. 1993). SNP marker panels based on extensively tested multiplex PCR sets are now increasingly being evaluated in human forensics (Krjutskov et al. 2009; Westen et al. 2009). Here, we present a cost-effective and feasible SNP genotyping method for noninvasively collected wildlife samples that overcomes frequently discussed problems of microsatellite analysis, such as high rates of genotyping error and the resulting need for multiple replicates. This promises to address long-standing problems of noninvasive genetic monitoring, such as costly and laborious multiple replication, the lack of standardisation between laboratories and, consequently, the lack of large-scale, cross-boundary genotype databases for endangered wildlife. Manual, labour-intensive single-plex SNP genotyping of difficult DNA samples, such as noninvasively collected or old material, has been performed before in wildlife forensics (Morin & McCarthy 2007), but the reaction volumes, usually between 5 and 50 µL, were roughly three to four orders of magnitude larger than those used here.
Fluidigm SNPtype assays are designed for reaction volumes of only 6 nL, and our study provides a proof of principle that this system works well for low-quality and low-quantity DNA such as that from noninvasively collected sample material. We showed that missing data were below 10% for every SNP in our final panel, which is low compared with the often high rates of missing data observed in microsatellite-based noninvasive studies (Fickel et al. 2012; Kopatz et al. 2012). Inferring and handling genotyping errors is a commonly reported issue in the noninvasive genetic monitoring literature, and genotyping error rates vary widely, from little or no error to nearly 50% (Broquet & Petit 2004). Assessing every detail of error rate estimation is beyond the scope of this study, but a few examples illustrate this point. In noninvasive microsatellite studies of wolf (Lucchini et al. 2002) or wildcat (Hartmann et al. 2013; Steyer et al. 2013), error rates can exceed 10%, and they reach 4% in beaver (Frosch et al. 2014). Low error rates for microsatellites are usually achieved by replicate PCR. We improved genotyping accuracy greatly with our 96 SNP set (overall error rate ~1%) without the need for PCR replication. We also note that the comparison of our 192 SNP set with the final 96 SNP set remains incomplete until our error rates have been validated with new samples in an independent experiment. It is therefore premature to compare our SNP genotyping error rate systematically with those obtained in other studies. A commonly used genotyping system in molecular ecology is Illumina Bead Arrays (Fan et al. 2006; Jonker et al. 2013; Kraus et al. 2013). A recent in-depth error assessment of this technology revealed an error rate far below 1% (Hoffman et al. 2012). However, this method requires larger quantities of template DNA than can be obtained by noninvasive sample collection.
The few existing examples of Fluidigm SNP genotyping also indicate an error rate of nearly 0%, but again only when the DNA template is of standard quality and quantity (Wang et al. 2009; Bhat et al. 2012). The ease of use of the presented method alone constitutes a major advantage. For microsatellites, each sample requires three multiplex PCRs with four replicates each, which consumes 45.6 µL of template DNA. (NB: DNA isolation from scat yields large amounts of DNA because of the many bacteria present. Because this is not target DNA, we do not quantify DNA in our isolates in units such as nanograms, and we therefore compare volumes of DNA isolate rather than amounts of DNA.) Precious samples are therefore quickly used up and are often unavailable for resolving unclear results or for follow-up studies. In contrast, the SNPtype method combined with Fluidigm’s nanofluidic technology requires manual pipetting of only 96 preamplified samples and 96 prepared assays onto the IFC. The full factorial combination of 96 × 96 = 9216 PCRs is then performed automatically. Critically, as little as 1.25 µL per sample is used during STA for the first step of the SNPtype method. Source material containing very low amounts of DNA, such as single hairs (Nussberger et al. 2014), can thus be used for DNA preparation with very low volumes of final elution buffer. For microsatellites, genotyping costs include PCR chemistry and hot-start polymerase, subsequent fragment analysis on a capillary sequencer (primers not included in the calculations) and replication. In our laboratory, analysing one already isolated DNA sample with microsatellites costs 2.2 times as much as with the SNPtype method. The Fluidigm system also requires less manual labour because it involves far fewer pipetting steps. Initial assay costs (35 Euro per assay) are excluded from this estimate because the delivered quantity of assays lasts for 14 400 samples, which is much cheaper than ordering primers for microsatellite genotyping. An order of 192 assays, to obtain 96 that work appropriately, therefore involves an initial cost of 6720 Euro. For microsatellites, the initial investment depends strongly on the study. For example, in a recent effort from our laboratory (Nowak et al. 2014), we tested 81 primer pairs, of which 45 loci were sequenced to establish their suitability. Of those, 29 primers were additionally ordered with fluorescent labels. This amounted to approximately 3890 Euro in set-up costs before genotyping. Thus, the difference in set-up costs between SNP assays and microsatellite assays is only around 3000 Euro. Moreover, as explained above, microsatellite primers need to be reordered more often than SNP assays. Equipment is another dimension of the cost comparison. Fluidigm equipment and set-up service cost about 110 000 Euro (Fluidigm EP1 system). By comparison, a second-hand ABI 3730xl sequencer for microsatellite fragment analysis, comparable in throughput to the Fluidigm machine, costs about 70 000 Euro.
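The cost, template-volume and reaction-count figures quoted above can be reproduced with simple arithmetic. The Python sketch below uses only numbers stated in the text, except for one illustrative assumption of ours: the per-sample amortisation spreads the full 192-assay order over the 14 400 samples it is said to cover. This is not a figure reported by the authors, and all constant and function names are hypothetical.

```python
# Figures quoted in the Discussion (Euro, microlitres, counts).
PRICE_PER_ASSAY_EUR = 35
ASSAYS_ORDERED = 192            # ordered to retain 96 well-performing assays
SAMPLES_PER_ASSAY_ORDER = 14_400
MICROSAT_SETUP_EUR = 3_890      # example set-up cost (Nowak et al. 2014)
FLUIDIGM_EQUIPMENT_EUR = 110_000
SEQUENCER_EQUIPMENT_EUR = 70_000
TEMPLATE_UL_MICROSAT = 45.6     # 3 multiplex PCRs x 4 replicates
TEMPLATE_UL_SNPTYPE = 1.25      # volume used during STA
IFC_SAMPLES, IFC_ASSAYS = 96, 96


def snp_setup_cost_eur() -> int:
    return PRICE_PER_ASSAY_EUR * ASSAYS_ORDERED


def assay_cost_per_sample_eur() -> float:
    # Illustrative assumption: the whole order, including discarded assays,
    # is spread evenly over the samples it covers.
    return snp_setup_cost_eur() / SAMPLES_PER_ASSAY_ORDER


def reactions_per_ifc() -> int:
    # Every sample is combined with every assay on the chip.
    return IFC_SAMPLES * IFC_ASSAYS


if __name__ == "__main__":
    print(snp_setup_cost_eur())                                  # 6720
    print(snp_setup_cost_eur() - MICROSAT_SETUP_EUR)             # 2830
    print(FLUIDIGM_EQUIPMENT_EUR - SEQUENCER_EQUIPMENT_EUR)      # 40000
    print(round(assay_cost_per_sample_eur(), 3))                 # 0.467
    print(round(TEMPLATE_UL_MICROSAT / TEMPLATE_UL_SNPTYPE, 1))  # 36.5
    print(reactions_per_ifc())                                   # 9216
```

The 2830 Euro result corresponds to the set-up cost difference of "around 3000 Euro" stated in the text. The template ratio shows that the SNPtype workflow consumes roughly 36 times less DNA isolate per sample.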
Furthermore, the requirement to run entire IFCs can be a drawback when only small numbers of samples need to be analysed. However, IFC layouts are also available for 48 samples × 48 SNPs and 192 samples × 24 SNPs (www.fluidigm.com). For small projects on rare species, microsatellites may therefore still be the cheaper option in terms of initial investment, though certainly not in terms of hands-on laboratory work. Nevertheless, the initial cost differences are not large, and we believe the Fluidigm system will pay off quickly, especially for continuous species monitoring efforts. In this study, we tested laboratory protocols for a parallel SNP genotyping platform in order to adapt this system for use with noninvasively collected samples. Our approach can provide highly accurate genotypes for noninvasively collected samples. This requires that quality controls similar to ours are implemented to exclude poor-quality samples; that is, missing data should be evaluated as we did (cf. our measure of genotyping consistency iii).

We also show that NTCs sometimes display fluorescent signals. This seems counterintuitive for a negative control. However, in the reaction set-up of the Fluidigm assays, the NTCs serve mainly to normalise fluorescence calculations. Like the KASPar assays from KBioscience (now LGC Genomics, Hoddesdon, UK), SNPtype assays contain a dual-FRET cassette in the master mix. The cassette's fluorophores are bound to a sequence complementary to the specific assay primer tails. Primer dimers can therefore sometimes produce a signal above the detection threshold in the absence of a PCR target DNA fragment. Preliminary tests have shown that treating NTCs with an exonuclease I/shrimp alkaline phosphatase (ExoSAP) clean-up after STA removes enough unincorporated dNTPs and primers to reduce excess fluorescence signals in the genotyping reaction. In addition, Fluidigm advised us to use NTCs that had not been taken through the STA step, to avoid increased levels of nonspecific fluorescence. Caution is therefore needed when interpreting NTCs, and further optimisation is required to resolve this issue completely. A further step to identify ‘nonsense’ genotypes could be a principal component analysis of the genotypes. This would identify genetic outliers, that is, samples that either did not contain sufficient target DNA or were not from the target species (Kraus et al. 2012). Assay failure rates are relatively low, and error rates are far below those reported in the literature for traditional microsatellite systems. Additionally, our method is cheaper and faster, requires less handling and allows easy standardisation between laboratories.
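A principal component screen of the kind suggested above might look like the following Python/NumPy sketch. It is illustrative only, not the procedure of Kraus et al. (2012), and it rests on these assumptions:

- Genotypes are coded as 0, 1 or 2 copies of one allele, with NaN for missing calls.
- Missing values are mean-imputed per SNP.
- A sample is flagged when its distance from the median position on the first two components exceeds the median distance by more than k median absolute deviations. The value k = 3 is an arbitrary choice.

The function names are hypothetical.

```python
import numpy as np


def pca_scores(genotypes: np.ndarray, n_components: int = 2) -> np.ndarray:
    """Return each sample's scores on the leading principal components.

    genotypes: samples x SNPs array coded 0/1/2, with np.nan for missing calls.
    """
    g = np.asarray(genotypes, dtype=float).copy()
    col_means = np.nanmean(g, axis=0)
    # Mean-impute missing calls so they sit at the SNP average.
    rows, cols = np.where(np.isnan(g))
    g[rows, cols] = col_means[cols]
    centred = g - g.mean(axis=0)
    # Thin SVD of the centred matrix; U * S gives the PC scores.
    u, s, _ = np.linalg.svd(centred, full_matrices=False)
    return u[:, :n_components] * s[:n_components]


def flag_outliers(genotypes: np.ndarray, k: float = 3.0) -> np.ndarray:
    """Boolean mask of samples lying unusually far from the bulk in PC space."""
    scores = pca_scores(genotypes)
    dist = np.linalg.norm(scores - np.median(scores, axis=0), axis=1)
    med = np.median(dist)
    mad = np.median(np.abs(dist - med))
    return dist > med + k * mad


if __name__ == "__main__":
    # Toy data: 20 similar profiles plus one sample homozygous for the other allele everywhere.
    typical = np.zeros((20, 100))
    typical[np.arange(20), np.arange(20)] = 1.0
    geno = np.vstack([typical, np.full((1, 100), 2.0)])
    geno[0, 30] = np.nan  # one missing call
    print(np.flatnonzero(flag_outliers(geno)))
```

A flagged sample is a candidate for closer inspection, for example low target-DNA content or a non-target species. It is not an automatic exclusion, and the cut-off would need calibrating on real data.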
Now that the method has been adopted and evaluated, the details of its implementation as a routine monitoring system remain to be resolved in future studies. These include the integration of sex determination, SNPs specific to certain mitochondrial haplotypes, and functional SNPs. Matching genotypes to assign samples to individuals may be possible even with the levels of missing data and error found in this study (Galpern et al. 2012). For instance, two samples from the same individual may show a certain number of mismatches (perhaps in the range of 0–5%, given our error rate), whereas two samples from different individuals should always show many more. However, a detailed evaluation of this genetic fingerprinting functionality is beyond the scope of this study and will be part of future investigations. The coming years will see a shift in genetic monitoring methods due to technological advances in both genotyping and marker development. SNPs have been advocated for many years as a superior alternative to microsatellites for many applications, and we expect them now to make their way into routine noninvasive wildlife monitoring. Our method is applicable to basically any organism once SNP markers are available. Such marker sets can be developed with relatively little effort from existing large-scale SNP chips where applicable (cf. this study), or de novo using next-generation sequencing technologies (Davey et al. 2011; Kraus et al. 2011; Ogden 2011; Seeb et al. 2011).
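The mismatch logic described above for assigning samples to individuals can be sketched in a few lines of Python. This is a simplified illustration, not the ALLELEMATCH algorithm of Galpern et al. (2012). It makes these assumptions:

- Genotypes are coded 0/1/2, with None for missing calls.
- Loci missing in either sample are skipped.
- Two samples are linked whenever their mismatch count does not exceed a user-chosen threshold (`max_mismatches`, a hypothetical parameter).
- Linked samples are merged into groups with a union-find structure.

```python
from itertools import combinations
from typing import List, Optional, Sequence

Profile = Sequence[Optional[int]]  # 0/1/2 per locus, None = missing (hypothetical coding)


def mismatches(a: Profile, b: Profile) -> int:
    """Number of loci at which both samples were called but differ."""
    return sum(x != y for x, y in zip(a, b) if x is not None and y is not None)


def group_samples(profiles: List[Profile], max_mismatches: int) -> List[int]:
    """Give each sample a group label; samples linked within the threshold share a label."""
    parent = list(range(len(profiles)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving keeps the trees shallow
            i = parent[i]
        return i

    for i, j in combinations(range(len(profiles)), 2):
        if mismatches(profiles[i], profiles[j]) <= max_mismatches:
            parent[find(i)] = find(j)
    return [find(i) for i in range(len(profiles))]


if __name__ == "__main__":
    samples = [
        [0, 1, 2, 1, 0, 2, 1, 1],
        [0, 1, 2, 1, 0, 2, 1, None],  # same animal, one failed locus
        [0, 1, 2, 2, 0, 2, 1, 1],     # same animal, one genotyping error
        [2, 0, 0, 1, 2, 0, 0, 2],     # a different animal
    ]
    print(group_samples(samples, max_mismatches=1))
```

In practice, the threshold would have to be calibrated against the observed error and missing-data rates. The number of loci actually compared should also be checked. Two samples with many missing calls can show few mismatches simply because few loci were compared.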

Acknowledgements

This study was funded by grant SAW 2011 SGN 3 of the Leibniz Association (Germany). We thank Beatrice Nussberger for valuable discussions and suggestions, Violeta Muñoz-Fuentes for commenting on the manuscript, Nico Westphal, Lutz Walter and the German Primate Center (DPZ) for technical support, and Margit Stadler and Thorsten Lemker from Fluidigm for customer support. Susanne Carl and Jenny Wertheimer isolated DNA; Francesca Marucco, Kristy Pilgrim and Michael Schwartz provided DNA of Italian samples. Thanks to Gesa Kluth, Ilka Reinhard (LUPUS Wildlife Consulting), Sächsisches Staatsministerium für Umwelt und Landwirtschaft (SMUL); Landesamt für Umweltschutz (LAU) Sachsen-Anhalt; Landesamt für Umwelt, Gesundheit und Verbraucherschutz (LUGV) Brandenburg; Niedersächsischer Landesbetrieb für Wasserwirtschaft, Küsten- und Naturschutz (NLWKN); Landesamt für Umwelt, Naturschutz und Geologie (LUNG) Mecklenburg-Vorpommern for providing samples and fruitful cooperation. We are also grateful to Herman Ansorge for his continuous support and the German Federal Agency for Nature Conservation (BfN) for nonmaterial support. RHSK was also supported by the ESF-funded ‘ConGenOmics Programme’.

References

Barea-Azcón JM, Virgós E, Ballesteros-Duperón E, Moleón M, Chirosa M (2007) Surveying carnivores at large spatial scales: a comparison of four broad-applied methods. Biodiversity and Conservation, 16, 1213–1230.
Bergen AW, Qi Y, Haque KA, Welch RA, Chanock SJ (2005) Effects of DNA mass on multiple displacement whole genome amplification and genotyping performance. BMC Biotechnology, 5, 24.
van Bers NEM, Santure AW, Van Oers K et al. (2012) The design and cross-population application of a genome-wide SNP chip for the great tit Parus major. Molecular Ecology Resources, 12, 753–770.
Bhat S, Polanowski AM, Double MC, Jarman SN, Emslie KR (2012) The effect of input DNA copy number on genotype call and characterising SNP markers in the humpback whale genome using a nanofluidic array. PLoS One, 7, e39181.
Black WC IV, Vontas JG (2007) Affordable assays for genotyping single nucleotide polymorphisms in insects. Insect Molecular Biology, 16, 377–387.
Boyko AR, Quignon P, Li L et al. (2010) A simple genetic architecture underlies morphological variation in dogs. PLoS Biology, 8, e1000451.
Bray MS, Boerwinkle E, Doris PA (2001) High-throughput multiplex SNP genotyping with MALDI-TOF mass spectrometry: practice, problems and promise. Human Mutation, 17, 296–304.
Brookes AJ (1999) The essence of SNPs. Gene, 234, 177–186.
Broquet T, Petit E (2004) Quantifying genotyping errors in noninvasive population genetics. Molecular Ecology, 13, 3601–3608.
Chen X, Sullivan PF (2003) Single nucleotide polymorphism genotyping: biochemistry, protocol, cost and throughput. The Pharmacogenomics Journal, 3, 77–96.

Davey JW, Hohenlohe PA, Etter PD et al. (2011) Genome-wide genetic marker discovery and genotyping using next-generation sequencing. Nature Reviews Genetics, 12, 499–510.
De Barba M, Waits LP, Garton EO et al. (2010) The power of genetic monitoring for studying demography, ecology and genetics of a reintroduced brown bear population. Molecular Ecology, 19, 3938–3951.
Enserink M, Vogel G (2006) The carnivore comeback. Science, 314, 746–749.
Fan J-B, Gunderson KL, Bibikova M et al. (2006) Illumina universal bead arrays. Methods in Enzymology, 410, 57–73.
Fickel J, Bubliy OA, Brand J, Mayer K, Heurich M (2012) Low genotyping error rates in non-invasively collected samples from roe deer of the Bavarian Forest National Park. Mammalian Biology, 77, 67–70.
Frosch C, Kraus RHS, Angst C et al. (2014) The genetic legacy of multiple beaver reintroductions in Central Europe. PLoS One, 9, e97619.
Galpern P, Manseau M, Hettinga P, Smith K, Wilson P (2012) ALLELEMATCH: an R package for identifying unique multilocus genotypes where genotyping error and missing data may be present. Molecular Ecology Resources, 12, 771–778.
Gärke C, Ytournel F, Bed’Hom B et al. (2012) Comparison of SNPs and microsatellites for assessing the genetic structure of chicken populations. Animal Genetics, 43, 419–428.
Gula R, Hausknecht R, Kuehn R (2009) Evidence of wolf dispersal in anthropogenic habitats of the Polish Carpathian Mountains. Biodiversity and Conservation, 18, 2173–2184.
Guschanski K, Vigilant L, McNeilage A et al. (2009) Counting elusive animals: comparing field and genetic census of the entire mountain gorilla population of Bwindi Impenetrable National Park, Uganda. Biological Conservation, 142, 290–300.
Harms V, Steyer K, Frosch C, Nowak C (2011) Wolfsforschung im Molekularlabor Senckenberg ist nationales Referenzzentrum für Wolfsgenetik [in German]. Natur Forschung Museum, 141, 174–181.
Hartmann SA, Steyer K, Kraus RHS, Segelbacher G, Nowak C (2013) Potential barriers to gene flow in the endangered European wildcat (Felis silvestris). Conservation Genetics, 14, 413–426.
Hausknecht R, Gula R, Pirga B, Kuehn R (2007) Urine – a source for noninvasive genetic monitoring in wildlife. Molecular Ecology Notes, 7, 208–212.
Hausknecht R, Szabó A, Firmánszky G, Gula R, Kuehn R (2010) Confirmation of wolf residence in Northern Hungary by field and genetic monitoring. Mammalian Biology, 75, 348–352.
Hellmann AP, Rohleder U, Eichmann C et al. (2006) A proposal for standardization in forensic canine DNA typing: allele nomenclature of six canine-specific STR loci. Journal of Forensic Sciences, 51, 274–281.
Hellmann AP, Morzfeld J, Schleenbecker U (2007) The genetic fingerprint of animals and plants: DNA-analysis on biological traces from nonhuman sources [in German: Der Genetische Fingerabdruck von Tieren und Pflanzen]. Kriminalistik, 61, 109–111.
Hoffman JI, Tucker R, Bridgett SJ et al. (2012) Rates of assay success and genotyping error when single nucleotide polymorphism genotyping in non-model organisms: a case study in the Antarctic fur seal. Molecular Ecology Resources, 12, 861–872.
Hoffman JI, Thorne MAS, McEwing R, Forcada J, Ogden R (2013) Cross-amplification and validation of SNPs conserved over 44 million years between seals and dogs. PLoS One, 8, e68365.
vonHoldt BM, Pollinger JP, Earl DA et al. (2011) A genome-wide perspective on the evolutionary history of enigmatic wolf-like canids. Genome Research, 21, 1294–1305.
Höss M, Kohn M, Pääbo S, Knauer F, Schröder W (1992) Excrement analysis by PCR. Nature, 359, 199.
Jonker RM, Kraus RHS, Zhang Q et al. (2013) Genetic consequences of breaking migratory traditions in barnacle geese Branta leucopsis. Molecular Ecology, 22, 5835–5847.
Kéry M, Gardner B, Stoeckle T, Weber D, Royle JA (2011) Use of spatial capture-recapture modeling and DNA data to estimate densities of elusive animals. Conservation Biology, 25, 356–364.

Kittler R, Stoneking M, Kayser M (2002) A whole genome amplification method to generate long fragments from low quantities of genomic DNA. Analytical Biochemistry, 300, 237–244.
Kopatz A, Eiken HG, Hagen SB et al. (2012) Connectivity and population subdivision at the fringe of a large brown bear (Ursus arctos) population in North Western Europe. Conservation Genetics, 13, 681–692.
Kraus RHS, Kerstens HHD, van Hooft P et al. (2011) Genome wide SNP discovery, analysis and evaluation in mallard (Anas platyrhynchos). BMC Genomics, 12, 150.
Kraus RHS, Kerstens HHD, van Hooft P et al. (2012) Widespread horizontal genomic exchange does not erode species barriers among sympatric ducks. BMC Evolutionary Biology, 12, Article No. 45.
Kraus RHS, Van Hooft P, Megens H-J et al. (2013) Global lack of flyway structure in a cosmopolitan bird revealed by a genome wide survey of single nucleotide polymorphisms. Molecular Ecology, 22, 41–55.
Krjutskov K, Viltrop T, Palta P et al. (2009) Evaluation of the 124-plex SNP typing microarray for forensic testing. Forensic Science International: Genetics, 4, 43–48.
Lasken RS, Egholm M (2003) Whole genome amplification: abundant supplies of DNA from precious samples or clinical specimens. Trends in Biotechnology, 21, 531–535.
Lindblad-Toh K, Wade CM, Mikkelsen TS et al. (2005) Genome sequence, comparative analysis and haplotype structure of the domestic dog. Nature, 438, 803–819.
Linnell J, Salvatori V, Boitani L (2007) Guidelines for Population Level Management Plans for Large Carnivores in Europe. Final Draft May 2007. A Large Carnivore Initiative for Europe report prepared for the European Commission, Rome.
Lucchini V, Fabbri E, Marucco F et al. (2002) Noninvasive molecular tracking of colonizing wolf (Canis lupus) packs in the western Italian Alps. Molecular Ecology, 11, 857–868.
Luikart G, Ryman N, Tallmon DA, Schwartz MK, Allendorf FW (2010) Estimation of census and effective population sizes: the increasing usefulness of DNA-based approaches. Conservation Genetics, 11, 355–373.
Morin PA, McCarthy M (2007) Highly accurate SNP genotyping from historical and low-quality samples. Molecular Ecology Notes, 7, 937–946.
Morin PA, Wallis J, Moore JJ, Chakraborty R, Woodruff DS (1993) Noninvasive sampling and DNA amplification for paternity exclusion, community structure, and phylogeography in wild chimpanzees. Primates, 34, 347–356.
Morin PA, Luikart G, Wayne RK (2004) SNPs in ecology, evolution and conservation. Trends in Ecology and Evolution, 19, 208–216.
Navidi W, Arnheim N, Waterman MS (1992) A multiple-tubes approach for accurate genotyping of very small DNA samples by using PCR: statistical considerations. American Journal of Human Genetics, 50, 347–359.
Nowak C, Zuther S, Leontyev SV, Geismar J (2014) Rapid development of microsatellite markers for the critically endangered Saiga (Saiga tatarica) using Illumina® MiSeq next generation sequencing technology. Conservation Genetics Resources, 6, 159–162.
Nussberger B, Wandeler P, Camenisch C (2014) A SNP chip to detect introgression in wildcats allows accurate genotyping of low quality samples. European Journal of Wildlife Research, 60, 405–410.
Ogden R (2011) Unlocking the potential of genomic technologies for wildlife forensics. Molecular Ecology Resources, 11, 109–116.
Ogden R, Baird J, Senn H, McEwing R (2012) The use of cross-species genome-wide arrays to discover SNP markers for conservation genetics: a case study from Arabian and scimitar-horned oryx. Conservation Genetics Resources, 4, 471–473.
Peakall R, Smouse PE (2006) GENALEX 6: genetic analysis in Excel. Population genetic software for teaching and research. Molecular Ecology Notes, 6, 288–295.
Pertoldi C, Wójcik JM, Tokarska M et al. (2010) Genome variability in European and American bison detected using the BovineSNP50 BeadChip. Conservation Genetics, 11, 627–634.
Purcell S, Neale B, Todd-Brown K et al. (2007) PLINK: a tool set for whole-genome association and population-based linkage analyses. American Journal of Human Genetics, 81, 559–575.

R Development Core Team (2009) R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria. Available from http://www.R-project.org.
Ramos AM, Crooijmans RPMA, Affara NA et al. (2009) Design of a high density SNP genotyping assay in the pig using SNPs identified and characterized by next generation sequencing technology. PLoS One, 4, e6524.
Rickert AM, Borodina TA, Kuhn EJ, Lehrach H, Sperling S (2004) Refinement of single-nucleotide polymorphism genotyping methods on human genomic DNA: amplifluor allele-specific polymerase chain reaction versus ligation detection reaction-TaqMan. Analytical Biochemistry, 330, 288–297.
Schlötterer C (2004) The evolution of molecular markers – just a matter of fashion? Nature Reviews Genetics, 5, 63–69.
Schlötterer C, Ritter R, Harr B, Brem G (1998) High mutation rate of a long microsatellite allele in Drosophila melanogaster provides evidence for allele-specific mutation rates. Molecular Biology and Evolution, 15, 1269–1274.
Schopen GCB, Bovenhuis H, Visker MHPW, Van Arendonk JAM (2008) Comparison of information content for microsatellites and SNPs in poultry and cattle. Animal Genetics, 39, 451–453.
Schwartz MK, Luikart G, Waples RS (2007) Genetic monitoring as a promising tool for conservation and management. Trends in Ecology and Evolution, 22, 25–33.
Seddon JM, Parker HG, Ostrander EA, Ellegren H (2005) SNPs in ecological and conservation studies: a test in the Scandinavian wolf population. Molecular Ecology, 14, 503–511.
Seeb JE, Carvalho G, Hauser L et al. (2011) Single-nucleotide polymorphism (SNP) discovery and applications of SNP genotyping in nonmodel organisms. Molecular Ecology Resources, 11, 1–8.
Selkoe KA, Toonen RJ (2006) Microsatellites for ecologists: a practical guide to using and evaluating microsatellite markers. Ecology Letters, 9, 615–629.
Senapati S, Mahon AR, Gordon J et al. (2009) Rapid on-chip genetic detection microfluidic platform for real world applications. Biomicrofluidics, 3, 022407.
Smith LM, Burgoyne LA (2004) Collecting, archiving and processing DNA from wildlife samples using FTA® databasing paper. BMC Ecology, 4, 4.
Steyer K, Simon O, Kraus RHS, Haase P, Nowak C (2013) Hair trapping with valerian-treated lure sticks as a tool for genetic wildcat monitoring in low-density habitats. European Journal of Wildlife Research, 59, 39–46.
Taberlet P, Bouvet J (1992) Bear conservation genetics. Nature, 358, 197.
Taberlet P, Luikart G, Waits LP (1999) Noninvasive genetic sampling: look before you leap. Trends in Ecology and Evolution, 14, 323–327.
Wang J, Lin M, Crenshaw A et al. (2009) High-throughput single nucleotide polymorphism genotyping using nanofluidic Dynamic Arrays. BMC Genomics, 10, Article No. 561.
Westen AA, Matai AS, Laros JFJ et al. (2009) Tri-allelic SNP markers enable analysis of mixed and degraded DNA samples. Forensic Science International: Genetics, 3, 233–241.
Willing EM, Bentzen P, Van Oosterhout C et al. (2010) Genome-wide single nucleotide polymorphisms reveal population history and adaptive divergence in wild guppies. Molecular Ecology, 19, 968–984.

R.H.S.K., H.B., R.K., D.W.F., J.F. and C.N. designed the study; R.H.S.K. and B.vH. analysed and interpreted data; B.C. and V.H. coordinated sample collection and prepared DNA; B.C. and R.H.S.K. carried out the experiments; C.R. provided analytical reagents; and R.H.S.K. and C.N. wrote the manuscript. All authors edited and approved the final manuscript.


Data Accessibility

Table S1 is available online with the study; sample and SNP lists and raw genotype files are deposited in Dryad under doi:10.5061/dryad.2vq52.

Supporting Information

Additional Supporting Information may be found in the online version of this article:

Table S1 Samples from vonHoldt et al. (2011) with initial diversity measures calculated over all samples and SNPs of the 17 299 SNP set (N, sample size; Ho, observed heterozygosity; He, expected heterozygosity).