Development and Application of 1536-plex Single Nucleotide Polymorphism Marker Chip for Genome Wide Scanning of Indonesian Rice Germplasm

ABSTRACT
A successful molecular breeding program requires a detailed and comprehensive understanding of the diversity of rice germplasm and of the genetic basis of target traits. The objective of this research was to develop a high-throughput 1536-SNP chip linked to heading date and yield component traits and to use it for genotyping diverse Indonesian rice germplasm. The resulting genotype data can be used for diversity analysis and genome-wide association mapping studies. A 1536-SNP genome-wide assay was developed using Illumina GoldenGate technology. The SNP markers were selected from rice genome regions containing heading date and yield component genes, or from regions where quantitative trait loci (QTLs) for these two traits have been mapped. The custom SNP chip was then used to genotype 467 rice accessions varying in heading date and yield components. The assay can reliably be used for diversity analysis and for mapping genes associated with heading date and yield component traits. For the design of the custom BIO-RiceOPA-1 chip, a total of 34,832 SNPs distributed across the rice genome, particularly in regions of heading date and yield component genes or QTLs, were identified, from which 1536 SNPs were selected and confirmed for genotyping. Performance and quality analysis of BIO-RiceOPA-1 showed that 60% (918/1536) of the SNP markers had good differentiating power among the rice accessions tested (minor allele frequency, MAF > 0.2). The 1536-SNP GoldenGate assay proved useful for diversity analysis and can be used for large-scale SNP genotyping in rice molecular breeding involving Indica-Indica, Indica-Japonica and Indica-Tropical Japonica crosses.

INTRODUCTION
Rice is one of the most important food crops in the world. With its broad geographical adaptation and wide genetic diversity, rice is an excellent model crop for studying genetics and the domestication process.
Single nucleotide polymorphisms (SNPs) are abundant and evenly distributed throughout the genome of most plant species (Yan et al. 2009). SNP discovery efforts have begun to generate large pools of SNPs by exploiting rice genome sequence information, and high-density SNP chip detection systems will be useful for a wide range of rice breeding programs and applications. A high-throughput SNP platform automates genotyping and allele calling, enabling more efficient breeding strategies that were not feasible with earlier gel-based genotyping systems (Chen et al. 2011).
The successful implementation of marker-assisted selection (MAS) strategies depends on having an efficient and robust genotyping system in place (Collard et al. 2008). For many years, simple sequence repeat (SSR) markers have been the marker system of choice because of their high polymorphism rates, their simple polymerase chain reaction (PCR)-based assay and their detectability by gel electrophoresis. SSRs have proven useful for diversity analysis (Ram et al. 2009; Zheng et al. 2011), QTL mapping (Wang et al. 2007; Wan et al. 2008) and marker-assisted breeding (Karakousis et al. 2003; Collard et al. 2008). However, with SSRs it is difficult to score precise allele sizes, to run high-throughput systems and to reduce costs through multiplexing. Routine integration of markers into modern breeding programs requires high-throughput genotyping platforms that can handle large numbers of samples at low cost (Aumann et al. 2005). Consequently, a new generation of SNP-based markers is rapidly overtaking SSRs, driven by new SNP genotyping platforms that offer multiplexed marker sets for different applications. SNPs have the potential to greatly increase the speed and reduce the cost of molecular marker genotyping, making it feasible to 'mainstream' MAS into conventional breeding programs.
The value of a large pool of available SNPs is that a subset containing the most informative and useful SNPs can be selected for different applications and sets of germplasm. For some applications it may be important to select an evenly distributed subset of SNP markers that are polymorphic between target germplasm groups, while others may use trait-specific functional SNPs that are diagnostic of desirable alleles. A key step in this process is validating SNP performance with specific marker assays, since not every SNP can be converted into a reliable marker across different genotyping systems. The objective of this research was to develop a high-throughput 1536-SNP chip linked to heading date and yield component traits and to use it for genotyping diverse Indonesian rice germplasm. The resulting genotype data would be useful for diversity analysis and genome-wide association mapping studies.
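The idea of choosing an evenly distributed, polymorphic subset from a larger SNP pool can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the procedure used to build BIO-RiceOPA-1: the `CandidateSnp` record, the `select_even_subset` function, the `(chrom, start, end)` interval format and the 0.05 MAF cut-off are all hypothetical. SNPs are first restricted to target intervals (for example heading date or yield component QTL regions) and to those polymorphic in a reference panel, then thinned systematically along the position-sorted list.

```python
from dataclasses import dataclass


@dataclass
class CandidateSnp:
    """Hypothetical record for one candidate SNP."""
    snp_id: str
    chrom: int
    pos: int      # physical position (bp)
    maf: float    # minor allele frequency in a reference panel


def in_regions(snp, regions):
    """True if the SNP lies inside any (chrom, start, end) target interval."""
    return any(snp.chrom == c and start <= snp.pos <= end
               for c, start, end in regions)


def select_even_subset(candidates, regions, n_target, min_maf=0.05):
    """Up to n_target polymorphic SNPs from the target regions, thinned evenly."""
    kept = sorted(
        (s for s in candidates if s.maf >= min_maf and in_regions(s, regions)),
        key=lambda s: (s.chrom, s.pos),
    )
    if len(kept) <= n_target:
        return kept
    # Systematic thinning: take every (len / n_target)-th SNP along the sorted list.
    step = len(kept) / n_target
    return [kept[int(i * step)] for i in range(n_target)]
```

Thinning by rank rather than by base-pair distance only approximates even physical spacing when candidate density is roughly uniform within the target regions; a real design would also weigh assay designability.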

Plant Materials and DNA Extraction
Four hundred and sixty-seven rice accessions were used in this study for SNP genotyping and characterization. These included 29 released varieties, 34 near-isogenic lines (NILs), 136 local varieties, 11 wild species accessions and 162 breeding lines. Leaf samples were collected from a single plant of each line. Genomic DNA was extracted using Thermo Scientific KingFisher Plant DNA kits (Qiagen 2011). DNA concentration was measured with a NanoDrop spectrophotometer (Thermo Scientific 2011), and DNA purity was assessed by an A260/A280 ratio of 1.8-2.0 (Sambrook and Russell 2001). DNA was stored in TE buffer (10 mM Tris, pH 7.5; 1 mM EDTA) and standardized by dilution to a final concentration of 50 ng µL⁻¹; a minimum of 15 µL of genomic DNA at this concentration was required for the GoldenGate assay. At least 10% of the samples were duplicated to serve as quality controls (QC).
Table 1.
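The concentration, purity and QC steps above reduce to simple arithmetic. The sketch below, assuming stock concentrations read from the spectrophotometer in ng µL⁻¹, uses C1V1 = C2V2 to compute how much DNA and TE buffer give 50 ng µL⁻¹, applies the 1.8-2.0 A260/A280 window, and, after genotyping, checks call concordance between duplicated QC samples. The function names, the 20 µL final volume in the example and the "NC" no-call label are illustrative assumptions.

```python
def dilution_volumes(stock_ng_per_ul, final_volume_ul, target_ng_per_ul=50.0):
    """Return (DNA volume, TE volume) in µL that give the target concentration.

    From C1 * V1 = C2 * V2: V1 = C2 * V2 / C1.
    """
    if stock_ng_per_ul < target_ng_per_ul:
        raise ValueError("stock is already below the target concentration")
    dna_ul = target_ng_per_ul * final_volume_ul / stock_ng_per_ul
    return dna_ul, final_volume_ul - dna_ul


def passes_purity(a260, a280, low=1.8, high=2.0):
    """True if the A260/A280 ratio lies within the accepted window."""
    return low <= a260 / a280 <= high


def concordance(calls_a, calls_b, no_call="NC"):
    """Fraction of SNPs with identical calls in two duplicated QC samples,
    skipping SNPs where either duplicate is a no call."""
    pairs = [(a, b) for a, b in zip(calls_a, calls_b)
             if a != no_call and b != no_call]
    if not pairs:
        return float("nan")
    return sum(a == b for a, b in pairs) / len(pairs)


# Example: a 200 ng/µL stock diluted to 20 µL at 50 ng/µL.
print(dilution_volumes(200.0, 20.0))  # (5.0, 15.0)
```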

SNP Genotyping Using the GoldenGate Reader
The current study selected subsets of previously validated and newly custom-designed SNPs to run as 96- and 1536-SNP multiplexed sets. The SNP sets were designed for the Illumina GoldenGate assay, which uses locus- and allele-specific oligos with Cy3/Cy5 labelling to detect the SNP alleles at each locus.
These custom Oligo Pool Assay (OPA) sets were then run on the Illumina platform, which consists of an iScan reader with autoloader and the GenomeStudio analysis software, and which can be used with a variety of genotyping chemistries (Illumina Product Guide 2009). Genotyping is carried out in large multiplexes of 1536 SNPs, built in multiples of 96 SNPs, using Illumina custom SNP panels. The GoldenGate assay is an allele-specific oligo hybridization, ligation and extension assay followed by universal PCR amplification; because all loci are amplified with the same universal primers, amplification bias is minimized. The amplification products were then hybridized to 3 µm microbeads on 32-sample BeadChips, and alleles were read by fluorescence using the iScan reader. The Illumina GenomeStudio software was used for allele clustering based on the ratio of the Cy3/Cy5 signal intensities to call the three genotype classes. The resulting SNP calls were then reformatted for downstream analysis: diversity analysis with the PowerMarker program (http://statgen.ncsu.edu/powermarker) and population structure analysis with STRUCTURE (http://pritch.bsd.uchicago.edu/structure.html).
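Calling genotypes from the ratio of the two dye signals can be illustrated with the polar transform commonly applied to Illumina intensity data, theta = (2/π)·arctan(Cy5/Cy3) and R = Cy3 + Cy5, where theta near 0, 0.5 and 1 corresponds to AA, AB and BB. The sketch below is a simplified stand-in, not GenomeStudio's clustering algorithm, which fits cluster positions for each SNP from the data. The fixed cluster centres, the `max_dist` and `min_r` cut-offs, and the assumption that Cy3 reports allele A and Cy5 allele B are all hypothetical.

```python
import math

# Hypothetical fixed cluster centres on the theta scale; GenomeStudio instead
# estimates cluster positions per SNP from the observed samples.
CENTRES = {"AA": 0.0, "AB": 0.5, "BB": 1.0}


def polar(cy3, cy5):
    """Convert normalised Cy3 (assumed allele A) and Cy5 (assumed allele B)
    intensities to (theta, R); theta runs from 0 (all A) to 1 (all B)."""
    theta = (2.0 / math.pi) * math.atan2(cy5, cy3)
    return theta, cy3 + cy5


def call_genotype(cy3, cy5, max_dist=0.15, min_r=0.2):
    """Assign AA/AB/BB by the nearest theta centre, or 'NC' (no call)."""
    theta, r = polar(cy3, cy5)
    if r < min_r:  # too little total signal to call reliably
        return "NC"
    label, centre = min(CENTRES.items(), key=lambda kv: abs(theta - kv[1]))
    # Samples lying between clusters are left uncalled rather than forced.
    return label if abs(theta - centre) <= max_dist else "NC"
```

Leaving between-cluster samples as no calls mirrors the general behaviour described for GenomeStudio, where ambiguous genotypes are not forced into a class.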

SNP Chip Development
Custom SNPs were designed for the Illumina GoldenGate assay detection system; as a result, 1536 SNPs were selected and developed into a 1536-SNP chip, hereafter termed BIO-RiceOPA-1. The distribution of the 1536 SNP markers on BIO-RiceOPA-1 is shown in Figure 1.

SNP Performance and Quality
The GoldenGate assay with iScan technology could genotype 96 samples at 1536 SNP loci in a single plate. All SNP genotyping data generated by the BeadArray system were scored using the Illumina GenomeStudio genotyping software with a no-call threshold of 0.25. Scoring in GenomeStudio generally produced normal genoplots with three clear clusters: AA (homozygote), AB (heterozygote) and BB (homozygote) (Fig. 2A and 2B). SNPs showing other cluster patterns were also observed (Fig. 2C and 2D); for these SNPs, the rice samples could not be clearly assigned to three distinct clusters. This phenomenon is commonly observed in SNP chip genotyping and reflects the nature of the particular SNPs in the rice samples studied.
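Applying the 0.25 no-call threshold and summarizing each SNP can be sketched as follows. GenomeStudio reports a GenCall score for each genotype; the sketch assumes those scores have been exported alongside the AA/AB/BB calls, and the function names and the "NC" label are hypothetical. A high call rate with all expected classes present corresponds to the clear genoplots of Fig. 2A and 2B, while a low call rate is one simple way to flag SNPs with the poorer cluster patterns of Fig. 2C and 2D.

```python
from collections import Counter

NO_CALL_THRESHOLD = 0.25  # no-call threshold used in this study


def apply_threshold(calls, gencall_scores, threshold=NO_CALL_THRESHOLD):
    """Replace calls whose GenCall score is below the threshold with 'NC'."""
    return [c if s >= threshold else "NC" for c, s in zip(calls, gencall_scores)]


def snp_summary(calls):
    """Call rate and genotype class counts for one SNP across all samples."""
    counts = Counter(calls)  # missing keys count as 0
    called = len(calls) - counts["NC"]
    return {
        "call_rate": called / len(calls) if calls else 0.0,
        "AA": counts["AA"],
        "AB": counts["AB"],
        "BB": counts["BB"],
        # Number of genotype classes observed (3 = AA, AB and BB all present).
        "n_classes": sum(counts[g] > 0 for g in ("AA", "AB", "BB")),
    }
```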

Allele Frequency Estimates in Diverse Rice Germplasm
Allele frequencies were calculated to characterize the patterns of genetic differentiation among the rice samples tested. Based on the full genotype data set, the minor allele frequency (MAF) and cluster separation of the 1536 SNPs used are shown in Figures 3 and 4. MAF values were distributed across 10 consecutive classes from 0.05 to 0.5, with different numbers of SNPs in each class. A total of 14.3% (219/1536) of the SNPs had a MAF of less than 0.15. SNP markers with such low MAF may not be informative for most diversity analyses: alleles with very low frequencies generally have little impact on large-scale diversity studies and have a low probability of being polymorphic in mapping studies. Markers with higher MAF should therefore be valuable for screening diverse sources of rice germplasm, although markers with low MAF may be highly valuable in allele mining (Yan et al. 2009). Figure 3 shows that 918 (60%) of the SNP markers had MAF > 0.2 and were considered to have good power to distinguish the rice accessions tested. Figure 3 also shows that 224 (14.6%) SNPs had MAF > 0.4; these SNPs are excellent for differentiating the rice accessions under study.
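MAF follows directly from the genotype calls: each AA call contributes two A alleles, each BB call two B alleles and each AB call one of each, and MAF is the smaller of the two allele frequencies. The sketch below computes MAF per SNP, bins values into ten classes of width 0.05 as in Figure 3, and counts SNPs above a threshold; the function names and the "NC" no-call label are assumptions. As an arithmetic check on the reported proportions, 918/1536 ≈ 59.8% (reported as 60%), 219/1536 ≈ 14.3% and 224/1536 ≈ 14.6%.

```python
def minor_allele_frequency(calls):
    """MAF for one SNP from AA/AB/BB calls; no calls ('NC') are ignored."""
    a = b = 0
    for g in calls:
        if g == "AA":
            a += 2
        elif g == "BB":
            b += 2
        elif g == "AB":
            a += 1
            b += 1
    total = a + b
    return min(a, b) / total if total else float("nan")


def maf_classes(mafs, width=0.05, n_classes=10):
    """Count SNPs in half-open classes [0, 0.05), [0.05, 0.10), ...;
    MAF = 0.5 is placed in the last class."""
    counts = [0] * n_classes
    for m in mafs:
        counts[min(int(m / width), n_classes - 1)] += 1
    return counts


def share_above(mafs, threshold):
    """Number and fraction of SNPs with MAF strictly above the threshold."""
    n = sum(m > threshold for m in mafs)
    return n, n / len(mafs)
```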
The cluster separation score (displayed alongside the GenTrain score) provided by the GenomeStudio software was used to describe the separation of the three genotype classes of each SNP marker used in this study. Figure 4 shows that cluster separation scores ranged from 0.1 to 1.0. Around 98.5% of the SNPs tested could be called successfully across the 467 rice accessions, with most cluster separation scores greater than 0.3 and fewer than 2% of the SNPs (23 markers) having separation scores of 0.1-0.2. Therefore, more than half of the SNP markers were well separated.
The MAF distribution and cluster separation of the 1536 SNPs in this study were comparable with those reported for rice and maize, which showed distributions of 0.05-0.5 for MAF and 0.3-1.0 for cluster separation, respectively (Yan et al. 2009; Chen et al. 2011). The reliable genotype classes obtained in this study suggest highly stringent SNP selection, consistent with previous reports in other plants (Akhunov et al. 2009; Grattapaglia et al. 2011). The SNP assay performed well both within and across rice varietal groups, notwithstanding the high nucleotide variation among them.

Diversity and Population Structure Analysis
The most straightforward use of these SNP sets is to characterize the relatedness of a set of rice germplasm through genetic analysis. Figure 5 shows the clustering and heat map of the 467 rice accessions based on the GoldenGate assay using the 1536-SNP chip. Genome profiles varied enormously among the 467 rice accessions used in this study. Doubled haploid rice lines showed homozygous genome profiles, whereas lines derived from wide hybridizations showed varied genome profiles, indicating heterozygosity.
To examine subpopulation differences among 284 individual accessions, population structure was estimated by Bayesian clustering using the STRUCTURE software. Ten replicate runs were performed for each value of K, the number of clusters considered, and each run used a burn-in period of 10,000 iterations. The replicate with the maximum likelihood was chosen as the final result for K = 6. The results of this analysis are shown in Figure 6. The genetic relationships among the 284 accessions showed that the samples were separated into two main subpopulations (Fig. 6a): one containing the Japonica and Tropical Japonica varietal groups and one containing Indica-based varietal groups. The genetic divergence observed in this study between the Indica and Japonica groups is consistent with the view that these subspecies may represent independent domestication events (Sweeney and McCouch 2007).
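Choosing the final STRUCTURE run among replicates amounts to taking, for each K, the replicate with the highest estimated log probability of the data. The sketch assumes each run's output file contains a line of the form `Estimated Ln Prob of Data = <value>`, which is how STRUCTURE reports this statistic; the function names and any example values are hypothetical and are not results from this study.

```python
import re
import statistics

# Assumed format of the summary line in a STRUCTURE output file.
LNP_PATTERN = re.compile(r"Estimated Ln Prob of Data\s*=\s*(-?\d+(?:\.\d+)?)")


def parse_ln_prob(output_text):
    """Extract ln P(D) from the text of one STRUCTURE output file."""
    match = LNP_PATTERN.search(output_text)
    if match is None:
        raise ValueError("no 'Estimated Ln Prob of Data' line found")
    return float(match.group(1))


def best_replicates(ln_prob_by_k):
    """For each K, report the best replicate and the spread across replicates.

    ln_prob_by_k maps K -> list of ln P(D) values, one per replicate run.
    """
    summary = {}
    for k, values in sorted(ln_prob_by_k.items()):
        best = max(range(len(values)), key=values.__getitem__)
        summary[k] = {
            "best_run": best,  # 0-based index of the chosen replicate
            "best_lnp": values[best],
            "mean_lnp": statistics.mean(values),
            "sd_lnp": statistics.stdev(values) if len(values) > 1 else 0.0,
        }
    return summary
```

Reporting the mean and standard deviation alongside the best run makes it easy to see whether replicates for a given K converged to similar likelihoods.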
In the first cluster, the close genetic relationship between the Japonica and Tropical Japonica subpopulations was indicated by their shared alleles at loci for domestication traits, in this case particularly yield components and heading date. This result is comparable with the study of Zhao et al. (2010), who analysed the genomic diversity of 395 O. sativa accessions using 1536 SNPs from the high-quality MBML intersection data of the Oryza SNP project (McNally et al. 2009). They found that the Japonica and Tropical Japonica groups are the outcomes of selection events from a single genetic pool that became adapted to different climatic conditions (Garris et al. 2005).
The second cluster is the subpopulation dominated by the Indica genetic background. Most elite Indonesian rice varieties were developed on an Indica genetic background. Indica has been introgressed into many other varietal groups, including Japonica and Tropical Japonica, and into the related species Oryza glaberrima and Oryza rufipogon. O. glaberrima is the African domesticated rice derived from its ancestor O. barthii (formerly named O. breviligulata) (Vaughan et al. 2003), while O. rufipogon is the ancestor of O. sativa (Sweeney and McCouch 2007). They fell into the same cluster because they share the same domestication traits (e.g. yield components, heading date, and responses to biotic and abiotic stresses). The population structure analysis of the 284 rice accessions supported the same grouping (Figure 6), showing two main subpopulations, Indica (pink) and Japonica (blue), plus an additional block of Tropical Japonica (light blue).

CONCLUSION
A custom 1536-SNP chip, BIO-RiceOPA-1, was successfully designed from selected SNP markers located in regions of rice heading date and yield component QTLs. About 60% (918/1536) of the SNP markers on the chip had good differentiating power (MAF > 0.2) in scanning the 467 rice accessions. The 1536-SNP chip is useful for large-scale rice genotyping in molecular breeding involving Indica-Indica, Indica-Japonica and Indica-Tropical Japonica genetic backgrounds.