Biological diversity can be described as the extent of variation contained within nature. The concept has been around for some time and is regarded as an important component for ensuring and assessing adaptation mechanisms. Given the constant need for food, domesticated animals were kept, selected and bred to fulfill this demand. During the process, specialized livestock breeds were formed, with desirable characteristics but narrower genetic diversity than their wild relatives. Currently, the needs and challenges within the livestock sector are similar to those of our predecessors. Ideally, we would like to have populations that are well adapted and have high fitness and production. The overall fitness could be broken down into several aspects, such as better survival in wild populations and a prolonged productive life in livestock populations, good health and reproduction. These characteristics should be satisfactory in the animals’ current environment, and the adaptation should be maintained when any of the external factors change. Also, the variance within the selected population is essential for achieving selection response and genetic gain; thus, diversity is a major parameter for genetic improvement.
In order to cope with these challenges, it is essential to monitor and maintain genetic diversity within populations, that is, to ensure allelic variation within breeds. The across breed diversity is equally important, to ensure that there is a large enough genetic distance between breeds to maximize the production potential within different environments.
Until recently, the assessment of genetic diversity relied on the existence and accuracy of pedigree records. The pedigrees were a central tool in the assessment of relatedness between individuals and, consequently, in avoiding the mating of overly related individuals, which could lead to the occurrence of recessive disorders and a decline in fertility and overall fitness. Based on the rate of inbreeding (and other factors), the effective population size could be assessed as a metric comparing real and idealized populations. This metric then provided a reference for the overall genetic diversity of populations. However, the problem with pedigrees is that they are missing or inaccurate for many livestock breeds, which negates or seriously hampers any effort to assess genetic diversity.
With the availability of molecular markers such as microsatellites, and more recently single nucleotide polymorphisms (SNP) and whole genome sequence data, it is possible to analyze diversity on a genomic level. Dense marker genotypes remove the need for historical records and, thus, all problems associated with pedigrees. They allow a more precise assessment of relatedness that accounts for Mendelian sampling and, consequently, more accurate inbreeding levels. Dense marker data also enable new types of analyses, such as more precise heterozygosity estimates and the ability to pinpoint exact regions of the genome undergoing selection.
The measures of population structure and genetic distance established in the groundbreaking works of Nei (1977) and Weir and Cockerham (1984) rely on overall heterozygosity levels, which can also be computed more precisely for entire populations or, in some cases, for individual SNPs. In the case of crossbred animals, precise admixture levels can be computed for individual animals or even chromosomes, instead of relying on average values.
Given its obvious advantages, dense molecular marker data should be the method of choice to analyze livestock populations. This paper reviews how SNP-based genomic markers could be used to investigate selected aspects of genetic diversity. Chapter 2 introduces linkage disequilibrium (LD) as a core concept for all follow-up applications. In Chapter 3, the estimation of inbreeding is introduced from the classical (pedigree) perspective, as well as using dense marker data. In Chapter 4, the effective population size is discussed as one of the most often used metrics in livestock diversity, with computation options based also on LD and inbreeding levels. In Chapter 5, an overview of changes in allele frequencies in a population is given, and Chapter 6 concludes the paper.
Linkage disequilibrium (LD) stands for the nonrandom association between alleles of different loci. The term was first used by Lewontin and Kojima (1960). Contrary to its name, however, detecting LD implies neither linkage nor a lack of equilibrium. It could be loosely defined as a measure of connectedness between alleles, a probability of being inherited together, or a correlation of occurrence between allele pairs across a given population.
The LD could be expressed using various coefficients, reviewed in Slatkin (2008). Historically, the first measure was the “disequilibrium coefficient”, defined as:

$$D = P_{AB} - P_A P_B$$
which is the difference between the frequency of gametes carrying the pair of alleles A and B at two loci (PAB) and the product of the frequencies of those alleles (PA and PB). The D’ was developed by Lewontin (1964) to make up for some of the difficulties in interpretation of D. The D’ is the ratio of D to its maximum possible absolute value, given the allele frequencies, with values ranging between -1 and 1. The extreme values mean that at least one of the possible allele combinations is missing. However, the results of the D’ could be inflated if the analyzed population is small or one of the alleles is rare.
The squared correlation of allele frequencies at different loci (r2) is another metric that was shown to be a more robust measure of LD. It is defined as:

$$r^2 = \frac{D^2}{P_A (1 - P_A) P_B (1 - P_B)}$$
The r2 is the most commonly used descriptor of LD in livestock genomics. Its values range between 0 and 1, with high values indicating that the two alleles are good surrogates for each other. It was found that the mean value of r2 was determined almost entirely by the population size (N), recombination fraction (c) and time, measured proportional to N (Hill and Robertson, 1968).
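The relationship between the three coefficients can be illustrated with a minimal sketch for two biallelic loci. The function name `ld_measures` and its arguments are hypothetical, chosen for illustration only; the sketch assumes that the haplotype frequency PAB and the allele frequencies PA and PB are already known, and it uses the common textbook definition of the maximum attainable D for the given allele frequencies.

```python
def ld_measures(p_ab: float, p_a: float, p_b: float) -> dict:
    """Return D, D' and r^2 for alleles A and B at two biallelic loci.

    p_ab: frequency of gametes carrying both A and B
    p_a, p_b: frequencies of alleles A and B
    """
    d = p_ab - p_a * p_b

    # Maximum attainable |D| given the allele frequencies; it depends on the sign of D.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = d / d_max if d_max > 0 else 0.0

    # Squared correlation between the alleles at the two loci.
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    r2 = d * d / denom if denom > 0 else 0.0

    return {"D": d, "D_prime": d_prime, "r2": r2}


# A rare allele A (frequency 0.1) that always occurs together with B (frequency 0.5)
rare = ld_measures(p_ab=0.1, p_a=0.1, p_b=0.5)
```

In this example one of the four possible haplotypes is missing, so D’ reaches its extreme value of 1, while r2 stays close to 0.11. This is the situation in which D’ suggests a stronger association than r2 does.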
The linkage disequilibrium is a function of demographic events within the species, mainly bottlenecks as a mechanism underlying selection and genetic drift, and recombination rates. The directly visible outcome of this effect is the on average higher LD between alleles close to each other and its gradual decline with increasing distance on the genome. Regarding the species-specific LD levels, the overall conclusion is that less intensively selected species show lower LD across the genome. This is confirmed by low LD levels in humans, with LD extending to only a few kilo-bases (Hinds et al., 2005). In sheep (Kijas et al., 2012; Al-Mamun et al., 2015) and goats (Brito et al., 2015; Mdladla et al., 2016), the LD levels are generally lower than other livestock due to comparably lower selection intensity. In contrast, the LD is higher in cattle (Bohmanova et al., 2010; Espigolan et al., 2013; Porto-Neto et al., 2014), pigs (Du et al., 2007) and dogs (Lindblad-Toh et al., 2005).
LD is generally estimated within the chromosome. However, LD between chromosomes and breeds can vary due to differences in recombination rates, heterozygosity, genetic drift and effects of selection between chromosomes and breeds (Qanbari et al., 2010). As the inheritance of genomic region is based on haplotypes rather than single nucleotides, one would expect higher LD within established haplotype blocks. Similarly, in an inbreeding scenario, within the runs of homozygosity (ROH) segments, the LD is increased, as any one of these continuous, homozygous segments originated from a single ancestor, shaped by recombination events. Thus, one could expect a connectedness between the higher chromosome-wise LD levels and the haplotype block structure and the ROH distribution, as shown in Al-Mamun et al. (2015).
LD is also a function of inter marker distance, as the LD between adjacent markers is generally high, but decreases quickly with increasing distance. The extent of LD decay could differ between breeds and species. When comparing taurine and indicine cattle breeds, the LD is initially higher in taurine, which is attributed to a smaller effective population size and a stronger bottleneck during breed formation. The LD decays at a faster rate in the taurine breeds, however, with generally higher background LD rates in indicine cattle (Pérez O’Brien et al., 2014; Porto-Neto et al., 2014). A relatively low LD was detected for the Frizarta sheep, with r2 values of 0.18, with average inter marker distance of 31kb. As expected, the LD was decaying with increasing distance. The rate of decay among the chromosomes was variable, with more rapid decay in shorter and slower decay in longer chromosomes (Kominakis et al., 2017).
In population genetics, LD provides information for population history and both natural and artificial selection. The LD in various distances is the basis for calculation of one of the possible measurements of effective population size (Hill, 1981; Hayes et al., 2003; Tenesa et al., 2007; Mészáros et al., 2015). Also, the elevated LD values in specific genomic regions are considered as signatures of selection. The logic underlying this strategy is straightforward. When a mutation arises, it does so on an existing background haplotype characterized by complete LD between the new mutation and the linked polymorphisms (Bamshad and Wooding, 2003). Selection against recurrent deleterious mutations also reduces variation at linked loci (Charlesworth et al., 1993). This mechanism, known as “background selection”, causes the continuous removal of linked sequences along with deleterious mutations, resulting in a reduced effective population size (Kim and Stephan, 2002).
The analysis of dense marker data is based on the assumption that SNPs are in sufficient linkage disequilibrium with regions of interest. In genome wide association studies (GWAS), the significant SNPs are assumed to be in the vicinity of the quantitative trait locus (QTL) or the gene influencing the phenotype under study. This interpretation of the power to correctly define an association on the genome led to the concept of “useful LD” (Kruglyak, 1999), which should be r2 > 0.33 according to the suggestion of Ardlie et al. (2002) to limit the required sample size. The D’ threshold for the useful LD should be higher, due to the tendency of D’ to overestimate the magnitude of LD.
Inbreeding is a result of mating between two individuals that share at least one common ancestor. Therefore, the concepts of relatedness and inbreeding are connected to each other. Historically, both measures were evaluated from pedigrees, using the path coefficient methodology proposed by Wright (1922).
The offspring of two related individuals would be inbred. The inbreeding coefficient is calculated as:

$$F_Z = \frac{1}{2} a_{X,Y} = \sum \left(\frac{1}{2}\right)^{n + n' + 1} (1 + F_A)$$
where F is the inbreeding coefficient of an animal Z, aX,Y is the relationship coefficient between the parents of animal Z, and n and n’ are the numbers of generations from the sire and dam of animal Z to a common ancestor A. If the common ancestor A is inbred itself, the inbreeding coefficient FA should be worked out from the pedigree (Wright, 1922).
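A pedigree-based inbreeding coefficient can be sketched in a few lines, using the fact that the inbreeding coefficient of an animal equals half of the additive relationship between its parents. Note that this sketch uses a recursive, tabular-style computation of the relationships rather than tracing the individual paths of Wright (1922); the function name and the pedigree layout (a list of animal, sire, dam triples with parents listed before their offspring) are assumptions made for illustration.

```python
def inbreeding_coefficients(pedigree):
    """Compute F for every animal in a pedigree.

    pedigree: list of (animal, sire, dam) tuples, with every known parent listed
    as an animal earlier in the list; unknown parents are given as None.
    """
    parents = {animal: (sire, dam) for animal, sire, dam in pedigree}
    order = {animal: i for i, (animal, _, _) in enumerate(pedigree)}
    rel = {}  # cache of additive relationships a[x, y]

    def a(x, y):
        if x is None or y is None:
            return 0.0  # unknown parents are treated as unrelated founders
        if order[x] < order[y]:
            x, y = y, x  # make x the younger animal (or x == y)
        if (x, y) not in rel:
            sire, dam = parents[x]
            if x == y:
                rel[(x, y)] = 1.0 + 0.5 * a(sire, dam)
            else:
                rel[(x, y)] = 0.5 * (a(sire, y) + a(dam, y))
        return rel[(x, y)]

    # F_Z = a(Z, Z) - 1 = 0.5 * a(sire, dam)
    return {animal: a(animal, animal) - 1.0 for animal in parents}


# Hypothetical example: E is the offspring of the full sibs C and D
full_sib_mating = [
    ("A", None, None),
    ("B", None, None),
    ("C", "A", "B"),
    ("D", "A", "B"),
    ("E", "C", "D"),
]
f_values = inbreeding_coefficients(full_sib_mating)
```

For the full-sib mating above, the two common ancestors A and B each contribute (1/2)^3 to the sum, which gives F = 0.25 for animal E.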
A different measure of inbreeding is the so-called F-statistics, mostly FIS, introduced by Wright (1951). FIS ranges between -1 and 1, with positive values indicating that the mate pair is more related than the average relatedness in the population; thus, we would expect a lower proportion of heterozygotes than predicted by the Hardy-Weinberg proportions. Negative FIS values indicate less related individuals and an excess of heterozygotes. However, FIS has several unexpected properties. It shows inbreeding around zero in closely related populations, because the average population relatedness is taken as the reference. Moreover, the coefficient could be negative in very small populations due to allele frequency differences between males and females (Balloux and Williams, 2004). Due to these features, Kardos et al. (2016) discourage the use of FIS as a measure of individual inbreeding in a population.
With the availability of genotype marker data, the inbreeding levels could also be considered from the genomic perspective. The genomic measure of inbreeding is most commonly described by continuous homozygous segments of the genome, called “runs of homozygosity” (ROH). Such homozygous runs are assumed to be a consequence of inbreeding; thus, the ROH are chromosome segments identical-by-descent. The inbreeding coefficient (FROH) is then calculated as the proportion of the autosomal genome covered with ROH:

$$F_{ROH} = \frac{\sum L_{ROH}}{L_{auto}}$$
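The behavior of FIS can be illustrated with a minimal sketch for a single biallelic locus. The sketch assumes the commonly used heterozygosity-based estimator, FIS = 1 - Ho/He, where Ho is the observed and He the Hardy-Weinberg expected heterozygosity; this estimator and the function name `f_is` are not taken from the text above and serve only as an illustration.

```python
def f_is(n_aa: int, n_ab: int, n_bb: int) -> float:
    """Heterozygosity-based FIS for one biallelic locus from genotype counts."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)   # frequency of allele A
    h_obs = n_ab / n                  # observed heterozygosity
    h_exp = 2 * p * (1 - p)           # expected under Hardy-Weinberg proportions
    if h_exp == 0:
        raise ValueError("locus is monomorphic; FIS is undefined")
    return 1 - h_obs / h_exp


hardy_weinberg = f_is(25, 50, 25)     # genotypes exactly in Hardy-Weinberg proportions
deficit = f_is(40, 20, 40)            # fewer heterozygotes than expected
excess = f_is(0, 100, 0)              # only heterozygotes observed
```

The three calls cover the range described above: a value of zero under Hardy-Weinberg proportions, a positive value with a heterozygote deficit and the lower limit of -1 when all individuals are heterozygous.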
where ΣLROH is the total length of all of an individual’s ROHs above a specified minimum length and Lauto is the length of the autosomal genome covered by SNPs. If the chromosomes of the analyzed organism contain centromeres, these should be excluded from Lauto because they do not contain SNPs and their inclusion might inflate estimates of autozygosity if both flanking SNPs are homozygous (McQuillan et al., 2008).
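The calculation of FROH can be sketched as follows, assuming that the ROH segments of one animal have already been detected and are available as start and end positions in base pairs. The function name, the input layout and the default minimum length of 1 Mb are assumptions for illustration; the autosomal length should be the SNP-covered length described above.

```python
def f_roh(roh_segments, autosome_length_bp, min_length_bp=1_000_000):
    """Proportion of the SNP-covered autosomal genome lying in ROH.

    roh_segments: iterable of (start_bp, end_bp) tuples for one individual
    autosome_length_bp: autosomal length covered by SNPs (centromeres excluded)
    min_length_bp: only ROH at least this long are counted
    """
    total = sum(end - start
                for start, end in roh_segments
                if end - start >= min_length_bp)
    return total / autosome_length_bp


# Hypothetical animal with three detected segments of 5, 0.5 and 8 Mb
segments = [(0, 5_000_000), (10_000_000, 10_500_000), (20_000_000, 28_000_000)]
f_all = f_roh(segments, autosome_length_bp=100_000_000)
f_long = f_roh(segments, autosome_length_bp=100_000_000, min_length_bp=8_000_000)
```

Raising the minimum length restricts the estimate to long segments, so the same animal can have several FROH values depending on the length threshold applied.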
The length of ROH segments is strongly influenced by the number of generations separating the inbred individual from the common ancestor(s). Inbreeding due to recent ancestors usually generates quite long identical by descent (IBD) chromosome segments, whereas IBD segments deriving from more distant ancestors tend to be shorter on average because of a higher number of meioses, and recombination events, separating the inbred individual from the ancestor (Kardos et al., 2016). Under the assumption of genetic and physical distance of approximately 1 cM = 1 Mb, the minimum ROH lengths denote the age of inbreeding. ROH segment lengths of 4, 8 and 16 Mb refer to a common ancestor 12, 6 and 3 generations ago, respectively (Howrigan et al., 2011; Curik et al., 2014). The possibility to look for ROH segments of different length also means different sets of quality control criteria. However, there is an ongoing discussion and lack of standards on what the criteria should be. The most widely used criteria are reviewed in Peripolli et al. (2016), suggesting cautious and critical interpretation when comparing studies, analyzing the density of the SNP chip used, the minimum length of ROH, the number of genotyping errors allowed, and the minimum number of SNPs allowed in a single ROH, as they are likely to greatly affect ROH-based estimates of autozygosity.
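The link between ROH length and the age of inbreeding can be sketched with the commonly used approximation that an IBD segment inherited from an ancestor g generations ago has an expected length of 100/(2g) cM. The function names are hypothetical, and the default of 1 cM per Mb is the assumption stated above.

```python
def expected_roh_length_mb(generations: float, cm_per_mb: float = 1.0) -> float:
    """Expected ROH length (Mb) for a common ancestor a given number of generations ago."""
    if generations <= 0:
        raise ValueError("generations must be positive")
    length_cm = 100.0 / (2.0 * generations)
    return length_cm / cm_per_mb


def generations_from_roh_length(length_mb: float, cm_per_mb: float = 1.0) -> float:
    """Approximate number of generations to the common ancestor for a given ROH length."""
    if length_mb <= 0:
        raise ValueError("length must be positive")
    return 100.0 / (2.0 * length_mb * cm_per_mb)


# Minimum lengths of 4, 8 and 16 Mb give 12.5, 6.25 and 3.125 generations
ages = [generations_from_roh_length(length) for length in (4, 8, 16)]
```

The results agree, after rounding, with the 12, 6 and 3 generations given above. Recombination rates vary along the genome and between species, so these values are rough guides rather than exact datings.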
There are several advantages of genomic measures of inbreeding. First and foremost, it is possible to calculate them solely from the genotype data, even in the absence of pedigree records. Contrary to non-genomic approaches, the inbreeding could be calculated for animals, whose pedigree is dubious, incomplete, or entirely missing (Mészáros et al., 2015). The marker-based inbreeding was also shown to be the best measure using computer simulations (Kardos et al., 2015).
Studies involving ROH are increasingly common and provide valuable information about how the genome’s architecture can disclose a population’s genetic background (Peripolli et al., 2016). In addition to determination of inbreeding levels, ROH could be used for a number of other purposes. Perhaps, the most related to inbreeding is the study of genetic regions with adverse effects. As deleterious variants tend to be recessive, and thus required to occur in homozygous state to manifest, the examination of ROH regions in affected and control animals could reveal the approximate location of the deleterious allele, as shown in Drögemüller et al. (2011). Even if the selected variant is not harmful, the selection increases homozygosity around the target locus, fixing the genomic region within the population, and creating the so-called ROH islands.
The ROH regions can also reveal some rare events on the genome, such as uniparental disomy (UPD). In short, UPD is a consequence of karyotypic anomalies during meiosis and/or mitosis, occurring independently or in combination, and ultimately affecting chromosome distribution (Eggermann et al., 2015). Another event is the hemizygous deletion, occurring when a part of a chromosome is deleted and this deletion is inherited by the offspring (McCarroll et al., 2006). As the SNP genotyping algorithm is not capable of detecting the partially missing genotype, the results would show as a homozygous region (Huie et al., 2002), with unusual signal intensity statistics.
The genomic perspectives of inbreeding depression in wild populations are further reviewed in Kardos et al. (2016). Current applications of ROH in livestock are reviewed in Peripolli et al. (2016). The inbreeding depression in livestock is reviewed by Leroy (2014).
The effective population size (Ne) is a theoretical concept describing any given population in terms of genetic size, rather than actual number of individuals, that is, the census population (N). Formally, the Ne could be defined as the number of individuals from a Wright-Fisher population (finite and constant population size, random mating, no mutation, no selection and non-overlapping generations) that would manifest the same extent of genetic drift and inbreeding as the population in question. The theory was initially developed and later extended by Sewall Wright (Wright, 1931). The concept of effective population size is used in the population diversity studies as a benchmark to predict the rate of inbreeding and loss of genetic variation. The ratio of effective and census population could also be used to describe the relationship between the estimated and counted population size (Ne/N). Such a ratio could be as low as 0.1, an estimate from wild populations (Frankham, 1995). The Ne/N ratio is about 0.03 in livestock (Hall, 2016), a much lower value given the high census population size in livestock originating from relatively few ancestors. In addition to the extent of genetic variation within the population, the Ne also predicts the effectiveness of spreading both beneficial and harmful alleles, with quicker spread in populations with lower effective size (Charlesworth, 2009). The Ne is affected by various factors, and it could be computed in different ways. The sex ratio in the population (Wright, 1931) is one of the most frequently used definitions of Ne, computed as:

$$N_e = \frac{4 N_m N_f}{N_m + N_f}$$
Here, Nf is the number of females and Nm the number of males. The less frequently occurring sex has the biggest influence on Ne. Other demographic estimates of Ne involve characteristics such as reproductive success (number of offspring), effective number of breeders or fluctuations in population size (Ardren and Kapuscinski, 2003).
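The influence of the rarer sex can be illustrated with a short sketch of the sex-ratio formula of Wright (1931), Ne = 4NmNf/(Nm + Nf). The function name and the example herd sizes are hypothetical.

```python
def ne_sex_ratio(n_males: int, n_females: int) -> float:
    """Effective population size from the numbers of breeding males and females."""
    if n_males <= 0 or n_females <= 0:
        raise ValueError("both sexes must be represented")
    return 4.0 * n_males * n_females / (n_males + n_females)


balanced = ne_sex_ratio(50, 50)        # 100 breeding animals, equal sex ratio
few_sires = ne_sex_ratio(10, 1000)     # 1010 breeding animals, only 10 sires
```

With an equal sex ratio, Ne equals the number of breeding animals (100). With 10 sires and 1000 dams, Ne drops to about 39.6, close to four times the number of sires and far below the census size.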
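Inbreeding, here interpreted as the correlation between the maternal and paternal alleles of an individual, could also be used to compute Ne. The change in inbreeding levels per generation (ΔF), computed either from pedigree or genomic data, is used to estimate Ne as:

$$N_e = \frac{1}{2 \Delta F}$$

The Ne could also be estimated from the LD between pairs of loci (NeLD) as:

$$N_{eLD} = \frac{1}{4c} \left( \frac{1}{r^2_{LD}} - \alpha \right)$$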
where r2LD is the squared correlation of allele frequencies at a pair of loci, α is 2 when the impact of mutation is considered and 1 otherwise. Variable c is the genetic distance between loci in Morgans. Assuming that the population has been constant in size, the approximation of NeLD is true for t generations ago, where t = 1/(2c) (Hayes et al., 2003). In case of unlinked loci, the recombination frequency between them is 50%, that is, c = 0.5, pointing towards contemporary effective population size (Sved et al., 2013; Waples et al., 2016). Additional factors, such as the mode of inheritance (autosomal or sex linked), population age structure, changes in population size, migration and selection processes also influence Ne (Charlesworth, 2009).
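The two estimators can be sketched side by side. The sketch assumes the usual forms Ne = 1/(2ΔF) for the inbreeding-based estimate and NeLD = (1/(4c))(1/r2LD - α) for the LD-based estimate, with t = 1/(2c) generations as given above; the function names and the example values are hypothetical.

```python
def ne_from_delta_f(delta_f: float) -> float:
    """Ne from the rate of inbreeding per generation."""
    if delta_f <= 0:
        raise ValueError("delta_f must be positive")
    return 1.0 / (2.0 * delta_f)


def ne_from_ld(r2: float, c: float, alpha: float = 1.0) -> float:
    """Ne from the mean r2 of marker pairs separated by c Morgans.

    alpha: 2 when the impact of mutation is considered, 1 otherwise.
    """
    if not (0 < r2 <= 1) or c <= 0:
        raise ValueError("r2 must be in (0, 1] and c must be positive")
    return (1.0 / (4.0 * c)) * (1.0 / r2 - alpha)


def generations_ago(c: float) -> float:
    """Number of generations in the past to which the LD-based estimate refers."""
    return 1.0 / (2.0 * c)


ne_inbreeding = ne_from_delta_f(0.01)            # rate of inbreeding of 1% per generation
ne_ld = ne_from_ld(r2=0.2, c=0.01)               # marker pairs 1 cM apart, no mutation term
ne_ld_mut = ne_from_ld(r2=0.2, c=0.01, alpha=2)  # same pairs, mutation considered
t = generations_ago(0.01)
```

In this example, marker pairs 1 cM apart with a mean r2 of 0.2 point to an Ne of about 100 (or 75 when mutation is considered) roughly 50 generations ago. Repeating the calculation for bins of increasing distance c gives estimates for progressively more recent generations.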
The Ne calculation based on the numbers of males and females is referenced by the Food and Agriculture Organization of the United Nations (FAO). The recommendation, as a rule of thumb, is to keep the Ne above 50 to ensure that the rate of inbreeding stays below ΔF = 0.01 per generation. This inbreeding threshold, however, should be viewed as a short-term criterion, rather than the ultimate goal, as continuous inbreeding leads to a gradual attrition of genetic variation. A population with consistent Ne around 50 will lose about one-fourth of its genetic variation after 20 to 30 generations, and along with it, much of its capacity to adapt to changing conditions (Barthelmes, 1983).
The 50/500 concept was proposed as a benchmark when considering risk of extinction. Populations with Ne below 50 are at an extreme risk of extinction, while those with Ne less than 500 are at a long-term risk of extinction. In a recent study, however, Frankham et al. (2014) argue that a threshold of Ne > 50 is too low to prevent inbreeding depression, and Ne > 100 should be considered instead. Furthermore, the authors argue that Ne > 1000 would be required to retain evolutionary potential. While Frankham et al. (2014) refer to wild populations with considerably higher Ne, the thresholds are stunningly high for livestock. The contradiction between the suggested and observed Ne values demonstrates well how elusive the concept of Ne is. While the Ne is much lower than 1000 in all commercial livestock populations, they achieve considerable genetic gains each year, which could be compared to the evolutionary potential of wild populations. The question arises where the biological limits of any given species are. Selection based on genetic merit using conventional breeding values, genomic selection and the recent promise of new frontiers uncovered by genome editing remind us time and time again of our poor understanding of what is possible to achieve.
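The loss of variation at a constant Ne can be checked with a short calculation. The sketch assumes the standard expectation for an idealized population, in which the retained heterozygosity after t generations is (1 - 1/(2Ne))^t; the function name is hypothetical.

```python
def retained_variation(ne: float, generations: int) -> float:
    """Expected proportion of the initial heterozygosity retained after t generations."""
    if ne <= 0 or generations < 0:
        raise ValueError("ne must be positive and generations non-negative")
    return (1.0 - 1.0 / (2.0 * ne)) ** generations


after_20 = retained_variation(50, 20)   # about 0.82
after_30 = retained_variation(50, 30)   # about 0.74
```

For Ne = 50, the expected loss is about 18% after 20 generations and about 26% after 30 generations, in line with the loss of roughly one-fourth of the genetic variation stated above.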
Apart from the presented methods, the effective population size could be assessed by the changes of allele frequencies over time (Pollak, 1983; Waples, 1989; Jorde and Ryman, 1995) or heterozygote excess in small populations (Balloux and Williams, 2004). Further perspectives of Ne estimation are summarized for pedigree-based estimators in Caballero (1994), wildlife specific aspects in Frankham (1995), the effects of spatial-temporal stratifications in Waples (2010) and the molecular genetic approaches in Charlesworth et al. (1993), Charlesworth (2009) and Lanfear et al. (2014).
On the population level, there is an ongoing competition between the genetic drift, that is, the random fluctuation of allele frequencies, and selection, that is, the tendency to fix certain alleles in the population. While the genetic drift is likely to affect small populations by the spread of non-beneficial mutations, the fixation of selected regions of the genome in large populations likely contains beneficial effects (Hallatschek et al., 2007).
These two counteracting processes, together with inbreeding, played a central role in the domestication processes. While the genetic drift led to random changes, the partially controlled processes of relaxed natural selection, natural selection in captivity and the fully controlled artificial selection led to the formation of livestock breeds as we know them today (Mignon-Grasteau et al., 2005).
Genetic drift was described as one of the major effects on changes of allele frequencies, counteracted by selection if the latter is more intensive than commonly observed in natural populations (Lacy, 1987). In livestock, we usually observe artificial selection, but many small breeds are typically vulnerable to the effects of genetic drift (Willi et al., 2006). This is because of the small Ne, which is inversely correlated with the increase of evolutionary constraints, inbreeding and drift load. From the evolutionary perspective, there are two immediate consequences for small breeds with a small Ne: 1. in breeds without genetic variation, evolution is constrained regardless of the intensity of selection, unless it removes newly arisen deleterious alleles; 2. even if genetic variation is present, only small selection responses could be expected (Kristensen et al., 2015). This happens because, with a decrease of Ne, the impact of genetic drift increases and loci start to behave as neutral when selection coefficients become smaller than or equal to 1/[2Ne] (Wright, 1931). We have to note, however, that some of the major breeds, for example, Holstein, were estimated to have a strikingly low Ne around 100, without any detrimental consequences.
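The interplay of drift and selection can be explored with a minimal Wright-Fisher sketch. It is a simplification under stated assumptions: selection acts on the allele itself (relative fitness 1 + s for the favored allele), drift is modeled as binomial sampling of 2Ne gametes per generation, and the function names, default values and seed are hypothetical.

```python
import random


def neutrality_threshold(ne: float) -> float:
    """Selection coefficient at or below which a locus behaves as effectively neutral."""
    return 1.0 / (2.0 * ne)


def wright_fisher(ne: int, p0: float, s: float = 0.0,
                  generations: int = 50, seed: int = 1) -> list:
    """Simulate the frequency of one allele under selection and drift."""
    rng = random.Random(seed)
    n_gametes = 2 * ne
    p = p0
    trajectory = [p]
    for _ in range(generations):
        # Deterministic change due to selection on the favored allele.
        p_sel = p * (1.0 + s) / (1.0 + p * s)
        # Drift: sample 2Ne gametes for the next generation.
        count = sum(rng.random() < p_sel for _ in range(n_gametes))
        p = count / n_gametes
        trajectory.append(p)
    return trajectory


threshold_small = neutrality_threshold(25)     # 0.02
threshold_large = neutrality_threshold(1000)   # 0.0005
small_breed = wright_fisher(ne=25, p0=0.5, s=0.01, generations=100)
large_breed = wright_fisher(ne=1000, p0=0.5, s=0.01, generations=100)
```

With s = 0.01, the allele falls below the threshold of 1/(2Ne) in the population with Ne = 25, where its trajectory is expected to be dominated by random fluctuations, and above the threshold in the population with Ne = 1000. Individual runs depend on the seed, so several replicates should be compared before drawing conclusions.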
All livestock breeds constantly face challenges within their environment, such as diseases, climatic conditions and pressure to maintain/increase production. In order to cope with their environment, the individuals express particular traits in response to local conditions (phenotypic plasticity). When the changed conditions persist, the response is the evolution of the breed/species via genetic selection leading to adaptation. These detectable responses compared to a situation expected purely by chance are called “selection signatures” (Kim and Stephan, 2002; Pertoldi et al., 2016).
The selection pressure increases the frequency of favorable alleles that increase the chances of survival, but also the frequency of neutral alleles in linkage disequilibrium with the favorable alleles (Smith and Haigh, 1974; Flori et al., 2009). Theoretical and empirical research on selection signatures indicates that selection substantially affects the levels of neutral polymorphism, either by acting against deleterious mutations or by favoring advantageous mutations. This means that we can use patterns of polymorphism at neutral sites to detect selection acting at the molecular level (Payseur and Nachman, 2002). Population genetic theory predicts that beneficial mutations are either lost by genetic drift or increase in frequency until they eventually become fixed in a population (Schlötterer, 2003).
A selection signature tends to drastically reduce the variation within a population but will not lead to a reduction in species specific differences. Conversely, negative selection acting on multiple loci will reduce variability between species more drastically than variability within species (Nielsen, 2005). Neutral variants linked to deleterious mutations will indirectly experience selective pressure to be removed from populations. This idea, termed as background selection (Charlesworth et al., 1993), predicts that genomic regions of reduced recombination will exhibit decreased polymorphism levels. Such genetic hitchhiking (Smith and Haigh, 1974) also predicts a reduction of polymorphism in regions of low recombination.
Apart from a within-population analysis, it is possible to evaluate selection signatures by comparing genome patterns in multiple populations. Because background selection is a balancing process that involves recurrent deleterious mutations, all populations are expected to respond in a roughly similar fashion. Alternatively, genetic hitchhiking may involve the fixation of beneficial mutations in one population only, or the fixation of different beneficial mutations in different populations. Thus, population-specific deviations from neutrality at particular loci may identify candidate regions for genetic hitchhiking (Payseur and Nachman, 2002). Analysis of these similarities or differences between the selection signatures leads to a better understanding of the genotype-phenotype map. The study of a large number of polymorphisms spread across the genome reveals aspects of the genetic structure of the population, including, in some cases, evidence of adaptive selection across the genome (Gibbs et al., 2003; Weir et al., 2005).
In animal breeding, directional selection results in loss of variation within breeds, but at the same time increases between-breed differences, given the diverse breeding goals. Selection will also increase the frequency of alleles of neutral markers in linkage disequilibrium with the favorable alleles (Smith and Haigh, 1974). In earlier works, microsatellite markers were used to detect the selection signatures in
A comprehensive overview of the selection signature methods from the last fifty years, explaining their conceptual motivations and statistical interpretations is outlined in Vitti et al. (2013). Strategies and approaches to detect positive signatures of selection are assessed in Qanbari and Simianer (2014).
The evaluation of biological diversity is central to the proper management of livestock breeds. An accurate assessment of diversity characteristics enables qualified decisions on the application of protective measures.
This paper reviews the genomic measures of inbreeding, effective population size and allele frequency changes within the population in the form of genetic drift and selection. These and other indicators of genetic diversity could be accurately assessed, given the availability of dense molecular markers across the genome. SNP marker-based genomic data are widely considered to be the most accurate and affordable source for estimating such indicators, as they require nothing more than the availability of a DNA sample. As a direct consequence, the approach also makes it possible to characterize populations that would otherwise be extremely challenging to assess, due to the lack of conventional information sources.