Abstract
We define and study the problem of chromosomal selection for multiple complex traits. In this problem, it is assumed that one can construct a genome by selecting different genomic parts (e.g. chromosomes) from different cells. The constructed genome is associated with a vector of polygenic scores, obtained by summing the polygenic scores of the different genomic parts, and the goal is to minimize a loss function of this vector. While out of reach today, the problem may become relevant with emerging technologies, and may yield far greater gains in the loss compared to the present-day technology of embryo selection, provided that technological and ethical barriers are overcome. We suggest and study several natural loss functions relevant for both quantitative traits and disease. We propose two algorithms: a Branch-and-Bound technique that solves the problem for multiple traits and any monotone loss function, and a convex-relaxation algorithm applicable to any differentiable loss. Finally, we use the infinitesimal model for genetic architecture to approximate the potential gain achieved by chromosomal selection for multiple traits.
1 Introduction
Polygenic Scores (PS) are genetic scores predicting a phenotype of interest by combining the contributions of multiple genomic alleles. In the last few years, hundreds of polygenic scores were developed for predicting complex diseases and quantitative traits in humans [25], with the score coefficients typically fitted in large Genome-Wide Association Studies (GWAS) [22, 39]. The accuracy of PSs is expected to improve significantly in the upcoming years due to increases in GWAS sample sizes, inclusion of additional populations [32], usage of whole-genome sequencing (rather than genotyping using SNP arrays), which enables profiling of additional (in particular rare) variants [14], and improvements in statistical methods for fitting such scores [5, 11, 29].
These recent advances make it possible to screen embryos for common, complex conditions and traits when using in vitro fertilization (IVF), in a technology termed Polygenic Embryo Screening (PES). Conceptually, PES is quite straightforward: compute the PS of the available embryos for the traits of interest, and select the embryo optimizing a PS (e.g. minimizing disease risk); the technology is already offered commercially and is in use in the clinic [35]. Several works have analyzed the potential benefits and risks using theoretical and empirical analysis, but the effectiveness of the current technology is debated [21, 26, 35, 36]. In particular, the limited number of embryos to select from (typically not more than 5–7) may limit the potential benefits of the technology, especially when selecting for multiple traits of interest simultaneously [26].
With novel technological advances, it might be possible in the future to go beyond embryo selection, and to select and combine parts of the genomes of different cells. Such flexibility, if made possible, will lead to a far greater space of possibilities for selection compared to PES, with potentially significantly larger benefits in disease risk reduction. For example, chromosomal transplantation, in which a defective chromosome is replaced by a normal one, was recently demonstrated in vitro [30, 31]. Similar technologies are used in the lab to study humanized animal models [37, 41]. For complex polygenic traits, selection of individual chromosomes or large-scale genomic regions from available cells may be more effective than genome-engineering approaches such as gene editing with the CRISPR-Cas9 system [18], which affect only one or a handful of genes.
A major technological challenge will be to determine the PS of individual cells and chromosomes in a non-invasive manner. Such methods may be available for oocytes [17]. Alternatively, phenotypes of individual cells (measured e.g. using imaging techniques [20]) may be correlated with DNA quality [28], and may provide an indirect proxy for polygenic scores. Assuming that technological and ethical issues are resolved, allowing chromosomal selection in humans, animal models or agriculture, computational methods and statistical analysis are needed in order to fully utilize the potential of the different selection methods. The goal of this paper is to develop these methods and analysis. Specifically, we focus on chromosomal selection for multiple quantitative traits and diseases. In this problem, multiple copies are available for each chromosome (or possibly a smaller genomic part), and based on these copies' PSs we select one of them. The selected parts are then combined to generate an embryo. We address the following two main questions:
How should the chromosomes be selected in order to maximize utility across multiple traits? When selecting for T traits, each selection c of chromosomes leads to an overall genomic score vector $X_c \in \mathbb{R}^T$. A loss function $\mathcal{L}: \mathbb{R}^T \to \mathbb{R}$ is defined, and our goal is to find the selection c minimizing the loss $\mathcal{L}(X_c)$, and to compare it to the loss obtained under random selection. When C copies are available for each of the M chromosomes, the total number of possible selections $C^M$ is exponential in M, hence the need to design efficient general algorithms for the problem.
What is the expected gain when using optimal chromosomal selection? How does it compare to the baseline, i.e. selecting embryos at random, and to the embryo selection procedure that is enabled by current technology and already offered to patients [26, 35]?
Our main contributions are threefold: first, we formulate the chromosomal selection problem mathematically. Second, we provide two algorithms for chromosomal selection for multiple traits and general loss functions, and investigate their empirical performance. Third, we estimate the expected gain achieved by chromosomal selection, both for linear loss functions, where we establish an approximate analytic formula, and for several nonlinear loss functions using simulations.
2 The Chromosomal Selection Problem
2.1 Background and Selection Problems for Polygenic Scores
Consider a genome composed of M distinct chromosomes, where for diploid cells we count the maternal and paternal chromosomes separately; hence, for example, for a human diploid cell M = 45, with 22 pairs of autosomes and the 'XX' pair (as explained later, in most plausible scenarios selection for the 'XY' pair is determined by the sex and not by the scores). We associate with each chromosome a Polygenic Score (PS) vector representing the genetic contribution of the chromosome to T traits of interest. Suppose that we have C distinct cells, each with its own genome; hence C copies are available for each chromosome overall. Our goal is to select one copy for each chromosome, possibly under constraints, yielding a full genome with desired properties in terms of the resulting polygenic scores. For example, in embryo selection, the C cells are C embryos obtained from the same parents, and the selection is performed by simply choosing one of the cells, such that all selected chromosomes belong to the same cell. In chromosomal selection, it may be possible to select different chromosomes from different cells. For example, suppose that a diploid paternal cell and a diploid maternal cell are available, and both are reprogrammed to create haploid sperm and oocyte cells, respectively [7, 16], and later to yield a viable (diploid) embryo after fertilization. Suppose further that it will be possible to select the maternal or paternal copy independently for each chromosome in the created haploid cells. Hence, there are overall C = 2 copies of each chromosome, with M chromosomes in each haploid cell (ignoring for simplicity the uniqueness of the sex chromosomes), and a space of 2^M possible resulting genomes for the embryo, depending on the copy selected for each chromosome. Figure 1(b) shows an illustration of chromosomal selection for sperm cells and oocytes.
A simplified variant of the problem is obtained when an oocyte is already available, and we only select chromosomes from a diploid paternal cell to form a sperm cell, or vice versa; hence M = 22 or 23, and the score vector of the selected gamete can then be added to the scores of the available gamete (assumed w.l.o.g. to be $0_T$), yielding a chromosomal selection problem with a smaller M value. Additional scenarios in which chromosomal selection may be possible are described in Appendix Section C.
2.2 Preliminaries
We first introduce mathematical notations used throughout the paper. For a natural number n, denote the set {1, .., n} by [n]. For two natural numbers m ≤ n, denote the set {m, m + 1, .., n} by [m, n]. The vector of all zeros (ones) of length n is denoted $0_n$ ($1_n$).
For a polytope $K \subseteq \mathbb{R}^n$, we define the projection operator $P_K(x) \equiv \arg\min_{y \in K} \|x - y\|_2$.
Tensors
Let $X \in \mathbb{R}^{m \times n \times p}$ be a 3rd-order tensor with elements $X_{ijk}$. We use the $\bullet$ notation to denote lower-dimensional fibers of a tensor. For example, $X_{ij\bullet}$ denotes the vector $(X_{ij1}, .., X_{ijp})$. Similarly, $X_{\bullet j \bullet}$ is a matrix of size m × p containing all elements $X_{ijk}$, ∀i ∈ [m], ∀k ∈ [p]. We can also describe sub-tensors obtained by taking subsets of the indices across each dimension. For example, $X_{[i][j]\bullet}$ is a 3rd-order tensor of size i × j × p obtained by taking the first i and first j coordinates of the first and second dimensions, respectively.
For a 3rd-order tensor $X \in \mathbb{R}^{m \times n \times p}$ and a matrix $W \in \mathbb{R}^{m \times n}$, define the 2-mode tensor-by-matrix product [1, 23] as the matrix $X \times_2 W \in \mathbb{R}^{m \times p}$, with elements defined by:
$$[X \times_2 W]_{ik} \equiv \sum_{j=1}^{n} X_{ijk} W_{ij}. \qquad (1)$$
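In code, this contraction is a single einsum (a sketch under our reading of the definition, collapsing the copy dimension j; note that with one-hot rows of W, the product recovers an ordinary selection of one copy per chromosome):

```python
import numpy as np

def mode2_product(X, W):
    """[X x_2 W]_{ik} = sum_j X_{ijk} W_{ij}: X is (m, n, p), W is (m, n)."""
    return np.einsum('ijk,ij->ik', X, W)

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 3, 2))            # m=4 chromosomes, n=3 copies, p=2 traits
W = np.zeros((4, 3)); W[:, 0] = 1.0       # one-hot rows: always pick copy 0
S = mode2_product(X, W)
assert np.allclose(S, X[:, 0, :])         # row i equals X_{i, 0, .}
```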
Gaussian Distribution
For the multivariate Gaussian distribution with mean μ and covariance Σ, denote by ϕ(x; μ, Σ) and Φ(x; μ, Σ) the density function and cumulative distribution function, respectively. When μ, Σ are omitted, Φ and ϕ refer to the standard multivariate Gaussian distribution with mean zero and identity covariance matrix. For a measurable set $A \subseteq \mathbb{R}^n$, denote by $\Phi(A)$ the probability of the set under the Gaussian distribution, i.e. $\Phi(A) \equiv \int_A \phi(x)\,dx$.
2.3 Chromosome selection for Multiple Traits
Let $X \in \mathbb{R}^{M \times C \times T}$ be a 3rd-order tensor of polygenic scores, where $X_{ijk}$ denotes the score of chromosome i of copy j for trait k. Let $c = (c_1,.., c_M) \in \{1,.., C\}^M$ be a selection vector. The associated selected polygenic score vector is defined as $X_c \equiv \sum_{i=1}^{M} X_{i c_i \bullet} \in \mathbb{R}^T$, with $X_{ck}$ denoting its k-th element $(X_c)_k$, ∀k ∈ [T]. Our goal is to find the selected score vector $X_c$ minimizing a loss function of our choice. The multi-trait chromosomal selection problem is defined as follows:
Problem 1. Given a 3rd-order tensor of scores $X \in \mathbb{R}^{M \times C \times T}$ and a loss function $\mathcal{L}: \mathbb{R}^T \to \mathbb{R}$, find a vector $c^* \in \{1,.., C\}^M$ minimizing the loss: $c^* = \arg\min_{c} \mathcal{L}(X_c)$.
Table 1 lists several examples of natural loss functions of interest for both quantitative and disease traits. The computational difficulty of Problem 1 above depends on the loss function $\mathcal{L}$. For example, when the loss is linear in the polygenic risk vector $X_c$, we can select the optimal copy for each of the M chromosomes independently, and the computational problem becomes trivial. This is the case when selecting to maximize a weighted combination of quantitative traits. However, for non-linear loss functions (e.g. minimizing the overall disease probability over multiple diseases), selecting the best chromosomes may be computationally challenging, since we need to take into account the scores of all chromosomes jointly. In Section 4, we propose two algorithms for finding the optimal selection, applicable to general classes of loss functions.
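For concreteness, the exhaustive-search formulation of Problem 1 can be sketched as follows (a minimal illustration on a toy instance, with an assumed sum-of-squares loss standing in for any loss of interest):

```python
import numpy as np
from itertools import product

rng = np.random.default_rng(1)
M, C, T = 4, 3, 2                        # chromosomes, copies, traits
X = rng.normal(size=(M, C, T))           # scores tensor X_{ijk}

def loss(v):
    """Example nonlinear loss on the total score vector X_c."""
    return float(np.sum(v ** 2))

# Exhaustive search over all C**M selections (feasible only for tiny M).
best_c, best_val = None, np.inf
for c in product(range(C), repeat=M):
    Xc = X[np.arange(M), c, :].sum(axis=0)   # X_c = sum_i X_{i, c_i, .}
    if loss(Xc) < best_val:
        best_c, best_val = c, loss(Xc)
```

The C^M blow-up (3^4 = 81 here, but 2^45 in the human setting described later) is what the algorithms of Section 4 are designed to avoid.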
In particular, many natural loss functions are monotone functions of the scores vector. This monotonicity may be used when solving Problem 1. Formally, we define monotone loss functions with respect to the (partial) product order between vectors as follows:
A vector $x \in \mathbb{R}^n$ dominates a vector $y \in \mathbb{R}^n$, denoted $x \prec y$, if $x_i < y_i$, ∀i ∈ [n]. We say that a loss function $\mathcal{L}$ is monotonically non-increasing if for any two vectors $X_c \prec Y_c$ we have $\mathcal{L}(X_c) \geq \mathcal{L}(Y_c)$.
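Dominance checks of this kind can be sketched as follows (a minimal illustration; the convention that larger score vectors are preferable, matching a monotonically non-increasing loss, is assumed):

```python
def dominates(x, y):
    """True if x dominates y: at least as large in every coordinate,
    strictly larger in at least one (larger scores assumed preferable)."""
    return all(a >= b for a, b in zip(x, y)) and any(a > b for a, b in zip(x, y))

def prune_dominated(vectors):
    """Keep only vectors not dominated by any other vector (the Pareto front)."""
    return [v for v in vectors if not any(dominates(u, v) for u in vectors)]

front = prune_dominated([(3, 1), (1, 3), (2, 2), (1, 1), (0, 2)])
# (1, 1) and (0, 2) are dominated and removed; the rest are incomparable.
```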
The Gain due to Selection
For a tensor X of scores and a loss function $\mathcal{L}$, we define the gain G due to chromosomal selection and the gain $G_e$ due to embryo selection as the differences between the expected loss when selecting at random, i.e. with respect to the uniform distribution over all $C^M$ possible choices of chromosomes, and the optimal loss:
$$G(X) \equiv \mathbb{E}_{c}[\mathcal{L}(X_c)] - \min_{c \in [C]^M} \mathcal{L}(X_c), \qquad G_e(X) \equiv \mathbb{E}_{c}[\mathcal{L}(X_c)] - \min_{j \in [C]} \mathcal{L}\Big(\sum_{i=1}^{M} X_{ij\bullet}\Big). \qquad (2)$$
The gain is similar in spirit to previous definitions given in [21, 26], but with two major differences. First, it is defined for a general loss, whereas [21, 26] defined the gain only for additive losses for quantitative and disease traits. Second, the gain in [21, 26] was defined with respect to the actual trait value, which is determined by the score as well as by non-score genetic and environmental components. Here, the gain and the loss are defined in terms of the scores only; hence this gain can be viewed as the expectation of the previous gain over a latent variable representing the phenotype value. By definition $G \geq G_e \geq 0$, and we are interested in the expected gains of both approaches compared to random selection, and in their expected (non-negative) difference. In Section B we use a statistical model for the scores to obtain approximate formulas for the expected gain, and its dependence on model parameters, for the linear loss.
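On a small instance, both gains can be estimated directly from their definitions. The sketch below uses a linear loss (an illustrative choice), for which the random-selection baseline is the same whether averaged over all C^M chromosome selections or over the C whole genomes:

```python
import numpy as np
from itertools import product

rng = np.random.default_rng(2)
M, C, T = 5, 2, 2
X = rng.normal(size=(M, C, T))
w = np.ones(T)
loss = lambda v: float(w @ v)            # linear loss for illustration

all_losses = [loss(X[np.arange(M), c, :].sum(axis=0))
              for c in product(range(C), repeat=M)]            # every selection
cell_losses = [loss(X[:, j, :].sum(axis=0)) for j in range(C)]  # whole cells only

expected_random = float(np.mean(all_losses))
G = expected_random - min(all_losses)     # gain of chromosomal selection
Ge = expected_random - min(cell_losses)   # gain of embryo (whole-cell) selection
assert G >= Ge >= -1e-12                  # G >= G_e >= 0
```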
3 The Expected Gain
The gain defined in eq. (2) represents the utility of chromosomal selection for a concrete set of chromosomes with scores. We are interested in statistical properties of the gain in a population, hence the need for a statistical model for the scores tensor X. That is, suppose that X ~ P_X. Hence G(X) is also a random variable, determined by the scores distribution P_X and the loss $\mathcal{L}$. We study here the expected gain $\bar{G} \equiv \mathbb{E}_X[G(X)]$, and similarly the expected gain due to embryo selection $\bar{G}_e \equiv \mathbb{E}_X[G_e(X)]$. In Section B.1 we derive a statistical model for X based on quantitative-genetics principles, extending the models for whole-genome scores in [21, 26]. Under this model, the scores for the different chromosomes are independent, and the covariance matrix of a randomly selected vector $X_c$ is denoted by $\Sigma^{(X)}$. For the linear loss $\mathcal{L}(x) = w^\top x$, we showed in [21] that the gain due to embryo selection is
$$\bar{G}_e \approx \sqrt{\tfrac{1}{2} w^\top \Sigma^{(X)} w}\; \mathbb{E}\Big[\max_{j \in [C]} z_j\Big], \qquad (3)$$
where $z_1, .., z_C$ are i.i.d. standard Gaussians.
Chromosomal selection is simple in this case, and is achieved by selecting for each chromosome i the copy j minimizing $w^\top X_{ij\bullet}$. Using this property, we derive the approximate gain for chromosomal selection as (see details in Appendix B.1)
$$\bar{G} \approx \Big(\sum_{i=1}^{M} \sqrt{\alpha_i}\Big)\, \bar{G}_e,$$
where $\alpha_i$ is the proportion of score variance explained by chromosome i, satisfying $\sum_{i=1}^{M} \alpha_i = 1$. The expected gain for chromosomal selection is thus roughly $\sum_{i=1}^{M} \sqrt{\alpha_i}$-fold higher than the expected gain for embryo selection in eq. (3). For the $\alpha_i$ values in Table 2 in the Appendix, representing human chromosome lengths, this gives a 4.68-fold improvement. For general (nonlinear) loss functions, we compute the expected gain numerically using simulations, as shown in Section 5.
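The per-chromosome greedy rule for a linear loss can be checked against exhaustive search (a minimal sketch with random scores and weights):

```python
import numpy as np
from itertools import product

rng = np.random.default_rng(3)
M, C, T = 5, 2, 3
X = rng.normal(size=(M, C, T))
w = rng.normal(size=T)

# Greedy: for the linear loss w^T X_c, pick for each chromosome i the copy j
# minimizing w^T X_{ij.}, independently of the other chromosomes.
greedy_c = np.argmin(X @ w, axis=1)                      # (M, C) -> best copy per row
greedy_val = float(w @ X[np.arange(M), greedy_c, :].sum(axis=0))

# Compare to exhaustive search over all C**M selections.
brute_val = min(float(w @ X[np.arange(M), c, :].sum(axis=0))
                for c in product(range(C), repeat=M))
assert np.isclose(greedy_val, brute_val)                 # greedy is exact here
```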
4 Algorithms for Chromosome Selection
The optimization Problem 1 is difficult due to the exponential search space of size $C^M$. For example, when selecting the 22 + 23 = 45 chromosomes of a single sperm and a single oocyte in humans, the number of possible selections is $2^{45} \approx 3.5 \times 10^{13}$. We describe next two classes of algorithms for the problem: a Branch-and-Bound approach that eliminates dominated selections, and a relaxation of the discrete selector variables $c_i$ to continuous vectors in the simplex. The two algorithms are applicable in different scenarios: the Branch-and-Bound techniques can be applied to any monotone loss and are guaranteed to yield an optimal solution, but their worst-case computational complexity is exponential in M. The relaxation approach is polynomial and can be applied to any differentiable loss, but has no optimality guarantees.
4.1 A Branch-and-Bound algorithm
In the Branch-and-Bound algorithm for selecting chromosomes under monotone loss functions, we grow a tree of all possible chromosome selections, and at each level keep only leaves not dominated by other leaves. Finally, we evaluate the loss of all leaves at the last level. A tree of depth b is represented as a collection of paths from the root to the leaves, Γ = {c^(1), .., c^(m)}, where each c^(j) ∈ {1,.., C}^b represents the choices of chromosomes in the first b levels. The partial score sum $X^{(b)}_{c^{(j)}} \equiv \sum_{i=1}^{b} X_{i c_i^{(j)} \bullet}$ is calculated, and dominated partial score vectors are pruned. Then, each of the remaining c^(j)'s is expanded into C paths of length b + 1. A formal step-by-step description is shown in Algorithm 1. The computational complexity is determined by the number of leaves corresponding to non-dominated vectors considered at each step b, with $C^b$ possible leaves to consider. In the worst case, the Branch-and-Bound algorithm enumerates all leaves, hence it may run in time exponential in M, as shown in the Appendix, Section A.2.
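The level-by-level pruning can be sketched as follows (a simplified rendering, not the paper's exact Algorithm 1; the dominance convention, that larger score vectors are preferable under a monotone non-increasing loss, is an assumption):

```python
import numpy as np
from itertools import product

def dominated(p, q):
    """q strictly dominates p coordinate-wise (larger scores assumed better)."""
    return bool(np.all(q >= p) and np.any(q > p))

def branch_and_bound(X, loss):
    """Level-by-level search keeping only non-dominated partial score sums.
    Valid for monotone non-increasing losses. X has shape (M, C, T)."""
    M, C, T = X.shape
    partial = [np.zeros(T)]                              # root: empty selection
    for i in range(M):                                   # branch on chromosome i
        expanded = [p + X[i, j] for p in partial for j in range(C)]
        partial = [p for p in expanded
                   if not any(dominated(p, q) for q in expanded if q is not p)]
    return min(loss(p) for p in partial)

rng = np.random.default_rng(5)
M, C, T = 6, 2, 3
X = rng.normal(size=(M, C, T))
loss = lambda v: float(-np.min(v))                       # monotone non-increasing

brute = min(loss(X[np.arange(M), c, :].sum(axis=0))
            for c in product(range(C), repeat=M))
assert np.isclose(branch_and_bound(X, loss), brute)
```

Discarding a dominated partial sum is safe because any completion of it is dominated, coordinate-wise, by the same completion of its dominator, and hence has no smaller loss.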
While the worst-case computational complexity of Algorithm 1 is exponential in M, the number of vectors considered may be far lower than $C^M$ in practice. Further pruning can also be achieved by computing upper and lower bounds for the optimal loss, as follows: let $X^{\vee}$ ($X^{\wedge}$) be the vector obtained by summing over all chromosomes i the vector obtained by taking, for each coordinate k, the maximum (minimum) of $X_{ijk}$ over j ∈ [C]. Then, for a monotonically non-increasing loss, $\mathcal{L}(X^{\vee}) \leq \mathcal{L}(X_c) \leq \mathcal{L}(X^{\wedge})$ for every selection c. Solutions violating these bounds are also pruned as part of the algorithm, in Step 5.
For simulated problem instances using the model in eq. (26), the average number of leaves at each stage b was far lower than $C^b$, growing roughly as $C^{b/2}$, as shown for example in Figure 2(a,b), enabling the use of Algorithm 1 in practice for large problem instances (e.g. C = 2, M = 45).
Divide-and-conquer
We can improve the speed of our algorithm by dividing the M chromosomes into groups, optimizing over each group separately, and then combining the solutions in a manner that filters out sub-vectors that cannot be extended to an optimal solution. This procedure significantly improves performance, while still being guaranteed to yield an optimal solution for monotone losses. Due to its technical details, it is described in Appendix A.3.
4.2 A Relaxation Algorithm
Algorithm 1 (Branch-and-Bound) is inapplicable to non-monotone loss functions. Moreover, even for monotone losses, the Branch-and-Bound algorithm can be computationally intensive, taking exponential time in the worst case; hence the need for alternative algorithms.
We encode each selection $c_i \in [C]$ using a one-hot vector $C_{i\bullet} = (C_{i1}, .., C_{iC})$, with $C_{i c_i} = 1$ and $C_{ij} = 0$, ∀j ≠ $c_i$. Next, we relax the requirement that each $C_{ij} \in \{0, 1\}$, and instead require only $C_{i\bullet} \in \Delta_C$, where $\Delta_C$ denotes the C-dimensional simplex. Stacking all selection vectors yields a (row-)stochastic matrix $C \in \mathbb{R}^{M \times C}$, and the score is given by $X_c = (X \times_2 C)^\top 1_M$. This leads to the following relaxed problem:
Problem 2 (relaxation):
Given a 3rd-order tensor of scores $X \in \mathbb{R}^{M \times C \times T}$ and a loss function $\mathcal{L}: \mathbb{R}^T \to \mathbb{R}$, find a matrix $C^* \in (\Delta_C)^M$ minimizing the loss: $C^* = \arg\min_{C} \mathcal{L}\big((X \times_2 C)^\top 1_M\big)$.
We solve Problem 2 using projected gradient descent, where each row of C is projected separately onto the simplex $\Delta_C$, as described in [4]. Then, the closest vertex of the polytope to the solution $C^*$ is returned as an approximate solution of the original Problem 1. The details are shown in Algorithm 2. When the loss is convex, it is possible to establish convergence guarantees for the relaxed Problem 2 (see e.g. [6]), yet the original Problem 1 is computationally hard in general. For smooth losses, it may be possible to obtain a closed-form solution using Lagrange multipliers, as demonstrated for the stabilizing-selection loss in Appendix Section A.5.
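The projection-plus-rounding loop can be sketched as follows (a minimal rendering of the idea, with a sort-based Euclidean projection onto the simplex in the spirit of [4]; the quadratic loss and step size are illustrative assumptions):

```python
import numpy as np

def project_simplex(v):
    """Euclidean projection of v onto the probability simplex (sort-based)."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u) - 1.0
    rho = np.nonzero(u * np.arange(1, len(v) + 1) > css)[0][-1]
    return np.maximum(v - css[rho] / (rho + 1.0), 0.0)

def relaxed_selection(X, grad_loss, steps=500, lr=0.02):
    """Projected gradient descent for the relaxed problem; rows of W live on
    the simplex and are finally rounded to the nearest vertex (argmax)."""
    M, C, T = X.shape
    W = np.full((M, C), 1.0 / C)                 # start at the barycenter
    for _ in range(steps):
        W = W - lr * grad_loss(W)
        W = np.apply_along_axis(project_simplex, 1, W)   # project each row
    return np.argmax(W, axis=1)

rng = np.random.default_rng(6)
M, C, T = 5, 2, 3
X = rng.normal(size=(M, C, T))

def grad_loss(W):                                # gradient of ||X_c||^2 (assumed loss)
    xc = np.einsum('ijk,ij->k', X, W)            # relaxed score vector
    return 2.0 * np.einsum('ijk,k->ij', X, xc)

c_hat = relaxed_selection(X, grad_loss)
assert c_hat.shape == (M,) and all(0 <= j < C for j in c_hat)
```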
5 Simulation Results
To examine the utility of the two algorithms, we have implemented them as part of an R package called "EmbryoSelectionCalculator", available at https://github.com/orzuk/EmbryoSelectionCalculator (see additional details in Appendix D). We simulated embryo scores from a matrix Gaussian distribution (see [19]). We mimicked selection from a single sperm cell and a single oocyte, giving M = 22 + 23 = 45 and C = 2, i.e. a search space of size $2^{45} \approx 3.5 \times 10^{13}$. We selected for T = 5 diseases with equal prevalence of 0.1, and assumed that the polygenic scores explain 20% of the liability variance for each disease. The relative proportions of variance explained by each chromosome were set according to Table 2, for all traits. We assumed a heritability of h² = 50% for all disease liabilities, and consequently defined the covariance matrix Σ of the non-score component ε to have 0.8 on the diagonal and 0.65 in the off-diagonal elements.
We used the sum of disease probabilities loss function with equal weights, and the (minus) probability of being disease-free loss function (lines 3 and 4 in Table 1, respectively). The baseline loss under random selection was, as expected, 0.1 × 5 = 0.5 for the first loss, and 0.72 for the second, disease-free probability loss, slightly higher than the $0.9^5 \approx 0.59$ expected if the diseases were uncorrelated.
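These baseline numbers can be sanity-checked directly (a sketch; the pairwise liability correlation of 0.65 used below is an assumption for illustration):

```python
import numpy as np

# Baselines for T = 5 diseases of prevalence K = 0.1 each: the expected
# sum-of-probabilities loss is additive, and under independence the
# disease-free probability would be 0.9**5.
T, K = 5, 0.1
assert np.isclose(T * K, 0.5)                 # sum-of-probabilities baseline
assert round(0.9 ** T, 4) == 0.5905           # independent disease-free prob.

# Monte Carlo check that positively correlated Gaussian liabilities raise the
# joint disease-free probability above the independent product.
rng = np.random.default_rng(4)
z = 1.2815516                                 # Phi^{-1}(1 - K) for K = 0.1
rho, n = 0.65, 200_000                        # assumed liability correlation
Sigma = np.full((T, T), rho)
np.fill_diagonal(Sigma, 1.0)
liab = rng.multivariate_normal(np.zeros(T), Sigma, size=n)
pfree = float(np.mean(np.all(liab < z, axis=1)))   # P(no disease)
assert 0.9 ** T < pfree < 0.85
```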
We repeated the simulation 100 times, and each time computed the optimal selection strategy using Algorithms 1 and 2. The results are shown in Figure 2(c,d), as a function of the number of available copies for selection C. For the first loss, the outputs of the two algorithms usually coincided, and on average the loss was reduced by 37%. For the second loss, the relaxation algorithm matched the exact Branch-and-Bound solution in only 62 out of 100 simulations, and performed worse in the remaining 38, as can be expected for a non-convex loss. The average reduction for this loss was smaller, at 29%. Perhaps surprisingly, the Branch-and-Bound algorithm was faster for both losses, indicating that the trees grown for this model always remained small. We therefore recommend using the Branch-and-Bound algorithm, and only if the tree size explodes, either pruning the tree heuristically by keeping only the top paths at each step, or resorting to the relaxation algorithm.
6 Discussion
We have defined and formulated the chromosomal selection problem, and provided two algorithms for solving it. Our Branch-and-Bound algorithm, while exponential in the worst case, can easily be applied in practice to the problem of selecting chromosomes from a single sperm cell and a single oocyte in humans, for monotone loss functions. The relaxation algorithm can handle much larger selection problems, yet the quality of the solution obtained by this algorithm may vary. Developing an efficient algorithm with optimality guarantees for major classes of loss functions is an interesting direction for future research.
While the technology for chromosomal selection is not currently available, we believe that our analysis is insightful, as it may guide practitioners in the future regarding the potential utility of such technologies. As technologies improve, it may be necessary to reformulate the selection problem and adjust the algorithms to the availability of scores and to the constraints on selection imposed by the technology. For example, recent imaging studies of embryos may provide information on their viability, and possibly disease risk, without the need to destructively sequence the embryos. If such techniques mature, they can be combined with our computational methods to estimate the score of each chromosomal copy and select based on these estimates.
Finally, while current polygenic scores are linear, improved risk predictions may be achieved in the future using nonlinear scores. Formulating the chromosomal selection problem for such nonlinear scores, and dealing with the increased combinatorial complexity, will pose algorithmic challenges.
Appendix
A Algorithms and Optimization Details
A.1 Notations
For a matrix X, we denote by vec(X) the column vector obtained by stacking the columns of X, from first to last. Similarly, for a 3rd-order tensor X, we denote by mat(X) the matrix obtained by stacking the 2nd-order fibers of X, from first to last.
There are $2^T$ possible binary vectors of length T, with each such vector $d \in \{0, 1\}^T$ corresponding to an orthant defined as $O_d \equiv \{(x_1,.., x_T) \text{ s.t. } (-1)^{d_i} x_i < 0, \forall i \in [T]\}$. These $2^T$ orthants form a disjoint union of $\mathbb{R}^T$ (ignoring equalities with the axes).
Similarly to eq. (1), for a vector $v \in \mathbb{R}^p$, define the 3-mode tensor-by-vector product as the matrix $X \times_3 v \in \mathbb{R}^{m \times n}$, with elements defined by: $[X \times_3 v]_{ij} \equiv \sum_{k=1}^{p} X_{ijk} v_k$.
Element-wise notations
For two matrices A, B of the same size, their Hadamard product A ⊙ B is defined by element-wise multiplication, i.e. $[A \odot B]_{ij} = a_{ij} b_{ij}$. Similarly, we define their entry-wise minimum and maximum, $A \wedge B$ and $A \vee B$, as $[A \wedge B]_{ij} = \min(a_{ij}, b_{ij})$ and $[A \vee B]_{ij} = \max(a_{ij}, b_{ij})$. For a real number α, the Hadamard power of A is defined by raising each element to the power α, i.e. $[A^{\circ \alpha}]_{ij} = a_{ij}^{\alpha}$.
In the same spirit, for $A \in \mathbb{R}^{m \times n}$, the row-wise maximum and minimum vectors are denoted $A^{\vee}, A^{\wedge} \in \mathbb{R}^m$, where $A^{\vee}_i = \max_{j \in [n]} a_{ij}$ and $A^{\wedge}_i = \min_{j \in [n]} a_{ij}$. Finally, we can similarly define vectors of indices obtained by taking the index maximizing/minimizing the elements of A in each row, i.e. $\arg A^{\vee} \in [n]^m$ is defined by $[\arg A^{\vee}]_i = \arg\max_{j \in [n]} a_{ij}$, and similarly for $\arg A^{\wedge}$.
A.2 Branch-and-Bound
Claim
In the worst case, the number of non-dominated vectors at stage b of Algorithm 1 is $C^b$.
Proof. We construct the chromosome scores as follows: draw $u_{ij}$ i.i.d. from a continuous distribution, ∀i ∈ [M], ∀j ∈ [C]. For each $u_{ij}$, set the vector $X_{ij\bullet} \equiv (u_{ij}, u_{ij}, .., u_{ij}, -(T-1)u_{ij})$. At stage b of Algorithm 1, any vector present is of the form $\sum_{i=1}^{b} X_{i c_i \bullet} = (u, u, .., u, -(T-1)u)$ for some choices $c_i \in [C]$, where $u = \sum_{i=1}^{b} u_{i c_i}$. With probability one, the u values of the different linear combinations are all distinct. Since the coordinates of any vector $(u, u, .., u, -(T-1)u)$ sum to zero, two such vectors with distinct u values are incomparable; hence at stage b we get a set of $C^b$ distinct, Pareto-optimal vectors, and in this case the Branch-and-Bound algorithm does not exclude any of them, reaching all $C^M$ linear combinations at the final stage.
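The construction can be verified numerically (a sketch; the sign making each vector's coordinates sum to zero, which forces pairwise incomparability, is our reading of the construction):

```python
import numpy as np
from itertools import product

rng = np.random.default_rng(8)
M, C, T = 4, 2, 3
u = rng.normal(size=(M, C))

# X_{ij.} = (u_ij, ..., u_ij, -(T-1) u_ij): every coordinate sum is zero.
X = np.zeros((M, C, T))
X[:, :, : T - 1] = u[:, :, None]
X[:, :, T - 1] = -(T - 1) * u

def dominates(x, y):
    return bool(np.all(x >= y) and np.any(x > y))

sums = [X[np.arange(M), c, :].sum(axis=0) for c in product(range(C), repeat=M)]
# All C**M partial sums lie on the same zero-sum line and are incomparable,
# so none would ever be pruned by a dominance test.
assert len(sums) == C ** M
assert not any(dominates(a, b) for a in sums for b in sums if a is not b)
```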
A.3 Divide-and-conquer
Remark 1. Suppose that we divide the M chromosomes into b blocks $B_1, .., B_b$, forming a disjoint partition of [M]. Let $P_i$ be the set of Pareto-optimal vectors obtained by running Algorithm 1 on $X_{B_i \bullet\bullet}$, for i = 1,.., b. Furthermore, define $P_i^{\vee}$ ($P_i^{\wedge}$) as the vector obtained by taking in each coordinate j the maximum (minimum) over all vectors in $P_i$, ∀j ∈ [T]. Then, for a monotonically non-increasing loss:
$$\mathcal{L}\Big(\sum_{i=1}^{b} P_i^{\vee}\Big) \leq \mathcal{L}(X_{c^*}) \leq \mathcal{L}\Big(\sum_{i=1}^{b} P_i^{\wedge}\Big). \qquad (6)$$
Based on eq. (6), it is possible to design an algorithm that approximates the true loss by providing upper and lower bounds. When these bounds are close to each other, we may stop the algorithm, while if they are far from each other, we may continue by taking the union of Bi’s to get fewer and larger blocks.
We can also exclude some vectors, similarly to the above. Namely, let $v_i^*$ be the optimal vector for $X_{B_i \bullet\bullet}$, i.e. the vector in $P_i$ minimizing the loss. Then:
$$\mathcal{L}(X_{c^*}) \leq \mathcal{L}\Big(\sum_{i=1}^{b} v_i^*\Big). \qquad (7)$$
The upper bound $\mathcal{L}\big(\sum_{i=1}^{b} v_i^*\big)$ is tighter (smaller) than the upper bound $\mathcal{L}\big(\sum_{i=1}^{b} P_i^{\wedge}\big)$.
We can use the bound to get a divide-and-conquer approach detailed in Algorithm.
A.4 Computing the Gradient
We show, as examples, the gradient computations for the stabilizing-selection loss and for the disease loss. We also show that the relaxed optimization Problem 2 is convex in the first case, but not in the second.
Consider the stabilizing selection loss $\mathcal{L}(X_c) = \|X_c\|_2^2 = \sum_{k=1}^{T} X_{ck}^2$. In terms of the relaxed variables, the loss becomes:
$$\mathcal{L}(C) = \big\|(X \times_2 C)^\top 1_M\big\|_2^2 = \sum_{k=1}^{T} \Big(\sum_{i=1}^{M} \sum_{j=1}^{C} C_{ij} X_{ijk}\Big)^2. \qquad (8)$$
We next compute the gradient and show that the problem is convex:
Claim. The loss in eq. (8) is convex in C.
Proof. Denote the relaxed score vector $X_c = (X \times_2 C)^\top 1_M$. The partial derivatives are given by:
$$\frac{\partial \mathcal{L}}{\partial C_{ij}} = 2 \sum_{k=1}^{T} X_{ck} X_{ijk}.$$
Therefore, the gradient is, in matrix form:
$$\nabla_C \mathcal{L} = 2\, (X \times_3 X_c).$$
The Hessian elements are given by:
$$\frac{\partial^2 \mathcal{L}}{\partial C_{ij}\, \partial C_{i'j'}} = 2 \sum_{k=1}^{T} X_{ijk} X_{i'j'k}.$$
If we vectorize each matrix $X_{\bullet\bullet k}$ to get a vector $vec(X_{\bullet\bullet k}) \in \mathbb{R}^{MC}$, the Hessian can be written in matrix form as:
$$\nabla^2_C \mathcal{L} = 2 \sum_{k=1}^{T} vec(X_{\bullet\bullet k})\, vec(X_{\bullet\bullet k})^\top.$$
Hence the Hessian is a sum of positive semi-definite rank-one matrices, and is therefore positive semi-definite, so the loss is convex in C.
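The derivation can be checked numerically (a sketch under the assumption that the relaxed loss is the squared norm of the relaxed score vector):

```python
import numpy as np

rng = np.random.default_rng(7)
M, C, T = 4, 3, 2
X = rng.normal(size=(M, C, T))

def loss(W):                                   # ||X_c||^2 for the relaxed W
    return float(np.sum(np.einsum('ijk,ij->k', X, W) ** 2))

def grad(W):                                   # 2 * (X x_3 X_c)
    xc = np.einsum('ijk,ij->k', X, W)
    return 2.0 * np.einsum('ijk,k->ij', X, xc)

W = rng.random((M, C))
G = grad(W)
eps = 1e-6
for _ in range(5):                             # finite-difference gradient check
    i, j = rng.integers(M), rng.integers(C)
    E = np.zeros((M, C)); E[i, j] = eps
    fd = (loss(W + E) - loss(W - E)) / (2 * eps)
    assert abs(fd - G[i, j]) < 1e-4

# Hessian = 2 * sum_k vec(X..k) vec(X..k)^T is PSD, so the loss is convex.
V = X.reshape(M * C, T)                        # column k holds vec(X..k)
H = 2.0 * V @ V.T
assert np.min(np.linalg.eigvalsh(H)) > -1e-10
```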
Recall the disease loss $\mathcal{L}(X_c) = \sum_{k=1}^{T} w_k P(d_k = 1 \mid X_{ck})$, with the conditional disease probabilities given by the liability-threshold model, $P(d_k = 1 \mid X_{ck}) = \Phi\big((X_{ck} - z_{K_k})/\sigma_k\big)$, where $z_{K_k}$ is the liability threshold for disease k and $\sigma_k^2$ is the variance of the residual (non-score) liability. Taking the partial derivatives with respect to the relaxation variables yields:
$$\frac{\partial \mathcal{L}}{\partial C_{ij}} = \sum_{k=1}^{T} \frac{w_k}{\sigma_k}\, \phi\Big(\frac{X_{ck} - z_{K_k}}{\sigma_k}\Big) X_{ijk},$$
and the gradient is, in matrix form and using the tensor-by-vector product, $\nabla_C \mathcal{L} = X \times_3 v$, with $v_k = \frac{w_k}{\sigma_k} \phi\big((X_{ck} - z_{K_k})/\sigma_k\big)$.
We next compute the Hessian matrix, $\nabla_C^2 \mathcal{L} = \sum_{k=1}^{T} \alpha_k\, vec(X_{\bullet\bullet k})\, vec(X_{\bullet\bullet k})^\top$, where $\alpha_k = \frac{w_k}{\sigma_k^3} (z_{K_k} - X_{ck})\, \phi\big((X_{ck} - z_{K_k})/\sigma_k\big)$. We have $sign(\alpha_k) = sign(z_{K_k} - X_{ck})$; therefore the sign of $\alpha_k$ changes as we change $X_{ck}$, hence the loss is not convex in C.
A.5 A Closed-form Relaxation
For the stabilizing selection loss, it is possible to obtain a closed-form solution to the relaxed Problem 2 of minimizing a quadratic loss under linear constraints by adding Lagrange multipliers [12]. Define:
Then:
Taking , ∀j ∈ [M], we can represent the above in matrix form: where is defined by: .
We can stack the columns of C, and similarly stack the fibers of the fourth-order tensor A, to get the following problem: where mat(A) is a matrix in which each row contains the rows of the matrix A(i, j) concatenated, vec(C) is a vector obtained by stacking the columns of C, and E is a matrix encoding the equality constraints, given by $e_{ij} = 1$, ∀i ∈ [M], ∀j ∈ [C(i − 1) + 1, Ci], and $e_{ij} = 0$ otherwise.
The solution for the above problem is given via Lagrange multipliers as a solution of the linear system:
Since mat(A) is a sum of T matrices of rank 1, we have rank(mat(A)) ≤ T.
When T + 2M < (C + 1)M, the above system has an infinite subspace of solutions. When T + 2M ≥ (C + 1)M, there is typically a unique solution for C, obtained by solving the above system; the selection vector c is then recovered by reshaping the solution vector vec(C) into an M × C matrix (the mat operation, stacking M consecutive equally-sized sub-vectors as rows) and taking the index of the maximal entry in each row.
The closed-form solution in eqs. (22,23) can be used instead of Algorithm 2 for the stabilizing selection loss.
When T + 2M < (C + 1)M the inverse in eq. (22) can be replaced by the Moore-Penrose pseudo-inverse, yielding the minimal Euclidean-norm solution vec(C) which is rounded to get c.
Regularization
The relaxation in the previous section may yield outputs with many non-zero entries that are far from the vertices of the polytope of stochastic matrices. To obtain a solution closer to one of the vertices, we add a sparsity-promoting term to the optimization Problem 2. The standard L1 regularization often employed to promote sparsity is inappropriate here, since every row already satisfies $\sum_j C_{ij} = 1$, hence the sum of elements is constant. Several previous works have suggested algorithms for sparse projection and optimization over the simplex [24, 27]. Our space is a Cartesian product of multiple simplices, where each simplex represents a different row of the matrix C, and we employ a similar technique to promote sparsity. Specifically, we add to the optimization criterion of Problem 2 a negative quadratic loss term [27], $\eta \|C\|_F^2$, where η < 0 and $\|C\|_F^2$ is the squared Frobenius norm. This term promotes solutions with a high Frobenius norm, which are likely to be concentrated on a few entries. Incorporating the additional term in the optimization is straightforward. For example, the term −2ηC is added to the gradient of the loss in Algorithm 2. The closed-form solution for the regularized problem with the stabilizing-selection loss is obtained by simply replacing the term mat(A) with mat(A) − ηI_{MC} in eqs. (19–22). Similarly to ridge regression, the addition of the regularization term yields a unique solution even when the matrix on the left-hand side of eq. (21) is singular and the least-squares solution is not unique, as is the case whenever T + 2M < (C + 1)M.
B Quantitative Genetics
To put the abstract problem presented in the previous section in the context of current practice in embryo selection, we describe here a simple quantitative genetics model for embryo selection for multiple quantitative traits and diseases.
B.1 A Statistical Model for Chromosomal Selection
We describe here a statistical model for the joint distribution of the scores and the non-score components determining a phenotype, in a set of C genomes and for T complex traits. The model is related to and extends the models used in [21, 26] for multiple traits, with two main differences: first, we model explicitly the joint distribution of the individual chromosomes' scores, whereas in [21, 26] a model was given for the entire genomic score. Second, [21, 26] considered embryos derived from the same two parents, yielding a specific genetic relationship matrix $\Sigma^{(C)}$ representing the Identity-by-Descent sharing of siblings, while in our case the genetic relationship matrix may be more general, depending on the selection scenario.
We assume that the genetic architecture of the traits is infinitesimal, namely that there are numerous causal variants, uniformly distributed along the genome. Denote the matrix of quantitative trait values by $Z \in \mathbb{R}^{C \times T}$, where $Z_{ij}$ denotes the value of trait j for the i-th copy. We can decompose Z as follows:
$$Z = \sum_{i=1}^{M} X_{i\bullet\bullet} + Y + \varepsilon,$$
where the term Y represents the genetic components not accounted for by the scores X, and the term ε represents a matrix of environmental components, both having zero mean.
We assume that all the traits have mean zero and variance 1, and further that the individual chromosome scores also have zero mean.
We further assume that the distribution of the polygenic scores X is approximately Normal in each embryo (due to the polygenic nature of most complex traits [38]), and that the joint distribution of the polygenic scores over n embryos is multivariate Gaussian,
Consider T traits normalized to have zero mean and unit variance. Let be a matrix of polygenic scores for the C copies, obtained by summing the individual chromosome scores Xi••, and similarly let . The vector of polygenic scores for a single genomic copy for all traits has a covariance matrix Σ(X) under a Normal model: where contains the variance explained by the polygenic scores of each trait, and the off-diagonal elements of Σ(X) represent pleiotropic effects. For C full-genome copies we obtain a C × T matrix of polygenic scores with a matrix Normal distribution [13]:
The matrix Σ(C) represents (twice) the kinship coefficients between the C full-genome copies. For example, when the copies represent sibling embryos (as is the case for embryo selection), . We assume that the chromosome scores are independent, with the scores matrix of each chromosome having the matrix Normal distribution: where is the proportion of genetic variance explained by chromosome i. The genetic variances satisfy , and this proportion is assumed to be the same for all traits, a consequence of the infinitesimal model and provided that the relative density of causal variants across the genome is similar across traits.
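The matrix-normal structure of the chromosome scores can be checked numerically. The following Python sketch uses toy values for Σ(C), Σ(X), and the variance shares αi (not the paper's values): it draws per-chromosome score matrices Xi ~ MN(0, Σ(C), αiΣ(X)) via Cholesky factors and verifies that the trait covariance of the summed genome scores recovers Σ(X).

```python
import numpy as np

rng = np.random.default_rng(3)
C, T, M = 4, 2, 5
Sigma_C = 0.5 * np.eye(C) + 0.5                 # toy sibling-like relatedness matrix
Sigma_X = np.array([[0.3, 0.1], [0.1, 0.2]])    # toy per-trait score (co)variances
alpha = np.ones(M) / M                          # equal variance share per "chromosome"

A = np.linalg.cholesky(Sigma_C)
B = np.linalg.cholesky(Sigma_X)

def draw_scores(rng):
    """One C x T genome score matrix: sum over chromosomes of independent
    matrix-normal draws X_i ~ MN(0, Sigma_C, alpha_i * Sigma_X)."""
    return sum(np.sqrt(a) * (A @ rng.normal(size=(C, T)) @ B.T) for a in alpha)

S = np.stack([draw_scores(rng) for _ in range(50000)])
emp_cov = np.cov(S[:, 0, :].T)   # trait covariance for one genome copy, ~ Sigma_X
```

Since the chromosome scores are independent and the αi sum to one, the empirical covariance of a single copy's genome-wide scores should match Σ(X) up to Monte Carlo error.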
These contributions determine the utility of chromosomal selection, and are expected to be roughly proportional to the chromosomes' lengths or to their number of genes. Here, we show a numerical analysis with the (normalized) chromosome lengths as in [2], shown in Table 2. The actual coefficients may deviate from this rough estimate and vary from trait to trait, based on the distribution of causal alleles along the genome for each trait. Methods for partitioning heritability [10, 40] can be used to estimate these coefficients for specific traits, and in case of significant deviations, eq. (27) can be modified accordingly.
Similarly, the non-score genetic components are modeled as: where Σ(X) + Σ(Y) + Σ(e) = IT. The matrix Σ(X) + Σ(Y) is known as the genetic covariance matrix, and can be estimated from GWAS data using methods such as LD Score Regression [3]. Its diagonal elements are the narrow-sense heritabilities of the traits.
The matrix Σ(Y) + Σ(e) is the covariance matrix of the residuals, and determines the conditional distribution of the phenotype vector given the score vector. For simplicity, our model makes several standard assumptions: no shared environment (hence the identity IC is used as a covariance matrix for ε), and no assortative mating. Violations of these assumptions can be encoded in the covariance matrices of our model.
The expected gain
The gain defined in eq. (2) is a random variable, with a sample space over all theoretical sets of C copies. In the following, we will derive the approximate mean of the gain for linear loss functions , as a function of the loss parameters, and of C, Σ(C), and Σ(X).
For embryo selection with a linear loss, selection is performed on the vector of scores , with the joint distribution:
It was shown in [21] that . Moreover, and
Using extreme value theory for the above, we get as in [21] the approximate gain from embryo selection:
Next, we will compare this result to the gain obtained from chromosomal selection. For each individual chromosome i, the distribution of the scores vectors is
Since selection is performed for each block separately, and using again the asymptotic approximation from [21] for the covariance matrices of the individual chromosomes' scores, the gain can be written as:
Hence the expected gain due to chromosomal selection is roughly ∑i √αi-fold higher compared to the expected gain from embryo selection in eq. (32). For the αi values in Table 2, this gives a 4.68-fold difference between the gains.
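This ratio is easy to probe by simulation. The Python sketch below treats the C copies as independent for simplicity (ignoring the relatedness matrix Σ(C)) and uses arbitrary stand-in values for the variance shares αi rather than the lengths in Table 2; under these assumptions, the empirical ratio of mean gains approaches ∑i √αi.

```python
import numpy as np

rng = np.random.default_rng(0)
M, C, n_rep = 23, 10, 20000
alpha = np.arange(M, 0, -1, dtype=float)    # stand-in for chromosome length shares
alpha /= alpha.sum()

emb, chrom = np.empty(n_rep), np.empty(n_rep)
for r in range(n_rep):
    # independent chromosome scores: X_ic ~ N(0, alpha_i), for C copies
    X = rng.normal(0.0, np.sqrt(alpha)[:, None], size=(M, C))
    emb[r] = X.sum(axis=0).max()     # embryo selection: best whole genome
    chrom[r] = X.max(axis=1).sum()   # chromosomal selection: best copy per chromosome

ratio = chrom.mean() / emb.mean()    # should approach sum_i sqrt(alpha_i)
```

The comparison works because each chromosome's maximum scales with its standard deviation √αi, while the whole-genome maximum scales with the total standard deviation, here normalized to one.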
B.2 Disease Traits
Consider a disease with population prevalence K, and let X be the polygenic score with variance explained on the liability scale, using the liability threshold model: with z = X + ε. The polygenic scores X1, …, Xn can be thought of as liabilities, where the actual disease score is modeled as zi = Xi + εi, with the εi being random variables representing both the environmental contribution and unaccounted-for genetic effects. The resulting disease status of each individual is given by thresholding the zi's: Di = 1{zi < Φ−1(K)}.
We select the embryo with maximal score Xmax as in the quantitative trait example, and denote by imax the index of this embryo. As shown in [26], the risk for disease for the embryo with maximal polygenic score is given by a convolution: and the expected (absolute) gain for the single disease loss is:
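To make this concrete, here is a small Monte Carlo sketch in Python of the absolute gain for a single disease. The prevalence, variance explained, and number of embryos are toy values, and the embryos' scores are treated as independent for simplicity (the actual model includes the sibling covariance Σ(C)).

```python
import numpy as np
from statistics import NormalDist

rng = np.random.default_rng(1)
K, r2, C, n_rep = 0.05, 0.3, 5, 200000       # prevalence, score variance, embryos
thr = NormalDist().inv_cdf(K)                # liability threshold Phi^{-1}(K)

X = rng.normal(0.0, np.sqrt(r2), size=(n_rep, C))   # embryo polygenic scores
eps = rng.normal(0.0, np.sqrt(1 - r2), size=n_rep)  # residual liability of the pick
z = X.max(axis=1) + eps                      # liability of the max-score embryo
risk = (z < thr).mean()                      # disease iff z < Phi^{-1}(K)
gain = K - risk                              # expected absolute gain
```

Selecting the maximal-score embryo shifts the liability distribution upward, away from the disease region z < Φ−1(K), so the selected embryo's risk is below the prevalence K.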
Multiple Diseases
We consider screening for T multiple diseases, with the polygenic risk scores given in eq. (26), and with a prevalence vector K = (K1, …, KT). We need to define a loss function representing the trade-offs of reducing risk for multiple diseases - for example, the probability of being disease-free. The next section formalizes the goal of selection for multiple quantitative traits or diseases, and in addition presents the problem of chromosomal selection.
We next define the associated disease status and disease probabilities for chromosomal selection.
For a vector of prevalences K ∈ [0, 1]T and a residual covariance matrix Σ(Y) + Σ(e), let Y·1M + ε ~ N(0, Σ(Y) + Σ(e)). Then, the disease status is a vector random variable defined as: where the indicator function is taken element-wise.
The associated disease probability for a given binary vector d ∈ {0, 1}T is
The marginal disease probability for disease i and status j = 0, 1 is given by:
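Under this model, the marginal risk of disease i given a selected score vector x is Φ((Φ−1(Ki) − xi)/√(Σ(Y)+Σ(e))ii), since the residual liability of trait i is Normal with variance (Σ(Y)+Σ(e))ii. The following Python sketch (toy prevalences, residual covariance, and score vector) checks this closed form against a Monte Carlo estimate:

```python
import numpy as np
from statistics import NormalDist

nd = NormalDist()
rng = np.random.default_rng(4)
K = np.array([0.05, 0.10])                     # toy prevalences
R = np.array([[0.6, 0.1], [0.1, 0.7]])         # toy residual covariance Sigma(Y)+Sigma(e)
x = np.array([0.4, 0.2])                       # scores of a selected genome
thr = np.array([nd.inv_cdf(k) for k in K])     # per-disease liability thresholds

# closed-form marginal risk: P(D_i = 1 | x) = Phi((thr_i - x_i) / sqrt(R_ii))
analytic = np.array([nd.cdf((thr[i] - x[i]) / np.sqrt(R[i, i])) for i in range(2)])

res = rng.multivariate_normal(np.zeros(2), R, size=400000)
mc = ((x + res) < thr).mean(axis=0)            # Monte Carlo marginal risks
```

With positive scores x the marginal risks fall below the prevalences, mirroring the single-disease case.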
A chromosomal selection loss function is called a disease-loss function if there are vectors and a positive semi-definite matrix such that can be written as follows:
If the loss above can be written as follows for a vector : then the loss function is called a linear disease-loss function.
C Modified Selection Problems
We describe here a few additional scenarios that yield the chromosomal selection Problem 1 or variants of it.
C.1 Gamete Selection
Consider an intermediate case of gamete selection, where it is possible to select a sperm and an oocyte separately, as discussed in [2]. We consider T continuous phenotypes as in Section B.1. Consider Cp haploid sperm cells, and Cm haploid oocyte cells. Denote their scores matrices by
The Gamete selection problem is to select a single sperm cell ip ∈ [Cp] and a single oocyte im ∈ [Cm] minimizing the loss . See an illustration in Figure 3(a).
We can compute the expected gain for gamete selection with a linear loss, similarly to the derivations for embryo selection. First, similarly to eq. (26), we have where Σ(Cp), Σ(Cm) are the covariance matrices for the sperm and oocyte cells, respectively, with (twice) kinship coefficient of , and assuming that the covariance between the scores of any sperm and oocyte cell is zero. We also assume that the traits' variance matrices Σ(X) are equal, due to the symmetry of the maternal and paternal contributions to traits (we ignore here the contribution of the sex chromosomes).
With these matrices, we get for all ip ∈ [Cp], im ∈ [Cm]: and
Following the derivation for a single trait, we get:
For example, suppose that we have an equal number of sperm and oocyte cells Cp = Cm = C. Then the gain is ≈ , a -factor improvement over the gain from embryo selection with the same C shown in eq. (32).
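The improvement from independent gamete pools can be illustrated with a short simulation. The Python sketch below compares selecting the best sperm and best oocyte separately against selecting the best of C pre-formed pairs, under the simplifying assumption of independent, unit-variance gamete scores (which does not reproduce the sibling structure behind eq. (32)); under this toy setup the ratio of mean gains approaches √2.

```python
import numpy as np

rng = np.random.default_rng(2)
C, n_rep = 10, 100000
sp = rng.normal(size=(n_rep, C))   # sperm-cell scores (unit variance)
so = rng.normal(size=(n_rep, C))   # oocyte scores (unit variance)

gamete = sp.max(axis=1) + so.max(axis=1)   # select best sperm and best oocyte separately
embryo = (sp + so).max(axis=1)             # select best of C pre-formed pairs
ratio = gamete.mean() / embryo.mean()      # ~ sqrt(2) under these assumptions
```

Intuitively, separate selection captures the extreme of each pool (2σ·E[max]), while pair selection captures only the extreme of the sums (√2·σ·E[max]).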
C.2 Chromosomal Selection for Multiple Sperm Cells and Oocytes
Suppose that we face the scenario in the previous sub-section, except that it is possible to select different chromosomes from different sperm cells, and similarly different chromosomes from different oocytes (assuming the scores can be computed from the cells in a non-destructive manner). Then, we face a chromosomal selection problem similar to Problem 1, except that the number of available copies may differ between chromosomes (either Cp or Cm), as shown in Figure 3(b). When Cp = Cm, the problem reduces back to Problem 1.
C.3 Chromosomal Selection from Multiple Diploid Cells
Consider C/2 diploid cells (for even C), and suppose that we select for each chromosome two copies in an arbitrary manner for the fertilized embryo (for example, it may be possible to select both copies from the same diploid cell). Then, we face a chromosomal selection problem with M = 23, except that two copies, rather than one, are selected from the scores tensor X. Algorithms 1 and 2 can be adapted in a rather straightforward manner to handle this case, and their implementation and study remain for future work.
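For a single trait and a linear loss, the two-copy variant is simple: it amounts to taking the two largest scores per chromosome rather than the single largest. A minimal Python sketch with random toy scores (this is not an adaptation of Algorithms 1–2, which handle general losses):

```python
import numpy as np

rng = np.random.default_rng(5)
M, C2 = 23, 6                        # chromosomes, haploid copies per chromosome
X = rng.normal(size=(M, C2))         # single-trait chromosome scores

# pick the two best copies per chromosome (vs. the single best in Problem 1)
top2 = np.sort(X, axis=1)[:, -2:]
diploid_score = top2.sum()           # total score of the assembled diploid genome
```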
D Implementation Details
All algorithms and the simulation study are implemented as part of an R package called "EmbryoSelectionCalculator", available at https://github.com/orzuk/EmbryoSelectionCalculator, with the functions related to chromosomal selection located in the chrom sub-directory. To speed up computations, the code for finding Pareto-optimal vectors was implemented in C++ and linked using Rcpp [8]. To avoid combinatorial explosion of the Branch-and-Bound algorithm, a heuristic of passing only the top B vectors at each step whenever the number of partial Pareto-optimal vectors exceeds B was implemented as optional, and used with the default value of B = 10,000. An additional optional improvement to the relaxation Algorithm 2 is also implemented: instead of rounding the solution of the relaxed problem C(t+1) to the nearest vertex of the polytope of stochastic matrices, it is possible to draw the selection variables ci at random from a categorical distribution with values {1, …, C} and with probabilities given by the i-th row of C(t+1). One can draw independently multiple such vectors (default value: R = 1,000) and output the solution minimizing the loss among the resulting Xc score vectors. Additional details about the software implementation and usage are available in the package documentation.
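The randomized-rounding step can be sketched as follows. This Python sketch mirrors the description above but is not the package's R code; the function name and arguments are hypothetical.

```python
import numpy as np

def randomized_rounding(C_rel, X, loss, R=1000, rng=None):
    """Draw R selection vectors from the categorical distributions given by the
    rows of the relaxed stochastic matrix C_rel (M x C), and keep the one
    minimizing the loss of the assembled score vector. X is the M x C x T
    scores tensor; loss maps a T-vector to a scalar."""
    rng = rng or np.random.default_rng()
    M, C = C_rel.shape
    best_c, best_val = None, np.inf
    for _ in range(R):
        # chromosome i's copy is drawn from the categorical distribution in row i
        c = np.array([rng.choice(C, p=C_rel[i]) for i in range(M)])
        xc = X[np.arange(M), c].sum(axis=0)   # sum the selected chromosome scores
        val = loss(xc)
        if val < best_val:
            best_c, best_val = c, val
    return best_c, best_val
```

With enough draws, the sampled vertices concentrate around the high-probability rows of C_rel, so the best sampled selection is typically at least as good as naive nearest-vertex rounding.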