ABSTRACT
Background It is known that at least some fluorophores can act as ‘surrogate’ substrates for solute carriers (SLCs) involved in pharmaceutical drug uptake, and this promiscuity is taken to reflect at least a certain structural similarity. As part of a comprehensive study seeking the ‘natural’ substrates of ‘orphan’ transporters that also serve to take up pharmaceutical drugs into cells, we have noted that many drugs bear structural similarities to natural products. A cursory inspection of common fluorophores indicates that they too are surprisingly ‘drug-like’, and they also enter at least some cells. Some are also known to be substrates of efflux transporters. Consequently, we sought to assess the structural similarity of common fluorophores to marketed drugs, endogenous mammalian metabolites, and natural products. We used a set of some 150 fluorophores.
Results The great majority of fluorophores tested exhibited significant similarity (Tanimoto similarity > 0.75) to at least one drug as judged via descriptor properties (especially their aromaticity, for identifiable reasons that we explain), by molecular fingerprints, by visual inspection, and via the “quantitative estimate of drug likeness” technique. It is concluded that this set of fluorophores does overlap a significant part of both drug space and natural products space. Consequently, fluorophores do indeed offer a much wider opportunity than had possibly been realised to be used as surrogate uptake molecules in the competitive or trans-stimulation assay of membrane transporter activities.
INTRODUCTION
Fluorescence methods have been used in biological research for decades, and their utility remains unabated (e.g. [1–16]). Our specific interest here is in the transporter-mediated means by which small fluorescent molecules enter living cells, and our interest has been stimulated by the recognition that a given probe may be a substrate for a large variety of both influx and efflux transporters [17]. Efflux transporters are often fairly promiscuous, since their job is largely to rid cells of unwanted molecules that may have entered, although they can and do have other, important physiological roles (e.g. [18–28]), and they are capable of effluxing a variety of fluorescent probes (e.g. [29–36]). However, given that most of these probes are contemporary, synthetic molecules, the uptake transporters for which they are substrates must have evolved in nature for other purposes. These purposes may reasonably be expected to include the uptake of endogenous metabolites in multicellular organisms [37–40], as well as exogenous natural products whose uptake can enhance biological fitness (e.g. [41; 42]). This explanation does seem to hold well for synthetic, marketed pharmaceutical drugs [42].
Consequently, it seemed reasonable that existing fluorescent molecules, not specifically designed for the purpose but that are taken up by biological cells, might also bear structural similarities to endogenous substrates (metabolites) and to natural products, and potentially also to marketed drugs. If so, they might then serve as surrogate transporter substrates for them. Indeed, there are examples - so-called fluorescent false neurotransmitters - where such fluorescent analogues of natural substrates have been designed precisely for this purpose (e.g. [43; 44]). The aim of the present work was to assess the extent to which this kind of structural similarity between (i) common fluorophores used in biology and (ii) other molecular classes (endogenous mammalian metabolites, marketed pharmaceutical drugs, and known natural products) might be true. It is concluded that in structural terms common fluorophores do indeed overlap drug space significantly, and we offer an explanation based on the consonance between aromaticity, conjugated π-bonds, and fluorescence.
MATERIALS AND METHODS
Fluorophores were selected from the literature and by scanning various catalogues of fluorophores, and included well known cytochemical stains, food dyes, laser dyes and other fluorophores, including just a few marketed drugs plus fluorescent natural products. We chose only those whose structures were known publicly. The final set included 150 molecules. Supplementary Table 1 gives a spreadsheet of all the relevant data that we now discuss, including the marketed drugs, Recon2 metabolites [45] (both given also in ref 37) and a subset of 2000 natural products from UNPD (see [42; 46]).
Although there are a great many possible molecular encodings (whether using molecular fingerprints or vectors of calculated properties), each of which can give a different Tanimoto similarity, for our present purpose we chose to use only the Patterned encoding within RDKit (www.rdkit.org/). We also used the RDKit version of QED (https://www.rdkit.org/docs/source/rdkit.Chem.QED.html). Workflows were written in KNIME as per our standard methods [37-40; 42; 47-49]. t-SNE plots used the first 10 PCs (95.3% explained variance) as inputs based on 27 RDKit descriptors, and were otherwise as previously described [50].
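As an illustration of the similarity measure only (not of the KNIME workflow itself), the Tanimoto similarity between two binary fingerprints is the number of bits set in both divided by the number of bits set in either. The sketch below assumes that each fingerprint is represented as a Python set of 'on' bit positions; the function name `tanimoto` and the toy bit positions are hypothetical and do not correspond to real molecules. In the RDKit Python API the analogous calls should be `Chem.PatternFingerprint` and `DataStructs.TanimotoSimilarity`.

```python
def tanimoto(bits_a, bits_b):
    """Tanimoto (Jaccard) similarity between two sets of 'on' bit positions."""
    a, b = set(bits_a), set(bits_b)
    union = len(a | b)
    if union == 0:
        # Convention chosen for this sketch: two empty fingerprints score 0.
        return 0.0
    return len(a & b) / union


# Toy fingerprints (hypothetical bit positions, not real molecules).
fp_fluorophore = {1, 4, 7, 9, 12}
fp_drug = {1, 4, 7, 9, 15}

# Four shared bits out of six distinct bits in total.
ts = tanimoto(fp_fluorophore, fp_drug)
print(round(ts, 3))
```

Because the value depends entirely on which bits the encoding sets, the same pair of molecules can score quite differently under a different fingerprint, which is why a single encoding was fixed for all comparisons.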
RESULTS
Figure 1A gives a Principal Components Analysis (PCA) plot of the distribution of the four classes based on a series of descriptors in RDKit (www.rdkit.org/), while Fig 1B gives a t-SNE [51] plot of the same data. These clearly show a strong overlap between the rather limited set of fluorophores used and quite significant parts of drug space. Fig 1C gives just the fluorophores, with the nominal excitation maximum encoded in its colour. This suggests that even with just ~150 molecules we have achieved a reasonable coverage of the relevant ‘fluorophore space’, with no obvious bias, nor trend in excitation wavelengths.
Figure 1. Principal components and t-SNE plots of the principal components of the variance in calculated properties of the molecules used. A. The first two principal components of the variance in calculated properties of the four classes: fluorophores, drugs, metabolites and natural products. Molecules are as in Table S1, with drugs and metabolites being those given in [37]. A sampling of 2000 natural products from our download [42] of UNPD was used. Descriptors were z-score normalised and correlation filtered (threshold 0.98). B. t-SNE plot of the data in (A), using the same colour coding. C. Plot of the first two principal components of the variance of the fluorophores alone. The excitation wavelength is encoded in the colour of the markers. The size of the symbol encodes the molecular weight, indicating that much of the first PC is due to this (plus any other covarying properties).
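The descriptor pre-processing named in the caption (z-score normalisation followed by a correlation filter at 0.98) can be sketched in plain Python as follows. This is a minimal sketch under stated assumptions, not the KNIME nodes actually used: the function names, the toy descriptor values, and in particular the rule for which member of a highly correlated pair is dropped (here, the later column) are all hypothetical.

```python
from statistics import mean, pstdev


def zscore_columns(columns):
    """Scale each descriptor column to zero mean and unit (population) variance."""
    scaled = {}
    for name, values in columns.items():
        mu, sigma = mean(values), pstdev(values)
        # A constant column carries no information; map it to zeros.
        scaled[name] = [(v - mu) / sigma if sigma > 0 else 0.0 for v in values]
    return scaled


def pearson(x, y):
    """Pearson correlation of two columns that are already z-scored."""
    return sum(a * b for a, b in zip(x, y)) / len(x)


def correlation_filter(scaled, threshold=0.98):
    """Keep a column only if |r| with every already-kept column is <= threshold."""
    kept = []
    for name in scaled:
        if all(abs(pearson(scaled[name], scaled[k])) <= threshold for k in kept):
            kept.append(name)
    return kept


# Toy descriptor table (hypothetical values for four molecules).
descriptors = {
    "mol_wt": [180.0, 250.0, 320.0, 410.0],
    "heavy_atoms": [13.0, 18.0, 23.0, 30.0],  # almost collinear with mol_wt
    "logp": [1.2, -0.5, 3.1, 0.4],
}

scaled = zscore_columns(descriptors)
kept = correlation_filter(scaled)
print(kept)
```

Removing near-duplicate descriptors before PCA stops a single underlying property, such as molecular size, from being counted several times in the variance.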
We previously developed the use of rank order plots for summarising the relationships (in terms of Tanimoto similarities) between a candidate molecule or set of molecules and a set of targets in a library [37]. Fig 2 shows such a rank order plot, ranking for each fluorophore the most similar molecule in the set of endogenous Recon2 [37; 52] metabolites, the set of marketed drugs [37], and a random subset of 2000 of some 150,000 molecules taken [42; 49] from the Universal Natural Products Database (UNPD) [46]. This again shows very clearly that the majority of fluorophores chosen do look moderately similar (TS>0.75) to at least one drug (and even more so to representatives of the natural products database).
Figure 2. Ranked order of Tanimoto similarity for fluorophores vs marketed drugs, fluorophores vs Recon2 metabolites, and fluorophores vs a 2000-member sampling of UNPD. Each fluorophore was encoded using the RDKit ‘Patterned’ encoding, and its Tanimoto similarity was then calculated against each drug, metabolite or natural product sample. The highest value of TS for each fluorophore was recorded and those values ranked. Read from right to left.
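The rank order procedure described in this caption (take the best-matching library member for each fluorophore, then sort those best scores) can be sketched as below. It is an illustrative outline only: the set-based fingerprints, the molecule labels `F1` to `F3` and `D1` to `D3`, and the helper names are hypothetical, and the 0.75 cut-off is simply the threshold quoted in the text.

```python
def tanimoto(a, b):
    """Tanimoto similarity between two sets of 'on' bit positions."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def best_match_scores(queries, library):
    """For each query fingerprint, the highest similarity to any library member."""
    return {
        name: max(tanimoto(fp, ref) for ref in library.values())
        for name, fp in queries.items()
    }


def rank_order(scores):
    """(name, score) pairs sorted from most to least similar."""
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Toy fingerprints (hypothetical bit positions, not real molecules).
fluorophores = {"F1": {1, 2, 3, 4}, "F2": {1, 2, 8, 9}, "F3": {5, 6}}
drugs = {"D1": {1, 2, 3, 4, 5}, "D2": {8, 9}, "D3": {6, 7}}

ranked = rank_order(best_match_scores(fluorophores, drugs))
# Fraction of fluorophores whose closest drug exceeds the TS > 0.75 threshold.
fraction_above = sum(score > 0.75 for _, score in ranked) / len(ranked)

for name, score in ranked:
    print(name, round(score, 3))
```

Only the single nearest neighbour of each fluorophore contributes to the plot, so the curve summarises whether at least one similar library member exists rather than how many there are.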
It is also convenient [37] to display such data as a heat map [53], where a bicluster is used to cluster similar structures and the colour of the cell at the intersection encodes their Tanimoto similarity. Figure 3 shows such heatmaps for (A) fluorophores vs endogenous (Recon2 [45]) metabolites, (B) drugs, and (C) 2000 sampled natural products from UNPD. The data reflect those of Fig 2, and it is again clear that for each fluorophore there is almost always a drug or a natural product for which the average Tanimoto similarity is significantly greater than 0.7.
Figure 3. Heat maps illustrating the Tanimoto similarities (using the RDKit patterned encoding) between our selected fluorophores and (A) Recon2 metabolites, (B) Drugs, and (C) a subset of 2000 natural products from UNPD.
While it is rather arbitrary, to say the least (given how the Tanimoto similarity varies with the encoding used), as to whether a particular chemical structure is seen by humans as ‘similar’ to another, we provide some illustrations that give a feeling for the kinds of similarity that may be observed.
Thus (Fig 4A) we illustrate the drugs closest to fluorescein in t-SNE space (as per Figure 1B), since fluorescein is a very common fluorophore, is also widely used in ophthalmology (e.g. [54; 55]), and can enter cells via a variety of transporters [56] such as monocarboxylate transporters (SLC16A1, SLC16A4) [57], SLCO1B1/3B1 [58; 59] and SLC22A20 [60] (see also Table 1).
Figure 4. Observable structural similarities between selected fluorophores and drugs. The chosen molecules are (A) fluorescein, (B) dapoxyl (both fluorophores) and (C) nitisinone (a drug). Data are annotated and/or zoomed from those in Fig 1B.
Fluorescein is similar in t-SNE space (Fig 4A) to a variety of drugs. This similarity is not at all related to the class of drug, however, as close ones include balsalazide (an anti-inflammatory used in inflammatory bowel disease [61]), bentiromide (a peptide used for assessing pancreatic function [62]), butenafine (a topical antifungal [63]), sertindole (an atypical antipsychotic), and tolvaptan (used in autosomal dominant polycystic kidney disease [64]). Similar remarks may be made of dapoxyl (Fig 4B). Note, of course, that the t-SNE plots are based on property descriptors, while the Tanimoto distances are based on a particular form of molecular fingerprint, so a priori we do not necessarily expect the closest molecules to be the same in the two cases. In addition, we note that molecules with different scaffolds may be quite similar; in the cheminformatics literature this is known as ‘scaffold hopping’ (e.g. [65–70]).
For a drug, we picked nitisinone, which is active against hereditary tyrosinaemia type I [71] and alkaptonuria [72; 73], as it is surrounded in t-SNE space (Fig 4C) by several tricyclic fluorophores that do indeed share similar structures.
Bickerton and colleagues [74] introduced the concept of the quantitative estimate of drug-likeness (QED) (but see [75]), and it is of interest to see how ‘drug-like’ our four classes of molecule are by their criteria. Fig 5A shows the distribution of QED drug-likenesses for marketed drugs, for Recon2 metabolites, for our selected fluorophores, and for a sample of 2000 molecules from UNPD. Our fluorophores are noticeably more similar to drugs than are endogenous metabolites, and roughly as similar to drugs as are natural products (Fig 5A).
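For orientation, the QED of [74] is, as we understand it, a geometric mean of per-property desirability scores. In the sketch below, $d_i \in (0, 1]$ is the desirability of the $i$-th molecular property (eight in the original formulation: molecular weight, logP, numbers of hydrogen-bond donors and acceptors, polar surface area, rotatable bonds, aromatic rings, and structural alerts) and $w_i$ is an optional weight; the symbols are chosen here for illustration.

```latex
\mathrm{QED} = \exp\!\left( \frac{1}{n} \sum_{i=1}^{n} \ln d_i \right),
\qquad
\mathrm{QED}_w = \exp\!\left( \frac{\sum_{i=1}^{n} w_i \ln d_i}{\sum_{i=1}^{n} w_i} \right)
```

Because the mean is geometric, a single very undesirable property (small $d_i$) pulls the overall score down sharply, so a QED near 1 requires all properties to lie in drug-like ranges at once.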
Figure 5. Distribution of quantitative estimate of drug-likeness (QED) values in different classes of molecule. A. Cumulative distributions for the four classes. B. Relationship between QED and aromaticity for the four classes as encoded by the fraction of C atoms exhibiting sp3 bonding. QED values were calculated using the RDKit Python code as described in Methods and plotted in (A) using ggplot2 and in (B) using Spotfire. C. Density distribution of fraction of C atoms with sp3 bonding. D. Histogram of distributions of numbers of aromatic rings in the four given classes.
Given that essentially all drugs are similar to at least one natural product [42], this is entirely consistent with our thesis that most fluorophores do look rather like one or more marketed drugs. One aspect in which (a) drugs and fluorophores differ noticeably from (b) metabolites and natural products is the extent to which they exhibit aromaticity, here encoded (Fig 5B, on the abscissa) via the fraction of carbon atoms showing sp3 hybridisation (i.e. non-aromatic). This is shown as a distribution in Fig 5C. There is clearly a significant tendency for drugs to include (planar) aromatic rings, and although this is changing somewhat [76–80] there are strong thermodynamic reasons as to why this should be so (see Discussion). The modal number of aromatic rings for both drugs and fluorophores is two, significantly greater than that (zero) for metabolites and for natural products (Fig 5D). One reason for fluorophores to exhibit aromaticity is simple, as reasonable visible-wavelength fluorescence in organic molecules relies strongly on conjugation (e.g. [81]), to which aromatic rings can contribute strongly. This argument alone probably accounts in large measure for the drug-likeness of fluorophores.
Finally, a very recent, principled, and effective clustering method [82; 83], representing the state of the art, is that based on the Uniform Manifold Approximation and Projection (UMAP) algorithm. In a similar vein, and based on the same descriptors as used in the t-SNE plots, we show the clustering of our four classes of molecule in UMAP space (Fig 6), where most clusters containing drugs also contain fluorophores. Despite being based on property descriptors, the UMAP algorithm is clearly very effective at clustering molecules into structurally related classes.
DISCUSSION
The basis of the main idea presented here is that the structures of common fluorophores are sufficiently similar to those of many drugs as to provide suitable surrogates for assessing their uptake via solute carriers of the SLC (and indeed their efflux via ABC) families. While the latter transporters are well known to be rather promiscuous, and to transport a variety of fluorophores [34; 36; 84-86], considerably less attention has been paid to the former. Of course some marketed pharmaceutical drugs that are transported into cells are in fact naturally fluorescent, including molecules such as anthracyclines [87–89], mepacrine (atebrin, quinacrine) [90], obatoclax [91; 92], tetracycline derivatives [88; 93] and topotecan [94]. The same is true of certain vitamins such as riboflavin [95; 96] (that necessarily have transporters, as cells cannot synthesise them), as well as certain bioactive natural products (e.g. [97–99]). As an illustration, and as a complement to our detailed gene knockout studies [17], Table 1 gives an indication of dyes whose interaction with specific transporters has been demonstrated directly. In some cases, their surrogacy as a substrate for a transporter with a known non-fluorescent substrate is clear, and as mentioned in the introduction they are sometimes referred to as ‘false fluorescent substrates’. Overall, while not intended to be remotely exhaustive, this Table does serve to indicate the potentially widespread activity of transporters as mediators of fluorophore uptake, and indeed a number of such transporters are known to be rather promiscuous.
Structural similarity (or the assessment of properties based simply on analyzing structures) is an elusive concept (e.g. [125]), but as judged by a standard encoding (RDKit patterned) there is considerable similarity in structure between almost all of our chosen fluorophores and at least one drug, whether this is judged by their descriptor- or fingerprint-based properties (Figs 1-3), by observation (Figs 4, 6), or (Fig 5) via the QED [74] measure.
Figure 6. UMAP projection into two dimensions of the four classes of molecules, annotated by the type of molecular structure in the various clusters.
Although there is a move to phenotypic screening [126–129], many drugs were developed on the basis of their ability to bind potently in vitro to a target of interest. If the unbound molecule is conformationally very flexible, and the bound version is not, binding necessarily involves a significant loss of entropy. Potent binding (involving a significant loss in free energy) of such a molecule would thus require a very large enthalpic term. Consequently, it is much easier to find potent binders if the binding can involve flat (which implies aromatic), conformationally inflexible planar structures. Such reasoning presumably reflects the observation (Fig 5B) that drugs tend to have a low sp3 character, typically with a number of aromatic rings. Conjugated aromatic rings are also a major (physical and electronic) structure that allow fluorescence from organic molecules [130–133], with greater π-bond conjugation moving both absorbance and fluorescence toward the red end of the spectrum. Overall, these two separate roles for aromatic residues, in low entropy of binding and in electronic structure, provide a plausible explanation for much of the drug-likeness of common fluorophores.
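The entropic argument can be made explicit with the standard thermodynamic relations for binding. The symbols below are introduced here for illustration: $K_{\mathrm{d}}$ is the dissociation constant (relative to the standard concentration), and the subscript "bind" denotes the change on going from the free to the bound state.

```latex
\Delta G_{\mathrm{bind}} = \Delta H_{\mathrm{bind}} - T\,\Delta S_{\mathrm{bind}},
\qquad
\Delta G_{\mathrm{bind}} = RT \ln K_{\mathrm{d}}
```

Potent binding (small $K_{\mathrm{d}}$) requires a strongly negative $\Delta G_{\mathrm{bind}}$. A flexible ligand that becomes rigid on binding has $\Delta S_{\mathrm{bind}} < 0$, so the term $-T\,\Delta S_{\mathrm{bind}}$ is positive and must be offset by a correspondingly more negative $\Delta H_{\mathrm{bind}}$. A ligand that is already rigid and planar pays a much smaller conformational entropy penalty, so the same affinity can be reached with a smaller enthalpic contribution.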
While this study used a comparatively small set of fluorophores, increasing their number can only increase the likelihood of finding a drug (or natural product) to which they are seen to be similar. This said, this set of molecules provides an excellent starting point for the development of competitive high-throughput assays of drug transporter activity.
CONCLUSIONS
An analysis of some 150 fluorophores in common usage in biological research has shown that a very great many of them bear significant structural similarities to marketed drugs (and to natural products). This similarity holds true whether the analysis is done using structures encoded as fingerprints or via physico-chemical descriptors, by visual inspection, or via the quantitative estimate of drug likeness measure. For any given drug there is thus likely to be a fluorophore or set of fluorophores that is best suited to competing with it for uptake, and thus for determining by fluorimetric methods the QSAR for the relevant transporters. This should provide the means for rapid and convenient competitive and trans-stimulation assays for screening the ability of drugs to enter cells via SLCs.
SUPPLEMENTARY MATERIALS
The following supplementary materials are available online in the file fluorophoresSI.xlsx. Table S1: The list of all the molecules and properties used in the present analysis.
DATA AVAILABILITY
The dataset generated from (or analyzed in) the study can be found in the Supplementary Excel sheet entitled fluorophoresSI.xlsx.
AUTHOR CONTRIBUTIONS
DBK developed the idea of determining fluorophore-drug similarity. SO’H wrote and ran the majority of the workflows and created ~two-thirds of the visualisations. Both authors contributed to the analysis of the data and to the writing of the paper.
CONFLICTS OF INTEREST
The authors declare that there is no conflict of interest.
FUNDING
This research was funded by the UK BBSRC (grant BB/P009042/1).
How to cite this article
O’Hagan S, Kell DB. Structural similarities between some common fluorophores used in biology and marketed drugs, endogenous metabolites, and natural products. Pharm Front. 2019, submitted for review.
ABBREVIATIONS
- PCA: Principal Components Analysis
- QED: Quantitative Estimate of Drug-likeness
- QSAR: Quantitative Structure-Activity Relationship
- SLC: solute carrier
- TS: Tanimoto similarity
- UNPD: Universal Natural Products Database
REFERENCES
- [1].
- [2].
- [3].
- [4].
- [5].
- [6].
- [7].
- [8].
- [9].
- [10].
- [11].
- [12].
- [13].
- [14].
- [15].
- [16].
- [17].
- [18].
- [19].
- [20].
- [21].
- [22].
- [23].
- [24].
- [25].
- [26].
- [27].
- [28].
- [29].
- [30].
- [31].
- [32].
- [33].
- [34].
- [35].
- [36].
- [37].
- [38].
- [39].
- [40].
- [41].
- [42].
- [43].
- [44].
- [45].
- [46].
- [47].
- [48].
- [49].
- [50].
- [51].
- [52].
- [53].
- [54].
- [55].
- [56].
- [57].
- [58].
- [59].
- [60].
- [61].
- [62].
- [63].
- [64].
- [65].
- [66].
- [67].
- [68].
- [69].
- [70].
- [71].
- [72].
- [73].
- [74].
- [75].
- [76].
- [77].
- [78].
- [79].
- [80].
- [81].
- [82].
- [83].
- [84].
- [85].
- [86].
- [87].
- [88].
- [89].
- [90].
- [91].
- [92].
- [93].
- [94].
- [95].
- [96].
- [97].
- [98].
- [99].
- [100].
- [101].
- [102].
- [103].
- [104].
- [105].
- [106].
- [107].
- [108].
- [109].
- [110].
- [111].
- [112].
- [113].
- [114].
- [115].
- [116].
- [117].
- [118].
- [119].
- [120].
- [121].
- [122].
- [123].
- [124].
- [125].
- [126].
- [127].
- [128].
- [129].
- [130].
- [131].
- [132].
- [133].