Introduction

Natural products and their derivatives continue to play an important role as a source of pharmaceuticals and biologically active substances. Some synthetic drugs were inspired by natural product scaffolds, and natural products remain undoubtedly useful for discovering pharmaceutical leads.1, 2, 3, 4, 5 However, following the great strides made in high-throughput screening (HTS) and synthetic compound libraries, some pharmaceutical companies have abandoned natural product screening because it requires special know-how and because identifying active compounds is slow and laborious.6 To make natural product screening feasible despite these drawbacks, it is essential to develop an assay system optimized for a well-designed natural product library and to use advanced analytical techniques for dereplication.

Screening procedures that explore natural products as a source of leads fall roughly into two classes. In the first, previously purified natural compounds are screened. In the second, crude extracts are used in a primary screen and, once activity has been detected, the active principle is isolated by activity-guided purification. For the former, many commercial libraries of purified natural products can be purchased, bringing screening quality to the same level as that of a synthetic compound library. However, commercially available natural products do not cover the broad diversity and wide chemical space that are sought from natural products. Alternatively, building a library by purifying compounds in-house would require thousands of man-hours to reach sufficient size and diversity.

On the other hand, because each crude extract contains many compounds, constructing a crude extract library is less laborious than making a purified compound library, and its diversity may be comparable with that of a large systematic compound library. However, a sophisticated and robust screening system with high specificity and sensitivity is needed to detect a target compound within complex mixtures.7, 8 In addition, a dereplication strategy is required to eliminate known and redundant compounds, to indicate which peaks should be purified, and to accumulate knowledge for structure determination. Further, if natural product screening runs in parallel with pure compound HTS, isolation and structure elucidation must be fast enough to keep pace.

The word ‘dereplication’ was used in the first CRC Handbook of Antibiotic Compounds, published in 1980, to refer to recognizing and eliminating already-studied active substances at an early stage of the screening process.9, 10 It has since been regarded as the stage that follows preliminary screening in the discovery of new pharmacologically active substances from natural product mixtures. A systematic dereplication method for crude extract samples that can be combined with HTS has been developed using LC-UV-(MS).11, 12 Dereplication has also been combined with microbiological studies, such as identification of producing microbes by morphology and gene analysis, and sometimes with various informatics techniques.13 Today the term dereplication is used in a broader sense in the screening of natural product mixtures, as shown in Figure 1. In this manuscript, we discuss dereplication and its related analytical techniques in the following five sections.

  1. Direct detection from microbial colonies
  2. UHPLC-MS profiling for library construction
  3. Micro-fractionation for active peak identification
  4. Quantification of small-amount compounds
  5. Structure identification from small amounts

Figure 1

Dereplication process in the natural products screening. A full color version of this figure is available at The Journal of Antibiotics journal online.

In this review, the dereplication process and related technologies are summarized from the perspective of analytical chemistry.

Direct detection from microbial colonies

If mycotoxins or other nuisance compounds can be detected directly from the surface of a newly obtained microbial colony, unnecessary culturing can be avoided. Conversely, a colony that shows a previously unseen spectrum may prove useful. MS is an effective way to analyze natural product samples directly. Recent developments in ambient MS technology allow an untreated sample to be analyzed directly at atmospheric pressure.14 This removes the need for LC separation and allows analysis to be completed in an extremely short time.

Desorption electrospray ionization (DESI) is an ambient ionization technique in which a high-voltage electrospray source produces a charged liquid mist. The mist is sprayed onto the sample surface, so that surface molecules are ionized at atmospheric pressure before being drawn into the MS analyzer.14, 15 A modified DESI method, nano-DESI, uses two capillaries at the ion source. The first is connected to a nano-LC pump and held at high voltage, and the second is connected to the MS inlet. The junction between the two capillaries, termed the liquid bridge, is placed slightly above the surface of the microbial colony.16 Metabolites are taken up from a small area of the colony and their ions are detected. Watrous et al.17 analyzed live colonies of Bacillus subtilis, Streptomyces coelicolor, Mycobacterium smegmatis and Pseudomonas aeruginosa with a nano-DESI-MS/MS instrument and monitored their metabolic profiles. They used these data for microbial molecular networking and to observe interactions between live colonies.

Nano-DESI can be difficult to use routinely in microbiology laboratories because its capillaries tend to clog. To analyze microbial colonies directly, Hsu et al.18 employed a liquid microjunction surface sampling probe (LMJ-SSP). This is a more robust direct ionization method based on electrospray ionization (ESI) that uses a sampling probe with coaxial tube geometry.19 With this method they analyzed yeasts, filamentous fungi, pathogens and marine-derived bacteria collected worldwide.

Liquid extraction surface analysis, an automated chip-based nanoelectrospray ionization technique, has been used for the direct analysis of live bacterial colonies on culture media. Kai et al.20 demonstrated that antibiotics such as thiazolyl peptides could be correctly identified by combining liquid extraction surface analysis with Orbitrap MS.

Another atmospheric pressure ionization method is direct analysis in real time (DART), a plasma-based technique. A gas (N2, Ne or He) is electrically excited, and the excited gas interacts directly with the sample surface and ionizes the analyte molecules. DART ion sources have been commercialized by several companies and can be coupled to a number of mass detectors. Direct analysis of microbial colonies by DART-MS has not yet been reported. Zachariasova et al.21 attempted to analyze mycotoxin contaminants directly in beer, but DART-MS appears to be less sensitive for mycotoxins than ultra high performance liquid chromatography (UHPLC)-MS. Vaclavik et al.22 analyzed 24 different mycotoxins spiked into grain using DART-Orbitrap-MS. They found that zearalenone and deoxynivalenol, produced by Fusarium species, could be analyzed quantitatively, whereas ions of T-2 toxin, HT-2 toxin and ochratoxin A could not be observed.

Matrix-assisted laser desorption/ionization (MALDI) time-of-flight (TOF) MS is expected to be used for the same purposes, and the use of MS imaging is also expanding.23, 24

At present, there are only a few reports on directly detecting metabolites from a microbial colony by ambient MS technology, but in the future this technology could be utilized as a means of identifying substances quickly. The time may come when MS detectors installed beside a clean bench can be used to detect and isolate microorganisms in a culture.

UHPLC-MS profiling for library construction

When screening libraries are constructed from microbial culture extracts, many strains are cultured in small bottles or flasks and extracted with an appropriate organic solvent. The extracts are kept in storage tubes, dispensed into 96-well microtiter plates and from there into 384- and/or 1536-well plates. At this point, unlike a synthetic compound library, it is not known which compounds the extract library contains or what the mixture profiles look like. If such a library is screened in the same way as a synthetic compound library, the same well-known compounds will keep appearing at the top of the hit lists. Moreover, if randomly chosen samples are analyzed by HPLC or LC-MS, a considerable number of crude extracts would be found to contain almost no lead-like compounds.

We therefore concluded that samples should be selected by high-throughput analysis before an extract screening library is constructed. This can be done efficiently with HPLC coupled to MS and photo-diode array detectors, which can detect and identify many substances in a single injection and profile the whole mixture. UHPLC was commercialized in 2004 and is now widely used.25 It uses columns packed with chemically modified silica particles (C18, C8, phenyl, etc.) less than 2 μm in diameter. Because the same detectors can be coupled to UHPLC, large amounts of data can be acquired quickly, and reported applications are increasing.26, 27, 28, 29, 30, 31 In particular, UHPLC achieves higher resolution in a shorter time than HPLC, making it possible to analyze many samples.

Various detectors can be connected to UHPLC or HPLC, but none is completely comprehensive. The common ionization methods for LC-MS are ESI, atmospheric pressure chemical ionization and atmospheric pressure photoionization. Each can be operated in positive and negative ion modes, and whether a given compound is detected in both modes or in only one depends on the compound.32 Acquiring both positive and negative ESI data therefore widens the range of detectable compounds. Many quadrupole-type low-resolution (LR)-MS instruments can acquire both polarities in one run by rapidly switching polarity. Most high-resolution (HR)-MS instruments, such as TOF and Orbitrap, cannot switch polarity rapidly, so samples must be analyzed twice to obtain data in both polarities. Metabolomics technology can be applied to analyze the data, and multivariate analysis is a well-established procedure for LC-MS data.33, 34, 35, 36 Cluster analysis of LC-MS data with various freeware and commercial software packages has also been reported.
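
This is a minimal sketch of one possible cluster analysis. It is not the software referred to above. It assumes each sample has been reduced to a set of peak features, here hypothetical (ion mode, nominal m/z, rounded RT) tuples. Samples are grouped by Jaccard similarity with single-linkage clustering cut at a fixed threshold, which reduces to finding connected components. The threshold of 0.5 is an arbitrary illustrative choice.

```python
from itertools import combinations


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity of two feature sets (1.0 = identical, 0.0 = disjoint)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def single_linkage_clusters(profiles: dict, threshold: float) -> list:
    """Group samples whose pairwise Jaccard similarity is >= threshold.

    Cutting a single-linkage tree at a fixed similarity gives the same groups
    as the connected components of the "similar enough" graph, so a small
    union-find structure is sufficient here.
    """
    parent = {name: name for name in profiles}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in combinations(profiles, 2):
        if jaccard(profiles[a], profiles[b]) >= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for name in profiles:
        groups.setdefault(find(name), []).append(name)
    return sorted(sorted(g) for g in groups.values())


# Hypothetical features: (ion mode, nominal m/z, RT rounded to 0.1 min)
profiles = {
    "S1": {("+", 301, 1.2), ("+", 455, 2.0), ("-", 299, 1.2)},
    "S2": {("+", 301, 1.2), ("+", 455, 2.0)},
    "S3": {("+", 812, 2.7), ("-", 810, 2.7)},
}
print(single_linkage_clusters(profiles, threshold=0.5))
```

Single linkage can chain dissimilar samples through intermediates. Average or complete linkage avoids that at the cost of a full distance matrix.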

We have developed a high-throughput profiling method with a cycle time of 3.1 min, using an instrument that combines a single quadrupole MS with UHPLC.37 Because data processing takes much time, we used LR-MS, although HR-MS would allow more accurate identification. Analyses were carried out on a C18 reverse phase column. Butanol extracts of liquid culture broths and methanol extracts of solid cultures were analyzed in both positive and negative ion modes. Peaks were picked from the raw data with ACD MS Manager with IntelliXtract (the latest version is ACD Spectrus MS workbook suite V2012). The peak identifier then keyed each peak by the combination of its nominal m/z and retention time (RT, min) and counted the occurrences of each peak across all 16 000 samples. The component profiles of the crude extracts were visualized in TIBCO Spotfire software as a scatter plot (Figure 2). The plot shows which substances each sample contains, their MWs, and how frequently each substance occurs in the library. The combination of MS and RT data can help to identify some substances.
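
The peak-identifier step can be sketched as follows. This is not the ACD or Spotfire implementation. It assumes each sample is a list of picked peaks (m/z, RT, polarity) and keys each peak by nominal m/z and RT binned to a hypothetical 0.05 min grid. It then counts how many samples contain each key, so that peaks common to a large share of the library stand out.

```python
from collections import Counter


def peak_key(mz, rt_min, polarity, rt_bin=0.05):
    """Key a peak by polarity, nominal m/z and binned retention time.

    Nominal m/z is the measured m/z rounded to an integer, as a
    low-resolution single quadrupole would report it. RT is snapped to a grid
    of width rt_bin minutes so that small run-to-run shifts map to one key
    (peaks lying near a bin edge can still split; a tolerance search avoids that).
    """
    return (polarity, round(mz), round(rt_min / rt_bin))


def occurrence_counts(samples, rt_bin=0.05):
    """Count in how many samples each peak key occurs (at most once per sample)."""
    counts = Counter()
    for peaks in samples.values():
        counts.update({peak_key(mz, rt, pol, rt_bin) for mz, rt, pol in peaks})
    return counts


def frequent_peaks(counts, n_samples, min_fraction):
    """Peaks present in at least min_fraction of all samples."""
    return [key for key, c in counts.items() if c / n_samples >= min_fraction]


# Hypothetical peak lists: (m/z, RT in min, polarity)
samples = {
    "A": [(301.1, 1.21, "+"), (455.3, 2.02, "+")],
    "B": [(301.2, 1.22, "+")],
    "C": [(301.0, 1.20, "+"), (612.4, 2.55, "-")],
}
counts = occurrence_counts(samples)
print(frequent_peaks(counts, len(samples), min_fraction=0.5))
```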

Figure 2

UHPLC-MS profiling of microbial extracts. The horizontal axis is retention time (min) and the vertical axis is m/z. Each cell shows one sample. Circles indicate positive ions and squares indicate negative ions. Symbol size represents peak intensity, and color indicates frequency of occurrence across the library. A full color version of this figure is available at The Journal of Antibiotics journal online.

Kato et al.38 established another database (DB), called NP Plot, that combines MS and RT data for managing fraction libraries. It uses not only MS data but also UV spectra to identify novel metabolites. Combining RT and MS has also been applied to the metabolite profiling of medicinal plants.39 Bertrand et al.29 carried out profiling using UHPLC-TOF-MS. They tested various solvents for extracting solid cultures and selected dichloromethane:methanol:water (64:36:8), which dissolves both lipophilic and hydrophilic compounds and is therefore suitable for profiling diverse microbial metabolites. The LC-MS data were first converted to netCDF, a common data format, then to peak lists with MZmine software, and finally to a peak table. With this method they analyzed changes in metabolites when fungi were co-cultured.

When UHPLC-MS was first introduced, MS scan speeds could not keep up with the high resolution of UHPLC, so UHPLC resolution was deliberately lowered. MS acquisition is now fast enough provided the positive and negative ion modes are analyzed separately. The rate-limiting step has therefore become data processing, because analyzing many samples rapidly at high resolution generates large volumes of data. This bottleneck is likely to ease as informatics technology continues to develop rapidly. As the number of library samples grows, so does the time from isolating a microorganism to profiling its extracts, so a server system that processes analysis data automatically is desirable.

Micro-fractionation for active peak identification

Active peaks cannot be identified merely by inspecting LC-UV-MS or LC-UV data. On-line systems that detect biological activity directly downstream of the HPLC have been reported,40, 41 but the range of assays they can accommodate is limited. To identify an active component, the mixture must be fractionated. A microscale fraction collector is connected to the LC-UV-MS system, and the collected fractions are subjected to various assays to reveal which peak is the active component (Figure 3). Hook et al.,11, 12 researchers at Bristol Myers Company, described their micro-fractionation process. They connected a fraction collector to HPLC-UV or HPLC-UV-MS, collected fractions in microtiter plates and subjected them to HTS using robotic systems. This procedure has been widely used42, 43, 44 and is called dereplication in the narrow sense of the word. Wagenaar45 also reported a method of constructing a fraction library in advance of screening.
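
Once fractions have been assayed, linking activity to peaks is a matter of timing. The sketch below assumes fixed-width wells collected from a known start time, with RTs already corrected for any detector-to-collector delay. The function names, well width and activity threshold are hypothetical.

```python
import math


def fraction_index(rt_min, start_min, width_min):
    """Index of the well that received eluate at rt_min, or None if before collection."""
    if rt_min < start_min:
        return None
    return math.floor((rt_min - start_min) / width_min)


def active_peaks(peaks, activity, start_min, width_min, threshold):
    """Names of peaks that eluted into wells whose assay value >= threshold.

    peaks: list of (name, RT in min) from the UV or MS chromatogram.
    activity: per-well assay values (e.g. % inhibition) in collection order.
    Assumes RTs are already corrected for the detector-to-collector delay.
    """
    hits = []
    for name, rt in peaks:
        idx = fraction_index(rt, start_min, width_min)
        if idx is not None and idx < len(activity) and activity[idx] >= threshold:
            hits.append(name)
    return hits


# Hypothetical run: wells of 0.25 min starting at 0.5 min
activity = [2, 5, 80, 95, 10, 3]
peaks = [("P1", 0.62), ("P2", 1.10), ("P3", 1.31), ("P4", 1.90)]
print(active_peaks(peaks, activity, start_min=0.5, width_min=0.25, threshold=50))
```

When an active well contains several peaks, finer fractions or a second orthogonal separation are needed to assign the activity.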

Figure 3

Micro-fractionation (dereplication) to identify active peaks. (a) Schematic diagram of LC-MS for dereplication. (b) View of the in-house DB (NASDAS) for natural products screening. (c) Total ion chromatogram of mixture sample by MS. (d) Activity of fractions. (e) UV spectrum of an active peak. (f) A mass spectrum (ESI+). The active compound was identified as cephalochromine from the mass and UV spectra. A full color version of this figure is available at The Journal of Antibiotics journal online.

Among the HPLC columns currently used for micro-fractionation, the C18 reverse phase column is the most versatile. For special analyses, HILIC,46 CN, amino and C4 columns are used, but they are not the first choice for unknown samples. Considering the amount of material to be collected, a column 3.0 or 4.6 mm in diameter (flow rate 1–3 ml min−1) is recommended, and a column length of 50–150 mm is convenient. As an example of fractionation on a larger column, Bohni et al.47 used a 10 × 250 mm column to obtain fractions for in vivo evaluation in zebrafish. Connecting a fraction collector and an MS at the same time requires a splitter and an optimized splitting ratio. Solvent is mainly removed from fractions with a centrifugal evaporator, but this has two major drawbacks. First, the solvents used in reverse phase HPLC are water rich, which makes this step time-consuming and a bottleneck of the micro-fraction assay. Second, unstable compounds may easily degrade during evaporation. To avoid the evaporation step, a trap column (solid phase extraction cartridge) can be used to concentrate each fraction, as in the LC-SPE-NMR technique.44 Unstable compounds might be captured intact by this method.
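
Choosing a splitting ratio involves simple arithmetic. The share of flow sent to the MS sets the flow to the collector, and that flow fixes both the volume per well and the time eluate takes to travel from the split point to the collector. The sketch assumes the only dead volume is a single length of cylindrical tubing. Real systems add fittings and detector cells, so in practice the delay is best calibrated with a standard. All values and function names are illustrative.

```python
import math


def tubing_volume_ul(inner_diameter_mm, length_cm):
    """Internal volume of cylindrical tubing in uL (1 mm^3 equals 1 uL)."""
    radius_mm = inner_diameter_mm / 2
    return math.pi * radius_mm ** 2 * (length_cm * 10)


def split_plan(total_flow_ml_min, ms_fraction, well_time_min,
               inner_diameter_mm, length_cm):
    """Collector flow, per-well volume and transfer delay for a post-column split.

    ms_fraction is the share of the column flow sent to the MS; the rest goes
    to the fraction collector through tubing of the given size.
    """
    collector_flow = total_flow_ml_min * (1 - ms_fraction)  # ml/min
    volume_ul = tubing_volume_ul(inner_diameter_mm, length_cm)
    delay_s = volume_ul / 1000 / collector_flow * 60
    well_volume_ul = collector_flow * well_time_min * 1000
    return {"collector_flow_ml_min": collector_flow,
            "delay_s": delay_s,
            "well_volume_ul": well_volume_ul}


# Hypothetical set-up: 1.0 ml/min, 10% to MS, 0.25-min wells,
# 50 cm of 0.25 mm i.d. tubing between splitter and collector
plan = split_plan(1.0, 0.10, 0.25, 0.25, 50)
print({k: round(v, 3) for k, v in plan.items()})
```

The per-well volume must fit the collection plate, and the delay must be added to detector RTs before fractions are matched to peaks.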

As mentioned in the previous section, three ion sources for LC-MS instruments have been commercialized: ESI, atmospheric pressure chemical ionization and atmospheric pressure photoionization. For natural products, ESI is the most widely used, and a large number of reports have been published since the 1990s.31 There is no doubt that ESI detects the widest range of compounds. Because acquiring both positive and negative ion data is desirable for comprehensive detection, the mobile phase additive must be chosen with care. TFA is widely used as a pH adjuster in LC because of its low UV absorption and low boiling point. However, it lowers sensitivity in negative ion mode and is therefore unsuitable when negative ions are to be detected.

Quadrupole, ion trap, TOF and Orbitrap instruments can all be used for dereplication, but TOF and Orbitrap instruments are preferable because they provide HR-MS measurements. MS data can be used to search commercially available natural product DBs. We usually use Antibase (http://www.user.gwdg.de/~hlaatsc/antibase.htm) or the Chapman & Hall Dictionary of Natural Products (CHDNP) (http://dnp.chemnetbase.com/intro/). Antibase 2013 lists 41 864 metabolites, mainly of microbial origin, and is updated once a year. CHDNP covers natural products of all origins and is updated twice a year.
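
A DB search by accurate mass amounts to computing the expected ion m/z for each candidate and comparing it with the observed value in ppm. The minimal sketch below assumes only [M+H]+ and [M−H]− ions, a 5 ppm window, and a tiny hypothetical local excerpt of a DB keyed by molecular formula. Cephalochromin (C28H22O10) is the compound identified in Figure 3, and "compound X" is invented. The sketch does not use the Antibase or CHDNP interfaces.

```python
import re

# Monoisotopic masses of the most abundant isotopes (Da)
MONOISOTOPIC = {"C": 12.0, "H": 1.00782503207, "N": 14.0030740048,
                "O": 15.99491461956, "S": 31.97207117}
PROTON = 1.007276  # Da


def monoisotopic_mass(formula):
    """Neutral monoisotopic mass from a simple formula such as 'C28H22O10'."""
    mass = 0.0
    for element, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        mass += MONOISOTOPIC[element] * (int(count) if count else 1)
    return mass


def ppm_error(observed, theoretical):
    return (observed - theoretical) / theoretical * 1e6


def search(observed_mz, polarity, library, tol_ppm=5.0):
    """Match an [M+H]+ ('+') or [M-H]- ('-') ion against formulas in library."""
    hits = []
    for name, formula in library.items():
        mass = monoisotopic_mass(formula)
        ion = mass + PROTON if polarity == "+" else mass - PROTON
        error = ppm_error(observed_mz, ion)
        if abs(error) <= tol_ppm:
            hits.append((name, round(error, 2)))
    return hits


# Hypothetical local excerpt of a DB: name -> molecular formula
library = {"cephalochromin": "C28H22O10", "compound X": "C20H30O5"}
print(search(519.1286, "+", library))
```

A real search would also consider sodium and ammonium adducts, in-source fragments and isotope patterns.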

Compounds are generally difficult to identify from MS data alone. UV spectra are a helpful complement because characteristic UV spectra are easy to recognize. Some DBs, such as Antibase and CHDNP, record UV λmax values, but not for every entry. RT also helps to identify compounds. Although RT values cannot be found in commercial DBs, they can be taken from published papers. Nielsen et al.48, 49 compiled MS, UV and RT data for 475 fungal metabolites, and these data are useful for identifying compounds. Boswell et al.50 reported that MS data and RT are orthogonal, so compounds can be identified more efficiently from a combination of RT and LR-MS than from HR-MS alone.
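
Because RT and MS are orthogonal, even nominal-mass data can discriminate between isobaric candidates when RT and UV maxima are also checked. The sketch below assumes an in-house reference table measured on the same LC method. The record type, tolerances (0.1 min, 4 nm) and entries are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Reference:
    name: str
    nominal_mz: int        # nominal m/z of the main ion
    rt_min: float          # RT on the same in-house LC method
    uv_maxima_nm: tuple    # UV lambda-max values


def candidates(obs_mz, obs_rt, obs_uv, refs, rt_tol=0.1, uv_tol=4):
    """Reference entries consistent with an observed peak in m/z, RT and UV."""
    hits = []
    for ref in refs:
        if ref.nominal_mz != round(obs_mz):
            continue
        if abs(ref.rt_min - obs_rt) > rt_tol:
            continue
        # every reference maximum must be matched by an observed maximum
        if all(any(abs(r - o) <= uv_tol for o in obs_uv) for r in ref.uv_maxima_nm):
            hits.append(ref.name)
    return hits


# Hypothetical isobaric references, separable only by RT and UV
refs = [
    Reference("reference A", 455, 2.01, (230, 280)),
    Reference("reference B", 455, 2.60, (254,)),
    Reference("reference C", 455, 2.03, (320,)),
]
print(candidates(455.2, 2.05, (231, 279), refs))
```

RT values taken from the literature are only transferable when the column and gradient are comparable, so wider RT tolerances are needed for such entries.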

MS/MS spectra can easily be obtained during micro-fractionation with LC-UV-MS analysis. Ion trap, triple quadrupole and Q-TOF instruments can acquire MS/MS data, and even single-stage quadrupole and TOF instruments can obtain fragment spectra by in-source collision-induced dissociation. Structure elucidation from MS/MS spectra has been reported,51, 52 but it remains difficult for natural products with complex scaffolds. Public DBs of electron impact (EI) mass spectra are well established, but comparable resources for ESI-MS/MS spectra have been scarce. The public DB MassBank, in which 30 000 MS/MS spectra have been collected, is now available on the web,53 but it contains few microbial natural products.
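
An acquired MS/MS spectrum is commonly compared with library spectra using a similarity score. One simple choice is the cosine similarity of binned intensity vectors. The sketch below assumes 1 Da bins and uses hypothetical spectra. It is a generic illustration, not MassBank's own scoring method.

```python
import math


def binned(spectrum, bin_width=1.0):
    """Sum peak intensities into m/z bins of the given width."""
    vec = {}
    for mz, intensity in spectrum:
        key = round(mz / bin_width)
        vec[key] = vec.get(key, 0.0) + intensity
    return vec


def cosine_score(spec_a, spec_b, bin_width=1.0):
    """Cosine similarity (0 to 1) of two binned MS/MS spectra."""
    a, b = binned(spec_a, bin_width), binned(spec_b, bin_width)
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


# Hypothetical query and library spectra: lists of (fragment m/z, intensity)
query = [(121.0, 100), (149.1, 40), (177.1, 10)]
library = {
    "library spectrum 1": [(121.0, 90), (149.0, 50)],
    "library spectrum 2": [(85.0, 100), (99.1, 30)],
}
ranked = sorted(library, key=lambda name: cosine_score(query, library[name]), reverse=True)
for name in ranked:
    print(name, round(cosine_score(query, library[name]), 3))
```

With HR-MS/MS data, narrower bins or a ppm-tolerance peak matching step would be used instead of 1 Da bins.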

As explained earlier, UV spectra acquired by a photo-diode array detector in an HPLC or UHPLC system are important. Unlike MS spectra, they make it easy to recognize compound classes with a characteristic UV chromophore, even among unknown derivatives. HPLC/UHPLC instruments are controlled by software supplied by the instrument maker, which falls into two classes. One is derived from HPLC systems (for example, Waters Empower) and the other from MS systems (for example, Waters MassLynx). The former (Empower, Agilent ChemStation, Thermo Dionex Chromeleon, etc.) includes a searchable UV spectrum DB that stores complete UV spectra, which makes it convenient and easy to use. The latter has no UV spectrum search function, so using both an MS instrument and a UV spectrum DB requires two PCs for acquisition.
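
UV spectrum searches compare spectral shape rather than absolute absorbance. The minimal sketch below assumes all spectra are resampled to a common 2 nm grid and ranks library entries by Pearson correlation, which is independent of concentration. The Gaussian-band spectra are synthetic, and the sketch does not represent any vendor's search algorithm.

```python
import math

WAVELENGTHS = list(range(200, 402, 2))  # 200-400 nm in 2-nm steps (assumed grid)


def pearson(x, y):
    """Pearson correlation of two equally sampled spectra (scale-independent)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a flat spectrum carries no shape information
    return sxy / math.sqrt(sxx * syy)


def best_uv_matches(query, library, top=3):
    scores = [(name, pearson(query, spec)) for name, spec in library.items()]
    return sorted(scores, key=lambda s: s[1], reverse=True)[:top]


def synthetic_spectrum(maxima_nm, scale=1.0, width_nm=15.0):
    """Toy spectrum made of Gaussian-shaped bands (illustration only)."""
    return [scale * sum(math.exp(-((w - m) / width_nm) ** 2) for m in maxima_nm)
            for w in WAVELENGTHS]


library = {
    "A": synthetic_spectrum((230, 280)),
    "B": synthetic_spectrum((254,)),
    "C": synthetic_spectrum((320, 360)),
}
# Weaker, slightly shifted version of A, as an unknown derivative might give
query = synthetic_spectrum((232, 278), scale=0.5)
print(best_uv_matches(query, library))
```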

Accumulating in-house LC-UV-MS and chemical structure data is important for sustaining the dereplication process, and several research groups have constructed such DBs.38, 54, 55 Such a DB serves both to identify compounds in newly analyzed data and to record which compounds have been identified in which library samples. We have constructed an in-house DB called Natural Sample Data System (NASDAS) (Figure 3).

We have described dereplication with reverse phase HPLC fractionation as the most versatile approach, but some substances are not captured as peaks. Methods that do not rely on LC include solid phase extraction, ion exchange and preparative TLC. Månsson et al.56 designed a dereplication system (E-SPE) that uses several types of SPE cartridge (LH20, SAX, Oasis MAX, SCX, etc.) and analyzes the elution pattern of the activity. High-throughput 96-well fractionation on a hydrophilic-lipophilic balance plate (Oasis HLB) was employed for the dereplication and efficient discovery of thiazolyl peptides,57 and the eluted fractions were analyzed by HR-FT-LC-MS. Normal phase chromatography is a powerful tool for compounds that are unsuitable for reverse phase separation. The mass spectrum of a substance can also be observed directly from a TLC plate by DART-MS.58

Dereplication by micro-fractionation can be used not only to identify known substances but also to target peaks for purification, so that only samples in which a target peak has been identified are purified. This is more efficient than traditional activity-guided purification, which repeats the cycle of fractionation and activity evaluation.

Quantification of small-amount compounds

Production levels of microbial metabolites vary enormously. Some are produced at less than 1 mg per 1 l of culture, others at more than 1 g per 1 l. When crude extracts are screened biologically, a sample counts as a hit whether it contains a large amount of a weakly active metabolite or a trace amount of a strongly active one. It would therefore be helpful to determine metabolite quantities with reasonable accuracy at the dereplication step of the initial screening. In addition, compounds purified from small volumes of culture broth at the initial purification stage are obtained in very small amounts that are difficult to weigh. If an authentic sample exists, comparative quantification by HPLC is possible; if not, a method of absolute quantification must be used.

In LC-UV, compounds with chromophores can be quantified with relatively high sensitivity, but for some compounds the quantity is difficult to determine without a standard sample. The amount of a compound with only end absorption can be estimated roughly. Compounds with no UV absorption at all, such as lipids, some macrolides and sugars, are difficult to quantify by LC-UV. In LC-MS, sensitivity depends on how easily a substance ionizes and on the model of MS instrument used (even with the same ESI source), which again makes absolute quantification difficult.

A chemiluminescent nitrogen detector (CLND) is an HPLC detector that can quantify the amount of nitrogen with high accuracy and very high sensitivity.59 However, CLND cannot detect compounds that contain no nitrogen, and it is not possible to use nitrogen-containing solvents, such as acetonitrile.

The first weight detector for HPLC to reach the market was the evaporative light-scattering detector (ELSD).60 In ELSD, the HPLC eluent is nebulized and large droplets are removed. The solvent is then evaporated by heating to leave solid particles of solute, which are detected by the intensity of the light they scatter. The nano quantity analyte detector (NQAD)61 is similar to ELSD but more sensitive, because the solid particles are enlarged by condensation of water vapor. Unlike NMR and CLND, ELSD and NQAD give signals that correspond to the weight rather than the number of moles of a substance. For ELSD, the response plotted against the amount of substance gives a straight line on a log–log graph.62 ELSD and NQAD are more sensitive than refractive index detectors.

Analyte weight can also be detected by the Corona charged aerosol detection (CAD) system, which was developed after ELSD and is several times more sensitive. As in ELSD, the HPLC eluent is nebulized to form solid particles of solute. The particles are charged by a corona discharge, and the charge they carry is measured with an electrometer.63 As with ELSD, the signal corresponds to analyte weight, and the response plotted against the amount of substance is a straight line on a log–log graph. The Corona CAD response depends on mobile phase composition, for example on acetonitrile content. However, if a second pump delivers a compensating gradient so that the composition entering the detector stays constant (the so-called reverse gradient), the response reflects only the analyte weight. A second-generation system, the Corona Ultra detector, has been developed and can be used with UHPLC. Hutchinson et al.61 compared the ELSD, NQAD, Corona CAD and Corona Ultra detectors. They found NQAD the most sensitive, and Corona CAD and Corona Ultra the best in terms of reproducibility. All of these detectors are less sensitive to semi-volatile molecules (non-polar, MW <270).
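
Because ELSD and CAD responses are linear on a log–log plot, a calibration curve can be fitted as a power law and inverted to estimate an unknown amount. The sketch assumes that a calibration built with one compound applies to another. This is only approximate, since volatility and mobile phase composition both affect the response. The constants 50 and 1.4 are made up.

```python
import math


def fit_loglog(amounts_ug, responses):
    """Least-squares fit of log10(response) = log10(a) + b * log10(amount)."""
    xs = [math.log10(m) for m in amounts_ug]
    ys = [math.log10(r) for r in responses]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    log_a = my - b * mx
    return log_a, b


def amount_from_response(response, log_a, b):
    """Invert the power-law calibration to estimate the injected amount."""
    return 10 ** ((math.log10(response) - log_a) / b)


# Synthetic calibration obeying response = 50 * amount**1.4 (made-up constants)
amounts = [1, 2, 5, 10, 20]
responses = [50 * m ** 1.4 for m in amounts]
log_a, b = fit_loglog(amounts, responses)
print(round(b, 3), round(amount_from_response(50 * 7 ** 1.4, log_a, b), 3))
```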

Quantitative NMR (qNMR) is a powerful tool for purified natural products. Like CLND, qNMR gives concentrations in molar units,64, 65 because the area of a 1H-NMR signal is proportional to the number of protons that give rise to it. The linearity of the receiver gain of NMR instruments has recently improved, making an internal or external standard unnecessary for quantification.66 A pulse sequence function called electronic reference to access in vivo concentrations (ERETIC), now included by default on NMR instruments, can be used to quantify compounds without standard chemicals.
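
Because signal area is proportional to proton count, qNMR reduces to a short formula. In the sketch below, the reference may be an internal standard or an electronic reference signal that has been assigned an equivalent concentration by calibration on a sample of known concentration. The function names, integrals, volume and MW are hypothetical.

```python
def qnmr_concentration_mM(integral, n_protons, ref_integral, ref_n_protons, ref_conc_mM):
    """Molar concentration from integrals normalized per proton.

    With an internal standard, ref_* describes the standard signal. With an
    electronic (ERETIC-type) signal, ref_conc_mM is the equivalent concentration
    assigned to that signal when it was calibrated on a sample of known
    concentration (take ref_n_protons = 1 for a per-proton calibration).
    """
    return ref_conc_mM * (integral / n_protons) / (ref_integral / ref_n_protons)


def amount_ug(conc_mM, volume_ul, mw):
    """mM x uL gives nmol; nmol x g/mol gives ng; divide by 1000 for ug."""
    return conc_mM * volume_ul * mw / 1000


# Hypothetical: a singlet methyl (3H) integrates to 0.45 against a
# reference whose integral of 1.0 corresponds to 1.0 mM of one proton
c = qnmr_concentration_mM(0.45, 3, 1.0, 1, 1.0)
print(round(c, 3), "mM;", round(amount_ug(c, 180, 518.5), 1), "ug in 180 uL")
```

For reliable integrals, the relaxation delay must be long enough for full recovery of both the analyte and reference signals.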

In a natural product screening workflow, active compounds can therefore be quantified by Corona CAD/Ultra before purification, and purified compounds by qNMR.

Structure identification from small amounts

Lead discovery becomes more efficient if structures can be determined from as little culture broth as possible. Once the active peak has been identified by micro-fractionation and a small amount of the active compound has been isolated, 1H-NMR of the purified compound can be acquired as the first step. 1H-NMR is useful both for elucidating structure and for checking purity. Impurities that are invisible in LC-UV-MS are sometimes seen in 1H-NMR, and in many cases 1H-NMR reveals whether the purification went well.

Many compounds can be purified from microbial culture broth simply by combining solvent extraction with reverse phase preparative HPLC using gradient elution. In such rapid purifications, lipids are problematic impurities. They escape detection in LC-UV-MS because they absorb little UV, ionize poorly and are highly lipophilic. In 1H-NMR data, a lipid impurity is first recognized by the large signal of its methylene chain. Such contaminants can be adsorbed on the preparative column, and octadecyl groups that come off the stationary phase of a C18 column can also contaminate purified samples. This column bleed is especially noticeable when a new preparative column is used with a mobile phase of high acetonitrile content without thorough washing. To check how well the column has been washed, monitoring with Corona CAD or ELSD is recommended. Once a suitable 1H-NMR spectrum has been obtained, we assign the peaks and elucidate the structure from the NMR data together with UV and MS data. Lang et al.67 have developed a DB named AntiMarin that records the numbers of aldehyde, singlet methyl and other singlet signals that are easily identified in 1H-NMR spectra. If a compound shows such characteristic signals, it can be searched in the DB by entering the numbers of these peaks. Even for trace compounds, readily interpretable NMR data can be obtained with various types of LC-NMR and CapNMR.68
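
A count-based search of the kind offered by the DB of Lang et al. can be sketched as a filter over records. The record structure, signal labels and entries below are hypothetical and do not reproduce that DB's schema. An optional MW window from the MS data narrows the hits.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    name: str
    mw: float
    counts: dict  # e.g. {"CH3 singlet": 3, "CHO": 1}


def search_by_counts(query, entries, mw_range=None):
    """Entries whose counts equal the query for every signal type queried.

    Signal types absent from an entry count as zero. An optional (low, high)
    MW window, e.g. from the MS data, narrows the result further.
    """
    hits = []
    for e in entries:
        if any(e.counts.get(k, 0) != v for k, v in query.items()):
            continue
        if mw_range is None or mw_range[0] <= e.mw <= mw_range[1]:
            hits.append(e.name)
    return hits


# Hypothetical DB records
entries = [
    Entry("entry 1", 350.2, {"CH3 singlet": 3, "CHO": 1}),
    Entry("entry 2", 350.2, {"CH3 singlet": 3}),
    Entry("entry 3", 612.3, {"CH3 singlet": 3, "CHO": 1}),
]
query = {"CH3 singlet": 3, "CHO": 1}
print(search_by_counts(query, entries))
print(search_by_counts(query, entries, mw_range=(349.0, 352.0)))
```

Overlapping signals can make observed counts uncertain, so a tolerance of plus or minus one per signal type is a reasonable relaxation.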

The hyphenation of NMR with HPLC (LC-NMR) is widely used because it removes the need to isolate a compound before NMR acquisition. LC-NMR methods fall into two categories. In one, NMR is measured on-line through a flow cell. In the other, HPLC peaks are trapped on SPE columns and then eluted with deuterated solvent for measurement. Several excellent reviews of LC-NMR have been published.69, 70

In the 1H-NMR spectrum of a mixture, clearly different signal integrals make it easy to distinguish the components. When the components give signals of similar intensity but differ greatly in MW, diffusion-ordered NMR spectroscopy (DOSY) is recommended. DOSY separates overlapping signals on the basis of differences in molecular diffusion coefficients.71 Many NMR spectrometers equipped with pulsed field gradients now have the pulse sequence installed by default, but DOSY struggles when diffusion coefficients are similar and/or when spectra overlap heavily. To overcome the problem posed by analog mixtures with the same diffusion, Cassani et al.72 reported a DOSY approach that resolves flavonoid mixtures using a slow-diffusing matrix (micellar sodium dodecyl sulfate).
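
DOSY rests on the Stejskal–Tanner relation, I = I0 exp[−Dγ²g²δ²(Δ − δ/3)]. The diffusion coefficient D of each signal is therefore minus the slope of ln I plotted against γ²g²δ²(Δ − δ/3). The sketch assumes rectangular gradient pulses, since other pulse shapes change the correction term. It uses simulated, noise-free decays. Signals that return the same D are attributed to the same molecule.

```python
import math

GAMMA_1H = 2.6752e8  # 1H gyromagnetic ratio, rad s^-1 T^-1


def b_value(g, delta, big_delta, gamma=GAMMA_1H):
    """Stejskal-Tanner factor (s m^-2) for rectangular gradient pulses.

    g: gradient strength (T/m); delta: pulse length (s); big_delta: diffusion delay (s).
    """
    return (gamma * g * delta) ** 2 * (big_delta - delta / 3)


def fit_diffusion(gradients, intensities, delta, big_delta):
    """Estimate D (m^2/s) from the slope of ln(I) versus b: ln I = ln I0 - D*b."""
    xs = [b_value(g, delta, big_delta) for g in gradients]
    ys = [math.log(i) for i in intensities]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return -slope


# Simulated decays for two signals (D values are illustrative, not measured)
delta, big_delta = 0.002, 0.1
gradients = [0.02 + 0.04 * k for k in range(12)]  # 0.02-0.46 T/m
for label, d_true in (("signal 1", 5.0e-10), ("signal 2", 1.5e-10)):
    decay = [math.exp(-d_true * b_value(g, delta, big_delta)) for g in gradients]
    print(label, f"{fit_diffusion(gradients, decay, delta, big_delta):.2e} m^2/s")
```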

Even when high-quality 1H-NMR spectra are obtained, the structure sometimes cannot be elucidated. In that case, 2D-NMR spectra such as HSQC, 1H-detected HMBC, DQF-COSY, TOCSY, NOESY and ROESY are usually acquired, and the planar structure of the compound can be elucidated. At present, a 600 MHz NMR spectrometer equipped with a cryogenic probe (Cryo Probe or Cold Probe) is considered highly cost-effective for determining the structures of natural products, because it can acquire proton-detected 2D NMR spectra from 10 μg of sample. However, structural analysis is sometimes difficult even for simple molecules. If a compound has many heteroatoms and quaternary carbons, NMR provides limited information even when the MW is small.

X-ray crystallography is an excellent method for determining structure, including the stereochemistry of a molecule, but crystallizing a compound is often difficult and requires a certain amount of sample. Inokuma et al.73 developed X-ray crystallographic analysis using a crystalline sponge, a tiny crystal of a porous complex of their own design that absorbs the compound from solution. Because the guest molecules are oriented regularly in the pores, crystallographic analysis clearly determines the structure of the absorbed guest along with the host framework. The great advantage of this method is that it can elucidate the structures of trace samples. For example, they succeeded in determining a structure from a small amount of compound separated by analytical HPLC (LC-SCD). The crystalline sponge method is not applicable to all natural products, because the molecule must be smaller than the pores (MW of 500 or less) and of relatively low polarity. Even so, it may eventually become one of the most powerful methods for analyzing the structures of trace amounts of natural products.

Conclusion

This paper has described the processes involved in screening natural products from microbial culture broth, that is, identifying the active principle and finally purifying the active compound or determining its structure. Of these processes, we focused on dereplication from the perspective of analytical chemistry.

Direct detection from microbial colonies by ambient MS technology may become widely used. MS instruments are gradually becoming smaller, and with such compact instruments identifying a substance may become as easy as picking a microbe from a Petri dish. It may become possible to identify metabolites while observing the morphology of microorganisms, so that only beneficial microorganisms are cultured.

Advances in UHPLC-MS technology over the last 10 years have made rapid profiling possible. If component profiles are analyzed before a mixture library is built, samples containing frequent hitter compounds74 or samples without any metabolites can be excluded, so that only useful broths are screened. It is also possible to construct a focused library in advance, because the LC-MS data of a substance indicate its polarity from the RT and its MW from the mass spectrum. For example, it would be easy to build a library of metabolites with MW above 600 for protein–protein interaction (PPI) inhibitor screening, where a certain molecular size is required.8
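
Selecting samples for such a focused library can be sketched as a filter on the profiling data. The sketch assumes singly charged [M+H]+ or [M−H]− ions, so adducts and multiply charged ions would be misassigned. The 10% relative-intensity cut-off, function names and sample data are hypothetical.

```python
PROTON = 1.007276  # Da


def neutral_mass(mz, polarity):
    """Neutral mass assuming a singly charged [M+H]+ or [M-H]- ion."""
    return mz - PROTON if polarity == "+" else mz + PROTON


def select_for_ppi(samples, min_mw=600.0, min_rel_intensity=0.1):
    """Samples with at least one substantial peak whose neutral mass >= min_mw.

    samples: name -> list of (m/z, polarity, intensity). Peaks weaker than
    min_rel_intensity x the sample's strongest peak are ignored.
    """
    selected = []
    for name, peaks in samples.items():
        if not peaks:
            continue
        top = max(i for _, _, i in peaks)
        if any(neutral_mass(mz, pol) >= min_mw and i >= min_rel_intensity * top
               for mz, pol, i in peaks):
            selected.append(name)
    return selected


# Hypothetical profiling results
samples = {
    "E1": [(455.3, "+", 1000), (812.5, "+", 600)],
    "E2": [(301.1, "+", 900), (599.6, "-", 50)],
    "E3": [(623.4, "-", 200)],
    "E4": [],
}
print(select_for_ppi(samples))
```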

In dereplication by micro-fractionation, the active peak is located first, and the substance can then be identified from its UV and MS data. Dereplication can also indicate appropriate conditions for preparative LC purification.

Absolute quantification of active components at an early stage of purification allows activity to be evaluated quantitatively, which is valuable for lead discovery. LC-Corona CAD (or Corona Ultra) and ERETIC 1H-NMR can quantify compounds on a scale of 1–10 μg, even though such small amounts cannot actually be weighed.

X-ray crystallographic analysis of trace natural products using a crystalline sponge has recently been developed, and in the near future it may change how the structures of natural products are determined.

In this paper, we have presented natural product screening in terms of analytical chemistry, but constructing an appropriate assay system is the most important aspect of screening mixtures. There is also much room to consider how to fit the natural product library to the assay system, for example, how much pre-processing is needed. It is difficult to obtain a lead compound by simply applying HTS methods designed for pure chemical compounds to a natural product mixture library. What matters is how to construct a mixture library that contains diverse active compounds, together with an optimized assay system.