DynPeak: An Algorithm for Pulse Detection and Frequency Analysis in Hormonal Time Series

  • Alexandre Vidal,

    alexandre.vidal@univ-evry.fr

    Affiliation Laboratoire Analyse et Probabilités EA 2172, Université d’Évry-Val-d’Essonne, Evry, France

  • Qinghua Zhang,

    Affiliation Project-Team SISYPHE, INRIA Rennes-Bretagne Atlantique Research Centre, Rennes, France

  • Claire Médigue,

    Affiliation Project-Team SISYPHE, INRIA Paris-Rocquencourt Research Centre, Le Chesnay, France

  • Stéphane Fabre,

    Affiliation Laboratoire de Physiologie de la Reproduction et des Comportements, UMR 85 INRA, UMR 6175 CNRS, INRA Tours Research Center - Université F. Rabelais de Tours - IFCE, Nouzilly, France

  • Frédérique Clément

    Affiliation Project-Team SISYPHE, INRIA Paris-Rocquencourt Research Centre, Le Chesnay, France

Abstract

The endocrine control of the reproductive function is often studied by analyzing the pulsatile secretion of luteinizing hormone (LH) by the pituitary gland. Whereas measurements in the cavernous sinus combine anatomical and technical difficulties, LH levels can be easily assessed from jugular blood. However, plasma levels result from a convolution process due to clearance effects when LH enters the general circulation. Simultaneous measurements comparing LH levels in the cavernous sinus and jugular blood have revealed clear differences in the pulse shape, the amplitude and the baseline. Besides, experimental sampling occurs at a relatively low frequency (typically every 10 min) with respect to the highest frequency of LH release (one pulse per hour), and the resulting LH measurements are corrupted by both experimental and assay errors. As a result, the pattern of plasma LH may not be clearly pulsatile. Yet, reliable information on the InterPulse Intervals (IPI) is a prerequisite for studying precisely the steroid feedback exerted at the pituitary level. Hence, there is a real need for robust IPI detection algorithms. In this article, we present an algorithm for monitoring LH pulse frequency, based both on the available endocrinological knowledge of LH pulses (shape and duration with respect to the frequency regime) and on synthetic LH data generated by a simple model. We use synthetic data to clarify some basic notions underlying our algorithmic choices. We focus on explaining how the sampling process drastically affects the original pattern of secretion, and especially the amplitude of the detectable pulses. We then describe the algorithm in detail and apply it to different sets of both synthetic and experimental LH time series. We further comment on how to diagnose possible outliers in the series of IPIs, which is the main output of the algorithm.

Introduction

The neuroendocrine axes play a major part in controlling the main physiological functions (metabolism, growth, development and reproduction). The connection between the central nervous system and the endocrine system takes place on the level of the hypothalamus, where endocrine neurons are able to secrete hormones that target the pituitary gland. In birds and mammals, a dedicated portal system (the pituitary portal system) joins the hypothalamus and pituitary gland together. The anterior lobe of the pituitary gland (adenohypophysis) produces different hormones, which target either other endocrine glands (releasing their hormones directly into the bloodstream), exocrine glands (releasing their hormones into dedicated ducts) or non-secreting organs.

We will be particularly interested in the gonadotropic axis, which is named after its most downstream component, the gonads (ovaries in females, testes in males). The reproductive axis is under the control of the gonadotropin-releasing hormone (GnRH), which is secreted in pulses from specific hypothalamic areas. GnRH effects on its target cells depend critically on pulse frequency and ultimately result in the differential secretion patterns of the luteinizing hormone (LH) and follicle-stimulating hormone (FSH). The LH secretion pattern is clearly pulsatile, while the FSH pattern is not. LH and FSH control the development of germinal cells within the gonads and the secretory activity of somatic cells. In turn, hormones secreted by the gonads (steroid hormones such as androgens, progestagens and oestrogens, or peptidic hormones such as inhibin) modulate the secretion of GnRH, LH and FSH within intertwined feedback loops.

Whereas measurements of GnRH levels (in either the pituitary portal blood or the cerebrospinal fluid) cumulate anatomical and technical difficulties, LH levels can be easily assessed from jugular blood. In females, there is a clear modulation of LH pulse frequency along an ovarian cycle [1]. Pulse frequency is much lower in the luteal, progesterone-dominated phase compared to the follicular, oestradiol-dominated phase. Apart from the period surrounding ovulation, there is a good correlation between GnRH and LH pulses [2], [3], so that a precise determination of LH pulse frequency is valuable to investigate the feedback effects of gonadal hormones in different physiological or pathological situations.

LH plasma levels result from a convolution process. The instantaneous LH release rate from the pituitary gland is pulsatile, but as soon as LH enters the general circulation, it is subject to clearance effects. Simultaneous measurements of LH levels in the cavernous sinus and jugular blood [4] have revealed clear differences in the pulse shape and amplitude as well as in the baseline. Besides, experimental sampling occurs at a relatively low frequency (typically every 10 min, [5]–[7]) with respect to the highest frequency of LH release (one pulse per hour), and the resulting LH measurements are corrupted by both experimental and assay errors. As a result, the pattern of plasma LH may not be clearly pulsatile. Yet, reliable information on the interpulse intervals (IPI) is a prerequisite for studying precisely the steroid feedback exerted at the pituitary level. Hence, there is a real need for robust IPI detection algorithms.

In this article, we present an algorithm for monitoring LH pulse frequency, based both on the available endocrinological knowledge of LH pulses (shape and duration with respect to the frequency regime) and on synthetic LH data generated by a simple model. We use synthetic data to clarify some basic notions underlying our algorithmic choices. We focus on explaining how the sampling process drastically affects the original pattern of secretion, and especially the amplitude of the detectable pulses. We then describe the algorithm in detail and apply it to different sets of both synthetic and experimental LH time series. We further comment on how to diagnose possible outliers in the series of IPIs, which is the main output of the algorithm.

Methods

A Mathematical Generator of Synthetic LH Time Series

Building on a simple model of plasma LH level introduced in [8], we illustrate the effects of the sampling process upon an LH signal. This model combines a function representing the pulsatile LH secretion by the pituitary gland with a term accounting for the clearance from the blood. The synthetic sampling process is designed to reproduce, as closely as possible, the variability in the sampling times and measurements observed in experiments. The different steps of this construction, as well as the links between the mathematical objects and what they represent in the biological context, are illustrated in the diagram of Figure 1.

Model of Luteinizing Hormone secretion.

The pituitary gland releases LH into the blood as successive spikes characterized by a quasi-instantaneous increase followed by a slower (yet quite fast) decrease (see [4]). Hence, in our model, the LH release along a spike is approximated by a discontinuous function of time: the jump accounting for the instantaneous increase in the LH release is followed by a fast exponential decrease. The interspike interval is controlled by a function of time $P(t)$ accounting for the varying release frequency. The spike amplitude is also subject to inter-spike variability as well as to long-term changes partly due to the time variations in the stock of LH available for release. In our model, the amplitude is controlled by another function of time $A(t)$. Hence, the instantaneous release of LH (expressed in ng/ml/min) into the blood by the pituitary gland is given by:

$$LH(t) = A(t)\,\exp\!\left(-k\left(t - \left\lfloor \frac{t}{P(t)} \right\rfloor P(t)\right)\right) \qquad (1)$$

where $\lfloor x \rfloor$ refers to the greatest integer not exceeding $x$ (integer part). The exponential decay rate $k$ is directly linked to the spike half-life $\tau_{1/2}$ (i.e. the time taken for the instantaneous released LH quantity to drop from the maximal spike value to half this value) through:

$$k = \frac{\ln 2}{\tau_{1/2}}$$

We based our choice of parameter values on the few experiments that have investigated the LH release by the pituitary gland in the ewe from synchronous sampling in the jugular blood and cavernous sinus [4]. Accordingly, we chose a spike half-life

We represent the continuously measured LH blood level LHp(t) (expressed in ng/ml) as the solution of:

$$\frac{dLH_p}{dt}(t) = LH(t) - a\,LH_p(t) \qquad (2)$$

where the LH release rate LH(t) is given by equation (1). Parameter a represents the instantaneous LH clearance rate from the blood. Its value was fixed to be consistent with the one-hour half-life of LH pulses (i.e. the time taken for the blood LH level to drop from the maximal pulse value to half this value) in the jugular blood.
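The model described above can be explored numerically. The following is a minimal sketch, not the authors' implementation: it assumes that the release rate of equation (1) is a jump followed by an exponential decay restarted at every spike, with a constant amplitude and a constant interspike interval, and it integrates equation (2) with an explicit Euler scheme. The function names and all numerical values (amplitude, period, spike half-life, clearance rate) are illustrative placeholders, not the values used in the article.

```python
import math

def spike_release(t, amplitude=1.0, period=60.0, half_life=2.0):
    """Assumed LH release rate: a jump at each spike, then exponential decay."""
    k = math.log(2.0) / half_life                       # decay rate from the half-life
    time_since_spike = t - math.floor(t / period) * period
    return amplitude * math.exp(-k * time_since_spike)

def simulate_plasma_lh(duration=600.0, dt=0.01, clearance=0.02, lhp0=0.0,
                       **spike_kwargs):
    """Explicit Euler integration of dLHp/dt = LH(t) - a * LHp(t)."""
    n_steps = int(round(duration / dt))
    times = [0.0] * (n_steps + 1)
    levels = [lhp0] * (n_steps + 1)
    for n in range(n_steps):
        t = n * dt
        rate = spike_release(t, **spike_kwargs) - clearance * levels[n]
        times[n + 1] = (n + 1) * dt
        levels[n + 1] = levels[n] + dt * rate
    return times, levels

# Placeholder run: 10 hours of simulated plasma LH with a fine time step.
times, levels = simulate_plasma_lh()
print("largest simulated plasma level:", max(levels))
```

Because the clearance is slow compared with the spike decay, the simulated plasma level rises quickly after each spike and falls slowly, which is the asymmetric pulse shape discussed in the text.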

Sampling protocol and assay variance.

To mimic the experimental protocol for LH data acquisition, we extract time series from the fine step numerical integration of equation (2). This process is intended to obtain a time series of N consecutive samples similar to experimental results, i.e. a finite sequence of N couples $(t_i, LH_i)$, with i = 1,2,…,N, where $LH_i$ is the measured LH level at time $t_i$.

In most experiments, the LH data are retrieved at a fixed frequency. We describe below the corresponding synthetic process: the samples are obtained every Ts minutes from the starting time t = r (expressed in minutes). Hence, the sampling times are

$$t_i = r + (i-1)\,T_s, \quad i = 1, \dots, N \qquad (3)$$

Note that the first sample is retrieved at time $t_1 = r$. Parameter r, chosen between 0 and Ts, allows one to shift the beginning of the sampling process while using the same set of data simulated from equation (2).

To take into account the inherent variability of the experimental sampling times, we compute times $\tilde{t}_i$ near the theoretical sampling times $t_i$:

$$\tilde{t}_i = t_i + f\,u_i, \quad i = 1, \dots, N \qquad (4)$$

In experiments, one naturally expects the error on the sampling times to be bounded; otherwise successive samples could be retrieved in reverse time order. We also assume that, between these bounds, the errors are uniformly distributed. Hence, the random numbers $u_i \in [-1, 1]$, i = 1,…,N, are generated from a uniform distribution (using the Mersenne Twister algorithm [9]). Then, we can compute the value of the solution of equation (2) at each of these times. This amounts to choosing the i-th sampling time with equal probability in an interval $[t_i - f,\ t_i + f]$ centered on $t_i$. It is worth noticing that a truncated normal distribution (such that inversion between two consecutive sampling times cannot happen) for the sampling time errors does not significantly change the impact of the sampling process in comparison with a uniform distribution.

We also reproduce the LH assay variance by applying a multiplicative noise to each sample. We compute:

$$LH_i = LH_p(\tilde{t}_i)\,(1 + b\,v_i), \quad i = 1, \dots, N \qquad (5)$$

where the random numbers $v_i \in [-1, 1]$, i = 1,…,N, are also generated from a uniform distribution (Mersenne Twister algorithm). Compared to a normal distribution, this choice allows us to lower the extremal errors while enhancing the frequency of medium-range errors in the synthetic LH measured levels. Except for large deviation-induced phenomena (that remain very rare), the choice of a normal distribution for the assay error does not much affect the pattern of synthetic LH time series, as shown in the section “Algorithm robustness to assay error” of Appendix S1 (see Figure S4 and Table S1).

The output of the sampling process is the time series defined by the N couples $(t_i, LH_i)$, i = 1,…,N, of times and corresponding measured LH levels. Figure 2 illustrates the construction of the i-th sample in a time series.
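The sampling protocol can be sketched in a few lines. This is an illustration under stated assumptions rather than the authors' code: the nominal times are taken as r + (i − 1)·Ts, the timing error and the assay error are drawn uniformly in [−f, f] and [−b, b] respectively, and `toy_signal` is a hypothetical smooth function standing in for the solution of equation (2). Python's standard `random` module is used because its generator is the Mersenne Twister.

```python
import math
import random

def sample_series(lhp, n_samples, ts=10.0, r=1.0, f=1.5, b=0.10, seed=0):
    """Return nominal times, effective times and noisy measured levels.

    lhp: callable giving the continuous plasma level at any time (minutes).
    Keeping f below ts / 2 guarantees that effective times stay ordered.
    """
    rng = random.Random(seed)        # Mersenne Twister generator
    nominal, effective, measured = [], [], []
    for i in range(1, n_samples + 1):
        t_i = r + (i - 1) * ts                       # registered sampling time
        tau_i = t_i + f * rng.uniform(-1.0, 1.0)     # bounded timing error
        noise = 1.0 + b * rng.uniform(-1.0, 1.0)     # multiplicative assay error
        nominal.append(t_i)
        effective.append(tau_i)
        measured.append(lhp(tau_i) * noise)
    return nominal, effective, measured

def toy_signal(t):
    """Hypothetical stand-in for the continuously measured LH level."""
    return 1.0 + 0.5 * math.sin(2.0 * math.pi * t / 60.0)

nominal, effective, measured = sample_series(toy_signal, n_samples=12)
for t_i, level in zip(nominal, measured):
    print(f"{t_i:6.1f} min  {level:.3f} ng/ml")
```

The series handed to a detection algorithm pairs the registered times with the measured levels; the effective times are returned only because, as the text notes, a synthetic experiment makes them available.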

Figure 2. Computation of a synthetic sample.

Solution LHp(t) of equation (2) (green curve), retained sampling time $t_i$ (blue dotted line), real sampling time $\tilde{t}_i$ (red dotted line) randomly chosen in $[t_i - f,\ t_i + f]$ (blue interval), exact value $LH_p(\tilde{t}_i)$ of the LH level at time $\tilde{t}_i$ (green dotted line), retrieved LH level $LH_i$ (red dotted line) randomly chosen in $[(1-b)\,LH_p(\tilde{t}_i),\ (1+b)\,LH_p(\tilde{t}_i)]$ (green interval). The output of the i-th step of the sampling process is the couple $(t_i, LH_i)$ (magenta disc) of time and corresponding LH level.

https://doi.org/10.1371/journal.pone.0039001.g002

In the context of an experiment, there may be some uncertainty about the exact sampling times (i.e. the precise times at which the samples are retrieved). In our model of the sampling process, on the contrary, we can retrieve the sequence of effective sampling times $\tilde{t}_i$ for a given synthetic experiment. Figure 3 allows us to visualize the sequence $(\tilde{t}_i)$ (red dotted lines) compared to the registered sampling time sequence $(t_i)$ (blue dotted lines). Here, we set the sampling period Ts to 10 min and the shift constant r to 1 min, hence:

$$t_i = 1 + 10\,(i-1) \ \text{min}, \quad i = 1, \dots, N$$

Figure 3. Effect of the variability of the sampling times upon the synthetic time series.

The $\tilde{t}_i$, i = 1,…,N, are the effective sampling times leading to the red-colored LH time series. The $t_i$, i = 1,…,N, are the expected sampling times leading to the blue-colored LH time series. The original, non-sampled time series corresponds to the green line. One can observe an instance of great discrepancy between the LH level measured at time $\tilde{t}_6$, which corresponds to the very beginning of the ascending part of a pulse, and the LH level measured at time $t_6$, which corresponds to the maximum of the same pulse.

https://doi.org/10.1371/journal.pone.0039001.g003

The maximal error is fixed to 15% of the sampling period, so that f = 1.5 min and, for each i from 1 to N, $\tilde{t}_i \in [t_i - 1.5,\ t_i + 1.5]$.

Moreover, Figure 3 compares the results obtained without any variability on the sampling times (blue time series) with those obtained with an error of at most ±1.5 min (red time series). It illustrates that this variability can occasionally imply a great difference near a pulse maximum. The 6th blue sample, obtained at $t_6 = 51$ min, corresponds to the theoretical pulse maximum around 2.3 ng/ml. Yet the 6th red sample, obtained one minute earlier due to the variability of the sampling time, corresponds to the preceding minimal LH level. Consequently, the local maximum of the blue time series, obtained with the 6th sample, is noticeably greater than the local maximum of the red time series, which is obtained with the 7th sample.

Model Outputs

From an endocrinological standpoint, an LH pulse is an increase in LH blood level triggered by the quick release of LH by the pituitary gland. As illustrated in the preceding section, the moderate clearance rate of LH from the blood underlies the specific asymmetric shape of the pulses, which is characterized by a fast increase immediately followed by a slower decrease. This property has been highlighted in dedicated studies using high-frequency sampling (for instance [10]: horse, 2 samples per minute) of LH level during a short interval of time.

However, in long-term experiments, the sampling frequency is usually of the order of one sample per 10 minutes. Consequently, the precise shape and quantitative properties of the pulses are no longer obvious in the time series. In particular, the theoretical pulse amplitude (theoretical highest level hit during a pulse event) is most of the time not properly reflected by the highest sample obtained during the corresponding event. In the following, we introduce a few notions allowing us to differentiate the properties of a theoretical pulse from those of the corresponding pulse obtained from a time series.

The advantage of synthetic time series is that the underlying signal LH(t) of LH release and the theoretical continuously measured blood LH level LHp(t) are available. This corresponds to the ideal experimental situation where one could get high-frequency sampled, variability-free time series retrieved at the same time from the cavernous sinus and jugular blood. With synthetic data, we have at our disposal reference sets that allow us to identify both LH spikes and pulses without any ambiguity.

Moreover, we can easily test different experimental protocols by changing the value of the parameters Ts, r, f, b controlling the sampling properties and by choosing various functions $A(t)$ and $P(t)$, which determine the time-varying amplitude and frequency of the LH spikes released by the pituitary gland.

Definitions.

For the sake of clarity, we specify a few notions and terms that will be used in what follows. The definitions are illustrated by Figure 4. For a theoretical pulse (i.e. a peak in the signal LHp(t) triggered by a spike in LH(t)), we define:

  • the theoretical pulse amplitude as the maximal value hit during the event,
  • the theoretical pulse time as the time at which the level hits the theoretical pulse amplitude.

For a pulse in a time series (corresponding to a theoretical pulse), we define:

  • the pulse amplitude as the maximal sample obtained during the pulse event,
  • the pulse occurrence as the sample time at which the time series hits the pulse amplitude.

Figure 4. Definition of pulse properties in the theoretical case versus experimental case.

For a theoretical pulse (i.e. a local maximum in the LHp(t) signal triggered by a spike in LH(t)), we call “pulse time” the time at which LHp(t) admits a local maximum and “theoretical pulse amplitude” the value of LHp at this time. In a time series (obtained either from simulation with the synthetic sampling protocol or from experimental data), we call “pulse occurrence” the time at which the time series admits a local maximum and “pulse amplitude” the corresponding value. Both the time values and the amplitude values differ between the theoretical and the experimental cases.

https://doi.org/10.1371/journal.pone.0039001.g004

Synthetic LH time series obtained from constant spike amplitude and frequency.

We first examine the effects of parameters r, f and b in the case of a constant spike amplitude and a constant interspike interval, for the same sampling period Ts = 10 min:

  • Case A: r = 1 min, f = 0 min, b = 0;
  • Case B: r = 4 min, f = 0 min, b = 0;
  • Case C: r = 4 min, f = 1.5 min, b = 0;
  • Case D: r = 4 min, f = 1.5 min, b = 10%.

The top panel of Figure 5 displays the solution LHp(t) (green curve) of equation (2), i.e. the theoretical continuously measured LH blood level. Each panel from A to D shows the time series (blue stars) obtained through the sampling protocol for the values of parameters r, f, b specified above.

Figure 5. Effect of the sampling process upon an LH level signal with constant amplitude and pulse frequency.

Top panel: theoretical continuously measured LH blood level (green curve). Panels A, B, C, D: sampling points (blue stars) of the time series obtained from the top panel signal through the sampling protocol. Panel A: first sampling time at r = 1 min, without any variability in the sampling process. Panel B: first sampling time at r = 4 min, without any variability in the sampling process. Panel C: first sampling time at r = 4 min, with variability in the sampling times (f = 1.5 min). Panel D: first sampling time at r = 4 min, with variability both in the sampling times (f = 1.5 min) and the assays (b = 10%). The histograms correspond to the distribution of the levels at the basal line and the distribution of the amplitudes of the LH pulses, measured from the four cases A to D. The A and B time series, which only differ in the first sampling time, display constant (yet different) pulse amplitudes. Red bars stand for the case A (r = 1 min) values of the pulse amplitude (2.425 ng/ml) and level at the basal line (0.107 ng/ml). Green bars stand for the case B (r = 4 min) values of the pulse amplitude (2.188 ng/ml) and level at the basal line (0.096 ng/ml). Blue bars stand for the distributions of levels at the basal line and pulse amplitudes in case C (r = 4 min; f = 1.5 min) and case D (r = 4 min; f = 1.5 min; b = 10%, i.e. a variability of ±10% in the LH assays). In case D, the distributions of basal line levels (between 0.082 and 0.108 ng/ml) and pulse amplitudes (between 1.940 and 2.486 ng/ml) are wider than in case C (levels at the basal line between 0.092 and 0.101 ng/ml; pulse amplitudes between 2.092 and 2.266 ng/ml), due to the combined variabilities in the sampling times and assays.

https://doi.org/10.1371/journal.pone.0039001.g005

The histograms display, for each time series obtained in case C or D, the distribution of LH pulse amplitude (blue bars in left panels), i.e. local maxima, and LH levels at the basal line (blue bars in right panels), i.e. local minima. For sake of comparison, the constant amplitudes obtained in cases A and B have been marked with red and green bars respectively.

In cases A and B (f = b = 0), one obtains a strictly periodic pattern of sampled LH levels since the interspike interval is a multiple of Ts. However, depending on the beginning of the sampling process r (1 min in time series A and 4 min in time series B), the maximum sample value varies from 2.425 ng/ml in case A (red bars in the histograms of Figure 5) to 2.188 ng/ml in case B (green bars). This shows how strongly the phase between the pulsatile LH signal and the periodic sampling process affects the observed pulse amplitude. It is worth noticing that this phase cannot be controlled at all in experimental conditions, since the delay elapsed since the last pulse time is not known at the beginning of the experimental sampling process. The dependence of the basal line on this phase is weaker but still exists: it varies from 0.107 ng/ml in time series A to 0.096 ng/ml in time series B.
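The dependence of the observed amplitude on the sampling phase can be reproduced on a toy signal. The sketch below is an illustration under simplifying assumptions: each pulse is taken as the response of equation (2) to a single exponentially decaying spike starting from a zero level (so carry-over between successive pulses is ignored), and the rates `k` and `a` as well as the 50 min period are placeholder values, not those of the article.

```python
import math

K_SPIKE = 0.35     # placeholder spike decay rate (1/min)
A_CLEAR = 0.012    # placeholder clearance rate (1/min)

def single_pulse(s, amplitude=1.0, k=K_SPIKE, a=A_CLEAR):
    """Solution of dL/ds = A*exp(-k*s) - a*L with L(0) = 0."""
    return amplitude / (k - a) * (math.exp(-a * s) - math.exp(-k * s))

def periodic_signal(t, period=50.0):
    """Toy periodic level: the single-pulse response repeated every period."""
    return single_pulse(t % period)

def observed_amplitude(r, ts=10.0, period=50.0, n_samples=30):
    """Largest sample when noise-free sampling starts at time r."""
    return max(periodic_signal(r + i * ts, period) for i in range(n_samples))

# The continuous maximum is reached where the derivative of the pulse vanishes.
true_peak_time = math.log(K_SPIKE / A_CLEAR) / (K_SPIKE - A_CLEAR)
true_amplitude = single_pulse(true_peak_time)

for r in (1.0, 4.0, 8.0):
    print(f"r = {r} min: observed {observed_amplitude(r):.3f}, "
          f"theoretical {true_amplitude:.3f}")
```

Even without any timing or assay error, the largest sample never exceeds the theoretical amplitude, and its value changes with r alone.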

By comparing time series B and C of Figure 5, we can observe the impact of the variability in the sampling times (f = 0 min in case B and f = 1.5 min in case C). With variable sampling times, the pulse amplitude along time series is also variable, although the original continuous signal LHp(t) is perfectly periodic. This variability is shown in the histogram corresponding to case C in Figure 5: the various LH pulse (resp. basal line) amplitudes obtained in case C (blue bars) are scattered around the constant case B pulse (resp. basal line) amplitudes (green bar).

The impact of the variability in the assays is illustrated by the enhanced dispersal of the LH pulse and basal line amplitudes obtained in case D (blue bars in the histograms corresponding to case D in Figure 5) for which b = 10% compared to the case C amplitudes (b = 0).

In any case of either shifted (case B) or noised (cases C and D) time series, the pulse amplitudes are undervalued with respect to the genuine amplitude (correctly assessed only in case A). It may nevertheless happen that the effective sampling time coincides with a (genuine) pulse time. In that case, the pulse amplitude can be overvalued if the sign of the assay error is positive (instance of the blue bar on the right of the red bar in the left panel of case D).

Synthetic LH time series obtained from time-varying spike amplitude and frequency.

We now further examine the effects of the sampling process upon theoretical continuously measured LH level with time-varying pulse amplitude and frequency:

  • Case E: constant spike amplitude and decreasing interspike interval, respectively

  • Case F: decreasing spike amplitude and decreasing interspike interval, respectively

The sampling period Ts is the same in both cases: 10 min.

Figure 6 gives some examples of time series obtained in cases E and F. In each case, the green curve (rows 1 and 3) represents the solution LHp(t) (theoretical continuously measured plasma LH level) and the blue stars (rows 3 and 4) represent the time series obtained through the synthetic sampling protocol.

Figure 6. Effect of the sampling process upon an LH level signal with regularly increasing pulse frequency.

Case E (left panels): the pulse amplitude remains almost constant and the basal line increases regularly. Case F (right panels): the pulse amplitude decreases regularly and the basal line decreases regularly. Panels on row 1 represent the fine step simulation of LH blood level. Histograms on row 2 display the distribution of the LH pulse amplitudes and the distribution of the levels at the basal line, measured from the two theoretical LH level signals shown in row 1. A zoom on the distribution of the pulse amplitudes is shown as an insert in case E. Panels on row 3 represent the time series (blue stars) along the theoretical continuously measured LH level (green curve). Panels on row 4 represent the resulting LH measured time series (measured LH levels versus sampling times linked with segments). In both cases E and F, the sampling period is Ts = 10 min. In case E, the initial sampling time occurs at the first minute of the simulation (r = 1 min), without any variability in the sampling times (f = 0 min) or the assays (b = 0%). In case F, the initial sampling time occurs at the fourth minute of the simulation (r = 4 min), with variability both in the sampling times (f = 2 min) and the assays (b = 5%). Histograms on row 5 display the distribution of the LH pulse amplitudes and the distribution of the LH levels at the basal line, measured from the time series shown in row 4. While the distributions are regular in the theoretical time series, they become completely irregular in the sampled time series. As a result, the range of amplitudes is shortened. Regarding the distribution of the levels at the basal line, it is worth noticing that the measured values (E: between 0.125 and 0.519 ng/ml; F: between 0.094 and 0.258 ng/ml) are greater than the theoretical values (E: between 0.098 and 0.434 ng/ml; F: between 0.075 and 0.171 ng/ml). On the contrary, in case E, the theoretical pulse amplitudes vary from 2.379 to 2.425 ng/ml whereas the measured pulse amplitudes vary from 1.447 to 2.395 ng/ml. In case F, all measured pulse amplitudes (between 0.302 and 1.959 ng/ml) are lower than the corresponding theoretical values (between 0.353 and 2.315 ng/ml).

https://doi.org/10.1371/journal.pone.0039001.g006

The histograms of Figure 6 detail, for cases E and F, the distribution of LH pulse amplitudes and successive LH levels at the basal line both for the theoretical continuously measured LH blood level (row 2, green bars) and the time series obtained through the sampling protocol (row 5, blue bars).

In case E, the spike frequency increases from 1 spike per 100 min to 1 spike per 50 min. Hence, spikes are released more and more often as time goes on, so that the LH blood pulses are successively triggered from a higher and higher basal level. Consequently, the basal line and, to a lesser extent, the pulse amplitude of the theoretical LH level undergo a small and smooth increase. Moreover, the number of samples per pulse decreases drastically as the pulse frequency increases, so that the LH time series looks noisier at the end than at the beginning (see case E, row 4 of Figure 6). As a result, the sampling protocol spreads out the pulse amplitudes much more strongly in case E (from 2.395 down to 1.447 ng/ml) than in cases C and D (see case E, row 5 of Figure 6). Hence, the time variations in the pulse frequency enhance the variability brought about by the sampling process.

Case F represents a situation that is naturally encountered in the physiological dynamics of LH secretion: the same increase in the spike frequency as in case E, combined with a decrease in the amplitude. As in the preceding cases, each LH pulse amplitude in the time series is smaller than the corresponding one in the theoretical LH level. Additionally, in case F, the level at the basal line is noticeably raised by the sampling protocol. Hence, the difference between the pulse amplitude and the basal level is strongly reduced, which adds to the noisy character of the end of the time series.

Pulse Detection Algorithm

In the context of automatic pulse detection in a time series, a pulse is a peak (i.e. a local maximum of the time series) fulfilling given criteria. The most challenging issue in the algorithm design consists in formalizing the biologically relevant criteria that discriminate the pulses from other peaks. Among these criteria, the amplitude is the most obvious. However, as illustrated in the preceding section, it cannot be used as an infallible criterion for automatic pulse detection, since the quantitative features of a pulse (absolute amplitude, amplitude from baseline, …) are altered in an unknown way by the sampling process.

For the sake of robustness and accuracy of our pulse detection algorithm, we have introduced a selection process based on multiscale criteria involving different properties of the peaks. Besides the series of pulse occurrences, the main output of our algorithm is the series of the corresponding InterPulse Intervals (IPI) together with a tunnel of confidence (IPI tunnel) related to the regularity of the pulse frequency variations.

Notations and definitions.

We consider a time series of N points obtained with a Ts-periodic sampling process and we denote the sampling times by $t_k$, k = 1,…,N, so that two consecutive sampling times are separated by Ts. We denote by either $LH_k$ or $LH(t_k)$ the value corresponding to the k-th sample and call k a “time index”. For the sake of simplicity in the following explanations, we will refer either to the time index k or to the corresponding time $t_k$.

The pulse detection algorithm is based on a sequence of different processes. Some of them search for high-amplitude peaks that can potentially be classified as pulses; others aim to remove, among the previously selected peaks, those that do not fit other properties met by genuine pulses. In the following, we denote by $P = (P_1, \dots, P_{s(P)})$ the vector dynamically storing the time indexes at which the algorithm detects the summit of a potential pulse, where s(P) is the size (number of elements) of P. The algorithm modifies vector P in such a way that the indexes are always sorted in increasing order. Hence, at any time along the algorithmic process, s(P) pulses are detected, $t_{P_i}$ is the occurrence of the i-th detected pulse and $LH_{P_i}$ is the amplitude of the i-th detected pulse.

In order both to moderate the importance of the amplitude and to account for several characteristics of the pulse shape, we need to introduce the notions of “height” and “magnitude” of a peak as well as that of “relative magnitude” between two peaks. Let us suppose that the i-th sample of the time series (occurring at time $t_i$) corresponds to a peak (i.e. $LH_i > LH_{i-1}$ and $LH_i > LH_{i+1}$). The height of this peak is defined as the difference between its amplitude and the lowest value of LH within the sampled time series, denoted by $LH_{\min}$, i.e.:

$$h_i = LH_i - LH_{\min}$$

Let us assume additionally that a set of potential pulses P is identified from the time series and that the closest pulses registered in P before and after $t_i$ occur at $t_{P_q}$ and $t_{P_{q+1}}$ respectively (see Figure S1). We define the two minimal values of the time series between $t_{P_q}$ and $t_i$ on one hand, and between $t_i$ and $t_{P_{q+1}}$ on the other hand:

$$m_i^- = \min_{P_q \le k \le i} LH_k, \qquad m_i^+ = \min_{i \le k \le P_{q+1}} LH_k$$

Then, we define the magnitude of the peak corresponding to the i-th sample as the geometric mean of $LH_i - m_i^-$ and $LH_i - m_i^+$, i.e. $\sqrt{(LH_i - m_i^-)(LH_i - m_i^+)}$.

Finally, let us consider two peaks occurring at $t_i$ and $t_j$ with $i < j$. Let us call $m_{i,j}$ the minimal value of the time series between $t_i$ and $t_j$:

$$m_{i,j} = \min_{i \le k \le j} LH_k$$

We call “relative magnitude” between the two peaks the geometric mean of $LH_i - m_{i,j}$ and $LH_j - m_{i,j}$.

The notion of peak height does not depend on the vector of potential pulses. It represents a normalization of the amplitude across the time series with respect to the lowest value of the time series. The peak magnitude can change as the vector P of identified pulses evolves and, consequently, will be used to compare a peak with its direct neighbors. The interest of the magnitude is that it takes into account the local baseline without being sensitive to differences in the baseline from one side of a peak to the other. The relative magnitude between two peaks gives a semi-local reference for the pulse magnitude. In particular, the magnitude of a pulse can be usefully compared to the relative magnitude between its direct neighbors (i.e. the pulses just before and after it).
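The three notions defined above translate directly into short functions. The sketch below is one possible reading, with hypothetical function names: it assumes that a peak is a strict local maximum, that indexes are 0-based positions in a list of measured levels, and that the ends of the series act as boundaries when no detected pulse exists on one side of a peak (a case the definitions do not spell out).

```python
import math

def is_peak(series, i):
    """Strict local maximum at an interior index (one possible convention)."""
    return 0 < i < len(series) - 1 and series[i - 1] < series[i] > series[i + 1]

def height(series, i):
    """Amplitude minus the lowest value of the whole series."""
    return series[i] - min(series)

def magnitude(series, i, pulses):
    """Geometric mean of the rises above the minima on each side of peak i.

    pulses: sorted list of indexes of the currently detected pulses (vector P).
    """
    before = max((p for p in pulses if p < i), default=0)
    after = min((p for p in pulses if p > i), default=len(series) - 1)
    left_min = min(series[before:i + 1])
    right_min = min(series[i:after + 1])
    return math.sqrt((series[i] - left_min) * (series[i] - right_min))

def relative_magnitude(series, i, j):
    """Geometric mean of both peaks' rises above the minimum between them (i < j)."""
    valley = min(series[i:j + 1])
    return math.sqrt((series[i] - valley) * (series[j] - valley))

# Made-up levels (ng/ml) with peaks at indexes 1, 4 and 7.
series = [0.2, 1.0, 0.4, 0.3, 2.0, 0.5, 0.1, 1.5, 0.6]
print(height(series, 4))                  # height of the central peak
print(magnitude(series, 4, [1, 7]))       # magnitude given neighbors 1 and 7
print(relative_magnitude(series, 1, 7))   # relative magnitude of the neighbors
```

Using a geometric rather than arithmetic mean means that a peak rising far above the minimum on one side but barely above it on the other still gets a small magnitude.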

Pulse selection process.

  • Initialization: We first fill vector P by selecting time indexes corresponding to great height samples. Even if, at this stage, we intend to recover a large enough set of potential pulse indexes, the time intervals between pulses should be consistent with the maximal frequency. Accordingly, we introduce a parameter Tp, called the nominal period, defined as the smallest time duration in which, from one pulse occurrence, one expects the following one.

Starting from the time index $P_1$ of the maximal sample in [0, 2Tp], we locate the time index $m_1$ of the minimal sample in the window of duration Tp following $t_{P_1}$, and then we retrieve the time index $P_2$ of the maximal sample in the window of duration Tp following $t_{m_1}$ (see Figure S3). We iterate the process along the whole time series to obtain the initial guess of potential pulse indexes $P = (P_1, \dots, P_{s(P)})$.

This method is a trade-off between selecting all the peak indexes in the time-series (with the drawback that there will be too many of them if the time series is noisy) and selecting only local maxima corresponding to great amplitudes (with the drawback that there will be too few of them if the time series is smooth and the pulse frequency is low).
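A minimal sketch of this initialization is given below, assuming that the nominal period Tp spans kp samples (for instance kp = 4 for Tp = 40 min and a 10 min sampling period) and that each window contains the kp samples directly following its starting sample, as in Figure S3. The function name and the toy series are hypothetical; this is not the authors' Scilab implementation.

```python
def initial_pulse_indexes(lh, kp):
    """Forward search alternating a minimum and a maximum over windows of kp samples."""
    n = len(lh)
    # First candidate: maximal sample within the first two nominal periods.
    first_window = lh[:min(n, 2 * kp + 1)]
    p = [first_window.index(max(first_window))]
    while True:
        start = p[-1]
        # Minimal sample among the kp samples directly following the last pulse.
        window = range(start + 1, min(n, start + kp + 1))
        if not window:
            break
        m = min(window, key=lambda k: lh[k])
        # Maximal sample among the kp samples directly following that minimum.
        window = range(m + 1, min(n, m + kp + 1))
        if not window:
            break
        p.append(max(window, key=lambda k: lh[k]))
    return p


# Made-up series sampled every 10 min, with three clear pulses.
lh = [1, 7, 3, 2, 1, 2, 6, 4, 2, 1, 1.5, 5.5, 3, 2, 1]
initial_p = initial_pulse_indexes(lh, kp=4)
```

Because each retained index lies strictly after the preceding minimum, the indexes are produced in increasing order and the loop stops when a window runs past the end of the series.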

  • Remove too small peaks: Once vector P is initialized, some of the registered indexes may correspond to small sample values. As illustrated in the first section, the pulse amplitude is strongly altered by the sampling process and, consequently, it cannot be used as an infallible criterion to select the pulses. Hence, we use multi-scale criteria based on the notions of height and magnitude to determine which indexes correspond to too small peaks and to remove them from vector P.
      – Global relative criterion: We aim to remove peaks whose height is small in comparison with the height of all detected pulses. We define the “median height” as the median of the detected pulse heights. For each pulse, if the ratio between its own height and the median height is less than the relative magnitude threshold, it is removed from P.

We have chosen the median instead of the (arithmetic or geometric) mean since it is less sensitive to the presence of great height peaks.

      – Semi-local relative criterion: We compare the magnitude of each peak with the relative magnitude between the immediately preceding and following pulses. If the ratio between the two is less than the relative magnitude threshold introduced above, we remove the peak index from vector P.
        It is worth noticing that the geometric mean used to define the magnitudes provides robustness (compared to the arithmetic mean, for instance) to this criterion with respect to possible local variations of the baseline.
      – Global absolute criterion: We compare the magnitude of each pulse to the absolute magnitude threshold. If the magnitude is less than this threshold, the corresponding time index is removed from vector P.

This criterion prevents non-significant elevations of the baseline from being considered as potential pulses. Hence, the absolute magnitude threshold corresponds to the assay detection threshold.

At this level of the selection process, vector P only contains the time indexes corresponding to peaks with sufficiently great height and magnitude (with respect to the relative and absolute magnitude thresholds) to be classified as pulses. However, as the initial guess for P may have skipped some potential pulses, the next step of the selection process consists in retrieving the missed pulses.
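The three removal criteria can be sketched as follows. Several choices here are assumptions rather than details given in the text: the candidates are tested simultaneously within each stage, the semi-local criterion is applied only to pulses that have a neighbor on both sides, and the ends of the series stand in for the missing neighbor of the first and last pulses when a magnitude is needed. All names and values are hypothetical.

```python
import math
import statistics


def magnitude(lh, i, lo, hi):
    """Geometric mean of the drops from peak i to the minima over [lo, i] and [i, hi]."""
    return math.sqrt((lh[i] - min(lh[lo:i + 1])) * (lh[i] - min(lh[i:hi + 1])))


def relative_magnitude(lh, i, j):
    """Geometric mean of the drops from peaks i < j to the minimum between them."""
    m = min(lh[i:j + 1])
    return math.sqrt((lh[i] - m) * (lh[j] - m))


def remove_small_peaks(lh, p, rel_threshold, abs_threshold):
    base, last = min(lh), len(lh) - 1
    # Global relative criterion: height compared with the median height.
    median_height = statistics.median([lh[i] - base for i in p])
    p = [i for i in p if lh[i] - base >= rel_threshold * median_height]
    # Semi-local relative criterion (interior pulses only, an assumption).
    kept = []
    for n, i in enumerate(p):
        if 0 < n < len(p) - 1:
            reference = relative_magnitude(lh, p[n - 1], p[n + 1])
            if magnitude(lh, i, p[n - 1], p[n + 1]) < rel_threshold * reference:
                continue
        kept.append(i)
    p = kept
    # Global absolute criterion: magnitude compared with the absolute threshold.
    result = []
    for n, i in enumerate(p):
        lo = p[n - 1] if n > 0 else 0
        hi = p[n + 1] if n < len(p) - 1 else last
        if magnitude(lh, i, lo, hi) >= abs_threshold:
            result.append(i)
    return result


# Made-up series: the small bump at index 5 is rejected by the global criterion.
lh = [1.0, 6.0, 3.0, 1.5, 1.2, 1.5, 1.3, 1.2, 5.0, 2.5, 1.2, 4.5, 2.0, 1.0]
selected = remove_small_peaks(lh, [1, 5, 8, 11], rel_threshold=0.2, abs_threshold=0.1)
```

Using the median as the global reference means that a single very high pulse does not raise the bar for all the others, which is the motivation given above for preferring it to a mean.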

  • Retrieve missed pulses: Between each pair of successive pulses registered in P, we examine each peak. If this peak fulfills the semi-local relative criterium, the corresponding time index is added to vector P.

By construction, such a retrieved peak almost automatically fulfills the other two global criteria.

  • Shape-based criterion: The pulse duration has to be consistent with available knowledge on pulse half-life. For a fixed sampling rate, a detected pulse should extend over a minimum number of consecutive experimental data. Consequently, we intend to remove what we call “3-point peaks”, for which the immediately preceding and following samples are local minima of the time series (see top panel of Figure S2). However, due to possible noise in the time series, a genuine pulse may appear as a 3-point peak; in this case, the pulse is expected to be not “too sharp” (see bottom panel of Figure S2). Thus we remove from P the time indexes corresponding to peaks with a “sharpness coefficient” greater than the 3-point peak threshold.

The precise definition of the “sharpness coefficient” is detailed in Step 5 of the algorithm (see next subsection “Algorithmic pulse detection”). We only emphasize here that this criterion is based on the asymmetric shape of a pulse and allows one to get rid of the genuine 3-point peaks produced by occasional experimental errors.

Vector of InterPulse Interval (IPI) and IPI tunnel.

At a given step, we define the vector of InterPulse Intervals (IPI) from the current P vector by:

$$IPI_i = t_{P_{i+1}} - t_{P_i}, \qquad i = 1, \dots, s(P) - 1.$$

As we aim to apply the algorithm mainly to hormonal time series, we designed a process to take into account some degree of regularity in the rate of change in the pulse frequency. Hence, given a vector P of identified potential pulses, we introduce a cubic function of time, $f_P$, fitted to the IPI sequence $(IPI_i)$ in a least squares sense. Function $f_P$ gives an averaged, yet time-evolving representation of the IPI built from the global sequence $(IPI_i)$.

Then, we build a tunnel in the (time, IPI) plane delimited by the graphs of two piecewise linear functions of time built from function $f_P$. Precisely, for each pulse time $t_{P_i}$, we define $(1 - \delta^-) f_P(t_{P_i})$ and $(1 + \delta^+) f_P(t_{P_i})$ to draw the lower and upper boundaries of the tunnel. Thus, the lower and upper bound parameters $\delta^-$ and $\delta^+$ tune the width of the so-called “IPI tunnel”; they represent a quantification of the pulse frequency regularity. The tunnel allows one to assess the regularity of the IPI time variations and can help the user to (i) classify a specific pulse as a potential outlier with respect to the pulse frequency properties of the series, (ii) identify and localize a possible rupture in the secretion rhythm.
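The construction of the IPI series and of the tunnel can be sketched in pure Python as follows. The sketch rests on assumptions that are not spelled out in the text: each IPI is attached to the time of the pulse that closes it, the cubic is fitted by solving the normal equations on a rescaled time axis, and the tunnel edges are taken as (1 - lower) and (1 + upper) times the fitted value, a reading that is consistent with the local bounds of 30.5 and 123 min quoted in the Results for the default value 0.6. All function names are hypothetical.

```python
def ipi_series(pulse_times):
    """Intervals between successive pulse times."""
    return [b - a for a, b in zip(pulse_times, pulse_times[1:])]


def fit_cubic(xs, ys):
    """Least-squares cubic through (xs, ys); needs at least 4 distinct xs."""
    x0 = min(xs)
    span = (max(xs) - x0) or 1.0
    u = [(x - x0) / span for x in xs]  # rescale abscissas to [0, 1]
    # Normal equations a * coef = b for the basis 1, u, u^2, u^3.
    a = [[sum(v ** (r + c) for v in u) for c in range(4)] for r in range(4)]
    b = [sum(y * v ** r for v, y in zip(u, ys)) for r in range(4)]
    # Gaussian elimination with partial pivoting.
    for col in range(4):
        piv = max(range(col, 4), key=lambda r: abs(a[r][col]))
        a[col], a[piv] = a[piv], a[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, 4):
            factor = a[r][col] / a[col][col]
            for c in range(col, 4):
                a[r][c] -= factor * a[col][c]
            b[r] -= factor * b[col]
    # Back substitution.
    coef = [0.0] * 4
    for r in range(3, -1, -1):
        tail = sum(a[r][c] * coef[c] for c in range(r + 1, 4))
        coef[r] = (b[r] - tail) / a[r][r]
    return lambda x: sum(c * ((x - x0) / span) ** k for k, c in enumerate(coef))


def ipi_tunnel(pulse_times, lower=0.6, upper=0.6):
    """Return (time, IPI, lower edge, upper edge) for each IPI."""
    ipis = ipi_series(pulse_times)
    times = pulse_times[1:]  # each IPI is attached to the pulse that closes it
    f = fit_cubic(times, ipis)
    return [(t, ipi, (1 - lower) * f(t), (1 + upper) * f(t))
            for t, ipi in zip(times, ipis)]


def tunnel_outliers(pulse_times, lower=0.6, upper=0.6):
    """Times of the IPIs lying outside the tunnel."""
    return [t for t, ipi, lo, hi in ipi_tunnel(pulse_times, lower, upper)
            if not lo <= ipi <= hi]


# Made-up example: one pulse per hour, then the same series with one pulse missed.
regular = [60 * k for k in range(21)]
with_miss = [t for t in regular if t != 600]
```

With the regular series, every IPI equals the fitted value and no outlier is reported. With the pulse at 600 min removed, the IPI closing at 660 min doubles to 120 min, and `tunnel_outliers(with_miss)` is expected to report that time only.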

To illustrate the use of the IPI tunnel, let us consider a time series for which the sequence Q of pulse indexes is easy to find by sight. We assume that the time series displays regular pulsatility, i.e. the pulse frequency undergoes smooth variations along time. Under this condition, the fitted cubic function is a good approximation of the IPI sequence.

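Let us first consider the case of a lack of detection: let P be the vector formed by the pulse indexes in Q except one (the k-th). In the course of the automatic pulse detection, this case may happen if the maximum amplitude corresponding to this pulse is low. Then, writing $IPI^Q_i = t_{Q_{i+1}} - t_{Q_i}$ for the IPI sequence built from Q, the IPI sequence built from P is given by:

$$IPI^P_i = IPI^Q_i \ (i < k-1), \qquad IPI^P_{k-1} = IPI^Q_{k-1} + IPI^Q_k, \qquad IPI^P_i = IPI^Q_{i+1} \ (i \ge k). \tag{6}$$

Under the regularity assumption, each IPI should be close to (or even in) the range delimited by the values of its neighbors $IPI^Q_{i-1}$ and $IPI^Q_{i+1}$. On the contrary, in the case of vector P, the (k-1)-th IPI, i.e. $IPI^Q_{k-1} + IPI^Q_k$, is noticeably greater than the maximum of its neighbors. More precisely, its value is approximately twice the mean of its neighbors and, consequently, approximately twice the value of the fitted cubic function.

Now, let us consider the case of an over-detection: let P be the vector formed by the pulse indexes in Q plus an extra pulse occurring at time $t_k$ lying between the j-th and the (j+1)-th pulse times stored in vector Q. Hence, $t_{Q_j} < t_k < t_{Q_{j+1}}$. Then:

$$IPI^P_j = t_k - t_{Q_j}, \qquad IPI^P_{j+1} = t_{Q_{j+1}} - t_k, \tag{7}$$

and moreover:

$$IPI^P_j + IPI^P_{j+1} = IPI^Q_j.$$

Under the regularity assumption, either $IPI^P_j$ or $IPI^P_{j+1}$ is too small compared to the expected IPI range delimited by $IPI^Q_{j-1}$ and $IPI^Q_{j+1}$. In the least discriminating case, the extra pulse lies close to the middle of its neighbors, i.e. $t_k \approx (t_{Q_j} + t_{Q_{j+1}})/2$. Then, $IPI^P_j$ and $IPI^P_{j+1}$ are almost equal to half of the expected IPI, $IPI^Q_j$. Hence, even if several combinations of $IPI^P_j$ and $IPI^P_{j+1}$ values exist, depending on the position of time $t_k$ compared to $t_{Q_j}$ and $t_{Q_{j+1}}$, one of the IPIs is always at most half of $IPI^Q_j$, hence at most about half of the value of the fitted cubic function, which indicates a potential over-detection.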

An appropriate choice of the values of the lower and upper bounds of the tunnel allows one to discriminate the pulses that break the frequency regularity and indicate a rupture in the secretion rhythm.
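The two situations discussed above can be checked on a small example with made-up pulse times (one pulse per hour). The names are hypothetical and the reference IPI is simply the constant 60 min of the regular series rather than a fitted cubic.

```python
def ipi_series(pulse_times):
    """Intervals between successive pulse times."""
    return [b - a for a, b in zip(pulse_times, pulse_times[1:])]


q = [0, 60, 120, 180, 240, 300]        # reference pulse times (min), made up
missed = [t for t in q if t != 180]    # lack of detection: one pulse dropped
extra = sorted(q + [200])              # over-detection: one spurious pulse added

ipi_q = ipi_series(q)
ipi_missed = ipi_series(missed)        # one IPI is the sum of two reference IPIs
ipi_extra = ipi_series(extra)          # one reference IPI is split in two parts
```

In the first case one IPI doubles (120 min instead of 60 min); in the second case the 60 min interval is split into 20 and 40 min, so that at least one of the two parts is at most half of the expected IPI.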

The different steps of the algorithm are described as pseudo-code in the “Algorithm Description” of the Appendix S1. We have implemented the algorithm in the Scilab environment (http://www.scilab.org/) dedicated to numerical computing.

Experimental Data Provided to the Algorithm

We have run the algorithm on either experimental or synthetic LH time series. Experimental time series were collected from nineteen ewes, distributed over two different protocols. All procedures were approved by the “Direction Départementale des Services Vétérinaires d’Indre-et-Loire” (approval number C37-175-2) for the agricultural and scientific research agencies INRA (French National Institute for Agricultural Research) and CNRS (French National Center for Scientific Research), and conducted in accordance with the Guide for the Care and Use of Agricultural Animals in Research and Teaching. Blood samples from a first group of ten estrus-synchronized ewes (Lacaune breed, [5]) were collected via jugular venous cannula every 10 min for a period of 24 h during the follicular phase. Blood samples from a second group of nine ovariectomized Ile-de-France ewes were collected during the anestrus season every 10 min over a period of 15 h. These ewes received an agonist of somatostatin type 2 receptor via intracerebroventricular injection between 5 and 10 h after sampling start (Courtesy of A. Caraty, unpublished data). All blood samples were collected into heparinized tubes and then centrifuged for 20 min at 400 g. Plasma was stored at −20 °C until hormone assays [5].

Results

The output of our algorithm consists of the IPI series, providing the number of detected peaks with respect to the time series indexes. Moreover, the IPI tunnel has been used on the LH time series according to the assumption of regularity in their frequency modulation. The sampling period Ts and the absolute magnitude threshold, corresponding to the minimal detectable concentration, are provided by the protocol specifications. The default set, proposed in Table 1, has been used in all the cases: nominal period Tp equal to 40 min, relative magnitude threshold equal to 0.2 (20% of the geometric mean of the neighbors), and both the lower and upper bounds of the IPI tunnel equal to 0.6. The choice of the default parameter set is explained in the section “User Parameters: Choice of a Default Set and Robustness Evaluation”.

Table 1. Algorithm parameters and default set adapted to LH time series.

https://doi.org/10.1371/journal.pone.0039001.t001

The InterPulse Interval (IPI) Series

Figures 7 and 8 correspond respectively to LH synthetic and experimental series. The left panels display the LH plasma level time series. Vertical lines correspond to the pulse occurrences. Stars on the time series correspond to the points of sampled measures. The right panels display the resulting IPI series, indexed by the number of the pulse occurrence (each IPI is represented by a black diamond).

Figure 7. IPI series from synthetic LH time series with different sampling frequencies.

Left panels: LH plasma level time series retrieved over 1000 min. Vertical lines correspond to the pulse occurrences. Panel A: theoretical plasma level, corresponding to a continuous monitoring. The pulse frequency increases, whereas the pulse amplitude decreases along time. Panels B, C and D: sampled series, with a respective sampling period of 1, 5 or 10 min. Stars on the time series correspond to sampled points. The first sampling time occurs at the first minute of the simulation (r = 1 min), variability in the sampling times is set to 15% of the sampling period (f = 0.15 min for B, f = 0.75 min for C and f = 1.5 min for D) and the assay variability is set to b = 5%. Right panels: resulting IPI series, indexed by the number of the pulse occurrence (each IPI is represented by a black diamond). The theoretical IPI series is the continuous green curve, superimposed on the IPI series obtained after sampling. In any case, there are 16 detected peaks.

https://doi.org/10.1371/journal.pone.0039001.g007

Figure 8. IPI series from experimental LH series with different pulsatile rhythms.

Left panels: LH plasma level time series with a 10 min. sampling period. Vertical lines correspond to the pulse occurrences. Stars on the time series correspond to sampled points. Right panels: resulting IPI series (black diamond), indexed by the number of the pulse occurrence. Panel A: stable rhythm with final acceleration. Panel B: progressive acceleration. Panel C: fast deceleration in the second half of the series.

https://doi.org/10.1371/journal.pone.0039001.g008

On the synthetic LH series (Figure 7), the left panels display the following cases. Panel A represents a theoretical plasma level series that would be retrieved in case of continuous monitoring. The pulse frequency increases whereas the pulse amplitude decreases along time. Panels B, C and D are the corresponding sampled series with a respective sampling period of 1, 5 or 10 min. The first sampling time occurs at the first minute of the simulation, i.e. r = 1 min, and there is variability both in the sampling times and the assays: f is equal to 15% of the sampling period, corresponding to 0.15 min for B, 0.75 min for C and 1.5 min for D, and b = 5% in the three cases. On the right panels, the theoretical IPI series are represented by a continuous green curve, superimposed on the IPI series of measured LH series (black diamonds). Comparisons between the successive panels allow us to assess the influence of the sampling period on the IPI series. The number of detected peaks (16) is the same, and the patterns of regular acceleration are identical, whatever the sampling period is. Moreover, the discretization of the initially continuous signal induces delays in the time occurrence of pulses; the maximal delay corresponds to the sampling period. Consequently, the shorter the sampling period is, the closer the measured IPI series are to the theoretical ones.

On the experimental LH series (Figure 8), three different pulsatile rhythms are displayed, with a 10 min sampling period. Panel A illustrates the case of a stable rhythm with a final acceleration resulting in IPI shortening in the last third of the series. Panel B illustrates the case of a progressive acceleration resulting in a progressive shortening of the IPIs. Panel C illustrates the case of a fast deceleration in the second half of the series resulting in increased IPIs. As the IPI series give information both on the number of detected peaks and the rhythm evolution, they are particularly useful for comparing the rhythmicity of different series of the same duration (for instance, B is almost twice as fast as A).
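The effect of the sampling period on the measured IPIs can be illustrated with a deliberately simplified sketch. It assumes that each pulse is detected at the first sampling instant at or after its theoretical occurrence (a crude stand-in for a pulse with a fast rise and a slow decay), ignores noise and jitter, and uses made-up pulse times; the function name is hypothetical.

```python
import math


def sampled_pulse_times(true_times, ts, first_sample=0.0):
    """Map each theoretical pulse time to the first sampling instant at or after it."""
    return [first_sample + ts * math.ceil((t - first_sample) / ts) for t in true_times]


def ipi_series(pulse_times):
    """Intervals between successive pulse times."""
    return [b - a for a, b in zip(pulse_times, pulse_times[1:])]


true_times = [13, 91, 166, 238, 307, 373]   # made-up theoretical pulse times (min)
true_ipis = ipi_series(true_times)          # 78, 75, 72, 69, 66

worst_error = {}
for ts in (1, 5, 10):
    measured = ipi_series(sampled_pulse_times(true_times, ts))
    # Largest gap between a measured IPI and the theoretical one.
    worst_error[ts] = max(abs(m - t) for m, t in zip(measured, true_ipis))
```

Under these assumptions every measured IPI is a multiple of the sampling period, each pulse is delayed by less than one sampling period, and the largest IPI error grows with the sampling period.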

The IPI Tunnel

Due to the assumption of regularity in the frequency modulation of the LH time series, the IPI tunnel has been used to point out situations where there may be a lack of detection or an over-detection of pulses in the time series. In such situations, we can try to explain the detection error and propose possible corrections. Conversely, the IPI tunnel can detect genuine long or short IPIs, and be used as a tool for analyzing sudden frequency breaks or accelerations in pulsatile rhythms. In all examples the sampling period was equal to 10 min.

Figures 9 and 10 display respectively apparent lacks of detection or over-detections in experimental time series. The top panels display the LH plasma time series and vertical lines correspond to pulse occurrences. The bottom panels display the resulting IPI series, indexed by the pulse time rather than the pulse number in order to keep the same reference time in both the LH and IPI series. Each IPI value (marked by a blue point) corresponds to the time elapsed between the current detected pulse and the previous one. The IPI series are displayed together with the three curves delimiting the tunnel: the dashed line represents the moving cubic function fitting the values of the IPI series, and the solid lines represent the lower and upper bounds of the tunnel.

Figure 9. IPI outliers lying above the upper bound of the tunnel. Example of correction by decreasing the value of the relative magnitude threshold.

Top panels: two experimental LH plasma time series, A (panels A1 and A2) and B (panel B). Vertical lines correspond to pulse occurrences. Bottom panels: resulting IPI series indexed by time. Each IPI value (blue point) corresponds to the time elapsed between the current detected pulse and the previous one. Dashed line: moving cubic function fitting the values of the IPI series. Solid lines: lower and upper edges of the tunnel (both bound parameters equal to 0.6). Black arrows: occurrences of the outliers. Case A: outlier due to a lack of detection (missed pulse designated by a red arrow); panel A1: initial IPI series with the relative magnitude threshold equal to 0.2 (default value); panel A2: corrected IPI series with the relative magnitude threshold equal to 0.1. Case B: genuine long IPI.

https://doi.org/10.1371/journal.pone.0039001.g009

Figure 10. IPI outliers lying below the lower bound of the tunnel. Example of correction by increasing the value of the relative magnitude threshold.

Top panels: two experimental LH plasma time series, A (panel A) and B (panels B1 and B2). Bottom panels: resulting IPI series indexed by time. The vertical lines, the blue points, the dashed line and the solid lines represent the same objects as in Figure 9. Panels A and B1: initial IPI time series. Solid black arrows: occurrence of clear outliers. Dashed black arrows: occurrence of IPIs that can be associated with over-detected peaks in the LH series although they remain above the lower bound of the tunnel. Red arrows: peaks lying in the middle of the descending phase of the preceding pulse. Green arrow: peak lying in the middle of the ascending phase of the following pulse. Panel B2: corrected IPI series of case B after increasing the relative magnitude threshold to 0.45. Two of the three false peaks have been discarded.

https://doi.org/10.1371/journal.pone.0039001.g010

Figure 9 displays two cases of apparent lack of detection, where the IPI outliers lie above the upper bound. In case A, the outlier appears at minute 400 (black arrow, panel A1). Going back and forth between the IPI series and the LH time series allows us to favor the hypothesis of a lack of detection. Indeed, if we take into account the small amplitude pulse occurring at minute 340 (red arrow, panel A1), the exceedingly large IPI can be distributed over two consecutive IPIs of 100 and 60 min, whose durations are compatible with the local tunnel size (local upper bound of 123 min, local lower bound of 30.5 min), hence with the regularity assumption. A first correction step consists in decreasing the relative magnitude threshold, set to the default value of 0.2, in such a way that the missed pulse can be recovered without adding false detections. Panel A2 illustrates the result of the correction: the missed pulse occurring at minute 340 was recovered (red arrow) after decreasing the relative magnitude threshold to 0.1. In case B, the outlier appears at minute 440 (black arrow, panel B). Going back and forth between the IPI series and the LH time series allows us to favor the hypothesis of a genuine long IPI. Not only is there no visible pulse after the preceding detected pulse, but the rhythm also remains slow after the long IPI.
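The numbers quoted for case A can be checked with a few lines of arithmetic. The sketch assumes, as a reading that is not stated explicitly in this section, that the tunnel edges are obtained by multiplying the fitted cubic value by (1 - 0.6) and (1 + 0.6) with the default bounds; the variable names are hypothetical.

```python
upper_edge, lower_edge = 123.0, 30.5   # local tunnel bounds quoted in the text (min)
lower_param = upper_param = 0.6        # default tunnel bounds

# Under the assumed form of the edges, both bounds should point to
# roughly the same central value of the fitted cubic function.
center_from_upper = upper_edge / (1 + upper_param)   # close to 77 min
center_from_lower = lower_edge / (1 - lower_param)   # close to 76 min


def in_tunnel(ipi):
    """True if the IPI lies between the quoted local bounds."""
    return lower_edge <= ipi <= upper_edge


initial_ipi = 100 + 60   # IPI before the missed pulse at minute 340 is recovered
```

The two estimates of the central value agree to within about one minute, the initial 160 min IPI lies above the upper edge, and the two IPIs of 100 and 60 min obtained after recovering the missed pulse both fall inside the quoted bounds.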

Figure 10 displays some cases of over-detection, where the IPI outliers lie below the lower bound, in two LH series A and B. The outlier occurrences are indicated by solid black arrows. It is worth noticing that the two IPIs indicated by dashed arrows in case B (panel B1) are not classified as outliers with the default set of parameter values but they are close enough to the tunnel lower bound to draw the user’s attention. In this example, we considered them as IPI outliers. In both cases, the patterns of some detected peaks suggest false detections possibly imputable to measurement conditions. The IPI outliers in cases A (panel A) and B (panel B1) are either due to additional peaks lying in the middle of the descending phase of the preceding pulse (red arrows) or to a peak lying in the middle of the ascending phase of the following pulse (green arrow). A first correction step consists in increasing the relative magnitude threshold in such a way that the peak can be discarded without eliminating true detections. Increasing the threshold to 0.45 allows us to discard two false peaks in case B (panel B2). Nevertheless, no change in the threshold can get rid of the IPI outlier in case A or of the third IPI outlier in case B without discarding genuine pulses at the same time. It appears that the amplitude of the peaks underlying these IPI outliers is too close to that of their neighbors. It will be up to the user to evaluate the influence of such IPIs on the characteristics of the series and to follow the most appropriate strategy.

Figure 11 shows how the IPI tunnel can be used for detecting sudden frequency breaks. It displays four different LH time series (left panels) retrieved from ewes subject to an experimental protocol inducing a steep decrease in the pulse frequency. IPI series (right panels) are indexed by the pulse time in order to keep the same reference time when studying variability between series. The break in the dynamics of the IPI series corresponds to the occurrence of the last IPI preceding the outlier lying above the upper bound. Moreover, identifying its precise location enables us to study the synchronization between the different time series, as pointed out by the vertical dashed line.

Figure 11. IPI-based study of the synchronization between LH series.

The four LH series are retrieved from ewes subject to an experimental protocol inducing a steep decrease in the pulse frequency. Left panels: experimental LH plasma time series; vertical lines correspond to pulse occurrences. Right panels: resulting IPI series indexed by time. The vertical lines, the dashed line and the solid lines represent the same objects as in Figure 9. Vertical dashed line: break in the dynamics of the IPI series corresponding to the last IPI preceding the outlier.

https://doi.org/10.1371/journal.pone.0039001.g011

User Parameters: Choice of a Default Set and Robustness Evaluation

Among the algorithm parameters (Table 1), two are provided directly by the protocol specifications: the sampling period Ts and the absolute magnitude threshold, corresponding to the assay detection threshold (minimal LH level that can be reliably measured). For Ts, an upper bound equal to 10 min is recommended (see the explanation below). Default values have been fixed for the other five parameters of Table 1: the nominal period Tp, the relative magnitude threshold, the 3-point peak threshold and the lower and upper bounds of the IPI tunnel. The choice was based on the observation of LH time series retrieved in 19 ewes distributed over two different protocols.

Nominal period Tp.

The value of the nominal period is chosen so as to favor the maximal number of correct detections, especially at the beginning of time series. Given the existence of high frequency series with short IPIs, there is a risk that two consecutive pulses fall in the same window if it is too large. The value of parameter Tp has been set to 40 min, which is a value close to the minimal observed period. The number of missed pulses is equal to 0 for Tp ranging from 40 to 70 min; it increases up to 2 for Tp = 80 min. Even in the latter case, the number of missed pulses is small enough to guarantee the algorithm robustness with respect to this parameter.

Relative magnitude threshold.

Compared to the performance of the algorithm with the default value (0.2), a decrease down to 0.1 or an increase up to 0.3 increases the total number of outliers (either false detections or missed detections). Moreover, this parameter can be adapted in order to correct outliers lying outside the tunnel, as previously seen.

3-point peak threshold.

The 3-point peak threshold is introduced to prevent non-asymmetric pulses from being detected. Figure S2 illustrates the identification of a 3-point peak pattern. The sharpness coefficient, which is compared to this threshold, is the ratio between the arithmetic mean of the amplitude of the neighboring points of rank 2 (green lines) and the geometric mean of the amplitude of the immediate neighbors (red lines). For instance, based on that criterion, the peak selected on panel A (dashed vertical line) is identified as a genuine 3-point peak. On the contrary, the peak selected on Figure S2, panel B (solid vertical line) is not identified as a 3-point peak, since it belongs to a genuine, asymmetric LH pulse with an exponential decrease, albeit locally noised (the LH level corresponding to the latter neighbor of rank 2 is a little higher than expected for a smooth exponential decrease). Among 15 analyzed potential 3-point peaks, the minimal value corresponding to a genuine 3-point peak was equal to 0.15, while the maximal value corresponding to a locally noised, genuine asymmetric LH pulse was equal to 0.07, so that there was no overlap. Consistently, we chose a default value (0.1) lying within the [0.07, 0.15] range. This parameter is embedded within the algorithm, since there is no reason for the user to modify it. Indeed, the 3-point peaks rely on endocrinological considerations and take into account, either directly or indirectly, the typical duration of an LH pulse (around 30 min) and its asymmetric shape, which can be reconstructed from time series sampled at least every 10 min.

Lower and upper bounds of the IPI tunnel.

For the lower bound, the 0.6 default value does not lead to any IPI outlier, whereas a value of 0.5 leads to classifying two genuine pulses as over-detected pulses, and a value of 0.4 leads to classifying twelve genuine pulses as over-detected pulses (among more than 300 pulses). For the upper bound, the 0.6 default value allows one to identify every genuine outlier without generating false outliers, whereas changing the value to 0.5 leads to additional false under-detected pulses. However, the use of the tunnel bounds is mainly to draw the user’s attention to possible events of interest, and there is no direct sensitivity of the algorithm to their precise values.

Discussion

For hormonal time series such as those studied in this article, an important particularity is the fact that the signals are clearly subsampled due to the invasive nature of the sample collection procedure. In this case the classical filtering methods cannot be successful. Specific methods have to be developed to overcome this difficulty. There are two main approaches to study time series of pulsatile hormones. One consists in trying to detect, as accurately as possible, the pulse peaks, considered as discrete events [11], while the other is based on deconvolution principles and intends to reconstruct the underlying secretion process [12]. The deconvolution approach might seem more attractive, since it is likely to provide rich information on the hormonal signal, but it is hampered by the lack of validation, since information on the “true” signal is almost never available and cannot be directly compared to the reconstructed signal. Our own algorithm clearly belongs to the category of discrete peak detectors. Whereas, to our knowledge, the other available algorithms rely only on local and semi-local amplitude criteria, our algorithm combines local (on the data point level), semi-local (on the level of -possibly moving- windows of consecutive data points), and global (on the whole series level) amplitude criteria, with other criteria accounting for the pulse duration and the relative regularity in the pulse frequency modulation. Hence, this is a multi-scale and multi-criteria algorithm based on a dynamical selection process of the peaks.

To design our algorithm, the first idea was to locate significant local maxima in the processed time series, guided by the nominal IPI value so that the detected pulses have a reasonable rhythm. As the nominal IPI value may be too large or too small for each actual IPI in the processed signal, more steps were added to retrieve missed pulses and to remove false pulses. These extra steps are mainly based on the height of each pulse candidate and its magnitude with respect to the relative magnitude between neighbor pulses. The main advantage of these criteria is their weak dependence on the baseline that is most of the time non-stationary in typical hormonal time series.

An original feature of our work is to combine mathematical modeling with signal processing. We have used synthetic time series, generated by a simple dynamical model, to illustrate the fundamental concepts underlying our algorithmic choices, as well as to assess the robustness of the model outputs with respect to the sampling rate and different sources of variability. Introducing uncertainty on both the measured LH level (to mimic assay variability) and time of measurement (to mimic possible hidden variability in the sampling chronology) allowed us to check that the ability to detect the right number of events was not affected by the noise.

For a given time series, the output of the algorithm consists of a corresponding series of detected IPIs, structurally expressed as multiples of the sampling period. Hence, the algorithm provides information on the evolution of the frequency regime along the series, which is essential for studying the control of frequency encoding in endocrine systems.

We have run the algorithm on two different sets of experimental time series collected in sheep. Since they have a large body size and a much longer ovarian cycle compared to rodents, domestic species such as the ovine species are more suited to longitudinal endocrine studies and their reproductive physiology is much closer to human reproductive physiology. We gave several instances showing that the algorithm is able to adapt to different patterns of frequency modulation (more or less rapid acceleration or deceleration) and also to detect breaks in the IPI rhythm. We then explained how one can make use of the IPI tunnel to discriminate outlier pulses from genuine pulses corresponding to a locally marked change in the frequency regime. On the whole, these results have shown that the algorithm can be employed to study and understand the frequency encoding of hormonal signals. To put the algorithm at the disposal of users not familiar with computer programs, we are developing a user-friendly interface to make our software easily available and ready for use, cf. https://www.rocq.inria.fr/sisyphe/paloma/dynpeak.html. The aim of this tool is to provide as much decision support as possible to the users while guaranteeing full understanding of the detection process and of the effect of the parameter values on the output.

In addition to the time series, the sampling period Ts and the assay detection threshold provided directly by the protocol specifications, there are only 5 parameters to be set by the user: the nominal period Tp, the relative magnitude threshold and the 3-point peak threshold for the pulse detection itself, and the lower and upper bounds for the definition of the IPI tunnel edges. A default set of parameter values is proposed in the case of LH time series (Table 1). It was refined by running the algorithm on LH time series characterized by a pulsatile pattern with an asymmetric shape of pulses and some regularity in the time evolution of the pulse frequency. LH can be considered as the paragon of any hormone whose secretion pattern is pulsatile, so that the algorithm would also be suited for other hormones (e.g. insulin or growth hormone). As for LH, one has to go through all the steps of the algorithm, including the removal of 3-point peaks (needed parameter: the 3-point peak threshold) and the tunnel-based identification of outliers (needed parameters: the lower and upper bounds of the IPI tunnel) for GH and insulin series analysis. In the case of time series for which there is no underlying assumption of asymmetric shape of the peaks or frequency regularity, such as intracellular calcium series, one only needs to go through the first to the fourth steps of the algorithm.

On a more theoretical ground, an interesting question may be addressed in relation to the discretization of a continuous signal. A time series results from a sampling process applied to a continuous signal, which implies that we have chosen (by default) to retrieve the sampling time corresponding to a local maximum to define the time of pulse occurrence. Thus, each IPI is a multiple of the sampling period. As illustrated in the first section, the corresponding theoretical pulse time differs from the pulse occurrence. A deeper analysis of the effect of the sampling on the pulse shape could be undertaken. This problem is hard to tackle since it mixes non linear dynamics, stochastic process and statistical inference. However, results on this subject would give precious additional knowledge on the location of the theoretical pulse time and could provide more accurate information on the IPI sequence and the frequency encoding.

Supporting Information

Figure S1.

Definition of the magnitude of a peak. Let P be a given vector of potential pulses. Considering the peak occurring at , we assume that P contains the pulses just before and after occurring at and . We define U (resp. V) as the difference between the peak amplitude and the minimum value (resp. ) of the time series between and (resp. and ): (resp. ). The magnitude of the peak occurring at is the geometric mean of U (1.4 ng/ml) and V (1.8 ng/ml). Here, the peak magnitude is equal to 1.587 ng/ml.

https://doi.org/10.1371/journal.pone.0039001.s001

(TIF)
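The peak magnitude defined in the legend of Figure S1 can be sketched in a few lines of Python. This is a hedged illustration rather than the authors' implementation: the function name `peak_magnitude`, the index-based interface, the inclusion of the interval endpoints in the minimum search and the sample values are assumptions. The sample values are chosen so that U = 1.4 ng/ml and V = 1.8 ng/ml, as in the legend.

```python
import math

def peak_magnitude(series, prev_idx, peak_idx, next_idx):
    """Geometric mean of the peak heights above the left and right minima."""
    # Minimum of the series between the previous pulse and the peak
    left_min = min(series[prev_idx:peak_idx + 1])
    # Minimum of the series between the peak and the next pulse
    right_min = min(series[peak_idx:next_idx + 1])
    u = series[peak_idx] - left_min
    v = series[peak_idx] - right_min
    return math.sqrt(u * v)

# Illustrative LH samples (ng/ml); pulses assumed at indices 0, 4 and 8
lh = [2.5, 1.9, 1.6, 2.2, 3.0, 2.4, 1.7, 1.2, 2.8]
magnitude = peak_magnitude(lh, prev_idx=0, peak_idx=4, next_idx=8)
print(round(magnitude, 3))
```

Using the geometric rather than the arithmetic mean penalizes peaks that stand out on one side only, since a small U or V pulls the magnitude down.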

Figure S2.

Identification of a 3-point peak pattern. Parameter corresponds to the ratio between the arithmetic mean of the amplitude of the neighboring points of rank 2 (green lines) and the geometric mean of the amplitude of the immediate neighbors (red lines). Panel A: the selected pulse (dashed vertical line) is identified as a genuine 3-point peak. Panel B: the selected pulse (solid vertical line) is not identified as a 3-point peak, since it belongs to a genuine, asymmetric LH pulse with an exponential decrease, albeit locally noised.

https://doi.org/10.1371/journal.pone.0039001.s002

(TIF)

Figure S3.

One iteration of the forward search for pulses. For a given value of i in the iterative process (initialized with i = 1), the algorithm searches for the index of the minimal sample from the sample in the window defined by the nominal period Tp, i.e. among the kp samples (under the green segment) directly following the sample. Then, the algorithm searches for the index of the maximal sample among the kp samples (under the blue segment) directly following the sample. Index is stored in vector P and the process is iterated with i incremented by 1 until the end of the time series.

https://doi.org/10.1371/journal.pone.0039001.s003

(TIF)
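The iteration described in the legend of Figure S3 can be sketched as follows. This is only an illustration under stated assumptions: the function name `forward_pulse_search`, the way the first pulse index `start` is provided, the handling of windows truncated by the end of the series and the sample values are not taken from the article.

```python
def forward_pulse_search(series, start, kp):
    """Alternate minimum and maximum searches over windows of kp samples."""
    n = len(series)
    pulses = [start]      # vector P of candidate pulse indices
    current = start
    while current + 1 < n:
        # Minimal sample among the kp samples following the current pulse
        window = range(current + 1, min(current + kp, n - 1) + 1)
        trough = min(window, key=lambda k: series[k])
        if trough + 1 >= n:
            break         # assumption: stop when no sample follows the minimum
        # Maximal sample among the kp samples following that minimum
        window = range(trough + 1, min(trough + kp, n - 1) + 1)
        peak = max(window, key=lambda k: series[k])
        pulses.append(peak)
        current = peak
    return pulses

# Illustrative series with pulses at indices 0, 4 and 8
samples = [5, 3, 2, 1, 6, 4, 2, 1, 7, 5, 3]
candidates = forward_pulse_search(samples, start=0, kp=4)
print(candidates)
```

The indices returned here are only candidate pulses. In the algorithm they are further processed by the subsequent steps, such as the removal of 3-point peaks and the tunnel-based identification of outliers.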

Figure S4.

Outputs of the algorithm applied to synthetic LH time series obtained with uniformly distributed or normally distributed assay errors. Top panel: theoretical continuously measured LH blood level (green curve) obtained with a spike amplitude function decreasing linearly from 15 to 6.5 ng/ml and an interspike interval function decreasing linearly from 80 to 50 min. The pulse times are highlighted by vertical green bars. The other panels represent the outputs of the algorithm for 4 time series obtained with either uniformly (b = 32 or 36%) or normally (SD = 32 or 36%) distributed assay errors and a sampling period Ts = 10 min. In each case, the upper panel represents the time series (blue circles) with the detected pulse occurrences (vertical blue bars) and the lower panel displays the detected IPI series (blue diamonds) together with the IPI tunnel (delimited by the cyan and red lines). In both cases where b and SD are equal to 32%, the algorithm has detected the pulses of the time series accurately. In the case of a uniform distribution with amplitude b = 38%, two over-detections occurred around 180 min and 770 min. Both led to IPI outliers. In the case of a normal distribution with SD = 38%, two over-detections occurred around 190 min and 270 min and a lack of detection occurred around 500 min. Both the first over-detection and the lack of detection implied IPI outliers, yet the IPI sequence remains in the tunnel, even if it is close to the lower edge, near 270 min.

https://doi.org/10.1371/journal.pone.0039001.s004

(TIF)

Table S1.

Accuracy of the algorithm outputs for synthetic LH time series according to the features of the assay error distribution. The same set of 6 different values for the amplitude b of the uniform distribution (UD) of the assay error and for the standard deviation SD of the normal distribution (ND) of the assay error has been used to extract time series with a 10-minute sampling period. In each case (column), 10 time series have been generated from the same theoretical LH signal with decreasing pulse amplitude and interpulse interval (see top panel of Figure S4). The table displays the corresponding numbers of time series for which the detection algorithm (i) detected all the pulses accurately, (ii) missed a pulse or produced an over-detection leading to an IPI outlier or (iii) missed a pulse or produced an over-detection without an associated IPI outlier.

https://doi.org/10.1371/journal.pone.0039001.s005

(PDF)

Acknowledgments

The authors wish to thank Dr. Alain Caraty for providing them with additional LH time series and Serge Steer for useful discussions.

This work is part of the large-scale initiative REGATE (REgulation of the GonAdoTropE axis):

http://www.rocq.inria.fr/sisyphe/reglo/regate.html.

Author Contributions

Conceived and designed the experiments: AV FC. Performed the experiments: AV CM QZ. Analyzed the data: AV CM FC. Wrote the paper: AV CM FC QZ SF. Designed the model: AV. Designed and implemented the algorithm: AV QZ. Provided the experimental data: SF.

References

  1. Sollenberger M, Carlsen E, Johnson M, Veldhuis J, Evans W (1990) Specific Physiological Regulation of Luteinizing Hormone Secretory Events Throughout the Human Menstrual Cycle: New Insights into the Pulsatile Mode of Gonadotropin Release. J Neuroendocrinol 2 (6): 845–852.
  2. Moenter S, Caraty A, Karsch F (1990) The estradiol-induced surge of gonadotropin-releasing hormone in the ewe. Endocrinology 127: 1375–1384.
  3. Moenter S, Brand R, Karsch F (1992) Dynamics of gonadotropin-releasing hormone (GnRH) secretion during the GnRH surge: insights into the mechanism of GnRH surge induction. Endocrinology 130: 2978–2984.
  4. Clarke I, Moore L, Veldhuis J (2002) Intensive direct cavernous sinus sampling identifies high-frequency, nearly random patterns of FSH secretion in ovariectomized ewes: combined appraisal by RIA and bioassay. Endocrinology 143 (1): 117–129.
  5. Drouilhet L, Taragnat C, Fontaine J, Duittoz A, Mulsant P, et al. (2010) Endocrine Characterization of the Reproductive Axis in Highly Prolific Lacaune Sheep Homozygous for the FecLL Mutation. Biol Reprod 82 (5): 815–824.
  6. Baird D, Swanston I, McNeilly A (1981) Relationship between LH, FSH, and prolactin concentration and the secretion of androgens and estrogens by the preovulatory follicle in the ewe. Biol Reprod 24: 1013–1025.
  7. Baird D, McNeilly A (1981) Gonadotrophic control of follicular development and function during the oestrous cycle of the ewe. J Reprod Fertil. pp. 119–133.
  8. Vidal A, Médigue C, Malpaux B, Clément F (2009) Endogenous circannual rhythm in LH secretion: insight from signal analysis coupled to mathematical modelling. Phil Trans Roy Soc A 367: 4759–4777.
  9. Matsumoto M, Nishimura T (1998) Mersenne twister: A 623-dimensionally equidistributed uniform pseudorandom number generator. ACM Trans Model Comp Simul 8 (1): 3–30.
  10. Keenan D, Sun W, Veldhuis J (2000) A stochastic biomathematical model of the male reproductive hormone system. SIAM J Appl Math 61 (3): 934–965.
  11. Urban R, Evans W, Rogol A, Kaiser D, Johnson M, et al. (1988) Contemporary aspects of discrete peak-detection algorithms I. The paradigm of the luteinizing hormone pulse signal in men. Endocr Rev 9: 3–37.
  12. Veldhuis J, Johnson M (1990) A review and appraisal of deconvolution methods to evaluate in vivo neuroendocrine secretory events. J Neuroendocrinol 2: 755–771.