Abstract
Time-frequency analysis is ubiquitous in many fields of science. Due to the Heisenberg-Gabor uncertainty principle, a single measurement cannot precisely estimate the localization of a finite signal in both time and frequency. Classical spectral estimators, like the short-time Fourier transform (STFT) or the continuous-wavelet transform (CWT), optimize either temporal or frequency resolution, or find a tradeoff that is suboptimal in both dimensions. Following the concept of optical super-resolution, we introduce a new spectral estimation method that enables time-frequency super-resolution. Sets of wavelets with increasing bandwidth are combined geometrically in a superlet to maintain the good temporal resolution of wavelets and gain frequency resolution in the high frequency range. We show that superlets outperform the STFT and CWT on synthetic data and brain signals recorded in humans and rodents. Superlets are able to resolve temporal and frequency details with unprecedented precision, revealing transient oscillation events otherwise hidden in averaged time-frequency analyses.
Introduction
Time-series describing natural phenomena, such as sounds, earth movement, or brain activity, often express oscillation “packets” at various frequencies and with finite duration. In brain signals, these packets span a wide range of frequencies (e.g., 0.1-600 Hz) and temporal extents ($10^{-2}$ to $10^{2}$ s; ref. 1). Identifying the frequency, temporal location, duration, and magnitude of finite oscillation packets with high precision is a significant challenge.
Time-frequency analysis of digitized signals is traditionally performed using the short-time Fourier transform (STFT)2, which computes Fourier spectra on successive sliding windows. Long windows provide good frequency resolution but poor temporal resolution, while short windows increase temporal resolution at the expense of frequency resolution. This is known as the Heisenberg-Gabor uncertainty principle3 or the Gabor limit4, i.e. one cannot precisely localize a signal in both time and frequency simultaneously. Importantly, this limit applies to a single measurement. Frequency resolution improves in proportion to window size: the smallest resolvable frequency difference, the Rayleigh frequency5,6, is the inverse of the window duration. Therefore, shortening the window to gain temporal resolution leads to a degradation of frequency resolution (Fig. 1a, left).
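As a quick numerical illustration of this tradeoff, the sketch below assumes the usual definition of the Rayleigh frequency as the reciprocal of the window duration. The function name and the window lengths are illustrative choices, not values taken from the analyses reported here.

```python
def rayleigh_frequency(window_s):
    """Smallest resolvable frequency difference (Hz) for a window of
    `window_s` seconds, assuming the usual definition 1 / T."""
    if window_s <= 0:
        raise ValueError("window duration must be positive")
    return 1.0 / window_s


# Shortening the window improves temporal precision but coarsens
# the frequency differences that the STFT can resolve.
for window_s in (1.0, 0.5, 0.1):
    print(f"{window_s:.1f} s window -> {rayleigh_frequency(window_s):.1f} Hz")
```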
For a given window size, the STFT has fixed frequency resolution but its temporal precision relative to period decreases with increasing frequency (Fig. 1a, right). To overcome this limitation, multi-resolution techniques have been introduced, based on the continuous-wavelet transform (CWT). The CWT provides good temporal localization by compression/dilation of a mother wavelet as a function of frequency7. The most popular wavelet in time-frequency analysis is the Morlet wavelet8,9, defined as a plane wave multiplied by a Gaussian envelope (see Fig. E1):
$$\psi_{f,c}(t) = \frac{1}{B_c\sqrt{2\pi}}\, e^{-\frac{t^2}{2B_c^2}}\, e^{j2\pi f t} \tag{1}$$
$$B_c = \frac{c}{5f} \tag{2}$$
where f is the central frequency, c is the number of cycles of the wavelet, and $B_c$ is the bandwidth (in Hz$^{-1}$ = s) or variance of the wavelet10. A Morlet with higher bandwidth contains more cycles, is wider in time but has a narrower frequency response. Here we will use Morlet wavelets for time-frequency analysis, but other choices are possible.
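To make the role of the number of cycles concrete, the following minimal sketch samples a complex Morlet wavelet. It assumes a unit-area Gaussian envelope with standard deviation c / (k_sd · f), with k_sd = 5, truncated at three standard deviations; the function name, the parameters `k_sd` and `n_sd`, and the truncation are illustrative assumptions and may differ from the normalization used for the figures.

```python
import cmath
import math


def morlet(f, c, fs, k_sd=5.0, n_sd=3.0):
    """Sample a complex Morlet wavelet with central frequency f (Hz)
    and c cycles at sampling rate fs (Hz).

    Assumptions (illustrative): Gaussian standard deviation
    b_c = c / (k_sd * f), unit-area envelope, support of +/- n_sd * b_c.
    """
    b_c = c / (k_sd * f)
    half = int(math.ceil(n_sd * b_c * fs))
    norm = 1.0 / (b_c * math.sqrt(2.0 * math.pi))
    samples = []
    for n in range(-half, half + 1):
        t = n / fs
        envelope = norm * math.exp(-t * t / (2.0 * b_c * b_c))
        samples.append(envelope * cmath.exp(2j * math.pi * f * t))
    return samples


fs = 1000.0
narrow = morlet(50.0, 3, fs)  # few cycles: short in time, broad in frequency
wide = morlet(50.0, 6, fs)    # more cycles: longer in time, narrower in frequency
print(len(narrow), len(wide))
```

Doubling the number of cycles roughly doubles the temporal support of the wavelet, which is what narrows its frequency response.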
The CWT localizes oscillation packets well in time, but loses frequency resolution as frequency increases11,12 (Fig. 1b). Neighboring high frequencies cannot be distinguished, i.e. the representation is redundant across wavelets with close central frequencies in the high range. For this reason, analyses are often performed using a dyadic representation, as in the discrete wavelet transform (DWT), where frequencies are represented as powers of 2 (refs. 12,13). This representation, however, resolves high frequencies very poorly.
Both the STFT and CWT (or DWT) have significant limitations. The STFT provides good frequency resolution but poor temporal resolution at high frequencies, while the CWT maintains good temporal resolution throughout the spectrum but degrades in frequency resolution and becomes redundant with increasing frequency. This time-frequency uncertainty plagues analysis of neuronal signals, which have a rich time-frequency content14,15. Inspired by super-resolution methods used in imaging16,17, we introduce a novel approach that reveals a much sharper localization of oscillation packets in both time and frequency.
Methods
Superlets
We introduce a technique similar to structured illumination microscopy (SIM). SIM uses a set of known illumination patterns17 to obtain multiple measurements that are combined to achieve super-resolution. The super-resolution technique proposed here employs multiple wavelets for each time-frequency bin to detect localized time-frequency packets.
The method can be formalized as follows. A base wavelet, e.g. Morlet with a fixed number of cycles, provides multi-resolution in the standard sense, with constant relative temporal resolution but degrading frequency resolution (increased redundancy) as the central frequency of the wavelet increases. By increasing the bandwidth of the wavelet (more cycles) one increases frequency resolution (Fig. 1b) but loses temporal resolution. To achieve super-resolution we propose to combine wavelets with high temporal resolution (small number of cycles, low bandwidth) with wavelets having high frequency resolution (larger number of cycles, lower temporal resolution) (Fig. 1c).
We define a “superlet” (SL) as a set of Morlet wavelets with a fixed central frequency, f, and spanning a range of different cycles (bandwidths):
$$SL_{f,o} = \{\psi_{f,c} \mid c = c_1, c_2, \ldots, c_o\} \tag{3}$$
where o is the “order” of the superlet, and c1, c2, …, co are the numbers of cycles of the wavelets in the set. A superlet of order 1 is a single (base) wavelet with c1 cycles. In other words, a superlet is a finite set of o wavelets spanning multiple bandwidths at the same central frequency, f. The order of the superlet represents the number of wavelets in the set. The number of cycles defining the wavelets in the superlet can be chosen multiplicatively or additively. In a multiplicative superlet, ci = i · c1, whereas in an additive superlet ci = c1 + i − 1, for i = 1, 2, …, o.
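The two ways of choosing the cycles can be written down directly from the definitions ci = i · c1 and ci = c1 + i − 1. The helper below is a small sketch; its name and signature are illustrative.

```python
def superlet_cycles(c1, order, multiplicative=True):
    """Return the number of cycles of each wavelet in a superlet.

    multiplicative: c_i = i * c1
    additive:       c_i = c1 + i - 1,   for i = 1..order
    """
    if order < 1:
        raise ValueError("order must be at least 1")
    if multiplicative:
        return [i * c1 for i in range(1, order + 1)]
    return [c1 + i - 1 for i in range(1, order + 1)]


print(superlet_cycles(3, 5))                        # multiplicative
print(superlet_cycles(3, 5, multiplicative=False))  # additive
```

For the same base cycles and order, the multiplicative scheme reaches much wider wavelets than the additive one.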
We define the response of a superlet to a signal, x, as the geometric mean of the responses of individual wavelets in the set:
$$R[SL_{f,o}] = \sqrt[o]{\prod_{i=1}^{o} R[\psi_{f,c_i}]} \tag{4}$$
where R[ψf,ci] is the response of wavelet i to the signal, i.e., the magnitude of the complex convolution (for complex wavelets, such as Morlet):
$$R[\psi_{f,c_i}] = \left| x * \psi_{f,c_i} \right| \tag{5}$$
where * is the convolution operator and x the signal. The superlet is an estimator of the magnitude of oscillation packets present in the signal at the central frequency, f, of the superlet. We will show that, while increasing frequency resolution locally, the superlet does not significantly lose time resolution.
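The sketch below evaluates the superlet response at a single time-frequency bin as the geometric mean of Morlet response magnitudes. It is a minimal illustration under stated assumptions, not a reference implementation: the Morlet envelope is assumed to have unit area with standard deviation c / (5f), it is truncated at three standard deviations, and the function names and test signal are hypothetical.

```python
import cmath
import math


def wavelet_response(x, fs, f, c, n0, k_sd=5.0, n_sd=3.0):
    """Magnitude of the convolution of x with a Morlet wavelet
    (frequency f, c cycles), evaluated at sample index n0."""
    b_c = c / (k_sd * f)                      # assumed Gaussian std (s)
    half = int(math.ceil(n_sd * b_c * fs))    # assumed truncation
    norm = 1.0 / (b_c * math.sqrt(2.0 * math.pi))
    acc = 0j
    for m in range(-half, half + 1):
        n = n0 - m  # convolution: sum over m of x[n0 - m] * psi[m]
        if 0 <= n < len(x):
            t = m / fs
            envelope = norm * math.exp(-t * t / (2.0 * b_c * b_c))
            acc += x[n] * envelope * cmath.exp(2j * math.pi * f * t)
    return abs(acc) / fs  # 1 / fs approximates dt in the integral


def superlet_response(x, fs, f, cycles, n0):
    """Geometric mean of the wavelet responses for all cycles."""
    responses = [wavelet_response(x, fs, f, c, n0) for c in cycles]
    if min(responses) <= 0.0:
        return 0.0  # a single zero response vetoes the estimate
    return math.exp(sum(math.log(r) for r in responses) / len(responses))


fs = 1000.0
x = [math.cos(2.0 * math.pi * 50.0 * n / fs) for n in range(1000)]
cycles = [3, 6, 9, 12, 15]  # multiplicative superlet, c1 = 3, order 5
on_target = superlet_response(x, fs, 50.0, cycles, n0=500)
off_wavelet = wavelet_response(x, fs, 70.0, 3, n0=500)
off_superlet = superlet_response(x, fs, 70.0, cycles, n0=500)
print(on_target, off_wavelet, off_superlet)
```

With this normalization a unit-amplitude cosine gives a response close to 0.5 at its own frequency, because the complex wavelet captures only the positive-frequency half of the real signal. At 70 Hz the superlet leaks far less of the 50 Hz oscillation than the single 3-cycle wavelet does.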
The superlet transform (SLT) of a signal is computed analogously to the CWT, except that one uses superlets instead of wavelets. An SLT with superlets of order 1 is the CWT. As will be shown next, the SLT with orders > 1 is a less redundant representation of the signal than the corresponding CWT.
Adaptive superlets
At low central frequencies, single wavelets (i.e., superlets of order 1) may provide sufficient time-frequency resolution. Indeed, the CWT is less redundant at low than at high frequencies12. Adaptive superlets (ASL) adjust their order to the central frequency to compensate for decreasing bandwidth with increasing frequency. In an adaptive superlet transform (ASLT) one starts with a low order for estimating low frequencies and increases the order as a function of frequency to achieve an enhanced representation in both time and frequency across the entire frequency domain, as follows:
$$ASL_f = SL_{f,o}\big|_{o = a(f)} \tag{6}$$
where a(f) is a monotonically increasing function of the central frequency, having integer values. A simple choice is to vary the order linearly:
$$a(f) = o_{min} + \left[ (o_{max} - o_{min}) \cdot \frac{f - f_{min}}{f_{max} - f_{min}} \right] \tag{7}$$
where omin is the order corresponding to the smallest central frequency, fmin, and omax is the order corresponding to the largest central frequency, fmax, in the time-frequency representation, and [] is the nearest integer (round) operator. We recommend using the ASLT when a wide frequency range needs to be resolved, and the SLT for narrower bands.
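A linear order schedule of this kind can be sketched in a few lines. The function name is illustrative, and rounding half up is an assumption, since the text specifies only a nearest-integer operator. The example range (order 1 at 10 Hz to order 30 at 75 Hz) matches the one used later for the synthetic test signal.

```python
import math


def adaptive_order(f, f_min, f_max, o_min, o_max):
    """Superlet order for central frequency f, varied linearly from
    o_min at f_min to o_max at f_max and rounded to the nearest
    integer (halves rounded up, an assumed convention)."""
    if f_max <= f_min:
        raise ValueError("f_max must exceed f_min")
    if not f_min <= f <= f_max:
        raise ValueError("f must lie within [f_min, f_max]")
    fraction = (f - f_min) / (f_max - f_min)
    return o_min + int(math.floor((o_max - o_min) * fraction + 0.5))


orders = [adaptive_order(f, 10.0, 75.0, 1, 30) for f in range(10, 76, 5)]
print(orders)
```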
Experimental data and ethics
High-density electroencephalography (EEG – Biosemi ActiveTwo 128 electrodes) data was recorded @1024 samples/s from healthy human volunteers freely exploring visual stimuli consisting of deformed lattices of dots that represented objects and were presented on a 22” monitor (1680×1050@120fps; distance 1.12m). Subjects had to signal a perceptual decision by pressing one of three buttons congruent with perception (“nothing”, “uncertain”, “seen”). A similar protocol was described elsewhere18. Here, we used data from a single subject, including trials with correct, “seen” responses (63 trials). The protocol was approved by the Local Ethics Committee (approval 1/CE/08.01.2018). Data was collected in accordance with relevant legislation: Directive (EU) 2016/680 and Romanian Law 190/2018.
In vivo electrophysiology data was recorded with A32-tet probes (NeuroNexus Technologies Inc) at 32 kSamples/s (Multi Channel Systems MCS GmbH) from primary visual cortex of anesthetized C57/Bl6 mice receiving monocular visual stimulation (1440×900@60fps; distance 10cm) with full-field drifting gratings (0.11 cycles/deg; 1.75 cycles/s; contrast 25-100%; 8 directions in steps of 45°, each shown 10 times). Anesthesia was induced and maintained with a mixture of O2 and isoflurane (1.2%) and was constantly monitored based on heart and respiration rates and testing the pedal reflex. Within a stereotaxic device (Stoelting) a craniotomy (1×1mm) was performed over visual cortex. To minimize animal use, multiple datasets were recorded over 6-8 hours from each animal. Experiments were approved by the Local Ethics Committee (3/CE/02.11.2018) and the National Veterinary Authority (ANSVSA; 147/04.12.2018). Local field potentials were obtained by low-pass filtering the signals @300Hz and downsampling to 4kHz.
Results
We will first illustrate the basic principle behind superlets by considering a known set of packets composed of 7 sinusoidal cycles. A target oscillation packet, T, is composed of a finite number of cycles at a target central frequency. We define two additional oscillation packets: a temporal neighbor NT having the same frequency but shifted in time with a temporal offset Δt, and a frequency neighbor NF, at the same location in time but shifted with a frequency offset Δf (Fig. 2a, top). For convenience, all three packets have a magnitude of 1.
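A test scenario of this kind can be generated as follows. This is a hedged sketch: the rectangular envelope, the packet positions, the offsets Δt = 0.2 s and Δf = 10 Hz, and the 50 Hz / 1 kHz settings are illustrative assumptions, and the helper name `packet` is hypothetical.

```python
import math


def packet(f, n_cycles, fs, t_center, duration):
    """Unit-magnitude sinusoidal packet of n_cycles at frequency f (Hz),
    centered at t_center (s) in a zero signal of `duration` seconds.
    A rectangular envelope is assumed for simplicity."""
    n_total = int(round(duration * fs))
    n_on = int(round(n_cycles / f * fs))
    start = int(round((t_center - n_cycles / (2.0 * f)) * fs))
    x = [0.0] * n_total
    for k in range(n_on):
        if 0 <= start + k < n_total:
            x[start + k] = math.sin(2.0 * math.pi * f * k / fs)
    return x


fs = 1000.0
target = packet(50.0, 7, fs, t_center=0.5, duration=1.0)         # T
time_neighbor = packet(50.0, 7, fs, t_center=0.7, duration=1.0)  # NT, shifted by 0.2 s
freq_neighbor = packet(60.0, 7, fs, t_center=0.5, duration=1.0)  # NF, shifted by 10 Hz
```

Leakage can then be probed by estimating the response at the time-frequency location of T when only one of the neighbors is present in the signal.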
An example instantiation of this scenario is shown in Fig. 2a, bottom, for a target frequency of 50 Hz in a signal sampled at 1 kHz. We next evaluated how the presence of NF or that of NT influences the estimation at the location of T. In other words, without T being present, we systematically moved NF in frequency or NT in time and computed their contribution (leakage) to the estimate at the time-frequency location of T (Fig. 2b). As estimators, we initially considered a wavelet with c=3 cycles and a multiplicative superlet with c1=3 and o=5. The bandwidth of the wavelet was poor, with a broad frequency response around the target frequency of T, indicating that NF was hard to distinguish from T over a large frequency domain (Fig. 2b, top). By contrast, the superlet significantly sharpened the frequency response, reducing frequency cross-talk between NF and T. When NT was shifted in time away from the target’s location (time offset 0), the response of both the wavelet and the superlet dropped sharply after half the size of the target packet (3.5 cycles) (Fig. 2b, bottom). This indicates that, while significantly increasing frequency resolution, the superlet did not induce a significant reduction in temporal resolution.
To evaluate how the superlet achieves high frequency resolution without losing temporal resolution we quantified these two properties by evaluating the full width at half maximum (FWHM) of the frequency and temporal responses measured at T and induced by NF and NT, respectively (Fig. 2c). We varied the order of the superlet and compared its response to the response of the largest wavelet in its corresponding wavelet set (co, see eq. 3). As the order was increased, both the largest wavelet (with highest bandwidth) and the superlet approached the frequency resolution limit (Rayleigh frequency corresponding to the Gaussian-windowed oscillation packet) (Fig. 2c, top). By contrast, while the single wavelet’s temporal resolution decreased rapidly by increasing its number of cycles, the temporal resolution of the superlet degraded considerably slower (Fig. 2c, bottom). These results indicate that, as its order is increased, a superlet nears the theoretical frequency resolution possible for a limited duration oscillation packet (Rayleigh frequency) while maintaining a significantly better time resolution than a single wavelet.
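The FWHM of a sampled response curve can be estimated by locating the two half-maximum crossings around the peak. The sketch below uses linear interpolation between samples and assumes a single dominant peak; the function name and the Gaussian test curve are illustrative and are not the procedure or data behind Fig. 2c.

```python
import math


def fwhm(xs, ys):
    """Full width at half maximum of a single-peaked sampled curve,
    using linear interpolation at the half-maximum crossings."""
    peak = max(range(len(ys)), key=lambda k: ys[k])
    half = ys[peak] / 2.0
    i = peak
    while i > 0 and ys[i - 1] > half:   # walk left while above half
        i -= 1
    j = peak
    while j < len(ys) - 1 and ys[j + 1] > half:  # walk right
        j += 1
    if i == 0 or j == len(ys) - 1:
        raise ValueError("curve does not fall below half maximum")
    left = xs[i - 1] + (half - ys[i - 1]) * (xs[i] - xs[i - 1]) / (ys[i] - ys[i - 1])
    right = xs[j] + (ys[j] - half) * (xs[j + 1] - xs[j]) / (ys[j] - ys[j + 1])
    return right - left


# Check on a unit-variance Gaussian, whose FWHM is 2 * sqrt(2 * ln 2).
xs = [k * 0.01 - 5.0 for k in range(1001)]
ys = [math.exp(-x * x / 2.0) for x in xs]
width = fwhm(xs, ys)
print(width)
```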
In a second test, we generated a signal as a sum of multiple time-frequency packets (Fig. 3a), as follows. Three target packets of 11 cycles were generated at target frequencies of 20, 40, and 60 Hz. For each target, a neighbor in frequency (+10 Hz) and a neighbor in time (+12 cycles) were added to the signal. Due to constructive-destructive summation a clear modulation of magnitude is visible where the target was summed with its frequency neighbor. The correct time-frequency representation of this phenomenon should reveal corresponding bursts of magnitude (or power) at the two summed frequencies. We computed the time-frequency power representation of the signal using Blackman-windowed Fourier (STFT), wavelets (CWT), and adaptive additive superlets (o = 1:30; order varied linearly from 1@10 Hz to 30@75 Hz) (see Fig. 3).
The STFT with varying window sizes revealed either the temporal modulation (Fig. 3b, left) or the two frequencies (Fig. 3b, right), but it was unable to fully segregate time and frequency, in spite of an “optimized” intermediate window size (Fig. 3b, center). A similar conclusion was reached with a CWT using an increasing number of wavelet cycles (increasing bandwidth; Fig. 3c), with the difference that the CWT provided better frequency resolution in the low frequency range. By contrast, adaptive superlets (ASLT) provided a faithful representation with high resolution in both time and frequency across the entire spectrum (Fig. 3d). Increasing the number of base cycles (c1) had the effect of further increasing frequency precision, albeit at the cost of losing some temporal resolution at the low frequencies.
The CWT provides a representation of the signal that is increasingly redundant for higher frequencies11,12 because the frequency response of wavelets becomes wider as the number of samples per cycle decreases. The SLT (and ASLT) decreases the redundancy of the representation with increasing order of the superlets. Figure 4 depicts the average power measured over a long signal composed of three frequency components (20, 50, 100 Hz) with unitary amplitude. The average power in a perfect energy-conserving transform should be 1.5 (Fig. 4, green).
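The reference value of 1.5 follows from each unit-amplitude sinusoid contributing a mean power of 1/2. The short check below confirms this numerically; the 1 kHz sampling rate and 1 s duration are illustrative choices that give a whole number of periods for all three components.

```python
import math

fs = 1000.0
n_samples = 1000
freqs = (20.0, 50.0, 100.0)

# Sum of three unit-amplitude sinusoids.
x = [sum(math.sin(2.0 * math.pi * f * k / fs) for f in freqs)
     for k in range(n_samples)]

# Mean power in the time domain: 3 components * (1**2 / 2) = 1.5.
mean_power = sum(v * v for v in x) / n_samples
print(mean_power)
```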
We used two types of superlets with base cycles c1 = 3 and 5, and progressively increased their order while computing the SLT and collapsing it in time (Welch-like). As the order was increased, the redundancy in the representation of high frequencies was reduced and the average power approached that of an energy-conserving transform (e.g., Fourier; Fig. 4). Importantly, superlets with larger base cycles provide a less redundant representation than those with smaller number of base cycles (compare Fig. 4 red with Fig. 4 blue), albeit at the expense of decreased temporal resolution (see Fig. 3d).
We next used superlets to analyze brain signals (EEG) recorded from humans in response to visual stimuli representing objects (deformable dot lattices)18 (see Methods). Because EEG signals are strongly affected by the filtering properties of the skull and scalp, having a pronounced 1/f characteristic19,20 that masks power in the high frequency range, we have baselined spectra to the pre-stimulus period21. The time-frequency power spectrum of the occipital signal over the Oz electrode was estimated using STFT, CWT, and ASLT (Fig. 5a). The STFT window was chosen to optimize the representation in the gamma range (> 30 Hz), while the number of cycles for the CWT was chosen to maximize temporal resolution. The STFT provided a poor resolution in the low frequency range (Fig. 5a, top), while the CWT showed good temporal resolution but poor frequency resolution for higher frequencies (Fig. 5a, middle). By contrast, the ASLT provided sharp time-frequency resolution across the whole frequency range and revealed fine details that could not be resolved by the other methods (Fig. 5a, bottom).
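Baselining can take several forms. The sketch below shows one common variant, dividing each frequency row by its mean pre-stimulus power; this is an assumption for illustration, since the exact normalization used here is not specified in the text, and the function name and toy values are hypothetical.

```python
def baseline_normalize(power, times, t_stim=0.0):
    """Divide each frequency row of a time-frequency power matrix by
    its mean power before stimulus onset (times < t_stim).

    power: list of rows (one per frequency), each a list over time.
    times: time stamps (s) matching the columns of `power`.
    """
    pre = [k for k, t in enumerate(times) if t < t_stim]
    if not pre:
        raise ValueError("no pre-stimulus samples available")
    normalized = []
    for row in power:
        baseline = sum(row[k] for k in pre) / len(pre)
        if baseline <= 0.0:
            raise ValueError("baseline power must be positive")
        normalized.append([v / baseline for v in row])
    return normalized


times = [-0.2, -0.1, 0.0, 0.1]
power = [[2.0, 2.0, 4.0, 8.0],   # low-frequency row, high raw power
         [0.5, 1.5, 1.0, 3.0]]   # high-frequency row, low raw power
relative = baseline_normalize(power, times)
print(relative)
```

After normalization, both rows are expressed relative to their own baseline, so high-frequency changes are no longer dwarfed by the larger raw power at low frequencies.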
We next zoomed in on the gamma frequency range, which poses particular challenges for time-frequency analysis22–25. The Fourier window (Fig. 5b, top row) and the number of wavelet cycles (Fig. 5b, middle row) were varied to optimize the temporal (left) or frequency (right) estimation, or a trade-off between the two (middle). Superlets (Fig. 5b, bottom) shared the major features with the other representations but provided time-frequency details that could not be simultaneously resolved by any of the latter.
In vivo electrophysiology signals are recorded at much higher sampling rates than EEG (32 kHz compared to 1 kHz), offering the opportunity to observe time-frequency components with higher resolution in local-field potentials (LFP) than in EEG. We next focused on LFPs recorded from mouse visual cortex during presentation of drifting sinusoidal gratings (see Methods). LFPs suffer from the 1/f issue significantly less than EEG and therefore baselining was not necessary. We computed the time-frequency representation of an LFP signal using STFT (Fig. 6a, top), CWT (Fig. 6a, bottom), and ASLT (Fig. 6a, middle) around the presentation of the visual stimulus and averaged across 10 presentations (trials). As was the case for EEG data, adaptive superlets provided the best time-frequency representation across the entire analyzed spectrum. They revealed 45 Hz gamma bursts induced by the passage of the grating through the receptive fields of cortical neurons26 and resolved many details in both the low and high frequency range.
The true power of superlets was, however, revealed when we zoomed in on a single gamma burst induced by the passage of the grating (see Fig. 6b). Superlets were computed with a base cycle c1 = 2, to maximize temporal resolution, and we used a fixed multiplicative order of 7 (SLT). The SLT provided very fine temporal and frequency details, whose presence in the signal was validated by computing the local CWT optimized for time (c = 2), frequency (c = 11), or a tradeoff between time and frequency (c = 6). The components seen in the superlet representation could be inferred from these multiple wavelet representations but none of the latter was able to simultaneously reveal all the time-frequency details (Fig. 6b).
To push the envelope, we further explored a time-frequency detail (Fig. 6b, left-bottom and Fig. 6c) revealed by superlets, composed of a lower ongoing rhythm at ∼17.5 Hz (LOR), two time neighboring packets at 24.5 Hz (NP1 and NP2), and one higher frequency packet at ∼31 Hz (HP) (Fig. 6c, top-left). To determine if these features were actually present in the signal over the 10 trials, we narrow-band filtered the signal (IIR, order 3, band-pass 10-40 Hz) so as to remove frequency contamination plaguing the wavelet estimates in the gamma range. This enabled us to largely validate the presence of the time-frequency packets using narrow (c = 3) and wider (c = 11) wavelets (Fig. 6c, bottom-left and top-right). However, while the frequency of HP could be identified, its clear temporal location could not be established, irrespective of the parameters of the wavelet (Fig. 6c, top-right). We suspected that this may originate from averaging over 10 trials such that time/frequency smearing of the long/short CWT could hide this detail. Indeed, we found that HP was expressed clearly in at least one of the trials in the set (Fig. 6c, bottom-right). Thus, all the packets in the time-frequency detail revealed by superlets were actual features in the signal and, in addition, the optimal time-frequency concentration provided by superlets was able to reveal bursts expressed at single-trial level. The latter could not be identified by other methods because they were averaged out due to the time/frequency smearing in the wavelet or Fourier representations (see also Fig. E2).
Discussion
Superlets provide remarkable time-frequency resolution by taking advantage of multiple measurements at a range of temporal resolutions and frequency bandwidths. These measurements are combined geometrically to evaluate the temporal and frequency location of finite oscillation packets. To the best of our knowledge, this is the first super-resolution method for both time and frequency. Techniques exist for frequency super-resolution based on model fitting27, polyphase analysis filter banks28, Pisarenko harmonic decomposition29, or multiple signal classification (MUSIC)30. However, these other techniques ignore the temporal component and focus on the frequency dimension only.
Superlets do not violate the laws of physics. For a finite oscillation packet, their frequency resolution approaches the theoretical Rayleigh frequency as the order of the superlet is increased (see Fig. 2c). In addition, each single wavelet estimate obeys the Heisenberg-Gabor uncertainty principle but, since the signal is stored, multiple evaluations can be performed and combined to transcend the joint time-frequency resolution of each individual estimate. The same procedure is applied in SIM optical super-resolution17.
Superlets use the geometric mean (GM) across a set of wavelet responses to determine the best time-frequency localization. Intuitively, GM “correlates” responses with high temporal precision with those with high frequency precision. For example, if a large bandwidth wavelet (many cycles) detects a narrow frequency component this will be vetoed out in time if the narrow wavelet at a certain location has a low response, and vice versa. This property is not shared by the arithmetic mean (see Fig. E3).
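A toy calculation illustrates the veto effect. The response values below are hypothetical and stand for two wavelets that respond strongly and one that responds weakly at the same time-frequency location.

```python
import math


def geometric_mean(values):
    """Geometric mean of non-negative values; any zero gives zero."""
    if min(values) <= 0.0:
        return 0.0
    return math.exp(sum(math.log(v) for v in values) / len(values))


def arithmetic_mean(values):
    return sum(values) / len(values)


responses = [0.9, 0.8, 0.01]  # hypothetical wavelet responses
gm = geometric_mean(responses)
am = arithmetic_mean(responses)
print(gm, am)
```

The single weak response pulls the geometric mean down to about a third of the arithmetic mean, so a location is estimated as strong only when all wavelets in the set agree.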
The frequency resolution limit for a finite oscillation packet depends on the packet’s duration but temporal resolution can be increased by increasing sampling rate. Typically, LFPs are obtained by low-pass filtering (@300Hz) the electrophysiology signal sampled at much higher rate (32-50kHz), and then downsampling the signal. When using superlets, one should keep a high sampling rate after downsampling (e.g., 2-4 kHz) to enable the method to resolve very fine time-frequency details (see Fig. 6b).
Cortical responses exhibit a significant trial-to-trial variability31. Therefore, results are typically averaged across multiple trials. In time-frequency analysis this can pose significant problems22,32,33. Due to the time-frequency uncertainty, isolated packets, even when expressed in a significant number of trials, can be masked out by strong neighboring packets whose estimate leaks over the target’s representation. Because they concentrate the time-frequency estimate in each individual trial, superlets prevent this effect and provide a much sharper image of the time-frequency landscape, revealing oscillation packets that remain hidden from other estimation methods (STFT, CWT).
Superlets provide super-resolution in the time-frequency space and may become instrumental in discovering new phenomena in biological signals. They may find multiple applications in the analysis of brain signals, whose time-frequency landscape is complex.
Author contributions
R.C.M. developed the method. V.V.M. generated the toy data, A.N-D. and R.C.M. recorded the test data, H.B. tested the method. V.V.M., A.N-D., and R.C.M. wrote the manuscript.
Acknowledgements
This work was supported by: two grants from the Romanian National Authority for Scientific Research and Innovation, CNCS-UEFISCDI (project numbers PN-III-P4-ID-PCE-2016-0010 and COFUND-NEURON-NMDAR-PSY), a grant by the European Union’s Horizon 2020 research and innovation programme – grant agreement no. 668863-SyBil-AA, and a National Science Foundation grant NSF-IOS-1656830 funded by the US Government.