Software application profile: GLU: A tool for analysing continuously measured glucose in epidemiology

Motivation: Continuous glucose monitors (CGM) record interstitial glucose ‘continuously’, producing a sequence of measurements for each participant (e.g. the average glucose every 5 minutes over several days, both day and night). To analyse these data, researchers tend to derive summary variables such as the area under the curve (AUC), which they then use in subsequent analyses. To date, a lack of consistency and transparency in the precise definitions used for these summary variables has hindered the interpretation, replication and comparison of results across studies. We present GLU, an open-source software package for deriving a consistent set of summary variables from CGM data.

General features: GLU performs quality control of each CGM sample (e.g. addressing missing data), derives a diverse set of summary variables covering six broad domains (e.g. AUC, and the proportion of time spent in hypo-, normo- and hyper-glycaemic levels), and outputs these, with quality control information, to the user.

Implementation: GLU is implemented in R.

Availability: GLU is available on GitHub at [https://github.com/MRCIEU/GLU]. Git tag v0.1 corresponds to the version presented here.


INTRODUCTION
Epidemiological and clinical studies interested in circulating glucose as a risk factor or outcome typically measure levels in the blood (fasting, non-fasting and/or post-oral glucose) at a single time point or at widely spaced time points (e.g. every few years) (1-4). While these are important health indicators, there is increasing appreciation that glucose levels and variability under free-living conditions, during both the day and night, may also provide important health measures in clinical (e.g. diabetic or obese) and 'healthy' populations (5-11). Continuous glucose monitoring (CGM) systems measure interstitial glucose levels by implanting a sensor subcutaneously (12). Typically, finger-prick blood glucose measurements are needed to calibrate the interstitial glucose levels to capillary blood glucose levels, although devices that do not need this calibration step are becoming increasingly available (12,13). Throughout this paper we refer to the sensor-predicted capillary glucose levels as 'sensor glucose'.
CGM systems were initially used in research evaluating their potential value in patients with diabetes, and are now increasingly used in the management of type I and type II diabetes (8,9,14-16). More recently, CGM has been used in a wider range of epidemiological studies.
For instance, CGM has been used to measure glucose levels 'continuously' over a number of days to identify hypo-glycaemia in patients receiving intensive care, and in 'healthy' populations to explore whether it can identify groups at increased risk of diabetes, including gestational diabetes (17-21). Unlike glucose assessed at a single time point, which provides only a 'snapshot' of glycaemic control, or glycated haemoglobin, which gives a single measure of mean glucose levels over a period of weeks, continuously measured data allow researchers to assess how glucose levels vary across the day and night over several days or weeks, and to identify determinants of this variation and its health impact (17-21).
Researchers using CGM data tend to first derive summary variables that are then used in subsequent analyses (e.g. exploring the association of these summary variables with later health outcomes). Summary variables might include the area under the curve (AUC) (which, divided by the length of the period, gives the average glucose level over time) or the time spent in low, medium or high glucose ranges. While a set of variables is commonly derived in CGM studies, there are increasing examples of studies that address broadly similar research questions but derive different summary variables.
For example, we found two papers assessing glycaemic variability in non-diabetic people, one that included morbidly obese participants (17) and the other that included healthy people (22).
Whilst both studies used standard deviation (SD), coefficient of variation (CV) and mean amplitude of glycaemic excursions (MAGE) as measures of variability, the study in morbidly obese people also used mean of daily differences (MODD) (17) and the other used mean absolute rate of change (MARC) (22). These two studies illustrate that (a) several measures of variability can be derived from CGM data, so it is important to justify which are used and to explain the differences between them (which neither paper did), and (b) consistent measures should ideally be used across studies. Even when different studies derive a variable representing the same fundamental property, it may be defined differently, for example using different thresholds to define hypo-, normo- and hyper-glycaemia (5,17). This lack of consistency across studies, together with insufficient reporting of study methods, makes results difficult to interpret. It also makes it difficult to seek replication or to pool study results in meta-analyses when varied measures are derived (5-8,11,23-25). For example, a recent review that compared studies according to the proportion of time in hypo-, normo- and hyper-glycaemia was limited because researchers used different thresholds or did not include these measures at all (17). It is also unclear whether researchers derive many summary variables but present only those for which the analysis supports their hypothesis, in which case the evidence published in the literature, on which clinical decisions are based, may be biased (26). The American Diabetes Association recently suggested some summary statistics to assess glucose control in patients with diabetes (such as the coefficient of variation to assess variability, and the proportion of time in ranges [hypo-, normo- and hyper-glycaemia]), but acknowledged that further research is needed to establish which summary measures are most useful, even in patients with diabetes (8).
Beyond this guidance, we are unaware of any recommended summary measures for the broader use of CGM in epidemiology, nor of any general epidemiological research tools that systematise analyses of CGM data.
In this paper, we present GLU, a general open-source tool for processing CGM data. GLU performs quality control and derives a set of glucose characteristics (illustrated in Figure 1) that can be used in subsequent analyses. Use of a common tool will help to standardise methods across research studies, making it easier in future to compare and meta-analyse results across studies and to perform replication analyses. An open-source tool also improves the transparency of methods, as all code is freely available, aiding interpretation of results.
Furthermore, we intend to update GLU as methods advance. The presentation of this tool is timely as CGM is beginning to be widely adopted in epidemiological research, including both observational studies and randomised controlled trials (17-21).
IMPLEMENTATION
GLU is implemented in R and requires the following R packages: optparse, ggplot2 and stringr (see the GitHub repository [https://github.com/MRCIEU/GLU] for package versions). A preprocessing step converts device-generated CGM data to the CSV format required by GLU (detailed in the GitHub repository; scripts to preprocess Medtronic iPro2 (27) data are included in the GLU repository). Although we developed GLU using example data from the Medtronic iPro2 device, we have ensured it can easily be used with data from other devices (e.g. Abbott's Freestyle Libre (28)) via alternative preprocessing steps. After the initial preprocessing step, GLU is run by specifying two directories: the location of the CGM data files, and the location where derived data (e.g. summary variables and plots) should be stored.
The CGM data are processed in two main stages: 1) quality control, and 2) deriving summary variables (illustrated in Figure 1). GLU allows the user to specify optional arguments, which include:
• nightstart and daystart: Specify the start times of the night-time and day-time periods of each day, respectively, to accommodate different populations (e.g. an earlier bedtime may be more appropriate for studies of children). By default, night-time is between 11.00pm and 6.30am. If other times are used then this should be reported.
• pregnancy and diabetes: Indicate that the data pertain to pregnant women or diabetic patients, respectively, such that population-specific summary variables are derived (i.e. the thresholds used to determine the time spent in hypo-, normo- and hyper-glycaemia, described in the 'Deriving glucose summary variables' section below). If neither option is selected, summary variables are produced that assume participants are from a 'general population' without selection for pregnancy or diabetes.
• impute: Specifies that GLU should perform 'approximal' imputation, rather than restricting to 'complete days', as described in the 'CGM data quality control' section below.
GLU generates a comma-separated value (CSV) file of derived summary variables, which can be imported into statistical software for analysis.
CGM data quality control
GLU performs quality control to help researchers ensure the integrity of the data, consisting of three automated steps: resampling, outlier identification and dealing with missing data (illustrated in Supplementary figure 1). GLU also provides plots for manual review of the CGM data after these automated steps.

Resampling
We resample the sensor glucose values across each participant's CGM sequence to one-minute intervals using linear interpolation (i.e. assuming a straight line between values at adjacent time points), to facilitate computation of summary variables. Given two adjacent time points t1 and t2, with sensor glucose values SG1 and SG2, respectively, linear interpolation estimates the glucose value SG′ at time point t′, where t1 ≤ t′ ≤ t2, as:

$$SG' = SG_1 + (t' - t_1)\,\frac{SG_2 - SG_1}{t_2 - t_1}$$
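As a minimal sketch of this resampling step (in Python with NumPy rather than GLU's own R code; the function name `resample_to_minutes` is hypothetical), the snippet below interpolates readings taken every 5 minutes onto a one-minute grid. `numpy.interp` performs the straight-line interpolation between adjacent readings described above.

```python
import numpy as np

def resample_to_minutes(times_min, glucose):
    """Linearly interpolate sensor glucose (mmol/L) onto a one-minute grid.

    times_min: increasing recording times in minutes, e.g. 0, 5, 10, ...
    glucose:   sensor glucose values recorded at those times.
    """
    times_min = np.asarray(times_min, dtype=float)
    glucose = np.asarray(glucose, dtype=float)
    # One grid point per minute from the first to the last recording.
    grid = np.arange(times_min[0], times_min[-1] + 1)
    # Straight line between each pair of adjacent recordings.
    return grid, np.interp(grid, times_min, glucose)

grid, sg = resample_to_minutes([0, 5, 10], [5.0, 6.0, 5.5])
print(sg[:6])  # minutes 0..5 rise linearly from 5.0 to 6.0
```

Each one-minute point can then be treated as one minute of time when computing the summary variables below.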

Outlier detection
Previous work has suggested that outliers can be detected by identifying time points that are more than two standard deviations (SD) from the sensor glucose values at both the previous and subsequent time points (5). However, as noted previously (6,19,29), glucose levels may not be normally distributed, so SD may not be an appropriate measure of variability. Furthermore, this approach is sensitive to the resolution of the glucose trace such that changes in resolution would affect which regions of a glucose trace are marked as outliers. This is because SD is invariant to changes in sampling frequency of a glucose trace, while the difference in glucose levels between adjacent time points is not. For example, if sensor glucose is recorded every minute rather than every 5 minutes then the difference in glucose between adjacent time points will be smaller but the overall distribution of sensor glucose values, and hence the outlier detection threshold (based on the SD of this distribution), will not change.
Using data described in our usage example (see Usage section below), we visually assessed the distribution of sensor glucose values for each participant and found these distributions to be very variable: some were normally distributed while others were skewed. We therefore base our outlier detection on the distribution of the differences between adjacent sensor glucose values, rather than on the distribution of the sensor glucose values themselves. We found that the differences between adjacent sensor glucose values were more consistently normally distributed than the sensor glucose values. Using the differences between adjacent values also means that this approach is invariant to changes in the resolution of a glucose trace. We use a threshold d = k × SD, where SD is the standard deviation of a participant's distribution of differences between adjacent values (30). Time points with a glucose value that deviates by more than d from the values at both the previous and subsequent time points are marked as outliers for further consideration by the researcher. We chose k = 5 based on experimentation with our example data (see Supplementary section S1 for further details). Users can change the value of k using GLU's outlierthreshold argument (see the GLU GitHub repository for details) to make outlier detection more conservative or lenient. Should outliers be detected and confirmed by visual inspection of the glucose trace, researchers may wish to: 1) use other data such as diet diaries to determine whether detected outliers may have an underlying cause such as food intake (rather than being erroneous), and 2) perform sensitivity analyses to see the effect that removing identified outliers has on their results. Our outlier detection method uses a threshold determined using artificial outliers, because we have no CGM data containing clear (erroneous) outliers on which to base our approach (Supplementary section S1).
As CGM becomes more widely used, it will be possible to improve detection of outliers using outlier examples, and we plan to update GLU outlier detection as the field matures.
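The outlier rule can be sketched as follows. This is an illustration under stated assumptions, not GLU's implementation: it uses the sample SD of the differences between adjacent values, applies the rule to whatever series it is given (raw or resampled), and cannot flag the first or last point because each lacks a neighbour on one side. The function name `flag_outliers` is hypothetical; `k` plays the role of GLU's outlierthreshold argument.

```python
import numpy as np

def flag_outliers(glucose, k=5.0):
    """Flag points deviating by more than d = k * SD(adjacent differences)
    from BOTH the previous and the next value."""
    sg = np.asarray(glucose, dtype=float)
    d = k * np.std(np.diff(sg), ddof=1)  # threshold from the differences, not the values
    flags = np.zeros(sg.size, dtype=bool)
    far_from_prev = np.abs(sg[1:-1] - sg[:-2]) > d
    far_from_next = np.abs(sg[1:-1] - sg[2:]) > d
    flags[1:-1] = far_from_prev & far_from_next
    return flags

# A gently alternating trace (5.0, 5.1, 5.0, ...) with one spike at index 100.
trace = np.array([5.0 if i % 2 == 0 else 5.1 for i in range(200)])
trace[100] += 5.0
print(np.flatnonzero(flag_outliers(trace)))  # -> [100]
```

Because a spike itself inflates the SD of the differences, a reasonably long trace is needed before a single spike can exceed a 5 SD threshold; with only a handful of points, even a large jump will not be flagged.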
Assessing the impact of missing data assumptions
CGM data may have missing time periods when the device is unable to record an interstitial glucose value, for example if the device becomes displaced. When missing periods exist, there may be systematic differences between the missing and observed values in the CGM data, such that the derived GLU summary variables may be biased. For instance, if sensor displacement (or removal) occurs during swimming, and swimming is associated with low glucose values, then a swimmer's average glucose level estimated from the observed data may be higher than the true underlying value. Under those circumstances, associations of the GLU summary variables with a potential outcome or risk factor may be biased. Alternatively, the CGM missing time periods may be missing completely at random (MCAR); for instance, some technological failures of CGM devices may be due to chance. We note that there are two related but distinct biases when using GLU-derived summary variables: 1) bias in the derived values of participants' GLU summary variables, and 2) bias in subsequent analyses using these summary variables. Bias in the former does not necessarily cause bias in the latter, as this depends on the specific analyses performed.
GLU provides two approaches to help address missing data, called 'complete days' and 'approximal imputation', which make different missingness assumptions. GLU's complete days approach uses only days with complete sensor glucose values to derive glucose characteristics (e.g. 24 × 60 / 5 = 288 values per day when using CGM data with 5-minute intervals). If the days of CGM data are missing completely at random (MCARdays), such that there are no systematic differences between days with and without missing CGM data, then the derived CGM statistics will be unbiased, and hence this missingness will not bias the results of subsequent analyses (31). The MCARdays assumption of the complete days approach may be violated. For example, characteristics of the participants, such as their age or employment status, may influence whether or not they complete the required number of capillary blood tests, or the likelihood of the CGM device being displaced. However, even when MCARdays does not hold, analyses using GLU's complete days statistics may still be unbiased, depending on the specific further analysis in which they are used (31).
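The complete days rule can be sketched as a count of non-missing readings per calendar day. In this hypothetical helper (`complete_days`, not part of GLU), we assume one reading per interval with missing readings stored as `None` or NaN, so a day is complete when it has exactly 24 × 60 / interval non-missing values (288 for 5-minute data); duplicated timestamps would need removing first.

```python
import math
from collections import Counter
from datetime import datetime, timedelta

def complete_days(records, interval_min=5):
    """Return the calendar dates with a full day of non-missing readings.

    records: iterable of (datetime, glucose) pairs; missing glucose is None or NaN.
    """
    expected = 24 * 60 // interval_min  # 288 readings for 5-minute data
    counts = Counter(
        ts.date()
        for ts, sg in records
        if sg is not None and not math.isnan(sg)
    )
    return sorted(day for day, n in counts.items() if n == expected)

# One full day of 5-minute readings, then only 100 readings on the next day.
start = datetime(2020, 1, 1)
records = [(start + timedelta(minutes=5 * i), 5.5) for i in range(288 + 100)]
print(complete_days(records))  # only 2020-01-01 is complete
```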
In general, imputation may help to reduce the amount of excluded data and relax the missing data assumptions, such that missing at random (MAR) (or sometimes missing not at random [MNAR]) may be assumed rather than MCAR (31). However, glycaemic control is influenced by several characteristics such that imputing portions of a glucose trace is non-trivial. When a participant's diet and exercise are identical across different days then time-matched data from other days can be used to mean impute missing time points (5), but in most epidemiological studies where data are collected passively under free-living conditions this is not appropriate.
GLU includes a simple imputation approach that fills in missing periods using nearby data, which we refer to as 'approximal imputation'. Our approach splits each missing period in half, and uses the sensor glucose data to the left of the gap to fill in the left half, and the sensor glucose data to the right of the gap to fill in the right half (illustrated in the Supplementary material). Approximal imputation may help to reduce bias in the derived CGM statistics, and hence bias in subsequent analyses that use these statistics. If the regions near each missing period are representative of that missing period, then CGM statistics derived from approximal imputed data may be less biased. It may, however, be more likely that missing regions are systematically different from their nearby non-missing regions. For example, if a device is unable to record very high glucose values, then nearby glucose values will be systematically lower than those in the missing region. In this case, approximal imputation may still help to reduce bias in the derived CGM statistics: if days with missing data are systematically different from days without missing data, approximal imputation enables information from the non-missing time periods on these systematically different days to be incorporated into the derived summary variables, rather than excluding those days entirely. Moreover, if the CGM data are MCARdays, then the summary variables derived using approximal imputed data will be unbiased and more precise than the complete days version.
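The sketch below illustrates the split-in-half idea on a one-minute series with gaps stored as NaN. It makes a simplifying assumption that should be checked against the GLU repository: each half of a gap is filled with the single nearest observed value on that side (nearest-neighbour filling), whereas GLU may copy a longer stretch of the neighbouring trace. Gaps at the very start or end of a series have data on only one side and are left as NaN. The function name `approximal_impute` is hypothetical.

```python
import numpy as np

def approximal_impute(sg):
    """Fill interior NaN gaps: left half from the left side, right half from the right.

    ASSUMPTION: each half is filled with the nearest observed value on its side.
    """
    sg = np.asarray(sg, dtype=float).copy()
    n = sg.size
    i = 0
    while i < n:
        if not np.isnan(sg[i]):
            i += 1
            continue
        start = i
        while i < n and np.isnan(sg[i]):
            i += 1
        end = i  # the missing run is sg[start:end]
        if start == 0 or end == n:
            continue  # edge gap: no data on one side, so leave it missing
        mid = start + (end - start) // 2
        sg[start:mid] = sg[start - 1]  # left half <- last value before the gap
        sg[mid:end] = sg[end]          # right half <- first value after the gap
    return sg

print(approximal_impute([5.0, np.nan, np.nan, np.nan, np.nan, 7.0]))
# -> [5. 5. 5. 7. 7. 7.]
```

For gaps of odd length, this sketch gives the extra minute to the right half; which side GLU favours is not stated here.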
By default, GLU uses the complete days approach. Users can use the approximal imputation approach by running GLU with the impute argument. A researcher wishing to apply another imputation approach to their data (e.g. mean imputation, if appropriate) can do this prior to running GLU. In the rest of this paper we refer to days with complete CGM sequences (after imputation if this option is used) as the set of included days. We would suggest that researchers run their analyses using both complete days and approximal imputation and present all results from further analyses (one set could be in supplementary material) so that over time we can learn more about the nature of CGM missing data and its impact on different research questions.

Manual review
In the Data visualisation section we describe two plots generated by GLU; these can be used to further check data validity (see Usage section for a description of how we do this in our example).

Deriving glucose summary variables
After quality control, GLU derives a set of summary variables illustrated in Figure 1.

Overall glucose levels
Overall glucose levels are characterised by the AUC; specifically, GLU derives the mean AUC per minute so that these levels are comparable across time periods of different lengths (e.g. night-time versus day-time) (8). For each day, the AUC is calculated using the trapezoid method (5), as the sum of the areas of the trapezoids created by linear interpolation between sensor glucose values at adjacent time points (as described above). We divide this by the number of minutes in the time period (e.g. 1440 for whole days) to give the AUC per minute, which is the average glucose level (mmol/L) over the period.
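A minimal Python sketch of the AUC-per-minute calculation (hypothetical function `auc_per_minute`) sums the trapezoid areas between adjacent readings and divides by the length of the period. By default it divides by the span between the first and last time points; passing `period_minutes` (e.g. 1440) mirrors the division by the nominal period length described above, though how GLU treats the boundary minute is not shown here.

```python
import numpy as np

def auc_per_minute(times_min, glucose, period_minutes=None):
    """Trapezoid-rule AUC divided by the period length (result in mmol/L)."""
    t = np.asarray(times_min, dtype=float)
    sg = np.asarray(glucose, dtype=float)
    # Each trapezoid: mean of two adjacent values times the time between them.
    auc = np.sum((sg[1:] + sg[:-1]) / 2.0 * np.diff(t))  # mmol/L x minutes
    if period_minutes is None:
        period_minutes = t[-1] - t[0]
    return auc / period_minutes

print(auc_per_minute([0, 60, 120], [5.0, 7.0, 5.0]))  # 720 / 120 = 6.0
```

Dividing by the period length is what makes a 7.5-hour night and a 16.5-hour day directly comparable.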

Proportion of time in hypo-, normo-and hyper-glycaemia
We calculate the proportion of time spent in hypo-glycaemia, normo-glycaemia and hyper-glycaemia (8,20,32). In a 'healthy' (non-diabetic), non-pregnant population, hypo-glycaemia is defined as <3.3 mmol/L and hyper-glycaemia as ≥10 mmol/L (33). The default output from GLU uses these thresholds to define hypo- and hyper-glycaemia in healthy non-pregnant populations (with normo-glycaemia defined as ≥3.3 to <10 mmol/L). In patients with diabetes, GLU's default for hypo-glycaemia is <3.9 mmol/L and for hyper-glycaemia is ≥10.0 mmol/L (with normo-glycaemia defined as ≥3.9 to <10 mmol/L) (34). For 'healthy' (non-diabetic) pregnant women we use the target range recommended during pregnancy by the UK National Institute for Health and Care Excellence (NICE), ≥3.9 to <7.8 mmol/L, to define normo-glycaemia, and <3.9 mmol/L and ≥7.8 mmol/L to define hypo- and hyper-glycaemia, respectively (35). As already described, these diabetes- and pregnancy-specific thresholds can be specified using GLU's diabetes and pregnancy arguments, respectively. Because thresholds for defining hypo- and hyper-glycaemia (in 'healthy', diabetic and pregnant populations) vary geographically and over time (35,36), and differ for other groups (for example, patients in intensive care units (20)), GLU also allows users to specify other thresholds. However, since GLU is intended to provide standard measures that can be compared (and, as appropriate, pooled) across studies, researchers who do this should give a clear justification.
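The default thresholds quoted above can be applied to one-minute resampled data as in the sketch below, where each one-minute point stands for one minute of time. The dictionary keys and the function `time_in_ranges` are hypothetical names used only for illustration; the cut-offs are those given in the text (lower bounds inclusive for normo- and hyper-glycaemia).

```python
import numpy as np

# (hypo-glycaemia below, hyper-glycaemia at or above), in mmol/L, from the text.
THRESHOLDS = {
    "general": (3.3, 10.0),
    "diabetes": (3.9, 10.0),
    "pregnancy": (3.9, 7.8),
}

def time_in_ranges(glucose_1min, population="general"):
    """Proportion of one-minute points in hypo-, normo- and hyper-glycaemia."""
    sg = np.asarray(glucose_1min, dtype=float)
    low, high = THRESHOLDS[population]
    return {
        "hypo": float(np.mean(sg < low)),
        "normo": float(np.mean((sg >= low) & (sg < high))),
        "hyper": float(np.mean(sg >= high)),
    }

trace = [3.0, 4.0, 8.0, 11.0]
print(time_in_ranges(trace))               # general: 0.25 / 0.5 / 0.25
print(time_in_ranges(trace, "pregnancy"))  # pregnancy: 0.25 / 0.25 / 0.5
```

The same trace gives different proportions under the two populations because 8.0 mmol/L is normo-glycaemic by the general thresholds but hyper-glycaemic by the pregnancy thresholds.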

Overall variability
While SD and CV are widely used measures of glucose variability (9,25), as discussed above, a participant's sensor glucose values may not be normally distributed.
For this reason we use the median absolute deviation (MAD) as a measure of the overall variability of sensor glucose levels, defined as:

$$\mathrm{MAD} = \operatorname{median}_i\left(\left|SG_i - \operatorname{median}_j(SG_j)\right|\right)$$

Thus, after calculating the distance of each sensor glucose value from the median value, MAD is the median of these distances.
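The MAD definition translates directly into code. This sketch (hypothetical function name) computes the unscaled MAD as defined here, i.e. without the 1.4826 factor sometimes applied to make MAD comparable to an SD under normality.

```python
import numpy as np

def median_absolute_deviation(glucose):
    """Median of the absolute distances of each value from the median."""
    sg = np.asarray(glucose, dtype=float)
    return float(np.median(np.abs(sg - np.median(sg))))

# The single high value (20.0) barely affects the MAD, unlike the SD.
print(median_absolute_deviation([4.0, 5.0, 6.0, 7.0, 20.0]))  # -> 1.0
```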

Variability from one moment to the next
We capture variability in a person's glucose levels across time using a measure based on the length of the line of a glucose trace (i.e. as if the peaks and troughs were stretched out into a straight line). This idea was recently suggested for CGM data (37) and previously proposed as a measure of complexity for time-series analyses in general (38). Intuitively, if a glucose trace is stretched out, the resulting straight line will tend to be longer when the trace has larger overall variability (represented by MAD) and is more complex (i.e. has a higher number of peaks and valleys) (38).
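The line-length idea can be sketched by summing the straight-line distances between consecutive points of the trace. Note that this mixes units (minutes on one axis, mmol/L on the other), so the value depends on the time resolution and glucose units used; the sketch assumes one-minute resampled data in mmol/L. It shows only the raw length: the further processing GLU applies to obtain the sGVP variable used in the analyses is not reproduced here, and `trace_length` is a hypothetical name.

```python
import numpy as np

def trace_length(times_min, glucose):
    """Total length of the polyline through (time, glucose) points."""
    t = np.asarray(times_min, dtype=float)
    sg = np.asarray(glucose, dtype=float)
    # hypot(dx, dy) is the Euclidean length of each segment.
    return float(np.sum(np.hypot(np.diff(t), np.diff(sg))))

flat = trace_length([0, 1, 2], [5.0, 5.0, 5.0])   # 2.0: a flat trace is as long as its duration
bumpy = trace_length([0, 1, 2], [5.0, 6.0, 5.0])  # 2 * sqrt(2), about 2.83
```

Any excursion away from a flat line lengthens the trace, which is why the measure grows with both the size and the number of peaks and valleys.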

Fasting glucose proxy
While fasting glucose has previously been reported using CGM data, the methods used to derive this measure can be unclear (39,40). In studies where meal times are known, fasting glucose levels may be inferred using CGM data recorded before breakfast or after at least 7 hours of fasting (5,21,41), for example using the mean of the 6 consecutive values (at 5-minute intervals) before breakfast (21). When meal times are not known, others have used glucose levels during particular periods of the night-time as fasting levels (42). This can be problematic if participants eat during the night-time period (5), which occurs in an important minority who may differ in their health and health-related behaviours from those who do not eat during the night (43). GLU derives a general proxy measure of fasting glucose that does not require knowledge of meal times, calculated as the lowest mean glucose across any 30 consecutive minutes (equating to 6 CGM values at 5-minute intervals) during the night-time.
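The fasting glucose proxy amounts to a 30-minute rolling mean followed by a minimum. The sketch below assumes the input is already restricted to one night's one-minute resampled values (between nightstart and daystart) with no gaps; `fasting_proxy` is a hypothetical name.

```python
import numpy as np

def fasting_proxy(night_glucose_1min, window=30):
    """Lowest mean glucose over any `window` consecutive one-minute values."""
    sg = np.asarray(night_glucose_1min, dtype=float)
    if sg.size < window:
        raise ValueError("night-time trace is shorter than the window")
    # 'valid' convolution gives the mean of every run of `window` values.
    rolling_means = np.convolve(sg, np.ones(window) / window, mode="valid")
    return float(rolling_means.min())

night = np.concatenate([np.full(30, 6.0), np.full(30, 4.5), np.full(30, 5.0)])
print(fasting_proxy(night))  # about 4.5 (up to floating-point rounding)
```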

Event statistics
Studies may ask participants to report their meal times; where this is the case, GLU generates 3 statistics describing subsequent glucose levels: time to peak, and 1- and 2-hour post-prandial glucose levels (5).
Time to peak is calculated as the number of minutes from the meal to the next sensor glucose peak, i.e. the nearest subsequent sensor glucose value at time t where $SG_{t_1} < SG_t > SG_{t_2}$, and $t_1$ and $t_2$ are the nearest previous and subsequent time points to t, respectively, where $SG_{t_1} \neq SG_t$ and $SG_{t_2} \neq SG_t$. We cannot simply find the time point with a higher glucose value than the time points directly before and after, as the peak may consist of a plateau where multiple time points have the same value.
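The plateau-aware peak rule can be sketched as below: for each time point at or after the meal, we look outwards for the nearest earlier and later points whose values differ, and stop at the first point that is higher than both. This is an illustrative O(n²) implementation with a hypothetical name, and it returns `None` if no peak follows the meal.

```python
import numpy as np

def time_to_peak(times_min, glucose, meal_time):
    """Minutes from meal_time to the first subsequent (possibly plateaued) peak."""
    t = np.asarray(times_min, dtype=float)
    sg = np.asarray(glucose, dtype=float)
    n = sg.size
    for i in range(n):
        if t[i] < meal_time:
            continue
        j = i - 1  # nearest earlier point with a different value (t1)
        while j >= 0 and sg[j] == sg[i]:
            j -= 1
        k = i + 1  # nearest later point with a different value (t2)
        while k < n and sg[k] == sg[i]:
            k += 1
        if j >= 0 and k < n and sg[j] < sg[i] > sg[k]:
            return float(t[i] - meal_time)
    return None

# The peak is a two-point plateau at 7.0 starting 10 minutes after the meal.
print(time_to_peak([0, 5, 10, 15, 20, 25], [5.0, 6.0, 7.0, 7.0, 6.0, 5.0], 0))  # -> 10.0
```

A naive neighbour comparison would miss this peak, since neither plateau point is strictly higher than both of its immediate neighbours.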
The 1-hour and 2-hour post-prandial glucose measures are calculated as the AUC during the 15-minute period 1 and 2 hours, respectively, after the meal was recorded. We also calculate the 1- and 2-hour AUC for exercise and medication events, when this information is available.
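A sketch of the post-prandial measure, under an assumption the text leaves open: here the 15-minute window is taken to start exactly 1 (or 2) hours after the recorded event time, and the area is expressed per minute as in the overall AUC. Both choices, and the function name `postprandial_auc`, are illustrative; consult the GLU source for the exact window alignment and scaling.

```python
import numpy as np

def postprandial_auc(times_min, glucose, event_time, hours=1, window=15):
    """Trapezoid AUC per minute over [event + hours, event + hours + window].

    ASSUMPTION: the window starts at the 1 h / 2 h mark after the event.
    """
    t = np.asarray(times_min, dtype=float)
    sg = np.asarray(glucose, dtype=float)
    start = event_time + 60 * hours
    mask = (t >= start) & (t <= start + window)
    tw, gw = t[mask], sg[mask]
    if tw.size < 2:
        return None  # not enough data in the window
    area = np.sum((gw[1:] + gw[:-1]) / 2.0 * np.diff(tw))
    return float(area / (tw[-1] - tw[0]))

minutes = np.arange(0, 181)   # one-minute grid, 3 hours after a meal at t = 0
trace = 5.0 + minutes / 60.0  # glucose rising by 1 mmol/L per hour
print(postprandial_auc(minutes, trace, 0, hours=1))  # about 6.125
```

Because `event_time` is generic, the same function would serve for the exercise and medication events mentioned above.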

Data visualisation
The following plots are produced by GLU:
• Sensor glucose trace plots for all participants, which can be visually inspected. These plots also include indicators of events (where these are provided), including the timing of meals, exercise, use of relevant medications, and capillary blood glucose measurements. Identified outlier values and imputed time periods (as described above) are also shown on these plots.
• Poincare plots to illustrate the stability of each participant's sensor glucose levels (10,29,32).

Women's weight and height were measured at the clinic visit when the CGM device was inserted and used to calculate body mass index (BMI; kg/m²). We considered age, parity and gestational age at CGM measurement as potential confounding factors. Age and parity were reported by the woman; gestational age was calculated from the dates on which the CGM was worn and the woman's expected date of delivery based on her antenatal records (for the vast majority this would be based on a dating scan).

Analyses
Since GLU uses different thresholds for defining hypo-, normo- and hyper-glycaemia in pregnant compared with non-pregnant women, we divided our CGM instances into pregnancy and postnatal subsets. For the pregnancy subset, we ran GLU with the pregnancy argument.
For the postnatal subset, we used the default GLU settings (i.e. we did not specify any optional parameters). For both subsets, we ran GLU with the complete days approach (the default) and with the approximal imputation approach (by specifying the impute argument). We manually reviewed the trace and Poincare plots to determine whether there were any anomalies.
Poincare plots show how a person's glucose varies across moments in time (specifically one minute to the next, because GLU resamples CGM data to 1-minute intervals as a pre-processing step). A deviation from the trend along the ascending diagonal on this plot may reflect an erroneous sensor glucose value in the original CGM data, rather than true variation of glucose levels. Sensor glucose values will tend to vary smoothly on CGM trace plots so erratic changes shown on these plots may also indicate erroneous data.
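The quantities behind a Poincare plot are simple to compute: each one-minute value is paired with the next, and a point's perpendicular distance from the ascending diagonal (y = x) is |y − x| / √2. The hypothetical helper below returns these pairs and distances, so that the largest jumps can be listed for manual review alongside the plot.

```python
import numpy as np

def poincare_points(glucose_1min):
    """Pairs (SG_t, SG_t+1) and each pair's distance from the line y = x."""
    sg = np.asarray(glucose_1min, dtype=float)
    x, y = sg[:-1], sg[1:]
    distance = np.abs(y - x) / np.sqrt(2.0)
    return x, y, distance

x, y, dist = poincare_points([5.0, 5.1, 9.0, 5.2, 5.3])
print(int(np.argmax(dist)))  # -> 1: the jump from 5.1 to 9.0 sits furthest off the diagonal
```

To draw the plot itself, `x` and `y` can be passed to any scatter-plot function, for example matplotlib's `scatter`.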
We summarised our derived GLU summary variables at each of the 2 pregnancy and 2 postnatal time points using the median and interquartile range (IQR). We then examined the association between early pregnancy BMI (exposure) and GLU CGM-derived variables during pregnancy, using the 43 women with a measure during pregnancy. Of these 43 women, 32 had just one set of CGM data during pregnancy (18 in early and 14 in late pregnancy) and 11 had data for both early and late pregnancy. For the main analyses we used early pregnancy data for the 11 participants with data at both pregnancy time points. We also undertook a sensitivity analysis in which we instead used late pregnancy measures for these 11 participants. We used linear regression to estimate the association of BMI with the following glucose trace summary variables: overall mean glucose, MAD, sGVP, fasting glucose proxy, post-prandial time-to-peak, and post- … Table 6).
In analyses using both complete days and approximally imputed data, a higher BMI during pregnancy was associated with higher overall mean glucose levels during both the day- and night-time (as measured by AUC), more time spent in hyper-glycaemia during the night-time, and shorter post-prandial time to peak (Figure 2