## Abstract

**Importance** Hurricane Maria made landfall in Puerto Rico on September 20, 2017. As recently as May of this year (2018), the official death count was 64. After a study describing a household survey reported a much higher death count estimate, as well as evidence of population displacement, extensive loss of services, and a prolonged death rate the government released death registry data. These newly released data will permit a better understanding of the effects of this hurricane.

**Objective** Provide a detailed description of the effects on mortality of Hurricane Maria and compare to other hurricanes.

**Design** We fit a statistical model to mortality data that accounts for seasonal and non-hurricane related yearly effects. We then estimated the deviation from the expected death rate as a function of time.

**Setting** We fit this model to 1985-2018 Puerto Rico daily data, which includes the dates of hurricanes Hugo, Georges, and Maria, 2015-2018 Florida daily data, which includes the dates of Hurricane Irma, 2002-2004 Louisiana monthly data, which includes the date of Hurricane Katrina, and 2000-2016 New Jersey monthly data, which includes the date of Hurricane Sandy.

**Results** We find a prolonged increase in death rate after Maria and Katrina, lasting at least 207 and 125 days, resulting in excess deaths estimates of 3,400 (95% CI, 3,100-3,700), and 1,800 (95% CI, 1,600-2100) respectively, showing that Maria had a more long term damaging impact. Surprisingly, we also find that in 1998, Georges had a comparable impact to Katrina’s with a prolonged increase of 106 days resulting in 1,400 (95% CI, 1,200-1,700) excess deaths. For Hurricane Maria, we find sharp increases in a small number of causes of deaths, including diseases of the circulatory, endocrine and respiratory system, as well as bacterial infections and suicides.

**Conclusion and Relevance** Our analysis suggests that since at least 1998, Puerto Rico’s health system has been in a precarious state. Without a substantial intervention, it appears that if hit with another strong hurricane, Puerto Ricans will suffer the unnecessary death of hundreds of its citizens.

**Key Points** **Question:** How does the effect of Hurricane Maria on mortality in Puerto Rico compare to the effect of other hurricanes in Puerto Rico and other United States jurisdictions?

**Findings:** We estimate about 3,000 excess deaths after Maria, a higher toll than Katrina. Only other comparable effect was after Georges, also in Puerto Rico. For Georges and Maria, we observe a prolonged death rate increase of more than 10% lasting several months. The causes of death that increased after Maria are consistent with a collapsed health system

**Meaning:** Puerto Rico’s health system does not appear to be ready to withstand another strong hurricane.

## Introduction

Hurricane Maria made landfall in Puerto Rico on September 20, 2017, interrupting the water supply, electricity, telecommunications networks, and access to medical care. In early May 2018, the official death count stood at 64^{1}. This figure was in conflict with estimates obtained by groups that ostensibly had access to death counts for September and October from the demographic registry of Puerto Rico. By comparing these two numbers to historical averages, additional deaths were estimated to be in excess of 1,000^{2-5}. However, as of May 2017, the government of Puerto Rico was not releasing the 2017-2018 data.

On May 29, 2018 a multi-institutional study was published^{6}, here referred to as the *Harvard Study,* describing a survey of 3,299 households and reporting a death count estimate of 4,645 (95% CI, 793-8,498). It also reported an extensive loss of services after the hurricane. Perhaps most importantly, the study showed evidence of a sustained increased effect on mortality throughout this extended period. These findings underscored the importance of a careful analysis to determine if, for example, there was a systematic increase in deaths due to indirect effects, if a specific demographic was at greater risk, and what type of medical conditions needed most attention.

The Harvard study received worldwide media coverage and three days after its publication, while under significant public pressure and facing a lawsuit^{7}, the government finally made the data public and acknowledged the possibility of a higher death count of 1,427^{8}. This number is consistent with the value one obtains by subtracting the 2017 monthly counts from the 2013-2016 average using the newly released data (Supplementary Table 1): (2928-2399)+(3040- 2514)+(2671-2418)+(2820-2701)=1427. Santos-Lozada and Howard^{9} updated their previous estimate to 1,139 (95% CI, 1,006-1,272) for September, October and November. However, neither of these estimates took into account the population displacement described by the Harvard study and others^{10,11}. An analysis, posted online, took into account population displacement and suggested a larger count^{12}. A government commissioned study came to a similar conclusion and provided an estimate of 2,975 (95% CI:2,658-3,290) for the total study period of September 2017 through February 2018^{13}.

Here we use these daily counts and individual level mortality data to provide a detailed picture of the effect Hurricane Maria had on mortality in Puerto Rico. We compare the death rate increases to those observed in Hugo and Georges, two previous hurricanes in Puerto Rico, Katrina in Louisiana, Irma in Florida, and Sandy in New Jersey. We find a disturbing pattern in the Puerto Rico data.

## Methods

### Data

We set out to obtain detailed mortality and population size data related to hurricanes Hugo, Georges, and Maria in Puerto Rico, Katrina in Louisiana, Sandy in New Jersey, Harvey in Texas, and Irma in Florida. We requested individual death information, but this was not always available. We used whatever data was made available, which resulted in a mix of individual, daily and monthly data.

### Hurricanes Hugo, Georges, and Maria (Puerto Rico)

We requested daily death count data from the Department of Health of Puerto Rico and obtained data from January 1985 to December 2014. We also requested individual level information with no personal identifiers from the Department of Health of Puerto Rico and obtained individual records including date, gender, age, and causes of death from January 2015 to June 2018. We used these data to construct the daily counts for the 2015-2018 period. Exploratory data analysis showed that data were incomplete after May 31, 2018 (Supplementary Figure 1) and discarded data past this date. Yearly population estimates for the island were obtained from the Puerto Rico Statistical Institute. We computed daily population estimates via linear interpolation (Supplementary Figure 2A). To obtain an estimate of the population displacement after Hurricane Maria, we used population movement estimates from Teralytics Inc.^{14} (Supplementary Figure 2B). We combined these two datasets to obtain a final estimate of the population (Supplementary Figure 2C).

### Hurricane Irma (Florida)

We requested daily death counts from Florida’s Vital Statistic System and obtained data from 2015 to 2018. For consistency, we discarded data past May 31, 2018. Yearly population estimates were obtained from the US Census for 2015-2017. We computed daily population estimates using interpolation and extrapolated using a linear model for 2018. Furthermore, we used data provided by Teralytics Inc. to estimate changes in population due to immigration from Puerto Ricans due to Hurricane Maria (Supplementary Figure 2D).

### Hurricane Katrina (Louisiana)

We requested daily death counts from Louisiana’s Vital Statistic System and obtained data from 2003-2006. For Louisiana, we also obtained monthly death counts from the Underlying Cause of Death database through CDC WONDER for 2000-2008^{15}. These two datasets did not match for the months following Hurricane Katrina (Supplementary Figure 3). Since the data for August 2005 matched, we used the daily data to divide the monthly counts for August into before, during, and after the hurricane counts. We obtained population estimates from the US Census and computed daily population estimates via linear interpolation (Supplementary Figure 2E).

### Hurricane Sandy (New Jersey)

We obtained monthly death counts for New Jersey from the Underlying Cause of Death database from 2000-2016. We also obtained yearly population estimates from the US Census and interpolated to obtain monthly estimates. (Supplementary Figure 2F)

### Hurricane Harvey (Texas)

We requested daily death counts from Texas’ Vital Statistic System, but our petition was denied. The Underlying Cause of Death database does not have data available for 2017.

### Statistical Methods: Daily counts

We assumed that the death counts *Yi,j* for the *j*-th day of the *i*-th year follow a Poisson distribution with rate:

Here *N i,j* is an offset to account for the changing population size, α_{i} accounts for the year-to-year variability not due to hurricanes, *s(j)* is the seasonal effect for the *j-*th day of the year, *t*_{i,j} = 365 * (*i* - 1) + *j* is time in days, and *f(t*_{i,j}*)* accounts for the remaining variability not explained by the Poisson variability. So, for example, a virus epidemic will make *f(t*_{i,j}*)* go up slowly, eradication of this epidemic will make *f(t*_{i,j}*)* go down slowly, and a catastrophe will make *f(t*_{i,j}*)* jump up sharply. We therefore assume *f(t*_{i,j}*)* is a smooth function of *t*_{i,j} except for the days hurricanes make landfall in which the function may be discontinuous.

Since *s*(*j*) is seasonal, we use Fourier’s theorem and model it as:

We include an intercept *μ*, which represents the baseline rate for the entire period being studied. We assume that *f(t*_{i,j}*)* is a natural cubic spline with *L* equally spaced knots *τ*_{1}, …, *τ*_{L}, except that the closest knot to the hurricane day is changed to be exactly on the hurricane day and we permit a discontinuity at this knot. Since natural cubic splines can be represented as a linear combination of basis function and *S(j)* is a linear combination of known functions, ours is a generalized linear model (GLM) and, in theory, we can estimate.α_{i}, *s*, and *f* using maximum likelihood estimates. However, because we want *f* to be flexible enough to capture relatively high-frequency signals, we instead implement a modular approach that first estimates.α_{i} s the *s* and 0, then uses these as offsets to estimate *f* Specifically, we assume that for the non-hurricane years, α_{i}+ *f(t*_{i,j}*)* average out to 0 across years and use this assumption to obtain the MLE for *S(j)*, denoted with . We then use . as an offset to estimate the year-to-year deviations α_{i} using only months not affected by hurricanes (March-August). We then use the estimate as an offset, estimate *f(t*_{i,j}*)* with the MLE and use standard GLM theory to estimate standard errors. Finally, due to lack of data to estimate α_{i} for 2018, we extrapolated from the estimates from the previous year (Supplementary Figure 4).

For the seasonal effect we use *K=3* since it results in an estimate that captures the general shape of the seasonal trend (Supplementary Figure 5). We used 4 knots per year to model *f(t*_{i,j}*)* as this results in a smooth estimate that captures the trend observed in the data (Supplementary Figure 6). We removed years 2001 (Supplementary Figure 6D) and 2014 (Supplementary Figure 6F) for the computation of the seasonal effect: in 2001 there appears to be undercounts in January and in 2014 we see an increased mortality rate in agreement with the Chikungunya epidemic. Diagnostics plots for the residuals, after removing 2011, further show that the model fits the data (Supplementary Figure 7). For the monthly data, we fit a monthly version of this model (See supplementary methods for details).

### Excess death estimates

We first determined the period of indirect effect for each hurricane. To do this, we defined the interval starting on landfall day, denoted here with *t*_{0}, until the first day, *t*_{i,j}> *t*_{0}, for which *f(t*_{i,j}*)* ≤0. Because we do not observe *f(t*_{i,j}*)*, and instead obtain an estimate, denoted here with , we took the conservative approach of defining the period with the day for which the lower part of a marginal 95% confidence interval crosses 0: . We denoted this interval with *I* and defined the excess deaths by adding the observed deaths minus expected deaths for every day in the interval:

We construct a 95% confidence interval using the following approximation for the Poisson model:

### Natural variability

We quantify the day-specific variability with the observed standard deviation across non-hurricane years for *f(t*_{i,j}*)*. Because of years like 2001 and 2014, we estimate this standard error with the medial absolute deviation. We refer to this as the *natural variability.*

### Cause of Death

To examine if any cause of death was more prevalent after the hurricane, we used the individual records data spanning 2015-2017. We did not include 2018 data because it appears incomplete (Supplementary Figure 8). We divided causes of death into 30 categories (Supplementary Table 2) and, for each of these, we computed the observed death rate during the September 20 - December 31 period for 2017. We then compared these to the expected rates computed with the 2015-2016 data for that same period. We used the Poisson model to compute confidence intervals. To estimate a daily effect for a cause of death, we fit the Poisson GLM model described above.

### Code

All the code used to implement the procedures described above are included in a GitHub repository: (https://github.com/rafalab/maria). We conducted our analysis using R version 3.4.4.

## Results

### Indirect effects

The death rate in Puerto Rico increased by 73.9% (95% CI, 63.1%-85.4%) the day after Hurricane Maria made landfall (Figure 1, Supplementary Figure 6G) and did not return to the historical average until at least April 15, 2018. During this period, the average increase was 22% (Figure 1, Supplementary Figure 6G). The effects of Katrina were much more direct. On August 29, 2005, the day the levees broke, there were 834 deaths, which translates to an increased in death rate of 689%. However, the increase in mortality rates for the four months following this catastrophe were 17%, 9%, 11%, and 5% percent respectively (Supplementary Figure 9). For Georges, we observed a similar pattern to Maria: a sharp increase to 33% (95% CI, 25%-41%) on landfall day and the death rate not returning to the historical average until January 5, 1999 (Figure 1, Supplementary Figure 6C). The average percent increase in this period was 9%. None of the other hurricanes examined had noticeable indirect effects.

### Excess deaths

We estimated excess deaths of 3,433 (95% CI, 3,189-3,676) for Maria in Puerto Rico, 1,832 (95% CI, 1,600-2,064) for Katrina in Louisiana, and 1,427 (95% CI, 1,247-1,607) for Georges in Puerto Rico (Figure 2). Note that the numbers in the abstract are rounded to the closest 100 for consistency with the precision implied by standard errors of about 100 deaths. These estimates were calculated over periods of 207, 125, and 106 days after landfall respectively. We note that the way in which these deaths accumulated through time were distinctively different. Namely, 39.9% of the excess deaths associated with Katrina occurred on the day the Levee’s broke, while for the Puerto Rico hurricanes, the excess deaths accumulated slowly through a period of months after landfall (Figure 1, Supplementary Figure 6C, Supplementary Figure 6G).

### Cause of Death

Increases in rates were not uniformly seen across all cases of death after Hurricane Maria. Instead, we observed increases for a subset of the causes (Figure 3A). Not surprisingly, storm related deaths showed the largest increase. However, in terms of total excess deaths diseases of the circulatory, nervous, endocrine and respiratory systems explained well over 65% of deaths until at least December 31, 2017 (Table 1, Figure 3B). The indirect effects were for these causes were substantial (Figure 3C).

### Death Rate by Age and Gender

The most affected demographic groups were, from most to least: 80 years and older, 70-79, and 60- 69 and 50-59 with this last group mostly affected by indirect effects (Figure 4). Although younger demographics were not significantly affected (Supplementary Figure S10), these results demonstrate that a large proportion of the population was indeed affected by the indirect effects. We found no significant differences between genders (Supplementary Figure 10).

### Natural variability in excess death rates

The standard deviation of taken across non-hurricane years translated to increases of about 4%, with values as high as 6% in parts of the winter (Supplementary Figure S11A). As a result, for a period of, for example, 103 days (September 20 to December 31) these levels of variability translate into a standard deviation of excess deaths of about 300. For a period of 207 days, the standard deviation grows to about 500 (Supplementary Figure S11B).

## Discussion

On August 29, 2005, a surge due to Hurricane Katrina breached levees and flooded several residential areas in the New Orleans area. This was catastrophic and caused over 600 direct deaths and more to 1,500 indirect deaths in the following four months. In a June 2006 report^{16}, the U.S. Army Corps of Engineers admitted that faulty design specifications, incomplete sections, and substandard construction of levee segments, contributed to the damage and $14.5 billion^{17,18} has been invested in constructing stronger levees^{18}. On September 21, 1998, Hurricane Georges made landfall in Puerto Rico causing great damage to an already fragile electrical grid. The raw mortality data shows a disturbing increase in mortality rates (Supplementary Figure S12) and using a formal statistical model, we estimated close to 1,400 excess deaths due to this hurricane, an overall impact similar to that of Katrina. In contrast to the response in Louisiana, as far as we know, no systematic effort was put in place to improve Puerto Rico’s electrical grid nor its fragile health system. On the contrary, it appears that the grid continued to deteriorate for the next 19 years^{19}. Tragically, after Hurricane Maria made landfall the electrical grid was destroyed leaving 100% of the population^{20}, including health facilities, without electricity. It has been well documented that restoration of the electrical grid has been slow^{21}, with some estimates reporting that only 30% of the population had electricity a month after the tropical storm^{22}. We estimate that, as a result of this fragile health system, a large proportion of the population was affected and as many as 3,000 excess deaths occurred. The insights presented in our analysis should be considered in preparation efforts for the next hurricane.

It is important to note that our estimates and confidence intervals are for the excess deaths above the expected count and that, given the observed variation in non-hurricane years, we can’t attribute all these deaths exclusively to the hurricane, nor can we say that the toll was actually much higher. The same is true for other published estimates^{4,9,13}. If we include this natural variability in the uncertainty estimates, the confidence intervals would grow by as much as 1,000 on each limit. This underscores the importance of going beyond analyses based on just monthly counts and historical averages. For example, the sharp jump and then slow decrease in death rate observed on landfall day (Figure 1) provides strong evidence that the observed increases are in fact due to hurricane. Studying specific causes of the excess deaths (Figure 3, Table 1), which do not show increases for causes such as murders, viruses, and cancer, provide further support for the hurricane being the main cause.

In our analysis, the number of knots defining our splines, the number of harmonics in the seasonal effect, and the way we extrapolate to estimate the 2018 death rate were chosen by visual inspection. We therefore performed sensitivity analysis on these choices. Changing the number of knots or the number of harmonics had negligible effects (Supplementary Figure S13 and Supplementary Figure S14). The choice of the expected post Maria death rate did have a larger effect. If we assumed a lower death rate (consistent with unhealthy individuals leaving) the excess death estimate increased and if we assumed a higher death rate (consistent with younger healthier individuals leaving) the excess death estimate decreased (Supplementary Figure S15). We also examined how the estimate changed with different population estimates (Supplementary Figure S15). None of the changes led to a different overall conclusion. The code needed to recreate all the figures and tables is available at: https://github.com/rafalab/maria and we invite others to use our publicly available code and data to try out other approaches.

## Acknowledgments

We thank María M. Juiz Gallego and Jose A. López Rodríguez from the Department of Health of Puerto Rico for diligently providing all the data we requested. We thank the Puerto Rico Institute of Statistics for providing population data. We thank Canay Deniz, Andrea Samdahl, Lara Montini and Ilya Vasilenko from Teralytics for sharing their data and providing helpful explanations. We thank Matthew Kiang for many valuable demography insights and Deepak Lamba Nieves for suggesting readings on Puerto Rico’s electrical grid. We thank Raul Figueroa for critiques that led us to perform the sensitivity analysis. Finally, we thank all the authors of NEJM paper whose publication appears to have resulted in the government releasing the data. In particular, we thank Caroline Buckee for getting it all started.

This findings and conclusion are those of the authors and do not necessarily represent the official position of the Florida Department of Health.

## Footnotes

* rafa{at}jimmy.harvard.edu