REPLICCAR II Study: Data Quality Audit in the Paulista Cardiovascular Surgery Registry

Bianca Maria Maglia Orlandi; Omar Asdrubal VilcaMejia; Maxim Goncharov; Kenji Nakahara Rocha; Lucas Bassolli; Luiz AF Lisboa; Luis AO Dallan; Fábio B Jatene

doi:10.1101/777243

Abstract

Background Electronic health records databases are important sources of data for research and health practice. The aim of this study was to assess the quality of the data in REPLICCAR II, the Brazilian cardiovascular surgery database based in São Paulo State.

Study Design The REPLICCAR II database contains data from 9 institutions in São Paulo, with more than 700 variables. We audited data entry at 6 months (n=107 records) and 1 year (n=2229 records) after the start of data collection. We present a modified Aggregate Data Quality Score (ADQ) for 30 variables in this analysis.

Results The agreement between the data independently entered by a database operator and a researcher was good for categorical data (Cohen κ = 0.70, 95%CI 059, 0.83). For continuous data, the intraclass coefficient was high for all variables, with only 2 of 15 continuous variables having an ICC of less than 0.90. In an indirect audit, 74% of the selected variables (n = 23) showed a good ADQ score, regarding completeness and reliability.

Conclusions Data entry in the REPLICCAR II database is satisfactory and can provide accurate and reliable data for research in cardiovascular surgery in Brazil.

Introduction

The very foundation in Healthcare Science of its clinical studies, trials, and follow-ups is the quality of the data collected. Despite the lack of consensus regarding a standardized method to measure healthcare data quality, it is of utmost importance to establish the confidence and validity of the outcome. Hence, the research design, the variable selection and the gathering of the data are pivotal points to assert the reliability of the conclusion achieved¹.

As we may all know, observational studies are subject to bias, confounding, and a badly established retrospective registry. Publications like Zhang et al., 2014, Salati et al., 2016, and Dreyer et al., 2016 reported various approaches on how to devise data validation tools aimed at guaranteeing the quality of the results needed for decision making^2-5. In the same way, the developing of reliable databases in healthcare is essential, because they will not only be used as the basis for much academic research, but also to evaluate and derive guidelines, leading to the improvement of healthcare decision making^6-10. In the Cardiovascular and Thoracic Surgery field, the initiatives taken by the STS and European Society of Thoracic Surgeons (ESTS) Databases should, thus, be emphasized, because they both have aimed, since their inception, at gathering not only massive but also reliable data. This has direct implications for clinical outcomes, especially regarding mortality. The classical paradigm between volume and successful outcomes in cardiovascular surgery is currently being questioned regarding low-volume and quality emphasis programs^11-18, with some showing that just adhering to a quality improvement initiative could already impact mortality rates¹⁹.

The development of the Paulista Registry of Cardiovascular Surgery (REPLICCAR II), a multicenter prospective cohort study coordinated by the Instituto do Coração do Estado de São Paulo (InCor) aimed at evaluating morbidity and mortality predictors in patients undergoing coronary artery bypass graft (CABG) surgery and constitutes a definite example of the concept. Data collection and analysis were performed according to its guidelines set by professionals from different areas forming an interface between research and clinical practice. The adoption of quality-oriented data analysis then becomes imperative to assure the validity of its outcomes with the intent of enhancing its prospective clinical impact⁶.

The aim of the present study was to present the results of direct and indirect audits of the data quality of the registries included in the REPLICCAR II database after 6 months and 1 year.

Material and Methods

Data source and collection

This project included 9 institutions in the State of São Paulo: (i) Instituto do Coração do Hospital das Clínicas da Faculdade de Medicina da USP (InCor), (ii) Hospital Beneficência Portuguesa de São Paulo (Hospital BP), (iii) Hospital TotalCor, (iv) Hospital de Base de São José do Rio Preto (HBSJRP), (v) Hospital Albert Einstein da Sociedade Beneficente Israelita Brasileira (HIAE), (vi) Instituto Dante Pazzanese de Cardiologia (IDP), (vii) Santa Casa de Misericórdia de São Paulo (SCSP), (viii) Santa Casa de Misericórdia de Marília, and (ix) Hospital de Clínicas da Universidade Estadual de Campinas (HC-UNICAMP). The study thus strived to analyze public and private reference hospitals linked to institutions like philanthropic organizations and universities.

REPLICCAR II includes more than 700 variables, among which are factors related to general facts about the patients, their pre-, intra-, and postoperative assessments and their 30-day follow-up. Data for this study began being collected in August 2017, with each participant center responsible for mobilizing a team for the task, as well as being free to designate the person responsible, usually a medical resident. All researchers responsible for the gathering were previously trained on how to fill out the forms correctly.

Data gathering was performed by using the online platform REDCap-HCFMUSP (Vanderbilt, Tennessee, EUA/https://redcap.hc.fm.usp.br/), accessible from any computer with an internet connection, with access restricted to selected researchers. The data are stored in real time at a safe server at the University of São Paulo Medical School. This project was approved by this institution’s ethics committee, under the protocol number 2016/15163-0. Funding was provided by FAPESP (PPSUS).

Direct audit

A direct audit was carried out after 6 months of data collection; 7% (107 records) of the data collected at each center until February 2018 was randomly selected with STATA 13.1 software (StataCorp, Texas, USA), and for re-collection as performed by experienced independent investigators (auditors) within the team, who visited each center for this task. The auditors, with full access to each center’s own previously available database, re-collected these data, under two fundamental conditions: (i) that they were blinded to the original record and (ii) that each one did not re-collect the same data they had originally input. The original and the re-collected data then underwent statistical analysis to check for accuracy in data collection.

Indirect audit

After 1 year of data collection, due to the amount of data and the lack of financial and human resources, a direct audit was impractical. Thus, an indirect audit was performed. This time, all the 2229 records were analyzed, and 30 variables related to pre-, intra-, and postoperative factors were selected. These data then underwent statistical analysis.

Statistical analysis

For the direct audit, the data were analyzed using the program STATA version 13.1 (StataCorp, Texas, USA). Cohen’s kappa coefficient (κ) was taken for categorical variables to estimate the change in agreement occurring simply at random between raters and the observed variables.

Kappa coefficient (κ) was reported as⁷: (i) fair, when between 0.21 and 0.40; (ii) moderate, when between 0.41 and 0.60; (iii) substantial, when between 0.61 and 0.80; and (iv) almost perfect, when between 0.81 and 1.00⁷. The Intra Class Correlation Coefficient (ICC) was determined for continuous data variables by the assumptions of raters with similar characteristics and for evaluating rater-based clinical assessment methods (2-way Random-Effects Model for reliability of agreement). ICC varies between 0 and 1, with the first one suggesting no agreement, whereas the second one suggests perfect agreement. Values lower than 0.50 are indicative of poor reliability. Those between 0.5 and 0.75 were considered moderate, those between 0.75 and 0.9 good, and those higher than 0.90 were considered of excellent reliability⁸.

For the indirect audit, we focused on the analysis of the completeness and reliability of all the data included in REDCap. In this way, we adapted the methodology suggested by Salati et al, implemented by the REPLICAR II responsible team as follows³:

Completeness (COM) = (1 – [‘null values’/total expected values]) × 100

Reliability (REL) = (1− [‘inconsistent values’/ total expected values]) × 100

Rescaled COM = COM of the Unit − (average COM of all the examined Units/standard deviation of all the examined Units)

Rescaled REL = REL of the Unit − (average REL of all the examined Units/standard deviation of all the examined Units)

Aggregate Data Quality Score (ADQ) = Rescaled COM + Rescaled REL

The ADQ value illustrates the final score for both completeness and reliability. Thus, the closer it gets to zero, the closer the data will be to the expected average, showing the quality of data in a simplified way.

Results

Direct auditing

A total of 107 random records for direct data analysis collected in the initial 6 months of the REPLICCAR II Study (10% of the total sample) were audited. Table 1 summarizes the data of the direct audit of the variables selected for analysis. The observed inter-rater agreement occurring by chance (randomly) had an average κ of 0.70, with a standard error of 0.06 (95%CI 059-0.83). The analysis of each variable was mostly substantial (n = 4) to almost perfect (n = 3) κ coefficient, whereas 2 variables had a moderate κ coefficient.

View this table:

Table 1.

Direct Audit: Inter-Rater Agreement and Estimated κ coefficient of categorical variables. REPLICCAR II, 2019.

Table 2 presents the Intraclass Correlation Coefficient, analyzing the acceptable reliability of the direct audited variables. Preoperative hemoglobin had an average ICC of 0.70, but it later became clear that there were many different admissions in laboratory examinations, hence promoting disagreement about this situation.

View this table:

Table 2.

Direct Audit: Intraclass Correlation Coefficient (ICC) of numerical variables with, two-way random effects model. REPLICCAR II, 2019.

In this sample, the data for glycated hemoglobin, total bilirubin, and albumin levels were insufficient for analysis, but these variables are not mandatory in the registry.

Indirect auditing

After 1 year of data collection, an indirect audit was conducted, regarding completeness and reliability for 30 variables selected as relevant for risk analysis for CABG mortality.

Table 3 provides completeness and reliability of the variables included for the current evaluation and the ADQ of the study. Among the variables with less than 90% completeness (COM) and low ADQ score in the preoperative period were (i) total bilirubin (14.02%), (ii) total albumin (21.77%), (iii) HbA1c (41.73%), (iv) glucose (60.50%), and (v) ejection fraction (75.13%). In the postoperative period, there were only 2 variables in this condition: (i) ejection fraction (22.32%) and (ii) glucose (73.87%). The ADQ score for BMI on the other hand, was −4.7 but was related to the reliability. The remaining variables had more than 90% completeness and reliability.

View this table:

Table 3.

Aggregate Data Quality (ADQ) composite of completeness and reliability of REPLICCAR II database, 2019.

Figure 1 shows that 74% of the records (n = 23) had an acceptable ADQ score, considering that the positive values had an ADQ score larger than the sample mean, which can be considered as good data quality. The values under the first quartile were considered relevant for review.

Figure 1.

ADQ scale for the variables analyzed in the indirect audit. REPLICCAR II, 2019.

We propose criteria and definitions (Table 4) for some variables in the REDCap tool, including the BMI variable, with ranges for weight and height. The investigators will then receive an alert for each data imputation when the data are out of these determined values, thus, guaranteeing better reliability, data consistency, and acceptable ranges. The other variables with low completeness are not mandatory but reflect our reality.

View this table:

Table 4.

Rules applied to the RedCAP tools after the indirect audit considering better criteria and definitions to improve data quality in REPLICCAR II Study, 2019.

In summary, the REPLICCAR II study had satisfactory concordance and correlation in the first stage. However, the results of the indirect analysis were essential to develop methods for data confidence and quality improvement.

Discussion

The outcome variable (operative death) had 85% completeness and 92% reliability. Among the inconsistencies related to mortality, we verified that the cases of intraoperative death were negligible for the variable operative death. To rectify such in reliability, we inserted in the system a script that considers the cases of surgery without admission to the intensive care unit at the immediate postoperative period to count as death on the day of surgery. “Mortality in Hospital” or “30-Day Vital Status” had 96% completeness and 99.7% reliability.

Patients with unknown or incomplete 30-day mortality status could potentially introduce bias. However, “in-hospital mortality” completeness was almost fully recorded, and represents the vast majority of 30-day deaths. The use of simulations suggests that any errors committed by the false assumption that a patient with missing or unknown mortality status is alive have negligible influence on hospital mortality results, compared with random sampling error¹⁴; however, that doesn’t mean the registry is completely biased or inconsistent with reality.

The STS established a conceptual framework of quality measurement in Adult Cardiac Surgery with a comprehensive methodology for assessment of adult cardiac surgery quality of care. Among the quality indicators are the possibility of temporal evaluation (pre-, intra- and postoperative) and consideration for structures, processes, and outcomes. The quality scores should be interpretable and actionable by providers. The design of comprehensive quality measures involves clinical, health policy, and statistical considerations⁹. Due to several limitations, there is a high demand for the development of data quality analysis tools in healthcare. The use of these tools to assure completeness, reliability, and accuracy are essential to the validity in observational research, considering that those studies already have possible bias.

Lauricella et al, 2018⁶, published a data quality analysis on a similar initiative of developing such a database. The Paulista Lung Cancer Registry (PLCR), also developed by InCor, cannot be directly compared with REPLICCAR II, because the parameters are different. However, we can analyze some parameters, such as the COM. Within their 511 analyzed records, 21 of 105 variables (20%) had a COM < 0.90%. In our study, 7 of the 30 variables (23.33%) had the same results. The work by Salati et al, 2011¹, showed that 5 of 15 variables (33.33%) selected for the completeness study had COM < 90%. This way, we can consider that, in this parameter, the REPLICCAR II study showed excellent results. Considering that recent works are proposing this parameter as one of the most accessible for initial data quality studies¹, our study is within this trend.

In analyzing the direct audit results, it is important to remember that ICC considers the fact that close numerical values might be concordant even though they are different. This has important implications in a clinical study because different values within a close range (normal variance) will show good ICC values. Considering that different researchers (or even the auditor) may collect exam values from different dates for the same subject, these values may show good ICC results in the statistical analysis⁶. We can say that our ICC had good values, with only 2 of 15 variables (13.33%) showing an ICC of less than 0.90. Our lowest ICC value was 0.70 for preoperative hemoglobin. In Lauricella et al, 2018, the same values were found in 5 of their 12 numerical variables (41.66%), and their lowest ICC value was 0.51 for time from first symptom. The comparison, however, cannot be applied directly between our groups, because completely different parameters were studied in each work.

However, we must remember that these results do not present improvement strategies for the quality of the records because we cannot pinpoint the data collection error solely based on these parameters. Newly proposed parameters, such as the ADQ, may provide faster, more practical and low-cost analysis of generic data quality. Another evaluation that could be performed was the ADQ by each center, which could then be used to guide the centers about the strengths and weaknesses of their study variables and thus help them improve the quality of their data.

In the United States, there has been much discussion about the applicability of Big Data analysis to Healthcare. There is still great reluctance to adhere to the adoption of electronic systems in clinical practice, usually remaining restricted to administrative and financial analyzes¹¹. In 2009, the Health Information Technology for Economic and Clinical Health Act (HITECH) invested between US$ 14b and US$ 27b in the adoption of electronic systems in hospitals⁹. Consequently, the generation of massive data presents its own analysis challenges, especially when it comes to information security. In the United States, it is estimated that in 2011 there were already 150 exabytes of health-related information¹². Such reality clearly fits Big Data systems, already well established in different industries and with clearly effective results, like Google’s search engine, Amazon’s buying suggestions, and even in election campaigns¹³.

In Brazil, the Ministry of Health, through the Research Program of Sistema Único de Saúde (PPSUS), is an example of not only the potential of applying technology and databases in healthcare, but also the relevance of initiatives like REPLICCAR II. In São Paulo, Brazil’s largest city, there is a large concentration of big centers, researchers, skilled labor, and a massive convergence of people from various regions of the country who come to seek care in its hospitals. Thus, although it is a registry from the State of São Paulo, the diversity of the served population allows for greater data refinement. In the end, with investment and the management of a single project, we can obtain applicable analysis to different subpopulations, as recommended by the PPSUS Technical Guidelines, 5th edition, of 2014. This work also points to the importance of changing the health sciences data paradigm. By providing a qualified database, such as the STS or ESTS, the personal cost of exposing your data comes back in the long term by providing reliable community data, thus allowing the elaboration of larger and safer guidelines.

For decades, cardiac surgeons have collected and analyzed data systematically to continually improve outcomes in quality of care⁹. Health information technology (IT) has the potential to improve individuals’ health and provide better clinical performance, better healthcare quality, lower costs, and greater involvement of patients in their own health services⁹. The growing development of Big Data and therefore its use in healthcare science seems inevitable, and many initiatives are showing a prospective future in this area. However, if we cannot ensure the quality of the data in the analysis¹⁵, we will not succeed in deriving trustworthy and, ultimately, valid conclusions, regardless of the analytical model we may use¹⁶.

Among the benefits of artificial intelligence (AI) are (i) better knowledge of the general population, (ii) the comparison of patient samples in each institution, (iii) the evaluation of association and risk with predictive instruments for outcomes of interest, (iv) the elaboration of risk scores, (v) better planned strategies for the improvement of quality and safety in healthcare, and (vi) protocols based on reality and the available resources for each population subgroup.

Limitations

As expected from such a pioneering project, there were many challenges regarding the education of the professionals engaged in the collection of data and to ensure REPLICCAR data quality, as shown by the unexpected discrepancies in our results. Considering solely the analysis made (κ coefficient and ICC), we cannot point out the causes of these errors.
We found satisfactory concordance and ICC, but these results only show the capacity of the investigators to collect data in the first phase of the study (6 months after beginning). Considering that most centers designated their medical residents for the collection of data, we cannot ensure the long-term adherence of the centers and professionals to our proposition, because it is expected that a short- to medium-term rotation of these professionals will occur.
The work was limited due to its financial and human resources. Thus, a direct audit or a more restricted follow-up of the centers was impracticable. Nonetheless, faced with this, our team looked further for new perspectives in the data quality analysis, such as the ADQ, thus contributing to the development of this area.
This work points out that there is still a long way to go before developing a Brazilian national database comparable to the STS or the ESTS databases. We still cannot assure the adherence of professionals, researchers, healthcare centers, as well as the government to the promotion and adoption of electronic registries; nonetheless, the development of a consensus for a broad database is growing.

This work shows the seriousness and commitment to this project, already concerned not only with its development and implementation, but also the quality of its data.

Conclusion

The reliability and completeness of medical records are essential to the validity and reliability of the results obtained. Indirect auditing gave clear directions for data improvement, without the need to recollect a sample to evaluate concordance.

The best strategies based on our experience to improve data quality in a way the information can be reviewed in the moment the investigator is filling data, are periodical reports with detailed feedback and, above all, to maintain a sound scientific partnership with regular meetings to integrate well with working groups in each institution.

Findings of a discrepancy between the data only reinforces the need for quality-oriented statistical studies, because it directly influences the validity, the analysis, and conclusions performed in research. In places where such studies and their application are still underdeveloped, like in Brazil, studies in this field become even more indispensable. Focus on data quality is a sure factor that ultimately leads to a more efficient and safer healthcare system, and it will surely play an increasing major role in its development. The main objective of the present work was to implement improvement actions in a way that guarantees safety and validity to the results, as well as allowing feedback on REPLICCAR II itself. As an STS-based database, this project could provide the basis for a wider and reliable quality-focused program, with the prospect of a positive impact on clinical outcomes.

Our experience reinforces the importance of training, incentives, and standardization of the staff who collect the data and fill out the forms, which brings greater benefits and substantially lower costs than the direct auditing with the still traditional Raters Agreement Analysis. The latter demands more investigators to collect the data at each institution, extensive data analysis periods, and results related to the understanding of concepts and criteria. The indirect auditing was more practical in elaborating strategies for data quality improvement. ADQ considers the completeness and reliability of each variable in the study and shows the best parameters of data quality in prospective observational studies. It is therefore expected that it will attract more attention in studies yet to come, although there is still a lot of room for research in parameters for measuring data quality in the healthcare sciences.

Funding

This work was supported by the Programa de Pesquisa do SUS (PPSUS).

Conflict of interest

none declared.

Acknowledgements

REPLICCAR II Study Group, Proofreading by Mr. Rafael Frate.

Abbreviations

ADQ: Aggregate Data Quality Score;
CABG: coronary artery bypass graft;
ESTS: European Society of Thoracic Surgeons;
ICC: Intra Class Correlation Coefficient.

References

1.↵
Salati M, Brunelli A, Dahan M, et al. Task-independent metrics to assess the data quality of medical registries using the European Society of Thoracic Surgeons (ESTS) Database. Eur J Cardiothoracic Surg 2011;40(1):91–8.
OpenUrl CrossRef PubMed
2.↵
Zhang R, Jayawardene V, Indulska M, et al. A data driven approach for discovering data quality requirements. ICIS 2014 Proc. 2014;1–10.
3.↵
Salati M, Falcoz PE, Decaluwe H, et al. The European thoracic data quality project: An Aggregate Data Quality score to measure the quality of international multi-institutional databases. Eur J Cardiothoracic Surg 2016;49(5):1470–5.
OpenUrl CrossRef PubMed
4.
Dreyer NA, Bryant A, Velentgas P. The GRACE Checklist: A validated assessment tool for high quality observational studies of comparative effectiveness. J Manag Care Spec Pharm 2016;22(10):1107–13.
OpenUrl
5.↵
Seder CW, Falcoz PE, Salati M. International General Thoracic Surgery Database Collaboration. Thorac Surg Clin. 2017;27(3):303–13.
OpenUrl
6.↵
Lauricella LL, Costa PB, Salati M, et al. Measurement of the inter-rater reliability rate is mandatory for improving the quality of a medical database: experience with the Paulista Lung Cancer Registry. J Am Coll Surg 2018;226(6):1128–36.
OpenUrl
7.↵
Shweta, Bajpai R, Chaturvedi H. Evaluation of inter-rater agreement and inter-rater reliability for observational data: an overview of concepts and methods. J Indian Acad Appl Psychol 2015;41(3 [Special Issue]):20–7.
OpenUrl
8.↵
Koo TK, Li MY. A guideline of selecting and reporting intraclass correlation coefficients for reliability research. J Chiropr Med 2016;15(2):155–63.
OpenUrl CrossRef PubMed
9.↵
Shahian DM, Edwards FH, Ferraris VA, et al. Quality measurement in adult cardiac surgery: Part 1-conceptual framework and measure selection. Ann Thorac Surg 2007;83(4 SUPPL).
10.↵
Buntin M, Burke M, Hoaglin M, Blumenthal D. The benefits of health information technology: A review of the recent literature shows predominantly positive results. Health Aff 2011;30(3):464–71.
OpenUrl Abstract/FREE Full Text
11.↵
Audet AM, Doty MM, Peugh J, et al. Information technologies: when will they make it into physician’s’ black bags? MedGenMed 2004;6(4):2.
OpenUrl PubMed
12.↵
Raghupathi W, Raghupathi V. Big data analytics in healthcare: promise and potential. Heal Inf Sci Syst 2014;2(1):3.
OpenUrl CrossRef PubMed
13.↵
Murdoch TB, Detsky AS. The inevitable application of big data. JAMA. 2014;309(13): 1351.
OpenUrl
14.↵
Shahian DM, Torchiana DF, Engelman DT, et al. Mandatory public reporting of cardiac surgery outcomes: The 2003-2014 Massachusetts experience. J Thorac Cardiovasc Surg 2019;158(1):110–24.
OpenUrl
15.↵
Wang RY, Strong DM. Beyond accuracy: what data quality means to data consumers. J Manage Inform Syst 1996;12:5–33.
OpenUrl CrossRef
16.↵
Calvert M, Thwaites R, Kyte D, Devlin N. Putting patient-reported outcomes on the ‘Big Data Road Map’. J R Soc Med 2015;108(8):299–303.
OpenUrl CrossRef PubMed
17.
Shahian DM, O’Brien SM, Normand SL, et al. Association of hospital coronary artery bypass volume with processes of care, mortality, morbidity, and the Society of Thoracic Surgeons composite quality score. J Thorac Cardiovasc Surg 2010;139(2):273–82.
OpenUrl CrossRef PubMed Web of Science
18.↵
Kurlansky PA, Argenziano M, Dunton R, et al. Quality, not volume, determines the outcome of coronary artery bypass surgery in a university-based community hospital network. J Thorac Cardiovasc Surg 2012;143(2):287–93.
OpenUrl CrossRef PubMed Web of Science
19.↵
Miyata M, Motomura N, Murakami A, Takamoto S. Effect of benchmarking projects on outcomes of coronary artery bypass graft surgery: Challenges and prospects regarding the quality improvement initiative. J Thorac Cardiovasc Surg 2012;143(6):1364–9.
OpenUrl CrossRef PubMed Web of Science