Novel method to determine diagnosis-defining refraction points

Diagnosis of a disease generally relies on definitions established by professional medical societies and comprises the patient's history, physical examination, and test results. Test results include body measures such as body mass index (BMI) and laboratory tests such as serum creatinine and albumin in urine samples. In general, laboratory reference values are derived by mathematical methods, e.g. by defining critical values as the mean ± kσ of a population, where k is a natural number and σ is the standard deviation ("mean ± kσ-method"). In most cases k is set to 2, leading to reference ranges that classify 95% of test results as normal. However, this method depends on the values being normally distributed. Here we applied a novel method ("SoFR-method") based on data sorting to define refraction points, which carry informative value as diagnostic criteria. With the SoFR-method, standard measures such as critical BMI values are categorized as robustly as with the mean ± kσ-method. However, the SoFR-method showed higher validity when analyzing values that are not normally distributed, such as creatinine and albumin, as well as hepatocyte growth factor (HGF) and hemoglobin in a novel Perioscreen assay in saliva of diabetic and non-diabetic patients. Taken together, we defined a novel method based on sorting patients' test results to effectively define refraction points, which might guide clinical diagnoses more accurately and define relevant thresholds.

Introduction

The determination of a pathophysiological condition or the diagnosis of a certain disease usually relies on definitions established by professional medical societies. These diagnostic criteria are based on the combination of the patient's signs and symptoms (thus the patient's history), physical examination, and test results [1,2]. Test results are quantifiable and comprise measures such as body weight and body mass index (BMI) as well as laboratory tests such as serum creatinine and albumin in urine samples.

Deducing diagnostic value from laboratory test results depends on mathematical methods [3,4]. These define (presumably) critical values from the mean ± kσ of a healthy population, where k is a natural number and σ is the standard deviation. Depending on the rigor desired, k varies from 1 (epidemiological measures) to 3 (tumor markers and cardiac troponin), but is often 2 (most laboratory tests, such as transaminases, glucose, lactate dehydrogenase etc.), in which case the range covering the resulting 95% of values is considered the reference range [5][6][7]. Indeed, the broadly used current practice defines this reference interval from ~120 samples collected from healthy individuals in a population [3]. Despite uncertainties in determining "healthy" in cohorts, values between the 2.5th (almost equal to mean - 2σ) and 97.5th percentiles (almost equal to mean + 2σ), or the 99th percentile (mean + 3σ), are used for the definition of a reference interval.

Recent mathematical attempts were undertaken to address population-based issues more specifically, in particular in populations with a high percentage of various races and ethnicities, such as in the United States [8,9]. When large databases such as electronic health records [10] are used, reference intervals are sharpened and display differences between demographics, which might have clinical impact including prognostic value.
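The two conventions for a reference interval mentioned above (mean ± kσ and the 2.5th to 97.5th percentile range) can be sketched in a few lines of Python. This is a minimal illustration and not the procedure used in this paper: the function names, the linear-interpolation percentile rule and the sample values are assumptions made for the sketch.

```python
import statistics


def mean_k_sigma_interval(values, k=2):
    """Return (lower, upper) = mean -/+ k * sample standard deviation."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)
    return mean - k * sd, mean + k * sd


def percentile_interval(values, lower_pct=2.5, upper_pct=97.5):
    """Return empirical percentiles (linear interpolation between sorted values)."""
    ordered = sorted(values)

    def pct(p):
        pos = (len(ordered) - 1) * p / 100.0
        lo = int(pos)
        hi = min(lo + 1, len(ordered) - 1)
        return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)

    return pct(lower_pct), pct(upper_pct)


# Hypothetical, made-up values for illustration only (not data from this study).
sample = [4.1, 4.4, 4.6, 4.8, 5.0, 5.0, 5.1, 5.3, 5.5, 5.9]
print("mean +/- 2 sigma:", mean_k_sigma_interval(sample, k=2))
print("2.5th to 97.5th percentile:", percentile_interval(sample))
```

For normally distributed data the two intervals nearly coincide; for skewed data they can differ markedly, which is the situation the later examples address.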
Certainly, the reference range, including the upper and lower reference limits, guides clinicians in interpreting test results, knowing that these 95% of values are found in healthy

Here, we define the method that reveals the refraction points of the data. The original data are sorted in ascending order. When the sorted data are displayed on a graph, it may be possible to find refraction points. We call this method for finding refraction points the "SoFR-method" (an abbreviation of "Sorting, Finding the Refraction Points").

All data points were sorted in ascending order of magnitude. After sorting, the points at which the trend of the sorted data visibly changed direction are called "refraction points".
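As an illustration of the sorting step, the following Python sketch sorts the data in ascending order and ranks candidate refraction points by how strongly the local slope of the sorted curve changes. In this paper refraction points are identified visually, so the slope-change heuristic, the `window` parameter and all function names are assumptions made for this sketch, not the authors' procedure (the reported analyses were performed in R).

```python
def slope_changes(sorted_values, window=5):
    """For each interior rank i, compare the average slope over the `window`
    points before i with the average slope over the `window` points after i.
    Returns a list of (rank, value, slope_after - slope_before)."""
    n = len(sorted_values)
    changes = []
    for i in range(window, n - window):
        before = (sorted_values[i] - sorted_values[i - window]) / window
        after = (sorted_values[i + window] - sorted_values[i]) / window
        changes.append((i, sorted_values[i], after - before))
    return changes


def candidate_refraction_points(values, window=5, top=1):
    """Sort ascending, then return the `top` (rank, value) pairs with the
    largest absolute slope change. These are candidates only; they are meant
    to be checked visually on the sorted-data plot."""
    ordered = sorted(values)
    ranked = sorted(slope_changes(ordered, window),
                    key=lambda c: abs(c[2]), reverse=True)
    return [(rank, value) for rank, value, _ in ranked[:top]]


# Hypothetical, made-up data: a slowly rising part followed by a steep part.
data = [float(i) for i in range(20)] + [19.0 + 10.0 * j for j in range(1, 21)]
print(candidate_refraction_points(data, window=5, top=1))
```

Plotting the sorted values against their rank remains the primary step. Because neighbouring ranks tend to have similar slope changes, requesting `top` greater than 1 may return adjacent points around the same bend.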

In contrast, an "inflection point" is mathematically defined: it is a zero of the second derivative of a smooth curve. In the case of the normal distribution, there are two inflection points, located ± 1 × σ away from the mean (where σ is the standard deviation of the normal distribution).
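For reference, the statement about the normal distribution follows from a short calculation, written here with μ for the mean and σ for the standard deviation:

```latex
f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\!\left(-\frac{(x-\mu)^2}{2\sigma^2}\right), \qquad
f'(x) = -\frac{x-\mu}{\sigma^2}\, f(x), \qquad
f''(x) = \frac{(x-\mu)^2 - \sigma^2}{\sigma^4}\, f(x).
```

Since f(x) > 0 for all x, f''(x) = 0 exactly when (x - μ)² = σ², that is, at x = μ ± σ.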

Further, the Shapiro-Wilk normality test was applied to test the normality of the data. In the Shapiro-Wilk test, the null hypothesis is that the data come from a normally distributed population. If the p-value is less than the chosen level of significance, the null hypothesis is rejected. Conversely, if the p-value is greater than the chosen level of significance, the null hypothesis cannot be rejected. For example, at a significance level of 0.05, the null hypothesis cannot be rejected if the p-value is > 0.05.

In this paper, the statistical software "R" was used for numerical analyses and graph drawing.

In this part, we explain four examples of using the SoFR-method. The Shika-Clinic data were collected for research on the relationship between dental health and diabetes. We study two cohorts, diabetic patients and non-diabetic patients, in terms of HGF and hemoglobin in saliva.

We are also searching for reference intervals for HGF and hemoglobin values in saliva. Such thresholds are very hard to find, but the first two examples show that the SoFR-method suggests several candidate thresholds.

The last two examples are simple applications of the SoFR-method to the albumin-to-creatinine ratio (ACR) and the body mass index (BMI).

Hepatocyte Growth Factor in saliva

Using the Perioscreen assay [11], we analyzed both hemoglobin and HGF concentrations in saliva collected from patients, separated into cohorts of diabetics and non-diabetics. As shown in Figure 1A, HGF ranged from 26.4 pg/ml to 13574.1 pg/ml, with large variation and significant differences between the two cohorts. (Fig 1A)

almost the equivalent proportion of a Perioscreen value, separated at ≤ 2 and ≥ 3, was seen in non-diabetics when their cohort-specific refraction point of 2500 pg/ml of HGF was applied.

As shown in Figure 1B, the QQ-plots demonstrate that this data set is not normally distributed, which argues against applying distribution functions or the mean ± k × σ method. (Fig 1B-1 and 1B-2)

Hemoglobin in saliva

We further examined hemoglobin concentration in saliva specimens. We then sorted the data; the sorted data, colored by the values from the Perioscreen assay, are shown in Figure 2A. We detected different refraction points for the two analyzed cohorts. (Fig 2A)

While we observed a refraction point at 2.0 µg/ml for diabetics, there were two refraction

As shown in Figure 2B, the QQ-plots demonstrate that this data set is not normally distributed, which argues against applying distribution functions or the mean ± k × σ method. (Fig 2B-1

The original data contain a sample with a value of 1556.60. It is not an outlier, but we excluded this sample for the purposes of this analysis. As shown in Figure 3A, a refraction point of ACR was visualized at 30 mg/g creatinine for the n=123 samples in which HGF screening was performed. (Fig 3A)

We further separated the data at a refraction point α, with "A" being the subset of data with values < α and "B" the complementary subset with values ≥ α. In the case of ACR, α is 30.0 mg/g creatinine, so "A" is the subset of patients with ACR values < 30.0 mg/g creatinine and "B" is the subset of patients with ACR values ≥ 30.0 mg/g creatinine. By introducing another medical characteristic, such as nephropathy, we show that the number of patients with nephropathy in "B" is much greater than in "A", underlining the usefulness of the SoFR-method.

In contrast, applying the "mean ± σ method", the mean of the ACR data of the Shika-Clinic was calculated as 24.32 mg/g creatinine and 1σ as 45.76 mg/g creatinine; thus "mean + 1σ" equaled 70.08 mg/g creatinine. Moreover, the lower limit of the "mean ± σ method" (mean - 1σ) is negative (-21.44 mg/g creatinine), which is not acceptable for a ratio that cannot be negative. As visualized in Figure 3A, there is no evidence for a refraction point at the mean + 1σ value, suggesting inferiority of the "mean ± 1σ method" compared with the SoFR-method.

Figure 3B shows the QQ-plot of the ACR data. The ACR data are evidently not normally distributed. (Fig 3B)
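The separation at a refraction point α and the comparison with the mean ± 1σ limits can be sketched as follows. The α of 30.0 mg/g creatinine, the mean of 24.32 and the σ of 45.76 are taken from the text; the record layout, the function names and the three example records are hypothetical and serve only to make the sketch runnable.

```python
def split_at(records, alpha, key):
    """Return (A, B): A holds records with value < alpha, B those with value >= alpha."""
    a = [r for r in records if r[key] < alpha]
    b = [r for r in records if r[key] >= alpha]
    return a, b


def proportion(records, flag):
    """Fraction of records in which the boolean characteristic `flag` is present."""
    if not records:
        return float("nan")
    return sum(1 for r in records if r[flag]) / len(records)


def mean_sigma_limits(mean, sigma, k=1):
    """Return (mean - k*sigma, mean + k*sigma)."""
    return mean - k * sigma, mean + k * sigma


# Hypothetical records for illustration only (not data from this study).
patients = [
    {"acr": 8.5, "nephropathy": False},
    {"acr": 22.0, "nephropathy": False},
    {"acr": 41.3, "nephropathy": True},
]
subset_a, subset_b = split_at(patients, alpha=30.0, key="acr")
print(len(subset_a), proportion(subset_a, "nephropathy"))
print(len(subset_b), proportion(subset_b, "nephropathy"))

# Limits from the values reported in the text (mg/g creatinine).
lower, upper = mean_sigma_limits(24.32, 45.76)
print(round(lower, 2), round(upper, 2))
```

Comparing the proportion of a second characteristic between "A" and "B" gives a simple check of whether a candidate refraction point separates clinically different groups.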

Body weight was measured in all 694 patients and the BMI was calculated. The lowest BMI was 15.5 and the highest was 43.2. Thereafter, the data were sorted by BMI in ascending order, as depicted in Figure 4A, showing a sigmoidal shape and two refraction points at 20 and 30. Figure 4A also shows the calculated mean ± 1 × σ (red lines) and BMI = 20 and 30 (blue lines). According to the WHO classification, normal weight is in the range of 18.5 to < 25 and obesity is at values ≥ 30 [12], i.e. at the same value as the second refraction point we determined after data sorting. (Fig 4A)

We next generated quantile-quantile (QQ) plots of the BMI values, plotting quantiles of the normal distribution against quantiles of the samples (Figure 4B). The graph visualizes the data distribution and shows the degree of normality of the BMI data. However, it is difficult to judge normality from this graph alone. (Fig 4B)

Using the Shapiro-Wilk normality test, a p-value of 4.18 × 10⁻¹² (< 0.05) was calculated.
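The coordinates of a normal QQ plot as described above can be computed with the Python standard library alone. This is a minimal sketch: the plotting positions (i + 0.5)/n are one common convention and an assumption here, the function name and BMI values are hypothetical, and the plots in this paper were produced in R.

```python
from statistics import NormalDist


def qq_points(values):
    """Return (theoretical_quantile, sample_quantile) pairs for a normal QQ plot.

    Sample quantiles are the sorted data; theoretical quantiles are standard
    normal quantiles at the plotting positions (i + 0.5) / n, i = 0..n-1.
    """
    ordered = sorted(values)
    n = len(ordered)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    theoretical = [std_normal.inv_cdf((i + 0.5) / n) for i in range(n)]
    return list(zip(theoretical, ordered))


# Hypothetical BMI values for illustration only (not data from this study).
bmi = [18.2, 20.1, 21.5, 22.3, 23.0, 24.4, 25.1, 27.8, 31.2, 36.5]
for theoretical, observed in qq_points(bmi):
    print(f"{theoretical:7.3f}  {observed:5.1f}")
```

If the data are normally distributed, the pairs lie close to a straight line; systematic departures at one or both ends, as described for the BMI data, indicate deviations in the tails.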

QQ plots were generated to check whether HGF values follow a normal distribution for diabetic and non-diabetic patients. In both cases, normality is rejected, and visual judgment is considered superior to forcibly creating a reference interval with a mean ± kσ.

The same analysis as for HGF was carried out for the relationship between the hemoglobin value and the Perioscreen value. Again, the refraction points differ between diabetic and non-diabetic patients.

Moreover, it can be visually observed that the Perioscreen value increases as the hemoglobin value increases.

for diabetic and non-diabetic patients. In both cases, normality is rejected, and visual judgment is considered superior to forcibly creating a reference interval with a mean ± kσ.

A QQ plot was used to check whether the ACR values are normally distributed. They cannot be said to follow a normal distribution.

The Shika-Clinic data contain BMI data of 694 patients. This graph shows the BMI data arranged in ascending order, with the refraction points visible.

QQ plots were generated to check whether the BMI data follow a normal distribution.

According to these plots, the central part fits the normal distribution well, but both ends do not. Therefore, the Shapiro-Wilk normality test was performed, and normality was rejected.