Management of family relationship information for a three-generation cohort study

A system for inputting and storing family information, named “BirThree Enrollment,” was developed to support a birth and three-generation cohort study (the BirThree Cohort Study), and it has been operated successfully. The study imposed many operational demands: input information is continuously overwritten and changed; complex kinship information must be input and corrected quickly and accurately; and information on family members not yet recruited must be retrievable. Meeting these demands required careful design at every level, from the input interface to the internal data structure. In the field of genetic statistics, a simple standard form is used to describe family structure. This form carries sufficient information for genetics; however, we extended it for the purposes of the BirThree Cohort Study. To provide the information about family roles that the study requires, we expanded the data structure and constructed a system suitable for daily operation. In our system, family pedigree information is stored along with initial clinical information, and all self-reported information can be input to the database. Operators are able to input this family information before the end of the day. As a result, family information is complete as soon as recruitment is complete, and a given person’s family structure can be looked up immediately. With our system, data correction improved dramatically. This study is the first report of a method for storing three generations of family data.


The Tohoku Medical Megabank (TMM) Project aims to provide creative reconstruction methods and solve medical problems arising from the Great East Japan Earthquake (GEJE), which occurred in

One of the key features of the BirThree Cohort Study is that three-generation cohorts were recruited, rather than just birth cohorts. Therefore, inputting and handling family information is more complex in a three-generation cohort study than in other cohort studies. The Lifelines study actively collects information not only on a pregnant woman and the child, but also on the father and other family members, as far as possible. Thus, it is important early research on handling family information. Such family information, however, is stored and maintained by a different system than that used for clinical information. As a result, massive data reduction after recruitment is indispensable, and much work might be needed to maintain the correspondence between clinical information and family information.

As a result, it was necessary to devise a data structure to operate the three-generation cohort recruitment. To express kinship information, a common data structure has traditionally been used in the field of statistical genetics [15-17]. Because the traditional data format can describe a parent-child kinship (Fig. 1A based on the specification. The specification is chiefly organized to handle the following operational matters: "Data description," "Retrieval," "Consent withdrawal," and "Family roles."

The concept of the data structure used in typical statistical genetics is described in Fig. 1A respectively. These Lines are directional, such that the mother can be retrieved only from a given child (Related Lines 1), and the child cannot be retrieved from their mother. This structure is used in the kinship2 [18] package for the R statistics program and suffices to describe genetic relationships. There are some problems, however, with retrieving family relationships by using the Related Lines. For instance, to find the child of a certain mother, it is necessary in the worst case to search all of the children in the database. Thus, we first made the Related Line bidirectional (Fig. 1C). Each Related Line can thus be defined by two kinds of edges (e.g., Child to Mother, Mother to Child) in a directed graph, rather than one edge in an undirected graph, following a basic idea of network theory [19][20][21]. With this new bidirectional line, it becomes easier to trace the father from the mother, by retrieving from mother to child, and from child to father. In this way, relationships between parents are retrieved more quickly (Figs. 1B, D). Retrieving all members of the family becomes possible by using this line. However, retrieval might become difficult when a member of the family is not participating. We discuss this problem in the following paragraph.
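The bidirectional Related Line can be illustrated with a minimal sketch. It assumes each line is stored as two labeled directed edges; the class name, method names, edge labels, and participant IDs are hypothetical and are not taken from the BirThree Enrollment system.

```python
from collections import defaultdict


class RelatedLines:
    """Stores each Related Line as two directed edges (hypothetical sketch)."""

    def __init__(self):
        # person_id -> list of (edge_label, other_person_id)
        self._edges = defaultdict(list)

    def add_parent_child(self, parent_id, child_id, parent_role):
        """Register one bidirectional line as child->parent and parent->child."""
        self._edges[child_id].append((f"child_to_{parent_role}", parent_id))
        self._edges[parent_id].append((f"{parent_role}_to_child", child_id))

    def follow(self, person_id, label):
        """Return everyone reached from person_id over edges with this label."""
        return [other for lbl, other in self._edges[person_id] if lbl == label]

    def fathers_via_children(self, mother_id):
        """Trace mother -> child -> father without scanning all children."""
        fathers = []
        for child in self.follow(mother_id, "mother_to_child"):
            for father in self.follow(child, "child_to_father"):
                if father not in fathers:
                    fathers.append(father)
        return fathers


lines = RelatedLines()
lines.add_parent_child("M001", "C001", "mother")
lines.add_parent_child("F001", "C001", "father")
print(lines.fathers_via_children("M001"))  # ['F001']
```

With only the child-to-parent edge, the lookup from the mother would require a scan over every child record; the reverse edge makes it a direct lookup.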

While registering a family's members, the need arises to register a person who does not have a parent-child relationship. One example is that of the relationship between two participants who are connected by a nonparticipant, which is not expressible (Fig. 2A). In this case, the relationship between "Newborn" and "Grand- defining all of the relationships in the cohort. Therefore, the example problem can be solved by defining Extended Related Lines between the mother and the grandfather-in-law (green arrow 1), the mother and the grandmother-in-law (green arrow 2), the Newborn and the grandfather-in-law (green arrow 3), and the Newborn and the grandmother-in-law (green arrow 4) (see Fig. 2B). The relationships between these individuals can be enrolled and retrieved from the mother (a pregnant woman) to the grandparents, or from the Newborn to the grandparents, through Extended Related Lines (Fig. 2B), even if the father does not participate in the cohort. Table 1 presents the number of Extended Related Lines needed for all patterns of seven family members.
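The text states elsewhere that connecting seven family members requires choosing among 21 x 2 (bidirectional) = 42 types of Related Lines. The following sketch checks that arithmetic under the assumption that every pair of the seven unit roles has its own line type in each direction; it does not reproduce the per-pattern counts of Table 1, and the role strings are illustrative.

```python
from itertools import combinations, permutations

# The seven unit roles named in the text.
ROLES = [
    "pregnant woman",
    "newborn",
    "father",
    "grandmother",
    "grandfather",
    "grandmother-in-law",
    "grandfather-in-law",
]

# One line type per unordered pair of roles.
undirected_pairs = list(combinations(ROLES, 2))
# Two line types per pair once direction is distinguished.
directed_line_types = list(permutations(ROLES, 2))

print(len(undirected_pairs), len(directed_line_types))  # 21 42
```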

This method has operational problems, however, especially during registration. It is necessary to define many Related Lines, and operators must correctly select the proper Related Lines when enrolling each participant. Even in a case such as the one shown in Fig. 2B, which is not very complex, four (2 x 2 (bidirectional)) new, different lines are needed to make connections from the mother. This solution might work when the data input operator is able to spend sufficient time or when complex family relationships need not be drawn. To connect seven family members with Related Lines, however, it is necessary to select the correct colored lines from among 21 x 2 (bidirectional) = 42 types of Related Lines (see Table 1). This operation places a considerable time burden on the operator. During the first stage of recruitment of the BirThree Cohort, this idea was adopted for our system and put into operation. However, operation of the system became difficult as the BirThree Cohort in the TMM grew large. Therefore, this idea was not adopted in the present system, although we introduce it here for reference.

We have developed the BirThree Enrollment system instead of using the system of Extended Related Lines in order to accurately describe complex family roles and avoid the input of incorrect data. It is thought that such a family input system is indispensable to recruit the family members, and this system will become the main current in

The other factor is the family information batch entry screen, a new method for enrolling family data (Fig. 4). The batch entry screen makes it possible to design a comprehensible data registration interface (see supplemental file). The entry screen not only enables the registration of relationships through nonparticipants, but also facilitates later retrieval and eases registration. A family role table is newly prepared when the pregnant woman is enrolled. At that time, the other six family members are added, each in an empty column, as provisional participants who still need to be recruited (see Section 2.4, Fig. 4). When some of these six nonparticipants are recruited and their empty columns are filled, Related Lines are automatically drawn by the BirThree Enrollment system.
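The family role table can be sketched as follows. This is a minimal illustration, not the system's actual schema: the role keys, the placeholder format for provisional members, and the assumption that the "in-law" grandparents are the father's parents are all introduced here for the example.

```python
ROLES = [
    "pregnant_woman", "newborn", "father",
    "grandmother", "grandfather",
    "grandmother_in_law", "grandfather_in_law",
]

# Assumed parent -> child pairs among the seven role slots.
PARENT_CHILD_ROLES = [
    ("pregnant_woman", "newborn"),
    ("father", "newborn"),
    ("grandmother", "pregnant_woman"),
    ("grandfather", "pregnant_woman"),
    ("grandmother_in_law", "father"),
    ("grandfather_in_law", "father"),
]


class FamilyRoleTable:
    """One table per enrolled pregnant woman (hypothetical sketch)."""

    def __init__(self, family_id, pregnant_woman_id):
        self.family_id = family_id
        # None marks a provisional member who still needs to be recruited.
        self.slots = {role: None for role in ROLES}
        self.slots["pregnant_woman"] = pregnant_woman_id

    def enroll(self, role, person_id):
        """Fill an empty column when a family member is recruited."""
        if role not in self.slots:
            raise KeyError(role)
        self.slots[role] = person_id

    def node(self, role):
        """Participant ID, or a placeholder for a provisional member."""
        person_id = self.slots[role]
        if person_id is None:
            return f"provisional:{self.family_id}:{role}"
        return person_id

    def related_lines(self):
        """Draw bidirectional lines between slots, filled or provisional."""
        lines = []
        for parent_role, child_role in PARENT_CHILD_ROLES:
            parent, child = self.node(parent_role), self.node(child_role)
            lines.append((parent, child))
            lines.append((child, parent))
        return lines


table = FamilyRoleTable("FAM1", "P001")
table.enroll("grandfather_in_law", "G001")
```

Because lines are drawn between role slots rather than only between recruited people, the grandfather-in-law `G001` remains connected to the pregnant woman through the provisional father slot.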

As in the example with the nonparticipating father shown in Fig. 2A, even in a situation in which a pregnant woman and her father-in-law cannot be connected directly, the Related Lines 1, 2, and 3 in Fig. 2A among the family members are automatically drawn by our BirThree Enrollment system, and this father is automatically registered as a provisional member. Therefore, lines are always connected whenever family members are retrieved.

The family information batch entry screen allows one person to be enrolled in two or more role tables at the same time. As a result, one person can have two different roles in two different families. Fig. 3 shows a person enrolled as a mother in the family role table enclosed by the red rectangle and as a grandmother in the family role table with the purple

The family role table corresponds to the entry screen. The input system registers data in both the family role table and the Related Lines. Related Lines are automatically formed for unit members (pregnant woman (mother), newborn (child), father, grandmother, grandfather, grandmother-in-law, and grandfather-in-law). Uncles, aunts, and cousins are members outside this unit; therefore, operators must draw their Related Lines by hand.

When retrieving the family structure, the database extracts family information by reading only the corresponding family role table. Tracing Related Lines is not necessary for the retrieval. Therefore, the load on the system is minimized. Moreover, it becomes easy to approach a prospective participant through the pregnant woman, because during retrieval the system displays whether any family member has not yet been recruited.
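Retrieval from the family role table alone might look like the sketch below. The in-memory dictionary stands in for the database table, and the family IDs, participant IDs, and function name are hypothetical; `None` is assumed to mark a member who has not yet been recruited.

```python
# family_id -> {role: participant ID, or None if not yet recruited}
family_role_tables = {
    "FAM1": {
        "pregnant_woman": "P001",
        "newborn": "C001",
        "father": None,
        "grandmother": "P002",
        "grandfather": None,
        "grandmother_in_law": None,
        "grandfather_in_law": "G001",
    },
}


def family_summary(family_id, tables):
    """Read one role table; no Related Lines are traced."""
    row = tables[family_id]
    recruited = {role: pid for role, pid in row.items() if pid is not None}
    pending = [role for role, pid in row.items() if pid is None]
    return recruited, pending


recruited, pending = family_summary("FAM1", family_role_tables)
print(pending)  # ['father', 'grandfather', 'grandmother_in_law']
```

The list of pending roles is what an operator would use to ask the pregnant woman about family members who could still be recruited.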

In contrast, when a member other than a pregnant woman withdraws consent, only that person's name and information are deleted. In that case, the Related Lines connecting that person to other members of the family are not deleted, and the family role table remains. For example, when a participating father withdraws his consent, the Related Lines 1, 2, and 3 in Fig. 2A are maintained to avoid re-recruiting people who have withdrawn.
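The withdrawal rule for a member other than the pregnant woman can be sketched as below. The record layout, field names, and IDs are assumptions made for illustration; the case of the pregnant woman herself is deliberately left out of the sketch.

```python
from dataclasses import dataclass, field


@dataclass
class FamilyRecord:
    """Hypothetical per-family record."""
    personal_info: dict = field(default_factory=dict)  # person_id -> name etc.
    role_table: dict = field(default_factory=dict)     # role -> person_id
    related_lines: set = field(default_factory=set)    # (from_id, to_id)
    withdrawn: set = field(default_factory=set)        # IDs that withdrew


def withdraw_member(record, person_id):
    """Delete only the person's information; keep lines and role table."""
    if record.role_table.get("pregnant_woman") == person_id:
        raise ValueError("pregnant-woman withdrawal is not covered by this sketch")
    record.personal_info.pop(person_id, None)
    record.withdrawn.add(person_id)


def may_recruit(record, person_id):
    """Withdrawn members must not be approached again."""
    return person_id not in record.withdrawn


record = FamilyRecord(
    personal_info={"P001": {"name": "A"}, "F001": {"name": "B"}},
    role_table={"pregnant_woman": "P001", "father": "F001", "newborn": "C001"},
    related_lines={("F001", "C001"), ("C001", "F001"),
                   ("P001", "C001"), ("C001", "P001")},
)
withdraw_member(record, "F001")
```

Keeping the father's slot and lines after withdrawal is what lets the system recognize him later and skip him during re-recruitment.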

An important point of our system is that it processes consent withdrawal in real time. When the recruitment period is short, processing consent withdrawals after recruitment ends might not cause a problem. When the system continues working for a long time, however, each consent withdrawal needs to be reflected at once. This is because, when follow-up surveys and re-recruiting are performed, the withdrawal information must reflect everything processed up to that point.

Such a mechanism must be carefully implemented to ensure that it does not contradict the contents written in the informed consent and withdrawal documents. This is because the level of information that should be removed is

We designed a function to identify and report discrepancies in the database. This function served important purposes and had a significant influence on the quality of the cohort information. We had to report discrepancies immediately when mistakes in the registration data caused severe problems. Here, we explain the check function, which was used for information about family relationships.

a method for displaying and browsing a 7-member family tree, and provides the data structure required to achieve it. It is practically effective even when a family's information changes dynamically through withdrawal of consent.

The remaining problems include handling information on a participant who gave birth two or more times and on a father who divorced the pregnant woman. Our system was insufficient to decide which data to store for a person who had participated two or more times. The other problem, the "father who divorced," did not actually occur for pregnant women who participated two or more times with a different husband for each pregnancy. It was therefore unnecessary to connect information about the different fathers.

It is difficult to complete a three-generation cohort project with high efficiency, because the process of inputting family information is complicated. To address this problem, computational support is extremely important. In this study, many problems involved in handling family information have been solved with reference to six viewpoints. We expect this study to be useful for the next three-generation cohort study.