Abstract
Biological networks often have complex structure consisting of meaningful clusters of nodes that are integral to understanding biological function. Community detection algorithms that identify the clustering, or community structure, of a network are well established. These algorithms assume that the data used in network construction are observed without error. However, intermediary analyses such as regression are often performed before biological networks are constructed, and the associated error is not propagated to community detection. In expression quantitative trait loci (eQTL) networks, one must first map eQTLs via linear regression in order to specify the matrix representation of the network. We study the effects of using estimates from regression models when applying the spectral clustering approach to community detection. We demonstrate the impact on the affinity matrix and consider adjusted estimates of the affinity matrix for use in spectral clustering. We further provide a recommendation for selecting the tuning parameter in spectral clustering. We evaluate the proposed adjusted method for spectral clustering by detecting gene clusters in eQTL data from the GTEx project and by assessing the stability of communities in biological data.