Ickstadt, Katja
Fermin Ruiz, Yessica Yulieth
2019-09-04
2019-09-04
2018
http://hdl.handle.net/2003/38202
http://dx.doi.org/10.17877/DE290R-20181

Understanding how proteins bind to each other in a cell is key in molecular biology to determining how anomalies in cells can be repaired. The major challenge in predicting protein-protein interactions is the cell-to-cell heterogeneity within a sample, which arises from genetic and epigenetic variability. Most studies of protein-protein interactions carry out their analysis without accounting for this underlying heterogeneity, which can lead to the identification of invalid interactions. As part of the solution to this problem, this thesis proposes two approaches to analysis: one for snapshot data, where different samples of ten proteins were taken by toponome imaging, and another for time-correlated data, which guarantees a better approximation in the prediction of protein-protein interactions. The latter represents an advance in the analysis of data with high temporal resolution, such as data obtained through the quantification technique known as multicolor live cell imaging. The thesis is divided into two parts. The first part, "Revealing relationships among proteins involved in assembling focal adhesions", develops a methodology based on frequentist methods, such as machine learning and meta-analysis, for the prediction of protein-protein interactions on six different toponome imaging datasets. This methodology presents an advance in the analysis of highly heterogeneous snapshot data. Our aim was to formulate a single model capable of identifying the relationships across different samples by summarizing the results they have in common while taking their random variation into account.
This methodology leads to a set of common models over the six datasets, ranked by their predictive power, from which the researcher can choose a model according to its predictive accuracy or its parsimony. This part is developed in Chapters 1-7 and was published in Harizanova et al. (2016). The second part is called "Modelling of temporal networks with a nonparametric mixture of dynamic Bayesian networks". It develops a Bayesian methodology for temporal networks that identifies subpopulations in heterogeneous cell populations while simultaneously reconstructing the protein interaction network associated with each subpopulation. This method extends the nonparametric Bayesian networks (NPBNs) (Ickstadt et al., 2011) to the analysis of time-correlated data by using Gaussian dynamic Bayesian networks (GDBNs). We evaluate our model by varying specific parameters such as the underlying number of subpopulations, the network density, and the intra-subpopulation variability, among others. A comparative analysis with existing clustering methods, such as NPBNs and hierarchical agglomerative clustering (Hclust), shows that including temporal correlations improves the classification of multivariate time series. The classic Hclust method using dynamic time warping distances (T-Hclust) was found to be similar in precision to the Bayesian method proposed here. Furthermore, a comparative analysis with GDBNs shows that a single GDBN model is poorly suited to reconstructing temporal networks in heterogeneous cell populations, whereas our method, as well as the joint use of the T-Hclust classifications with GDBNs (T-Hclust+), is highly adequate for predicting temporal networks in a mixture.
This part is developed in Chapters 8-16.

en
Meta-analysis
Artificial neural networks
Random forest
Nonparametric mixture models
310
Statistical modeling of protein-protein interaction networks
Text
Metaanalyse (meta-analysis)
Neuronales Netz (neural network)
Klassifikations- und Regressionsbaum (classification and regression tree)