Recent years have seen an immense increase in high-throughput and high-resolution technologies for experimental observation as well as
high-performance techniques to simulate molecular systems at a microscopic level, resulting in vast and ever-increasing amounts of high-dimensional data. However, experiments provide only a partial view of macromolecular processes
and are limited in their temporal and spatial resolution. On the other hand, atomistic simulations are still unable to sample the conformational space of large complexes, leaving significant gaps in our ability to study molecular processes at a biologically relevant scale. We present our efforts to bridge these gaps by exploiting the available data and using state-of-the-art machine-learning methods to design optimal coarse models for complex macromolecular systems. We show that it is possible to define simplified molecular models that reproduce the essential information contained in both microscopic simulations and experimental measurements.