Data Science in R&D
Big data is not an exclusive phenomenon of social media, internet services and financial companies. In science, there are numerous examples where (big) data analysis pioneered and gave a platform to the development of Data Science as the mature tool we know today. Astronomy, genetics and drug research are some of the areas where data cleaning, statistical analysis and predictive modeling have been used for many years to make sense of large volumes of data. During my work at UvA and TNO I was lucky to get my hands on very different kinds of datasets (some small, some definitely big) and to work for very different kinds of stakeholders with diverse needs.
Herein lies the beauty of data science (and ultimately of statistics and mathematics): approaches and methodologies cut across problems, and often across kinds of data, and are therefore transferable. This allows data scientists to keep an open mind when choosing which approaches to apply to a particular problem: there is often more than one solution, and sometimes the best model is a combination of several (model stacking).
Predictive modeling and rational design
The increasing demand for environmentally friendly products and the current regulatory framework in Europe challenge companies to find substitutes for, and to optimize, conventional commercial chemical formulations. In modern industrial chemistry, the discovery of new molecules and products often involves rational design, prediction of properties of industrial interest, and high-throughput experimentation and testing.
During my years as a researcher at Solvay, we applied methodologies from predictive modeling and chemometrics to support this process, both by accelerating and automating the search for new products and formulations and by helping to understand the properties and synergies of traditional and novel chemicals.
The MolDiA Software
During my PhD work, I designed and implemented a virtual screening tool called MolDiA (Molecular Diversity Analysis), which assists drug design and research into novel molecules within an XML framework. The structure-based approach applies customizable weights to molecular descriptors to compute similarity and diversity measures for given datasets. Applications of this approach include the development of QSAR models, fast identification of potential lead compounds and optimal library design.
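To give a feel for what weighting descriptors means in practice, here is a minimal sketch in Python. It is not MolDiA's actual code or formula: the weighted Euclidean distance, the 1/(1+d) similarity mapping, the mean-pairwise-distance diversity measure, the function names and the descriptor values are all assumptions chosen for illustration.

```python
from itertools import combinations
from math import sqrt


def weighted_distance(a, b, weights):
    """Weighted Euclidean distance between two descriptor vectors."""
    if not (len(a) == len(b) == len(weights)):
        raise ValueError("descriptor vectors and weights must have equal length")
    return sqrt(sum(w * (x - y) ** 2 for x, y, w in zip(a, b, weights)))


def similarity(a, b, weights):
    """Map the distance into (0, 1]; identical vectors score 1.0."""
    return 1.0 / (1.0 + weighted_distance(a, b, weights))


def diversity(dataset, weights):
    """Mean pairwise weighted distance over all molecules in the dataset."""
    pairs = list(combinations(dataset, 2))
    if not pairs:
        return 0.0  # fewer than two molecules: no pair to compare
    return sum(weighted_distance(a, b, weights) for a, b in pairs) / len(pairs)


if __name__ == "__main__":
    # Hypothetical, pre-scaled descriptors per molecule:
    # (molecular weight, logP, number of H-bond donors)
    molecules = [
        [0.20, 0.50, 0.10],
        [0.25, 0.45, 0.10],
        [0.90, 0.10, 0.80],
    ]
    # Assumed weights: emphasise logP, ignore H-bond donors entirely.
    weights = [1.0, 2.0, 0.0]

    print("similarity(0, 1):", similarity(molecules[0], molecules[1], weights))
    print("similarity(0, 2):", similarity(molecules[0], molecules[2], weights))
    print("dataset diversity:", diversity(molecules, weights))
```

Changing the weights changes which molecules count as "similar": setting a weight to zero removes that descriptor from the comparison. The sketch assumes the descriptors have already been scaled to comparable ranges, otherwise the descriptor with the largest numeric range would dominate regardless of the weights.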