August 27, 2018

Abstract

The number of chemical compounds discovered each day continues to grow at an exponential rate, driven by constantly refined and optimized experimental technologies. However, the experimental determination of the chemical, physical and biological properties of compounds is often very expensive and time-consuming. As the gap between the number of known compounds and the information available about them widens, one of the most pressing research issues in drug development is not the production of new compounds, but the determination of their properties. Computational methods that apply machine learning to molecular structures in order to predict their biological properties have been used with some success, often reducing the time and cost required for the development of new drugs. However, most of these methods have not shown sufficient precision when predicting properties for new structural scaffolds. One of the reasons put forward for this limited effectiveness is the absence of accurate molecular similarity functions. Since similar molecules are expected to behave similarly, a reliable similarity metric should provide a sound basis for better estimation of pharmacological properties.

In bioinformatics, protein sequence similarity is a well-understood and essentially solved problem for which several reliable methods exist (e.g. BLAST). In chemoinformatics this is not so. Small molecules are paradoxically more difficult to compare, and their structural characteristics defeat most graph alignment methods. The common solution is the use of fragment-based heuristics, which are imperfect and prone to several types of errors. Recently our research team developed, implemented and tested a new molecular similarity method, NAMS (Non-contiguous Atom Matching Structural similarity function). This new algorithm was tested on several known databases and consistently outperformed other similarity measures [TeixFalc2013]. In addition, a new inference methodology based on kriging over metric spaces was developed and proved to be superior to other Quantitative Structure-Activity Relationship (QSAR) methods [TeixFalc2014]. NAMS can reliably assess chemical similarity through a global molecular comparison and is able to give very good estimates of the activities of closely related molecules, something which was not possible before. Furthermore, due to the nature of the algorithm, NAMS has the potential to be extended so as not only to predict biological properties but also to give pharmacologists and biochemists insight into why several compounds bind to the same targets despite being structurally different. This extension will consequently also improve the quality of property predictions by highlighting the parts of the molecules that are most relevant to their activity.

Furthermore, a common critique of QSAR studies is that many chemical inference methods are only evaluated “ex post facto” against known databases of molecules and their respective properties, and have never been put to the test in an actual drug development program. In the current proposal we aim to use the predictions from the model to retrieve new molecules from available chemical databases and test them in the laboratory on two distinct pharmacological problems. The first is the prediction of drugs that are able to improve defective CFTR trafficking to the membrane. CFTR is the protein involved in cystic fibrosis, and deriving new candidate drugs is expected to have a significant impact on its treatment. The second problem is the prediction of blood-brain barrier penetration. The blood-brain barrier defends our central nervous system from extraneous agents, but makes it difficult for many drugs to pass through and is thus a fundamental obstacle in many drug development programs.

For this project we have assembled a transdisciplinary team that has leading expertise in all the fields required for its completion. The core is a machine learning team with members from LaSIGE and BioISI, who were the lead developers of NAMS. This team will work on database development, model building, testing and model implementation. Two other teams, one from iMEd.UL (Faculty of Pharmacy) and a Molecular Biology team (also from BioISI), will supervise the biological and pharmacological issues and all the laboratory work. These teams have world-leading expertise in blood-brain barrier penetration and in CFTR trafficking.

Due to the nature of the project and the quality of the team, which has consistently delivered high-impact research, this project has the potential for high visibility, and its results are expected to be published in the best scientific journals. It is further expected that several molecules can be patented as lead candidates for future medicines.