Abstract

The number of chemical compounds discovered each day continues to grow at an exponential rate due to constantly refined and
optimized experimental technologies. However, the experimental determination of the chemical, physical and biological
properties of compounds is often very expensive and time-consuming. As the gap between the number of known compounds and the
information available about them widens, one of the most pressing research issues in drug development is not the production of
new compounds but the determination of their properties. Computational methods that apply machine learning to molecular
structures in order to predict their biological properties have been used with some success, often reducing the time and cost
required for the development of new drugs. However, most of these methods have not shown sufficient precision when predicting
the properties of new structural scaffolds. One of the reasons put forward for this shortcoming is the absence of accurate
molecular similarity functions. Since similar molecules are expected to behave similarly, a reliable similarity metric should
provide a sound basis for better estimation of pharmacological properties.
In bioinformatics, protein sequence similarity is a well-understood and largely solved problem for which several reliable
methods exist (e.g. BLAST). In chemoinformatics this is not the case. Small molecules are paradoxically more difficult to
compare, and their structural characteristics defeat most graph alignment methods. The common solution is the use of
fragment-based heuristics, which are imperfect and prone to several types of errors.
Our research team recently developed, implemented and tested a new molecular similarity method, NAMS (Noncontiguous Atom
Matching Structural similarity). This algorithm was tested on several well-known databases and consistently outperformed other
similarity measures [TeixFalc2013]. In addition, a new inference methodology based on kriging over metric spaces was developed
and proved to be superior to other Quantitative Structure-Activity Relationship (QSAR) methods [TeixFalc2014]. NAMS can
reliably assess global chemical similarity between molecules and gives very good estimates of the activities of closely
related molecules, something which was not possible before. Moreover, due to the nature of the algorithm, NAMS has the
potential to be extended so as not only to predict biological properties but also to give pharmacologists and biochemists
insight into why several compounds bind to the same targets despite being structurally different. This extension will
consequently also improve the quality of property predictions by highlighting the parts of the molecules that are most relevant
to their activity.
Furthermore, a common criticism of QSAR studies is that many chemical inference methods are only evaluated “ex post facto”
against known databases of molecules and their respective properties, and have never been put to the test in an actual drug
development program. In the current proposal we aim to use the predictions from the model to retrieve new molecules from
available chemical databases and test them in the laboratory on two distinct pharmacological problems. The first is the
prediction of drugs that are able to improve defective CFTR trafficking to the membrane. CFTR is the protein involved in cystic
fibrosis, and deriving new candidate drugs is expected to have a significant impact on its treatment. The second problem is the
prediction of blood-brain barrier penetration. The blood-brain barrier defends our central nervous system from extraneous
agents, but makes it difficult for many drugs to pass and is thus a fundamental obstacle in many drug development programs.
For this project we have assembled a transdisciplinary team with leading expertise in all the fields required for its
completion. The core is a machine learning team with members from LaSIGE and BioISI, who have been the leading developers of
NAMS. This team will work on database development, model building, testing and model implementation. Two other teams, one from
iMEd.UL (Faculty of Pharmacy) and a Molecular Biology team (also from BioISI), will supervise the biological and
pharmacological issues and all the laboratory work. These teams have world-leading expertise in blood-brain barrier
penetration and in CFTR trafficking.
Given the nature of the project and the quality of the team, which has consistently delivered high-impact research, this
project has the potential for high visibility, and its results are expected to be published in the best scientific journals. It
is further expected that several molecules can be patented as lead candidates for future medicines.

Technical Description

Literature Review

Molecules are typical examples of unstructured data for which tasks such as searching, sorting, analyzing and extracting
knowledge are challenging. A molecule can have an arbitrary dimension, structure and composition and, moreover, there is no
univocal and unequivocal way of coding and comparing molecules. Several computational tools have been developed
over the years to address this issue. The large number of methods developed to compare molecules is justified by two
fundamental observations: similarity has a context [bender2004], and any representation of a molecular structure implies
information loss. Molecular similarity provides an important approach to searching databases, predicting properties of
compounds, designing structures with a predefined set of properties and conducting structure-based drug design studies
[willett2005, eckert2007]. These studies are based on the “neighborhood” premise, which states that similar molecules usually
have similar activities and properties [bender2004]. Defining similarity between molecules consists of comparing chemical
structures, which requires both a representation of the molecules and a way of quantifying the similarity between them.
Various methods to define structural similarity between molecules are available in the literature [nikolova2003, bender2004,
TeixFalc2013]. The most popular approaches to representing the structure of the molecules under comparison can be divided into
three broad categories: approaches based on structural descriptors (two- and three-dimensional), on molecular fragments, and
on graph matching (descriptor-independent methods). A descriptor positions each abstract molecular representation in the
descriptor space. It is then possible to compare molecules, considering that the distance between the abstract molecular
representations reflects their similarity in this specific descriptor space [bender2004]. Molecular similarity is a
nonlinear problem for which no single set of descriptors or similarity measure correlates with every context of comparison
one can perform [todeschini2009, bender2009].
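
To make the idea of similarity within a chosen representation concrete, the following minimal sketch computes the Tanimoto (Jaccard) coefficient between two molecules described as sets of fragments. It illustrates a generic fragment-based measure only and is not the NAMS algorithm; the function name `tanimoto` and the fragment labels are hypothetical.

```python
def tanimoto(fragments_a, fragments_b):
    """Tanimoto (Jaccard) coefficient between two fragment sets."""
    a, b = set(fragments_a), set(fragments_b)
    if not a and not b:
        return 1.0  # convention: two empty descriptions are treated as identical
    return len(a & b) / len(a | b)


# Hypothetical fragment labels, for illustration only.
mol_1 = {"C-C", "C-O", "C=O", "O-H"}
mol_2 = {"C-C", "C-O", "C-N", "N-H"}

# Two shared fragments out of six distinct ones.
similarity = tanimoto(mol_1, mol_2)
```

The value depends entirely on which fragments are enumerated, which is one way of seeing why similarity is context dependent and why a representation implies information loss.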

A commonly used approach to predicting chemical, physical and/or biological properties of chemical compounds relies on the
structure of the molecule, using data mining methods through quantitative structure-property/activity relationships (QSPR/QSAR)
[katritzky2000, katritzky2002, doucet2011]. The three major difficulties in the development of QSPR/QSAR models are (1)
quantifying the inherently abstract molecular structure, (2) determining which structural features most influence the given
property (representation problem) [liu2004, gonzalez2008, teixeira2013] and (3) establishing and validating the functional
relationships that most accurately describe the relationship between structural descriptors and the property/activity data
(mapping problem) [tropsha2007, puzyn2009, dearden2009, tropsha2010]. Furthermore, it is acknowledged that it is not
possible to develop a model providing reliable predictions for all possible compounds [tetko2009]. Classical QSPR/QSAR
approaches have several shortcomings, namely (1) the predictive power of the model is highly dependent on the selection of
predictor variables and on the presence of correlation between these variables, (2) the prediction capacity of the model is
limited by the molecular diversity and distribution of the molecules in the training set [oprea2001], and (3) the models need
to be retrained every time compounds are added or removed. Nevertheless, [TeixFalc2014], using a kriging-based approach over
NAMS, questioned most of these assumptions and showed that it is possible to produce inference models that require no
re-learning, that provide an individual error estimate for each prediction, and that give reasonable estimates even for
widely different molecules. Also, the initial conclusions of [Martin200] and [Nikolova2003], that compounds
similar to known active molecules are themselves far less frequently active than one might expect, have been
challenged by NAMS [TeixFalc2013, TeixFalc2014].
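
The properties attributed to kriging above (no retraining, and an error estimate attached to each individual prediction) can be illustrated with a small sketch of simple kriging over a matrix of pairwise distances. This is not the implementation of [TeixFalc2014]: the exponential covariance model, its parameters, the use of the sample mean as the assumed mean, and all function names are assumptions made for illustration, and distances are taken here as one minus a similarity score.

```python
import math


def solve(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(row) + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[row][k] -= factor * aug[col][k]
    x = [0.0] * n
    for row in range(n - 1, -1, -1):
        tail = sum(aug[row][k] * x[k] for k in range(row + 1, n))
        x[row] = (aug[row][n] - tail) / aug[row][row]
    return x


def covariance(distance, sill=1.0, length=0.5):
    """Exponential covariance model (an assumed choice, not from the source)."""
    return sill * math.exp(-distance / length)


def simple_kriging(train_dist, query_dist, values, nugget=1e-9):
    """Return (estimate, kriging variance) for one query molecule.

    train_dist[i][j]: distance between known molecules i and j
    query_dist[i]:    distance between the query and known molecule i
    values[i]:        measured property of known molecule i
    """
    n = len(values)
    mean = sum(values) / n  # sample mean used as the assumed known mean
    cov = [[covariance(train_dist[i][j]) + (nugget if i == j else 0.0)
            for j in range(n)] for i in range(n)]
    k = [covariance(d) for d in query_dist]
    weights = solve(cov, k)
    estimate = mean + sum(w * (v - mean) for w, v in zip(weights, values))
    variance = covariance(0.0) - sum(w * ki for w, ki in zip(weights, k))
    return estimate, max(variance, 0.0)


# Illustrative distances (1 - similarity) and property values, not real data.
train_dist = [[0.0, 0.2, 0.6],
              [0.2, 0.0, 0.5],
              [0.6, 0.5, 0.0]]
values = [5.0, 5.4, 7.1]

near = simple_kriging(train_dist, train_dist[0], values)    # query identical to molecule 0
far = simple_kriging(train_dist, [5.0, 5.0, 5.0], values)   # query unlike every known molecule
```

Adding a compound only adds a row and a column to the distance matrix; nothing is refitted. In this sketch the variance is close to zero for a query that coincides with a known molecule and approaches the sill for a query far from all of them.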

A molecule can also be represented, using graph theory, as a labeled graph whose vertices correspond to the atoms and whose
edges correspond to the covalent bonds. Representing molecules as graphs has some advantages: graphs are intuitive, since they
are close to our understanding of a molecule, and they have a solid mathematical background with different existing techniques
for comparing labeled graphs [ehrlich2011]. However, representing molecules as graphs raises an important issue: identical
graphs do not necessarily represent identical structures, and vice versa [ehrlich2011].

The goal of finding common substructures for property inference has been pursued by several authors [kawabata2011,
rahman2009, batista2006]. Because the underlying problem is NP-complete (garey 1979), many approximate heuristics have been
proposed to overcome this complexity [ehrlich2011]. The approach followed by [TeixFalc2013] is different in that it makes no
assumptions about the structural components of any molecule and is able to consider characteristics that are not directly
handled by graph theory (namely chirality and cis-trans isomerism). Having a reliable structural matching algorithm is only
part of the solution, as it is also necessary to make use of the detected similarity for predictions. This has been
accomplished by coupling NAMS with kriging, a metric-space-based method for inference [TeixFalc2014]. Kriging models have been
used previously in chemoinformatics: [fang2004, hawe2010, sun2011] showed that kriging models were able to outperform other
methods in the development of predictive models of pharmacological properties. However, all of these studies made explicit
use of chemical descriptors chosen arbitrarily according to the nature of the problem.

Plan and Methods

The central goal of this proposal is to improve the molecular similarity algorithm NAMS [TeixFalc2013] by providing topological
enhancements to the main atom-matching algorithm, so that the tool is able, first, to give pharmacologists and molecular
biologists a direct understanding of the relevant components in a family of active compounds and, second, to serve as a
standalone tool for the screening phases of drug development programs. NAMS was designed to compare full molecules, thus in a
way tackling the graph isomorphism problem and providing a reliable polynomial-time solution. However, for a molecule to
function as a drug, often only specific parts of its structure (not necessarily contiguous) are necessary. It is therefore
envisaged that NAMS can be modified to allow the weighting of specific compositional and structural elements that are deemed
essential for understanding a molecule's pharmacological properties. This proposal aims to extend NAMS so that the algorithm
can differentiate between parts of a molecule, allowing the discovery of the most relevant parts as well as their differential
amplification for molecular property prediction and drug inference. Unlike other methods, NAMS does not depend on
any type of chemical descriptor and can conceivably be used in any chemical property prediction problem. The current
inference engine [TeixFalc2014] is based on kriging over the global similarity provided by NAMS. Although it provides results
on a par with the best state-of-the-art QSAR algorithms using virtually no information other than the molecular structure, we
believe that a topological differentiation mechanism will be key to extending NAMS to even more diverse molecules that share
pharmacological characteristics. The current version of the inference engine requires no learning, as it is a kriging
algorithm that takes advantage of the molecular metric space. It is further expected that the new version will also be able to
directly assess the relevant topological characteristics of the most active known molecules and to use this intrinsic
knowledge to retrieve from large molecular databases the compounds most likely to have the desired characteristics.

The research team for the current project brings together computer scientists, biochemists, pharmacologists and
molecular biologists with world-leading expertise in the fields of cheminformatics, molecular modeling, blood-brain barrier
permeability and cystic fibrosis.

This work will be divided into 3 major tasks, each of which is critical for the advancement of the project. First, it is
necessary to carry out a thorough evaluation of the best existing QSAR methodologies and compare them with the current
inference engine based on NAMS and kriging [TeixFalc2014]. This comparison is intended to be as exhaustive as possible, testing
each existing method over a set of benchmarks created from data collected in publicly available databases. Second, NAMS is to
be adapted to include differential topological features when assessing chemical structural similarity, and this adaptation is
then to be optimized and tested within a novel framework centered on Bayesian learning. The purpose is the empirical
identification of the molecular topological characteristics that are relevant for each specific benchmark problem. Finally, the
topologically enhanced model is to be put to the test on two distinct problems for which there are world-leading experts within
the team. The first problem is the development of new lead compounds for enhancing trafficking of F508del-CFTR to the plasma
membrane, which is at the center of current drug development for cystic fibrosis. The second problem is to determine
whether a new molecule has the potential to cross the blood-brain barrier. This issue is critical in most drug development
programs for drugs that target the central nervous system.

Methods

The first task of the project will create a set of benchmarks for fitting a variety of QSAR models from the literature. The
techniques typically used range from ordinary linear models to sophisticated ensemble and hybrid methodologies [tropsha2010].
Most modeling efforts involve two essential phases: first, finding the best possible descriptors for a given
problem [Tropsha2007] and, second, testing and validating models built with a variety of methods that range from classic
neural networks and support vector machines to hybrid elastic nets and probabilistic graphical models [Koeller2009]. The
purpose of this phase is not to find the best possible model, but rather to assess the capabilities and limitations of current
QSAR models and to obtain a global view of the sources of prediction errors and the nature of the problems.
This proposal will focus on a strict Bayesian approach for including localized structural similarity in probabilistic models. The
Bayesian emphasis will be critical for realistic results. The majority of molecules do not show pharmacological activity, and
neglecting the prior probabilities is one of the key reasons why many in silico screening studies produce unreliable and
non-reproducible results [MartFalc2012]. For a stringent, unbiased comparison of all models and assessment of the new approach,
a thorough validation process will be followed: one independent validation set will be created for each benchmark set and will
not be used in any phase of the training or cross-validation procedures.
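
The effect of neglecting prior probabilities can be shown with a direct application of Bayes' theorem. The sketch below computes the probability that a molecule flagged as active by a classifier is truly active; the function name and all numerical values are illustrative assumptions, not results from any screening study.

```python
def posterior_active(prior, sensitivity, specificity):
    """P(active | model predicts active), from Bayes' theorem."""
    true_positive = sensitivity * prior
    false_positive = (1.0 - specificity) * (1.0 - prior)
    return true_positive / (true_positive + false_positive)


# Illustrative numbers only: a classifier with 90% sensitivity and 90% specificity.
balanced = posterior_active(prior=0.50, sensitivity=0.90, specificity=0.90)
realistic = posterior_active(prior=0.01, sensitivity=0.90, specificity=0.90)
```

With these assumed figures, a benchmark in which half the molecules are active suggests that 90% of predicted actives are genuine, whereas with a prior of 1% only about 8% are. A model evaluated on a balanced set can therefore look far more reliable than it would be in an actual screen.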

The central part of this proposal is the second task, whose purpose is twofold: A) to incorporate the topological
differentiation mechanisms into NAMS and B) to define a new inference engine that can use this information to predict the
pharmacological potential of any molecule. The purpose is not to address the specificities of each benchmark problem, but
rather to establish, algorithmically and statistically, how to identify which structural characteristics are fundamental for
each binding problem. Accomplishing this objective requires rigorous statistical handling within a Bayesian framework in order
to adequately assess the relevance of each factor. The structural characteristics of a molecule are not necessarily describable
in human terms; they are topological constructs, to be discovered by using a Markov chain Monte Carlo approach over NAMS so
that the most relevant parts of the molecule become prominent and can be used through a dynamic weighting layer over the atoms
of each lead molecule. Second, the modeling phase will be centered on fitting a Markov network in which the similarities to key
instances within the chemical metric space condition the posterior probability that any unknown instance has biological
activity. Each network will be derived in an unsupervised way directly from the existing data. Using a data-driven
probabilistic graphical model for each problem will not per se ensure that the Bayesian requirements are met, as it is
anticipated that the priors will be difficult to assess, and their determination will be one of the key challenges in the
forthcoming work. This phase will require a large computational effort, to be carried out with the existing hardware and the
new servers to be acquired.
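
As a rough illustration of how a Markov chain Monte Carlo search over an atom weighting layer might look, the sketch below runs a random-walk Metropolis sampler over per-atom weights of a lead molecule. Everything in it is a toy assumption made for illustration: the per-atom match scores stand in for whatever NAMS would provide, the logistic likelihood, the standard normal prior on the log-weights and all names are hypothetical, and no burn-in is discarded for brevity. It is not the method to be developed in this task.

```python
import math
import random


def softmax(theta):
    """Map unconstrained log-weights to positive weights that sum to 1."""
    top = max(theta)
    exps = [math.exp(t - top) for t in theta]
    total = sum(exps)
    return [e / total for e in exps]


def log_posterior(theta, atom_scores, active, sharpness=10.0):
    """Unnormalized log posterior of the log-weights theta (toy model).

    atom_scores[m][i]: match score in [0, 1] of lead atom i against molecule m
    active[m]:         1 if molecule m is active, 0 otherwise
    """
    weights = softmax(theta)
    log_p = -0.5 * sum(t * t for t in theta)  # standard normal prior (assumed)
    for scores, label in zip(atom_scores, active):
        sim = sum(w * s for w, s in zip(weights, scores))
        p = 1.0 / (1.0 + math.exp(-sharpness * (sim - 0.5)))
        p = min(max(p, 1e-12), 1.0 - 1e-12)
        log_p += math.log(p) if label else math.log(1.0 - p)
    return log_p


def metropolis_weights(atom_scores, active, steps=5000, step_size=0.3, seed=0):
    """Posterior mean atom weights from a random-walk Metropolis sampler."""
    rng = random.Random(seed)
    n_atoms = len(atom_scores[0])
    theta = [0.0] * n_atoms
    current = log_posterior(theta, atom_scores, active)
    sums = [0.0] * n_atoms
    for _ in range(steps):
        proposal = [t + rng.gauss(0.0, step_size) for t in theta]
        candidate = log_posterior(proposal, atom_scores, active)
        # Symmetric proposal: accept with probability min(1, posterior ratio).
        if rng.random() < math.exp(min(0.0, candidate - current)):
            theta, current = proposal, candidate
        for i, w in enumerate(softmax(theta)):
            sums[i] += w
    return [s / steps for s in sums]


# Made-up scores: atom 0 matches well in actives and poorly in inactives.
atom_scores = [[0.90, 0.5, 0.4], [0.80, 0.3, 0.6], [0.85, 0.6, 0.3],
               [0.20, 0.6, 0.5], [0.10, 0.4, 0.5], [0.15, 0.5, 0.6]]
active = [1, 1, 1, 0, 0, 0]
posterior_mean = metropolis_weights(atom_scores, active)
```

In this toy setting the sampler concentrates weight on the atom whose match score separates actives from inactives, which is the behavior the weighting layer is meant to expose.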

The third task will involve testing the new topologically enhanced similarity metric and the respective inference engine on two
current problems in pharmacology. The first is the retrieval of molecules with the potential to rescue a genetically
malformed protein (F508del-CFTR) to the plasma membrane. This defect is known to be the central factor causing cystic fibrosis.
The second problem involves discovering factors that may induce molecular penetration through the blood-brain barrier (BBB).
This is still an open problem, and in silico models have so far had limited success [MartFalc2012]. The results from the new
model will be complemented with a virtual screening similarity approach [Lucas2012]. Efforts will focus on the virtual screening
of large libraries of commercially available compounds (e.g., ZINC, NCI). The NCI database contains around
400,000 compounds, of which only 250,000 are available for download. The ZINC database now has around 35 million compounds;
however, we will not screen the entire database but only the “In-Stock Druglike” subset (~10 million compounds).

After the lead compounds for both problems are defined, they will be put to the test in vitro. The rescue of F508del-CFTR traffic in
cells treated with lead compounds will be assessed by the F508del-CFTR traffic assay we have established for automated
fluorescence microscopy [Almaça2011, Farinha2013]. BBB penetration, in turn, will be assessed on primary cultures
of human brain microvascular endothelial cells derived from microvessels isolated from temporal tissue removed during
operative treatment of epilepsy. Monolayers of human brain microvascular endothelial cells show characteristically high
transendothelial electric resistance and have proven useful in multiple functional studies for in vitro modeling of the human
blood-brain barrier [Bernas2010].

Tasks

1. Benchmark creation and model testing.

Molecular information databases have grown both in the number of compounds available and in the quantity and quality of
information for each molecule. Repositories like ChemSpider, ChEMBL or ZINC allow for easy consultation and retrieval of
molecules with specified biological characteristics, as described and published in the literature.
The first task of this project will involve 4 main subtasks. The first is the deployment of the computational infrastructure,
that is, a hardware platform for computation and for storage of structural information on molecules. Second, several
well-defined benchmark data sets of pharmacological data will be created from the available data. Third, NAMS will be adapted
for topological searching, using the benchmarks created as a basis. Fourth, the results will be compared with other
state-of-the-art QSAR methodologies. This process is iterative, and the resulting models are to be improved continuously.
Subtask 1.1. Deployment of the data processing infrastructure for chemical data
Along with the acquisition of the computational platform and its deployment within the available computational framework, all
the database-building tasks are expected to be performed. Local copies of the central chemical repositories (ZINC, PubChem and
ChEMBL) are to be set up in a common database, created to provide easy and unencumbered access to the repositories and to
centralize all the relevant information in a single repository.
Subtask 1.2. Benchmark dataset creation
The creation of reliable datasets for model testing is important enough to deserve a subtask of its own. The purpose of this
subtask is to identify in the literature known and reliable test problems with different types of molecules in the field of drug
discovery. This will involve curating and classifying the problem types. Some benchmarks should be small and with little
variability, while others are expected to be large and very diverse, in order to adequately balance and test all the models. The
completion of the benchmarks will represent a fundamental milestone within the project, as only then will it be possible to
evaluate the models.
For all the datasets, all chemical descriptors are to be precomputed with the modeling software. Furthermore, as detailed
above, model appraisal and validation will be stringent. Each benchmark will be tested using standard n-fold
cross-validation; however, a separate set, drawn from the main data and never tested or used during the model selection phase
(subtask 1.3), must be created beforehand. This set will be used for assessing the best models after the selection procedure has
been performed.
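
The validation protocol described above can be sketched as follows: the independent validation set is removed first, and n-fold cross-validation is then run only on the remaining molecules. The function names, the 20% hold-out fraction and the choice of five folds are assumptions made for illustration.

```python
import random


def split_benchmark(molecule_ids, n_folds=5, holdout_fraction=0.2, seed=0):
    """Return (folds, independent_validation_set) for one benchmark."""
    ids = list(molecule_ids)
    random.Random(seed).shuffle(ids)
    n_holdout = int(round(len(ids) * holdout_fraction))
    validation = ids[:n_holdout]   # set aside before any modelling takes place
    remaining = ids[n_holdout:]
    folds = [remaining[i::n_folds] for i in range(n_folds)]
    return folds, validation


def cross_validation_rounds(folds):
    """Yield (training_ids, test_ids) for each of the n folds."""
    for i, test in enumerate(folds):
        train = [m for j, fold in enumerate(folds) if j != i for m in fold]
        yield train, test


# A benchmark of 100 molecules identified by integers, for illustration.
folds, ivs = split_benchmark(range(100))
rounds = list(cross_validation_rounds(folds))
```

Because the hold-out is drawn before the folds are built, no molecule in the independent validation set can influence descriptor selection or model selection.
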
Subtask 1.3. Implementation of state-of-the-art QSAR methodologies
Two essential modeling issues must be addressed: a) the determination of the optimal subset of descriptors for
each benchmark and b) the selection of the modeling framework best suited to each specific problem. For result comparison, a
dedicated database will be created for analyzing the results, as these must be compared globally, focusing not on each model's
adequacy for solving one specific problem but on its actual strengths and shortcomings for each problem set. The
current NAMS implementation using kriging for inference is to be tested as well against all the other models.
Subtask 1.4. Empirical in silico model validation
The best models from the previous subtask for each benchmark will be subjected to a final test using the independent validation
sets (IVS) created in subtask 1.2.
It is expected that at this phase a journal paper will be produced for a high-profile publication, detailing the results of this
effort, namely a thorough evaluation of a large variety of QSAR models on a set of benchmark problems. The benchmark datasets
will be made public, providing the scientific community with a set of problems with which to test, evaluate and develop other
models.

2. Model building for molecular pharmacol

There are two different aspects to this task. The first is to address the question of identifying the structural
characteristics that make a molecule active or inactive against a specific biological target. The second is how to use the
discovered characteristics for inference. The second task of the project is therefore essentially algorithmic and will use the
benchmarks defined in subtask 1.2. This is the core of the project and will be critical to its ultimate success. The subtasks
identified reflect these goals, and the success of the task depends on the joint success of both objectives.
Subtask 2.1. Methods for the identification of structural characteristics of pharmacological activity.
Identifying the structural elements will require a significant computational effort, estimated at several
thousand CPU hours. It will involve stochastic simulations over the benchmark sets for each individual problem as well as
across benchmark sets, where the devised models will be tested with datasets for which they were not conceived. To accomplish
this task it will be necessary to develop and include a “weighting layer” over the basic atom-matching component of NAMS,
which will be further modified to efficiently test different positional-weighting schemes within a stochastic simulation framework.
Subtask 2.2. Including topological characteristics for inferring pharmacological activity in the NAMS metric space
Identifying the fundamental structural characteristics within a specific pharmacological problem solves only part of the problem.
The information derived must also be used as a source of knowledge for inference over each specific problem. It will therefore
be necessary to develop and test new inferential methods capable of including such knowledge. NAMS has been used as a global
“graph-isomorphism”-like algorithm to assess global molecular similarity, and kriging has been used for inference over the
NAMS metric space. It is an open question how the topological differentiating characteristics inferred for each problem type will
impact the kriging inference engine. This subtask therefore consists of including the molecular topological knowledge units
(MTKUs) within the inference engine and using them for better modeling. This will involve the development of an inference
framework structured over NAMS that is able to include the diverse MTKUs for inference.
Subtask 2.3. Validating the molecular topological knowledge units within NAMS
The final subtask will involve learning and making inferences for all the benchmark data sets and comparing the results with
those obtained in subtask 1.4 using the independent validation sets. New models will thus be inferred using the developed
framework and the training sets created for each benchmark.
This will be the most critical task in the whole project, since failing to produce better results may make it pointless to proceed
to the most onerous phase, task 3. Therefore, if the preceding efforts are unable to consistently outperform the other
state-of-the-art methods, the goals of the project should be scaled down. If, on the other hand, the new methods are able
to produce better results, as is expected, then it makes sense to proceed with confidence to the laboratory work.

This task will produce the major model for the project, and it is expected that at least three journal papers will be written and
published.

3. Model validation

On reaching this phase, the developed methodology should be able to suggest new lead compounds for drug development
programs. This will be tested on two well-defined problems for which the project team has the laboratory know-how to
perform the tests and evaluate the predictions, namely the prediction of blood-brain barrier penetration and the rescue of
F508del-CFTR.
Subtask 3.1. Data selection, curation and model fitting
The procedure for data retrieval and selection will require a thorough curation process, verifying all the relevant data in
the original literature and in patent databases. For each research problem, a separate dataset will be assembled and a model
built, which will be the basis for a subsequent molecule retrieval over the main chemical databases, where the most promising
structures will be selected. This initial set is intended to be large and inclusive, so as not to miss any possible lead candidates.
Subtask 3.2. Virtual screening
Results from NAMS will be compared with a virtual screening (VS) similarity approach [Lucas2012], which will be performed on the
NCI and ZINC databases to search for compounds with the potential either to cross the blood-brain barrier or to rescue
F508del-CFTR. If any information about the target is available, molecular docking will be performed using GOLD 5.2.0 with the
GoldScore scoring function and the number of GA runs set to 500. Standard default settings will be used: number of islands = 5,
population size = 100, number of operations = 100,000, niche size = 2 and selection pressure = 1.1. Finally, the GOLD poses for
ca. 1000 compounds will be displayed (PyMOL/VMD) and visually inspected for hydrophobic and hydrophilic interactions between the
ligands and the active-site residues of the enzyme. Compounds that can predictably be metabolized will be excluded from further
refinement. The selected compounds will then be purchased and assayed.
Subtask 3.3. In vitro testing of CFTR rescue

The purpose is to assess the ability of lead compounds to rescue F508del-CFTR traffic to the plasma membrane (PM) in a CF cell
line, using an automated fluorescence microscopy assay established in our lab facilities [Almaça2011].
F508del-CFTR traffic will be assessed in a Cystic Fibrosis Bronchial Epithelial (CFBE) cell line developed at our lab. Using
fluorescence microscopy, the total amount of CFTR and the amount of PM-located CFTR can be determined.
To determine CFTR traffic, cells will be cultured following standard procedures and seeded onto microscopy-grade 96-well plates
containing the lead compounds at different concentrations, including DMSO controls. F508del-CFTR expression will be induced
over time with doxycycline. Extracellular Flag tags will then be labelled and the cells imaged on an automated fluorescence
microscope (Leica DMI 6000B). As a positive control, F508del-CFTR traffic will be rescued with the corrector VX-809.

The amount of CFTR at the PM and the CFTR traffic efficiency will be determined for each cell. Compounds that significantly
enhance F508del-CFTR traffic will be considered hits.
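
One plausible way to quantify per-cell traffic efficiency is the ratio of the PM-located CFTR signal to the total CFTR signal, compared against the DMSO controls. The sketch below assumes that definition, which is not stated in the text; the function names and the signal values are made up for illustration, and no significance test is included.

```python
from statistics import mean


def traffic_efficiency(pm_signal, total_signal):
    """Per-cell ratio of plasma-membrane CFTR signal to total CFTR signal (assumed definition)."""
    if total_signal <= 0:
        raise ValueError("total CFTR signal must be positive")
    return pm_signal / total_signal


def fold_change_over_control(treated_cells, control_cells):
    """Mean traffic efficiency of treated cells relative to DMSO control cells."""
    treated = mean(traffic_efficiency(pm, total) for pm, total in treated_cells)
    control = mean(traffic_efficiency(pm, total) for pm, total in control_cells)
    return treated / control


# Made-up (PM signal, total signal) pairs in arbitrary units, for illustration only.
dmso = [(10.0, 100.0), (12.0, 120.0)]
compound = [(30.0, 100.0), (24.0, 80.0)]
change = fold_change_over_control(compound, dmso)
```

Normalizing by the total signal keeps cells with different expression levels comparable, so that a compound is not scored as a hit merely because the treated cells express more protein.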

Subtask 3.4. In vitro testing of blood-brain barrier penetration
Primary cultures of human brain microvascular endothelial cells (HBMVECs) will be generated from microvessels isolated from
temporal tissue removed during operative treatment of epilepsy. The tissue is to be fragmented and size-filtered using polyester
meshes. The resulting microvessel fragments are placed onto type I collagen-coated flasks to allow HBMVECs to migrate and
proliferate. The overall process takes less than 3 h and does not require specialized equipment or enzymatic processes.
Monolayers of human brain microvascular endothelial cells show characteristically high transendothelial electric resistance and
have proven useful in multiple functional studies for in vitro modeling of the human blood-brain barrier.

After this task, several journal papers are to be published and the hit molecules are to be submitted for patenting.

Project Timeline and Management

Description of the Management Structure

Management plan
This is a complex project with several interdependencies, and it therefore requires a monitoring and control structure
capable of coordinating a diverse team with different backgrounds.
A project meeting is planned for the beginning of the project, at which each member will be given a detailed
calendar and list of tasks. At the end of each year, further project meetings are planned with all the team members involved. A
website will be set up to enable all members of the team to exchange ideas, report exceptional situations or simply ask for
advice on specific subjects. The website should serve two purposes: first, as a contact platform for the project
members and, second, as a window to the public, where the most important results will be communicated to interested
parties in the form of technical reports, scientific articles and general links to related work. For each milestone reached in
the project, a full report will be prepared by the PI and discussed in a dedicated meeting with the task leaders and everyone
involved in the accomplishment of the milestone. To further ensure that the project stays on track, periodic meetings within
each subtask will be scheduled. The respective task leader should be present at all meetings and communicate any exceptional
situation to the PI. Particular care must also be taken in monitoring and advising the non-PhD members: the appropriate advisor
should meet with them at least once a week for short meetings. Progress reports will be posted on the project's website.
Supervising board
A supervising committee will be constituted, comprising the PI, M. Amaral and MA Brito. This committee will approve reports,
take several of the project decisions, and select and approve the scholarship candidates. It is also the duty
of the committee to take action if any of the risks outlined below materialize and to propose appropriate mitigation measures.
Foreseeable risks
There are internal and external risks that may hinder the project objectives. Those identified as the most serious
and most capable of impacting the project are:
a) Inability to coherently define, in a timely manner, the molecular topological knowledge units within NAMS (Task 2.2). This
will prompt a backup plan resorting to standard NAMS and kriging for screening, or to the use of an ensemble of the best QSAR
models coupled with NAMS. Due to the innovative nature of this methodology,
b) After task 3.1, the selected molecules are not easily obtained (no vendors, or prohibitive prices). The requested budget may
have to be applied to the chemical synthesis of adequate targets. Although synthesis is not envisaged in this project, we may have
to resort to it if we fail to obtain the selected molecules. Due to the possible increase in costs, fewer molecules may be tested in
biological assays.

Milestones List

Date	Milestone denomination	Description
15-07-2016	Benchmarks completed	The benchmarks that are to be the basis for testing the models should be completed early in the project. These are critical to the success of task 2, as the benchmarks will be used to assess not only the performance of current models but also that of the new topologically enhanced NAMS.
29-12-2017	NAMS with molecular topological knowledge units is validated	This is the most important event in the project: reaching this milestone (at the end of Task 2) means that the new model has been successfully implemented and validated with the selected benchmarks. Once this objective is completed, the new model can be tested in real-world drug development scenarios (Task 3).
28-12-2018	Lead compounds selected	This is the last validation of the models devised, as finding and testing new lead molecules is the final proof that the developed methodology is able to identify the relevant characteristics that may qualify any molecule as a drug for a specific objective.

MIMED

Scientific Component