Scoring functions for docking

From HandWiki

In the fields of computational chemistry and molecular modelling, scoring functions are mathematical functions used to approximately predict the binding affinity between two molecules after they have been docked. Most commonly one of the molecules is a small organic compound such as a drug and the second is the drug's biological target such as a protein receptor.[1] Scoring functions have also been developed to predict the strength of intermolecular interactions between two proteins[2] or between protein and DNA.[3]

Utility

Scoring functions are widely used in drug discovery and other molecular modelling applications. These include:[4]

  • Virtual screening of small molecule databases of candidate ligands to identify novel small molecules that bind to a protein target of interest and therefore are useful starting points for drug discovery[5]
  • De novo design (design "from scratch") of novel small molecules that bind to a protein target[6]
  • Lead optimization of screening hits to optimize their affinity and selectivity[7]

A potentially more reliable but much more computationally demanding alternative to scoring functions is the use of free energy perturbation calculations.[8]

Prerequisites

Scoring functions are normally parameterized (or trained) against a data set consisting of experimentally determined binding affinities between molecular species similar to the species that one wishes to predict.

For currently used methods that aim to predict the affinities of ligands for proteins, the following must first be known or predicted:

  • Protein tertiary structure – arrangement of the protein atoms in three-dimensional space. Protein structures may be determined by experimental techniques such as X-ray crystallography or solution-phase NMR, or predicted by homology modelling.
  • Ligand active conformation – three-dimensional shape of the ligand when bound to the protein
  • Binding-mode – orientation of the two binding partners relative to each other in the complex

The above information yields the three-dimensional structure of the complex. Based on this structure, the scoring function can then estimate the strength of the association between the two molecules in the complex using one of the methods outlined below. Finally, the scoring function itself may be used to help predict both the binding mode and the active conformation of the small molecule in the complex; alternatively, a simpler and computationally faster function may be used within the docking run.

Classes

There are four general classes of scoring functions:[9][10][11]

  • Force field – affinities are estimated by summing the strength of intermolecular van der Waals and electrostatic interactions between all atoms of the two molecules in the complex using a force field. The intramolecular energies (also referred to as strain energy) of the two binding partners are also frequently included. Finally, since binding normally takes place in the presence of water, the desolvation energies of the ligand and of the protein are sometimes taken into account using implicit solvation methods such as GBSA or PBSA.[12]
  • Empirical – based on counting the number of various types of interactions between the two binding partners.[6] Counting may be based on the number of ligand and receptor atoms in contact with each other or on the change in solvent accessible surface area (ΔSASA) in the complex compared to the uncomplexed ligand and protein. The coefficients of the scoring function are usually fit using multiple linear regression methods. The interaction terms of the function may include, for example:
    • hydrophobic–hydrophobic contacts (favorable),
    • hydrophobic–hydrophilic contacts (unfavorable; this term accounts for unmet hydrogen bonds, which are an important enthalpic contribution to binding;[13] one lost hydrogen bond can account for 1–2 orders of magnitude in binding affinity[14]),
    • number of hydrogen bonds (favorable contribution to affinity, especially if shielded from solvent; no contribution if solvent-exposed),
    • number of rotatable bonds immobilized in complex formation (unfavorable conformational entropy contribution).
  • Knowledge-based – based on statistical observations of intermolecular close contacts in large 3D databases (such as the Cambridge Structural Database or Protein Data Bank) which are used to derive statistical "potentials of mean force". This method is founded on the assumption that close intermolecular interactions between certain types of atoms or functional groups that occur more frequently than one would expect by a random distribution are likely to be energetically favorable and therefore contribute favorably to binding affinity.[15]
  • Machine-learning – unlike the three classical classes above, machine-learning scoring functions do not assume a predetermined functional form for the relationship between binding affinity and the structural features describing the protein-ligand complex.[16] Instead, the functional form is inferred directly from the data. Machine-learning scoring functions have consistently been found to outperform classical scoring functions at binding affinity prediction of diverse protein-ligand complexes.[17][18] This has also been the case for target-specific complexes,[19][20] although the advantage is target-dependent and mainly depends on the volume of relevant data available.[11][21] When appropriate care is taken, machine-learning scoring functions tend to strongly outperform classical scoring functions at the related problem of structure-based virtual screening.[22][23][24][25][26][27][28][29] Furthermore, if data specific to the target are available, this performance gap widens.[30] Several reviews provide a broader overview of machine-learning scoring functions for structure-based drug design.[11][31][32][33] The choice of decoys for a given target is one of the most important factors for training and testing any scoring function.[34]
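
The assumption behind the knowledge-based class can be made concrete with an inverse Boltzmann relation, in which a contact that is observed more often than in a reference state receives a negative (favorable) energy. The following Python fragment is only a minimal sketch under stated assumptions: the function name, the distance bins, the uniform reference state and all counts are hypothetical and are not taken from any published potential.

```python
import math

# Approximate k_B*T at 298 K in kcal/mol (assumed temperature).
KT_KCAL_PER_MOL = 0.593


def pmf_from_counts(observed, reference, kT=KT_KCAL_PER_MOL):
    """Inverse Boltzmann potential for one atom-type pair, per distance bin.

    observed  -- contact counts seen in a structural database, per bin
    reference -- counts expected for a reference (random) state, per bin
    Returns energies in kcal/mol; bins without data are returned as None.
    """
    obs_total = sum(observed)
    ref_total = sum(reference)
    potential = []
    for n_obs, n_ref in zip(observed, reference):
        if n_obs == 0 or n_ref == 0:
            potential.append(None)  # no statistics for this bin
            continue
        # Ratio > 1: contact is more frequent than expected -> favorable.
        ratio = (n_obs / obs_total) / (n_ref / ref_total)
        potential.append(-kT * math.log(ratio))
    return potential


# Hypothetical counts for four distance bins (illustration only).
observed = [5, 40, 30, 25]
reference = [25, 25, 25, 25]
pmf = pmf_from_counts(observed, reference)
```

In this made-up example the first bin is under-represented and receives a positive energy, while the second bin is over-represented and receives a negative one. Real potentials differ mainly in how the reference state and the atom types are defined.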

The first three classes (force field, empirical and knowledge-based) are commonly referred to as classical scoring functions and are characterized by the assumption that the individual contributions to binding combine linearly. Because of this constraint, classical scoring functions are unable to take advantage of large amounts of training data.[35]
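
The linear form shared by classical scoring functions can be illustrated with an empirical-style score built from interaction counts. This is a sketch only: the class name, the term names and every weight below are hypothetical placeholders, whereas in practice the coefficients would be fit to experimental affinities by multiple linear regression.

```python
from dataclasses import dataclass


@dataclass
class InteractionCounts:
    """Counts extracted from one docked complex (hypothetical descriptor set)."""
    hydrophobic_contacts: int
    hydrophobic_hydrophilic_contacts: int
    hydrogen_bonds: int
    frozen_rotatable_bonds: int


# Hypothetical weights in kcal/mol per count; negative means favorable.
# These are illustrative numbers, not fitted coefficients.
WEIGHTS = {
    "hydrophobic_contacts": -0.2,
    "hydrophobic_hydrophilic_contacts": 0.3,
    "hydrogen_bonds": -1.0,
    "frozen_rotatable_bonds": 0.3,
}
INTERCEPT = 0.0


def empirical_score(c, w=WEIGHTS, intercept=INTERCEPT):
    """Linear combination of interaction counts (lower is better)."""
    return (
        intercept
        + w["hydrophobic_contacts"] * c.hydrophobic_contacts
        + w["hydrophobic_hydrophilic_contacts"] * c.hydrophobic_hydrophilic_contacts
        + w["hydrogen_bonds"] * c.hydrogen_bonds
        + w["frozen_rotatable_bonds"] * c.frozen_rotatable_bonds
    )


# Example complex: 10 hydrophobic contacts, 2 mismatched contacts,
# 3 hydrogen bonds and 4 immobilized rotatable bonds.
example = InteractionCounts(10, 2, 3, 4)
score = empirical_score(example)  # about -3.2 with the weights above
```

Because each term enters only through a fixed weight, the model cannot represent interactions between terms, which is the constraint referred to above.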

Refinement

Since different scoring functions are relatively collinear (their scores tend to be correlated with one another), consensus scoring functions may not improve accuracy significantly.[36] This claim went somewhat against the prevailing view in the field, since previous studies had suggested that consensus scoring was beneficial.[37]
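
One common way to build a consensus is to average the rank that each scoring function assigns to a ligand. The Python sketch below assumes this rank-averaging scheme, which is not necessarily the one used in the cited studies; the ligand names and scores are invented, lower scores are assumed to be better, and ties are not handled.

```python
def ranks(scores):
    """Map each ligand to its rank (1 = best), assuming lower score is better."""
    ordered = sorted(scores, key=scores.get)
    return {ligand: position + 1 for position, ligand in enumerate(ordered)}


def consensus_rank(score_tables):
    """Average the per-function ranks of each ligand.

    score_tables -- list of dicts, one per scoring function, all covering
                    the same set of ligands.
    """
    per_function = [ranks(table) for table in score_tables]
    ligands = per_function[0].keys()
    return {
        ligand: sum(r[ligand] for r in per_function) / len(per_function)
        for ligand in ligands
    }


# Hypothetical scores from three scoring functions (arbitrary units).
function_a = {"lig1": -9.1, "lig2": -7.4, "lig3": -8.0}
function_b = {"lig1": -6.5, "lig2": -5.0, "lig3": -6.9}
function_c = {"lig1": -40.0, "lig2": -31.0, "lig3": -35.5}

consensus = consensus_rank([function_a, function_b, function_c])
best = min(consensus, key=consensus.get)
```

In this invented data set, functions A and C produce identical rankings, so adding C to A changes nothing. This is the situation in which collinear functions would be expected to contribute little to a consensus.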

A perfect scoring function would be able to predict the binding free energy between the ligand and its target. In reality, both the computational methods and the available computational resources place constraints on this goal, so methods are most often selected that minimize the number of false positive and false negative ligands. For cases where an experimental training set of binding constants and structures is available, a simple method has been developed to refine the scoring function used in molecular docking.[38]
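
The binding free energy that a scoring function aims to predict is related to the experimental dissociation constant by the standard thermodynamic relation ΔG° = RT ln(Kd), with Kd expressed relative to a 1 M standard state. The following Python sketch applies this relation; the function name and the default temperature of 298.15 K are assumptions made for illustration.

```python
import math

# Gas constant in kcal/(mol*K).
R_KCAL_PER_MOL_K = 1.987204e-3


def binding_free_energy(kd_molar, temperature_k=298.15):
    """Standard binding free energy in kcal/mol from a dissociation constant.

    kd_molar -- dissociation constant in mol/L (1 M standard state assumed)
    """
    return R_KCAL_PER_MOL_K * temperature_k * math.log(kd_molar)


# A 1 nM binder versus a 10 nM binder (hypothetical ligands).
dg_1nm = binding_free_energy(1e-9)
dg_10nm = binding_free_energy(1e-8)

# Free energy cost of losing one order of magnitude in affinity.
per_order_of_magnitude = dg_10nm - dg_1nm
```

At this temperature one order of magnitude in affinity corresponds to roughly 1.4 kcal/mol, which indicates how small an error in the predicted free energy is enough to misrank two ligands.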

References

  1. "Scoring functions for protein-ligand docking". Current Protein & Peptide Science 7 (5): 407–20. October 2006. doi:10.2174/138920306778559395. PMID 17073693. 
  2. "Docking and scoring protein complexes: CAPRI 3rd Edition". Proteins 69 (4): 704–18. December 2007. doi:10.1002/prot.21804. PMID 17918726. 
  3. "An all-atom, distance-dependent scoring function for the prediction of protein-DNA interactions from structure". Proteins 66 (2): 359–74. February 2007. doi:10.1002/prot.21162. PMID 17078093. 
  4. "Ranking poses in structure-based lead discovery and optimization: current trends in scoring function development". Current Opinion in Drug Discovery & Development 10 (3): 308–15. May 2007. PMID 17554857. 
  5. "Virtual high-throughput screening of molecular databases". Current Opinion in Drug Discovery & Development 10 (3): 298–307. May 2007. PMID 17554856. 
  6. "Prediction of binding constants of protein ligands: a fast method for the prioritization of hits obtained from de novo design or 3D database search programs". Journal of Computer-Aided Molecular Design 12 (4): 309–23. July 1998. doi:10.1023/A:1007999920146. PMID 9777490. Bibcode: 1998JCAMD..12..309B. 
  7. "Lead optimization via high-throughput molecular docking". Current Opinion in Drug Discovery & Development 10 (3): 264–74. May 2007. PMID 17554852. 
  8. "Towards predictive ligand design with free-energy based computational methods?". Current Medicinal Chemistry 13 (29): 3583–608. 2006. doi:10.2174/092986706779026165. PMID 17168725. 
  9. Fenu, Luca A.; Lewis, Richard A.; Good, Andrew C.; Bodkin, Michael; Essex, Jonathan W. (2007). "Chapter 9: Scoring Functions: From Free-energies of Binding to Enrichment in Virtual Screening". Structure-Based Drug Discovery. Dordrecht: Springer. pp. 223–246. ISBN 978-1-4020-4407-6. https://books.google.com/books?id=8ywRn7vSGVAC&q=fast+approximate+scoring+function+docking&pg=PA226. 
  10. Sotriffer, Christoph; Matter, Hans (2011). "Chapter 7.3: Classes of Scoring Functions". Virtual Screening: Principles, Challenges, and Practical Guidelines. 48. John Wiley & Sons, Inc. ISBN 978-3-527-63334-0. https://books.google.com/books?id=bRcHVwCiJcoC&q=scoring+function+force+field+empirical+knowledge-based&pg=PT203. 
  11. "Machine-learning scoring functions to improve structure-based binding affinity prediction and virtual screening". Wiley Interdisciplinary Reviews: Computational Molecular Science 5 (6): 405–424. 2015-11-01. doi:10.1002/wcms.1225. PMID 27110292. 
  12. "The MM/PBSA and MM/GBSA methods to estimate ligand-binding affinities". Expert Opinion on Drug Discovery 10 (5): 449–61. May 2015. doi:10.1517/17460441.2015.1032936. PMID 25835573. 
  13. "A consistent description of HYdrogen bond and DEhydration energies in protein-ligand complexes: methods behind the HYDE scoring function". Journal of Computer-Aided Molecular Design 27 (1): 15–29. January 2013. doi:10.1007/s10822-012-9626-2. PMID 23269578. Bibcode: 2013JCAMD..27...15S. 
  14. "Requirements for specific binding of low affinity inhibitor fragments to the SH2 domain of (pp60)Src are identical to those for high affinity binding of full length inhibitors". Journal of Medicinal Chemistry 46 (24): 5184–95. November 2003. doi:10.1021/jm020970s. PMID 14613321. 
  15. "PMF scoring revisited". Journal of Medicinal Chemistry 49 (20): 5895–902. October 2006. doi:10.1021/jm050038s. PMID 17004705. 
  16. "A machine learning approach to predicting protein-ligand binding affinity with applications to molecular docking". Bioinformatics 26 (9): 1169–75. May 2010. doi:10.1093/bioinformatics/btq112. PMID 20236947. 
  17. "Improving AutoDock Vina Using Random Forest: The Growing Accuracy of Binding Affinity Prediction by the Effective Exploitation of Larger Data Sets". Molecular Informatics 34 (2–3): 115–26. February 2015. doi:10.1002/minf.201400132. PMID 27490034. 
  18. "A Comparative Assessment of Predictive Accuracies of Conventional and Machine Learning Scoring Functions for Protein-Ligand Binding Affinity Prediction". IEEE/ACM Transactions on Computational Biology and Bioinformatics 12 (2): 335–47. 2015-04-01. doi:10.1109/TCBB.2014.2351824. PMID 26357221. 
  19. "Integrating docking scores, interaction profiles and molecular descriptors to improve the accuracy of molecular docking: toward the discovery of novel Akt1 inhibitors". European Journal of Medicinal Chemistry 75: 11–20. March 2014. doi:10.1016/j.ejmech.2014.01.019. PMID 24508830. 
  20. "A machine learning-based method to improve docking scoring functions and its application to drug repurposing". Journal of Chemical Information and Modeling 51 (2): 408–19. February 2011. doi:10.1021/ci100369f. PMID 21291174. 
  21. "Machine-Learning Scoring Functions for Structure-Based Drug Lead Optimization". Wiley Interdisciplinary Reviews: Computational Molecular Science 10 (5). 2020-02-05. doi:10.1002/wcms.1465. 
  22. "Support vector regression scoring of receptor-ligand complexes for rank-ordering and virtual screening of chemical libraries". Journal of Chemical Information and Modeling 51 (9): 2132–8. September 2011. doi:10.1021/ci200078f. PMID 21728360. 
  23. "Comparing neural-network scoring functions and the state of the art: applications to common library screening". Journal of Chemical Information and Modeling 53 (7): 1726–35. July 2013. doi:10.1021/ci400042y. PMID 23734946. 
  24. "Characterization of small molecule binding. I. Accurate identification of strong inhibitors in virtual screening". Journal of Chemical Information and Modeling 53 (1): 114–22. January 2013. doi:10.1021/ci300508m. PMID 23259763. 
  25. "Performance of machine-learning scoring functions in structure-based virtual screening". Scientific Reports 7: 46710. April 2017. doi:10.1038/srep46710. PMID 28440302. Bibcode: 2017NatSR...746710W. 
  26. "Protein-Ligand Scoring with Convolutional Neural Networks". Journal of Chemical Information and Modeling 57 (4): 942–957. April 2017. doi:10.1021/acs.jcim.6b00740. PMID 28368587. 
  27. "The Impact of Protein Structure and Sequence Similarity on the Accuracy of Machine-Learning Scoring Functions for Binding Affinity Prediction". Biomolecules 8 (1): 12. March 2018. doi:10.3390/biom8010012. PMID 29538331. 
  28. "Generating Property-Matched Decoy Molecules Using Deep Learning". Bioinformatics 37 (btab080): 2134–2141. February 2021. doi:10.1093/bioinformatics/btab080. PMID 33532838. 
  29. "Machine learning classification can reduce false positives in structure-based virtual screening". Proceedings of the National Academy of Sciences of the United States of America 117 (31): 18477–18488. August 2020. doi:10.1073/pnas.2000585117. PMID 32669436. Bibcode: 2020PNAS..11718477A. 
  30. "Improving structure-based virtual screening performance via learning from scoring function components". Briefings in Bioinformatics 22 (bbaa094). June 2020. doi:10.1093/bib/bbaa094. PMID 32496540. 
  31. "From Machine Learning to Deep Learning: Advances in Scoring Functions for Protein–ligand Docking". Wiley Interdisciplinary Reviews: Computational Molecular Science 10. 2019-06-27. doi:10.1002/wcms.1429. 
  32. "Concepts of Artificial Intelligence for Computer-Assisted Drug Discovery". Chemical Reviews 119 (18): 10520–10594. 2019-07-11. doi:10.1021/acs.chemrev.8b00728. PMID 31294972. 
  33. "Machine-Learning Scoring Functions for Structure-Based Virtual Screening". Wiley Interdisciplinary Reviews: Computational Molecular Science 11. 2020-04-22. doi:10.1002/wcms.1478. 
  34. "Selecting machine-learning scoring functions for structure-based virtual screening". Drug Discovery Today: Technologies 32-33: 81–87. December 2019. doi:10.1016/j.ddtec.2020.09.001. PMID 33386098. https://figshare.com/articles/preprint/Selecting_Machine-Learning_Scoring_Functions_for_Structure-Based_Virtual_Screening/12967160. 
  35. "Classical scoring functions for docking are unable to exploit large volumes of structural and interaction data". Bioinformatics (Oxford, England) 35 (20): 3989–3995. March 2019. doi:10.1093/bioinformatics/btz183. PMID 30873528. 
  36. "Docking ligands into flexible and solvated macromolecules. 4. Are popular scoring functions accurate for this class of proteins?". Journal of Chemical Information and Modeling 49 (6): 1568–80. June 2009. doi:10.1021/ci8004308. PMID 19445499. 
  37. "Comparison of consensus scoring strategies for evaluating computational models of protein-ligand complexes". Journal of Chemical Information and Modeling 46 (1): 380–91. 2006. doi:10.1021/ci050283k. PMID 16426072. 
  38. "Enrichment of ligands with molecular dockings and subsequent characterization for human alcohol dehydrogenase 3". Cellular and Molecular Life Sciences 67 (17): 3005–15. September 2010. doi:10.1007/s00018-010-0370-2. PMID 20405162.