Friday, 24 April 2009
Wednesday, 15 April 2009
The following is a brief overview of the peptide content of StARlite (release 31). In total, 41,128 compounds contain the simplest possible dipeptide substructure (di-glycine). This corresponds to about 9% of StARlite, so as a first approximation it is possible to say that 9% of StARlite is peptidic in nature (this also happens to be the largest single non-trivial structural class in StARlite). A table was then built of all distinct peptide units of a given length (up to 10 amino acids in this case). The data are as follows:
| Peptide length | # of this length or longer | # of exactly this length |
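The two count columns in the table can be related to each other. The sketch below rests on one plausible reading, which is an assumption: each compound is summarised only by the length of its longest peptide run. Under that reading, the "this length or longer" column is a reverse cumulative sum of the "exactly this length" column. The function name `peptide_length_table` and the toy input are hypothetical and illustrative, not StARlite data. The real table counts distinct peptide units, so treat this as a sketch of the bookkeeping only.

```python
from collections import Counter

def peptide_length_table(longest_runs, max_len=10):
    """Return (length, count_at_least_length, count_exactly_length) rows.

    Hypothetical helper. Assumes each compound is summarised by the length
    of its longest peptide run; runs shorter than 2 are not peptides here.
    """
    exact = Counter(n for n in longest_runs if n >= 2)
    rows = []
    for n in range(2, max_len + 1):
        # "n or longer" also includes runs beyond max_len
        at_least = sum(c for length, c in exact.items() if length >= n)
        rows.append((n, at_least, exact[n]))
    return rows

# Toy input (illustrative only, not StARlite data)
toy_runs = [2, 2, 2, 3, 3, 4, 12, 1]
for row in peptide_length_table(toy_runs, max_len=5):
    print(row)
```

For the toy input this prints `(2, 7, 3)`, `(3, 4, 2)`, `(4, 2, 1)` and `(5, 1, 0)`. The single 12-residue run still counts towards every "or longer" row.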
Considering all possible natural amino-acid dipeptides gives 400 distinct dipeptides (20^2). This compares with the 16,512 distinct dipeptides found in StARlite, implying a very diverse and expanded set of amino acids. It would be pretty interesting to find out what fraction of the 400 possible natural dipeptides is actually sampled. Of course, much of the variation in the dipeptides will come from groups attached N- and C-terminal to the dipeptide, but even so, the sampled variation of side chains is pretty good. There are 8,000 distinct tripeptides (20^3) that can be built from the 20 natural amino acids. Even if every observed tripeptide were simple and unelaborated, coverage would be poor (6,079 vs 8,000): chemical diversity scales very poorly! It is also pretty clear that the observed peptide length distribution follows a pseudo-power law (see below).
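The coverage argument is simple arithmetic, sketched below using only the counts quoted above. The helper names are illustrative. A ratio above 1 at length 2 cannot mean "more than complete" coverage. It reflects non-natural residues and terminal elaboration. At length 3 the ratio is an upper bound on natural coverage, because some of the 6,079 observed tripeptides contain non-natural residues.

```python
def natural_space(n, alphabet=20):
    """Number of distinct linear peptides of length n from `alphabet` amino acids."""
    return alphabet ** n

def coverage_ratio(n, observed):
    """Observed distinct peptides divided by the natural-only sequence space."""
    return observed / natural_space(n)

# Distinct peptide counts quoted in the text for StARlite release 31
observed = {2: 16_512, 3: 6_079}
for n, seen in observed.items():
    print(f"length {n}: {natural_space(n):,} natural, {seen:,} observed, "
          f"ratio {coverage_ratio(n, seen):.2f}")

# The natural-only space keeps growing 20-fold with every extra residue
print(f"length 10: {natural_space(10):,} natural sequences")
```

This prints ratios of 41.28 for dipeptides and 0.76 for tripeptides. It also shows the natural-only space reaching 10,240,000,000,000 sequences at length 10, which is why coverage collapses so quickly with length.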
Here is a graph. I know it is bad practice not to label the axes, but the x-axis is the peptide length and the y-axis is the frequency of that class. Green is the class of that length or longer, and yellow is the class of exactly that length.
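A quick way to check a "pseudo-power law" by eye is to fit a straight line to log(count) against log(length). A power law count = a · length^b appears as a line with slope b. The sketch below is an ordinary least-squares fit on log-log values. It is a crude diagnostic rather than a rigorous power-law test; maximum-likelihood methods are preferred for that. It is checked here against synthetic counts generated exactly as 1000 · L^-2, not against the StARlite numbers. Zero counts cannot be logged, so only positive counts should be passed in.

```python
import math

def fit_power_law(lengths, counts):
    """Least-squares fit of log(count) = log(a) + b*log(length); returns (a, b)."""
    xs = [math.log(x) for x in lengths]
    ys = [math.log(y) for y in counts]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx                    # slope on log-log axes = exponent
    a = math.exp(my - b * mx)        # back-transformed intercept = prefactor
    return a, b

# Synthetic check: counts generated exactly as 1000 * L**-2
lengths = list(range(2, 11))
counts = [1000 * L ** -2 for L in lengths]
a, b = fit_power_law(lengths, counts)
print(f"a = {a:.1f}, b = {b:.2f}")
```

On this synthetic input the fit recovers a = 1000.0 and b = -2.00.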
I have also pulled back some ligand efficiency data for this peptide set, and at first glance it looks very interesting. More later.
Monday, 13 April 2009
Anyway, on to a book. If its preface is not a call to arms, I don't know what is; this really is an essential book for every scientist (who lives in or visits the British Isles). It is also a book that, when I reach for it on the shelves, makes the kids run to tidy up or announce that they have homework. Regardless, here are the first couple of sentences of the preface:
'The Population of the British Isles is less than 0.2% that of the entire earth (sic); yet this tiny fraction of human society is responsible for an enormous number of cultural advances in both the arts and sciences. Public appreciation for the men and women of Britain and Ireland who wrote, painted, composed music, etc. is evident wherever one looks, but the recognition of explorers of nature are harder to find.'
For example, did you know that the Occam of Occam's Razor is William of Ockham, and that Ockham is in Surrey? Cool!
%T A Travel Guide To Scientific Sites Of The British Isles %A Charles Tanford %A Jacqueline Reynolds %D 1995 %I John Wiley & Sons %O ISBN 0-471-95070-2
Monday, 6 April 2009
The image is of the Starlite rooms cocktail bar in Tujunga Village, Los Angeles. I have not visited there (yet), but given Tujunga's Utopian Socialist roots, it seems a mighty fine bar to have a drink in.
Thursday, 2 April 2009
%T The Automation of Science %A Ross D. King %A Jem Rowland %A Stephen G. Oliver %A Michael Young %A Wayne Aubrey %A Emma Byrne %A Maria Liakata %A Magdalena Markham %A Pinar Pir %A Larisa N. Soldatova %A Andrew Sparkes %A Kenneth E. Whelan %A Amanda Clare %J Science %D 2009 %V 324 %P 85-89
Wednesday, 1 April 2009
Here is some pseudocode (truly appalling, almost prose, it has been noted) for finding possible replacements for a particular Functional Group (for example, a nitro, a vinyl halide or a sulphonamide):
1. Search StARlite for all examples of the Functional Group.
2. Identify all fragments that these Functional Groups are attached to (call these 'Contexts').
3. Search StARlite for all Contexts, then identify the corresponding Replacement Functional Groups.
4. Build a table of Replacement Functional Groups and count the frequency of each type of interchange (this frequency list is pretty useful in its own right).
5. Retrieve quantitative values of binding energy difference (using endpoints such as IC50, Ki, Kd, etc.), constraining the comparison to the same assay_ids from the same doc_ids.
6. Use these binding energy differences to compute an expectation value for the binding energy difference between the Functional Group and the Replacement Functional Group.
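The six steps can be sketched in plain Python. The sketch assumes the hard cheminformatics part (substructure search and fragmentation into a Context plus an attached group) has already been done. Each hypothetical record is therefore a tuple `(doc_id, assay_id, context, group, k_molar)`. Several further choices are assumptions for illustration, not StARlite's actual schema or method:

- An IC50 is used as a stand-in for a dissociation constant.
- The binding energy difference is taken as ΔΔG = RT ln(K_replacement / K_original).
- The "expectation value" is a simple mean.

```python
import math
from collections import Counter, defaultdict
from statistics import mean

# RT at 298.15 K in kcal/mol (R = 0.001987204 kcal mol^-1 K^-1)
RT = 0.001987204 * 298.15

def replacement_table(records, query_group):
    """Hypothetical sketch of steps 1-6 over pre-fragmented records.

    records: iterable of (doc_id, assay_id, context, group, k_molar), where
    k_molar is a Kd/Ki (or, as an approximation, an IC50) in mol/L.
    Returns {replacement_group: (n_contexts, mean_ddg_kcal or None)}.
    """
    records = list(records)
    # Steps 1-2: contexts to which the query functional group is attached
    contexts = {r[2] for r in records if r[3] == query_group}
    # Steps 3-4: count distinct contexts shared with each replacement group
    pairs = {(r[2], r[3]) for r in records
             if r[2] in contexts and r[3] != query_group}
    freq = Counter(group for _, group in pairs)
    # Step 5: compare only within the same (doc_id, assay_id, context);
    # if a group is measured twice in one assay, the last value wins
    by_assay = defaultdict(dict)
    for doc_id, assay_id, context, group, k in records:
        if context in contexts:
            by_assay[(doc_id, assay_id, context)][group] = k
    ddg = defaultdict(list)
    for groups in by_assay.values():
        if query_group not in groups:
            continue
        k_query = groups[query_group]
        for group, k in groups.items():
            if group != query_group:
                # ddG = RT ln(K_repl / K_query); negative means tighter binding
                ddg[group].append(RT * math.log(k / k_query))
    # Step 6: expectation value taken here as the simple mean ddG
    return {g: (freq[g], mean(ddg[g]) if ddg[g] else None) for g in freq}

# Toy records (illustrative only, not StARlite data)
toy = [
    ("d1", "a1", "c1", "COOH", 1e-7),
    ("d1", "a1", "c1", "tetrazole", 1e-7),
    ("d1", "a1", "c1", "CONH2", 1e-5),
    ("d2", "a2", "c2", "COOH", 1e-6),
    ("d2", "a2", "c2", "tetrazole", 1e-7),
    ("d3", "a3", "c3", "tetrazole", 1e-6),  # no COOH in context c3: ignored
]
for group, (count, mean_ddg) in replacement_table(toy, "COOH").items():
    print(group, count, round(mean_ddg, 2))
```

In this toy set the tetrazole appears in two COOH contexts with a mean ΔΔG of about -0.68 kcal/mol. The primary amide appears once and costs about 2.73 kcal/mol. Keying step 5 on the assay as well as the document keeps the comparison like-for-like.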
So a good bioisostere would preserve (or improve) binding energy, and such replacements are pretty easy to identify from the tables generated above. Of course, with the multiple endpoints stored in StARlite and the generality of the approach, the same basic workflow can be used to identify functional group replacements that improve half-life, solubility, logD, etc.
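Picking out the "good" bioisosteres is then a filter over the replacement table. The sketch below assumes the table maps each replacement to `(number_of_contexts, mean_ddg_kcal)`, where ΔΔG = ΔG(replacement) - ΔG(original group), so a value of zero or below means binding is preserved or improved. The cut-offs `max_ddg` and `min_count` are arbitrary illustrative parameters. A small positive `max_ddg` would tolerate near-neutral swaps. For other endpoints such as solubility or half-life, the direction of "better" would need to be flipped accordingly.

```python
def candidate_bioisosteres(table, max_ddg=0.0, min_count=2):
    """Hypothetical filter: replacements that preserve or improve binding.

    table: {replacement: (n_contexts, mean_ddg_kcal or None)}, with
    ddG = dG(replacement) - dG(original group), so ddG <= 0 is no worse.
    """
    hits = [
        (group, count, ddg)
        for group, (count, ddg) in table.items()
        if ddg is not None and ddg <= max_ddg and count >= min_count
    ]
    # Most favourable mean ddG first; ties broken by how often it was seen
    return sorted(hits, key=lambda hit: (hit[2], -hit[1]))

# Illustrative numbers only, not StARlite results
toy_table = {
    "tetrazole": (2, -0.68),
    "CONH2": (1, 2.73),
    "acylsulfonamide": (3, 0.0),
    "ester": (4, None),  # no paired assay data, so no ddG estimate
}
print(candidate_bioisosteres(toy_table))
```

This prints `[('tetrazole', 2, -0.68), ('acylsulfonamide', 3, 0.0)]`. The amide is dropped because it weakens binding and was seen only once. The ester is dropped because it has no paired measurements.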
Here is an old slide of a real case: the replacement of a carboxylic acid with other functional groups. Hopefully, with the background above, the figure is self-explanatory.
The picture used in the header of the post is from the excellent and very amusing B'eau Bo D'Or blog, and I think it perfectly illustrates bioisosterism, albeit in a context that is completely opaque to anyone not steeped in the popular culture of the United Kingdom in the 1970s and 1980s.