The Ellman group has pioneered many important methods for combinatorial synthesis. These include the solid-phase synthesis of 1,4-benzodiazepines, which was the first reported example of small molecule library synthesis, and the development of new conceptual strategies for attaching compounds to solid supports, such as traceless and diversification linker strategies, for a range of chemical biology applications, including library synthesis. The linkers that we have developed, which include sulfonamide safety-catch, tetrahydropyranyl, and “traceless” silicon linkers, are currently marketed by numerous resin and chemical supply companies and are extensively used in academia and industry.
The initial sequencing of the human genome suggests the presence of 30,000-40,000 genes. Tens of thousands more genes have been identified from the genomic analysis of other organisms. The number of proteins encoded by these genes is much greater when splice variants and post-translational modifications are taken into consideration. For years to come, establishing the function of these proteins will be one of the most important goals of research in the biological sciences. While this is a daunting task, the goal becomes more realistic if we consider that a majority of proteins may be categorized into a much smaller collection of protein families based upon sequence homology and, consequently, structure and mechanism. Chemical methods designed to exploit common structural or mechanistic features of protein families can play a critical role in establishing protein function because a well-designed library synthesis method can potentially be applied to any member of a protein family.
To demonstrate this approach, we have focused on the proteases, which play a critical role in regulating a majority of biological processes, including cell differentiation, blood coagulation, the life cycles of bacterial, parasitic and viral pathogens, and apoptosis (programmed cell death). Proteases have also served as critically important drug targets for the treatment of AIDS, cardiovascular disease and diabetes. As summarized below, we have developed two general types of combinatorial methods to systematically establish protease function.
In collaboration with Charly Craik's group at UCSF, we have developed positional scanning libraries of fluorogenic substrates (Figure 1) to rapidly determine the N-terminal substrate specificity of proteases, i.e., the combination of amino acid side chains at which a protease cleaves. This information greatly facilitates the identification of the physiological substrates of a protease, providing tremendous insight into the biological function of that protease. The determination of the substrate specificity of a series of closely related proteases also provides extremely useful information for the development of selective protease inhibitors as therapeutic agents. The methods that we have developed have been used to establish the substrate specificities of over 150 proteases. Notable examples include providing an understanding of the mechanisms by which the AIDS virus maintains virulence upon becoming resistant to HIV protease inhibitors, and providing characterization of the β-tryptases, which are implicated in autoinflammatory diseases and cancer. The importance to drug discovery is most clearly indicated by the use of our libraries by Dr. Nancy Thornberry and coworkers at Merck to define the differences in substrate specificity of the closely related dipeptidyl peptidases. This specificity information proved to be critical for the rapid design of a potent and selective inhibitor of dipeptidyl peptidase IV that has entered clinical trials for the treatment of metabolic disorders, including diabetes.

Figure 1. Fluorogenic substrates
Recently, we have begun to replace positional scanning libraries with a fluorogenic substrate microarray format (Figure 2). This format is advantageous because it provides information on discrete substrates as opposed to substrate mixtures but still maintains the efficiency of the positional scanning method. We will also explore the potential of this microarray-based method for diagnostic applications. Additionally, we have developed fluorogenic substrates as chemical tools to monitor protease activity in cells. For example, in one application we are using fluorogenic substrates to study neutrophil-mediated cell killing.

Figure 2. Characterization of the substrate specificity of (A) thrombin and (B) granzyme B
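As a rough illustration of how a positional scanning profile might be read, the short Python sketch below picks the most active fixed residue at each substrate position and scales the rates per position. It is only a sketch under stated assumptions: the position labels P1 to P4, the function names, and all rate values are hypothetical placeholders chosen for illustration, not data or software from our libraries.

```python
# Hypothetical sketch: reading a positional scanning profile.
# Assumption: each sub-library fixes one residue at one position (labelled
# P1..P4 here) while the other positions are randomized, and the measured
# cleavage rate reflects how well the protease accepts that fixed residue.
# All numbers below are made-up placeholders, not experimental data.


def preferred_residues(profile):
    """Return, for each position, the fixed residue with the highest rate."""
    return {
        position: max(rates, key=rates.get)
        for position, rates in profile.items()
    }


def normalize(profile):
    """Scale the rates at each position so the best residue equals 1.0."""
    normalized = {}
    for position, rates in profile.items():
        top = max(rates.values())
        if top == 0:
            # No measurable activity at this position: keep all values at zero.
            normalized[position] = {aa: 0.0 for aa in rates}
        else:
            normalized[position] = {aa: rate / top for aa, rate in rates.items()}
    return normalized


# Placeholder profile: position -> {one-letter residue code: relative rate}.
example_profile = {
    "P1": {"R": 950.0, "K": 400.0, "A": 10.0},
    "P2": {"P": 800.0, "G": 300.0, "L": 120.0},
    "P3": {"T": 210.0, "Q": 200.0, "E": 50.0},
    "P4": {"L": 500.0, "I": 450.0, "D": 25.0},
}

best = preferred_residues(example_profile)
scaled = normalize(example_profile)
print(best)
```

Because each position is read independently, a profile of this kind summarizes preferences across substrate mixtures; it does not by itself report on any single discrete substrate sequence.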
We have also established systematic methods to identify inhibitors of any member of a protease family by preparing mechanism-based combinatorial libraries. For example, using a common library synthesis method we have identified potent small molecule inhibitors of multiple members of the aspartyl protease family, including cathepsin D, which is implicated in neurodegenerative disease, beta-secretase, which is an important target for the treatment of Alzheimer's disease, and the plasmepsins, which are essential proteases of the malaria parasite. The identified inhibitors have served as useful chemical tools to study the respective proteases. For example, in collaboration with Professor Gary Lynch of UC Irvine, we have established that cathepsin D is likely to be responsible for a proteolytic process leading to the formation of neurofibrillary tangles in Alzheimer's disease and Niemann-Pick disease, suggesting novel therapeutic strategies for these neurodegenerative diseases.
Recently, we have developed a powerful new method for inhibitor discovery called Substrate Activity Screening (SAS), which is the first substrate-based method for fragment discovery and optimization. We have broadly applied the SAS approach to develop inhibitors of proteases, which, as described previously, represent hugely important therapeutic targets. For example, we have identified novel, druglike and completely selective inhibitors of cathepsin S (Figure 3), which is implicated in autoimmune disorders. Structures of these novel inhibitors in complex with cathepsin S show unprecedented binding modes (see Main Page graphic for representative structure). In unpublished work, we have also identified potent inhibitors of cruzain that in cell culture completely eradicate the parasite responsible for Chagas' disease. These inhibitors hold considerable promise because in South America Chagas' disease is a leading cause of inflammation, heart disease and death from these maladies, and viable treatments for this disease are currently not available. We have also identified potent inhibitors of the caspases, key mediators of apoptosis. In very recent efforts, we have expanded the application of the SAS method to the phosphatases, which, like proteases, play essential roles in the regulation of many biological processes. We have in particular directed our efforts at the development of inhibitors of phosphatases encoded by neglected disease pathogens and have already developed a potent and selective inhibitor of the tuberculosis phosphatase PtpB (Figure 3).

Figure 3. Potent inhibitors of therapeutically relevant proteases identified by the SAS method