Webapps: Making a (Zero Footprint) Mark on Applied Research

June 25th, 2009 by Max Petersen, Ph.D.

A couple of weeks ago, Accelrys invited its customers and prospects to try out a web-enabled version of Synthia, a popular tool for quick estimation of polymer properties. For me, the intellectual fallout of this exercise was a close look at what companies want when they consider using web applications for their day-to-day research needs. Here are two points I want to discuss in this post:

  • Hosted services vs. IP protection
  • Zero footprint molecular editors

Hosted services vs. IP protection: As expected for a tool that lets users test the viability of new materials prior to experimental synthesis, our web tool allowed users to sketch their own molecular structures, save them, and run predictions based on those structures. This almost immediately prompted customer responses that an Accelrys-hosted trial of this functionality was not an option. Instead, these customers preferred running a trial version within the comfort of their own intranet.

On the other hand, I had the chance to see WebCSD, CCDC’s new web-based version of the CSD, at this spring’s ACS meeting. Here, a hosted service offers a great advantage over bi-annual or annual distributions of their popular crystal structure database. Updates to the hosted DB become immediately available to users, and IP issues are non-existent, as it is virtually impossible to relate a CSD query to a specific compound in development. Unless, of course, you use the CSD framework as a repository for your proprietary structures. In that case, you can host the database in-house and still enjoy all the other benefits of a web application.

Zero footprint molecular editors: Web enabled research tools that cater to chemists and materials scientists will seldom avoid displaying an atomistic representation of a structure at some point. Generally, a scientist will also have to edit a structure or create one from scratch.

Our approach was to let users pick a molecular editor of their choice that can be invoked from within the web app. This has the benefit that users can work with a tool they are comfortable with, and the disadvantage that the web app is “not so zero footprint” any more.

Structure editor for polymer properties web application

Figure 1: We allowed users to choose between a selection of popular molecular editors. These are of course not thin clients, but traditional thick clients that need to be installed on individual machines.

Looking outside the Accelrys box, the promising candidates to become standards in web-based molecular viewers/editors are Jmol, OpenAstexViewer, and JME. While Jmol follows the classic open-source community development model, OpenAstexViewer and JME have their roots directly in industry research. They have impressive deployments (JME states 10,000 users in over 150 companies) and are used as third-party tools by commercial institutions, including the above-mentioned CCDC and Accelrys.

The fact that the industry invests in, deploys, and shares these tools with peers is the most impressive testimony that scientific web applications are starting to make a mark on today’s R&D environments, even if it’s a zero footprint one.


Look Before You Learn

June 23rd, 2009 by Dana Honeycutt, Ph.D.

Part of my job is creating and maintaining learner components for building statistical models in Accelrys’s Pipeline Pilot product. A statistical model is an empirically derived equation or set of rules for predicting some unknown property (say the toxicity of a chemical compound) from a set of known properties (say descriptors derived from the compound’s structure).

A statistical model, in contrast to a mechanistic model, is built from a specific set of data, called the training set, using a specific learning algorithm (such as linear least-squares or recursive partitioning). The quality of the model depends crucially on the quality of the training data.

Pipeline Pilot makes it really easy to build statistical models from your data. All it takes is dropping in a data reader component, choosing an appropriate learner component, and specifying the variables you wish to use. Because of this ease, you may be tempted to build models from a data set before taking a look at the data.

Don’t do it!

Here’s why: more often than you might think, data sets are dirty. Some values are missing or invalid. What you thought was a scalar property appears as an array in the data. A few extreme outliers are present which (depending on the learner) may seriously skew the results. Extra commas in your CSV file have shifted some values to the wrong columns. You’re trying to build a classification model, but all data records have been assigned the same class. You thought that your data set contained only small organic molecules, but somehow a few organometallics got in there. Unbeknownst to you, the creator of the data set used 99 as a missing value tag. And so on.
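Several of these problems can be caught mechanically before any learner sees the data. The following is a minimal sketch in plain Python, not a Pipeline Pilot component; the function name `audit_rows`, the sample column names, and the choice of checks are assumptions made for illustration. It flags rows whose field count differs from the header (the stray-comma case), empty cells, and a class column that contains only one class.

```python
import csv
import io
from collections import Counter


def audit_rows(csv_text, class_column):
    """Return a list of human-readable warnings about a CSV training set."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, records = rows[0], rows[1:]
    warnings = []

    # Extra or missing commas show up as rows whose field count
    # differs from the header's. Line 1 is the header itself.
    for line_no, rec in enumerate(records, start=2):
        if len(rec) != len(header):
            warnings.append(
                f"line {line_no}: {len(rec)} fields, expected {len(header)}"
            )

    # Only rows with the right shape can be checked column by column.
    well_formed = [r for r in records if len(r) == len(header)]

    # Empty cells are the simplest form of missing value.
    for i, name in enumerate(header):
        n_empty = sum(1 for r in well_formed if r[i].strip() == "")
        if n_empty:
            warnings.append(f"column {name!r}: {n_empty} empty value(s)")

    # A classification model cannot learn from a single class.
    idx = header.index(class_column)
    classes = Counter(r[idx] for r in well_formed)
    if len(classes) < 2:
        warnings.append(f"column {class_column!r}: only one class present")

    return warnings


# Hypothetical sample: one empty cell, one row with a stray comma,
# and every record assigned the same class.
sample = "mw,logp,toxic\n180.2,1.2,yes\n300.1,,yes\n250.0,2.5,extra,yes\n"
for warning in audit_rows(sample, "toxic"):
    print(warning)
```

A check like this will not catch everything on the list above (an organometallic hiding among small organic molecules needs domain-specific inspection), but it is cheap enough to run every time.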

Pairs Plot of a Contaminated Data Set

I am sometimes called upon to diagnose problems that customers or colleagues have when trying to build a model. Often the root of the problem is that something is wrong with the input data. In many such cases, just looking at the data in a table makes the problem obvious. Other times, simple analysis (such as univariate analysis) or plots (such as pairs plots) show what’s wrong.
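As a rough illustration of what a simple univariate check can reveal, here is a sketch using only the Python standard library. It is not how Pipeline Pilot performs univariate analysis; the function name `univariate_summary` and the cutoff of five median absolute deviations are assumptions chosen for the example. It summarizes one numeric column and lists values that sit far from the median, which is where both genuine outliers and numeric missing-value tags such as 99 tend to show up.

```python
import statistics


def univariate_summary(values):
    """Summarize one numeric column and flag values far from the median."""
    med = statistics.median(values)
    # Median absolute deviation: a spread estimate that a few
    # extreme values barely affect, unlike the standard deviation.
    mad = statistics.median(abs(v - med) for v in values)
    cutoff = 5 * mad  # assumed threshold; tune it for your data
    suspects = [v for v in values if mad > 0 and abs(v - med) > cutoff]
    return {
        "n": len(values),
        "min": min(values),
        "median": med,
        "max": max(values),
        "suspects": suspects,
    }


# Hypothetical column in which 99 was used as a missing-value tag.
logp = [1.2, 0.8, 1.5, 1.1, 0.9, 99, 1.3, 99]
print(univariate_summary(logp))
```

A value that appears repeatedly among the suspects is a strong hint that it is a tag rather than a measurement, and is worth a question to whoever created the data set.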

The more worrisome cases are the ones we may never hear about. Not all problems with a training data set will make a learner fail or produce obviously incorrect results. So even if you have gone ahead and successfully built a model before looking at the data, you should still look at the data afterward.

Whether you build models in Pipeline Pilot, R, Weka, or some other program, remember to Look before you Learn.


Yesterday & Tomorrow: Two Tremors in the Pharma Industry

June 2nd, 2009 by Accelrys Team

There are two tremors in biotech and pharma today, June 2, 2009, that deserve some special attention.

Either one could easily develop into a full earthquake, having the potential to shake the very foundations of the industry.

The first event is an aftershock. Yesterday, Microsoft announced that it will acquire Merck’s Rosetta. The size of the deal wasn’t made public, but several industry watchers speculate that the price tag was far less than the $630M that Merck spent to acquire Rosetta in 2001. For much of the past decade, industry observers have been scratching their collective heads over the Merck/Rosetta alliance. With Microsoft’s official unveiling of its Amalga platform at Bio-IT World Expo last month, the Rosetta misfit has become a perfect fit, providing some of the research capabilities that the Amalga effort lacked. With last week’s announcement of Celera closing its few remaining offices in Rockville, the epicenter of life science informatics innovation has now clearly moved from Washington DC to Washington state.

The second event is a pre-shock. Tomorrow, Carl Icahn makes another bid for seats on the Board of Directors at Biogen IDEC. Previously, Icahn, the New York billionaire activist investor, has made his presence felt on behalf of shareholders at a number of biotech companies. But what is really significant now is that Icahn has suggested the breakup of the Biogen & IDEC merger from 2003. If events actually proceed as speculated, we might have witnessed the high tide mark of pharma M&A with the recent Wy-Phi merger.


Another Oncology Win in Personalized Medicine

June 1st, 2009 by Accelrys Team

At the World Biomarker Congress in Philadelphia this past week, I had a chance to catch up with a colleague, now a senior executive at a European-based global pharma company. Our conversation inevitably turned to Personalized Medicine, a shared interest of ours.

He updated me on some pre-release data that will be shown at Sunday’s meeting of the American Society of Clinical Oncology (ASCO), concerning Iressa, AstraZeneca’s lung cancer medication. Previously thought to be a commercial and scientific failure, Iressa is now at the vanguard of therapies “rescued by targeting.”

Targeted therapies are drugs that are shown to be particularly effective in a subset of the overall population. “Targeting” in this context refers to the identification of a genetic difference that corresponds to better clinical outcomes for a particular group of patients. In the case of Iressa, the data that will be presented tomorrow shows that cancer progression is halted for over nine months in patients with the mutation, compared with six months for patients receiving chemotherapy (median values).

About one in ten cancer patients has this mutation. Overall, lung cancer kills 1.3 million people per year.  

But what if a patient doesn’t have the mutation? Then chemotherapy is the better treatment option: it can hold back the cancer progression for five months, compared to only one month for Iressa (again, median values).

If these results are confirmed, Iressa will join the growing list of personalized medicine success stories in oncology, which began with the world’s first targeted therapy, Genentech’s Herceptin, in 1998. Since Herceptin, we’ve seen other highly publicized therapies with genetic targeting, such as Erbitux.

But why so much genetic targeting in cancer, and not (yet) in other indications? Is it the serious nature of the disease or the quality of the data (high compliance in large populations) or something else? The Iressa story gives us a clue.

AstraZeneca didn’t give up on Iressa, and pursued semi-anecdotal findings of efficacy in some patients, even though the drug was not effective in the larger clinical trial population. Because of the large potential revenues from effective cancer treatments, it becomes economically viable for companies to invest in risky clinical trials for treatments that might only be effective in 10% of the population.

Rittenhouse Square

As my colleague summed it up during lunch at a café on Rittenhouse Square in Philadelphia, “It’s about the money, of course. Cancer kills, and so cancer treatments cost.”

Oncology is clearly the vanguard for targeted therapies. But as scientists and marketing executives become more familiar and comfortable with a development process that results in a fragmented market, the techniques will inevitably be replicated in treatments that can only demand lower prices because they treat less serious diseases.

And that will make us all winners, no matter what mutations we have.
