Machine Learning – “What if” it enabled exploratory analysis in High Content Screening

January 22nd, 2010 by Tim Moran

A roundtable discussion took place near the close of this year’s HCA meeting in San Francisco. The topics of Data Analysis and Management, Image Analysis, and Computational Biology were folded into a single discussion. This roundtable was facilitated by Karol Kozak. Participants included:

Karol Kozak (Swiss Fed. Institute Of Technology)

Lisa Smith (Merck)

Peter Horvath (Swiss Fed. Institute Of Technology)

Achim Kirsch (PE/Evotec)

Ghislain Bonamy (Novartis GNF)

Abhay Kini (GE Healthcare)

Jonathan Sexton (North Carolina Central University)

Mark Bray (Broad Institute)

Chris Wood (Stowers Institute for Medical Research)

Pierre Turpin (Molecular Devices)

Mark Collins (ThermoFisher/Cellomics)

The opening shot from “Schmerck” (Lisa Smith from Schering, now Merck) was fired at the vendors. The bullet in question? “Why have tools for pattern recognition and machine learning on image data not been addressed more rapidly in vendor systems?” Vendors replied with their own question: “Why is this a better approach than algorithmic quantification of a known endpoint?” The upshot of the ensuing discussion was that end users want the ability to extract additional information from their data beyond what the designed analysis algorithm derives, i.e., to look for natural classes in the data, spot outliers, correlate results to the chemical structure of test compounds, etc. This does not necessarily have to be correlated to known biological endpoints; it can be purely exploratory. Vendors said, “That’s why we need companies like Accelrys and products like Pipeline Pilot.” The marketplace needs a third-party environment that provides turnkey or almost-turnkey access to the data, and an exploratory environment like Pipeline Pilot (PLP) in which users can develop methods to ask “what-if” questions of their data. When users clearly demonstrate that these techniques have merit, they will find their way into the instrument vendors’ products.
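As a rough illustration of the "spot outliers" kind of exploratory question, here is a minimal Python sketch that flags wells whose readout sits far from the rest of the plate, using a robust z-score built from the median and the median absolute deviation. The well names, readout values, and the 3.5 cutoff are all made up for the example; none of this comes from the roundtable or from any vendor product.

```python
from statistics import median


def robust_z_scores(values):
    """Score each value by its distance from the median, scaled by the MAD."""
    values = list(values)
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        # Simplification for this sketch: no spread means nothing is flagged.
        return [0.0 for _ in values]
    # 0.6745 rescales the MAD so scores are comparable to ordinary z-scores.
    return [0.6745 * (v - med) / mad for v in values]


def flag_outliers(wells, threshold=3.5):
    """Return the names of wells whose robust z-score exceeds the threshold."""
    names = list(wells)
    scores = robust_z_scores([wells[name] for name in names])
    return [name for name, z in zip(names, scores) if abs(z) > threshold]


# Hypothetical per-well readout (e.g. mean nuclear intensity), invented here.
wells = {
    "A01": 101.0,
    "A02": 98.5,
    "A03": 100.2,
    "A04": 99.1,
    "A05": 162.0,
    "A06": 100.9,
}
outliers = flag_outliers(wells)
print(outliers)
```

The median and MAD are used instead of the mean and standard deviation so that the outlier itself does not distort the baseline it is measured against.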

One other aspect of the above discussion that became apparent is that many, if not most, HCS users have no idea what the difference is between PCA, classification, support vector machines, genetic algorithms, self-organizing maps, etc., let alone where or when to apply these methods. What they want, and need, is a kind of wizard that walks them through the process of determining what they want to learn from their data, and that internally selects the best method to do so. An analogy was drawn to curve-fitting programs that apply hundreds or thousands of models to a data set and tell the user which ones produced the best fit. This idea of “opening up to the wider science community methods previously available only to discipline experts”, specifically in computational biology, is by no means in its infancy (see The Future of Computational Science, Scientific Computing World: May / June 2004).
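The curve-fitting analogy can be sketched in a few lines of Python: fit several candidate models to the same data, score each by its sum of squared errors, and report the ranking. This is only a toy version under stated assumptions. The four candidate models, the function names, and the sample data are invented for illustration, and each model is fitted by simple least squares on transformed data, which requires positive x and y values.

```python
import math


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    return mean_y - b * mean_x, b


# Each candidate: (transform for x, transform for y, prediction function).
# All four become straight lines after the transforms, so one fitter serves all.
CANDIDATES = {
    "linear": (lambda x: x, lambda y: y,
               lambda a, b, x: a + b * x),
    "logarithmic": (math.log, lambda y: y,
                    lambda a, b, x: a + b * math.log(x)),
    "exponential": (lambda x: x, math.log,
                    lambda a, b, x: math.exp(a + b * x)),
    "power": (math.log, math.log,
              lambda a, b, x: math.exp(a + b * math.log(x))),
}


def rank_models(xs, ys):
    """Fit every candidate and return (name, sse) pairs, best fit first."""
    results = []
    for name, (tx, ty, predict) in CANDIDATES.items():
        a, b = fit_line([tx(x) for x in xs], [ty(y) for y in ys])
        # Score in the original (untransformed) space so models are comparable.
        sse = sum((y - predict(a, b, x)) ** 2 for x, y in zip(xs, ys))
        results.append((sse, name))
    return [(name, sse) for sse, name in sorted(results)]


# Made-up data that follows a power law, y = 2 * x^1.5.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [2.0 * x ** 1.5 for x in xs]
ranking = rank_models(xs, ys)
for name, sse in ranking:
    print(f"{name:12s} SSE = {sse:.4g}")
```

A real wizard would also have to guard against overfitting, for example by scoring on held-out data rather than on the data used for fitting.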

The momentum in machine vision (learning, clustering, modeling, predictive science and ease of use) was foreshadowed at the HCA East conference held in 2009, and this will likely continue to be the area that enables researchers in High Content Screening and Analysis to make better-informed decisions earlier in the discovery process.

Special thanks to contributing author Kurt Scudder.


If a picture paints a thousand words, can I learn from an Image?

December 3rd, 2009 by Tim Moran

Machine learning continued as a growing theme at this year’s HCA conference.

The first HCA East conference, held in Boston in September of this year, showed the promise of the increasing use of machine vision tools. These tools are making their way into the hands of biologists for everything from subcellular classification and pattern recognition to mechanism-of-action prediction based on multivariate image output. The theme continues to grow and will be a major focus at the upcoming HCA 2010 conference in January, as is evidenced by numerous talks on the subject. Mark-Anthony Bray, Ph.D., Computational Biologist, Imaging Platform, Broad Institute, will talk on quantifying image-based phenotypes with machine learning algorithms. Peter Horvath, Ph.D., Image Processing Scientist, Light Microscopy Centre, ETH Zurich, will also discuss machine intelligence, both for classification and for quality control. John McLaughlin, Ph.D., Scientist & Manager, Biology, Rigel Pharmaceuticals, Inc., will cover pattern recognition applied to image-based small molecule screening data. Numerous other talks by Accelrys, Novartis and Carnegie Mellon, to name a few, will also have recurring themes of learning. I can’t help but wonder whether the growth in this area is due primarily to need, or whether adoption has been accelerated by the growing number of informaticians working alongside High Content Screening biologists.

For some good background on machine learning, be sure to follow Dana Honeycutt’s blog postings. Here’s a link to get you started: Good Models Require Good Data, October 1st, 2009, by Dana Honeycutt, Ph.D.


Bio-IT World Europe

October 16th, 2009 by Tim Moran

Cloud nein oder Cloud ja? Are clouds just for the birds?

This year’s first-ever Bio-IT World Europe was hosted by the charming city of Hanover. The city’s exhibition center, with its high-tech architectural engineering, was a fitting venue for the IT experts who gathered from across the globe. The conference was held in conjunction with BioTechnica, and a multitude of vendors exhibited their goods.

Emerging conference themes included large-volume data storage and (externally hosted) cloud computing. There was general agreement that some enhancements to the GUI, perhaps through a platform like Pipeline Pilot, would bring faster adoption; some of the live demos presented a difficult-to-use command line interface. Chris Dagdigian of BioTeam says Amazon looks strong as an emerging leader among cloud providers. Many attendees seem to be taking a “wait and see” attitude as early adopters begin to pave the way. Perhaps, as we have seen in other domains such as High Content Screening, informaticians will take advantage of large data volumes for predictive modeling and begin to entice others out of the nest and into the clouds.


Model turnout at the High Content Analysis East conference. A picture is worth a thousand words.

October 8th, 2009 by Tim Moran


CHI’s High Content Analysis East conference was a great success. The conference had overriding themes in data informatics, supervised and unsupervised learning and the use of predictive modeling.
Anne Carpenter shared a user-friendly interface in CellProfiler for an interactive, semi-supervised, cell-level learning tool. Neil Carragher of AstraZeneca showed some really interesting work on predicting mechanism of action through the use of learning on multivariate image descriptors. And CMU’s Robert Murphy presented the use of learning and modeling on protein patterns in the cell. All in all, we are really beginning to see the “high content” promise of the industry come to fruition. It is finally becoming more common to see research that combines hundreds of variable readouts from cells to extract more sophisticated knowledge. Hopefully we will see these themes advancing at the upcoming HCA event in January, from the 11th to the 15th, at the Fairmont hotel in San Francisco.

Also of interest were several advances in managing High Content Screening data. Present were Michael Sjaastad of Molecular Devices, Martin Daffertshofer of PerkinElmer (Evotec), and Karol Kozak (LMC-RISC, Institute for Biochemistry). They all presented promising offerings for managing HCS data. It was interesting to note that Thermo (Cellomics) did not introduce any new tools for managing their HCS data, which is perhaps telling of the great foresight Mark Collins had when originally designing Store, which has stood the test of many years.


Content with your Content?

September 21st, 2009 by Tim Moran

Informatics in High Content Screening (HCS) is reshaping the mix of scientists driving drug discovery efforts. In the early days of HCS I worked closely with electrical, mechanical and software engineers to develop better systems for image acquisition and processing. My responsibilities as an HCS biologist involved painstaking hours of sample preparation and cell culture, and constant enhancements to the materials and methods for preparing my biological specimens for imaging. I was motivated by the many new collaborative efforts that began with the software engineers, the systems engineers and the machine vision scientists developing HCS systems. I found myself teaching basic concepts of biology as I learned about illumination and optics, piezoelectric drives for autofocusing and, of course, the strings of zeros and ones that would eventually tell me what happened to my protein. It was exciting for me to be part of a cross-functional team developing new applications by piecing together advances in hardware, image processing and biological assay technologies.

High Content Screening systems and vendor software have come a long way since my introduction to the technology ten years ago. Vendors struggled to balance giving end users powerful, flexible systems against ease of use (1). The bottleneck has shifted from application development to data informatics. Software systems in HCS have evolved to integrate databases and other related sources for chemical structures, target characteristics, and assay results. Today, I collaborate with colleagues in HCS in new areas that include data mining, principal component analysis, Bayesian modeling, decision trees, and data management. The mix of HCS conference speakers and attendees has shifted from what had primarily been assay developers to a growing population of informaticians and IT experts. Talks have moved beyond assay design and system development to incorporate more downstream data processing. We have worked on complex fingerprinting methods for predicting characteristics of a compound, such as its mechanism of action or how it might affect a particular biological pathway involved, for example, in neuronal stem cell differentiation. Vendors are moving to more open systems for image processing and are integrating more third-party applications into their HCS acquisition systems to keep up with the shifting bottlenecks and emerging solutions. Informaticians have been able to improve data analysis efforts and significantly reduce the number of man-hours required for downstream data analysis (2). I’ve been fortunate in having been able to develop relationships with experts at most of the leading HCS instrument companies. My journey has been one of constant growth and continuous learning. I’m anxious to know what’s coming next in High Content Screening and eager to learn from my ever-growing network of scientific experts.

1. High-Content Analysis: Balancing Power and Ease of Use by Jim Kling

2. Data Analysis For A High Content Assay Using Pipeline Pilot™: 8x Reduction in Manhours from a poster by L. Bleicher, Brain Cells Inc


Is your High Content Screaming?

September 17th, 2009 by Tim Moran

Attendance for the first annual High Content Analysis East conference is looking great. There is quite a bit of focus at this meeting on data integration, analysis and management. And what a great time! There’s been an explosion in informatics-related software for High Content Analysis this year. I’m expecting to see some really cool new software features in informatics from several experts in the field, including Jason Swedlow (OME), Michael Sjaastad (MDS), Oliver Leven (GeneData), Neil Carragher (AstraZeneca) and the queen of High Content Analysis herself, Ann Hoffman (Roche).

Our own longtime HCA veteran Kurt Scudder will be presenting “Beyond Basic HCS Data Management: Learning, Modeling, and Advanced Data Analysis using Pipeline Pilot”. Stop by the booth (lucky #7) and say hi. I’ve got some whiz-bang applications for a High Content Analysis Integrated QC Drill Down, as well as easy-to-use single-cell and compound modeling apps.
