A roundtable discussion took place near the close of this year’s HCA meeting in San Francisco. The topics of Data Analysis and Management, Image Analysis and Computational Biology were folded into a single discussion. This roundtable was facilitated by Karel Kozak. Participants included:
Karel Kozak (Swiss Fed. Institute Of Technology)
Lisa Smith (Merck)
Peter Horvath (Swiss Fed. Institute Of Technology)
Achim Kirsch (PE/Evotec)
Ghislain Bonamy (Novartis GNF)
Abhay Kini (GE Healthcare)
Jonathan Sexton (North Carolina Central University)
Mark Bray (Broad Institute)
Chris Wood (Stowers Institute for Medical Research)
Pierre Turpin (Molecular Devices)
Mark Collins (ThermoFisher/Cellomics)
The opening shot from Schmerck (Lisa Smith, from Schering, now Merck) was fired at the vendors. The bullet in question? “Why have tools for pattern recognition and machine learning on image data not been addressed more rapidly in vendor systems?” Vendors replied with their own question: “Why is this a better approach than algorithmic quantification of a known endpoint?” The ensuing discussion established that end-users want the ability to extract additional information from their data beyond what the designed analysis algorithm derives, i.e., to look for natural classes in the data, spot outliers, correlate results to the chemical structure of test compounds, etc. This does not necessarily have to be correlated to known biological endpoints; it can be purely exploratory. Vendors said, “That’s why we need companies like Accelrys and products like Pipeline Pilot.” The marketplace needs a third-party environment which provides turnkey or almost-turnkey access to the data, and an exploratory environment like Pipeline Pilot (PLP) in which users can develop methods to ask “what-if” questions of their data. When users clearly demonstrate that these techniques have merit, the techniques will find their way into the instrument vendors’ products.
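To make the "spot outliers" style of exploratory question concrete, here is a minimal Python sketch. It was not presented at the roundtable and is not taken from any vendor product or from Pipeline Pilot. It assumes a single numeric feature per well and flags values with a large modified z-score based on the median absolute deviation (MAD). The function name, the 3.5 threshold and the sample values are illustrative assumptions.

```python
from statistics import median


def robust_outliers(values, threshold=3.5):
    """Return the indices of values whose modified z-score exceeds threshold."""
    med = median(values)
    # Median absolute deviation: a spread estimate that outliers barely affect.
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        # At least half the values equal the median, so the score is
        # undefined; this sketch simply flags nothing in that case.
        return []
    # 0.6745 rescales the MAD so the score is comparable to a z-score
    # when the bulk of the data is roughly normal.
    return [
        i for i, v in enumerate(values)
        if abs(0.6745 * (v - med) / mad) > threshold
    ]


# Hypothetical per-well values for one image feature (made up for illustration).
wells = [102.0, 98.5, 101.2, 99.8, 100.4, 187.3, 100.9, 97.6]
print(robust_outliers(wells))  # prints [5]
```

A median-based score is used here rather than a mean and standard deviation because a single extreme well would inflate the standard deviation and could hide itself.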
One other aspect of the above discussion which became apparent is that many, if not most, HCS users have no idea what the difference is between PCA, classification, support vector machines, genetic algorithms, self-organizing maps, etc., let alone where or when to apply these methods. What they want, and need, is a kind of wizard which walks them through a process of determining what they want to learn from their data, and which internally selects the best method for doing so. An analogy was drawn to curve-fitting programs which apply hundreds or thousands of models to a data set and tell the user which ones produced the best fit. This idea of “opening up to the wider science community methods previously available only to discipline experts”, specifically in computational biology, is by no means in its infancy (see The Future of Computational Science, Scientific Computing World: May / June 2004).
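The curve-fitting analogy can be sketched in a few lines of Python. This is an illustration only, not how any particular curve-fitting program or HCS wizard works. It assumes strictly positive x and y values, tries four candidate models, and ranks them by residual sum of squares. The exponential and power models are fitted on log-transformed data, which is a common shortcut rather than a true least-squares fit in the original units. All names and the sample data are hypothetical.

```python
import math


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a + b*x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    return mean_y - b * mean_x, b


def identity(v):
    return v


# Each candidate model: (transform of x, transform of y, inverse of y transform).
# A straight line is fitted in the transformed space.
MODELS = {
    "linear": (identity, identity, identity),       # y = a + b*x
    "logarithmic": (math.log, identity, identity),  # y = a + b*ln(x)
    "exponential": (identity, math.log, math.exp),  # ln(y) = a + b*x
    "power": (math.log, math.log, math.exp),        # ln(y) = a + b*ln(x)
}


def rank_models(xs, ys):
    """Return model names ordered from best to worst fit (lowest RSS first)."""
    scored = []
    for name, (tx, ty, inverse) in MODELS.items():
        a, b = fit_line([tx(x) for x in xs], [ty(y) for y in ys])
        # Residuals are measured in the original y units so that the
        # models can be compared with one another.
        rss = sum((y - inverse(a + b * tx(x))) ** 2 for x, y in zip(xs, ys))
        scored.append((rss, name))
    return [name for rss, name in sorted(scored)]


# Synthetic data generated from a power law, purely for illustration.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [2.0 * x ** 1.5 for x in xs]
print(rank_models(xs, ys)[0])  # prints power
```

A real wizard of the kind described would also have to guard against overfitting, for example by penalizing models with more parameters or by scoring on held-out data, which this sketch does not attempt.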
The momentum in machine vision (learning, clustering, modeling, predictive science and ease of use) was foreshadowed at the HCA East conference held in 2009. This is likely to remain the area that enables researchers in High Content Screening and Analysis to make better informed decisions earlier in the discovery process.
Special thanks to contributing author Kurt Scudder.

