Indexing for Success: Effective and Efficient Analysis of Biological Data
ABSTRACT: Modern life sciences applications need to analyze and manage
large volumes of complex biological data. Many of these datasets are
growing at a rate that outpaces Moore's Law. Unfortunately, existing
methods for analyzing these datasets often do not scale with
increasing data sizes. To make matters worse, biologists want to
perform increasingly complex analyses on these datasets. Current
solutions cannot meet these demands, and this shortfall threatens to
slow progress in modern data-driven life sciences applications.
The central premise of this talk is that methods inspired by the
"database style" of analyzing large datasets can provide viable
solutions to many of these problems. The talk describes ongoing work
in the Periscope project, which aims to build efficient, effective,
and expressive tools for querying biological data, and highlights the
indexing and query processing techniques that we have developed for
analyzing biological graphs and sequences. Compared to existing
methods, our techniques are often orders of magnitude faster.
More significantly, these methods are far more expressive and produce
higher-quality results, which has allowed biologists to draw insights
from data-driven analysis that were not possible with existing tools.