A new algorithm for quantifying binding site pattern similarity with applications for Next Generation Sequencing

New sources of regulatory data, such as transcription factor ChIP-seq experiments, can yield important insights into biological function through downstream analysis of motifs. Position Frequency Matrices (PFMs) are a standard format for representing transcription factor binding patterns. Measures for comparing these binding patterns are needed to support more sophisticated detection and classification of regulatory sequences. In this work we present PfmSim, a novel algorithm for gapped alignment of PFMs. We compare our measure with the standard measure of Sandelin and Wasserman on similarity and classification tasks. Across multiple tests, our measure yields better similarity values.
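To make the idea of gapped PFM alignment concrete, the following is a minimal sketch and not the PfmSim algorithm itself, which is not specified here. It assumes a PFM is a 4 x L count matrix with rows A, C, G, T, uses a hypothetical column score (2 minus the sum of squared frequency differences), and applies a generic Smith-Waterman-style local alignment with a linear gap penalty. The function names and the `gap_penalty` parameter are illustrative assumptions.

```python
import numpy as np


def normalize(pfm):
    """Convert a 4 x L count matrix (rows A, C, G, T) to column frequencies.

    Assumes every column has a nonzero total count.
    """
    pfm = np.asarray(pfm, dtype=float)
    return pfm / pfm.sum(axis=0, keepdims=True)


def column_score(p, q):
    """Hypothetical column score: 2 minus the sum of squared differences.

    Ranges from 0 (disjoint one-hot columns) to 2 (identical columns).
    """
    return 2.0 - float(np.sum((p - q) ** 2))


def gapped_local_alignment(pfm_a, pfm_b, gap_penalty=1.0):
    """Best local alignment score between two PFMs, allowing gaps.

    Generic dynamic programming over columns; gap_penalty is an
    assumed linear cost per skipped column.
    """
    a, b = normalize(pfm_a), normalize(pfm_b)
    n, m = a.shape[1], b.shape[1]
    h = np.zeros((n + 1, m + 1))
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            match = h[i - 1, j - 1] + column_score(a[:, i - 1], b[:, j - 1])
            h[i, j] = max(
                0.0,                        # start a new local alignment
                match,                      # align column i with column j
                h[i - 1, j] - gap_penalty,  # skip a column of pfm_a
                h[i, j - 1] - gap_penalty,  # skip a column of pfm_b
            )
    return float(h.max())


# Toy motifs: "ACG" and "ACTG" (the second has one extra column).
motif_acg = [[10, 0, 0], [0, 10, 0], [0, 0, 10], [0, 0, 0]]
motif_actg = [[10, 0, 0, 0], [0, 10, 0, 0], [0, 0, 0, 10], [0, 0, 10, 0]]
self_score = gapped_local_alignment(motif_acg, motif_acg)
gapped_score = gapped_local_alignment(motif_acg, motif_actg)
```

In this toy case the gapped alignment can skip the extra column of the longer motif at the cost of one gap penalty, which an ungapped comparison could not do.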