Seminar: High-performance grammar-based stream processing

2016-07-07: by Fritz Henglein (DIKU). The seminar will take place July 7th, 13.15-14.00 at University of Copenhagen, SUND/SCIENCE, Grønnegårdsvej 7, 2nd floor library, Frederiksberg C.

Refreshments will be provided and registration is not necessary.

We present Kleenex, a domain-specific language consisting of context-free grammars with embedded output actions for specifying nondeterministic finite transducers, and its compilation to streaming string transducers.  It combines worst-case linear-time performance, sustained high throughput and optimal (earliest possible) output action scheduling.  Kleenex achieves consistently high single-core throughput rates of 1 Gb/s or more on stock hardware. It performs well compared to both more specialized and to general-purpose sequence processing tools such as AWK, sed, RE2, Ragel and regular-expression libraries. 
We illustrate Kleenex using an exact DNA-matching problem, present its application to approximate pattern matching (ongoing work), and discuss its possible extension (based on work at MSR Redmond and in HIPERFIT to achieve 1 Tb/s throughput on stock hardware using massive (GPU- and cluster-based) data parallelism (possible future work). 
Joint work with Bjørn Bugge Grathwohl, Ulrik Terp Rasmussen, Kristoffer Aalund Søholm, Peter Rosenberg Troelsen, Sebastian Paaske Tørholm,partially supported by the Danish Independent Research Council under Project Kleene Meets Church.