
[#311132] Haskell GHC : parallel regex-dna : using pcre


Date:
2008-09-22 19:38
Priority:
3
State:
Closed
Submitted by:
Don Stewart (dons-guest)
Assigned to:
Nobody (None)
Category:
Haskell GHC
Group:
regex-dna
Resolution:
Accepted
Summary:
Haskell GHC : parallel regex-dna : using pcre

Detailed description
This is a variant of the current regex-dna Haskell GHC entry, parallelised using parMap, but importing Text.Regex.PCRE instead of Text.Regex.Posix.
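A minimal sketch of the idea described above, under stated assumptions. It is not the attached B.hs. It assumes the parallel package (3.x or later; older releases from around 2008 may name the strategies differently) and either regex-pcre or regex-pcre-builtin. The helpers `cleanFasta` and `countMatches` are hypothetical names, and the header-stripping step is a simplification rather than the benchmark's own regex. Each of the nine variant patterns is counted in its own spark via `parMap`:

```haskell
-- Hypothetical sketch, not the submitted B.hs: count the regex-dna
-- variant patterns in parallel, one spark per pattern, using the PCRE
-- backend. Assumes the parallel (>= 3) and regex-pcre(-builtin) packages.
import Control.Parallel.Strategies (parMap, rseq)
import qualified Data.ByteString.Char8 as B
import Text.Regex.PCRE ((=~))

-- The nine variant patterns, as listed in the benchmark output.
variants :: [String]
variants =
  [ "agggtaaa|tttaccct"
  , "[cgt]gggtaaa|tttaccc[acg]"
  , "a[act]ggtaaa|tttacc[agt]t"
  , "ag[act]gtaaa|tttac[agt]ct"
  , "agg[act]taaa|ttta[agt]cct"
  , "aggg[acg]aaa|ttt[cgt]ccct"
  , "agggt[cgt]aa|tt[acg]accct"
  , "agggta[cgt]a|t[acg]taccct"
  , "agggtaa[cgt]|[acg]ttaccct"
  ]

-- Simplified cleanup (an assumption, not the benchmark's own regex):
-- drop FASTA header lines and join the rest, removing the newlines.
cleanFasta :: B.ByteString -> B.ByteString
cleanFasta = B.concat . filter (not . B.isPrefixOf (B.pack ">")) . B.lines

-- Number of non-overlapping matches of one pattern; the Int result
-- type selects regex-base's match-count instance.
countMatches :: B.ByteString -> String -> Int
countMatches dna pat = dna =~ pat

main :: IO ()
main = do
  dna <- fmap cleanFasta B.getContents
  -- parMap sparks one count per pattern; rseq suffices because an Int
  -- in weak head normal form is fully evaluated.
  let counts = parMap rseq (countMatches dna) variants
  mapM_ (\(p, n) -> putStrLn (p ++ " " ++ show n)) (zip variants counts)
```

Built with `-threaded` and run with `+RTS -N4`, the nine counts can be evaluated on separate cores. The full benchmark program also performs the IUB substitutions, which this sketch omits.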

To use the haskell-regex-pcre library, either find and install your native
package, or just build it from hackage:

$ wget http://hackage.haskell.org/packages/archive/regex-pcre-builtin/0.94.2.0.7.7/regex-pcre-builtin-0.94.2.0.7.7.tar.gz
$ tar xzf regex-pcre-builtin-0.94.2.0.7.7.tar.gz
$ cd regex-pcre-builtin-0.94.2.0.7.7
$ runhaskell Setup.hs configure --prefix=$HOME
$ runhaskell Setup.hs build
$ sudo runhaskell Setup.hs install

The program should be compiled with:

ghc B.hs --make -O2 -fglasgow-exts -package regex-posix -optc-O3 -threaded

And run with:

+RTS -N4 -qw -RTS

Observed parallelism:

$ time ./B +RTS -N4 -qw < /tmp/data
agggtaaa|tttaccct 356
[cgt]gggtaaa|tttaccc[acg] 1250
a[act]ggtaaa|tttacc[agt]t 4252
ag[act]gtaaa|tttac[agt]ct 2894
agg[act]taaa|ttta[agt]cct 5435
aggg[acg]aaa|ttt[cgt]ccct 1537
agggt[cgt]aa|tt[acg]accct 1431
agggta[cgt]a|t[acg]taccct 1608
agggtaa[cgt]|[acg]ttaccct 2178

50833411
50000000
66800214
./B +RTS -N4 -qw < /tmp/data 27.45s user 0.19s system 284% cpu 9.711 total
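The cpu figure in the `time` line above is total CPU time divided by elapsed time (assuming, as with zsh's `time` output, that it is computed as (user + system) / total). Working it out from the reported numbers shows how many of the four requested cores the run kept busy on average:

```latex
\[
\text{cpu} = \frac{t_{\text{user}} + t_{\text{sys}}}{t_{\text{total}}}
           = \frac{27.45 + 0.19}{9.711}
           = \frac{27.64}{9.711}
           \approx 2.846 \quad (\approx 284\%)
\]
```

So a little under three of the four capabilities given by `-N4` were busy on average during the run.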


Note that this is significantly faster than the Text.Regex.Posix version, even though the code is otherwise identical: only an import changes.
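Because Text.Regex.PCRE and Text.Regex.Posix both provide `(=~)` on top of the shared regex-base interface, the same counting expression compiles against either backend. A small sketch, assuming both regex-pcre (or regex-pcre-builtin) and regex-posix are installed; the sample string is invented for illustration, and qualified imports let both backends run side by side:

```haskell
-- Sketch: identical counting code against two regex-base backends.
-- Assumes regex-pcre (or regex-pcre-builtin) and regex-posix are installed.
import qualified Text.Regex.PCRE as PCRE
import qualified Text.Regex.Posix as Posix

-- One of the benchmark's variant patterns.
variantPattern :: String
variantPattern = "agg[act]taaa|ttta[agt]cct"

-- Invented sample containing three non-overlapping matches.
sample :: String
sample = "aggataaacctttagcctggaggctaaa"

-- Same expression, different backend: only the qualifier changes.
countPCRE :: String -> String -> Int
countPCRE s p = s PCRE.=~ p

countPosix :: String -> String -> Int
countPosix s p = s Posix.=~ p

main :: IO ()
main = do
  print (countPCRE sample variantPattern)
  print (countPosix sample variantPattern)
```

Both calls report 3 for this sample; on real input the two entries differ in speed, not in the counts they produce.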

Followups:

Message
Date: 2008-09-23 00:54
Sender: Isaac Gouy

> either find and install your native package

libghc6-regex-base-dev

"A library containing the interface API for the Haskell regular expressions library packages regex-posix, regex-pcre, regex-parsec, regex-tdfs, regex-dfa."

Except it doesn't seem to know about pcre?
Date: 2008-09-23 00:58
Sender: Don Stewart

> libghc6-regex-base-dev

That is the core interface. It uses one of the backend engines,

* regex-posix
* regex-pcre
* regex-tdfa

and some others.
I imagine you already have the posix library installed,

libghc6-regex-posix-dev

and now you'll need:

libghc6-regex-pcre-dev

if that's available on Ubuntu; otherwise, it can be built from hackage as above.
Date: 2008-09-23 01:10
Sender: Isaac Gouy

libghc6-regex-posix-dev

doesn't seem to exist
Date: 2008-09-23 01:58
Sender: Isaac Gouy

I'm making a lot of typos today!

libghc6-regex-pcre-dev

doesn't seem to exist - can you find out why?

(I have this vain hope of getting maintainers to fix problems rather than doing my own workarounds.)
Date: 2008-09-23 05:23
Sender: Don Stewart

I will chase up the debian and ubuntu (if any) developers.
In the coming year, we'll be pushing distros to adopt the standard Haskell Platform, so this won't occur again:

http://www.cse.unsw.edu.au/~dons/papers/CPJS08.html

Till then, the hackage->debian->ubuntu pipeline will continue to have holes, sadly.
Date: 2008-09-30 01:09
Sender: Isaac Gouy

Ho hum, installed from hackage.

Attached Files:

Size | Name | Date | By | Download
1 KiB | B.hs | 2008-09-22 19:38 | dons-guest | B.hs

Changes:

Field | Old Value | Date | By
Resolution | None | 2008-09-30 01:09 | igouy-guest
status_id | Open | 2008-09-30 01:09 | igouy-guest
close_date | 2008-09-30 01:09 | 2008-09-30 01:09 | igouy-guest
File Added | 2844: B.hs | 2008-09-22 19:38 | dons-guest