[#312099] Erlang, regex-dna, Hynek Vychodil, 2009-11-21

Date: 2009-11-21 20:29
Priority: 3
State: Closed
Submitted by: Hynek Vychodil (pichi-guest)
Assigned to: Nobody (None)
Resolution: Accepted
Group: regex-dna
Category: Erlang
Summary: Erlang, regex-dna, Hynek Vychodil, 2009-11-21

Detailed description
I wrote this version, inspired by the regex-dna Erlang HiPE #5 program.
Compared with that program, it is about 10% faster on my dual-core laptop.
Because I mainly improved the serial part, I expect the gain to be more significant on a quad-core machine.

Main differences:
1/ Very fast line input through a port instead of stdio (~5x faster)
2/ Faster explicit expansion of IUB code alternatives, using binaries instead of lists (~5x faster)
3/ Regexps precompiled during the data loading phase
4/ Simpler dispatch and result-join code
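Points 2/ and 3/ can be sketched roughly as follows. This is a minimal illustration, not the attached regexdna.erl: the module name `regexdna_sketch`, the function names and the two-entry IUB table used in the test are assumptions, and the attached program may perform the expansion differently. The sketch only relies on `re:compile/1,2`, `re:run/3` and `re:replace/4` from the standard `re` module, and keeps the sequence as a binary throughout.

```erlang
-module(regexdna_sketch).
-export([compile_variants/1, count_matches/2, compile_iub/1, expand_iub/2]).

%% Compile every variant pattern once, up front, so the matching phase
%% only works with precompiled patterns. Returns [{PatternString, MP}].
compile_variants(Patterns) ->
    [begin {ok, MP} = re:compile(P, [caseless]), {P, MP} end || P <- Patterns].

%% Count all non-overlapping matches of a precompiled pattern in a binary.
count_matches(Seq, MP) ->
    case re:run(Seq, MP, [global]) of
        {match, Matches} -> length(Matches);
        nomatch -> 0
    end.

%% Compile a (hypothetical) table of {IubCode, Replacement} pairs.
compile_iub(Table) ->
    [begin {ok, MP} = re:compile(Code), {MP, Repl} end || {Code, Repl} <- Table].

%% Apply the replacements one pattern at a time; input and output are binaries.
expand_iub(Seq, Compiled) ->
    lists:foldl(
      fun({MP, Repl}, Acc) ->
              re:replace(Acc, MP, Repl, [global, {return, binary}])
      end, Seq, Compiled).
```

Compiling in the loading phase means each pattern is parsed only once, however many worker processes later reuse it.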

Followups:

Message
Date: 2009-11-21 20:34
Sender: Hynek Vychodil

I forgot almost the most important thing: how to compile and run it (NOTE the -noinput parameter!)

MAKE:
erlc regexdna.erl

COMMAND LINE:
erl -smp enable -noshell -noinput -run regexdna main < input.txt

Date: 2009-11-22 20:36
Sender: Isaac Gouy

NOTE how the other erlang regex-dna programs ARE run

erl -smp enable -noshell -run regexdna main 0 < regexdna-input50000.txt


{"init terminating in do_boot",{undef,[{regexdna,main,[["0"]]},{init,start_it,1},{init,start_em,1}]}}




Crash dump was written to: erl_crash.dump

init terminating in do_boot ()

Date: 2009-11-22 23:42
Sender: Hynek Vychodil

Yes, I know, but my regex-dna has to be run in a different way.
Is there any reason why all regex-dna programs have to be run in exactly the same way?
The -noinput parameter is vital to the way my I/O works. This way my input routine takes 1.3s, while standard io takes 9.5s on my laptop. I can change the program to accept 0 as a parameter, but -noinput is significant: it prevents Erlang from grabbing stdin before my program does.
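The port-based input described here can be sketched roughly as follows. This is a hedged illustration, not the attached regexdna.erl: the module name `stdin_port_sketch`, the function names and the 4096-byte line limit are assumptions. It opens file descriptor 0 as a port in line mode with `open_port/2`, which is why the runtime must be started with -noinput so that it does not claim stdin first.

```erlang
-module(stdin_port_sketch).
-export([read_stdin/0, collect/3]).

%% Open stdin (fd 0) as a port delivering binaries one line at a time.
%% The eof option makes the port send {Port, eof} instead of exiting.
read_stdin() ->
    Port = open_port({fd, 0, 1}, [in, binary, eof, {line, 4096}]),
    Lines = collect(Port, [], []),
    port_close(Port),
    Lines.

%% Gather line messages until eof. Lines longer than the limit arrive
%% as one or more noeol chunks followed by an eol chunk, so chunks are
%% buffered in Partial and joined when the line ends.
collect(Port, Partial, Lines) ->
    receive
        {Port, {data, {noeol, Chunk}}} ->
            collect(Port, [Chunk | Partial], Lines);
        {Port, {data, {eol, Chunk}}} ->
            Line = iolist_to_binary(lists:reverse([Chunk | Partial])),
            collect(Port, [], [Line | Lines]);
        {Port, eof} ->
            Last = case Partial of
                       [] -> [];
                       _ -> [iolist_to_binary(lists:reverse(Partial))]
                   end,
            lists:reverse(Lines, Last)
    end.
```

The lines arrive without their trailing linefeed, so a caller only has to drop the FASTA description lines before joining the rest.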

Date: 2009-11-23 00:45
Sender: Isaac Gouy

> I can change it to accept 0 as parameter

If you don't your program will hang until it times out after one hour.

Date: 2009-11-23 09:50
Sender: Hynek Vychodil

>> I can change it to accept 0 as parameter
>
>If you don't your program will hang until it times out after one hour.

1/ Without the -noinput parameter it hangs forever, because Erlang's stdio grabs stdin before my program can.
2/ Not accepting 0 causes this error:
{"init terminating in do_boot",{undef,[{regexdna,main,[["0"]]},{init,start_it,1},{init,start_em,1}]}}

I can solve 2/, but I can't solve 1/, because it is a vital part of the I/O improvement. The easiest way to solve both is to run it with this command line:
erl -smp enable -noshell -noinput -run regexdna main < regexdna-input50000.txt

The benchmark task description places no constraint on how a program should be run:

>Each program should
>
> * read all of a redirected FASTA format file from stdin, and record the
> sequence length
> * use the same simple regex pattern match-replace to remove FASTA
> sequence descriptions and all linefeed characters, and record the
> sequence length
> * use the same simple regex patterns, representing DNA 8-mers and their
> reverse complement (with a wildcard in one position), and (one pattern at
> a time) count matches in the redirected file
> * write the regex pattern and count
> * use the same simple regex patterns to make IUB code alternatives
> explicit, and (one pattern at a time) match-replace the pattern in the
> redirect file, and record the sequence length
> * write the 3 recorded sequence lengths

There is not a word about 0 as a parameter, or about any other command-line parameters, only about reading data from stdin, which my program does. I can only repeat my question: is there any reason why all regex-dna programs have to be run with exactly the same command line?

Date: 2009-11-23 16:02
Sender: Isaac Gouy

Quarreling with me about whether your program must accept a 0 command line argument serves no purpose.

Unless it accepts a 0 command line argument your program will fail.

Date: 2009-11-23 17:26
Sender: Hynek Vychodil

Is there a way to change the attachment in this submission, or do I have to submit a new one? When I add regexdna.erl in the Attachments tab, it ends up with:

Open transaction detected!!!

The fix is very simple: just add main/1 to the exports and define

main(_) -> main().

Should I add the file under a different name, or something like that?
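For context, `erl -run Mod Fun Arg1 ...` calls `Mod:Fun/1` with the arguments as a list of strings, which is why `-run regexdna main 0` looks for `regexdna:main(["0"])` and reports `undef` when only main/0 is exported. A minimal sketch of the fix described above, using a hypothetical module name `regexdna_entry` and a placeholder body for main/0:

```erlang
-module(regexdna_entry).
-export([main/0, main/1]).

%% Placeholder for the real entry point; the actual program would read
%% stdin and print the results here.
main() ->
    io:put_chars("regexdna would run here\n"),
    ok.

%% Accept and ignore whatever arguments -run passes, for example ["0"].
main(_Args) ->
    main().
```

With both arities exported, the module can be started either as `-run regexdna_entry main` or as `-run regexdna_entry main 0`.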

Attached Files:

Size   Name          Date              By
4 KiB  regexdna.erl  2009-11-21 20:29  pichi-guest

Changes:

Field       Old Value           Date              By
status_id   Open                2009-11-23 22:58  igouy-guest
close_date  2009-11-23 22:58    2009-11-23 22:58  igouy-guest
Resolution  None                2009-11-23 22:58  igouy-guest
File Added  3517: regexdna.erl  2009-11-21 20:29  pichi-guest