Regex for PHP: search for words and return the data after them
I'm trying to make a regex work that I've been asked for, but I'm having no luck making it efficient enough.
The objective is to make the following as efficient as it can be.
Objective number 1: separate the text using sentence endings (dot, three dots, exclamation point, ...).
Objective number 2: get the numbers that appear after the string 'em'.
Here's an example of a possible small string and the regex for it (the real one can be huge).
Old regex:
(?:[^.!?:]|...)(?:(?:[^.!?:]|...)*?em (\d+))*
New regex:
(?:[.!?]|[.][.][.])(?:(?:[^.!?]|[.][.][.])*?\bem\b (\d+))*
It works for this string, which I made up (I insert a . at the beginning):
.foi visto que batalha em 1939 foi. claro que data que digo ser em 1939 é uma farsa. em 1938 já (insert em 1910) não havia reis.
What I wanted was to make the regex not backtrack, or not need to backtrack. I think that would save processing time, something like reducing 30 seconds to 20 s or 10 s! This one takes 1 s to complete.
Addition:
Thanks for the answers. I have one that does not fail, but it still backtracks too much. Any solutions?
Addition (in reply to an answer on a deleted question):
Unfortunately I have no sample data. The person who asked me says they do not have sample data either, and it still needs to be done "by yesterday". If you give me something that works on this text as efficiently as it can, I can work with it and convert it, if needed, to the specific case. Otherwise I'll ask here again.
Although the question is confusing, it sounds like you have two different tasks, which are best accomplished with two different regexes. Here is a tested script that does what (I'm guessing) you want:
<?php // test.php 20110430_1100

// Test data.
$text = 'foi visto que batalha em 1939 foi. claro'.
        ' que data que digo ser em 1939 é uma farsa. e'.
        'm 1938 já (insert em 1910) não havia reis.';

// Part 1: find numbers after "em".
$re1 = '/\bem\b\s*(\d+)\b/i';
$count = preg_match_all($re1, $text, $matches);
if ($count) $numbers = $matches[1]; // Array of number strings.
else        $numbers = array();     // Else no numbers found.

// Part 2: split text into sentences.
$re2 = '/(?<=[.!?])\s+/';
$sentences = preg_split($re2, $text, -1, PREG_SPLIT_NO_EMPTY);

// Print out results.
$ncnt = count($numbers); // Count of numbers found.
printf("There were %d numbers following \"em\".\n", $ncnt);
for ($i = 0; $i < $ncnt; ++$i) {
    printf(" number[%d] = %s\n", $i + 1, $numbers[$i]);
}
$scnt = count($sentences); // Count of sentences found.
printf("\nThere were %d sentences found.\n", $scnt);
for ($i = 0; $i < $scnt; ++$i) {
    printf(" sentence[%d] = \"%s\"\n", $i + 1, $sentences[$i]);
}
?>

Here is the output from the script:

There were 4 numbers following "em".
 number[1] = 1939
 number[2] = 1939
 number[3] = 1938
 number[4] = 1910

There were 3 sentences found.
 sentence[1] = "foi visto que batalha em 1939 foi."
 sentence[2] = "claro que data que digo ser em 1939 é uma farsa."
 sentence[3] = "em 1938 já (insert em 1910) não havia reis."
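If the numbers are needed per sentence rather than as one flat list, the two steps can be chained: split first, then search each sentence. The following is only a minimal sketch under that assumption. The function name `numbersBySentence` is hypothetical, and the possessive quantifier `\d++` is a swapped-in variation on the pattern above. It tells PCRE not to give digits back once they are consumed, which may reduce backtracking on large inputs, though how much it helps on real data is untested.

```php
<?php
// Hypothetical helper (not part of the original script): returns, for each
// sentence, the numbers that follow the word "em" in that sentence.
function numbersBySentence($text)
{
    // Split on whitespace that follows ., ! or ?. An ellipsis also counts
    // as a sentence ending because its last dot satisfies the lookbehind.
    $sentences = preg_split('/(?<=[.!?])\s+/', $text, -1, PREG_SPLIT_NO_EMPTY);

    $result = array();
    foreach ($sentences as $sentence) {
        // \d++ is possessive: if the trailing \b fails, the engine does
        // not retry with fewer digits.
        preg_match_all('/\bem\s+(\d++)\b/i', $sentence, $m);
        $result[] = array('sentence' => $sentence, 'numbers' => $m[1]);
    }
    return $result;
}

$text = 'foi visto que batalha em 1939 foi. claro'.
        ' que data que digo ser em 1939 é uma farsa. e'.
        'm 1938 já (insert em 1910) não havia reis.';

foreach (numbersBySentence($text) as $i => $row) {
    printf("sentence[%d]: %s\n", $i + 1, implode(', ', $row['numbers']));
}
```

On the test string this groups 1939 with the first sentence, 1939 with the second, and 1938 and 1910 with the third. Both patterns start at a literal (`em`, or a punctuation character in the lookbehind), so the engine has very little to backtrack over. That is the main reason two simple regexes are likely to be cheaper than one regex that tries to do both jobs.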