Separate Multiline blocks

thepi · June 24, 2010

Hello everybody,

(Sorry for my English, I'm practicing!)

I am trying to figure out how to split a whole text file that follows this pattern:

Question #10/10
Question title: (1 answer)
(*) - first
( ) - Second


Question #09/10
Question title: (2 answer)
[*] - first
[ ] - Second
[*] - 3
[ ] - 4

So the structure is:

1st line: Question #X/Y

2nd line: the question, always on one line

Lines 3 to N: the possible answers, each ending with \r\n

Then two empty lines (\r\n each)

Then the 1st line of the next block, and so on...

So I read the content of the file:

$file = file_get_contents("file.txt");
$reg='/(?!(\r\n){3}).*(\r\n){3}/esiU';
preg_match_all($reg, utf8_decode($file ),$reg_array);

And I get $reg_array containing all the question/answer blocks...

Where I'm stuck right now (I know I could have done it in one single regexp...) is separating out the question number and the variable number of answer lines before the 2 empty lines...

I don't necessarily want the solution, but at least some good hints so I can keep trying...

I'm not able to select only the first line (I didn't find this part on the internet...), nor the "second", nor the "rest without the empty lines".

Thanks in advance for your patience and your help!

Thepi

ZachMEdwards · June 24, 2010

$pattern = '/(.+(?:\n){0,1})/';

thepi · June 24, 2010

OK, thanks!! I edited it a bit:

$pattern = '/(.+(?!\n))/';

It matches the line without the \n

I then matched until I found '(' or '[' after the 2nd line, and I have my 'answer list'!

Edit: OK, topic solved, thanks.

salathe · June 24, 2010

$pattern = '/(.+(?:\n){0,1})/'; // Zach's
$pattern = '/(.+(?!\n))/'; // thepi's

Both of those are trying to be fancy where they needn't be so. The pattern /.+/ will do what it appears you want (match single lines of text, without newlines). The outer parentheses are unnecessary since the whole matched string is available with $match[0] (or $0 in a replacement). The check for the existence (or not) of a newline is also redundant since the dot metacharacter will never match one; its greediness will consume the whole line of text stopping at a newline (or the end of the subject string).
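As a minimal sketch of the point above, the snippet below runs /.+/ over a small sample. The sample string and variable names are made up for illustration. One caveat, assuming the file really uses \r\n line endings as the question describes and PHP's default PCRE newline setting: the dot only excludes \n, so each match would keep a trailing \r. A character class that excludes both characters is one way around that.

```php
<?php
// Hypothetical sample text shaped like one block from the question.
$lf   = "Question #10/10\nQuestion title: (1 answer)\n(*) - first\n( ) - Second";
$crlf = str_replace("\n", "\r\n", $lf);

// The dot stops at "\n", so each match is one line without its newline.
preg_match_all('/.+/', $lf, $m);
$lfLines = $m[0];

// With "\r\n" endings the dot still matches "\r", so it stays on the match.
preg_match_all('/.+/', $crlf, $m);
$rawCrlfLines = $m[0];

// Excluding both characters explicitly handles either line-ending style.
preg_match_all('/[^\r\n]+/', $crlf, $m);
$cleanCrlfLines = $m[0];

var_dump($lfLines === $cleanCrlfLines);          // same lines either way
var_dump(substr($rawCrlfLines[0], -1) === "\r"); // trailing "\r" kept by /.+/
```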

thepi · June 24, 2010

Hey salathe, thanks for the explanation, it will be useful.

Basically I tried with "." ("Any character except newline" on the cheat sheet), but I wanted something more complex... Once I formulated my thoughts here, I saw I could do it line by line! /.+/ would have worked fine for going line by line!

cags · June 25, 2010

If you wish to simply go through the information line by line, wouldn't you be better off simply explode()ing it? Or, depending on where you are getting it from, simply reading it a line at a time instead?
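A rough sketch of that explode-based approach might look like the following. The function name parseQuestions, the array keys, and the sample string are all hypothetical. It assumes the layout described in the question: a "Question #X/Y" line, a one-line title, then answer lines, with blocks separated by empty lines.

```php
<?php
// Hypothetical helper; the name and the array keys are illustrative only.
function parseQuestions(string $text): array
{
    // Normalise line endings so the rest can work with "\n" only.
    $text = str_replace(["\r\n", "\r"], "\n", $text);

    $questions = [];
    // Blocks are assumed to be separated by at least one empty line.
    foreach (preg_split('/\n{2,}/', trim($text)) as $block) {
        $lines = explode("\n", $block);
        if (count($lines) < 2 || !preg_match('~^Question #(\d+)/(\d+)$~', trim($lines[0]), $m)) {
            continue; // skip anything that does not start like a question header
        }
        $questions[] = [
            'number'  => (int) $m[1],
            'total'   => (int) $m[2],
            'title'   => trim($lines[1]),
            'answers' => array_map('trim', array_slice($lines, 2)),
        ];
    }
    return $questions;
}

// Sample input assumed to mirror the file format from the first post.
$sample = "Question #10/10\r\nQuestion title: (1 answer)\r\n(*) - first\r\n( ) - Second\r\n\r\n\r\n"
        . "Question #09/10\r\nQuestion title: (2 answer)\r\n[*] - first\r\n[ ] - Second\r\n[*] - 3\r\n[ ] - 4\r\n";

$questions = parseQuestions($sample);
print_r($questions);
```

Normalising the line endings first keeps the splitting logic independent of whether the file uses \r\n or \n. Reading the file a line at a time (for example with fgets()) could follow the same idea, starting a new block whenever a "Question #" line appears.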
