
find/filter result between two strings more than once


Solved by Psycho


Hey all,

maybe someone can help me with this problem:

I am using PHP and I want to extract the red text together with the orange text (the dates and their showtimes), but only between the blue text.

I have tried a lot, but I can't find the solution.

This is what I tried:

$repattern = '/<td class="program_date">(\s+\S+.+)\s+.+(\s\S.+)<\/a>/i';

 

The final result should be an array containing the following data:

Mo, 09.06.2014
    13:45
    16:15

Di, 10.06.2014
    13:45

Mi, 11.06.2014
    13:45
    16:15

and so on.
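In PHP terms, the requested result is an associative array keyed by the date string, with each entry holding a list of showtimes. A minimal sketch of that shape, hand-written from the first three dates in the sample (not parsed from the page), plus a loop that prints it in the layout above:

```php
<?php
// Target structure: date string => list of showtime strings
$expected = [
    'Mo, 09.06.2014' => ['13:45', '16:15'],
    'Di, 10.06.2014' => ['13:45'],
    'Mi, 11.06.2014' => ['13:45', '16:15'],
];

// Print each date followed by its indented showtimes
foreach ($expected as $date => $times) {
    echo $date, "\n";
    foreach ($times as $time) {
        echo "    ", $time, "\n";
    }
}
```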

Thanks in advance.

Here is part of the page source:

                                <td class="program_time">
                                	<span class="program_head">Vorstellungen:</span><br />
                                    
                                    <table>
<tr>
<td class="program_date">
Mo, 09.06.2014
</td>
<td><a href='http://fmzimst.kinokartenreservierung.at/reservierung.php?id=74380'  target="_top">13:45</a> </td>
<td><a href='http://fmzimst.kinokartenreservierung.at/reservierung.php?id=74381'  target="_top">16:15</a> </td>
</tr>
<tr>
<td class="program_date">
Di, 10.06.2014
</td>
<td><a href='http://fmzimst.kinokartenreservierung.at/reservierung.php?id=74384'  target="_top">13:45</a> </td>
</tr>
<tr>
<td class="program_date">
Mi, 11.06.2014
</td>
<td><a href='http://fmzimst.kinokartenreservierung.at/reservierung.php?id=74388'  target="_top">13:45</a> </td>
<td><a href='http://fmzimst.kinokartenreservierung.at/reservierung.php?id=74389'  target="_top">16:15</a> </td>
</tr>
<tr>
<td class="program_date">
Do, 12.06.2014
</td>
<td><a href='http://fmzimst.kinokartenreservierung.at/reservierung.php?id=74392'  target="_top">13:45</a> </td>
<td><a href='http://fmzimst.kinokartenreservierung.at/reservierung.php?id=74393'  target="_top">16:15</a> </td>
</tr>
</table>

                                    
 
                                </td>
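For comparison, the regex idea from the question can be made to work in two passes: first capture each date together with the rest of its row, then collect every link text inside that row. This is only a sketch: it assumes the attribute is written exactly as class="program_date" with double quotes and that each row ends with </tr>, which is exactly the kind of fragility the DOM-based solution in this thread avoids. The $html sample is a trimmed copy of the markup above.

```php
<?php
// Hypothetical sample: a trimmed copy of the table markup from the question
$html = <<<'HTML'
<tr>
<td class="program_date">
Mo, 09.06.2014
</td>
<td><a href='http://fmzimst.kinokartenreservierung.at/reservierung.php?id=74380'  target="_top">13:45</a> </td>
<td><a href='http://fmzimst.kinokartenreservierung.at/reservierung.php?id=74381'  target="_top">16:15</a> </td>
</tr>
<tr>
<td class="program_date">
Di, 10.06.2014
</td>
<td><a href='http://fmzimst.kinokartenreservierung.at/reservierung.php?id=74384'  target="_top">13:45</a> </td>
</tr>
HTML;

$result = [];
// Pass 1: capture the trimmed date and everything after it up to </tr>
// (the "s" modifier lets "." match newlines)
preg_match_all(
    '~<td class="program_date">\s*(.+?)\s*</td>(.*?)</tr>~s',
    $html,
    $rows,
    PREG_SET_ORDER
);
foreach ($rows as $row) {
    // Pass 2: pull every link text (the showtimes) out of that row
    preg_match_all('~<a\b[^>]*>\s*([^<]+?)\s*</a>~', $row[2], $times);
    $result[$row[1]] = $times[1];
}
print_r($result);
```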

Solution

spaxi,

 

Using the DOM, as cyberRobot suggests, is definitely the better approach, because you are far less tied to hard-coded logic. For example, you can find the text just by looking for rows that contain a TD with the class "program_date". It doesn't matter whether the attribute value is enclosed in single or double quotes, or whether the TD tag has additional attributes you were not expecting.
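The same idea of locating cells by their "program_date" class can also be expressed with DOMXPath. This is a different technique from the row loop in the solution code, shown here only as a sketch with made-up sample markup. The class test uses the common contains(concat(...)) idiom so it still matches when the TD carries other classes too, and the quote style does not matter because the parser normalizes attributes.

```php
<?php
// Hypothetical sample markup: single quotes, an extra class and an extra attribute
$html = "<table><tr><td class='program_date extra' data-x='1'>\nMo, 09.06.2014\n</td>"
      . "<td><a href='#'>13:45</a></td><td><a href='#'>16:15</a></td></tr></table>";

$dom = new DOMDocument();
libxml_use_internal_errors(true); // collect parser warnings instead of printing them
$dom->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($dom);
// Match TDs whose class list contains the token "program_date"
$dateCells = $xpath->query(
    "//td[contains(concat(' ', normalize-space(@class), ' '), ' program_date ')]"
);

$programs = [];
foreach ($dateCells as $cell) {
    $date = trim($cell->textContent);
    // Showtimes are the link texts in the following cells of the same row
    foreach ($xpath->query('following-sibling::td/a', $cell) as $link) {
        $programs[$date][] = trim($link->textContent);
    }
}
print_r($programs);
```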

It takes more code, but it is far more robust. In the code below I am using loadHTML() because I tested with a string. If you are reading a web page, you can pass its URL to loadHTMLFile() instead.
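A minimal sketch of the loadHTMLFile() variant. To keep it self-contained it writes a tiny sample page to a temporary file and loads that; with a real site you would pass the URL instead, which assumes allow_url_fopen is enabled on your server. The file name prefix and sample markup are made up for illustration.

```php
<?php
// Hypothetical: save a small sample page to a temporary file first.
// With a real site you would pass its URL instead (requires allow_url_fopen).
$file = tempnam(sys_get_temp_dir(), 'prog');
file_put_contents($file, '<table><tr><td class="program_date">Di, 10.06.2014</td>'
    . '<td><a href="#">13:45</a></td></tr></table>');

$dom = new DOMDocument();
libxml_use_internal_errors(true); // collect parser warnings instead of printing them
$loaded = $dom->loadHTMLFile($file);
libxml_clear_errors();
unlink($file);

// From here on the DOM is used exactly as after loadHTML()
$cells = $dom->getElementsByTagName('td');
echo $loaded ? trim($cells->item(0)->nodeValue) : 'load failed', "\n";
```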

function getPrograms($content)
{
    //Create variable to hold results
    $programResults = array();
    //Create DOM object
    $dom = new DOMDocument();
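    //Load content into DOM object, collecting parser warnings about imperfect HTML instead of printing them
    libxml_use_internal_errors(true);
    $dom->loadHTML($content);
    libxml_clear_errors();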
    //Get ALL TR objects from DOM
    $rows = $dom->getElementsByTagName('tr');
    //Iterate through the rows
    foreach ($rows as $row)
    {
        //Get all TDs in current row
        $cells = $row->getElementsByTagName('td');
        //Set flag for program date
        $programDate = false;
        //Iterate through all cells in current row
        foreach($cells as $cell)
        {
            //If programDate false, this is first cell.
            if(!$programDate)
            {
                //Test if this is program row, if not skip to next row
                if($cell->getAttribute('class') != 'program_date')
                {
                      break;
                }
                //Set programDate for this row
                $programDate = trim($cell->nodeValue);
                //Skip to next cell
                continue;

            }
            //Add this showtime to the list for the current programDate
            $programResults[$programDate][] = trim($cell->nodeValue);
        }
    }
    //Return results
    return $programResults;
}

//$html holds the page source (e.g. the sample above)
$programs = getPrograms($html);
echo "<pre>" . print_r($programs, true) . "</pre>";

Output (using the sample text you provided)

Array
(
    [Mo, 09.06.2014] => Array
        (
            [0] => 13:45
            [1] => 16:15
        )
 
    [Di, 10.06.2014] => Array
        (
            [0] => 13:45
        )
 
    [Mi, 11.06.2014] => Array
        (
            [0] => 13:45
            [1] => 16:15
        )
 
    [Do, 12.06.2014] => Array
        (
            [0] => 13:45
            [1] => 16:15
        )
)