Any ideas on how to extract text from HTML like this into multiple variables?

<li class="even" style="padding-top:1.25em"><label for="job_title">Title</label><div>Expatriate Tax Director</div></li>
<li class="odd"><label for="job_contract_type">Contract type</label><div>Permanent</div></li>
<li class="even"><label for="job_market_sector">Market sector</label><div><a href="/en/candidate/market_sectors/1.html">Accountancy / Auditing / Tax</a></div></li>
<li class="odd"><label for="job_country">Country</label><div><a href="/en/candidate/countries/GB.html">United Kingdom <img src='/images/flags/GB.png'/></a></div></li>
<li class="even"><label for="job_location">Location</label><div>Reading</div></li>
<li class="odd"><label for="job_min_salary">Salary</label><div>120.00 - 120.00 United Kingdom Pounds/Month

<li class="even"><label for="job_description">Description</label><div>Expat Tax Director 
<br/>Circa 120k - UK - Reading
<br/>
<br/>Our client is seeking experienced tax professional to undertake this central role.  This position has evolved due to the firm expanding. The firm now has an immediate need for a very experienced person to direct the team.  The ideal individual will have experience in advising International organisations with international tax, cost & risk assessment as well as reorganisations of processes and systems for the management of cross border employees.  Business Development skills and good Client Management are essential. 
<br/>Attractions:
<br/>- Combination of HNWI, Corporate Expat Programmes and International Partnerships
<br/>- Managing a skilled and established team
<br/>- Working within a successful and profitable office
<br/>- Central London location
<br/></div></li>
<li class="odd"><label for="job_expires_on">Expires on</label><div>February 28, 2013</div></li>
<li class="even"><label for="job_ideal_candidate">Ideal candidate</label><div>Expatriate Tax</div></li>
</ul></div></div><div class="warning">An EU passport or work permit is required for this position. Applicants without one of these will be rejected automatically.</div>


Some people use preg_match() for this kind of task.  The exact expression to use depends on which text you need and what format you need it in.
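
As a rough illustration of the preg_match() approach, here is a minimal sketch that pulls a single field into one variable. The sample line and the variable names ($input, $location) are assumptions for the example, and the pattern only works while the markup keeps this exact shape.

```php
<?php
// Hypothetical input: one row copied from the HTML in the question.
$input = '<li class="even"><label for="job_location">Location</label><div>Reading</div></li>';

// Find the label whose "for" attribute is job_location, then capture
// the plain text at the start of the <div> that follows it.
$pattern = '#<label for="job_location">[^<]*</label><div>([^<]*)#';

$location = null;
if (preg_match($pattern, $input, $match)) {
    // $match[1] holds the first capture group.
    $location = trim($match[1]);
}

echo $location, "\n";
```

One pattern per field like this gets repetitive quickly, so it is probably only worth it when you need one or two values.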

 

You can also use a parser like http://simplehtmldom.sourceforge.net/, which parses the HTML into a data structure you can query.
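
To show the parser idea without guessing at the Simple HTML DOM API, here is a sketch that swaps in PHP's built-in DOMDocument and DOMXPath classes instead (this assumes the dom extension is enabled, and it is not the library linked above). The three sample rows and the surrounding ul are assumptions for the example.

```php
<?php
// Hypothetical sample: three rows from the question, wrapped in a <ul>.
$input = <<<'HTML'
<ul>
<li class="even"><label for="job_title">Title</label><div>Expatriate Tax Director</div></li>
<li class="even"><label for="job_market_sector">Market sector</label><div><a href="/en/candidate/market_sectors/1.html">Accountancy / Auditing / Tax</a></div></li>
<li class="odd"><label for="job_country">Country</label><div><a href="/en/candidate/countries/GB.html">United Kingdom <img src='/images/flags/GB.png'/></a></div></li>
</ul>
HTML;

// The markup is only a fragment, so collect parser warnings instead of printing them.
$previous = libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($input);
libxml_clear_errors();
libxml_use_internal_errors($previous);

// Select every <li> that has both a <label> child and a <div> child.
$xpath = new DOMXPath($doc);
$data = array();
foreach ($xpath->query('//li[label and div]') as $li) {
    $label = $xpath->query('label', $li)->item(0);
    $value = $xpath->query('div', $li)->item(0);
    // textContent returns the text of all descendants, so link text is included.
    $data[trim($label->textContent)] = trim($value->textContent);
}

print_r($data);
```

Because textContent ignores the tags themselves, values wrapped in links (such as "Market sector" and "Country") come through as plain text.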

Yes, but you need to be more specific about what, EXACTLY, you need to extract. You will have to create rules to parse the data. For example, one possibility would be to extract each LI element into a name/value pair, where the name is the text within the label tag and the value is the text within the div that immediately follows it. But if the content ever changed, or if it could arrive in a different format, the rules would no longer work.

Here is a quick example:

//Capture group 1 is the label text; group 2 is the plain text
//at the start of the div that follows (up to the first tag)
$pattern = "#<li[^>]*><label[^>]*>([^<]*)</label><div>([^<]*)#";
preg_match_all($pattern, $input, $matches, PREG_SET_ORDER);

//Clean up matches and put into
//data array with named keys
$data = array();
foreach($matches as $match)
{
    $data[trim($match[1])] = trim($match[2]);
}

//Output results
print_r($data);

Using your example as the input, this would be the result:

Array
(
    [Title] => Expatriate Tax Director
    [Contract type] => Permanent
    [Market sector] => 
    [Country] => 
    [Location] => Reading
    [Salary] => 120.00 - 120.00 United Kingdom Pounds/Month
    [Description] => Expat Tax Director
    [Expires on] => February 28, 2013
    [Ideal candidate] => Expatriate Tax
)

Note that there is no value for "Market sector" or "Country". That is because those values are enclosed in anchor tags. To get those values the pattern would need to be further modified to support values that are enclosed in anchor tags.
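
One possible way to do that, as a sketch rather than a complete solution, is to let the pattern skip an optional opening anchor tag before the captured value. The four sample rows below are an assumption for the example. Note that it still only captures text up to the next tag, so multi-line values such as the description stay truncated.

```php
<?php
// Hypothetical sample: four rows copied from the HTML in the question.
$input = <<<'HTML'
<li class="even" style="padding-top:1.25em"><label for="job_title">Title</label><div>Expatriate Tax Director</div></li>
<li class="even"><label for="job_market_sector">Market sector</label><div><a href="/en/candidate/market_sectors/1.html">Accountancy / Auditing / Tax</a></div></li>
<li class="odd"><label for="job_country">Country</label><div><a href="/en/candidate/countries/GB.html">United Kingdom <img src='/images/flags/GB.png'/></a></div></li>
<li class="even"><label for="job_location">Location</label><div>Reading</div></li>
HTML;

// Same idea as the earlier pattern, plus (?:<a[^>]*>)? which
// skips an opening anchor tag when one directly follows the <div>.
$pattern = '#<li[^>]*><label[^>]*>([^<]*)</label><div>(?:<a[^>]*>)?([^<]*)#';
preg_match_all($pattern, $input, $matches, PREG_SET_ORDER);

// Build the name/value array exactly as before.
$data = array();
foreach ($matches as $match) {
    $data[trim($match[1])] = trim($match[2]);
}

print_r($data);
```

With these sample rows, "Market sector" comes out as "Accountancy / Auditing / Tax" and "Country" as "United Kingdom", while the values without links are unchanged.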
