Isolate text into different variables.


nyxem90

Any ideas on how to extract text from html like this into multiple variables?

 

 

<li class="even" style="padding-top:1.25em"><label for="job_title">Title</label><div>Expatriate Tax Director</div></li>
<li class="odd"><label for="job_contract_type">Contract type</label><div>Permanent</div></li>
<li class="even"><label for="job_market_sector">Market sector</label><div><a href="/en/candidate/market_sectors/1.html">Accountancy / Auditing / Tax</a></div></li>
<li class="odd"><label for="job_country">Country</label><div><a href="/en/candidate/countries/GB.html">United Kingdom <img src='/images/flags/GB.png'/></a></div></li>
<li class="even"><label for="job_location">Location</label><div>Reading</div></li>
<li class="odd"><label for="job_min_salary">Salary</label><div>120.00 - 120.00 United Kingdom Pounds/Month</div></li>

<li class="even"><label for="job_description">Description</label><div>Expat Tax Director 
<br/>Circa 120k - UK - Reading
<br/>
<br/>Our client is seeking experienced tax professional to undertake this central role.  This position has evolved due to the firm expanding. The firm now has an immediate need for a very experienced person to direct the team.  The ideal individual will have experience in advising International organisations with international tax, cost & risk assessment as well as reorganisations of processes and systems for the management of cross border employees.  Business Development skills and good Client Management are essential. 
<br/>Attractions:
<br/>- Combination of HNWI, Corporate Expat Programmes and International Partnerships
<br/>- Managing a skilled and established team
<br/>- Working within a successful and profitable office
<br/>- Central London location
<br/></div></li>
<li class="odd"><label for="job_expires_on">Expires on</label><div>February 28, 2013</div></li>
<li class="even"><label for="job_ideal_candidate">Ideal candidate</label><div>Expatriate Tax</div></li>
</ul></div></div><div class="warning">An EU passport or work permit is required for this position. Applicants without one of these will be rejected automatically.</div>


Yes, but you need to be more specific about what, EXACTLY, you need to extract. You will have to create rules to parse the data. For example, one possibility would be to extract each LI element into a name/value pair, where the name is the text within the label tag and the value is the text within the div that immediately follows the label tag. But if the content ever changed, or if it could arrive in a different format, the rules would stop working.

 

Here is a quick example:

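//$input is assumed to hold the HTML from the first post as a string.
//Capture group 1 is the label text, group 2 is the text directly after <div>
$pattern = "#<li[^>]*><label[^>]*>([^<]*)</label><div>([^<]*)#";
preg_match_all($pattern, $input, $matches, PREG_SET_ORDER);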

//Clean up matches and put into
//data array with named keys
$data = array();
foreach($matches as $match)
{
    $data[trim($match[1])] = trim($match[2]);
}

//Output results
print_r($data);

 

Using your example as the input, this would be the result:

Array
(
    [Title] => Expatriate Tax Director
    [Contract type] => Permanent
    [Market sector] => 
    [Country] => 
    [Location] => Reading
    [Salary] => 120.00 - 120.00 United Kingdom Pounds/Month
    [Description] => Expat Tax Director
    [Expires on] => February 28, 2013
    [Ideal candidate] => Expatriate Tax
)
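
Since the question asks for separate variables, here is a minimal sketch of reading individual values out of that array. The variable names ($title, $contract, $salary) and the shortened sample array are only for illustration. Keys such as "Contract type" contain spaces, so they are not valid variable names and extract() would skip them; reading by key is the safer route.

```php
<?php
// Shortened sample of the $data array built by the loop above.
$data = array(
    'Title'         => 'Expatriate Tax Director',
    'Contract type' => 'Permanent',
    'Location'      => 'Reading',
);

// Read values by label. isset() guards against a label that was not found,
// in which case the variable falls back to an empty string.
$title    = isset($data['Title']) ? $data['Title'] : '';
$contract = isset($data['Contract type']) ? $data['Contract type'] : '';
$salary   = isset($data['Salary']) ? $data['Salary'] : '';

echo "$title ($contract)\n";
```

With the sample above, $salary ends up as an empty string because that key is not in the shortened array.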

 

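Note that there is no value for "Market sector" or "Country". That is because those values are enclosed in anchor tags, and the capture group ([^<]*) stops at the first "<" it meets. For the same reason, "Description" contains only the text before the first <br/>. To get those values, the pattern would need to be modified further to support values that are enclosed in anchor tags.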
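
One possible modification, as a rough sketch: add an optional non-capturing group, (?:<a[^>]*>)?, that skips an opening anchor tag before the value. The sample input below is three list items copied from the first post. This assumes the anchor, when present, comes directly after <div>, and it still only captures text up to the next tag.

```php
<?php
// Three list items from the original post: one plain value, two wrapped in <a> tags.
$input = <<<'HTML'
<li class="odd"><label for="job_contract_type">Contract type</label><div>Permanent</div></li>
<li class="even"><label for="job_market_sector">Market sector</label><div><a href="/en/candidate/market_sectors/1.html">Accountancy / Auditing / Tax</a></div></li>
<li class="odd"><label for="job_country">Country</label><div><a href="/en/candidate/countries/GB.html">United Kingdom <img src='/images/flags/GB.png'/></a></div></li>
HTML;

// Same pattern as before, plus (?:<a[^>]*>)? to skip an optional opening anchor tag.
$pattern = "#<li[^>]*><label[^>]*>([^<]*)</label><div>(?:<a[^>]*>)?([^<]*)#";
preg_match_all($pattern, $input, $matches, PREG_SET_ORDER);

// Clean up matches and put them into a data array with named keys.
$data = array();
foreach ($matches as $match) {
    $data[trim($match[1])] = trim($match[2]);
}

print_r($data);
```

For this sample, "Market sector" comes out as "Accountancy / Auditing / Tax" and "Country" as "United Kingdom" (the trailing space before the img tag is removed by trim()).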

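
If the markup is likely to vary, a DOM parser may hold up better than a regular expression. The sketch below swaps in PHP's DOMDocument and DOMXPath (this is a different technique from the pattern above and assumes the DOM extension is enabled). It pairs each label with the div in the same list item and uses textContent, so nested tags such as <a> and <br/> no longer cut the value short. The sample input is a shortened copy of the HTML from the first post.

```php
<?php
// Shortened sample of the HTML from the original post.
$input = <<<'HTML'
<ul>
<li class="even"><label for="job_title">Title</label><div>Expatriate Tax Director</div></li>
<li class="odd"><label for="job_country">Country</label><div><a href="/en/candidate/countries/GB.html">United Kingdom <img src='/images/flags/GB.png'/></a></div></li>
<li class="even"><label for="job_description">Description</label><div>Expat Tax Director
<br/>Circa 120k - UK - Reading
<br/></div></li>
</ul>
HTML;

$dom = new DOMDocument();
libxml_use_internal_errors(true);   // collect parser warnings instead of printing them
$dom->loadHTML($input);
libxml_clear_errors();

$xpath = new DOMXPath($dom);
$data = array();

// Select every <li> that has both a <label> child and a <div> child.
foreach ($xpath->query('//li[label and div]') as $li) {
    $label = $xpath->query('label', $li)->item(0);
    $value = $xpath->query('div', $li)->item(0);

    // textContent returns the text of the element and all nested elements;
    // runs of whitespace (including line breaks) are collapsed to one space.
    $name = trim($label->textContent);
    $data[$name] = trim(preg_replace('/\s+/', ' ', $value->textContent));
}

print_r($data);
```

Here "Description" keeps all of its lines, joined by single spaces, rather than stopping at the first <br/>.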