Isolate text into different variables.


nyxem90

Any ideas on how to extract text from html like this into multiple variables?

 

 

<li class="even" style="padding-top:1.25em"><label for="job_title">Title</label><div>Expatriate Tax Director</div></li>
<li class="odd"><label for="job_contract_type">Contract type</label><div>Permanent</div></li>
<li class="even"><label for="job_market_sector">Market sector</label><div><a href="/en/candidate/market_sectors/1.html">Accountancy / Auditing / Tax</a></div></li>
<li class="odd"><label for="job_country">Country</label><div><a href="/en/candidate/countries/GB.html">United Kingdom <img src='/images/flags/GB.png'/></a></div></li>
<li class="even"><label for="job_location">Location</label><div>Reading</div></li>
<li class="odd"><label for="job_min_salary">Salary</label><div>120.00 - 120.00 United Kingdom Pounds/Month

<li class="even"><label for="job_description">Description</label><div>Expat Tax Director 
<br/>Circa 120k - UK - Reading
<br/>
<br/>Our client is seeking experienced tax professional to undertake this central role.  This position has evolved due to the firm expanding. The firm now has an immediate need for a very experienced person to direct the team.  The ideal individual will have experience in advising International organisations with international tax, cost & risk assessment as well as reorganisations of processes and systems for the management of cross border employees.  Business Development skills and good Client Management are essential. 
<br/>Attractions:
<br/>- Combination of HNWI, Corporate Expat Programmes and International Partnerships
<br/>- Managing a skilled and established team
<br/>- Working within a successful and profitable office
<br/>- Central London location
<br/></div></li>
<li class="odd"><label for="job_expires_on">Expires on</label><div>February 28, 2013</div></li>
<li class="even"><label for="job_ideal_candidate">Ideal candidate</label><div>Expatriate Tax</div></li>
</ul></div></div><div class="warning">An EU passport or work permit is required for this position. Applicants without one of these will be rejected automatically.</div>

Some people use preg_match() for this kind of task.  The exact expression to use depends on which text you need and what format you need it in.
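As a rough sketch of the preg_match() route for a single field: the snippet below pulls only the "Location" value out of one row of the posted HTML. The $html sample and the $location variable are just names made up for illustration, and the pattern assumes the markup stays exactly in this label/div shape.

```php
<?php
// Hypothetical sample: one row of the job listing from the question.
$html = '<li class="even"><label for="job_location">Location</label><div>Reading</div></li>';

// Anchor on the label's "for" attribute, then capture the plain text
// at the start of the div that follows it.
$pattern = '#<label for="job_location">[^<]*</label><div>([^<]*)#';

$location = null;
if (preg_match($pattern, $html, $m)) {
    // $m[1] is the first capture group: the text inside the div.
    $location = trim($m[1]);
}

echo $location, PHP_EOL;
```

You would need one such pattern (or one loop) per field, which is why the exact expression depends on what you want out.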

 

You can also use a parser like http://simplehtmldom.sourceforge.net/ which will give you a data structure.
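To give an idea of what the parser approach looks like, here is a minimal sketch. Note that it uses PHP's built-in DOMDocument and DOMXPath rather than the Simple HTML DOM library linked above, so it runs without extra downloads; the $html sample and the $data array are assumptions for illustration only.

```php
<?php
// Hypothetical sample: two rows of the listing, wrapped in a <ul>
// so the fragment is well formed.
$html = '<ul>'
      . '<li class="even"><label for="job_location">Location</label><div>Reading</div></li>'
      . '<li class="odd"><label for="job_country">Country</label>'
      . '<div><a href="/en/candidate/countries/GB.html">United Kingdom <img src="/images/flags/GB.png"/></a></div></li>'
      . '</ul>';

// Parse the HTML; collect libxml warnings internally instead of printing them.
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

// Walk every <li> that has a <label> child and pair the label text
// with the text content of the <div> in the same <li>.
$xpath = new DOMXPath($doc);
$data = array();
foreach ($xpath->query('//li[label]') as $li) {
    $label = $xpath->query('label', $li)->item(0);
    $div   = $xpath->query('div', $li)->item(0);
    if ($label !== null && $div !== null) {
        $data[trim($label->textContent)] = trim($div->textContent);
    }
}

print_r($data);
```

Because textContent collects the text of nested elements, values wrapped in links (like the country) come out without any special handling.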

Yes, but you need to be more specific about what, EXACTLY, you need to extract. You will have to create rules to parse the data. For example, one possibility would be to extract each LI element into a name/value pair, where the name is the text within the label tag and the value is the text within the div that immediately follows it. But if the content ever changed, or if it could be in a different format, the rules would no longer work.

 

Here is a quick example:

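//$input is assumed to hold the HTML from your post.
//Group 1 is the label text, group 2 is the plain text at the start of the div.
$pattern = "#<li[^>]*><label[^>]*>([^<]*)</label><div>([^<]*)#";
preg_match_all($pattern, $input, $matches, PREG_SET_ORDER);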

//Clean up matches and put into
//data array with named keys
$data = array();
foreach($matches as $match)
{
    $data[trim($match[1])] = trim($match[2]);
}

//Output results
print_r($data);

 

Using your example as the input, this would be the result:

Array
(
    [Title] => Expatriate Tax Director
    [Contract type] => Permanent
    [Market sector] => 
    [Country] => 
    [Location] => Reading
    [Salary] => 120.00 - 120.00 United Kingdom Pounds/Month
    [Description] => Expat Tax Director
    [Expires on] => February 28, 2013
    [Ideal candidate] => Expatriate Tax
)

 

Note that there is no value for "Market sector" or "Country". That is because those values are enclosed in anchor tags. To get those values the pattern would need to be further modified to support values that are enclosed in anchor tags.
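One possible way to modify the pattern, offered as a sketch rather than a definitive fix: add an optional, non-capturing group that skips an opening anchor tag right after the div. The $input sample below is a hypothetical three-row cut of the posted HTML, and the pattern still assumes the value is plain text immediately after the optional anchor.

```php
<?php
// Hypothetical sample: three rows from the question, two with anchor-wrapped values.
$input = '<li class="even"><label for="job_title">Title</label><div>Expatriate Tax Director</div></li>'
       . '<li class="even"><label for="job_market_sector">Market sector</label>'
       . '<div><a href="/en/candidate/market_sectors/1.html">Accountancy / Auditing / Tax</a></div></li>'
       . '<li class="odd"><label for="job_country">Country</label>'
       . '<div><a href="/en/candidate/countries/GB.html">United Kingdom <img src=\'/images/flags/GB.png\'/></a></div></li>';

// Same pattern as before, plus (?:<a[^>]*>)? to step over an optional <a ...> tag.
$pattern = '#<li[^>]*><label[^>]*>([^<]*)</label><div>(?:<a[^>]*>)?([^<]*)#';
preg_match_all($pattern, $input, $matches, PREG_SET_ORDER);

// Build the name/value array exactly as in the earlier example.
$data = array();
foreach ($matches as $match) {
    $data[trim($match[1])] = trim($match[2]);
}

print_r($data);
```

This only handles a single anchor directly inside the div. Any other nested markup would need further rules, which is where a parser starts to pay off.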
