nyxem90 Posted April 1, 2012

Any ideas on how to extract text from html like this into multiple variables?

<li class="even" style="padding-top:1.25em"><label for="job_title">Title</label><div>Expatriate Tax Director</div></li>
<li class="odd"><label for="job_contract_type">Contract type</label><div>Permanent</div></li>
<li class="even"><label for="job_market_sector">Market sector</label><div><a href="/en/candidate/market_sectors/1.html">Accountancy / Auditing / Tax</a></div></li>
<li class="odd"><label for="job_country">Country</label><div><a href="/en/candidate/countries/GB.html">United Kingdom <img src='/images/flags/GB.png'/></a></div></li>
<li class="even"><label for="job_location">Location</label><div>Reading</div></li>
<li class="odd"><label for="job_min_salary">Salary</label><div>120.00 - 120.00 United Kingdom Pounds/Month
<li class="even"><label for="job_description">Description</label><div>Expat Tax Director <br/>Circa 120k - UK - Reading <br/> <br/>Our client is seeking experienced tax professional to undertake this central role. This position has evolved due to the firm expanding. The firm now has an immediate need for a very experienced person to direct the team. The ideal individual will have experience in advising International organisations with international tax, cost & risk assessment as well as reorganisations of processes and systems for the management of cross border employees. Business Development skills and good Client Management are essential.
<br/>Attractions: <br/>- Combination of HNWI, Corporate Expat Programmes and International Partnerships <br/>- Managing a skilled and established team <br/>- Working within a successful and profitable office <br/>- Central London location <br/></div></li>
<li class="odd"><label for="job_expires_on">Expires on</label><div>February 28, 2013</div></li>
<li class="even"><label for="job_ideal_candidate">Ideal candidate</label><div>Expatriate Tax</div></li>
</ul></div></div><div class="warning">An EU passport or work permit is required for this position. Applicants without one of these will be rejected automatically.</div>
btherl Posted April 1, 2012

Some people use preg_match() for this kind of task. The exact expression to use depends on which text you need and what format you need it in. You can also use a parser like http://simplehtmldom.sourceforge.net/ which will give you a data structure.
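To make the parser route concrete, here is a minimal sketch. It does not use the simplehtmldom library linked above; it swaps in PHP's built-in DOM extension (DOMDocument and DOMXPath) instead. The `$html` string is a shortened, hypothetical sample modelled on the markup in the first post, and the XPath expression assumes every item is an `<li>` containing one `<label>` and one `<div>`.

```php
<?php
// Hypothetical, shortened sample modelled on the HTML in the first post.
$html = '<ul>'
      . '<li class="even"><label for="job_title">Title</label><div>Expatriate Tax Director</div></li>'
      . '<li class="odd"><label for="job_market_sector">Market sector</label><div><a href="/en/candidate/market_sectors/1.html">Accountancy / Auditing / Tax</a></div></li>'
      . '<li class="even"><label for="job_location">Location</label><div>Reading</div></li>'
      . '</ul>';

$doc = new DOMDocument();
// Collect libxml warnings internally instead of printing them;
// scraped markup is often not well-formed.
libxml_use_internal_errors(true);
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
$data  = array();

// Assumption: each <li> holds a <label> (the name) and a <div> (the value).
foreach ($xpath->query('//li[label and div]') as $li) {
    $label = $xpath->query('label', $li)->item(0);
    $div   = $xpath->query('div', $li)->item(0);
    // textContent includes the text of all descendants, so values
    // wrapped in <a> tags are picked up as well.
    $data[trim($label->textContent)] = trim($div->textContent);
}

print_r($data);
```

Because the value is read from the whole `<div>`, a multi-line field such as the description would come back as one string with the `<br/>` tags dropped. With badly broken markup, such as a `<div>` that is never closed, the parser has to guess at the nesting, so the results are worth checking against the real page.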
Psycho Posted April 2, 2012

Yes, but you need to be more specific about what, EXACTLY, you need to extract. You will have to create rules to parse the data. For example, one possibility would be to extract each LI element into a name/value pair, where the name is the text within the label tag and the value is the text within the div that immediately follows the label tag. But if the content ever changed, or if it could be in a different format, the rules would not work. Here is a quick example:

$pattern = "#<li[^>]*><label[^>]*>([^<]*)</label><div>([^<]*)#";
preg_match_all($pattern, $input, $matches, PREG_SET_ORDER);

//Clean up matches and put into
//data array with named keys
$data = array();
foreach($matches as $match)
{
    $data[trim($match[1])] = trim($match[2]);
}

//Output results
print_r($data);

Using your example as the input, this would be the result:

Array
(
    [Title] => Expatriate Tax Director
    [Contract type] => Permanent
    [Market sector] =>
    [Country] =>
    [Location] => Reading
    [Salary] => 120.00 - 120.00 United Kingdom Pounds/Month
    [Description] => Expat Tax Director
    [Expires on] => February 28, 2013
    [Ideal candidate] => Expatriate Tax
)

Note that there is no value for "Market sector" or "Country". That is because those values are enclosed in anchor tags, and the value group ([^<]*) stops at the first "<" it meets. For the same reason, "Description" only contains the text before the first <br/>. To get the anchored values, the pattern would need to be modified to support values that are enclosed in anchor tags.
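One possible modification for the anchor-wrapped values, offered as a sketch rather than the only way to do it: add an optional non-capturing group, `(?:<a[^>]*>)?`, right after the `<div>` so that an opening anchor tag is skipped when present. The `$input` string is a shortened, hypothetical sample taken from the markup in the first post.

```php
<?php
// Hypothetical, shortened sample taken from the HTML in the first post.
$input = '<li class="even"><label for="job_title">Title</label><div>Expatriate Tax Director</div></li> '
       . '<li class="even"><label for="job_market_sector">Market sector</label><div><a href="/en/candidate/market_sectors/1.html">Accountancy / Auditing / Tax</a></div></li> '
       . '<li class="odd"><label for="job_country">Country</label><div><a href="/en/candidate/countries/GB.html">United Kingdom <img src=\'/images/flags/GB.png\'/></a></div></li>';

// Same pattern as above, plus (?:<a[^>]*>)? to skip an optional
// opening anchor tag before the value is captured.
$pattern = '#<li[^>]*><label[^>]*>([^<]*)</label><div>(?:<a[^>]*>)?([^<]*)#';
preg_match_all($pattern, $input, $matches, PREG_SET_ORDER);

// Build the name => value array exactly as before.
$data = array();
foreach ($matches as $match) {
    $data[trim($match[1])] = trim($match[2]);
}

print_r($data);
```

With this sample, "Market sector" comes back as "Accountancy / Auditing / Tax" and "Country" as "United Kingdom", since the capture still stops at the next "<" (here the `</a>` or the `<img>` tag). The description is still cut at the first `<br/>`, so multi-line values would need a different rule.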