savagenoob Posted October 4, 2011

I need to extract and separate the data from a webpage, but my regex is not working. Here is what the data looks like:

<td class="datalabelnopad">
Insurance SERVICES OF NORTHERN CA INC.<br>
11111 W. MARCH LN. , P.O. BOX 1111 (91111)<br>
STOCKTON, CA 91111<br>
Phone: (209) 111-1111<br>
Fax: (209) 111-1111<br>
</td>

It would be nice to separate the business name, physical address, mailing address, phone, and fax so they can go into separate fields in a table. Even a simple pattern like this one does not return the data in between:

/<td class=("|\')datalabelnopad>("|\')>(.*?)<\/td>/

Link to comment: https://forums.phpfreaks.com/topic/248431-help-with-regex-on-a-screen-scrape/
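Editor's note: two things in the pattern above would stop it from matching. The `>` sits before the closing quote of the class attribute, and `.` does not cross line breaks unless the "dot matches newline" flag is set. The sketch below is in Python for illustration only (the thread itself names no language); it assumes the cell always has the five lines shown in the post, in that order, and that the first comma on the address line separates the physical address from the mailing address. The names `CELL_RE` and `parse_cell` are made up for this example.

```python
import re

# Sample markup copied from the post; a real page may be laid out differently.
html = """<td class="datalabelnopad">
Insurance SERVICES OF NORTHERN CA INC.<br>
11111 W. MARCH LN. , P.O. BOX 1111 (91111)<br>
STOCKTON, CA 91111<br>
Phone: (209) 111-1111<br>
Fax: (209) 111-1111<br>
</td>"""

# The closing quote comes before ">", and DOTALL lets "." span newlines.
CELL_RE = re.compile(r'<td class=["\']datalabelnopad["\']>(.*?)</td>', re.DOTALL)


def parse_cell(page):
    """Return a dict of fields from the first matching cell, or None."""
    match = CELL_RE.search(page)
    if match is None:
        return None
    # Split on <br>, trim each piece, and drop the empty ones.
    parts = [p.strip() for p in re.split(r'<br\s*/?>', match.group(1), flags=re.IGNORECASE)]
    parts = [p for p in parts if p]
    if len(parts) < 3:
        return None
    # Assumption: the first comma separates physical from mailing address.
    physical, _, mailing = parts[1].partition(",")
    fields = {
        "name": parts[0],
        "physical": physical.strip(),
        "mailing": mailing.strip(),
        "city": parts[2],
    }
    # Remaining lines look like "Label: value" (Phone, Fax).
    for part in parts[3:]:
        label, _, value = part.partition(":")
        fields[label.strip().lower()] = value.strip()
    return fields


record = parse_cell(html)
```

Splitting on `<br>` after capturing the whole cell keeps the pattern short; writing one capture group per field would break as soon as a line is missing.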
savagenoob Posted October 5, 2011

I think it's a problem with whitespace. How do I write a regex to get this string?

<div class="strong"> Some Random String </div>
savagenoob Posted October 5, 2011

I suck with regex, and I'm talking to myself.

<div class=("|\')strong("|\')>(/\s+/s)(.*?)(/\s+/s)<\/div>

Is this the right format for stripping whitespace?
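Editor's note: delimiters and flags such as `/\s+/s` cannot be nested inside a pattern; the whitespace class goes in directly as `\s*`. A minimal sketch in Python, assuming the text is wrapped in newlines and indentation (the exact whitespace on the real page is not shown in the thread); `STRONG_RE` and `extract_strong` are names invented for this example.

```python
import re

# Assumed layout: the string sits on its own indented line inside the div.
html = '<div class="strong">\n    Some Random String\n</div>'

# \s* on both sides swallows the surrounding whitespace, so the lazy
# group captures only the text itself.
STRONG_RE = re.compile(r'<div class=["\']strong["\']>\s*(.*?)\s*</div>', re.DOTALL)


def extract_strong(page):
    """Return the trimmed text of the first matching div, or None."""
    match = STRONG_RE.search(page)
    return match.group(1) if match else None


text = extract_strong(html)
```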
savagenoob Posted October 7, 2011

You freaking dumbass moron savage, you're such a noob. How about you friggen *google*, you dumb nerd. Why don't you use DOM instead, you idiot. Or if you still want to be a dumb nub, here's the solution:

<div class=("|\')agentContainer("|\')>\n\s<div class="strong">\n\s(.*?)\n\s
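Editor's note: the parser route mentioned above avoids whitespace matching altogether. The sketch below uses Python's standard-library `html.parser`, which is event-based rather than a true DOM but shows the same idea. The nesting of a `strong` div inside an `agentContainer` div is inferred from the pattern in the post, and the class `StrongDivText` and function `strong_texts` are hypothetical names for this example.

```python
from html.parser import HTMLParser

# Assumed structure, inferred from the regex in the post above.
sample = (
    '<div class="agentContainer">\n'
    ' <div class="strong">\n'
    ' Some Random String\n'
    ' </div>\n'
    '</div>'
)


class StrongDivText(HTMLParser):
    """Collect the text inside every <div class="strong">."""

    def __init__(self):
        super().__init__()
        self._depth = 0  # greater than 0 while inside a matching div
        self.texts = []

    def handle_starttag(self, tag, attrs):
        if tag != "div":
            return
        classes = (dict(attrs).get("class") or "").split()
        if self._depth or "strong" in classes:
            if self._depth == 0:
                self.texts.append("")  # start a new capture
            self._depth += 1  # track nested divs so the end tags pair up

    def handle_endtag(self, tag):
        if tag == "div" and self._depth:
            self._depth -= 1

    def handle_data(self, data):
        if self._depth:
            self.texts[-1] += data


def strong_texts(page):
    """Return the trimmed text of each matching div."""
    parser = StrongDivText()
    parser.feed(page)
    parser.close()
    return [t.strip() for t in parser.texts]


found = strong_texts(sample)
```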
Archived
This topic is now archived and is closed to further replies.