schapel Posted August 2, 2009

Here is the situation I'm in, and where I'm stuck. Maybe someone has some suggestions for me.

I have a script that visits a particular website's search engine and punches in a search term. The script already extracts the valid block of HTML data containing the raw results, which sit in a specific table on the page. So the data the scraper script displays is everything between that table's <table> tags.

What I need to do is grab the information from a specific column in the table, between the <td></td> tags. The tricky part is that I'm guessing most of you will suggest running a regex on the results to capture everything between <td></td>; however, I typically ONLY need the first or second column of data.

Is there some way to have a regex or some other function loop through my results and find the </tr> or </th> tag, which then triggers it to extract data only from the NEXT <td></td> cell? This is the only way I could think of to identify the location of a 'first column' cell, because there is no special style attribute or any other identifier in the <td> tag.

Any suggestions?
watsmyname Posted August 2, 2009

You'll want to use the PHP DOM. You can get a pre-built class with usage examples here: http://simplehtmldom.sourceforge.net/

It's a very useful class.
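For anyone reading later, here is a minimal sketch of the DOM approach to the "first or second column only" problem. It uses PHP's bundled DOM extension (DOMDocument and DOMXPath) rather than the Simple HTML DOM class linked above, so treat it as one possible way to do it, not what the linked class does internally. The sample table and the helper name extractColumn() are made up for illustration; the real input would be the <table> block the scraper already extracts.

```php
<?php
// Hypothetical sample input standing in for the scraped <table> block.
$html = <<<HTML
<table>
  <tr><th>Name</th><th>Price</th><th>Stock</th></tr>
  <tr><td>Widget</td><td>9.99</td><td>12</td></tr>
  <tr><td>Gadget</td><td>24.50</td><td>3</td></tr>
</table>
HTML;

/**
 * Return the trimmed text of the $column-th <td> (1-based) in every row.
 * Rows without that many <td> cells (e.g. header rows using <th>) are skipped.
 * extractColumn() is an illustrative helper name, not part of any library.
 */
function extractColumn(string $html, int $column): array
{
    $doc = new DOMDocument();
    // Scraped markup is rarely valid; collect parse warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $values = [];
    // td[N] selects the N-th <td> child of each <tr>.
    foreach ($xpath->query('//tr/td[' . $column . ']') as $cell) {
        $values[] = trim($cell->textContent);
    }
    return $values;
}

print_r(extractColumn($html, 1)); // first column: Widget, Gadget
print_r(extractColumn($html, 2)); // second column: 9.99, 24.50
```

Because the XPath expression counts <td> cells within each <tr>, there is no need to watch for a closing </tr> and then grab the next cell by hand, and the cells do not need any identifying attribute.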
schapel Posted August 2, 2009

Great link, by the way. The functions on that page are very easy to use once you download them. That is exactly what I needed, although I was worried about moving away from regex. Thanks much.
watsmyname Posted August 2, 2009

Nice to know that it helped you, mate. Regex is the one thing programmers would like to stay away from.