twittoris, posted September 4, 2010:

I am trying to take a specific link from my site and place it into my database. I only want links that start with CORPSEARCH.ENTITY_INFORMATION?p_nameid=

Can someone point me in the right direction? My code is below:

// make the cURL request to $target_url
$html = curl_exec($ch);
if (!$html) {
    echo "<br />cURL error number: " . curl_errno($ch);
    echo "<br />cURL error: " . curl_error($ch);
    exit;
}

// parse the HTML into a DOMDocument
$dom = new DOMDocument();
@$dom->loadHTML($html);

// grab all the <a> tags on the page
$xpath = new DOMXPath($dom);
$hrefs = $xpath->evaluate("/html/body//a");

for ($i = 0; $i < $hrefs->length; $i++) {
    $href = $hrefs->item($i);
    $url = $href->getAttribute('href');
    $sql = "INSERT INTO links(cid, nlink) VALUES('$i', '$url')";
    $result = mysql_query($sql);
    echo $result;
    echo $url;
}
twittoris (thread author), posted September 4, 2010:

What if I add preg_match somewhere in the code? Will it then pull only the URLs that contain that text?
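On the preg_match question: preg_match() can do the filtering, but only if its return value is used as a condition around the INSERT, and the pattern is a quoted string with delimiters. Below is a minimal sketch assuming PHP 7 or later; the function name filter_links_with_regex and the sample HTML are hypothetical stand-ins for the cURL response.

```php
<?php
// Sketch: keep only links whose href matches a regex, instead of storing every link.
// The HTML passed in is assumed to be what curl_exec() returned.
function filter_links_with_regex(string $html): array
{
    $dom = new DOMDocument();
    // Suppress warnings from malformed real-world HTML, as the original code does with @.
    @$dom->loadHTML($html);

    $xpath = new DOMXPath($dom);
    $matches = [];
    foreach ($xpath->query('//a[@href]') as $a) {
        $url = $a->getAttribute('href');
        // preg_match() returns 1 on a match and 0 otherwise. The pattern needs
        // delimiters, and the literal "." and "?" must be escaped.
        if (preg_match('/^CORPSEARCH\.ENTITY_INFORMATION\?p_nameid=/', $url) === 1) {
            $matches[] = $url;
        }
    }
    return $matches;
}

// Hypothetical sample page with one matching link and one unrelated link.
$sample = '<html><body>'
    . '<a href="CORPSEARCH.ENTITY_INFORMATION?p_nameid=1&amp;p_corpid=2">Match</a>'
    . '<a href="/about.php">Other</a>'
    . '</body></html>';

print_r(filter_links_with_regex($sample));
```

Because the pattern is anchored with ^, a link that merely contains the text somewhere in the middle is not kept.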
twittoris (thread author), posted September 4, 2010:

I have edited it a little and put the script online, but it is still spitting out every link on the page: http://empirebuildingsestate.com/table.php

I just want to grab links with this layout only:

CORPSEARCH.ENTITY_INFORMATION?p_nameid=3236937&p_corpid=3227476&p_entity_name=%41%72%77%65%6E%20%45%71%75%69%74%69%65%73&p_name_type=%41&p_search_type=%42%45%47%49%4E%53&p_srch_results_page=0

$dom = new DOMDocument();
@$dom->loadHTML($html);

// grab all the <a> tags on the page
$xpath = new DOMXPath($dom);
$hrefs = $xpath->evaluate("/html/body//a");

for ($i = 0; $i < $hrefs->length; $i++) {
    $href = $hrefs->item($i);
    $url = $href->getAttribute('href');
    preg_match_all(nameid, $url);
    $sql = "INSERT INTO links(cid, nlink) VALUES('$i', '$url')";
    $result = mysql_query($sql);
    echo $result;
    echo $url;

    // if the data was inserted into the database successfully, display "Successful"
    if ($result) {
        echo "Successful";
        echo "<BR>";
    } else {
        echo "ERROR";
    }
    echo "<br />Link stored: $url";
}
?>
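If only the p_nameid value (or another parameter) from a link with this layout needs to go into the database, PHP's parse_url() and parse_str() can split the link apart and percent-decode each value. A sketch assuming PHP 7 or later; link_params is a hypothetical helper name, and the link is the example layout from the post.

```php
<?php
// Sketch: split a CORPSEARCH link into its query-string parameters.
function link_params(string $link): array
{
    // Everything after "?" (null if there is no query string).
    $query = parse_url($link, PHP_URL_QUERY);
    $params = [];
    if (is_string($query)) {
        // parse_str() fills $params and percent-decodes each value.
        parse_str($query, $params);
    }
    return $params;
}

// The example link layout quoted in the thread.
$link = 'CORPSEARCH.ENTITY_INFORMATION?p_nameid=3236937&p_corpid=3227476'
    . '&p_entity_name=%41%72%77%65%6E%20%45%71%75%69%74%69%65%73'
    . '&p_name_type=%41&p_search_type=%42%45%47%49%4E%53&p_srch_results_page=0';

$params = link_params($link);
echo $params['p_nameid'], "\n";      // 3236937
echo $params['p_entity_name'], "\n"; // Arwen Equities
```

All values come back as strings, so cast p_nameid with (int) if the database column is numeric.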
wildteen88, posted September 4, 2010:

Use the built-in XPath function starts-with() to select only the links whose href begins with 'CORPSEARCH.ENTITY_INFORMATION'. Change this:

$hrefs = $xpath->evaluate("/html/body//a");

to this:

$hrefs = $xpath->evaluate("/html/body//a[starts-with(@href, 'CORPSEARCH.ENTITY_INFORMATION')]");

or simply:

$hrefs = $xpath->evaluate("//a[starts-with(@href, 'CORPSEARCH.ENTITY_INFORMATION')]");

Your loop then becomes:

for ($i = 0; $i < $hrefs->length; $i++) {
    $href = $hrefs->item($i);
    $url = $href->getAttribute('href');
    echo '<p>Found:<br />' . $url . '<br />Adding it to the database... ';
    // escape the URL before embedding it in the SQL string
    $safe_url = mysql_real_escape_string($url);
    $sql = "INSERT INTO links(cid, nlink) VALUES('$i', '$safe_url')";
    $result = mysql_query($sql);
    echo (($result) ? 'Success!' : 'FAIL') . '</p>';
}
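The starts-with() filter can be checked in isolation against an inline HTML string rather than a live cURL response. A minimal sketch, assuming PHP 7 or later; corpsearch_links and the sample markup are hypothetical. It uses DOMXPath::query(), which returns the same DOMNodeList as evaluate() for a node-set expression like this one.

```php
<?php
// Sketch: return only the hrefs that begin with the CORPSEARCH prefix.
function corpsearch_links(string $html): array
{
    $dom = new DOMDocument();
    @$dom->loadHTML($html); // silence warnings from sloppy markup

    $xpath = new DOMXPath($dom);
    // XPath 1.0 starts-with() keeps only <a> elements whose href begins with the prefix.
    $nodes = $xpath->query("//a[starts-with(@href, 'CORPSEARCH.ENTITY_INFORMATION')]");

    $urls = [];
    foreach ($nodes as $a) {
        $urls[] = $a->getAttribute('href');
    }
    return $urls;
}

// Hypothetical page: one relative match, one unrelated link, one absolute URL.
$sample = '<html><body>'
    . '<a href="index.php">Home</a>'
    . '<a href="CORPSEARCH.ENTITY_INFORMATION?p_nameid=3236937">Arwen Equities</a>'
    . '<a href="http://example.com/CORPSEARCH.ENTITY_INFORMATION?p_nameid=1">Absolute</a>'
    . '</body></html>';

print_r(corpsearch_links($sample));
```

starts-with() compares characters exactly, so it is case-sensitive, and the absolute http://example.com/... link in the sample is skipped because its href does not begin with the prefix.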
twittoris (thread author), posted September 4, 2010:

Awesome! That was it. Thanks so much for your help.