Everything posted by TheStudent2023

  1. Web gurus, have you ever programmed against an API? Yes or no? I never have; tonight is my first attempt. Let us say, for example, that I want to allow people to pay me with Bitcoin on my website. The following code is supposedly the API code for the official Bitcoin payment gateway. How do I integrate this into my website? Meaning, how do I write the HTML form so that the form and the following API PHP code work together? Here is the API PHP code:

````
/**
 * This function confirms a payment made to a website using the official Bitcoin payment gateway API.
 *
 * @param string $transaction_id The unique transaction ID generated by the Bitcoin payment gateway
 * @param float $amount The amount of Bitcoin paid by the customer
 * @param string $customer_address The Bitcoin address of the customer who made the payment
 * @param string $website_address The Bitcoin address of the website receiving the payment
 * @param string $api_key The API key provided by the Bitcoin payment gateway
 *
 * @return bool Returns true if the payment is confirmed, false otherwise
 */
function confirmBitcoinPayment($transaction_id, $amount, $customer_address, $website_address, $api_key)
{
    // Initialize cURL
    $ch = curl_init();

    // Set the cURL options
    curl_setopt($ch, CURLOPT_URL, "https://api.bitcoinpaymentgateway.io/v1/confirm_payment");
    curl_setopt($ch, CURLOPT_POST, 1);
    curl_setopt($ch, CURLOPT_POSTFIELDS, http_build_query([
        'transaction_id' => $transaction_id,
        'amount' => $amount,
        'customer_address' => $customer_address,
        'website_address' => $website_address,
        'api_key' => $api_key
    ]));
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);

    // Execute the cURL request
    $response = curl_exec($ch);

    // Check for errors
    if (curl_errno($ch)) {
        error_log("Error confirming Bitcoin payment: " . curl_error($ch));
        curl_close($ch);
        return false;
    }

    // Close the cURL connection
    curl_close($ch);

    // Parse the response
    $response = json_decode($response, true);

    // Check if the payment is confirmed
    if ($response['status'] == 'confirmed') {
        return true;
    } else {
        return false;
    }
}
````

https://codepal.ai/code-generator/query/wTcOQ1Ps/php-bitcoin-payment-gateway-api-confirmation

Your sample HTML form should give me an idea of how things should be integrated. Cheers!
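For illustration only, here is a minimal sketch of how an HTML form could hand its values to a PHP handler that calls the confirmBitcoinPayment() function above. The handler file name, the form field names and the require'd file are assumptions made for this sketch, not part of any real gateway:

````
<?php
// pay.php - hypothetical example only. It assumes the confirmBitcoinPayment()
// function from the snippet above has been saved in confirm_bitcoin_payment.php.
require 'confirm_bitcoin_payment.php';

if ($_SERVER['REQUEST_METHOD'] === 'POST') {
    // The field names below are assumptions made for this sketch.
    $transaction_id   = $_POST['transaction_id'] ?? '';
    $amount           = (float) ($_POST['amount'] ?? 0);
    $customer_address = $_POST['customer_address'] ?? '';

    $website_address  = 'YOUR_RECEIVING_BTC_ADDRESS'; // placeholder
    $api_key          = 'YOUR_API_KEY';               // placeholder

    $confirmed = confirmBitcoinPayment($transaction_id, $amount, $customer_address, $website_address, $api_key);
    echo $confirmed ? 'Payment confirmed.' : 'Payment not confirmed.';
    exit;
}
?>
<form method="post" action="pay.php">
    <label>Transaction ID: <input type="text" name="transaction_id"></label><br>
    <label>Amount (BTC): <input type="text" name="amount"></label><br>
    <label>Your Bitcoin address: <input type="text" name="customer_address"></label><br>
    <button type="submit">Confirm payment</button>
</form>
````

In practice the amount and addresses would normally come from an order record stored server-side rather than from the visitor's form, so a visitor cannot claim an arbitrary payment as confirmed.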
  2. @mc_gyver I would prefer your feedback. Is the structure of my code above free of structural flaws?
  3. Damn! I am giving up for tonight! It is nearly 2am and I still cannot figure out why my crawler fails to extract links, meta data & page titles! Here is the latest code. Do you see any flaws? I get no errors. What the heck is wrong! I put my code inside functions this time to make it look neater. For some reason, this forum messes up my code indentation, so it is best you copy & paste the following and test it on your localhost. FULL CODE

````
<?php

ini_set('display_errors',1);
ini_set('display_startup_errors',1);
error_reporting(E_ALL);

//START OF SCRIPT FLOW.

//Preparing Crawler & Session: Initialising Variables.

//Preparing $ARRAYS For Step 1: To Deal with Xml Links meant for Crawlers only.
//Data Scraped from SiteMaps or Xml Files.
$sitemaps = []; //This will list extracted further Xml SiteMap links (.xml) found on Sitemaps (.xml).
$sitemaps_last_mods = []; //This will list dates of SiteMap pages last modified - found on Sitemaps.
$sitemaps_change_freqs = []; //This will list SiteMap dates of html pages frequencies of page updates - found on Sitemaps.
$sitemaps_priorities = []; //This will list SiteMap pages priorities - found on Sitemaps.

//Data Scraped from SiteMaps or Xml Files.
$html_page_urls = []; //This will list extracted html links Urls (.html, .htm, .php) - found on Sitemaps (.xml).
$html_page_last_mods = []; //This will list dates of html pages last modified - found on Sitemap.
$html_page_change_freqs = []; //This will list dates of html pages frequencies of page updates - found on Sitemaps.
$html_page_priorities = []; //This will list html pages priorities - found on Sitemaps.

//Preparing $ARRAYS For Step 2: To Deal with html pages meant for Human Visitors only.
//Data Scraped from Html Files. Not Xml SiteMap Files.
$html_page_meta_names = []; //This will list crawled pages Meta Tag Names - found on html pages.
$html_page_meta_descriptions = []; //This will list crawled pages Meta Tag Descriptions - found on html pages.
$html_page_titles = []; //This will list crawled pages Titles - found on html pages.

// -----

//Step 1: Initiate Session - Feed Xml SiteMap Url. Crawling Starting Point.
//Crawl Session Starting Page/Initial Xml Sitemap.
$initial_url = "https://www.rocktherankings.com/sitemap_index.xml"; //Has more xml files.

$xml = file_get_contents($initial_url); //Should I stick to this line or the line below ?
//Parse the sitemap content to object
//$xml = simplexml_load_string($initial_url); //Should I stick to this line or the line above ?

$dom = new DOMDocument();
$dom->loadXML($xml);

echo __LINE__; echo '<br>'; //LINE: 334

extract_links($xml);

echo __LINE__; echo '<br>'; //LINE: 338

foreach($sitemaps AS $sitemap)
{
    echo __LINE__; echo '<br>';
    extract_links($sitemap); //Extract Links on page.
}

foreach($html_page_urls AS $html_page_url)
{
    echo __LINE__; echo '<br>';
    extract_links($html_page_url); //Extract Links on page.
}

scrape_page_data(); //Scrape Page Title & Meta Tags.

//END OF SCRIPT FLOW.

//FUNCTIONS BEYOND THIS POINT.

//Links Extractor.
function extract_links()
{
    echo __LINE__; echo '<br>'; //LINE: 361

    GLOBAL $dom;

    //Trigger following IF/ELSEs on each Crawled Page to check for link types. Whether Links lead to more SiteMaps (.xml) or webpages (.html, .htm, .php, etc.).
    if ($dom->nodeName === 'sitemapindex') //Current Xml SiteMap Page lists more Xml SiteMaps. Lists links to Xml links. Not lists links to html links.
    {
        echo __LINE__; echo '<br>';

        //parse the index
        //retrieve properties from the sitemap object
        foreach ($xml->urlset as $urlElement) //Extracts html file urls.
        {
            //get properties
            $sitemaps[] = $sitemap_url = $urlElement->loc;
            $sitemaps_last_mods[] = $last_mod = $urlElement->lastmod;
            $sitemaps_change_freqs[] = $change_freq = $urlElement->changefreq;
            $sitemaps_priorities[] = $priority = $urlElement->priority;

            //print out the properties
            echo 'url: '. $sitemap_url . '<br>';
            echo 'lastmod: '. $last_mod . '<br>';
            echo 'changefreq: '. $change_freq . '<br>';
            echo 'priority: '. $priority . '<br>';
            echo '<br>---<br>';
        }
    }
    else if ($dom->nodeName === 'urlset') //Current Xml SiteMap Page lists no more Xml SiteMap links. Lists only html links.
    {
        echo __LINE__; echo '<br>';

        //parse url set
        //retrieve properties from the sitemap object
        foreach ($xml->sitemapindex as $urlElement) //Extracts Sitemap Urls.
        {
            //get properties
            $html_page_urls[] = $html_page_url = $urlElement->loc;
            $html_page_last_mods[] = $last_mod = $urlElement->lastmod;
            $html_page_change_freqs[] = $change_freq = $urlElement->changefreq;
            $html_page_priorities[] = $priority = $urlElement->priority;

            //print out the properties
            echo 'url: '. $html_page_url . '<br>';
            echo 'lastmod: '. $last_mod . '<br>';
            echo 'changefreq: '. $change_freq . '<br>';
            echo 'priority: '. $priority . '<br>';
            echo '<br>---<br>';
        }
    }

    GLOBAL $sitemaps;
    GLOBAL $sitemaps_last_mods;
    GLOBAL $sitemaps_change_freqs;
    GLOBAL $sitemaps_priorities;

    GLOBAL $html_page_urls;
    GLOBAL $html_page_last_mods;
    GLOBAL $html_page_change_freqs;
    GLOBAL $html_page_priorities;

    echo 'SiteMaps Crawled: ---'; echo '<br><br>';
    if(array_count_values($sitemaps)>0) { print_r($sitemaps); echo '<br>'; }
    elseif(array_count_values($sitemaps_last_mods)>0) { print_r($sitemaps_last_mods); echo '<br>'; }
    elseif(array_count_values($sitemaps_change_freqs)>0) { print_r($sitemaps_change_freqs); echo '<br>'; }
    elseif(array_count_values($sitemaps_priorities)>0) { print_r($sitemaps_priorities); echo '<br><br>'; }

    echo 'Html Pages Crawled: ---'; echo '<br><br>';
    if(array_count_values($html_page_urls)>0) { print_r($html_page_urls); echo '<br>'; }
    if(array_count_values($html_page_last_mods)>0) { print_r($html_page_last_mods); echo '<br>'; }
    if(array_count_values($html_page_change_freqs)>0) { print_r($html_page_change_freqs); echo '<br>'; }
    if(array_count_values($html_page_priorities)>0) { print_r($html_page_priorities); echo '<br>'; }
}

//Meta Data & Title Extractor.
function scrape_page_data()
{
    GLOBAL $html_page_urls;

    if(array_count_values($html_page_urls)>0)
    {
        foreach($html_page_urls AS $url)
        {
            // https://www.php.net/manual/en/function.file-get-contents
            $html = file_get_contents($url);

            // https://www.php.net/manual/en/domdocument.construct.php
            $doc = new DOMDocument();

            // https://www.php.net/manual/en/function.libxml-use-internal-errors.php
            libxml_use_internal_errors(true);

            // https://www.php.net/manual/en/domdocument.loadhtml.php
            $doc->loadHTML($html, LIBXML_COMPACT|LIBXML_NOERROR|LIBXML_NOWARNING);

            // https://www.php.net/manual/en/function.libxml-clear-errors.php
            libxml_clear_errors();

            // https://www.php.net/manual/en/domdocument.getelementsbytagname.php
            $meta_tags = $doc->getElementsByTagName('meta');

            // https://www.php.net/manual/en/domnodelist.item.php
            if ($meta_tags->length > 0)
            {
                // https://www.php.net/manual/en/class.domnodelist.php
                foreach ($meta_tags as $tag)
                {
                    // https://www.php.net/manual/en/domnodelist.item.php
                    echo 'Meta Name: ' .$meta_name = $tag->getAttribute('name'); echo '<br>';
                    echo 'Meta Content: ' .$meta_content = $tag->getAttribute('content'); echo '<br>';

                    $html_page_meta_names[] = $meta_name;
                    $html_page_meta_descriptions[] = $meta_content;
                }
            }

            //EXAMPLE 1: Extract Title
            $title_tag = $doc->getElementsByTagName('title');
            if ($title_tag->length>0)
            {
                echo 'Title: ' .$title = $title_tag[0]->textContent; echo '<br>';
                $html_page_titles[] = $title;
            }

            //EXAMPLE 2: Extract Title
            $title_tag = $doc->getElementsByTagName('title');
            for ($i = 0; $i < $title_tag->length; $i++)
            {
                echo 'Title: ' .$title = $title_tag->item($i)->nodeValue . "\n";
                $html_page_titles[] = $title;
            }
        }
    }
}

if(array_count_values($html_page_meta_names)>0) { print_r($html_page_meta_names); echo '<br>'; }
if(array_count_values($html_page_meta_descriptions)>0) { print_r($html_page_meta_descriptions); echo '<br>'; }
if(array_count_values($html_page_titles)>0) { print_r($html_page_titles); echo '<br>'; }

//END OF FUNCTIONS.
````

I only get this echoed. Notice the arrays are empty. It means no data is getting extracted from the pages:

334
361
SiteMaps Crawled: ---
Array ( )
Html Pages Crawled: ---
Array ( ) Array ( ) Array ( ) Array ( )
338
Array ( ) Array ( ) Array ( )
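For what it is worth, the empty arrays have identifiable causes: $dom->nodeName on a DOMDocument is always '#document' (the root element's name is $dom->documentElement->nodeName), the loops read $xml even though $xml is a plain string returned by file_get_contents() and is not declared global inside extract_links(), and in the sitemap protocol a <sitemapindex> contains <sitemap> children while a <urlset> contains <url> children. A minimal sketch of the same idea using SimpleXML, assuming allow_url_fopen is enabled and the sitemaps are reachable, could look like this:

````
<?php
// Sketch only: walk a sitemap index recursively with SimpleXML and collect page URLs.
function collect_page_urls(string $sitemap_url, array &$page_urls): void
{
    $xml_text = file_get_contents($sitemap_url);      // raw XML as a string
    if ($xml_text === false) {
        return;                                       // fetch failed, skip this sitemap
    }

    $xml = simplexml_load_string($xml_text);          // parse the string into an object
    if ($xml === false) {
        return;                                       // not valid XML, skip
    }

    if ($xml->getName() === 'sitemapindex') {         // root element name (unlike $dom->nodeName)
        foreach ($xml->sitemap as $entry) {           // children of a sitemap index are <sitemap>
            collect_page_urls((string) $entry->loc, $page_urls);
        }
    } elseif ($xml->getName() === 'urlset') {
        foreach ($xml->url as $entry) {               // children of a urlset are <url>
            $page_urls[] = (string) $entry->loc;
        }
    }
}

$page_urls = [];
collect_page_urls('https://www.rocktherankings.com/sitemap_index.xml', $page_urls);
print_r($page_urls);
````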
  4. Which one of these should I stick to?

````
1) $xml = file_get_contents($sitemap);       //Should I stick to this line or the line below ?

2) //Parse the sitemap content to object
   $xml = simplexml_load_string($sitemap);   //Should I stick to this line or the line above ?
````
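One point worth noting: simplexml_load_string() expects XML text, not a URL, so passing the URL string to it returns false. The two workable combinations are sketched below:

````
<?php
$sitemap = 'https://www.rocktherankings.com/sitemap_index.xml';

// Option A: fetch the XML text yourself, then parse the text.
$xml = simplexml_load_string(file_get_contents($sitemap));

// Option B: let SimpleXML fetch and parse in one call.
$xml = simplexml_load_file($sitemap);

if ($xml !== false) {
    echo $xml->getName(); // e.g. "sitemapindex"
}
````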
  5. Security-minded programmers, I have some challenging questions for you. Is there anything I should program my link crawler to watch out for? When I ask below "How do I code this?", I mean "which PHP functions should I look into?". I do not want my web crawler getting trapped on some domain while crawling it, stuck in a loop for some reason. So, what should I look out for to prevent loops? 1. I know crawlers should not spider dynamic URLs, as they can go into a never-ending loop. Apart from that, what other dangers are there? 2. I know I have to program the crawler to avoid trying to crawl pages that are dead, so I have to look out for 404 pages. What other status codes should I look out for? I need a list of error codes to feed my crawler (see the sketch after this post). 3. I do not want any hacker/crook/fraudster calling my crawler (pinging it) to crawl bad-natured pages, such as phishing pages. So, how do I write code for my crawler to identify phishing pages so it does not crawl or index them in my search engine? 4. I do not want any hacker/crook/fraudster calling my crawler (pinging it) to crawl pages infected with viruses, worms, spyware, etc. - pages that would infect my crawler and carry infections to other domains it crawls afterwards. So, how do I write code for my crawler to identify infected pages so it does not crawl or index them, nor carry the infections to third-party domains? Would you like to add your own item as number 5?
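Regarding point 2, the codes worth skipping are not only 404: broadly anything in the 4xx (client error) and 5xx (server error) ranges, plus hosts that never answer. A rough pre-crawl check using cURL (the timeout and redirect limits are just example values) might look like this:

````
<?php
// Sketch: crawl a URL only when it answers with a 2xx status within a time limit.
function is_crawlable(string $url): bool
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_NOBODY, true);          // headers only, no body download
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);  // follow redirects...
    curl_setopt($ch, CURLOPT_MAXREDIRS, 5);          // ...but never endlessly
    curl_setopt($ch, CURLOPT_TIMEOUT, 10);           // give up on hosts that hang
    curl_exec($ch);

    $status = (int) curl_getinfo($ch, CURLINFO_HTTP_CODE);
    curl_close($ch);

    return $status >= 200 && $status < 300;          // skips 4xx client and 5xx server errors
}

var_dump(is_crawlable('https://www.example.com/'));
````

Loop protection is usually a separate, simpler concern: keep an array of URLs already visited and skip any URL that is already in it.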
  6. @ignace Since you are an expert in preventing malicious injections, is my crawler code safe? Crawlers cannot be trapped by hackers on their sites, can they? I mean, let us say a crook points my crawler at one of his malicious or phishing sites - is he able to trap my crawler and inject a virus, so that my crawler dumps viruses and malicious code into my search engine's index? Or, worse, can my crawler carry the virus to other sites it crawls and infect them? Good question, yes? What do you think of my code above? Is it orthodox or weird? I cannot think of any better basic logic than the one I used. What do you say?
  7. @kicken Scratching my head why no link and their meta tags & titles are getting extracted by this crawler. I have done all the basic logics. See for yourself. have i missed out any logic ? Can the below code get any shorter or not so I can easily spot where the issue is as I get no error and no proper result either. Just get echoed: **SiteMaps Crawled: --- Array ( ) Html Pages Crawled: --- Array ( ) Array ( ) Array ( ) Array ( ) ** I am really really puzzled. FULL CRAWLER <?php ini_set('display_errors',1); ini_set('display_startup_errors',1); error_reporting(E_ALL); //Preparing Crawler & Session: Initialising Variables. //Preparing $ARRAYS For Step 1: To Deal with Xml Links meant for Crawlers only. //Data Scraped from SiteMaps or Xml Files. $sitemaps = []; //This will list extracted further Xml SiteMap links (.xml) found on Sitemaps (.xml). $sitemaps_last_mods = []; //This will list dates of SiteMap pages last modified - found on Sitemap. $sitemaps_change_freqs = []; //his will list SiteMap dates of html pages frequencies of page updates - found on Sitemaps. $sitemaps_priorities = []; //This will list SiteMap pages priorities - found on Sitemaps. //Data Scraped from SiteMaps or Xml Files. $html_page_urls = array(); //This will list extracted html links Urls (.html, .htm, .php) - found on Sitemaps (.xml). $html_page_last_mods = []; //This will list dates of html pages last modified - found on Sitemap. $html_page_change_freqs = []; //his will list dates of html pages frequencies of page updates - found on Sitemaps. $html_page_priorities = []; //This will list html pages priorities - found on Sitemaps. //Preparing $ARRAYS For Step 2: To Deal with html pages meant for Human Visitors only. //Data Scraped from Html Files. Not Xml SiteMap Files. $html_page_titles = []; //This will list crawled pages Titles - found on html pages. $html_page_meta_names = []; //This will list crawled pages Meta Tag Names - found on html pages. $html_page_meta_descriptions = []; //This will list crawled pages Meta Tag Descriptions - found on html pages. // ----- //Step 1: Initiate Session - Feed Xml SiteMap Url. Crawing Starting Point. //Crawl Session Starting Page/Initial Xml Sitemap. $sitemap = "https://www.rocktherankings.com/sitemap_index.xml"; //Has more xml files. $xml = file_get_contents($sitemap); //Should I stick to this line or below line ? // parse the sitemap content to object //$xml = simplexml_load_string($sitemap); //Should I stick to this line or above line ? $dom = new DOMDocument(); $dom->loadXML($xml); extract_links(); function extract_links() { GLOBAL $dom; //Trigger following IF/ELSEs on each Crawled Page to check for link types. Whether Links lead to more SiteMaps (.xml) or webpages (.html, .htm, .php, etc.). if ($dom->nodeName === 'sitemapindex') //Current Xml SiteMap Page lists more Xml SiteMaps. Lists links to Xml links. Not lists links to html links. { //parse the index // retrieve properties from the sitemap object foreach ($xml->urlset as $urlElement) //Extracts html file urls. { // get properties $sitemaps[] = $sitemap_url = $urlElement->loc; $sitemaps_last_mods[] = $last_mod = $urlElement->lastmod; $sitemaps_change_freqs[] = $change_freq = $urlElement->changefreq; $sitemaps_priorities[] = $priority = $urlElement->priority; // print out the properties echo 'url: '. $sitemap_url . '<br>'; echo 'lastmod: '. $last_mod . '<br>'; echo 'changefreq: '. $change_freq . '<br>'; echo 'priority: '. $priority . 
'<br>'; echo '<br>---<br>'; } } else if ($dom->nodeName === 'urlset') //Current Xml SiteMap Page lists no more Xml SiteMap links. Lists only html links. { //parse url set // retrieve properties from the sitemap object foreach ($xml->sitemapindex as $urlElement) //Extracts Sitemap Urls. { // get properties $html_page_urls[] = $html_page_url = $urlElement->loc; $html_page_last_mods[] = $last_mod = $urlElement->lastmod; $html_page_change_freqs[] = $change_freq = $urlElement->changefreq; $html_page_priorities[] = $priority = $urlElement->priority; // print out the properties echo 'url: '. $html_page_url . '<br>'; echo 'lastmod: '. $last_mod . '<br>'; echo 'changefreq: '. $change_freq . '<br>'; echo 'priority: '. $priority . '<br>'; echo '<br>---<br>'; } } else { //Scrape Webpage Data as current page is an hmtl page for visitors and no Xml SiteMap page for Crawlers. //scrape_page_data(); //Scrape Page Title & Meta Tags. } GLOBAL $sitemaps; GLOBAL $sitemaps_last_mods; GLOBAL $sitemaps_change_freqs; GLOBAL $sitemaps_priorities; GLOBAL $html_page_urls; GLOBAL $html_page_last_mods; GLOBAL $html_page_change_freqs; GLOBAL $html_page_priorities; echo 'SiteMaps Crawled: ---'; echo '<br><br>'; if(array_count_values($sitemaps)>0) { print_r($sitemaps); echo '<br>'; } elseif(array_count_values($sitemaps_last_mods)>0) { print_r($sitemaps_last_mods); echo '<br>'; } elseif(array_count_values($sitemaps_change_freqs)>0) { print_r($sitemaps_change_freqs); echo '<br>'; } elseif(array_count_values($sitemaps_priorities)>0) { print_r($sitemaps_priorities); echo '<br><br>'; } echo 'Html Pages Crawled: ---'; echo '<br><br>'; if(array_count_values($html_page_urls)>0) { print_r($html_page_urls); echo '<br>'; } if(array_count_values($html_page_last_mods)>0) { print_r($html_page_last_mods); echo '<br>'; } if(array_count_values($html_page_change_freqs)>0) { print_r($html_page_change_freqs); echo '<br>'; } if(array_count_values($html_page_priorities)>0) { print_r($html_page_priorities); echo '<br>'; } } foreach($sitemaps AS $sitemap) { extract_links(); } foreach($html_page_urls AS $html_page_url) { extract_links(); } scrape_page_data(); //Scrape Page Title & Meta Tags. function scrape_page_data() { GLOBAL $html_page_urls; if(array_count_values($html_page_urls)>0) { foreach($html_page_urls AS $url) { //Extract Page's Meta Data & Title. 
file_get_contents($url); // https://www.php.net/manual/en/function.file-get-contents $html = file_get_contents($url); //https://www.php.net/manual/en/domdocument.construct.php $doc = new DOMDocument(); // https://www.php.net/manual/en/function.libxml-use-internal-errors.php libxml_use_internal_errors(true); // https://www.php.net/manual/en/domdocument.loadhtml.php $doc->loadHTML($html, LIBXML_COMPACT|LIBXML_NOERROR|LIBXML_NOWARNING); // https://www.php.net/manual/en/function.libxml-clear-errors.php libxml_clear_errors(); // https://www.php.net/manual/en/domdocument.getelementsbytagname.php $meta_tags = $doc->getElementsByTagName('meta'); // https://www.php.net/manual/en/domnodelist.item.php if ($meta_tags->length > 0) { // https://www.php.net/manual/en/class.domnodelist.php foreach ($meta_tags as $tag) { // https://www.php.net/manual/en/domnodelist.item.php echo 'Name: ' .$name = $tag->getAttribute('name'); echo '<br>'; echo 'Content: ' .$content = $tag->getAttribute('content'); echo '<br>'; } } //EXAMPLE 1: Extract Title $title_tag = $doc->getElementsByTagName('title'); if ($title_tag->length>0) { echo 'Title: ' .$title = $title_tag[0]->textContent; echo '<br>'; } //EXAMPLE 2: Extract Title $title_tag = $doc->getElementsByTagName('title'); for ($i = 0; $i < $title_tag->length; $i++) { echo $title_tag->item($i)->nodeValue . "\n"; } } } } ?> Want to see how much you are able to cut it short. Thanks
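As a rough illustration of how much shorter the scraping half can become, here is a condensed sketch that fetches one page and returns its title and meta tags using the same DOMDocument calls as above:

````
<?php
// Sketch: fetch one page and return its title plus meta name/content pairs.
function scrape_page(string $url): array
{
    $html = file_get_contents($url);
    if ($html === false || $html === '') {
        return ['title' => '', 'meta' => []];        // fetch failed
    }

    libxml_use_internal_errors(true);                // silence warnings from messy HTML
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();

    $titles = $doc->getElementsByTagName('title');
    $page = [
        'title' => $titles->length ? trim($titles->item(0)->textContent) : '',
        'meta'  => [],
    ];

    foreach ($doc->getElementsByTagName('meta') as $tag) {
        $page['meta'][$tag->getAttribute('name')] = $tag->getAttribute('content');
    }

    return $page;
}

print_r(scrape_page('https://www.php.net/'));
````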
  8. Thanks. I know about the scopes. Just overlooked the fact that I was calling the $var from a function. Error is gone now. Nevertheless, why I get no urls crawled or extracted ? Get no errors, either. Strange! I just get this echoed: SiteMaps Crawled: --- Array ( ) Html Pages Crawled: --- Array ( ) Array ( ) Array ( ) Array ( ) As you can see, the starting point link does have urls on it's pages: <?php ini_set('display_errors',1); ini_set('display_startup_errors',1); error_reporting(E_ALL); //Preparing Crawler & Session: Initialising Variables. //Preparing $ARRAYS For Step 1: To Deal with Xml Links meant for Crawlers only. //Data Scraped from SiteMaps or Xml Files. $sitemaps = []; //This will list extracted further Xml SiteMap links (.xml) found on Sitemaps (.xml). $sitemaps_last_mods = []; //This will list dates of SiteMap pages last modified - found on Sitemap. $sitemaps_change_freqs = []; //his will list SiteMap dates of html pages frequencies of page updates - found on Sitemaps. $sitemaps_priorities = []; //This will list SiteMap pages priorities - found on Sitemaps. //Data Scraped from SiteMaps or Xml Files. $html_page_urls = array(); //This will list extracted html links Urls (.html, .htm, .php) - found on Sitemaps (.xml). $html_page_last_mods = []; //This will list dates of html pages last modified - found on Sitemap. $html_page_change_freqs = []; //his will list dates of html pages frequencies of page updates - found on Sitemaps. $html_page_priorities = []; //This will list html pages priorities - found on Sitemaps. //Preparing $ARRAYS For Step 2: To Deal with html pages meant for Human Visitors only. //Data Scraped from Html Files. Not Xml SiteMap Files. $html_page_titles = []; //This will list crawled pages Titles - found on html pages. $html_page_meta_names = []; //This will list crawled pages Meta Tag Names - found on html pages. $html_page_meta_descriptions = []; //This will list crawled pages Meta Tag Descriptions - found on html pages. // ----- //Step 1: Initiate Session - Feed Xml SiteMap Url. Crawing Starting Point. //Crawl Session Starting Page/Initial Xml Sitemap. $sitemap = "https://www.rocktherankings.com/sitemap_index.xml"; //Has more xml files. $xml = file_get_contents($sitemap); //Should I stick to this line or below line ? // parse the sitemap content to object //$xml = simplexml_load_string($sitemap); //Should I stick to this line or above line ? $dom = new DOMDocument(); $dom->loadXML($xml); //Trigger following IF/ELSEs on each Crawled Page to check for link types. Whether Links lead to more SiteMaps (.xml) or webpages (.html, .htm, .php, etc.). if ($dom->nodeName === 'sitemapindex') //Current Xml SiteMap Page lists more Xml SiteMaps. Lists links to Xml links. Not lists links to html links. { //parse the index // retrieve properties from the sitemap object foreach ($xml->urlset as $urlElement) //Extracts html file urls. { // get properties $sitemaps[] = $sitemap_url = $urlElement->loc; $sitemaps_last_mods[] = $last_mod = $urlElement->lastmod; $sitemaps_change_freqs[] = $change_freq = $urlElement->changefreq; $sitemaps_priorities[] = $priority = $urlElement->priority; // print out the properties echo 'url: '. $sitemap_url . '<br>'; echo 'lastmod: '. $last_mod . '<br>'; echo 'changefreq: '. $change_freq . '<br>'; echo 'priority: '. $priority . '<br>'; echo '<br>---<br>'; } } else if ($dom->nodeName === 'urlset') //Current Xml SiteMap Page lists no more Xml SiteMap links. Lists only html links. 
{ //parse url set // retrieve properties from the sitemap object foreach ($xml->sitemapindex as $urlElement) //Extracts Sitemap Urls. { // get properties $html_page_urls[] = $html_page_url = $urlElement->loc; $html_page_last_mods[] = $last_mod = $urlElement->lastmod; $html_page_change_freqs[] = $change_freq = $urlElement->changefreq; $html_page_priorities[] = $priority = $urlElement->priority; // print out the properties echo 'url: '. $html_page_url . '<br>'; echo 'lastmod: '. $last_mod . '<br>'; echo 'changefreq: '. $change_freq . '<br>'; echo 'priority: '. $priority . '<br>'; echo '<br>---<br>'; } } else { //Scrape Webpage Data as current page is an hmtl page for visitors and no Xml SiteMap page for Crawlers. //scrape_page_data(); //Scrape Page Title & Meta Tags. } echo 'SiteMaps Crawled: ---';echo '<br><br>'; if(array_count_values($html_page_urls)>0) { print_r($sitemaps); echo '<br>'; } elseif(array_count_values($sitemaps_last_mods)>0) { print_r($sitemaps_last_mods); echo '<br>'; } elseif(array_count_values($sitemaps_change_freqs)>0) { print_r($sitemaps_change_freqs); echo '<br>'; } elseif(array_count_values($sitemaps_priorities)>0) { print_r($sitemaps_priorities); echo '<br><br>'; } echo 'Html Pages Crawled: ---'; echo '<br><br>'; if(array_count_values($html_page_urls)>0) { print_r($html_page_urls); echo '<br>'; } if(array_count_values($html_page_last_mods)>0) { print_r($html_page_last_mods); echo '<br>'; } if(array_count_values($html_page_change_freqs)>0) { print_r($html_page_change_freqs); echo '<br>'; } if(array_count_values($html_page_priorities)>0) { print_r($html_page_priorities); echo '<br>'; } scrape_page_data(); //Scrape Page Title & Meta Tags. function scrape_page_data() { GLOBAL $html_page_urls; if(array_count_values($html_page_urls)>0) { foreach($html_page_urls AS $url) { //Extract Page's Meta Data & Title. file_get_contents($url); // https://www.php.net/manual/en/function.file-get-contents $html = file_get_contents($url); //https://www.php.net/manual/en/domdocument.construct.php $doc = new DOMDocument(); // https://www.php.net/manual/en/function.libxml-use-internal-errors.php libxml_use_internal_errors(true); // https://www.php.net/manual/en/domdocument.loadhtml.php $doc->loadHTML($html, LIBXML_COMPACT|LIBXML_NOERROR|LIBXML_NOWARNING); // https://www.php.net/manual/en/function.libxml-clear-errors.php libxml_clear_errors(); // https://www.php.net/manual/en/domdocument.getelementsbytagname.php $meta_tags = $doc->getElementsByTagName('meta'); // https://www.php.net/manual/en/domnodelist.item.php if ($meta_tags->length > 0) { // https://www.php.net/manual/en/class.domnodelist.php foreach ($meta_tags as $tag) { // https://www.php.net/manual/en/domnodelist.item.php echo 'Name: ' .$name = $tag->getAttribute('name'); echo '<br>'; echo 'Content: ' .$content = $tag->getAttribute('content'); echo '<br>'; } } //EXAMPLE 1: Extract Title $title_tag = $doc->getElementsByTagName('title'); if ($title_tag->length>0) { echo 'Title: ' .$title = $title_tag[0]->textContent; echo '<br>'; } //EXAMPLE 2: Extract Title $title_tag = $doc->getElementsByTagName('title'); for ($i = 0; $i < $title_tag->length; $i++) { echo $title_tag->item($i)->nodeValue . "\n"; } } } } ?> That is my latest update. What do you think about it and why you think I getting echoed no links ?
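One concrete reason neither branch ever runs: $dom->nodeName on the document object is '#document', so the comparisons with 'sitemapindex' and 'urlset' never match. If DOMDocument is kept, a minimal sketch of the check and the extraction (collecting only the <loc> values) could be:

````
<?php
// Sketch: detect the sitemap type from the root element and read the <loc> values with DOM.
$xml = file_get_contents('https://www.rocktherankings.com/sitemap_index.xml');
if ($xml === false) {
    exit('Could not fetch the sitemap.');
}

$dom = new DOMDocument();
$dom->loadXML($xml);

$root = $dom->documentElement->nodeName;       // 'sitemapindex' or 'urlset', unlike $dom->nodeName

$sitemaps = [];
$html_page_urls = [];

foreach ($dom->getElementsByTagName('loc') as $loc) {
    if ($root === 'sitemapindex') {
        $sitemaps[] = $loc->textContent;       // links to further .xml sitemaps
    } elseif ($root === 'urlset') {
        $html_page_urls[] = $loc->textContent; // links to ordinary pages
    }
}

print_r($sitemaps);
print_r($html_page_urls);
````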
  9. @ignace I would appreciate you chiming in too. Talking about what I asked Kicken above.
  10. @kicken Checking what I asked rprofitt above, have you done any api stuffs like this before ? Do you have experience with api stuff ? Have you ever tried the official BitCoin blockchain api ? Tonight, I asked Bing Api this (on bing.com): *"I want to accept Bit Coin payments on my website using the official Bit Coin payment gateway, providing I will get no extra charges apart from the usual transaction fee. Can you write me a php script to accept Bit Coin payments using the official Bit Coin payment gateway ?"* It replied: **"I’m sorry, but I’m not able to write code for you. However, I can help you find resources and tutorials on how to integrate the official Bitcoin payment gateway into your website using PHP. Would you like me to search for that information for you?".** I chose the option: *"Yes, please search for that information."* It replied: **"There are several resources available that can help you integrate the official Bitcoin payment gateway into your website using PHP. One such resource is the GoUrl PHP script which is a Bitcoin API payment gateway that you can install on your site 1. You can also find more information on integrating Bitcoin payment gateway in PHP on websites like Stack Overflow 2 and CodeCanyon 3. https://stackoverflow.com/questions/47009893/bitcoin-payment-gateway-in-php https://codecanyon.net/category/php-scripts?term=bitcoin payment gateway https://github.com/topics/bitcoin-payment Is there anything else you would like to know?"** I asked: *"The 3 links you recommended, do they provide me with a free php script or paid ones as I am only looking for free ones."* It replied: **"According to the information available on their website, GoUrl is a free and open-source Bitcoin payment gateway that charges no monthly fee and has a transaction fee starting from 0% 1. This means that you will only have to pay the usual official Bitcoin blockchain transaction fees when using GoUrl. Is there anything else you would like to know?".** Now, it was giving me readymade questions for me to select one. And so I started doing this .... *"Tell me more about GoUrl PHP script."* **"GoUrl is a free and open-source Bitcoin payment gateway that allows you to accept Bitcoin payments on your website. It supports several cryptocurrencies including Bitcoin, BitcoinCash, DASH, Dogecoin, Litecoin, Reddcoin, Feathercoin, Vertcoin, Potcoin and Speedcoin . The GoUrl PHP script is easy to install and integrate into your website and provides a range of features such as Pay-Per-Download, Pay-Per-Product, Pay-Per-Membership and Pay-Per-Page/Video-Access . Is there anything else you would like to know?"** *"What are the benefits of using GoUrl PHP script?"* "Some benefits of using the GoUrl PHP script include: **"It is free and open-source, meaning that you can use it without any additional costs apart from the usual Bitcoin blockchain transaction fees. It supports several cryptocurrencies including Bitcoin, BitcoinCash, DASH, Dogecoin, Litecoin, Reddcoin, Feathercoin, Vertcoin, Potcoin and Speedcoin. It is easy to install and integrate into your website. It provides a range of features such as Pay-Per-Download, Pay-Per-Product, Pay-Per-Membership and Pay-Per-Page/Video-Access. It allows you to accept payments directly to your Bitcoin or altcoin wallets without the need for a bank account or ID. 
Is there anything else you would like to know?"** *"How do I install GoUrl PHP script on my site?"* **"To install the GoUrl PHP script on your website, you can follow these steps: Download the GoUrl PHP script from their website or Github repository 12. Unpack the downloaded archive to a directory on your website. Open the configuration file and enter your details such as your public and private keys. Place the GoUrl PHP script on your website where you want to accept payments. You can find more detailed instructions on their website 1. Is there anything else you would like to know?" ** *"How does GoUrl compare to other payment gateways?"* "https://www.bing.com/aclick?ld=e8gMm_-8epiQQS_OafckDJEzVUCUwlUVJsUXbKUUrDChMtzio25bo59cmElvMdcv0ZfOQsAsZQh66LAFFX_V9s8CpchxHaAWyNRyCSAWmwaLvZilhAQkfaiRw9puzVSopWG565InZ-bPakvSV4hkn37BiHhEgAnknxVn3cLkQhWwQCsVLv&u=aHR0cHMlM2ElMmYlMmZ3d3cudG9wMTAuY29tJTJmbWVyY2hhbnQtc2VydmljZXMlMmZwYXltZW50LWdhdGV3YXktY29tcGFyaXNvbiUzZmJrdyUzZHBheW1lbnQlMjUyMGdhdGV3YXlzJTI2YmNhbXBpZCUzZDUzMDU0MDM1JTI2YmNhbXAlM2RDQ1AlMjUyME1TJTI1MjBEVCUyNTIwRSUyNTJCUCUyNTIwVENQQSUyNTIwLSUyNTIwQmluZyUyNmJhZ2lkJTNkMzE4OTMwMzA2MyUyNmJhZyUzZFBheW1lbnQlMjUyMEdhdGV3YXklMjZidGFyaWQlM2Rrd2QtMTU3OTg0OTQ1MTclMjZiaWRtJTNkYnAlMjZibmV0JTNkbyUyNmJkJTNkYyUyNmJtb2J2YWwlM2QwJTI2YnQlM2RzZWFyY2glMjZ1dG1fc291cmNlJTNkYmluZyUyNnV0bV9tZWRpdW0lM2RjcGMlMjZ1dG1fdGVybSUzZEdvVXJsJTI1MjB2cyUyNTIwb3RoZXIlMjUyMHBheW1lbnQlMjUyMGdhdGV3YXlzJTI2dXRtX2NhbXBhaWduJTNkQmluZyUyYkNQQyUyYkNhbXBhaWduJTI2YyUzZDczNzM2MjM1Njc2ODQ1JTI2bSUzZHAlMjZrJTNkMTU3OTg0OTQ1MTclMjZiaW50ZXJlc3QlM2QlMjZicGh5c2ljYWwlM2QxNDE5ODYlMjZiZmVlZGlkJTNkJTI2YSUzZEIyNjElMjZ0cyUzZCUyNnRvcGljJTNkJTI2dXBmJTNkJTI2dGVzdCUzZCUyNmNsaWNrdHlwZSUzZCUyNm1zY2xraWQlM2QxY2E5MDhlY2EzOTAxOTg5YWNlYjZjYjQ4ODJkMjNkNw&rlid=1ca908eca3901989aceb6cb4882d23d7"** *"What other free payment gateways can I use apart from GoUrl?"* **"There are several other free payment gateways that you can use apart from GoUrl. Some of the top free payment gateways include PaySimple, Dwolla, and PayLane 1. These payment gateways offer a free trial version with some limitations on time or features. You can compare their features and pricing to see which one best fits your needs. Is there anything else you would like to know?"** Sunrise. So ending the chat with bing.com. I hope you will be able to concisely teach me how to adjust the api to my tailor.
  11. Folks, what on earth is this! I signed up to ChatGPT using Hotmail and got an error. Then I did the same with Gmail and got the same nonsense. Have you experienced the following? "Oops! The email you provided is not supported. Please contact us through our help center if this issue persists."
  12. @kicken Just mentioning these in case they become handy for you. I am reading these now: https://beebom.com/how-use-chatgpt-4-free/ https://beebom.com/chatgpt-plugins-announced/ https://beebom.com/google-bard-ai-vs-microsoft-bing-chat/ Really getting into this AI stuff now.
  13. @kicken Just signed up. Now, I get this scammy mssg: https://bitcointalk.org/index.php?action=credit;unproxyban "You are unable to post. How to enable posting. Bitcoin Forum > Remove Proxyban Your IP address has previously been used for evil on this forum, or it is a known proxy/VPN/Tor exit node, so you are required to pay a small fee before you are able to post messages or send PMs. You can still use all of the read-only features without paying. Your account contains 186.84 units of evil. To atone, you must pay a total of 0.00032623 bitcoins (0.32623 mBTC; 32623 satoshi). Pay to the address 3HtRc4GscEuCFbNr15EXvUKy6ttJ4LvVyA. Once you have paid the full amount, wait a few seconds and then reload this page. If the fee is so small that your wallet is unable to send it, you can send any larger amount, though you will not be refunded the difference. Alternatively, the people listed here can whitelist you for free. Paying the fee is probably easier/quicker, though. People commonly say, "I haven't posted anything before, so why on Earth am I banned?" You are not banned. Your IP address was previously used for evil, so everyone who uses that IP address has to pay this registration fee." I never visited this forum before. And so no question of my ip was upto no good in past. Sounding like a big scam. Did you experience this nonsense, ever ?
  14. @ignace Mmm. I just went through your post (posted Friday at 01:07 PM). Now I have more questions - about ten. But I think I would have hardly any questions if your code were in procedural style. I do not understand the OOP lines, and we are going to go round in a loop where I ask you question after question and bore you to death. And I will not be able to memorise any of the OOP lines, since I do not completely understand them. You want to get some security basics across to me, and so you showed some examples. Now, because I do not understand OOP, it results in me having question after question. To cut things short, it would be best if you turned your code into procedural style, added your blocks to my original code, and gave that to me. I think with one or two looks I will understand your security lines without needing to ask tonnes more questions.
  15. @ignace I not in the habit of noticing things. Just noticed you are a MOD. So, do excuse me addressing you as "good man" as if I was your senior. Nearly addressed "Good Boy!" back then but changed my mind thinking you might think I am being sarcastic. Anyways, you have got a lot of explaining to do. I'm afraid! I can't keep your method waiting to be implemented any longer. (I remember now, few yrs back Mc_Gyver used to complain why I never fixed my code based on his feed-back. Frankly, I tried my best. But it seems there were few things I did not understand, (I thought I did), and so missed out a few things to update and he was all over my case now and then getting cross, thinking I deliberately ignoring his advice. Don't want you thinking the same). Now, do I ? So, here goes. Let's get rolling. Ready to update my code based on your samples. But, I ain't no copy-cat who asks people for code samples just to copy-paste without learning anything atall. And so, I am going to ask you a lot of questions about your code which will sound weird and childish and frustrating. But, can't be helped. Remember, I do not know the basics of programming. Well, maybe I do a little.But not in the REAL sense. Only memorized the basic php lang syntax and some functions and copied codes from tutorials and forums and memorized them and then cooked-up the codes you see in this thread. (And no, I do not memorize things I do not understand). Cooked them up with a little help here and there, ofcourse. And after a few test, experiments, fiddling and procrastinating about. Can't be pretending, I understood all your code lines and drift into procrastinating again, thinking I understood every line of code, even though I probably managed to understand the spirit of your code. Have to understand every line. Else, a waste of your time trying to teach me something that is going through one ear and will go out the other DOUBLE QUICK (as USUAL). Let's do a combing operation on your lines of code. Bear in mind, I do not know the basics of oop. Do not understand the difference between the procedural style & oop programming. Only understand, whether right or wrong, that the former goes in a straight flow, like a water-fall. While with latter, you can jump here and there like a frig. Up, down, up. All over the script flow. And oop re-uses the same lines of codes, through a lot of referencing. That's what I understood the benefits of oop. But still do not understand what oop really is. All that object and class stuff confuses me. Using the class, I do not understand why it cannot be used in procedural style and how using the class turns a code from procedural style to oop. I just do not get it. I guess, I am not made for adv programming. Just basic layman degree. You might be able to change that. For good.This time, where others have failed. NEW QUESTIONS ON YOUR SECURED CODE 1. >>Also don't do this: SELECT id,domain,word,phrase from $tbl WHERE $col_1 = ? Because now it's easy for a hacker to do something like: ?col_1=1;DELETE FROM users WHERE id&input_1=1 << I have read many tutorials on how sql is injected. They always say, hacker could input this and that. Just how will he input it ? Via the html form (POST METHOD) or the url (GET METHOD) ? So, limiting what values could be inputted will not foil the injection attempt ? Like the value has to be from a white-list (array) ? ALthough, my current scripts you see above do not do the white-listing using arrays, I have done in past. 
(Also, let's limit from where SQL commands come come to my db. Only from my domain. That way, hacker cannot build his own fake form on his localhost or webserver and gateway it to my db. How to write code to do that ? Good idea ? I know it won't solve all problems but atleast some *rs*hole won't get the satisfaction to build his own html form on his Xampp and connect it to my gateway and LAZILY start sending SQL injections. Got to keep the **ck*rs on their feet. On their tip toes where if they try too much they will tip over the fringe down mount everest in an ice pool. Ready to be frozen). On some other form, we will try cooking up TRAPS for hackers. Give them a hard-time. Screw up their PCs, if possible. Teach them a LESSON not to mess with us ever AGAIN. 2. >>Also don't do this: SELECT id,domain,word,phrase from $tbl WHERE $col_1 = ? Because now it's easy for a hacker to do something like: ?col_1=1;DELETE FROM users WHERE id&input_1=1 Instead do something like: /** * @param array{col_1: string} $fields * @param string $operator Possible values AND, OR * @param int $limit Possible value between 0 and 100, if invalid defaults to 100 * @param int $offset * *@return array{items: array, total_items: int} */ function page_search(array $fields, string $operator, int $limit, int $offset = 0): array { $where = $values = []; $operator = in_array($operator, ['OR', 'AND']) ? $operator : 'AND'; $limit = 0 < $limit && $limit < 100 ? $limit : 100; $offset = 0 <= $offset ? $offset : 0; >>By dividing your program into little pieces you reduce the cognitive load necessary to work on pieces of your application or to find and fix bugs. << So, I should not use $vars but hard code CONSTANTS instead ? That is always safe ? OK. Explain this comment line. What is going on here ? >>* @param array{col_1: string} $fields<< 3. >> * @param string $operator Possible values AND, OR << Your comment is telling me, the OPERATOR data type is STRING only. You limited to that ? Yes ? If so, let me see & learn, just how you did the restricting! This is GOLD NUGGET. I was trying to thinkup my own function or method to do these restrictions. Best not re-invent the wheel if by default php has a method. Or, you created a custom function. >>$operator = in_array($operator, ['OR', 'AND']) ? $operator : 'AND';<< I see you placed no restrictions here ? Can it not be done here ? I see you placed restriction there on the following line, instead. But cannot the restriction be placed in the above line ? >> function page_search(array $fields, string $operator, int $limit, int $offset = 0): array { $where = $values = [];<< I mean, something like these ... $operator = in_array(string $operator, ['OR', 'AND']) ? string $operator : 'AND'; $operator = in_array((string) $operator, ['OR', 'AND']) ? (string) $operator : 'AND'; ..Similar to the Type Casting. You will tell me why don't I test it and see if it works ? My answer: It might fail if I slipped in one place but you may no how not to slip. 4. >>The same goes for the database. I see you use mysqli_connect in the script with the values hardcoded. 
Instead you should create a file that holds your configuration values: // bootstrap.php define('DB_HOST', 'localhost'); define('DB_USER', 'the_username'); define('DB_PASS', 'the_password'); define('DB_NAME', 'the_database'); define('DB_PORT', 3306); define('DB_CHARSET', 'utf8mb4'); define('EMERGENCY_EMAIL', 'some@email.com'); // in case of fatal errors // other configuration values If no hard coding the SQL leaves room for sql injection then how come hard coding the DB connection is bad ? Did you not mean earlier to hard code CONSTANTS than use $vars as that leaves room for data injection ? Now, I am scratching my head. When should I hard code CONSTANT as values and when $vars as values for security purpose ? How do you figure when to do what out of the two ? What exactly are you trying to generate on this block here ? And, why specifically this way ? foreach ($fields as $field => $term) { switch ($field) { case 'col_1': $where[] = 'col_1 = ?'; $values[] = $term; break; // other fields you want to allow to search on } What are you doing in this block ? $result = [ 'items' => [], 'total_items' => 0, ]; Explain separately, what you are doing in this block: if ([] === $where) { return $result; } Explain this block separately: $result['items'] = db_fetch_all('SELECT * FROM some_table WHERE ' . implode($operator, $where) . " LIMIT $offset, $limit", $values); if (count($result['items'])) { // only execute a count() query if we have a result $result['total_items'] = db_fetch_column('SELECT count(*) FROM some_table WHERE ' . implode($operator, $where), $values); } return $result; } Result ? Execute only if there is a result ? What result ? What result are you expecting ? What will happen if there is no result, though ? Now,we are really gonna get somewhere. Making a good pace. Lol!
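On the question of how the injection arrives: through any request input, whether a POST form field, a GET parameter in the URL, a cookie or a header. To make the whitelisting idea concrete in procedural style, here is a minimal sketch with mysqli in which the column name can only come from a fixed list and the search value travels as a bound parameter. The table and column names are placeholders, and mysqli_stmt_get_result() assumes the mysqlnd driver:

````
<?php
// Sketch: the column name is whitelisted, the search value is a bound parameter.
function page_search(mysqli $db, string $column, string $term): array
{
    $allowed = ['domain', 'word', 'phrase'];          // the only searchable columns
    if (!in_array($column, $allowed, true)) {
        return [];                                    // unknown column: refuse, run no query
    }

    // $column is safe to interpolate now because it can only be one of the literals above;
    // $term never becomes part of the SQL text, it is sent separately as a bound value.
    $stmt = mysqli_prepare($db, "SELECT id, domain, word, phrase FROM pages WHERE $column = ? LIMIT 100");
    mysqli_stmt_bind_param($stmt, 's', $term);
    mysqli_stmt_execute($stmt);

    return mysqli_fetch_all(mysqli_stmt_get_result($stmt), MYSQLI_ASSOC);
}
````

It does not matter whether an attacker sends the value through your form, the query string or a hand-built request from his own machine: because the column is whitelisted and the value is bound, the input is treated as data, not as SQL.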
  16. @mc_gyver Oops! Sorry! Ignore my last post. As after refreshing this page, I see Ignace has answered on your behalf!
  17. @kicken You suggested I go to a cryptos forum to ask questions on it. I reckon best I do same for ChatGpt type AI. You aware of any forums where I will get good response and won't have top wait days or even months for an answer ? Anyway, trying an AI now for the first time. I want to ask it to build me an api in php so say when you pay my searchengine in BitCoin, then I want your account with my searchngine to add credits (balance) to your account. I myself got noe xperiences with api programming and so best I get some php ai code it for me. But, no matter how good these ai are, they won't be able to understand every word I write like I doing here to you. So, to be brief, precise & concise, how should I rephrase my question in 2 sentences to the ai to write me the php api of bitcoin ? I asking you because you got experience with ChatGpt. I always find the link dead. Even when you gave it to me the other day. So, got no experience with AI like ChatGpt. I am over here right now: https://codepal.ai/code-generator/php Scratching my head how to phrase my question to it. EDIT: I just input this in the ai: "Write a function in php that writes bitcoin payment gateway api to confirm payment to my website". It generated this code: /** * Confirms a Bitcoin payment using the Coinbase Commerce API * * @param string $paymentId The payment ID returned by Coinbase Commerce * @param string $apiKey The API key for Coinbase Commerce * @return bool Returns true if the payment is confirmed, false otherwise */ function confirmBitcoinPayment($paymentId, $apiKey) { // Set the API endpoint and headers $url = "https://api.commerce.coinbase.com/charges/$paymentId"; $headers = array( "Content-Type: application/json", "X-CC-Api-Key: $apiKey" ); // Send a GET request to the API endpoint $ch = curl_init(); curl_setopt($ch, CURLOPT_URL, $url); curl_setopt($ch, CURLOPT_HTTPHEADER, $headers); curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); $response = curl_exec($ch); curl_close($ch); // Parse the response JSON $json = json_decode($response, true); // Check if the payment is confirmed if ($json["data"]["status"] == "CONFIRMED") { return true; } else { return false; } } I can see CoinBase mentioned. I do not want to pay their transaction fees. Only BlockChain fee. So, this code will be no good. Anyway, asked the same question again but this time asked it to use official bitcoin gateway. It gave me this code: /** * This function confirms a payment made to a website using the official Bitcoin payment gateway API. 
* * @param string $transaction_id The unique transaction ID generated by the Bitcoin payment gateway * @param float $amount The amount of Bitcoin paid by the customer * @param string $customer_address The Bitcoin address of the customer who made the payment * @param string $website_address The Bitcoin address of the website receiving the payment * @param string $api_key The API key provided by the Bitcoin payment gateway * * @return bool Returns true if the payment is confirmed, false otherwise */ function confirmBitcoinPayment($transaction_id, $amount, $customer_address, $website_address, $api_key) { // Initialize cURL $ch = curl_init(); // Set the cURL options curl_setopt($ch, CURLOPT_URL, "https://api.bitcoinpaymentgateway.io/v1/confirm_payment"); curl_setopt($ch, CURLOPT_POST, 1); curl_setopt($ch, CURLOPT_POSTFIELDS, http_build_query([ 'transaction_id' => $transaction_id, 'amount' => $amount, 'customer_address' => $customer_address, 'website_address' => $website_address, 'api_key' => $api_key ])); curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // Execute the cURL request $response = curl_exec($ch); // Check for errors if (curl_errno($ch)) { error_log("Error confirming Bitcoin payment: " . curl_error($ch)); curl_close($ch); return false; } // Close the cURL connection curl_close($ch); // Parse the response $response = json_decode($response, true); // Check if the payment is confirmed if ($response['status'] == 'confirmed') { return true; } else { return false; } } What you think. Is this code ok ? If it is, then imagine you have an account over at my searchengine with the username "kicken", Now, which part of the code I change to add the "kicken" account ? Using my basic LOGIC, I think this code won;t be enough but to add a few more lines for "kicken" account to read the gateway's confirm mssg. How do I write that ? That is the big question to which I got no experience how to proceed. Imagine, when everytime you log into your account, my searchengine or server puts this as your account id: $_SESSION['user_id'] = 'kicken'. And shows your credit balance as $_SESSION['credits']. Now got to add amounts to that variable. An amount that got confirmed by the payment gateway. if I can learn this basic stuff on one site, with your help, then I should be able to deal with other similar websites API. Like google, youtube, chatgpt, etc. Yes ?
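On the "which part do I change for the kicken account" question, a common pattern is to leave the confirmation function alone and have the calling code credit the logged-in user only when it returns true. A rough sketch, assuming a users table with username and credits columns, placeholder values throughout, and the confirmBitcoinPayment() function from the snippet above saved in a separate file:

````
<?php
// Sketch: credit the logged-in user's account once the gateway call reports success.
session_start();
require 'confirm_bitcoin_payment.php';                // hypothetical file holding the function above

$db = mysqli_connect('localhost', 'db_user', 'db_pass', 'searchengine'); // placeholder credentials

function add_credits(mysqli $db, string $username, float $amount): void
{
    $stmt = mysqli_prepare($db, 'UPDATE users SET credits = credits + ? WHERE username = ?');
    mysqli_stmt_bind_param($stmt, 'ds', $amount, $username);
    mysqli_stmt_execute($stmt);
}

// Placeholder payment details; in a real script these come from your order record.
$transaction_id   = 'TX123';
$amount           = 0.001;
$customer_address = 'CUSTOMER_BTC_ADDRESS';
$website_address  = 'YOUR_RECEIVING_BTC_ADDRESS';
$api_key          = 'YOUR_API_KEY';

if (confirmBitcoinPayment($transaction_id, $amount, $customer_address, $website_address, $api_key)) {
    add_credits($db, $_SESSION['user_id'], $amount);              // e.g. the 'kicken' account
    $_SESSION['credits'] = ($_SESSION['credits'] ?? 0) + $amount; // keep the session copy in step
}
````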
  18. @kicken Ok. Thanks for being blunt at the end. But, it never crossed my mind that cryptos forums exist. You know of any good ones ?
  19. Oh bother! I thought I finished the crawler but I get this error that ````$html_page_urls```` is not defined! If you look at the first few lines of the whole script which I will give at the bottom of this post, then you will notice this: //Preparing Crawler & Session: Initialising Variables. //Preparing $ARRAYS For Step 1: To Deal with Xml Links meant for Crawlers only. //Data Scraped from SiteMaps or Xml Files. $sitemaps = []; //This will list extracted further Xml SiteMap links (.xml) found on Sitemaps (.xml). $sitemaps_last_mods = []; //This will list dates of SiteMap pages last modified - found on Sitemap. $sitemaps_change_freqs = []; //his will list SiteMap dates of html pages frequencies of page updates - found on Sitemaps. $sitemaps_priorities = []; //This will list SiteMap pages priorities - found on Sitemaps. //Data Scraped from SiteMaps or Xml Files. $html_page_urls = []; //This will list extracted html links Urls (.html, .htm, .php) - found on Sitemaps (.xml). Check the final line above! Then you see this has been defined. Look again: $html_page_urls = []; //Same as: $html_page_urls = array(); And so, I do not understand why I get error that this is not defined. I get error on this line: function scrape_page_data() { if(array_count_values($html_page_urls)>0) CONTEXT <?php //Preparing Crawler & Session: Initialising Variables. //Preparing $ARRAYS For Step 1: To Deal with Xml Links meant for Crawlers only. //Data Scraped from SiteMaps or Xml Files. $sitemaps = []; //This will list extracted further Xml SiteMap links (.xml) found on Sitemaps (.xml). $sitemaps_last_mods = []; //This will list dates of SiteMap pages last modified - found on Sitemap. $sitemaps_change_freqs = []; //his will list SiteMap dates of html pages frequencies of page updates - found on Sitemaps. $sitemaps_priorities = []; //This will list SiteMap pages priorities - found on Sitemaps. //Data Scraped from SiteMaps or Xml Files. $html_page_urls = []; //This will list extracted html links Urls (.html, .htm, .php) - found on Sitemaps (.xml). $html_page_last_mods = []; //This will list dates of html pages last modified - found on Sitemap. $html_page_change_freqs = []; //his will list dates of html pages frequencies of page updates - found on Sitemaps. $html_page_priorities = []; //This will list html pages priorities - found on Sitemaps. //Preparing $ARRAYS For Step 2: To Deal with html pages meant for Human Visitors only. //Data Scraped from Html Files. Not Xml SiteMap Files. $html_page_titles = []; //This will list crawled pages Titles - found on html pages. $html_page_meta_names = []; //This will list crawled pages Meta Tag Names - found on html pages. $html_page_meta_descriptions = []; //This will list crawled pages Meta Tag Descriptions - found on html pages. // ----- //Step 1: Initiate Session - Feed Xml SiteMap Url. Crawing Starting Point. //Crawl Session Starting Page/Initial Xml Sitemap. $sitemap = "https://www.rocktherankings.com/sitemap_index.xml"; //Has more xml files. $xml = file_get_contents($sitemap); //Should I stick to this line or below line ? // parse the sitemap content to object //$xml = simplexml_load_string($sitemap); //Should I stick to this line or above line ? $dom = new DOMDocument(); $dom->loadXML($xml); //Trigger following IF/ELSEs on each Crawled Page to check for link types. Whether Links lead to more SiteMaps (.xml) or webpages (.html, .htm, .php, etc.). if ($dom->nodeName === 'sitemapindex') //Current Xml SiteMap Page lists more Xml SiteMaps. Lists links to Xml links. 
Not lists links to html links. { //parse the index // retrieve properties from the sitemap object foreach ($xml->urlset as $urlElement) //Extracts html file urls. { // get properties $sitemaps[] = $sitemap_url = $urlElement->loc; $sitemaps_last_mods[] = $last_mod = $urlElement->lastmod; $sitemaps_change_freqs[] = $change_freq = $urlElement->changefreq; $sitemaps_priorities[] = $priority = $urlElement->priority; // print out the properties echo 'url: '. $sitemap_url . '<br>'; echo 'lastmod: '. $last_mod . '<br>'; echo 'changefreq: '. $change_freq . '<br>'; echo 'priority: '. $priority . '<br>'; echo '<br>---<br>'; } } else if ($dom->nodeName === 'urlset') //Current Xml SiteMap Page lists no more Xml SiteMap links. Lists only html links. { //parse url set // retrieve properties from the sitemap object foreach ($xml->sitemapindex as $urlElement) //Extracts Sitemap Urls. { // get properties $html_page_urls[] = $html_page_url = $urlElement->loc; $html_page_last_mods[] = $last_mod = $urlElement->lastmod; $html_page_change_freqs[] = $change_freq = $urlElement->changefreq; $html_page_priorities[] = $priority = $urlElement->priority; // print out the properties echo 'url: '. $html_page_url . '<br>'; echo 'lastmod: '. $last_mod . '<br>'; echo 'changefreq: '. $change_freq . '<br>'; echo 'priority: '. $priority . '<br>'; echo '<br>---<br>'; } } else { //Scrape Webpage Data as current page is an hmtl page for visitors and no Xml SiteMap page for Crawlers. //scrape_page_data(); //Scrape Page Title & Meta Tags. } echo 'SiteMaps Crawled: ---';echo '<br><br>'; if(array_count_values($html_page_urls)>0) { print_r($sitemaps); echo '<br>'; } elseif(array_count_values($sitemaps_last_mods)>0) { print_r($sitemaps_last_mods); echo '<br>'; } elseif(array_count_values($sitemaps_change_freqs)>0) { print_r($sitemaps_change_freqs); echo '<br>'; } elseif(array_count_values($sitemaps_priorities)>0) { print_r($sitemaps_priorities); echo '<br><br>'; } echo 'Html Pages Crawled: ---'; echo '<br><br>'; if(array_count_values($html_page_urls)>0) { print_r($html_page_urls); echo '<br>'; } if(array_count_values($html_page_last_mods)>0) { print_r($html_page_last_mods); echo '<br>'; } if(array_count_values($html_page_change_freqs)>0) { print_r($html_page_change_freqs); echo '<br>'; } if(array_count_values($html_page_priorities)>0) { print_r($html_page_priorities); echo '<br>'; } scrape_page_data(); //Scrape Page Title & Meta Tags. function scrape_page_data() { if(array_count_values($html_page_urls)>0) { foreach($html_page_urls AS $url) { //Extract Page's Meta Data & Title. 
file_get_contents($url); // https://www.php.net/manual/en/function.file-get-contents $html = file_get_contents($url); //https://www.php.net/manual/en/domdocument.construct.php $doc = new DOMDocument(); // https://www.php.net/manual/en/function.libxml-use-internal-errors.php libxml_use_internal_errors(true); // https://www.php.net/manual/en/domdocument.loadhtml.php $doc->loadHTML($html, LIBXML_COMPACT|LIBXML_NOERROR|LIBXML_NOWARNING); // https://www.php.net/manual/en/function.libxml-clear-errors.php libxml_clear_errors(); // https://www.php.net/manual/en/domdocument.getelementsbytagname.php $meta_tags = $doc->getElementsByTagName('meta'); // https://www.php.net/manual/en/domnodelist.item.php if ($meta_tags->length > 0) { // https://www.php.net/manual/en/class.domnodelist.php foreach ($meta_tags as $tag) { // https://www.php.net/manual/en/domnodelist.item.php echo 'Name: ' .$name = $tag->getAttribute('name'); echo '<br>'; echo 'Content: ' .$content = $tag->getAttribute('content'); echo '<br>'; } } //EXAMPLE 1: Extract Title $title_tag = $doc->getElementsByTagName('title'); if ($title_tag->length>0) { echo 'Title: ' .$title = $title_tag[0]->textContent; echo '<br>'; } //EXAMPLE 2: Extract Title $title_tag = $doc->getElementsByTagName('title'); for ($i = 0; $i < $title_tag->length; $i++) { echo $title_tag->item($i)->nodeValue . "\n"; } } } } Folks, Do test the code and see in your localhost what you get! Puzzling! It's 3:06am here and I do not have sleep in my eyes to do a typo herein the $var name!
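The undefined-variable error is the same scope issue: $html_page_urls exists in the global scope, but scrape_page_data() has its own local scope, so inside the function that name is undefined unless it is declared global or passed in. Passing it as a parameter avoids globals entirely; a minimal sketch:

````
<?php
// Sketch: give the function the URL list as a parameter instead of relying on globals.
function scrape_page_data(array $html_page_urls): void
{
    foreach ($html_page_urls as $url) {
        echo 'Scraping: ' . $url . '<br>';
        // ... file_get_contents($url), DOMDocument parsing, etc., exactly as in the post above
    }
}

$html_page_urls = ['https://www.example.com/'];  // in the real script this is filled by the sitemap step
scrape_page_data($html_page_urls);
````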
  20. @requnix Do close this thread, as the issue is solved. You may add your input, if you want.
  21. Kicken, I saw this YouTube video on AI. One AI watches your computer activity, and when you forget where you saw something, you type a few words and it reminds you where you saw it - whether in one of your files or on the web. That one might become handy. Talking about this: "By looking through it at least once though, you'll familiarize yourself with what's available and then when you encounter some problem hopefully you'll be able to say to yourself "I remember seeing this one thing that might be relevant here, let me go look it up again"
  22. Wise geeks, am I correct to assume the following or not? The robots.txt directives are too messy; best not to bother programming a crawler to deal with them. After all, XML sitemaps are built for crawlers, and they only list the links the site wants crawled. In that case, there is no need for a crawler to read the robots.txt file to learn which pages not to index. To cut to the chase, the crawler can index only the links found listed in the sitemaps (see the sketch after this post).
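Whatever is decided about honouring the Disallow rules, robots.txt has one cheap use even for a sitemap-only crawler: it usually advertises where the sitemaps live via Sitemap: lines. A small sketch that pulls those out (error handling kept minimal):

````
<?php
// Sketch: read the Sitemap: lines that many sites publish in robots.txt.
function sitemaps_from_robots(string $domain): array
{
    $robots = @file_get_contents("https://$domain/robots.txt");
    if ($robots === false) {
        return [];
    }

    $sitemaps = [];
    foreach (preg_split('/\R/', $robots) as $line) {
        if (preg_match('/^\s*Sitemap:\s*(\S+)/i', $line, $match)) {
            $sitemaps[] = $match[1];   // absolute URL of a sitemap or sitemap index
        }
    }

    return $sitemaps;
}

print_r(sitemaps_from_robots('www.rocktherankings.com'));
````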
  23. Mmm. How do you reckon these folks get hold of the zone files more than once every day? https://domains-monitor.com/domainzones/ Maybe I should stop procrastinating, stop trying to do everything all by myself, and outsource the job - meaning, buy the lists from that website. What would you do if you were in my position, needing a list of all the domain names? And if you were running your own search engine, what would you do so your index is not empty or very limited in links? After reading my plan above, do you reckon I am on the right track, or do you suspect I am getting derailed somewhere?
  24. In PHP OOP, what does this operator mean: -> ?
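For reference, -> is the object operator: it accesses a property or calls a method on an object instance. A tiny example:

````
<?php
class Page
{
    public $title = 'Home';

    public function describe()
    {
        return 'Title: ' . $this->title;  // -> is also used on $this inside the class
    }
}

$page = new Page();          // create an object (an instance of the class)
echo $page->title;           // -> reads a property of that object
echo '<br>';
echo $page->describe();      // -> calls a method on that object
````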