Jump to content

Search the Community

Showing results for tags 'crawl'.

  • Search By Tags

    Type tags separated by commas.
  • Search By Author

Content Type


Forums

  • Welcome to PHP Freaks
    • Announcements
    • Introductions
  • PHP Coding
    • PHP Coding Help
    • Regex Help
    • Third Party Scripts
    • FAQ/Code Snippet Repository
  • SQL / Database
    • MySQL Help
    • PostgreSQL
    • Microsoft SQL - MSSQL
    • Other RDBMS and SQL dialects
  • Client Side
    • HTML Help
    • CSS Help
    • Javascript Help
    • Other
  • Applications and Frameworks
    • Applications
    • Frameworks
    • Other Libraries
  • Web Server Administration
    • PHP Installation and Configuration
    • Linux
    • Apache HTTP Server
    • Microsoft IIS
    • Other Web Server Software
  • Other
    • Application Design
    • Other Programming Languages
    • Editor Help (PhpStorm, VS Code, etc)
    • Website Critique
    • Beta Test Your Stuff!
  • Freelance, Contracts, Employment, etc.
    • Services Offered
    • Job Offerings
  • General Discussion
    • PHPFreaks.com Website Feedback
    • Miscellaneous

Find results in...

Find results that contain...


Date Created

  • Start

    End


Last Updated

  • Start

    End


Filter by number of...

Joined

  • Start

    End


Group


AIM


MSN


Website URL


ICQ


Yahoo


Jabber


Skype


Location


Interests


Age


Donation Link

Found 3 results

  1. I'm using cURL to crawl and scrape data from a website. This website contains tables with rows of data. When I send a cURL POST for the underlying data at a specific row(A), it will return the expected data. But when I move to the second row(B), the data returns blank or specifically, a tons of spaces (or nbsp's.) When I access the cURL's POST location by browser, I can see (B)'s data. The only difference in the 2 POST's are location ID's for the data. I don't think it's a problem with JavaScript as I can successfully return data from row (A) as I mentioned. Website I'm trying to crawl: https://mycpa.cpa.state.tx.us/up/Search.jsp Working POST URL(A): https://mycpa.cpa.state.tx.us/up/searchresults.do?d-49216-p=&d-49216-s=&how=&last=bales&other=&d-49216-o=&zip=&_chk=74170700611986R2ZZZZ26&which=View+Details Non-working POST URL(B): https://mycpa.cpa.state.tx.us/up/searchresults.do?d-49216-p=&d-49216-s=&how=&last=bales&other=&d-49216-o=&zip=&_chk=74600015611995R1AC081084&which=View+Details Interestingly, you can combine the data location ID's to show more than 1 set of data per page. When trying this method, the first set of data(A) is displayed and the second(B) is shown as spaces (or nbsp.) Combined POST URL: https://mycpa.cpa.state.tx.us/up/searchresults.do?d-49216-p=&d-49216-s=&how=&last=bales&other=&d-49216-o=&zip=&_chk=74170700611986R2ZZZZ26&_chk=74600015611995R1AC081084&which=View+Details
  2. this may be a very basic question. I would like to know whether the data which are displayed only to logged in (php session authenticated) users will be crawlable by search engines? for example: there is a page www.domain-name.com/content-listings/ and this page lists some information for user. Non-registered users will view basic information like name and postal address and these should be SEO friendly and crawlable. Registered users (logged in) will view sensitive information such as email_id and phone number which should not be crawlable by search engines. will this be just achieved with sessions or do I need to use javascript and ajax to make email id and phone number protected from crawling and spammers.
  3. Hello to everybody, I need critiques and "website crawling help" about my website http://enginery.freecluster.eu . My crawling question was that: I tried google search console tools to add my website's sitemap and add it : http://enginery.freecluster.eu/sitemap.xml . It says my sitemap is ok and found 312 pages but not crawl all correctly! Three weeks have passed but nothing changed. I manually request indexing some pages(about 4 pages) and google search console, after than today it only shows some of them(not all 4) when I search using "site:http://enginery.freecluster.eu". My website's all files have php extensions. Did this prevent googlebot to reach the content of my websites' pages? My robots.txt file's content is : User-agent: Googlebot Allow: / User-agent: * Allow: / Sitemap: http://enginery.freecluster.eu/sitemapv1.xml Any critiques and help is appreciated. Thanks.
×
×
  • Create New...

Important Information

We have placed cookies on your device to help make this website better. You can adjust your cookie settings, otherwise we'll assume you're okay to continue.