Showing results for tags 'crawl'.
-
I'm using cURL to crawl and scrape data from a website. The website contains tables with rows of data. When I send a cURL POST for the underlying data of a specific row (A), it returns the expected data. But when I move to the second row (B), the response comes back blank, or more precisely as a long run of spaces (or nbsp entities). When I open the same POST location in a browser, I can see (B)'s data. The only difference between the two POSTs is the location ID for the data. I don't think it's a JavaScript problem, since I can successfully return data for row (A), as mentioned.

Website I'm trying to crawl: https://mycpa.cpa.state.tx.us/up/Search.jsp
Working POST URL (A): https://mycpa.cpa.state.tx.us/up/searchresults.do?d-49216-p=&d-49216-s=&how=&last=bales&other=&d-49216-o=&zip=&_chk=74170700611986R2ZZZZ26&which=View+Details
Non-working POST URL (B): https://mycpa.cpa.state.tx.us/up/searchresults.do?d-49216-p=&d-49216-s=&how=&last=bales&other=&d-49216-o=&zip=&_chk=74600015611995R1AC081084&which=View+Details

Interestingly, you can combine the data location IDs to show more than one set of data per page. When I try this, the first set of data (A) is displayed and the second (B) is shown as spaces (or nbsp entities).

Combined POST URL: https://mycpa.cpa.state.tx.us/up/searchresults.do?d-49216-p=&d-49216-s=&how=&last=bales&other=&d-49216-o=&zip=&_chk=74170700611986R2ZZZZ26&_chk=74600015611995R1AC081084&which=View+Details
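
One way to double-check the claim that the two requests differ only in the location ID is to parse both query strings and diff them. The sketch below does that with Python's standard library, using the two URLs from the post; the helper names `query_params` and `differing_params` are made up for this illustration. If `_chk` really is the only difference, the cause of the blank response may lie outside the URL (for example cookies or request headers that the browser sends and the cURL call does not), but that is an assumption the post does not confirm.

```python
from urllib.parse import urlsplit, parse_qs

# URLs copied from the post above.
URL_A = ("https://mycpa.cpa.state.tx.us/up/searchresults.do?d-49216-p=&d-49216-s="
         "&how=&last=bales&other=&d-49216-o=&zip="
         "&_chk=74170700611986R2ZZZZ26&which=View+Details")
URL_B = ("https://mycpa.cpa.state.tx.us/up/searchresults.do?d-49216-p=&d-49216-s="
         "&how=&last=bales&other=&d-49216-o=&zip="
         "&_chk=74600015611995R1AC081084&which=View+Details")


def query_params(url):
    """Return the query string as a dict of lists, keeping empty values."""
    return parse_qs(urlsplit(url).query, keep_blank_values=True)


def differing_params(url_a, url_b):
    """Hypothetical helper: map each differing parameter to its (a, b) values."""
    a, b = query_params(url_a), query_params(url_b)
    return {
        key: (a.get(key), b.get(key))
        for key in sorted(set(a) | set(b))
        if a.get(key) != b.get(key)
    }


diff = differing_params(URL_A, URL_B)
print(diff)
```

Because `parse_qs` returns a list per key, the same helper also handles the combined URL, where `_chk` appears twice.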
-
This may be a very basic question. I would like to know whether data that is displayed only to logged-in (PHP session authenticated) users is crawlable by search engines. For example, there is a page www.domain-name.com/content-listings/ that lists some information for the user. Non-registered users will see basic information such as name and postal address, and this should be SEO friendly and crawlable. Registered (logged-in) users will see sensitive information such as email_id and phone number, which should not be crawlable by search engines. Can this be achieved with sessions alone, or do I need to use JavaScript and AJAX to protect the email ID and phone number from crawlers and spammers?
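
The idea in the question can be sketched as server-side field filtering: a crawler that carries no session cookie is treated like any anonymous visitor, so fields that are never written into the response are not there to be crawled. The example below is a framework-free illustration in Python rather than PHP; the function `visible_fields`, the field names other than `email_id`, and the sample record are all hypothetical, and the `logged_in` flag stands in for whatever the session check returns.

```python
# Hypothetical field groups; only "email_id" is a name taken from the post.
PUBLIC_FIELDS = ("name", "postal_address")
PRIVATE_FIELDS = ("email_id", "phone_number")


def visible_fields(record, logged_in):
    """Return only the fields that the current visitor is allowed to see."""
    allowed = PUBLIC_FIELDS + (PRIVATE_FIELDS if logged_in else ())
    return {key: record[key] for key in allowed if key in record}


# Made-up sample data for illustration.
record = {
    "name": "Example Person",
    "postal_address": "1 Example Street",
    "email_id": "person@example.com",
    "phone_number": "000-0000",
}

anonymous_view = visible_fields(record, logged_in=False)  # what a crawler would get
member_view = visible_fields(record, logged_in=True)

print(anonymous_view)
print(member_view)
```

The point of the sketch is that the filtering happens before the page is rendered, so it does not depend on JavaScript running in the client.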
-
Hello everybody, I need critiques and "website crawling help" for my website http://enginery.freecluster.eu . My crawling question is this: I used the Google Search Console tools to add my website's sitemap: http://enginery.freecluster.eu/sitemap.xml . It says my sitemap is OK and that it found 312 pages, but it has not crawled all of them correctly. Three weeks have passed and nothing has changed. I manually requested indexing for some pages (about 4) in Google Search Console, and as of today only some of those 4 show up when I search using "site:http://enginery.freecluster.eu". All of my website's files have .php extensions. Does this prevent Googlebot from reaching the content of my website's pages?

My robots.txt file's content is:

    User-agent: Googlebot
    Allow: /
    User-agent: *
    Allow: /
    Sitemap: http://enginery.freecluster.eu/sitemapv1.xml

Any critiques and help are appreciated. Thanks.
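
The robots.txt rules quoted above can be checked offline with Python's `urllib.robotparser`, as a rough sanity test of what they allow. The sketch parses the posted content directly (no network access), asks whether Googlebot may fetch a page, and compares the sitemap declared in robots.txt with the one submitted to Search Console. The page path `index.php` is a hypothetical example, and `site_maps()` requires Python 3.8 or later. The two sitemap URLs in the post do differ (`sitemapv1.xml` versus `sitemap.xml`); whether that has any bearing on the indexing delay is not something the post establishes.

```python
from urllib.robotparser import RobotFileParser

# robots.txt content as quoted in the post.
ROBOTS_TXT = """\
User-agent: Googlebot
Allow: /

User-agent: *
Allow: /

Sitemap: http://enginery.freecluster.eu/sitemapv1.xml
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Hypothetical page path; the post only says that all files end in .php.
page_url = "http://enginery.freecluster.eu/index.php"
googlebot_allowed = parser.can_fetch("Googlebot", page_url)

# Sitemap URLs declared in robots.txt (None if there are none).
declared_sitemaps = parser.site_maps()

# Sitemap URL the poster says was submitted in Search Console.
submitted_sitemap = "http://enginery.freecluster.eu/sitemap.xml"
sitemap_matches = submitted_sitemap in (declared_sitemaps or [])

print(googlebot_allowed, declared_sitemaps, sitemap_matches)
```

This check only covers the robots.txt rules; it says nothing about how a crawler treats the `.php` extension or how long indexing takes.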