Folks,

 

I am looking for a list of (at least 4-5) mobile bot (Google/Yahoo/MSN) user-agent strings, please.

 

I have 4-5 user agents in my list, but they are almost 2 years old and outdated, so they do not look natural. Can you please give me a few user agents?

 

Cheers

Natasha

I don't understand the question. What do you mean by "mobile bot"? There are massive online lists of user agents that you can trawl through. If you give a bit more information on what you are trying to do, there may be a way to help that does not require the complete agent string (relying on the full string is a bad idea anyway, as it is bound to change).

 

Thanks for the reply NJ.

 

I am using cURL against a site that is on WAP. To make my cURL calls look natural I want to use Google or Yahoo mobile bot user-agent strings, so that the requests show up like Google/Yahoo mobile bots crawling.

 

Of course it's not foolproof, since apart from the user agent we would need the IP as well, but I am OK with user agents only.

 

Hope that helps.

 

Cheers

I am doing CURL on a site which is on WAP

WAP, my god, how old is it? WAP is for phones that are probably 10 years old. Do you mean a mobile version of a website? If so, you could use any mobile user agent to obtain the mobile version of a website.

 

Here are a few

http://learnthemobileweb.com/mobile-web-development/sample-mobile-user-agents/

 

I wouldn't worry about the user-agent string as much as you are. To make your bot look more natural, make sure there is a random pause between page requests. Also, I would use proxies or a proxy client such as TOR to make sure that your own IP address doesn't get blocked by the site if they suspect anything.
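To make the "random pause" idea concrete, here is a rough PHP sketch. The function names, the 2 to 6 second range and the user-agent string are my own assumptions for illustration, not anything the site requires, so adjust them to whatever suits the site you are crawling.

```php
<?php
// Hypothetical helper: sleep for a random time between $minSec and $maxSec
// and return the number of seconds actually slept.
function randomPause(float $minSec, float $maxSec): float
{
    $micros = random_int((int) round($minSec * 1e6), (int) round($maxSec * 1e6));
    sleep(intdiv($micros, 1000000)); // whole seconds
    usleep($micros % 1000000);       // remaining fraction of a second
    return $micros / 1e6;
}

// Hypothetical helper: fetch each URL with the given user agent,
// pausing a random amount of time between requests.
function fetchPolitely(array $urls, string $userAgent, float $minSec = 2.0, float $maxSec = 6.0): array
{
    $pages = [];
    $first = true;
    foreach ($urls as $url) {
        if (!$first) {
            randomPause($minSec, $maxSec);
        }
        $first = false;

        $ch = curl_init($url);
        curl_setopt($ch, CURLOPT_USERAGENT, $userAgent);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
        $pages[$url] = curl_exec($ch); // false if the request failed
    }
    return $pages;
}

// Example mobile browser user agent (illustrative only, swap in a current one).
$mobileUa = 'Mozilla/5.0 (Linux; Android 13; Pixel 7) AppleWebKit/537.36 '
          . '(KHTML, like Gecko) Chrome/120.0.0.0 Mobile Safari/537.36';
```

You would then call something like fetchPolitely(['http://example.com/a', 'http://example.com/b'], $mobileUa) and get back an array of page sources keyed by URL.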

 

 

NJ,

 

You are right, I meant the mobile version of the site.

 

Is it possible to run TOR on a web server? I have only seen the Windows desktop version of TOR; can it be installed on my web server (Linux)?

 

Cheers

That's a good link. I haven't used Polipo with Tor, only Privoxy.

 

Once you have it installed, you would add the proxy option to your cURL instance, i.e.:

 

$ch = curl_init();
curl_setopt($ch, CURLOPT_PROXY, '127.0.0.1:8118');

 

Get cURL to grab a page such as http://www.whatismyip.com/. Do it with and without the proxy option set. You should find that the IP in the grabbed source code is different when using Tor.
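A rough sketch of that with/without check is below. The helper names buildCurlOptions() and fetch() and the 30 second timeout are assumptions of mine; 127.0.0.1:8118 is simply the proxy address used in the snippet above.

```php
<?php
// Hypothetical helper: build the cURL options for a GET request,
// adding the proxy option only when a proxy address is supplied.
function buildCurlOptions(string $url, ?string $proxy = null): array
{
    $options = [
        CURLOPT_URL            => $url,
        CURLOPT_RETURNTRANSFER => true, // return the page instead of printing it
        CURLOPT_TIMEOUT        => 30,   // Tor can be slow, so allow some time
    ];
    if ($proxy !== null) {
        $options[CURLOPT_PROXY] = $proxy;
    }
    return $options;
}

// Hypothetical helper: fetch a URL, returning the body or false on failure.
function fetch(string $url, ?string $proxy = null)
{
    $ch = curl_init();
    curl_setopt_array($ch, buildCurlOptions($url, $proxy));
    return curl_exec($ch);
}

// Usage (needs network access and a running Privoxy + Tor):
// $direct = fetch('http://www.whatismyip.com/');
// $viaTor = fetch('http://www.whatismyip.com/', '127.0.0.1:8118');
// The IP address shown in $direct and $viaTor should differ.
```

If both results show the same IP, the request is probably not going through the proxy at all, so check that Privoxy is actually listening on that port.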

 

NJ & Mr. Maq,

 

Thanks for the guidance.

 

A few questions:

1) Will using TOR make things very slow, both in terms of server resources and page load times?

 

2) Can you please confirm whether the code below is correct?

 

$URL = 'www.scrapeme.com';
$ch = curl_init();
$tor_address = '127.0.0.1:8118';
curl_setopt($ch, CURLOPT_PROXY, $tor_address);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
curl_setopt($ch, CURLOPT_HTTPPROXYTUNNEL, true);
curl_setopt($ch, CURLOPT_PROXYTYPE, CURLPROXY_SOCKS5);
curl_setopt($ch, CURLOPT_URL, $URL);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
$result = curl_exec($ch);

 

3) NJ, what are Polipo and Privoxy, and what do they have to do with TOR?

 

Cheers

NT
NJ, what are Polipo and Privoxy, and what do they have to do with TOR?

Privoxy and Polipo are both proxy software. If you do not know anything about proxy servers, it may be a bit difficult to grasp.

 

Let's say that you wanted to allow me to browse the Internet through your server. When I go to a website, rather than my ISP's IP address showing up in the website's access logs, they will see the IP address of your server. This is what is called a proxy server. You can change the connection settings within any web browser to use a proxy. All you need is the IP address and port number, and maybe a username/password. The IP address would be that of your server running Privoxy. You can also block access to certain websites using it, which is why many companies put a proxy server in front of all their computers connected to the Internet. You can do a variety of web filtering on the traffic going through the proxy, such as video, advertising, etc.

 

Now, when you are talking about crawling websites and wanting to hide your server's identity, i.e. its IP address, this is where TOR comes in. TOR is a bit of software that can act as a client, a server (relay), or both. When you run TOR as a client, it connects to other TOR servers and builds a path through them that your web traffic can flow through. This is why people in countries such as China, which have firewalls in place to block websites, use this type of network to get around them. What Privoxy can do is accept your HTTP requests on a specific port number and pass them on to the TOR network. Any website you go to from your server via cURL will log the IP address of the TOR server the traffic left through (the exit node), not yours. There are thousands of TOR servers and the exit changes over time, so blocking one address does not stop you, although the list of exit nodes is public and a determined site can block all of them.

 

The flow of traffic from your server would be:

 

Web Application (curl) ---> Privoxy/Polipo ---> TOR client ---> TOR server ---> Website

 

As opposed to

 

Web Application (curl) ---> Website

 

Running TOR on your server won't cause any issues. Just use it as a client and not a server, otherwise you will have other people's traffic coming through your server. TOR is only one option. If you want to hide your IP address, there are thousands of free proxy IP addresses you can use. The only problem is that they are unreliable and often do not work, so you have to trawl through them to find decent ones.

 

http://www.proxy4free.com

 

You can also pay for dedicated proxy IP addresses to use. I would buy about 30 and cycle through them when making requests to websites.

 

http://www.bestproxyandvpn.com
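Cycling through a list of proxies can be as simple as a round-robin counter. Here is a minimal sketch; the ProxyRotator class is hypothetical and the addresses are placeholders from a documentation-only IP range, so substitute the ones you actually buy.

```php
<?php
// Hypothetical round-robin rotator over a fixed list of proxy addresses.
final class ProxyRotator
{
    private array $proxies;
    private int $index = 0;

    public function __construct(array $proxies)
    {
        if ($proxies === []) {
            throw new InvalidArgumentException('At least one proxy is required.');
        }
        $this->proxies = array_values($proxies); // re-index from 0
    }

    // Return the next proxy in the list, wrapping around at the end.
    public function next(): string
    {
        $proxy = $this->proxies[$this->index];
        $this->index = ($this->index + 1) % count($this->proxies);
        return $proxy;
    }
}

// Placeholder addresses (192.0.2.0/24 is reserved for documentation).
$rotator = new ProxyRotator(['192.0.2.10:8080', '192.0.2.11:8080', '192.0.2.12:8080']);

// Usage: pick a different proxy for each request.
// $ch = curl_init('http://example.com/');
// curl_setopt($ch, CURLOPT_PROXY, $rotator->next());
```

Round-robin is just the simplest choice here; picking a proxy at random or dropping ones that keep failing would work in the same place.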

 

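Hope that helps. Your cURL code looks mostly OK, by the way. One thing to check: 8118 is Privoxy's HTTP port, so with that address you do not need the SOCKS5 proxy type or CURLOPT_HTTPPROXYTUNNEL. SOCKS5 is for talking to TOR's own port (9050 by default).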
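For reference, here is a sketch of the two consistent ways to set the proxy options from the question. It assumes the default ports (8118 for Privoxy's HTTP proxy, 9050 for Tor's SOCKS port), so check your own config if you changed them. The function names are made up for the example.

```php
<?php
// Variant A: go through Privoxy, which is an HTTP proxy (default port 8118)
// and forwards the request into Tor.
function privoxyOptions(string $url): array
{
    return [
        CURLOPT_URL            => $url,
        CURLOPT_PROXY          => '127.0.0.1:8118',
        CURLOPT_PROXYTYPE      => CURLPROXY_HTTP,
        CURLOPT_FOLLOWLOCATION => true,
        CURLOPT_RETURNTRANSFER => true,
    ];
}

// Variant B: skip Privoxy and talk to Tor's SOCKS port directly
// (default port 9050). No HTTP tunnel option is needed for this.
function torSocksOptions(string $url): array
{
    return [
        CURLOPT_URL            => $url,
        CURLOPT_PROXY          => '127.0.0.1:9050',
        CURLOPT_PROXYTYPE      => CURLPROXY_SOCKS5,
        CURLOPT_FOLLOWLOCATION => true,
        CURLOPT_RETURNTRANSFER => true,
    ];
}

// Usage (needs Tor, and Privoxy for variant A, running locally):
// $ch = curl_init();
// curl_setopt_array($ch, privoxyOptions('http://www.scrapeme.com/'));
// $result = curl_exec($ch);
```

The point is only that the proxy type has to match what is listening on the port: HTTP for Privoxy, SOCKS5 for Tor itself.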

Many thanks NJ, that was helpful.

 

I understand you also talked about dedicated proxies. I once took out a membership for private proxies, but they seem to be too expensive.

 

Running TOR on your server won't cause any issues. Just use it as a client and not a server, otherwise you will have other people's traffic coming through your server.

 

1) Do you mean I should use TOR as a desktop client and not install it on the server? I would rather install TOR on my Linux web server and not on my own machine.

 

2) Are Polipo and Privoxy paid/subscription services? Can't I just use TOR on my server and call its IPs in cURL?

 

Cheers

Do you mean I should use TOR as a desktop client and not install it on the server? I would rather install TOR on my Linux web server and not on my own machine.

No! TOR can act as a client or a server on whichever machine you install it on. Since you are using it for scripts on your web server, why would it be installed on your desktop machine? Just because the machine is a web server does not mean that it is not just a PC. "Server" is just a term.

 

Are Polipo and Privoxy paid/subscription services? Can't I just use TOR on my server and call its IPs in cURL?

No, they are bits of software that you install on your Linux machine along with TOR. They give cURL an ordinary HTTP proxy to talk to, which is the setup I described. I don't think you fully understand what the bits of software do from my explanation. Why not do some Googling or read Wikipedia?

 

Here is an in-depth explanation regarding TOR. It also includes details of Polipo & Privoxy.

http://en.wikipedia.org/wiki/Tor_(anonymity_network)

 

I understand you also talked about dedicated proxies. I once took out a membership for private proxies, but they seem to be too expensive.

If the cost outweighs the benefit, then I am a bit unsure why you would do this work. Would you not factor the cost into the job? I'm guessing you are getting paid for this work. I would not say that proxy IP addresses are expensive. Have you spent the time to find good deals?

 

If you are doing this work for your own development or for free then use free proxies. Just Google & you will find thousands!
