
Please critique this tool for me


igor berger


I am working on a project to catch and catalogue Spam domains - PHSDL!

 

I have just developed a searchable database module.

http://www.travelinasia.net/phsdl/search_spam_domains.php

 

If your domain is not doing well in search engine results, or you are looking to buy a second-hand domain, this is a good tool for checking the domain name.

 

I also have a plugin on the Website forum that automatically deletes a post if it contains any offending domain on the PHSDL hot list.

 

I am going to develop the plugin further and share it with other forum and blog Websites for free!
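In case anyone wants to picture how the plugin works, here is a minimal sketch of the filtering step. The phsdl_domain_is_listed() lookup is a hypothetical stand-in for the actual hot list query:

```php
<?php
// Minimal sketch of the post filter: pull every domain out of a
// submitted message and flag the post if any domain is on the hot
// list. phsdl_domain_is_listed() is a hypothetical lookup function.
function post_hits_hot_list(string $message): bool
{
    preg_match_all('#https?://([a-z0-9.-]+)#i', $message, $matches);

    foreach (array_unique($matches[1]) as $domain) {
        if (phsdl_domain_is_listed(strtolower($domain))) {
            return true; // offending domain found; the post gets deleted
        }
    }
    return false;
}
```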

 

Any input would be appreciated.

 

Thank you,

Igor

 

P.S. I know the last critique did not go too smoothly, but I did heed your comments and fixed the browser interoperability issues. Not 100% yet!

Still need to finish W3C validation and implement unobtrusive JavaScript menus! Oh, and the colors!

 

 


You have no idea how much!

When a guy by the name of Zlob blasts your forum with 30 messages a day, every day, using a different domain each time, you had better believe it is useful to block him from posting on your forum.

They use different IP addresses via proxy servers and sign up with different user IDs every time.

So you wind up with all your posts in Google's supplemental index, and that means death for your Website.

It is very hard to get out of the grave once you are there!

It takes a total exorcism to resurrect the Website.

 


The Spam domain filter is working excellently! I blocked them so tight that they keep making new user IDs in vain! I even added a noindex meta tag for zero-post users, to save bandwidth for the SERPs and not contribute duplicate (aka supplemental) content to them.
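For illustration, the zero-post noindex trick can be as small as this. A sketch only; the exact condition and the $user_post_count variable are assumptions about the surrounding template code:

```php
<?php
// Sketch: mark pages for zero-post users as noindex so throwaway
// Spam accounts never add duplicate/supplemental pages to the index.
// $user_post_count is an assumed variable from the surrounding code.
if ((int) $user_post_count === 0) {
    echo '<meta name="robots" content="noindex,follow" />';
}
```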

I am thinking of putting a counter on the hot Spam list that will show the most active Spam domains over the past so-many hours!

Maybe the one that Spams the most will get the most-infamous award!

This will look nice when people are browsing the list and coming back to check it.

http://www.travelinasia.net/forum/project_honeypot.php

 

Also, this will be useful for people who want to see the plugin at work, because I will be giving it away for free so they can clean up and block Spam on their boards.

I will just give them an include that will query my database… I hope it does not use too much of my bandwidth!

(For those of you who are good with MySQL: is it better to extract the Spam domains from a message and query MySQL (20,000 records) once per extracted domain, so 20 domains in a message means 20 queries? Or to query MySQL once, get the array back, and compare each element to the haystack using preg_match()? Which one would be better for saving resources and optimal processing?)
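For what it's worth, a third option usually beats both: extract the domains first, then send MySQL a single IN (...) query so the index does the matching; you pay for neither 20 round-trips nor a 20,000-row fetch into PHP. A sketch, with the spam_domains table and domain column as assumed names:

```php
<?php
// Sketch: one round-trip instead of 20 queries or a 20,000-row scan.
// Assumes a `spam_domains` table with an indexed `domain` column.
function matched_spam_domains(PDO $db, string $message): array
{
    preg_match_all('#https?://([a-z0-9.-]+)#i', $message, $m);
    $domains = array_values(array_unique(array_map('strtolower', $m[1])));
    if ($domains === []) {
        return [];
    }

    $placeholders = implode(',', array_fill(0, count($domains), '?'));
    $stmt = $db->prepare(
        "SELECT domain FROM spam_domains WHERE domain IN ($placeholders)"
    );
    $stmt->execute($domains);

    return $stmt->fetchAll(PDO::FETCH_COLUMN);
}
```

Of the two original options, the per-domain loop pays network latency 20 times, and the preg_match() scan pays to transfer 20,000 rows; the IN clause pays for neither.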

 

Any suggestions on your part for the PHSDL project are welcome.

 

Thanx,

Igor

 


Thank you, Steve, for the thumbs up!

 

I would love to integrate my list with other people's lists.

My only concern is how their lists are harvested.

 

As for http://www.joewein.de/, I am not sure how they harvest their list of domains. Do they extract the URLs from the message body, or the domains from the message headers? If it is from the headers, it may not be reliable, because someone may send from someone else's domain.

 

I want to concentrate on the URLs.

 

But your suggestion is really good. I will look over some SBL lists that harvest Spam URLs from email message bodies and contact them to see if they would be interested in collaborating.

I just need to look at my SpamAssassin Spam report messages to find the SBLs that track the Spam URLs.

 

It may be too early for this at this stage. I need to organize the project a little more.

 

I just registered a domain for the project, www.phsdl.com, and will be looking to move the list from www.travelinasia.com to the project's domain!

 

I will use the remote include plugin to delete unwanted forum messages that hit the Spam list!

 

I will set up the script on www.phsdl.com and do the include on the www.travelinasia.net forum! (Right now the script is on Travelinasia.)

If all goes well, as I predict, I will distribute the plugin to other forum and blog users. Once I get a bit of a following going, I will look to collaborate with other list owners.

 

Project development and implementation is a very hard thing to do!

The timing of bringing out a beta is very sensitive.

 

If you have any other input or ideas, please share them with me.

 

Thank you,

Igor

 


Thanx Steve!

 

Now we know why Google does not respect the nofollow meta tag!

 

Google parses all URLs for data!

http://www.travelinasia.net/forum/viewtopic.php?t=3812

 

"Your client application can use the API to download an encrypted table for local, client-side lookups of URLs that you would like to check."

 

The operative point here is that using the XML API you can download a database table! This is good for developers, not for the average Joe.

 

The average Joe running a blog or a forum is not going to integrate the API table into his database, parse out the domains, and run preg_match() on messages submitted to his board!

The average Joe will benefit much more from a PHSDL include that does all the work for him - a project-side database lookup!

Google would have been smart to fashion their API the way I am doing it, but I guess they do not want everyone in the world continuously querying their database.
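To make the average-Joe argument concrete, the consumer side of the include could stay about this small. The endpoint path and the SPAM/OK response format below are assumptions for illustration, not a published interface:

```php
<?php
// Hypothetical consumer side of the PHSDL include: ship the message
// to the project server and let it do the database lookup.
function phsdl_message_is_spam(string $message): bool
{
    $context = stream_context_create(['http' => [
        'method'  => 'POST',
        'header'  => "Content-Type: application/x-www-form-urlencoded\r\n",
        'content' => http_build_query(['message' => $message]),
        'timeout' => 5,
    ]]);

    // Endpoint path and response format are assumptions.
    $reply = file_get_contents('http://www.phsdl.com/check.php', false, $context);

    return trim((string) $reply) === 'SPAM';
}
```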

 

Yes, as a developer I may take a look at Google's API data, but more for secondary rather than primary verification. That way I can validate my Spam domain database against Google's and eliminate any discrepancies in mine that may be false positives! Google's list may be too big to use as the primary source! It will be nice to see who is on that list…

 

Steve, I really thank you for the heads up and for all the ideas you are contributing to the development of the project.

 

Igor

 


I don't want to hurt your feelings or anything, man, but I seriously think your tool is useless. It's not like it's a bad thing, but as a developer, I check one domain in your database (mine) and leave, and never check out your site again. Now, if I wanted to go all out, I would use Google's tool. I can trust Google. If Google had my domain on the blacklist, then I'd be more screwed, because I wouldn't be in their search results. :'(

 

So from my perspective you are wasting your time. I may be wrong, since there might be developers who benefit from your tool continuously, but not me. It benefited me once, but never again. You have to create something that is continuously updated, so I CAN USE IT MORE THAN ONCE. But that's not possible. The web has too many domains to worry about, right? - Unless you are Google. ::)


TheFilmGod, thank you for your comments; I am open to negative comments as well as positive ones.

 

The list is continuously updated, and you may want to check it from time to time, especially if you have a subdomain user on your domain.

 

Also, the PHSDL plugin will prevent Spam on the forums and blogs that decide to use it.

 

You do not have to be Google to help the Global Village!

 

If anyone gets a chance to see Google's list, can you check whether any .edu domains are on it?

 

I get a lot of .edu subdomain Spam on my forum, and I put those domains on the list. I believe the universities have to be responsible Net citizens and not say "it was not us, it was a student"!

 

Thank you,

Igor

 


TLG, thanx for the heads up!

 

I do say in the disclaimer that there may be a few false positives!

I would say that out of the whole list of 20,000 domains there may be 10 to 20 false positives.

 

 

I will be setting up an exclusion module to correct the wrongs as soon as I am made aware of them - for now I am just putting the exceptions in an if statement for validation!

 

Thank you,

Igor


Okay, I got the exempt_domains table up!

Who needs the silly ifs!
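For the record, one way the exempt table can replace the ifs is inside the lookup query itself. A sketch, assuming both tables keep the domain in a domain column:

```php
<?php
// Sketch: a domain only counts as a hit if it is on the hot list
// AND not in exempt_domains (table/column names are assumptions).
// $db is an open PDO connection.
$stmt = $db->prepare(
    "SELECT 1
       FROM spam_domains s
       LEFT JOIN exempt_domains e ON e.domain = s.domain
      WHERE s.domain = ? AND e.domain IS NULL"
);
$stmt->execute([$domain]);
$isListed = (bool) $stmt->fetchColumn();
```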

 

I found www.msn.com on the list, but I think I will leave it there for a while!

Maybe have Bill file a removal request!

 

If anyone finds something that does not belong on the list, please let me know, and I will take a look.

 

Thank you,

Igor


Okay guys, I have added statistics on the PHSDL Spam domains list page!

 

I made a Ten Most Wanted list! I wanted to add "dead or alive", but I figured it would not look professional!

 

The statistics will show the top 10 offenders and how many times they have offended. I am thinking of adding the time and date of the last offence, maybe just for the top 10 or for all offenders!
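A sketch of the query behind such a list, assuming an offence counter and a last-offence timestamp are kept per domain (the column names are my assumptions):

```php
<?php
// Sketch: the Ten Most Wanted, ordered by offence count.
// The `offences` and `last_offence` columns are assumed names.
$top = $db->query(
    "SELECT domain, offences, last_offence
       FROM spam_domains
      ORDER BY offences DESC
      LIMIT 10"
)->fetchAll(PDO::FETCH_ASSOC);

foreach ($top as $row) {
    printf("%s: %d offences, last on %s\n",
        $row['domain'], $row['offences'], $row['last_offence']);
}
```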

 

Please take a look

http://www.travelinasia.net/forum/project_honeypot.php

 

And if you want to give it a test run, take one of the domains from the list and try to post with it.

 

I am thinking about setting up an automated email to RIPE and ICANN saying that this domain on this host committed Spam against that domain at this hour, and maybe forwarding the offending message to them!

What do you guys think of this?

Or is there a better place to report?

 

The forum is looking okay, and the Spam messages my filter does not stop are down to about 5 a day.

Still, today Zlob blasted me with porn: 20 posts with the same domain!

He has not been around for about 3 or 4 days, but he is not completely out!

Thinking of writing a multi-message deletion module to delete his posts with one button, based on a preg_match() of the domain he used!
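A one-button version of that module might look like this. The table and column names assume a stock phpBB 2 schema, and a LIKE clause narrows the candidates before the exact preg_match():

```php
<?php
// Sketch: delete every post whose text mentions a given spam domain.
// Table/column names assume a stock phpBB 2 schema.
function delete_posts_by_domain(PDO $db, string $domain): int
{
    // Narrow candidates in SQL, then confirm with preg_match().
    $stmt = $db->prepare(
        "SELECT post_id, post_text FROM phpbb_posts_text
          WHERE post_text LIKE ?"
    );
    $stmt->execute(['%' . $domain . '%']);

    $deleted = 0;
    $pattern = '#' . preg_quote($domain, '#') . '#i';
    $del = $db->prepare("DELETE FROM phpbb_posts_text WHERE post_id = ?");

    foreach ($stmt->fetchAll(PDO::FETCH_ASSOC) as $post) {
        if (preg_match($pattern, $post['post_text'])) {
            $del->execute([$post['post_id']]);
            $deleted++;
        }
    }
    return $deleted;
}
```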

 

Unfortunately, I had to take members' Websites out of profiles and the member list, because of the many bad Websites; Google penalizes you if you link to them.

 

I still allow the Website from the member's profile to appear at the bottom of the post, if I deem the post relevant to the board and not offensive Spam!

 

Oh, heads up to phpBB forum users: the phpBB link at the bottom of all forums points to phpBB.com, but phpBB.com links to gambling and porn Websites, so Google penalizes you for that!

I do not think it is fair that we link to them and the PR of our Websites is diminished because of linking to a bad neighborhood!

That is how my forum started getting porn, gambling, tobacco, and drug Spam in the first place. If you link to certain content, the robots put your Website in that category!

 

I read the disclaimer in phpBB, which said, "If you need to, you can remove the copyright, but please leave the link! If you remove the link they will not provide any support!"

Well, I never got any support from phpBB; I wrote all the mods myself and will continue writing them! So I removed their link but kept the copyright, because I believe the author of the software deserves the credit, but does not deserve the link PR.

 

There is still a lot to do, and I am learning new things every day. It feels good doing the project, and because it is a volunteer project the feeling is even better! Cleaning Spam off the Internet is really a good cause.

 

If any of you guys want to learn about optimizing your Website so it meets the Google Webmaster Guidelines, please visit me at the Google Webmaster Help group!

It is really fun and there is a lot to learn! 301 redirects, robots disallow, duplicate content, and other stuff.

We do not call it SEO, but proper Webmaster Guidelines!

 

I will try not to talk about search engines in here, because some users may get upset. :)

So if you are interested in how a Website lives in the real world, come check us out!

Matt Cutts, the head of Google's Webspam team, and Adam Lasnik drop by from time to time.

So if you are there, you may bump into them.

From time to time, we also talk about different jobs in the Webmasters world...

 

Okay, so enough of the B.S.; any suggestions for the PHSDL project are very welcome.

 

Thank you all for your contributions!

Igor

 


Ha, ha! eBay Spam, what do you guys think?

Should I leave them there for all the duplicate pages they feed Google?

 

Supplemental pages: 102,000,000 out of 113,000,000 total.

 

90% of their pages are in Google's supplemental index.

Is this not a waste of IT resources to crawl them?

 

Igor

