SEO Blog Logs


What is SEO (Search Engine Optimization)?
SEO is improving the volume and quality of traffic to a web site or page from the search engines using natural organic and algorithmic search results (SERPs).

SEO is actually a component of SEM (Search Engine Marketing), which encompasses many avenues for promoting a web site or page.
If you have come here looking for SEO help, you have found a blog that can help you. There is plenty of information to be had here, and please feel free to comment anytime...even with a question.

September 13, 2007

Dealing with some Dark Side Webmaster Issues

There are some things that happen behind the scenes that many newer webmasters have never experienced. Unfortunately, the Internet, while providing a relatively open format for expression and movement, also opens doors for people who would use this very openness for wrongdoing. We'll call this the dark side. Some types of sites are more prone to problems than others. Then there is jealousy to deal with, or simply having pissed someone off. In any case, education and detection are the best defense.

If you are a webmaster just getting started and using free hosting, I strongly suggest you look into some professional hosting. You aren't going to learn much of any real value from a free hosting environment. Your traffic is slowed by the ads, and you probably have bandwidth restrictions as well. Your ability to see what is going on in your site's environment is limited to the HTTP access, if you are even able to run Analytics. There are some great, really inexpensive hosting options available. I would recommend checking into IX Webhosting: for 3.95 a month you can host 2 domains with great options and full control, 300GB space and 3000GB bandwidth/transfer. Host Monster is 4.95 a month for unlimited domains, great options, 300GB space and 300GB bandwidth/transfer. (No, I am not making any money on either of those links!) These are both shared hosting, which means you are on a server with other sites and the hosting company balances the load to deliver the best performance to everyone. This should be more than adequate to get you going. I would strongly recommend you choose a Linux environment. Even if you have no prior Linux experience, you will be very happy you chose it. It is far more powerful for the web environment. There are tools and techniques that are easy on Linux and nightmares on a Windows server...if they can even be accomplished at all.

OK, back to the dark side. Like I said, some sites are prone to problems: forums, blogs, image sites, etc. Some of you who have been here before might have seen me say "fighting back" is a personal decision. Well, it is....However, you may find yourself in a "have to" situation. You may get the wicked duplication filter applied to your site as a result of a dark side action, and cease to rank for your own terms. How can this happen? Simple...Your content has been syndicated on another IP/domain with a high enough level of duplication to spark the filters into action, at which point the filter program made a "choice" based on PageRank. If your PageRank is lower, your pages are sent to the supplemental index. So now let's look into some ways you might determine there is a problem with your content and protect it.

First and foremost, I highly recommend you leave one absolute URL in each page of your site. This means a complete link to your site, ideally to the main page. So, instead of "/" or "/index.html", use the full address, domain and all. You see, scrapers and such tend to be as automated as possible, so a full link usually gets carried along with the copy, and this helps you keep a line on your pages.
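If you have a lot of pages, you can check this with a few lines of Python. This is only a rough sketch: the function name, the `LinkCollector` class and the `www.example.com` host are placeholders of mine, and it only looks at `<a href>` links.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def has_absolute_self_link(html, site_host):
    """Return True if the page links to site_host with a full URL."""
    collector = LinkCollector()
    collector.feed(html)
    for href in collector.hrefs:
        parsed = urlparse(href)
        # A relative link such as "/" has no scheme and no host.
        if parsed.scheme in ("http", "https") and parsed.netloc.lower() == site_host.lower():
            return True
    return False


# Hypothetical usage with a placeholder host:
page = '<a href="/index.html">Home</a> <a href="http://www.example.com/">Home</a>'
print(has_absolute_self_link(page, "www.example.com"))
```

Run it over each saved page and fix any page where it comes back False.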

Sign up and use Google Alerts. Set up an alert for your URL or site address. Also choose a phrase completely unique to your site: a 7-10 word excerpt from the mid section of 2 or 3 of your site's pages, including the main page...one that includes about half of one sentence and half or all of the next. When you enter this alert into the field in Google Alerts, surround it in "quotes". Choose how Google will deliver the results to your mailbox if and when this content pops up on someone else's site, and you are done.
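Picking the excerpt by hand is easy enough, but here is a small Python sketch of the idea if you want to do it for several pages. The function name and the 8-word default are my own assumptions; it simply grabs words from around the sentence break nearest the middle of the text and wraps them in quotes.

```python
def pick_alert_phrase(page_text, n_words=8):
    """Return a quoted n_words excerpt that straddles a mid-page sentence break."""
    words = page_text.split()
    if len(words) < n_words:
        raise ValueError("page text is shorter than the requested phrase")
    middle = len(words) // 2
    # Indexes of the words that end a sentence.
    ends = [i for i, word in enumerate(words) if word.endswith((".", "!", "?"))]
    # Use the sentence end closest to the middle, or the middle itself if none.
    pivot = min(ends, key=lambda i: abs(i - middle)) if ends else middle
    # Take roughly half of the phrase before the break and half after it.
    start = pivot - n_words // 2 + 1
    start = max(0, min(start, len(words) - n_words))
    return '"' + " ".join(words[start:start + n_words]) + '"'


sample = "one two three four five six. seven eight nine ten eleven twelve."
print(pick_alert_phrase(sample))
```

Paste the quoted result straight into the alert field. Read it first, though, and make sure it really is unique to your site.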

If you want to check your pages on a larger scale, there are a couple of nice tools that will even detect partial content theft. Copyscape searches the entire web for copies of your page, but limits the results unless you pay. Webmaster Labor only searches Google, but gives you individual phrases and the percentage of duplication. I usually just add any unique phrases that are highly duplicated to Google Alerts (in quotes) to get the URLs. You see, this blog was recently scraped by an automated program that changed every single URL to point internally to the scraper site...So the "inurl" and "link" operators would not have found my content. Since this site was a PageRank 6, if I had not gotten the Google Alert, my search position surely would have tanked. We'll go over what to do closer to the end.
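To give a feel for what a "percentage of duplication" can mean, here is a minimal Python sketch that compares two texts by overlapping 5-word phrases. I am not claiming this is how Copyscape or Webmaster Labor work internally; the function names and the phrase size are assumptions for illustration only.

```python
def shingles(text, size=5):
    """Return the set of overlapping word sequences of the given size."""
    words = text.lower().split()
    return {" ".join(words[i:i + size]) for i in range(len(words) - size + 1)}


def duplication_percent(original, suspect, size=5):
    """Percentage of the original's phrases that also appear in the suspect text."""
    own = shingles(original, size)
    if not own:
        return 0.0
    shared = own & shingles(suspect, size)
    return 100.0 * len(shared) / len(own)


mine = "a b c d e f"
theirs = "x a b c d e y"
print(duplication_percent(mine, theirs))
```

A high number on a page you never syndicated is your cue to start digging.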

So, you saw me mention the "inurl" and "link" operators. There is a very good thread over at Webmaster World covering their use. The deal with these commands is this: they say you should be looking for other sites listed with your EXACT description and title. When you click these links, the site will look just like yours...This is a 302 hijack. Some time ago (2006) they were rampant. They are not as popular now....But I disconnected one 2 weeks ago. While I think these operators are good for verification, I would not call them at all accurate. Here's why...Firstly, I just told you to make an absolute link. Secondly, permalinks are absolute by design. Thirdly, Google throttles or holds back a great deal of the links in the link command....And they are very slow to add links sometimes. Yahoo is much better for digging up links. Just pop open Yahoo search and use the link: operator followed by your site's address. Yahoo doesn't display the description, but the titles and URLs are intact, so this can be helpful and is more complete.

The reason I was talking about the hosting up top is this....One of the quickest, easiest ways to nab a scraper is your logs and server side stats. Even though many scrapers automate the whole process, for sites with images, codes, and other remarketable material they will just scrape the content to add to or develop their own sites. There is no easy way to track down these types of materials. Codes may be a little easier...They can sometimes be bugged, or tracked. Images, backgrounds, photos, videos, and sound files are nearly impossible. Yes, by ALL means, IF you actually own the content you can watermark or digitally sign it...But it is a very big legal expense to enforce this. Google is probably not going to make a webmaster remove this type of content with just a watermark or digital signature....You will have to pay for a valid copyright. Additionally, dealing with 90% of the ISPs on this type of thing is a total waste of time. It is a necessary step....But generally fruitless.

So these people come in and steal all of your files, system files and all...They chew up your bandwidth, and most times they will return regularly for updates! Well, you can find them in your stats...They will be at the top of your lists! They have very distinctive characteristics. Generally they will have chewed up a massive amount of bandwidth, with just a few (VERY, DAMN FEW) visits, TONS of hits, and a huge number of files.

Now, before you act on anything...There are some things to check out and consider. First of all, have you seen this IP/host before? Check them out. How many files are on your domain? If they have taken almost all of the files the domain has, or more, then you can safely move to step 2...If not, log the stats and wait until next time.

Go into your logs and find that IP. If you don't have a built in reader for your logs, then download the logs, open the file with Excel and choose delimited, then space. Once you have it open, select the top of the column with the IP addresses, so it selects the whole column...Choose Data in the menu bar, then Sort, then continue with current selection, then either ascending or descending.

Now look for that IP. You want to see what kind of files they were hitting and how often. Here's why. First of all, an automated program is going to hit files that human visitors do not, as it is generally designed to scrape the whole site. Additionally, the time between requests is important, because humans, for example, do not generally request files every second for 20 minutes. If you are still unsure, you might compare the timeline against the location of a few requests. For example, access to form.php, then one second later a file or image 3 clicks away...Not possible for a human. One quick note: I have a site like the ones mentioned above. Lately I have a crafty scraper. He doesn't show up in my top list...Only in my logs. Like this:

  • 50 files

  • 50 files

  • 50 files
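If you would rather not sort logs in Excel, the same checks can be roughed out in Python. This is a sketch under assumptions: it expects the common/combined Apache log layout, the function names are mine, and the thresholds (50 hits, a typical gap of one second or less) are guesses you should tune to your own traffic.

```python
import re
from collections import defaultdict
from datetime import datetime

# Assumes the Apache common/combined log layout.
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) (?P<size>\S+)'
)


def summarize_by_ip(lines):
    """Collect hits, bytes, distinct files and request times for each IP."""
    stats = defaultdict(lambda: {"hits": 0, "bytes": 0, "files": set(), "times": []})
    for line in lines:
        match = LOG_LINE.match(line)
        if not match:
            continue  # skip lines that are not in the expected layout
        entry = stats[match["ip"]]
        entry["hits"] += 1
        size = match["size"]
        entry["bytes"] += int(size) if size.isdigit() else 0  # "-" means no body
        entry["files"].add(match["path"])
        entry["times"].append(datetime.strptime(match["time"], "%d/%b/%Y:%H:%M:%S %z"))
    return stats


def looks_automated(entry, min_hits=50, max_gap_seconds=1.0):
    """Flag an IP with many hits whose typical gap between requests is tiny."""
    if entry["hits"] < min_hits:
        return False
    times = sorted(entry["times"])
    gaps = sorted((b - a).total_seconds() for a, b in zip(times, times[1:]))
    if not gaps:
        return False
    median_gap = gaps[len(gaps) // 2]
    return median_gap <= max_gap_seconds


# Hypothetical usage, assuming a log file named access.log:
# with open("access.log") as handle:
#     for ip, entry in summarize_by_ip(handle).items():
#         if looks_automated(entry):
#             print(ip, entry["hits"], entry["bytes"], len(entry["files"]))
```

A flag from this is only a reason to look closer at what that IP was requesting, not proof of anything.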

So, now it's decision time. I guess, while initially the decision is personal, there may come a time when it becomes necessary. So, understand first of all that many ISPs still rotate IP addresses...like AOL, for example. There is not much point in blocking most IP addresses for very long, as these types of individuals tend to move on or change IPs frequently. Blocking too many will slow down your site. Also, when you get into blocking these types of folks, do them one at a time....DNS is a complicated matter, and you cannot accurately predict where your own Internet connection is routing through, so you might just block yourself. Anytime you block an IP, it is always best to block just 1 address and not a range, because with a range you are always risking blocking an innocent user. On Windows servers you block IPs in the IIS configuration: Website; Properties; Directory security. In the Linux/Apache environment you use your htaccess file. Leave a blank line above and a blank line below any other commands. The spot after the ## is not read, it's a comment. I use them to help me keep track of the blocks.

Order Deny,Allow
Deny from ##note
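If you keep your blocks in a list, a few lines of Python can write out the htaccess lines for you. This is a sketch with made-up names and example addresses. It refuses ranges, in keeping with the one-address-at-a-time advice, and it puts each note on its own line, which Apache's configuration syntax always accepts.

```python
import ipaddress


def render_deny_lines(blocks):
    """Build htaccess text from (ip, note) pairs, one single address per entry."""
    lines = ["Order Deny,Allow"]
    for ip, note in blocks:
        # ip_address() raises ValueError for ranges such as "203.0.113.0/24".
        address = ipaddress.ip_address(ip)
        lines.append("## " + note)
        lines.append("Deny from " + str(address))
    return "\n".join(lines)


# Hypothetical usage with a documentation-only example address:
print(render_deny_lines([("203.0.113.5", "scraper, Sept 2007")]))
```

Keeping the date in the note makes it easy to clear out old blocks later.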

So what if your site has been ripped and republished, or 302'd? If you need to take action, then the best way to proceed, in my own personal experience, is to first of all, out of respect, send the webmaster an email, if you can locate an address. Tell him/her to remove your content. The next thing I do is file a spam/duplicate report with Google if the ripper's site is indexed....Time is of the essence. Google is pretty quick on these. Then I call the offending site's hosting company. I will send them what they need in email, but not until I talk with the correct department and get a REAL person's name.

One other quick note. The first time this happened to me, I thought this guy just had "really big kahunas"...But after I stopped laughing I had a more cynical thought. I banned a guy for a grab bag of offenses, including some hacking attempts and scraping ventures. Well, this guy emailed me and wanted to know why the hell I banned him. Many of you may not realize that when you send email from a computer-based email program, your IP address is in the header. So, if this happens, do not respond. If you feel compelled to answer for whatever reason, copy just the message into Yahoo mail and send it from there. So, what if I had just stupidly sent this guy my home IP? DUH!
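If you want to see what a message gives away, here is a small Python sketch that pulls bracketed IPv4 addresses out of the Received headers of a raw email. The function name and the sample message are made up for illustration, and real headers vary a lot between mail servers, so treat it as a quick look and nothing more.

```python
import re
from email import message_from_string

# Matches a bracketed IPv4 address such as [203.0.113.77].
BRACKETED_IPV4 = re.compile(r"\[(\d{1,3}(?:\.\d{1,3}){3})\]")


def received_ips(raw_message):
    """Return the bracketed IPv4 addresses found in the Received headers."""
    message = message_from_string(raw_message)
    ips = []
    for header in message.get_all("Received", []):
        ips.extend(BRACKETED_IPV4.findall(header))
    return ips


# Made-up sample message using documentation-only addresses:
sample = (
    "Received: from home-pc (unknown [203.0.113.77])\n"
    "\tby mail.example.com with SMTP; Thu, 13 Sep 2007 10:00:00 +0000\n"
    "From: someone@example.com\n"
    "Subject: why was I banned\n"
    "\n"
    "message body\n"
)
print(received_ips(sample))
```

Save one of your own sent messages as raw text and run it through this before you reply to anyone you have banned.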

Well, I hope this trip to the dark side has not put a rain cloud on your day. Do you have a great tip? Advice or a question? I would love to hear it!

Peace and SEO

Melanie Prough

DIY Your SEO With The SEOCog
We Require a Link Back Please.