| Author |
|
dirk Newbie

Joined: 13 April 2003 Location: Belgium
Online Status: Offline Posts: 2
|
| Posted: 13 April 2003 at 5:56pm | IP Logged
|
|
|
Hello gurus,
I am trying to lock out a number of spambots, rogue spiders and leech engines from our site. Do I have to create a rewrite condition/rule pair for every user agent, or can I just put a list of conditions one after another, using some "OR" operator as in Apache mod_rewrite?
E.g.
RewriteCond user-agent: .*grub-client.* [or]
RewriteCond user-agent: .*wget.* [or]
RewriteCond user-agent: .*Teleport.*
- (some more agents) -
RewriteRule .* /block.htm
Thanx in advance for your replies,
Dirk
|
|
| |
Yaroslav Admin Group

Joined: 15 August 2002
Online Status: Offline Posts: 6519
|
| Posted: 14 April 2003 at 3:33am | IP Logged
|
|
|
You can use alternatives in the regular expression. For example, this is a good rule to block bad bots:
RewriteCond User-Agent: (?:Alexibot|asterias|BackDoorBot|Black.Hole|BlackWidow|BlowFish|BotALot|BuiltBotTough|Bullseye|BunnySlippers|Cegbfeieh|CheeseBot|CherryPicker|ChinaClaw|CopyRightCheck|cosmos|Crescent|Custo|DISCo|DittoSpyder|DownloadsDemon|eCatch|EirGrabber|EmailCollector|EmailSiphon|EmailWolf|EroCrawler|ExpresssWebPictures|ExtractorPro|EyeNetIE|FlashGet|Foobot|FrontPage|GetRight|GetWeb!|Go-Ahead-Got-It|Go!Zilla|GrabNet|Grafula|Harvest|hloader|HMView|httplib|HTTrack|humanlinks|ia_archiver|ImagesStripper|ImagesSucker|IndysLibrary|InfonaviRobot|InterGET|Internet\sNinja|JennyBot|JetCar|JOC\sWeb\sSpider|Kenjin.Spider|Keyword.Density|larbin|LeechFTP|LexiBot|libWeb/clsHTTP|LinkextractorPro|LinkScan/8.1a.Unix|LinkWalker|lwp-trivial|Mass\sDownloader|Mata.Hari|Microsoft.URL|MIDown\stool|MIIxpc|Mister.PiX|Mister\sPiX|moget|Mozilla/3.Mozilla/2.01|Mozilla.*NEWT|Navroad|NearSite|NetAnts|NetMechanic|NetSpider|Net\sVampire|NetZIP|NICErsPRO|NPBot|Octopus|Offline.Explorer|Offline\sExplorer|Offline\sNavigator|Openfind|Pagerabber|Papa\sFoto|pavuk|pcBrowser|ProPowerBot/2.14|ProWebWalker|ProWebWalker|QueryN.Metasearch|ReGet|RepoMonkey|RMA|SiteSnagger|SlySearch|SmartDownload|SpankBot|spanner|SuperBot|SuperHTTP|Surfbot|suzuran|Szukacz/1.4|tAkeOut|Teleport|Teleport\sPro|Telesoft|The.Intraformant|TheNomad|TightTwatBot|Titan|toCrawl/UrlDispatcher|toCrawl/UrlDispatcher|True_Robot|turingos|TurnitinBot/1.5|URLy.Warning|VCI|VoidEYE|WebAuto|WebBandit|WebCopier|WebEMailExtrac.*|WebEnhancer|WebFetch|WebGo\sIS|Web.Image.Collector|Web\sImage\sCollector|WebLeacher|WebmasterWorldForumBot|WebReaper|WebSauger|Website\seXtractor|Website.Quester|Website\sQuester|Webster.Pro|WebStripper|Web\sSucker|WebWhacker|WebZip|Wget|Widow|[Ww]eb[Bb]andit|WWW-Collector-E|WWWOFFLE|Xaldon\sWebSpider|Xenu's|Zeus).*
RewriteRule .* /block.htm
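A rough way to see why one alternation can stand in for a chain of OR-ed conditions is to try it outside the server. The sketch below uses Python's `re` module rather than ISAPI_Rewrite's own regex engine, shortens the name list to the three agents from the question, and assumes the pattern has to match the whole User-Agent value (which the trailing `.*` in the rule suggests). The helper names and sample strings are made up for illustration.

```python
import re

# Shortened, illustrative list; the real rule above contains many more names.
BAD_AGENTS = ["grub-client", "wget", "Teleport"]

# One pattern per agent, as in the question (each would need its own condition).
SEPARATE = [re.compile(".*" + re.escape(name) + ".*") for name in BAD_AGENTS]

# A single pattern using alternation, as in the answer.
COMBINED = re.compile(".*(?:" + "|".join(re.escape(n) for n in BAD_AGENTS) + ").*")


def blocked_by_any(user_agent: str) -> bool:
    """True if at least one of the separate patterns matches (logical OR)."""
    return any(p.fullmatch(user_agent) is not None for p in SEPARATE)


def blocked_by_combined(user_agent: str) -> bool:
    """True if the single alternation pattern matches."""
    return COMBINED.fullmatch(user_agent) is not None


if __name__ == "__main__":
    # Invented sample values, not taken from real logs.
    for ua in ["wget/1.8.2", "Teleport Pro/1.29", "Mozilla/5.0 (X11; Linux)"]:
        print(ua, blocked_by_any(ua), blocked_by_combined(ua))
```

Both helpers should agree on every input, which is the point: the alternation expresses the same OR in one condition.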
__________________ Yaroslav Govorunov,
Helicon Tech
|
|
| |
dirk Newbie

Joined: 13 April 2003 Location: Belgium
Online Status: Offline Posts: 2
|
| Posted: 14 April 2003 at 11:50am | IP Logged
|
|
|
Seems to work as a charm. Thanx, Yaroslav. I have put this
fine little piece of software on the purchase list of our company.
|
|
| |
ozbiz Newbie

Joined: 31 March 2003 Location: Australia
Online Status: Offline Posts: 34
|
| Posted: 18 September 2003 at 4:50pm | IP Logged
|
|
|
Thanks Yaroslav, this works great.
I have got the rule in the helicon directory to cover multiple sites.
The logs seem to show this works for Art-Online.com+0.9(Beta) but not for Program+Shareware+1.0.3
Here is the rule I am using. I seem to remember reading something about + in my sleep. ;)
#spambots
RewriteCond User-Agent: (?:Alexibot|Art-Online|asterias|BackDoorbot|Black.Hole|BlackWidow|BlowFish|botALot|BuiltbotTough|Bullseye|BunnySlippers|Cegbfeieh|Cheesebot|CherryPicker|ChinaClaw|CopyRightCheck|cosmos|Crescent|Custo|DISCo|DittoSpyder|DownloadsDemon|eCatch|EirGrabber|EmailCollector|EmailSiphon|EmailWolf|EroCrawler|ExpresssWebPictures|ExtractorPro|EyeNetIE|FlashGet|Foobot|FrontPage|GetRight|GetWeb!|Go-Ahead-Got-It|Go!Zilla|GrabNet|Grafula|Harvest|hloader|HMView|httplib|HTTrack|humanlinks|ImagesStripper|ImagesSucker|IndysLibrary|InfonaviRobot|InterGET|Internet\sNinja|Jennybot|JetCar|JOC\sWeb\sSpider|Kenjin.Spider|Keyword.Density|larbin|LeechFTP|Lexibot|libWeb/clsHTTP|LinkextractorPro|LinkScan/8.1a.Unix|LinkWalker|lwp-trivial|Mass\sDownloader|Mata.Hari|Microsoft.URL|MIDown\stool|MIIxpc|Mister.PiX|Mister\sPiX|moget|Mozilla/3.Mozilla/2.01|Mozilla.*NEWT|Navroad|NearSite|NetAnts|NetMechanic|NetSpider|Net\sVampire|NetZIP|NICErsPRO|NPbot|Octopus|Offline.Explorer|Offline\sExplorer|Offline\sNavigator|Openfind|Pagerabber|Papa\sFoto|pavuk|pcBrowser|Program+Shareware+1.0.3|ProPowerbot/2.14|ProWebWalker|ProWebWalker|psbot/0.1|QueryN.Metasearch|ReGet|RepoMonkey|RMA|SiteSnagger|SlySearch|SmartDownload|Spankbot|spanner|Superbot|SuperHTTP|Surfbot|suzuran|Szukacz/1.4|tAkeOut|Teleport|Teleport\sPro|Telesoft|The.Intraformant|TheNomad|TightTwatbot|Titan|toCrawl/UrlDispatcher|toCrawl/UrlDispatcher|True_Robot|turingos|Turnitinbot/1.5|URLy.Warning|VCI|VoidEYE|WebAuto|WebBandit|WebCopier|WebEMailExtrac.*|WebEnhancer|WebFetch|WebGo\sIS|Web.Image.Collector|Web\sImage\sCollector|WebLeacher|WebmasterWorldForumbot|WebReaper|WebSauger|Website\seXtractor|Website.Quester|Website\sQuester|Webster.Pro|WebStripper|Web\sSucker|WebWhacker|WebZip|Wget|Widow|[Ww]eb[Bb]andit|WWW-Collector-E|WWWOFFLE|Xaldon\sWebSpider|Xenu's|Zeus).*
RewriteRule .* /block.asp
__________________ Jim
|
|
| |
Yaroslav Admin Group

Joined: 15 August 2002
Online Status: Offline Posts: 6519
|
| Posted: 19 September 2003 at 6:21am | IP Logged
|
|
|
#spambots
RewriteCond User-Agent: (?:Alexibot|Art-Online|asterias|BackDoorbot|Black.Hole|BlackWidow|BlowFish|botALot|BuiltbotTough|Bullseye|BunnySlippers|Cegbfeieh|Cheesebot|CherryPicker|ChinaClaw|CopyRightCheck|cosmos|Crescent|Custo|DISCo|DittoSpyder|DownloadsDemon|eCatch|EirGrabber|EmailCollector|EmailSiphon|EmailWolf|EroCrawler|ExpresssWebPictures|ExtractorPro|EyeNetIE|FlashGet|Foobot|FrontPage|GetRight|GetWeb!|Go-Ahead-Got-It|Go!Zilla|GrabNet|Grafula|Harvest|hloader|HMView|httplib|HTTrack|humanlinks|ImagesStripper|ImagesSucker|IndysLibrary|InfonaviRobot|InterGET|Internet\sNinja|Jennybot|JetCar|JOC\sWeb\sSpider|Kenjin.Spider|Keyword.Density|larbin|LeechFTP|Lexibot|libWeb/clsHTTP|LinkextractorPro|LinkScan/8.1a.Unix|LinkWalker|lwp-trivial|Mass\sDownloader|Mata.Hari|Microsoft.URL|MIDown\stool|MIIxpc|Mister.PiX|Mister\sPiX|moget|Mozilla/3.Mozilla/2.01|Mozilla.*NEWT|Navroad|NearSite|NetAnts|NetMechanic|NetSpider|Net\sVampire|NetZIP|NICErsPRO|NPbot|Octopus|Offline.Explorer|Offline\sExplorer|Offline\sNavigator|Openfind|Pagerabber|Papa\sFoto|pavuk|pcBrowser|Program+Shareware+1.0.3|ProPowerbot/2.14|ProWebWalker|ProWebWalker|psbot/0.1|QueryN.Metasearch|ReGet|RepoMonkey|RMA|SiteSnagger|SlySearch|SmartDownload|Spankbot|spanner|Superbot|SuperHTTP|Surfbot|suzuran|Szukacz/1.4|tAkeOut|Teleport|Teleport\sPro|Telesoft|The.Intraformant|TheNomad|TightTwatbot|Titan|toCrawl/UrlDispatcher|toCrawl/UrlDispatcher|True_Robot|turingos|Turnitinbot/1.5|URLy.Warning|VCI|VoidEYE|WebAuto|WebBandit|WebCopier|WebEMailExtrac.*|WebEnhancer|WebFetch|WebGo\sIS|Web.Image.Collector|Web\sImage\sCollector|WebLeacher|WebmasterWorldForumbot|WebReaper|WebSauger|Website\seXtractor|Website.Quester|Website\sQuester|Webster.Pro|WebStripper|Web\sSucker|WebWhacker|WebZip|Wget|Widow|[Ww]eb[Bb]andit|WWW-Collector-E|WWWOFFLE|Xaldon\sWebSpider|Xenu's|Zeus|Program\sShareware).*
RewriteRule .* /block.asp
__________________ Yaroslav Govorunov,
Helicon Tech
|
|
| |
Yaroslav Admin Group

Joined: 15 August 2002
Online Status: Offline Posts: 6519
|
| Posted: 19 September 2003 at 6:33am | IP Logged
|
|
|
Sorry, I missed something. Here is another one:
#spambots
RewriteCond User-Agent: (?:Alexibot|Art-Online|asterias|BackDoorbot|Black.Hole|BlackWidow|BlowFish|botALot|BuiltbotTough|Bullseye|BunnySlippers|Cegbfeieh|Cheesebot|CherryPicker|ChinaClaw|CopyRightCheck|cosmos|Crescent|Custo|DISCo|DittoSpyder|DownloadsDemon|eCatch|EirGrabber|EmailCollector|EmailSiphon|EmailWolf|EroCrawler|ExpresssWebPictures|ExtractorPro|EyeNetIE|FlashGet|Foobot|FrontPage|GetRight|GetWeb!|Go-Ahead-Got-It|Go!Zilla|GrabNet|Grafula|Harvest|hloader|HMView|httplib|HTTrack|humanlinks|ImagesStripper|ImagesSucker|IndysLibrary|InfonaviRobot|InterGET|Internet\sNinja|Jennybot|JetCar|JOC\sWeb\sSpider|Kenjin.Spider|Keyword.Density|larbin|LeechFTP|Lexibot|libWeb/clsHTTP|LinkextractorPro|LinkScan/8.1a.Unix|LinkWalker|lwp-trivial|Mass\sDownloader|Mata.Hari|Microsoft.URL|MIDown\stool|MIIxpc|Mister.PiX|Mister\sPiX|moget|Mozilla/3.Mozilla/2.01|Mozilla.*NEWT|Navroad|NearSite|NetAnts|NetMechanic|NetSpider|Net\sVampire|NetZIP|NICErsPRO|NPbot|Octopus|Offline.Explorer|Offline\sExplorer|Offline\sNavigator|Openfind|Pagerabber|Papa\sFoto|pavuk|pcBrowser|Program\sShareware\s1|ProPowerbot/2.14|ProWebWalker|ProWebWalker|psbot/0.1|QueryN.Metasearch|ReGet|RepoMonkey|RMA|SiteSnagger|SlySearch|SmartDownload|Spankbot|spanner|Superbot|SuperHTTP|Surfbot|suzuran|Szukacz/1.4|tAkeOut|Teleport|Teleport\sPro|Telesoft|The.Intraformant|TheNomad|TightTwatbot|Titan|toCrawl/UrlDispatcher|toCrawl/UrlDispatcher|True_Robot|turingos|Turnitinbot/1.5|URLy.Warning|VCI|VoidEYE|WebAuto|WebBandit|WebCopier|WebEMailExtrac.*|WebEnhancer|WebFetch|WebGo\sIS|Web.Image.Collector|Web\sImage\sCollector|WebLeacher|WebmasterWorldForumbot|WebReaper|WebSauger|Website\seXtractor|Website.Quester|Website\sQuester|Webster.Pro|WebStripper|Web\sSucker|WebWhacker|WebZip|Wget|Widow|[Ww]eb[Bb]andit|WWW-Collector-E|WWWOFFLE|Xaldon\sWebSpider|Xenu's|Zeus).*
RewriteRule .* /block.asp
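The `+` in the earlier entry is the likely culprit. In a regular expression `+` is a quantifier, and IIS log files write the spaces of a User-Agent value as `+`, so the agent probably sent `Program Shareware 1.0.3`. The sketch below checks this with Python's `re`, which only approximates ISAPI_Rewrite's regex engine, and it assumes whole-string matching. The "as sent" string is reconstructed from the log entry quoted earlier in the thread, not captured from real traffic.

```python
import re

# Entry as first posted: '+' is a quantifier, so "m+" means "one or more m".
BROKEN = re.compile(r"(?:Program+Shareware+1.0.3).*")

# Entry from the corrected rule: "\s" matches the space the agent presumably sends.
FIXED = re.compile(r"(?:Program\sShareware\s1).*")

# What the log shows, and what the header presumably contained
# (assumption: the log writer replaced spaces with '+').
AS_LOGGED = "Program+Shareware+1.0.3"
AS_SENT = "Program Shareware 1.0.3"


def matches(pattern, user_agent: str) -> bool:
    """Whole-string match, mirroring the assumed ISAPI_Rewrite behaviour."""
    return pattern.fullmatch(user_agent) is not None


if __name__ == "__main__":
    for label, pattern in (("broken", BROKEN), ("fixed", FIXED)):
        for ua in (AS_LOGGED, AS_SENT):
            print(label, repr(ua), matches(pattern, ua))
```

If the header really did contain literal plus signs, escaping them as `\+` would be the way to match them.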
__________________ Yaroslav Govorunov,
Helicon Tech
|
|
| |
ozbiz Newbie

Joined: 31 March 2003 Location: Australia
Online Status: Offline Posts: 34
|
| Posted: 19 September 2003 at 5:36pm | IP Logged
|
|
|
Thanks Yaroslav,
You are a legend.
__________________ Jim
|
|
| |
ac3creative Newbie

Joined: 09 June 2006 Location: United Kingdom
Online Status: Offline Posts: 14
|
| Posted: 09 June 2006 at 4:44am | IP Logged
|
|
|
Sorry to resurrect a real oldie, but as I noticed it referenced around the forum, I figured it made more sense than starting again.
I'm trying to block some bots out of certain folders / subdomains, and whilst the coding above shows me how to block them from the whole site, I only want them kept away from specific areas of the site.
Ideally I am trying to block all bots, so is there a better way to block them all rather than having to list them all?
I've been advised to show them a 403 rather than direct them off to a specific page. Would I simply edit the RewriteRule .* /block.asp to the default 403 URL of my server?
Thanks in advance
|
|
| |
Lexey Moderator Group

Joined: 15 August 2002 Location: Russian Federation
Online Status: Offline Posts: 8118
|
| Posted: 11 June 2006 at 9:36am | IP Logged
|
|
|
ac3creative wrote:
I'm trying to block some bots out of certain folders / subdomains, and whilst the coding above shows me how to block them from the whole site, I only want them kept away from specific areas of the site.
|
|
|
The rule could look like this:
RewriteCond User-Agent: (?:Alexibot|asterias|BackDoorBot|...others here)
RewriteCond Host: (subdomain1|subdomain2|...)
RewriteRule /(?:folder1|folder2|...) /403.asp [I,L]
Quote:
Ideally I am trying to block all bots, so is there a better way to block them all rather than having to list them all?
|
|
And how are you going to distinguish a bot from a non-bot?
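The way the two conditions and the rule combine can be sketched in a few lines of Python. This is only a model, not ISAPI_Rewrite itself: it assumes every condition must match before the rule fires, that matching covers the whole value, and that the `[I]` flag makes all comparisons case-insensitive. The host names, the `(?:/.*)?` suffix on the folder pattern and the function name are hypothetical stand-ins for the `...` placeholders in the rule.

```python
import re

# Hypothetical values standing in for the "..." placeholders in the rule.
UA_PATTERN = re.compile(r"(?:Alexibot|asterias|BackDoorBot).*", re.IGNORECASE)
HOST_PATTERN = re.compile(r"(?:sub1\.example\.com|sub2\.example\.com)", re.IGNORECASE)
# "(?:/.*)?" is added here so that files below the folder are covered as well.
PATH_PATTERN = re.compile(r"/(?:folder1|folder2)(?:/.*)?", re.IGNORECASE)


def is_blocked(user_agent: str, host: str, path: str) -> bool:
    """All three patterns must match for the request to go to the 403 page."""
    return (
        UA_PATTERN.fullmatch(user_agent) is not None
        and HOST_PATTERN.fullmatch(host) is not None
        and PATH_PATTERN.fullmatch(path) is not None
    )


if __name__ == "__main__":
    print(is_blocked("Alexibot/1.0", "sub1.example.com", "/folder1/page.html"))
    print(is_blocked("Alexibot/1.0", "www.example.com", "/folder1/page.html"))
    print(is_blocked("Mozilla/5.0 (X11; Linux)", "sub1.example.com", "/folder1/"))
```

Dropping the Host condition, as asked about later in the thread, corresponds to removing the middle test, so the folder is then protected on every host name.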
|
|
| |
ac3creative Newbie

Joined: 09 June 2006 Location: United Kingdom
Online Status: Offline Posts: 14
|
| Posted: 11 June 2006 at 5:24pm | IP Logged
|
|
|
Hi Lexey,
and thanks for the quick attention to this.
As for knowing whether a bot is a bot, I'm really not sure. I've never had to do anything like this before, as I'm generally trying to get the bots to the site, not block them.
I guess if I have to list them all then so be it. I'll pull the list posted above and add any others as I come across them.
If I use the coding above:
Quote:
Rule could be like this:
RewriteCond User-Agent: (?:Alexibot|asterias|BackDoorBot|...others here)
RewriteCond Host: (subdomain1|subdomain2|...)
RewriteRule /(?:folder1|folder2|...) /403.asp [I,L]
|
|
|
Do I take it that if I simply place the folder at www.domain.com/folder-to-block/, I simply add the name of the folder where you have (?:folder1) and get rid of the RewriteCond Host: (subdomain1)?
Do I also assume that if I place the 'subdomain' into the required space, the folder that would be blocked would be the one of that name on the subdomain?
I'm not worried about using a subdomain if I can get away with blocking a folder, so would I be correct in assuming that the following would work on a folder placed on the main domain?
Quote:
RewriteCond User-Agent: (?:Alexibot|asterias|BackDoorBot|...others here)
RewriteRule /(?:folder1|folder2|...) /403.asp [I,L]
|
|
|
Thanks again
|
|
| |
Lexey Moderator Group

Joined: 15 August 2002 Location: Russian Federation
Online Status: Offline Posts: 8118
|
| Posted: 12 June 2006 at 12:23pm | IP Logged
|
|
|
All your assumptions are correct.
|
|
| |
ac3creative Newbie

Joined: 09 June 2006 Location: United Kingdom
Online Status: Offline Posts: 14
|
| Posted: 15 June 2006 at 1:37pm | IP Logged
|
|
|
OK, I think I have it all sorted now with my host, but could someone please check that this coding within my httpd.ini file is correct before I start sending traffic to the site? I absolutely do not want any robots having access to the /members/ folder on my domain.
Quote:
|
[ISAPI_Rewrite]
# 3600 = 1 hour
CacheClockRate 3600
RepeatLimit 32
# Block external access to the httpd.ini and httpd.parse.errors files
RewriteRule /httpd(?:\.ini|\.parse\.errors).* / [F,I,O]
# Block external access to the Helper ISAPI Extension
RewriteRule .*\.isrwhlp / [F,I,O]
#botblock
RewriteCond User-Agent: (?:Alexibot|Art-online|asterias|BackDoorbot|Black.Hole|BlackWidow|BlowFish|botALot|BuiltbotTough|Bullseye|BunnySlippers|Cegbfeieh|Cheesebot|CherryPicker|ChinaClaw|CopyRightCheck|cosmos|Crescent|Custo|DISCo|DittoSpyder|DownloadsDemon|eCatch|EirGrabber|EmailCollector|EmailSiphon|EmailWolf|EroCrawler|ExpresssWebPictures|ExtractorPro|EyeNetIE|FlashGet|Foobot|FrontPage|GetRight|GetWeb!|Go-Ahead-Got-It|Go!Zilla|GrabNet|Grafula|Harvest|hloader|HMView|httplib|HTTrack|humanlinks|ImagesStripper|ImagesSucker|IndysLibrary|InfonaviRobot|InterGET|Internet\sNinja|Jennybot|JetCar|JOC\sWeb\sSpider|Kenjin.Spider|Keyword.Density|larbin|LeechFTP|Lexibot|libWeb/clsHTTP|LinkextractorPro|LinkScan/8.1a.Unix|LinkWalker|lwp-trivial|Mass\sDownloader|Mata.Hari|Microsoft.URL|MIDown\stool|MIIxpc|Mister.PiX|Mister\sPiX|moget|Mozilla/3.Mozilla/2.01|Mozilla.*NEWT|Navroad|NearSite|NetAnts|NetMechanic|NetSpider|Net\sVampire|NetZIP|NICErsPRO|NPbot|Octopus|Offline.Explorer|Offline\sExplorer|Offline\sNavigator|Openfind|Pagerabber|Papa\sFoto|pavuk|pcBrowser|Program\sShareware\s1|ProPowerbot/2.14|ProWebWalker|ProWebWalker|psbot/0.1|QueryN.Metasearch|ReGet|RepoMonkey|RMA|SiteSnagger|SlySearch|SmartDownload|Spankbot|spanner|Superbot|SuperHTTP|Surfbot|suzuran|Szukacz/1.4|tAkeOut|Teleport|Teleport\sPro|Telesoft|The.Intraformant|TheNomad|TightTwatbot|Titan|toCrawl/UrlDispatcher|toCrawl/UrlDispatcher|True_Robot|turingos|Turnitinbot/1.5|URLy.Warning|VCI|VoidEYE|WebAuto|WebBandit|WebCopier|WebEMailExtrac.*|WebEnhancer|WebFetch|WebGo\sIS|Web.Image.Collector|Web\sImage\sCollector|WebLeacher|WebmasterWorldForumbot|WebReaper|WebSauger|Website\seXtractor|Website.Quester|Website\sQuester|Webster.Pro|WebStripper|Web\sSucker|WebWhacker|WebZip|Wget|Widow|[Ww]eb[Bb]andit|WWW-Collector-E|WWWOFFLE|Xaldon\sWebSpider|Xenu's|Zeus|Googlebot|Googlebot-Mobile|Googlebot-Image|Mediapartners-Google|Adsbot-Google|WISEbot/1.0|mozilla/4.0|msnbot|psbot|Slurp)
RewriteRule /(?:members) /403.html [I,L]
|
|
|
Thanks in advance
|
|
| |
Lexey Moderator Group

Joined: 15 August 2002 Location: Russian Federation
Online Status: Offline Posts: 8118
|
| Posted: 16 June 2006 at 6:26am | IP Logged
|
|
|
The rule should look like this:
RewriteCond User-Agent: (?:Alexibot|Art-online|asterias|...).*
RewriteRule /members(?:/.*)? /403.html [I,L]
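To see what the change to the path pattern buys, the following sketch compares three variants against a handful of paths. It uses Python's `re` with whole-string, case-insensitive matching as a stand-in for ISAPI_Rewrite's behaviour, which is an assumption. The variant without the trailing `?` and the sample paths are included purely for illustration.

```python
import re

# Path patterns to compare; labels are descriptive only.
CANDIDATES = {
    "as first posted": r"/(?:members)",
    "without trailing ?": r"/members(?:/.*)",
    "as corrected": r"/members(?:/.*)?",
}

# Made-up request paths.
PATHS = ["/members", "/members/", "/members/page.html", "/membership"]


def covered(pattern: str, path: str) -> bool:
    """True if the pattern matches the entire path, ignoring case."""
    return re.fullmatch(pattern, path, re.IGNORECASE) is not None


def coverage_table() -> dict:
    """Map each pattern label to the list of sample paths it matches."""
    return {
        label: [p for p in PATHS if covered(rx, p)]
        for label, rx in CANDIDATES.items()
    }


if __name__ == "__main__":
    for label, hits in coverage_table().items():
        print(f"{label:20} {hits}")
```

Under these assumptions only the corrected form covers both the folder itself and everything below it, while leaving an unrelated path such as `/membership` alone.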
|
|
| |
ac3creative Newbie

Joined: 09 June 2006 Location: United Kingdom
Online Status: Offline Posts: 14
|
| Posted: 16 June 2006 at 6:30am | IP Logged
|
|
|
So, it should be:
Quote:
|
[ISAPI_Rewrite]
# 3600 = 1 hour
CacheClockRate 3600
RepeatLimit 32
# Block external access to the httpd.ini and httpd.parse.errors files
RewriteRule /httpd(?:\.ini|\.parse\.errors).* / [F,I,O]
# Block external access to the Helper ISAPI Extension
RewriteRule .*\.isrwhlp / [F,I,O]
#botblock
RewriteCond User-Agent: (?:Alexibot|Art-online|asterias|BackDoorbot|Black.Hole|BlackWidow|BlowFish|botALot|BuiltbotTough|Bullseye|BunnySlippers|Cegbfeieh|Cheesebot|CherryPicker|ChinaClaw|CopyRightCheck|cosmos|Crescent|Custo|DISCo|DittoSpyder|DownloadsDemon|eCatch|EirGrabber|EmailCollector|EmailSiphon|EmailWolf|EroCrawler|ExpresssWebPictures|ExtractorPro|EyeNetIE|FlashGet|Foobot|FrontPage|GetRight|GetWeb!|Go-Ahead-Got-It|Go!Zilla|GrabNet|Grafula|Harvest|hloader|HMView|httplib|HTTrack|humanlinks|ImagesStripper|ImagesSucker|IndysLibrary|InfonaviRobot|InterGET|Internet\sNinja|Jennybot|JetCar|JOC\sWeb\sSpider|Kenjin.Spider|Keyword.Density|larbin|LeechFTP|Lexibot|libWeb/clsHTTP|LinkextractorPro|LinkScan/8.1a.Unix|LinkWalker|lwp-trivial|Mass\sDownloader|Mata.Hari|Microsoft.URL|MIDown\stool|MIIxpc|Mister.PiX|Mister\sPiX|moget|Mozilla/3.Mozilla/2.01|Mozilla.*NEWT|Navroad|NearSite|NetAnts|NetMechanic|NetSpider|Net\sVampire|NetZIP|NICErsPRO|NPbot|Octopus|Offline.Explorer|Offline\sExplorer|Offline\sNavigator|Openfind|Pagerabber|Papa\sFoto|pavuk|pcBrowser|Program\sShareware\s1|ProPowerbot/2.14|ProWebWalker|ProWebWalker|psbot/0.1|QueryN.Metasearch|ReGet|RepoMonkey|RMA|SiteSnagger|SlySearch|SmartDownload|Spankbot|spanner|Superbot|SuperHTTP|Surfbot|suzuran|Szukacz/1.4|tAkeOut|Teleport|Teleport\sPro|Telesoft|The.Intraformant|TheNomad|TightTwatbot|Titan|toCrawl/UrlDispatcher|toCrawl/UrlDispatcher|True_Robot|turingos|Turnitinbot/1.5|URLy.Warning|VCI|VoidEYE|WebAuto|WebBandit|WebCopier|WebEMailExtrac.*|WebEnhancer|WebFetch|WebGo\sIS|Web.Image.Collector|Web\sImage\sCollector|WebLeacher|WebmasterWorldForumbot|WebReaper|WebSauger|Website\seXtractor|Website.Quester|Website\sQuester|Webster.Pro|WebStripper|Web\sSucker|WebWhacker|WebZip|Wget|Widow|[Ww]eb[Bb]andit|WWW-Collector-E|WWWOFFLE|Xaldon\sWebSpider|Xenu's|Zeus|Googlebot|Googlebot-Mobile|Googlebot-Image|Mediapartners-Google|Adsbot-Google|WISEbot/1.0|mozilla/4.0|msnbot|psbot|Slurp).*
RewriteRule /members(?:/.*) /403.html [I,L]
|
|
Right?
|
|
| |
ac3creative Newbie

Joined: 09 June 2006 Location: United Kingdom
Online Status: Offline Posts: 14
|
| Posted: 16 June 2006 at 6:32am | IP Logged
|
|
|
OK, I just noticed I made a mistake but can't find how to edit the post.
The final line of the code is missing a ? in front of /403.html.
|
|
| |
Lexey Moderator Group

Joined: 15 August 2002 Location: Russian Federation
Online Status: Offline Posts: 8118
|
| Posted: 16 June 2006 at 7:43am | IP Logged
|
|
|
Right (with the addition of the missing question mark).
|
|
| |
ac3creative Newbie

Joined: 09 June 2006 Location: United Kingdom
Online Status: Offline Posts: 14
|
| Posted: 02 July 2006 at 8:19am | IP Logged
|
|
|
Thanks for all the help guys, one last question though.
Having set it all up as suggested and advised, if I use wannabrowser.com I get different results for requesting the folder itself and for a specific file in the folder.
If I ask it to read the folder, I get the desired 403 returned. However, if I ask it to read a file from within the folder, I get shown the file and not a 403.
Is this simply an issue with wannabrowser.com, or will Google still be able to read files in the folder if I am linking directly to them?
Thanks in advance
Iain
|
|
| |
Lexey Moderator Group

Joined: 15 August 2002 Location: Russian Federation
Online Status: Offline Posts: 8118
|
| Posted: 03 July 2006 at 3:59am | IP Logged
|
|
|
Quote:
If i ask it to read the folder, i get the desired 403 returned, however, if i ask it to read a file from within the folder i get shown the file and not a 403?
|
|
|
Show me IIS log records corresponding to requests to a folder and a file (and enable User Agent logging before that).
Quote:
Is this simply an issue with wannabrowser.com, or will Google still be able to read files in the folder if i am linking directly to them.
|
|
|
Why are you talking about direct linking here? Your rule should protect against specific user agents. Not against direct linking.
|
|
| |
ac3creative Newbie

Joined: 09 June 2006 Location: United Kingdom
Online Status: Offline Posts: 14
|
| Posted: 03 July 2006 at 5:06am | IP Logged
|
|
|
OK, lost, lol.
Essentially I need to know: if Google follows a link from a page to a file within this 'blocked' folder, will it still be able to read it, or will it be shown the 403 as desired?
If it gets a 403, all is good. If it can still see the files within the folder after following a direct link, then I need to think of something else.
|
|
| |
Lexey Moderator Group

Joined: 15 August 2002 Location: Russian Federation
Online Status: Offline Posts: 8118
|
| Posted: 04 July 2006 at 6:25pm | IP Logged
|
|
|
What do you mean by "google follows a link from a page to a file within this 'blocked' folder"? If Google tries to index this link, it will receive a 403 (provided the Google indexer's user agent is in your block list).
But if someone opens this link from a Google search result, it will work.
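The distinction can be modelled in a few lines: the decision depends on the User-Agent and the path, never on where the visitor came from. The sketch below is a simplified Python model, not the server's actual logic. The three-name block list, the leading `.*` in the agent pattern, the sample header values and the function name are all assumptions made for illustration, and the rewrite to the 403 page is treated simply as status 403.

```python
import re

# Illustrative subset of a block list; the leading ".*" is an addition so that
# names appearing in the middle of a User-Agent value are also caught.
BLOCKED_AGENTS = re.compile(r".*(?:Googlebot|msnbot|Slurp).*", re.IGNORECASE)
PROTECTED_PATH = re.compile(r"/members(?:/.*)?", re.IGNORECASE)


def status_for(path: str, user_agent: str, referer: str = "") -> int:
    """Return 403 for listed agents on the protected path, otherwise 200.

    The referer argument is accepted only to show that it plays no part.
    """
    if PROTECTED_PATH.fullmatch(path) and BLOCKED_AGENTS.fullmatch(user_agent):
        return 403
    return 200


if __name__ == "__main__":
    crawler = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
    browser = "Mozilla/5.0 (Windows NT 10.0; rv:100.0) Gecko/20100101 Firefox/100.0"
    print(status_for("/members/file.html", crawler))
    print(status_for("/members/file.html", browser, referer="https://www.google.com/search?q=x"))
```

So a crawler on the list is turned away, while a person clicking through from a search result is served the file, which matches the answer above.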
|
|
| |