Helicon Tech : ISAPI_Rewrite 2.x
Topic: Using multiple RewriteCond's (Topic Closed)
dirk
Newbie
Joined: 13 April 2003
Location: Belgium
Posts: 2
Posted: 13 April 2003 at 5:56pm

Hello gurus,

I am trying to lock out a number of spambots, rogue spiders and leech engines from our site. Do I have to create a rewrite condition/rule pair for every user agent, or can I just put a list of conditions one after another, using some "OR" operator as in Apache mod_rewrite?
E.g. RewriteCond user-agent: .*grub-client.* [or]
RewriteCond user-agent: .*wget.* [or]
RewriteCond user-agent: .*Teleport.*
- (some more agents) -
RewriteRule .* /block.htm

Thanx in advance for your replies,

Dirk
 
Yaroslav
Admin Group
Joined: 15 August 2002
Posts: 6519
Posted: 14 April 2003 at 3:33am

You can use alternatives in the regular expression. For example, this is a good rule to block bad bots:

RewriteCond User-Agent: (?:Alexibot|asterias|BackDoorBot|Black.Hole|BlackWidow|BlowFish|BotALot|BuiltBotTough|Bullseye|BunnySlippers|Cegbfeieh|CheeseBot|CherryPicker|ChinaClaw|CopyRightCheck|cosmos|Crescent|Custo|DISCo|DittoSpyder|DownloadsDemon|eCatch|EirGrabber|EmailCollector|EmailSiphon|EmailWolf|EroCrawler|ExpresssWebPictures|ExtractorPro|EyeNetIE|FlashGet|Foobot|FrontPage|GetRight|GetWeb!|Go-Ahead-Got-It|Go!Zilla|GrabNet|Grafula|Harvest|hloader|HMView|httplib|HTTrack|humanlinks|ia_archiver|ImagesStripper|ImagesSucker|IndysLibrary|InfonaviRobot|InterGET|Internet\sNinja|JennyBot|JetCar|JOC\sWeb\sSpider|Kenjin.Spider|Keyword.Density|larbin|LeechFTP|LexiBot|libWeb/clsHTTP|LinkextractorPro|LinkScan/8.1a.Unix|LinkWalker|lwp-trivial|Mass\sDownloader|Mata.Hari|Microsoft.URL|MIDown\stool|MIIxpc|Mister.PiX|Mister\sPiX|moget|Mozilla/3.Mozilla/2.01|Mozilla.*NEWT|Navroad|NearSite|NetAnts|NetMechanic|NetSpider|Net\sVampire|NetZIP|NICErsPRO|NPBot|Octopus|Offline.Explorer|Offline\sExplorer|Offline\sNavigator|Openfind|Pagerabber|Papa\sFoto|pavuk|pcBrowser|ProPowerBot/2.14|ProWebWalker|ProWebWalker|QueryN.Metasearch|ReGet|RepoMonkey|RMA|SiteSnagger|SlySearch|SmartDownload|SpankBot|spanner|SuperBot|SuperHTTP|Surfbot|suzuran|Szukacz/1.4|tAkeOut|Teleport|Teleport\sPro|Telesoft|The.Intraformant|TheNomad|TightTwatBot|Titan|toCrawl/UrlDispatcher|toCrawl/UrlDispatcher|True_Robot|turingos|TurnitinBot/1.5|URLy.Warning|VCI|VoidEYE|WebAuto|WebBandit|WebCopier|WebEMailExtrac.*|WebEnhancer|WebFetch|WebGo\sIS|Web.Image.Collector|Web\sImage\sCollector|WebLeacher|WebmasterWorldForumBot|WebReaper|WebSauger|Website\seXtractor|Website.Quester|Website\sQuester|Webster.Pro|WebStripper|Web\sSucker|WebWhacker|WebZip|Wget|Widow|[Ww]eb[Bb]andit|WWW-Collector-E|WWWOFFLE|Xaldon\sWebSpider|Xenu's|Zeus).*
RewriteRule .* /block.htm
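
To see why one alternation can stand in for a chain of [or] conditions, here is a minimal sketch in Python. It uses Python's `re` module as a stand-in for the ISAPI_Rewrite regex engine (an assumption; the two engines are not guaranteed to behave identically), and the short `BAD_AGENTS` list and the sample user-agent strings are made up for illustration.

```python
import re

# Hypothetical short list; the real rule above carries many more names.
BAD_AGENTS = ["grub-client", "Wget", "Teleport"]

# One pattern with alternatives, mirroring the (?:a|b|c).* shape of the rule.
combined = re.compile(r"(?:" + "|".join(re.escape(a) for a in BAD_AGENTS) + r").*")

# The same check written as separate patterns joined by a logical OR.
separate = [re.compile(re.escape(a) + r".*") for a in BAD_AGENTS]


def blocked_combined(user_agent: str) -> bool:
    """True if the single alternation pattern matches from the start."""
    return combined.match(user_agent) is not None


def blocked_separate(user_agent: str) -> bool:
    """True if any one of the per-agent patterns matches from the start."""
    return any(p.match(user_agent) for p in separate)


samples = ["Wget/1.8.2", "Teleport Pro/1.29", "Mozilla/5.0 (X11; Linux)"]
for ua in samples:
    # Both functions give the same answer for every sample.
    print(ua, blocked_combined(ua), blocked_separate(ua))
```

Both helpers agree on every input, which is the point: the alternation expresses the OR inside the pattern, so only one RewriteCond line is needed.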



__________________
Yaroslav Govorunov,
Helicon Tech
 
dirk
Newbie
Joined: 13 April 2003
Location: Belgium
Posts: 2
Posted: 14 April 2003 at 11:50am

Seems to work like a charm. Thanx, Yaroslav. I have put this fine little piece of software on the purchase list of our company.
 
ozbiz
Newbie
Joined: 31 March 2003
Location: Australia
Posts: 34
Posted: 18 September 2003 at 4:50pm

Thanks Yaroslav, this works great.

I have got the rule in the helicon directory to cover multiple sites.

The logs seem to show this works for Art-Online.com+0.9(Beta) but not for Program+Shareware+1.0.3

Here is the rule I am using. I seem to remember reading something about + in my sleep. ;)

#spambots
RewriteCond User-Agent: (?:Alexibot|Art-Online|asterias|BackDoorbot|Black.Hole|BlackWidow|BlowFish|botALot|BuiltbotTough|Bullseye|BunnySlippers|Cegbfeieh|Cheesebot|CherryPicker|ChinaClaw|CopyRightCheck|cosmos|Crescent|Custo|DISCo|DittoSpyder|DownloadsDemon|eCatch|EirGrabber|EmailCollector|EmailSiphon|EmailWolf|EroCrawler|ExpresssWebPictures|ExtractorPro|EyeNetIE|FlashGet|Foobot|FrontPage|GetRight|GetWeb!|Go-Ahead-Got-It|Go!Zilla|GrabNet|Grafula|Harvest|hloader|HMView|httplib|HTTrack|humanlinks|ImagesStripper|ImagesSucker|IndysLibrary|InfonaviRobot|InterGET|Internet\sNinja|Jennybot|JetCar|JOC\sWeb\sSpider|Kenjin.Spider|Keyword.Density|larbin|LeechFTP|Lexibot|libWeb/clsHTTP|LinkextractorPro|LinkScan/8.1a.Unix|LinkWalker|lwp-trivial|Mass\sDownloader|Mata.Hari|Microsoft.URL|MIDown\stool|MIIxpc|Mister.PiX|Mister\sPiX|moget|Mozilla/3.Mozilla/2.01|Mozilla.*NEWT|Navroad|NearSite|NetAnts|NetMechanic|NetSpider|Net\sVampire|NetZIP|NICErsPRO|NPbot|Octopus|Offline.Explorer|Offline\sExplorer|Offline\sNavigator|Openfind|Pagerabber|Papa\sFoto|pavuk|pcBrowser|Program+Shareware+1.0.3|ProPowerbot/2.14|ProWebWalker|ProWebWalker|psbot/0.1|QueryN.Metasearch|ReGet|RepoMonkey|RMA|SiteSnagger|SlySearch|SmartDownload|Spankbot|spanner|Superbot|SuperHTTP|Surfbot|suzuran|Szukacz/1.4|tAkeOut|Teleport|Teleport\sPro|Telesoft|The.Intraformant|TheNomad|TightTwatbot|Titan|toCrawl/UrlDispatcher|toCrawl/UrlDispatcher|True_Robot|turingos|Turnitinbot/1.5|URLy.Warning|VCI|VoidEYE|WebAuto|WebBandit|WebCopier|WebEMailExtrac.*|WebEnhancer|WebFetch|WebGo\sIS|Web.Image.Collector|Web\sImage\sCollector|WebLeacher|WebmasterWorldForumbot|WebReaper|WebSauger|Website\seXtractor|Website.Quester|Website\sQuester|Webster.Pro|WebStripper|Web\sSucker|WebWhacker|WebZip|Wget|Widow|[Ww]eb[Bb]andit|WWW-Collector-E|WWWOFFLE|Xaldon\sWebSpider|Xenu's|Zeus).*
RewriteRule .* /block.asp



__________________
Jim
 
Yaroslav
Admin Group
Joined: 15 August 2002
Posts: 6519
Posted: 19 September 2003 at 6:21am

#spambots
RewriteCond User-Agent: (?:Alexibot|Art-Online|asterias|BackDoorbot|Black.Hole|BlackWidow|BlowFish|botALot|BuiltbotTough|Bullseye|BunnySlippers|Cegbfeieh|Cheesebot|CherryPicker|ChinaClaw|CopyRightCheck|cosmos|Crescent|Custo|DISCo|DittoSpyder|DownloadsDemon|eCatch|EirGrabber|EmailCollector|EmailSiphon|EmailWolf|EroCrawler|ExpresssWebPictures|ExtractorPro|EyeNetIE|FlashGet|Foobot|FrontPage|GetRight|GetWeb!|Go-Ahead-Got-It|Go!Zilla|GrabNet|Grafula|Harvest|hloader|HMView|httplib|HTTrack|humanlinks|ImagesStripper|ImagesSucker|IndysLibrary|InfonaviRobot|InterGET|Internet\sNinja|Jennybot|JetCar|JOC\sWeb\sSpider|Kenjin.Spider|Keyword.Density|larbin|LeechFTP|Lexibot|libWeb/clsHTTP|LinkextractorPro|LinkScan/8.1a.Unix|LinkWalker|lwp-trivial|Mass\sDownloader|Mata.Hari|Microsoft.URL|MIDown\stool|MIIxpc|Mister.PiX|Mister\sPiX|moget|Mozilla/3.Mozilla/2.01|Mozilla.*NEWT|Navroad|NearSite|NetAnts|NetMechanic|NetSpider|Net\sVampire|NetZIP|NICErsPRO|NPbot|Octopus|Offline.Explorer|Offline\sExplorer|Offline\sNavigator|Openfind|Pagerabber|Papa\sFoto|pavuk|pcBrowser|Program+Shareware+1.0.3|ProPowerbot/2.14|ProWebWalker|ProWebWalker|psbot/0.1|QueryN.Metasearch|ReGet|RepoMonkey|RMA|SiteSnagger|SlySearch|SmartDownload|Spankbot|spanner|Superbot|SuperHTTP|Surfbot|suzuran|Szukacz/1.4|tAkeOut|Teleport|Teleport\sPro|Telesoft|The.Intraformant|TheNomad|TightTwatbot|Titan|toCrawl/UrlDispatcher|toCrawl/UrlDispatcher|True_Robot|turingos|Turnitinbot/1.5|URLy.Warning|VCI|VoidEYE|WebAuto|WebBandit|WebCopier|WebEMailExtrac.*|WebEnhancer|WebFetch|WebGo\sIS|Web.Image.Collector|Web\sImage\sCollector|WebLeacher|WebmasterWorldForumbot|WebReaper|WebSauger|Website\seXtractor|Website.Quester|Website\sQuester|Webster.Pro|WebStripper|Web\sSucker|WebWhacker|WebZip|Wget|Widow|[Ww]eb[Bb]andit|WWW-Collector-E|WWWOFFLE|Xaldon\sWebSpider|Xenu's|Zeus|Program\sShareware).*
RewriteRule .* /block.asp



__________________
Yaroslav Govorunov,
Helicon Tech
 
Yaroslav
Admin Group
Joined: 15 August 2002
Posts: 6519
Posted: 19 September 2003 at 6:33am

Sorry, I missed something. Here is another one:

#spambots
RewriteCond User-Agent: (?:Alexibot|Art-Online|asterias|BackDoorbot|Black.Hole|BlackWidow|BlowFish|botALot|BuiltbotTough|Bullseye|BunnySlippers|Cegbfeieh|Cheesebot|CherryPicker|ChinaClaw|CopyRightCheck|cosmos|Crescent|Custo|DISCo|DittoSpyder|DownloadsDemon|eCatch|EirGrabber|EmailCollector|EmailSiphon|EmailWolf|EroCrawler|ExpresssWebPictures|ExtractorPro|EyeNetIE|FlashGet|Foobot|FrontPage|GetRight|GetWeb!|Go-Ahead-Got-It|Go!Zilla|GrabNet|Grafula|Harvest|hloader|HMView|httplib|HTTrack|humanlinks|ImagesStripper|ImagesSucker|IndysLibrary|InfonaviRobot|InterGET|Internet\sNinja|Jennybot|JetCar|JOC\sWeb\sSpider|Kenjin.Spider|Keyword.Density|larbin|LeechFTP|Lexibot|libWeb/clsHTTP|LinkextractorPro|LinkScan/8.1a.Unix|LinkWalker|lwp-trivial|Mass\sDownloader|Mata.Hari|Microsoft.URL|MIDown\stool|MIIxpc|Mister.PiX|Mister\sPiX|moget|Mozilla/3.Mozilla/2.01|Mozilla.*NEWT|Navroad|NearSite|NetAnts|NetMechanic|NetSpider|Net\sVampire|NetZIP|NICErsPRO|NPbot|Octopus|Offline.Explorer|Offline\sExplorer|Offline\sNavigator|Openfind|Pagerabber|Papa\sFoto|pavuk|pcBrowser|Program\sShareware\s1|ProPowerbot/2.14|ProWebWalker|ProWebWalker|psbot/0.1|QueryN.Metasearch|ReGet|RepoMonkey|RMA|SiteSnagger|SlySearch|SmartDownload|Spankbot|spanner|Superbot|SuperHTTP|Surfbot|suzuran|Szukacz/1.4|tAkeOut|Teleport|Teleport\sPro|Telesoft|The.Intraformant|TheNomad|TightTwatbot|Titan|toCrawl/UrlDispatcher|toCrawl/UrlDispatcher|True_Robot|turingos|Turnitinbot/1.5|URLy.Warning|VCI|VoidEYE|WebAuto|WebBandit|WebCopier|WebEMailExtrac.*|WebEnhancer|WebFetch|WebGo\sIS|Web.Image.Collector|Web\sImage\sCollector|WebLeacher|WebmasterWorldForumbot|WebReaper|WebSauger|Website\seXtractor|Website.Quester|Website\sQuester|Webster.Pro|WebStripper|Web\sSucker|WebWhacker|WebZip|Wget|Widow|[Ww]eb[Bb]andit|WWW-Collector-E|WWWOFFLE|Xaldon\sWebSpider|Xenu's|Zeus).*
RewriteRule .* /block.asp
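
A small sketch of why the `+` version did not match, assuming the bot really sends spaces and the log merely displays them as `+`. It again uses Python's `re` as a stand-in for the ISAPI_Rewrite engine, and the test strings are made up. In a regular expression an unescaped `+` means "one or more of the preceding character", so it never matches a space or a literal plus sign.

```python
import re

# User agent as the bot presumably sends it, and as the log displays it.
sent = "Program Shareware 1.0.3"
logged = sent.replace(" ", "+")

# '+' here is a quantifier on the preceding letter, not a literal character.
plus_pattern = re.compile(r"Program+Shareware+1.0.3")
# '\s' matches the whitespace actually present in the header.
space_pattern = re.compile(r"Program\sShareware\s1")

print(bool(plus_pattern.match(sent)))    # False: a space follows "Program"
print(bool(plus_pattern.match(logged)))  # False: a literal '+' is not matched either
print(bool(space_pattern.match(sent)))   # True

# What the '+' pattern does accept: repeated letters and any character for each '.'
print(bool(plus_pattern.match("ProgrammShareware1x0y3")))  # True
```

If a literal plus sign ever did need matching, it would have to be written as `\+`.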



__________________
Yaroslav Govorunov,
Helicon Tech
 
ozbiz
Newbie
Joined: 31 March 2003
Location: Australia
Posts: 34
Posted: 19 September 2003 at 5:36pm

Thanks Yaroslav,

You are a legend.

 



__________________
Jim
 
ac3creative
Newbie
Joined: 09 June 2006
Location: United Kingdom
Posts: 14
Posted: 09 June 2006 at 4:44am

Sorry to resurrect a real oldie, but as I noticed it is referenced around the forum, I figured it made more sense than starting again.

I'm trying to block some bots from certain folders / subdomains. The code above shows me how to block them from the whole site, but I only want them kept away from specific areas of the site.

Ideally I am trying to block all bots, so is there a better way to block them all rather than having to list them all?

I've been advised to show them a 403 rather than direct them off to a specific page. Would I simply edit the RewriteRule .* /block.asp to the default 403 URL of my server?

Thanks in advance

 
Lexey
Moderator Group
Joined: 15 August 2002
Location: Russian Federation
Posts: 8118
Posted: 11 June 2006 at 9:36am

ac3creative wrote:

I'm trying to block some bots from certain folders / subdomains. The code above shows me how to block them from the whole site, but I only want them kept away from specific areas of the site.




The rule could look like this:
RewriteCond User-Agent: (?:Alexibot|asterias|BackDoorBot|...others here)
RewriteCond Host: (subdomain1|subdomain2|...)
RewriteRule /(?:folder1|folder2|...) /403.asp [I,L]
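
The rule above fires only when both conditions and the rule pattern match together. A minimal Python sketch of that AND logic follows. The concrete names are hypothetical stand-ins for the elided lists (`example.com`, the two folder names, the three agents), the trailing `(?:/.*)?` on the path is my addition, and `fullmatch` reflects an assumption that ISAPI_Rewrite 2.x compares each pattern against the whole value.

```python
import re

# Hypothetical stand-ins for the "..." parts of the rule above.
AGENT = re.compile(r"(?:Alexibot|asterias|BackDoorBot).*", re.IGNORECASE)
HOST = re.compile(r"(?:subdomain1|subdomain2)\.example\.com", re.IGNORECASE)
PATH = re.compile(r"/(?:folder1|folder2)(?:/.*)?", re.IGNORECASE)


def should_block(user_agent: str, host: str, path: str) -> bool:
    """Return True only if the agent, the host and the path all match."""
    checks = ((AGENT, user_agent), (HOST, host), (PATH, path))
    return all(pattern.fullmatch(value) is not None for pattern, value in checks)


# A listed agent on a listed subdomain and folder is blocked.
print(should_block("Alexibot/1.0", "subdomain1.example.com", "/folder1/page.asp"))
# The same agent on another host is not.
print(should_block("Alexibot/1.0", "www.example.com", "/folder1/page.asp"))
# An unlisted agent is never blocked by this rule.
print(should_block("Mozilla/5.0 (X11; Linux)", "subdomain1.example.com", "/folder1/"))
```

Dropping the Host condition, as asked later in the thread, simply removes one of the three checks.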

Quote:

Ideally I am trying to block all bots, so is there a better way to block them all rather than having to list them all?



And how are you going to distinguish a bot from a non-bot?
 
ac3creative
Newbie
Joined: 09 June 2006
Location: United Kingdom
Posts: 14
Posted: 11 June 2006 at 5:24pm

Hi Lexey,

and thanks for the quick attention to this.

As for knowing whether a bot is a bot, I'm really not sure. I've never had to do anything like this before, as I'm generally trying to get bots to the site, not block them.

I guess if I have to list them all then so be it. I'll pull the list you posted above and add any others as I come across them.

If I use the code above:

Quote:


Rule could be like this:
RewriteCond User-Agent: (?:Alexibot|asterias|BackDoorBot|...others here)
RewriteCond Host: (subdomain1|subdomain2|...)
RewriteRule /(?:folder1|folder2|...) /403.asp [I,L]


Do I take it that if the folder is at www.domain.com/folder-to-block/, I simply put the name of the folder where you have (?:folder1) and get rid of the RewriteCond Host: (subdomain1)?

Do I also assume that if I put the subdomain into the Host condition, the folder that gets blocked would be the one of that name on the subdomain?

I'm not worried about using a subdomain if I can get away with blocking a folder, so would I be correct in assuming that the following would work for a folder on the main domain?

Quote:


RewriteCond User-Agent: (?:Alexibot|asterias|BackDoorBot|...others here)
RewriteRule /(?:folder1|folder2|...) /403.asp [I,L]

Thanks again

 
Lexey
Moderator Group
Joined: 15 August 2002
Location: Russian Federation
Posts: 8118
Posted: 12 June 2006 at 12:23pm

All your assumptions are correct.
 
ac3creative
Newbie
Joined: 09 June 2006
Location: United Kingdom
Posts: 14
Posted: 15 June 2006 at 1:37pm

OK, I think I have it all sorted now with my host, but could someone please check that this code in my httpd.ini file is correct before I start sending traffic to the site? I absolutely do not want any robots having access to the /members/ folder on my domain.

Quote:

[ISAPI_Rewrite]
 
# 3600 = 1 hour
CacheClockRate 3600
 
RepeatLimit 32
 
# Block external access to the httpd.ini and httpd.parse.errors files
RewriteRule /httpd(?:\.ini|\.parse\.errors).* / [F,I,O]
# Block external access to the Helper ISAPI Extension
RewriteRule .*\.isrwhlp / [F,I,O]
#botblock
RewriteCond User-Agent: (?:Alexibot|Art-online|asterias|BackDoorbot|Black.Hole|BlackWidow|BlowFish|botALot|BuiltbotTough|Bullseye|BunnySlippers|Cegbfeieh|Cheesebot|CherryPicker|ChinaClaw|CopyRightCheck|cosmos|Crescent|Custo|DISCo|DittoSpyder|DownloadsDemon|eCatch|EirGrabber|EmailCollector|EmailSiphon|EmailWolf|EroCrawler|ExpresssWebPictures|ExtractorPro|EyeNetIE|FlashGet|Foobot|FrontPage|GetRight|GetWeb!|Go-Ahead-Got-It|Go!Zilla|GrabNet|Grafula|Harvest|hloader|HMView|httplib|HTTrack|humanlinks|ImagesStripper|ImagesSucker|IndysLibrary|InfonaviRobot|InterGET|Internet\sNinja|Jennybot|JetCar|JOC\sWeb\sSpider|Kenjin.Spider|Keyword.Density|larbin|LeechFTP|Lexibot|libWeb/clsHTTP|LinkextractorPro|LinkScan/8.1a.Unix|LinkWalker|lwp-trivial|Mass\sDownloader|Mata.Hari|Microsoft.URL|MIDown\stool|MIIxpc|Mister.PiX|Mister\sPiX|moget|Mozilla/3.Mozilla/2.01|Mozilla.*NEWT|Navroad|NearSite|NetAnts|NetMechanic|NetSpider|Net\sVampire|NetZIP|NICErsPRO|NPbot|Octopus|Offline.Explorer|Offline\sExplorer|Offline\sNavigator|Openfind|Pagerabber|Papa\sFoto|pavuk|pcBrowser|Program\sShareware\s1|ProPowerbot/2.14|ProWebWalker|ProWebWalker|psbot/0.1|QueryN.Metasearch|ReGet|RepoMonkey|RMA|SiteSnagger|SlySearch|SmartDownload|Spankbot|spanner|Superbot|SuperHTTP|Surfbot|suzuran|Szukacz/1.4|tAkeOut|Teleport|Teleport\sPro|Telesoft|The.Intraformant|TheNomad|TightTwatbot|Titan|toCrawl/UrlDispatcher|toCrawl/UrlDispatcher|True_Robot|turingos|Turnitinbot/1.5|URLy.Warning|VCI|VoidEYE|WebAuto|WebBandit|WebCopier|WebEMailExtrac.*|WebEnhancer|WebFetch|WebGo\sIS|Web.Image.Collector|Web\sImage\sCollector|WebLeacher|WebmasterWorldForumbot|WebReaper|WebSauger|Website\seXtractor|Website.Quester|Website\sQuester|Webster.Pro|WebStripper|Web\sSucker|WebWhacker|WebZip|Wget|Widow|[Ww]eb[Bb]andit|WWW-Collector-E|WWWOFFLE|Xaldon\sWebSpider|Xenu's|Zeus|Googlebot|Googlebot-Mobile|Googlebot-Image|Mediapartners-Google|Adsbot-Google|WISEbot/1.0|mozilla/4.0|msnbot|psbot|Slurp)
RewriteRule /(?:members) /403.html [I,L]

 
Thanks in advance
 
Lexey
Moderator Group
Joined: 15 August 2002
Location: Russian Federation
Posts: 8118
Posted: 16 June 2006 at 6:26am

Rule should be like:

RewriteCond User-Agent: (?:Alexibot|Art-online|asterias|...).*
RewriteRule /members(?:/.*)? /403.html [I,L]
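
A hedged illustration of what this correction changes. It assumes that ISAPI_Rewrite 2.x compares a pattern against the entire request path, which is inferred from the added `.*` and `(?:/.*)?` rather than stated in this thread; Python's `re.fullmatch` and `re.IGNORECASE` stand in for that behaviour and for the `I` flag. The sample URLs are made up.

```python
import re

# The pattern from the httpd.ini posted earlier, and the corrected one.
original = re.compile(r"/(?:members)", re.IGNORECASE)
corrected = re.compile(r"/members(?:/.*)?", re.IGNORECASE)

urls = ["/members", "/members/", "/members/report.pdf", "/membership"]

for url in urls:
    # Whole-string matching: the pattern must cover the URL from start to end.
    print(url, bool(original.fullmatch(url)), bool(corrected.fullmatch(url)))
```

Under that assumption the original pattern covers only the bare `/members` URL, while the corrected one also covers everything beneath the folder and still leaves `/membership` alone.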
 
ac3creative
Newbie
Joined: 09 June 2006
Location: United Kingdom
Posts: 14
Posted: 16 June 2006 at 6:30am

So, it should be:

Quote:

[ISAPI_Rewrite]
 
# 3600 = 1 hour
CacheClockRate 3600
 
RepeatLimit 32
 
# Block external access to the httpd.ini and httpd.parse.errors files
RewriteRule /httpd(?:\.ini|\.parse\.errors).* / [F,I,O]
# Block external access to the Helper ISAPI Extension
RewriteRule .*\.isrwhlp / [F,I,O]
#botblock
RewriteCond User-Agent: (?:Alexibot|Art-online|asterias|BackDoorbot|Black.Hole|BlackWidow|BlowFish|botALot|BuiltbotTough|Bullseye|BunnySlippers|Cegbfeieh|Cheesebot|CherryPicker|ChinaClaw|CopyRightCheck|cosmos|Crescent|Custo|DISCo|DittoSpyder|DownloadsDemon|eCatch|EirGrabber|EmailCollector|EmailSiphon|EmailWolf|EroCrawler|ExpresssWebPictures|ExtractorPro|EyeNetIE|FlashGet|Foobot|FrontPage|GetRight|GetWeb!|Go-Ahead-Got-It|Go!Zilla|GrabNet|Grafula|Harvest|hloader|HMView|httplib|HTTrack|humanlinks|ImagesStripper|ImagesSucker|IndysLibrary|InfonaviRobot|InterGET|Internet\sNinja|Jennybot|JetCar|JOC\sWeb\sSpider|Kenjin.Spider|Keyword.Density|larbin|LeechFTP|Lexibot|libWeb/clsHTTP|LinkextractorPro|LinkScan/8.1a.Unix|LinkWalker|lwp-trivial|Mass\sDownloader|Mata.Hari|Microsoft.URL|MIDown\stool|MIIxpc|Mister.PiX|Mister\sPiX|moget|Mozilla/3.Mozilla/2.01|Mozilla.*NEWT|Navroad|NearSite|NetAnts|NetMechanic|NetSpider|Net\sVampire|NetZIP|NICErsPRO|NPbot|Octopus|Offline.Explorer|Offline\sExplorer|Offline\sNavigator|Openfind|Pagerabber|Papa\sFoto|pavuk|pcBrowser|Program\sShareware\s1|ProPowerbot/2.14|ProWebWalker|ProWebWalker|psbot/0.1|QueryN.Metasearch|ReGet|RepoMonkey|RMA|SiteSnagger|SlySearch|SmartDownload|Spankbot|spanner|Superbot|SuperHTTP|Surfbot|suzuran|Szukacz/1.4|tAkeOut|Teleport|Teleport\sPro|Telesoft|The.Intraformant|TheNomad|TightTwatbot|Titan|toCrawl/UrlDispatcher|toCrawl/UrlDispatcher|True_Robot|turingos|Turnitinbot/1.5|URLy.Warning|VCI|VoidEYE|WebAuto|WebBandit|WebCopier|WebEMailExtrac.*|WebEnhancer|WebFetch|WebGo\sIS|Web.Image.Collector|Web\sImage\sCollector|WebLeacher|WebmasterWorldForumbot|WebReaper|WebSauger|Website\seXtractor|Website.Quester|Website\sQuester|Webster.Pro|WebStripper|Web\sSucker|WebWhacker|WebZip|Wget|Widow|[Ww]eb[Bb]andit|WWW-Collector-E|WWWOFFLE|Xaldon\sWebSpider|Xenu's|Zeus|Googlebot|Googlebot-Mobile|Googlebot-Image|Mediapartners-Google|Adsbot-Google|WISEbot/1.0|mozilla/4.0|msnbot|psbot|Slurp).*
RewriteRule /members(?:/.*) /403.html [I,L]

 
Right?
 
ac3creative
Newbie
Joined: 09 June 2006
Location: United Kingdom
Posts: 14
Posted: 16 June 2006 at 6:32am

OK, I just noticed I made a mistake but can't find how to edit the post.

The final line of the code is missing a ? in front of /403.html

 
Lexey
Moderator Group
Joined: 15 August 2002
Location: Russian Federation
Posts: 8118
Posted: 16 June 2006 at 7:43am

Right (with the addition of the missing question mark).
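
For reference, a short sketch of what the question mark does. Python's `re.fullmatch` is used on the assumption that ISAPI_Rewrite 2.x matches the whole URL; that assumption is mine and is not confirmed in this thread.

```python
import re

# Without '?', the group (?:/.*) is mandatory: a slash must follow "/members".
without_q = re.compile(r"/members(?:/.*)", re.IGNORECASE)
# With '?', the group is optional, so the bare folder name matches as well.
with_q = re.compile(r"/members(?:/.*)?", re.IGNORECASE)

for url in ["/members", "/members/", "/members/page.html"]:
    print(url, bool(without_q.fullmatch(url)), bool(with_q.fullmatch(url)))
```

The two patterns differ only for a request to `/members` with no trailing slash, which the version without `?` lets through.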
 
ac3creative
Newbie
Joined: 09 June 2006
Location: United Kingdom
Posts: 14
Posted: 02 July 2006 at 8:19am

Thanks for all the help guys, one last question though.

Having set it all up as suggested and advised, if I use wannabrowser.com I get different results for the folder itself and for a specific file in the folder.

If I ask it to read the folder, I get the desired 403 returned. However, if I ask it to read a file from within the folder, I get shown the file and not a 403.

Is this simply an issue with wannabrowser.com, or will Google still be able to read files in the folder if I am linking directly to them?

Thanks in advance

Iain

 
Lexey
Moderator Group
Joined: 15 August 2002
Location: Russian Federation
Posts: 8118
Posted: 03 July 2006 at 3:59am

Quote:

If I ask it to read the folder, I get the desired 403 returned. However, if I ask it to read a file from within the folder, I get shown the file and not a 403.


Show me IIS log records corresponding to requests to a folder and a file (and enable User Agent logging before that).

Quote:

Is this simply an issue with wannabrowser.com, or will Google still be able to read files in the folder if I am linking directly to them?


Why are you talking about direct linking here? Your rule protects against specific user agents, not against direct linking.
 
ac3creative
Newbie
Joined: 09 June 2006
Location: United Kingdom
Posts: 14
Posted: 03 July 2006 at 5:06am

OK, lost, lol.

Essentially I need to know: if Google follows a link from a page to a file within this 'blocked' folder, will it still be able to read it, or will it be shown the 403 as desired?

If it gets a 403, all is good. If it can still see the files within the folder after following a direct link, then I need to think of something else.

 
Lexey
Moderator Group
Joined: 15 August 2002
Location: Russian Federation
Posts: 8118
Posted: 04 July 2006 at 6:25pm

What do you mean by "Google follows a link from a page to a file within this 'blocked' folder"? If Google tries to index this link, it will receive a 403 (provided the Google indexer's user agent is in your block list).
But if someone opens this link from a Google search result, it will work.
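
The distinction can be sketched in Python as follows. This is an illustration under assumptions, not the actual ISAPI_Rewrite logic: `status_for` is a hypothetical helper, the three-name block list is a shortened stand-in for the full list, the user-agent strings are examples, and returning 403 or 200 directly simplifies what the rewrite to /403.html does.

```python
import re

# Shortened, hypothetical block list and the protected folder pattern.
BLOCKED_AGENTS = re.compile(r"(?:Googlebot|msnbot|Slurp).*", re.IGNORECASE)
PROTECTED = re.compile(r"/members(?:/.*)?", re.IGNORECASE)


def status_for(user_agent: str, path: str, referer: str = "") -> int:
    """Decide on the User-Agent and path only; the referer is never consulted."""
    if PROTECTED.fullmatch(path) and BLOCKED_AGENTS.fullmatch(user_agent):
        return 403
    return 200


crawler = "Googlebot/2.1 (+http://www.google.com/bot.html)"
visitor = "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.0.4) Gecko/20060508 Firefox/1.5.0.4"

# The crawler is refused for a file inside the folder.
print(status_for(crawler, "/members/report.pdf"))
# A person clicking the same link in a search result is served normally.
print(status_for(visitor, "/members/report.pdf", referer="http://www.google.com/search"))
```

Keeping human visitors out of the folder as well would need a different mechanism, such as authentication, since the rule looks only at the User-Agent header.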