A robots.txt file tells search engine robots (crawlers) which parts of your site they should not visit.
Using a robots.txt file is easy, but it does require access to your server's root location. For instance, if your site is located at:
http://adomain.com/mysite/index.html
you will need to be able to create a file located here:
http://adomain.com/robots.txt
If you cannot access your server's root location you will not be able to use a robots.txt file to exclude pages from your index.
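Here is a small Python sketch that shows where that file lives for a given page (robots_url is a hypothetical helper name). It resolves the absolute path /robots.txt against the page address, so any subdirectory such as /mysite/ is dropped.

```python
from urllib.parse import urljoin


def robots_url(page_url: str) -> str:
    """Return the address a robot checks for robots.txt before fetching page_url.

    A reference starting with "/" is resolved against the site root, so the
    directory part of page_url (for example /mysite/) is discarded.
    """
    return urljoin(page_url, "/robots.txt")


print(robots_url("http://adomain.com/mysite/index.html"))
# prints http://adomain.com/robots.txt
```

This is why a file placed at http://adomain.com/mysite/robots.txt is never consulted.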
The robots.txt is a plain TEXT file (not HTML!) with a section for each robot to be controlled. Each section has a user-agent line naming the robot, followed by a list of "disallows" and "allows". Each disallow prevents any address that starts with the disallowed string from being accessed. Similarly, each allow permits any address that starts with the allowed string to be accessed. The (dis)allows are scanned in order, and the last match encountered determines whether an address may be used. If there are no matches at all, the address may be used.
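The matching rule described above can be sketched in Python. This is a minimal sketch, assuming plain prefix matching (no wildcards) and exact, case-insensitive robot names; parse_robots and is_allowed are hypothetical helper names. It follows the last-match rule stated here. Be aware that other robots differ: RFC 9309 lets the longest (most specific) matching rule win, and Python's urllib.robotparser uses the first matching rule.

```python
from dataclasses import dataclass, field


@dataclass
class Section:
    """One robots.txt section: the robot names it applies to and its rules."""
    agents: list[str] = field(default_factory=list)
    # Each rule is (is_allow, prefix), kept in file order.
    rules: list[tuple[bool, str]] = field(default_factory=list)


def parse_robots(text: str) -> list[Section]:
    """Split a robots.txt body into sections of user-agent lines and rules."""
    sections: list[Section] = []
    current = None
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if ":" not in line:
            continue
        key, value = (part.strip() for part in line.split(":", 1))
        key = key.lower()
        if key == "user-agent":
            # A user-agent line that follows rules starts a new section.
            if current is None or current.rules:
                current = Section()
                sections.append(current)
            current.agents.append(value.lower())
        elif key in ("allow", "disallow") and current is not None:
            current.rules.append((key == "allow", value))
    return sections


def is_allowed(text: str, agent: str, path: str) -> bool:
    """Decide whether `agent` may fetch `path`; the last matching rule wins."""
    sections = parse_robots(text)
    agent = agent.lower()
    # Use the section naming this robot, else the "*" section.
    chosen = next((s for s in sections if agent in s.agents), None)
    if chosen is None:
        chosen = next((s for s in sections if "*" in s.agents), None)
    if chosen is None:
        return True  # no section applies, so the address may be used
    allowed = True  # no match at all, so the address may be used
    for is_allow, prefix in chosen.rules:
        # An empty value (e.g. "Disallow:") matches nothing.
        if prefix and path.startswith(prefix):
            allowed = is_allow  # a later match overrides an earlier one
    return allowed


ROBOTS = """\
User-agent: *
Disallow: /private/
Allow: /private/public-page.html
"""

print(is_allowed(ROBOTS, "AnyBot", "/private/public-page.html"))  # True
print(is_allowed(ROBOTS, "AnyBot", "/private/secret.html"))       # False
print(is_allowed(ROBOTS, "AnyBot", "/index.html"))                # True
```

Because the last match wins under this rule, the Allow line has to come after the broader Disallow; in the opposite order the public page would be blocked too.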
-
Web site owners use the /robots.txt file to give instructions about their site to web robots; this is called the Robots Exclusion Protocol.
It works like this: a robot wants to visit a Web site URL, say http://www.example.com/welcome.html. Before it does so, it first checks for http://www.example.com/robots.txt, and finds:
User-agent: *
Disallow: /
The "User-agent: *" means this section applies to all robots. The "Disallow: /" tells the robot that it should not visit any pages on the site.
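You can check how a robot reads this example with Python's standard urllib.robotparser module. This is a sketch that feeds the two lines in directly with parse() instead of downloading the file; "MyBot" is a made-up robot name.

```python
from urllib.robotparser import RobotFileParser

# The robots.txt body from the example above.
lines = ["User-agent: *", "Disallow: /"]

parser = RobotFileParser()
parser.parse(lines)  # parse the lines directly instead of fetching the file

# "*" applies to every robot, including the hypothetical "MyBot".
print(parser.can_fetch("MyBot", "http://www.example.com/welcome.html"))
# prints False
```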
There are two important considerations when using /robots.txt:
Robots can ignore your /robots.txt. In particular, malware robots that scan the web for security vulnerabilities, and email address harvesters used by spammers, will pay no attention to it.
The /robots.txt file is publicly available. Anyone can see which sections of your server you don't want robots to use.
-
If you cannot use robots.txt, you can put a robots meta tag in the <head> of each page instead, e.g.:
<meta name="robots" content="noindex, nofollow">
A full explanation of robots.txt and nofollow usage:
http://www.seobook.com/robots-txt-vs...obots-nofollow
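Unlike robots.txt, the robots meta tag is read from the page itself, so a robot only sees it after fetching the page. The sketch below shows how a robot that honors the tag might pick out its directives, using Python's standard html.parser; RobotsMetaParser and robots_directives are hypothetical names.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect directives from <meta name="robots" content="..."> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.directives: set[str] = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser already lowercases tag and attribute names.
        if tag != "meta":
            return
        attributes = {name: (value or "") for name, value in attrs}
        if attributes.get("name", "").lower() == "robots":
            # content is a comma-separated list such as "noindex, nofollow".
            for directive in attributes.get("content", "").split(","):
                if directive.strip():
                    self.directives.add(directive.strip().lower())


def robots_directives(html: str) -> set[str]:
    parser = RobotsMetaParser()
    parser.feed(html)
    parser.close()
    return parser.directives


page = '<html><head><meta name="robots" content="noindex, nofollow"></head></html>'
print(sorted(robots_directives(page)))
# prints ['nofollow', 'noindex']
```

Here "noindex" asks the robot not to index the page and "nofollow" asks it not to follow the page's links.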
-
Here is a definitive site: http://www.robotstxt.org/robotstxt.html
Here is a link to a generator for robots.txt: http://www.basisoft.com/
Make sure you don't exclude search engines from the public areas of your site; you want those to stay available for the public to find.
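As an illustration, a robots.txt that blocks only private areas and leaves the rest of the site open could look like the file below, checked here with Python's urllib.robotparser. The /admin/ and /tmp/ directories and the robot name "AnyBot" are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical layout: /admin/ and /tmp/ are private, everything else is public.
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/
Disallow: /tmp/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

for url in ("http://www.example.com/index.html",
            "http://www.example.com/admin/login.html"):
    # Public pages stay fetchable; only the listed prefixes are blocked.
    print(url, parser.can_fetch("AnyBot", url))
```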