User Agent | | Date Added |
JavaCrawler | | 7/02/2004 23:11:28 |
The JavaCrawler, a prototype next-generation MetaCrawler written in Java, supports most of the features already present in the MetaCrawler. |
JoBo Java Web Robot | | 8/03/2004 0:40:18 |
JoBo is a web site download tool. The core web spider can be used for any purpose. The user agent can be changed by the user. |
Jobot | | 8/03/2004 0:43:30 |
Its purpose is to generate a Resource Discovery database. It is intended to seek out sites of potential "career interest"; hence the name Job Robot. |
JoeBot | | 8/03/2004 0:44:38 |
JoeBot is a generic web crawler implemented as a collection of Java classes which can be used in a variety of applications, including resource discovery, link validation, mirroring, etc. It currently limits itself to one visit per host per minute. |
JoobleBot | | 23/12/2012 14:54:59 |
Jooble indexes jobs from the web. |
JumpStation | | 8/03/2004 0:47:20 |
|
KDD-Explorer | | 8/03/2004 0:52:56 |
KDD-Explorer is used for indexing valuable documents which will be retrieved via an experimental cross-language search engine, CLINKS. This robot was designed at the Knowledge-based Information Processing Laboratory, KDD R&D Laboratories, 1996-1997. |
Keyword Density | | 8/09/2006 0:57:44 |
? |
Kilroy | | 8/03/2004 0:55:58 |
Used to collect data for several projects. Runs constantly and visits sites no faster than once every 90 seconds. |
Knowledge.com | | 7/01/2005 17:15:41 |
|
KO_Yappo_Robot | | 8/03/2004 1:01:34 |
The KO_Yappo_Robot robot is used to build the database for the Yappo search service by k,osawa (part of AOL). The robot runs on random days and visits sites in a random order. |
KomodiaBot | | 26/11/2012 14:15:03 |
|
Krugle | | 29/04/2006 23:52:36 |
The Krugle spider crawls web pages, documents, and archives looking for technical information that would be of value to programmers.
We use the results of the crawl to provide a vertical search service for programmers. Our product page explains how Krugle helps programmers find code and answers to technical questions.
Our spider is based on Nutch (version 0.8 as of April 2006), and uses various open source components to pull down publicly available information via HTTP(S) and FTP.
To be polite, our spider tries to access a given domain via only one thread at any time. In addition, we impose a 5-second minimum delay between requests. |
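The politeness policy described above (at most one thread per domain, with a 5-second minimum delay between requests) is a common crawler pattern. A minimal Python sketch, using invented names (PoliteFetcher, MIN_DELAY) rather than Krugle's actual Nutch-based code:

    # Hypothetical sketch: one in-flight request per domain, 5-second
    # minimum delay between requests to the same domain.
    import threading
    import time
    import urllib.request
    from urllib.parse import urlparse

    MIN_DELAY = 5.0          # minimum seconds between hits on one domain

    class PoliteFetcher:
        def __init__(self):
            self._meta_lock = threading.Lock()   # guards the dicts below
            self._domain_locks = {}              # one lock per domain
            self._last_hit = {}                  # last request time per domain

        def _lock_for(self, domain):
            with self._meta_lock:
                if domain not in self._domain_locks:
                    self._domain_locks[domain] = threading.Lock()
                    self._last_hit[domain] = 0.0
                return self._domain_locks[domain]

        def fetch(self, url):
            domain = urlparse(url).netloc
            with self._lock_for(domain):         # single thread per domain
                wait = MIN_DELAY - (time.time() - self._last_hit[domain])
                if wait > 0:
                    time.sleep(wait)             # honor the minimum delay
                self._last_hit[domain] = time.time()
                with urllib.request.urlopen(url) as resp:
                    return resp.read()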
LabelGrabber | | 8/03/2004 1:05:03 |
The PICS label grabber application searches the WWW for PICS Labels and submits them to the label bureau of your choice. It can be used to populate a label bureau for testing, or for any other label bureau purpose. |
Larbin | | 8/02/2004 0:11:21 |
Larbin is a web crawler (also called (web) robot, spider, scooter...). It is intended to fetch a large number of web pages to fill the database of a search engine. |
LexiBot | | 8/09/2006 11:48:18 |
|
linkdex.com | | 1/12/2010 13:53:05 |
Linkdex is an enterprise-class platform that combines team management software with search engine optimization tools. Sometimes we crawl websites to understand backlink profiles or gather information for our users so they can improve their websites and SEO strategies.
|
LinkScan | | 7/02/2004 14:12:22 |
LinkScan is an industrial-strength link checking and website management tool. LinkScan checks links, validates HTML and creates site maps |
LivelapBot | | 29/08/2014 9:01:20 |
Livelap is a content discovery app that indexes web content. You may have seen the Livelapbot/0.1 or LivelapBot/0.2 crawler in your server logs. LivelapBot visits a page when it is shared on social media, and as part of its RSS/page crawling schedule.
What does LivelapBot collect?
Livelap indexes web content and makes metadata and a link to your content available on livelap.com and in the Livelap app. For indexing we only use official HTML and media meta tags in your page; we don't scrape the contents of your articles. The following fields are used for indexing: title; description; author; publication date; type of content (article, photo, video, etc.); images (og, twitter and other standard tags); videos (og, twitter and other standard tags); RSS links; and whether display of the page in an iframe is allowed. |
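As a rough illustration of the tag-only indexing described above (reading og:, twitter:, and other standard meta tags without scraping the article body), here is a minimal Python sketch; the class name MetaTagExtractor and the field handling are illustrative assumptions, not Livelap's actual code:

    # Hypothetical sketch: collect og:/twitter:/standard meta tags and RSS
    # links from a page's HTML without touching the article body.
    from html.parser import HTMLParser

    class MetaTagExtractor(HTMLParser):
        def __init__(self):
            super().__init__()
            self.fields = {}

        def handle_starttag(self, tag, attrs):
            a = dict(attrs)
            if tag == "meta":
                # og:* tags use the property attribute; twitter:* and
                # standard tags (author, description) use name
                key = a.get("property") or a.get("name")
                if key and key.startswith(("og:", "twitter:", "author", "description")):
                    self.fields[key] = a.get("content", "")
            elif tag == "link" and a.get("type") == "application/rss+xml":
                self.fields.setdefault("rss", []).append(a.get("href", ""))

    extractor = MetaTagExtractor()
    extractor.feed('<meta property="og:title" content="Example"/>'
                   '<link rel="alternate" type="application/rss+xml" href="/feed.xml"/>')
    print(extractor.fields)   # {'og:title': 'Example', 'rss': ['/feed.xml']}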
Lockon | | 8/03/2004 1:15:24 |
This robot gathers only HTML documents. |
logo.gif Crawler | | 24/07/2005 22:51:36 |
A meta-indexing engine for corporate logo graphics. The robot runs at irregular intervals and will only pull a start page and its associated /.*logo\.gif/i (if any). It will be terminated once a statistically significant number of samples has been collected.
logo.gif is part of the design diploma of Markus Weisbeck, and tries to analyze the abundance of the logo metaphor in WWW corporate design. The crawler and image database were written by Sevo Stille and Peter Frank of the Institut für Neue Medien, respectively. |
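The /.*logo\.gif/i pattern mentioned above is a case-insensitive regular expression. A small Python equivalent, with a made-up URL list for illustration:

    # Case-insensitive match for any path containing "logo.gif",
    # mirroring the /.*logo\.gif/i pattern from the entry above.
    import re

    LOGO_RE = re.compile(r".*logo\.gif", re.IGNORECASE)

    urls = ["/img/Logo.GIF", "/assets/header.png", "/LOGO.gif"]
    print([u for u in urls if LOGO_RE.match(u)])   # ['/img/Logo.GIF', '/LOGO.gif']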
lufsbot | | 13/07/2013 9:44:15 |
Fake search engine (no results in search). |
Mac WWWWorm | | 24/07/2005 22:53:56 |
A French keyword-searching robot for the Mac. The author has decided not to release this robot to the public. |
Magpie | | 24/07/2005 22:54:50 |
Used to obtain information from a specified list of web pages for local indexing. Runs every two hours, and visits only a small number of sites. |
meanpathbot | | 28/06/2013 13:56:19 |
meanpath is a new search engine that allows software developers to access detailed snapshots of millions of websites without having to run their own crawlers. Our clients use the information we gather from your site to help solve problems in these areas: semantic analysis, linguistics, identity theft protection, and malware and virus analysis.
|
MediaFox | | 24/07/2005 23:01:53 |
The robot is used to index meta information of a specified set of documents and update a database accordingly. |
MerzScope | | 24/07/2005 23:03:08 |
Robot is part of a Web-Mapping package called MerzScope, to be used mainly by consultants and webmasters to create and publish maps on and of the World Wide Web. |
Mnogosearch | | 7/02/2004 23:13:37 |
mnoGoSearch (formerly known as UdmSearch) is full-featured web search engine software for intranet and internet servers. mnoGoSearch for UNIX is free software covered by the GNU General Public License; mnoGoSearch for Windows is a commercial version. |
Motor | | 25/07/2005 0:48:09 |
The Motor robot is used to build the database for the www.webindex.de search service operated by CyberCon. The robot is under development - it runs at random intervals and visits sites in a priority-driven order (.de/.ch/.at first, root and robots.txt first). |
MS Sharepoint Portal Server | | 7/02/2004 23:12:33 |
|
MSNBot Media | | 13/06/2006 0:06:56 |
|
Muncher | | 25/07/2005 0:50:11 |
Used to build the index for www.goodlookingcooking.co.uk. Seeks out cooking and recipe pages. |
Muscat Ferret | | 25/07/2005 0:54:52 |
Used to build the database for the EuroFerret |
Mwd.Search | | 25/07/2005 0:55:50 |
Robot for indexing Finnish (top-level domain .fi) web pages for a search engine called Fifi. Visits sites in random order. |
NDSpider | | 25/07/2005 0:57:50 |
It is designed to index the web. |
NEC-MeshExplorer | | 24/07/2005 23:04:16 |
The NEC-MeshExplorer robot is used to build the database for the NETPLAZA search service operated by NEC Corporation. The robot searches URLs at sites in Japan (JP domain). The robot runs every day and visits sites in a random order.
A prototype version of this robot was developed at C&C Research Laboratories, NEC Corporation. The current robot (Version 1.0) is based on the prototype and has more functions. |
NetCarta WebMap Engine | | 25/07/2005 0:59:30 |
The NetCarta WebMap Engine is a general purpose, commercial spider. Packaged with a full GUI in the CyberPilot Pro product, it acts as a personal spider that works with a browser to facilitate context-based navigation. The WebMapper product uses the robot to manage a site (site copy, site diff, and extensive link management facilities). All versions can create publishable NetCarta WebMaps, which capture the crawled information. If the robot sees a published map, it will return the published map rather than continuing its crawl. Since this is a personal spider, it will be launched from multiple domains. This robot tends to focus on a particular site. No instance of the robot should have more than one outstanding request to any given site at a time. The User-agent field contains a coded ID identifying the instance of the spider; specific users can be blocked via robots.txt using this ID. |
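The entry above notes that a specific spider instance can be blocked via robots.txt using its coded User-agent ID. A brief sketch of how a crawler checks this with Python's standard urllib.robotparser; the domain and agent ID are invented for illustration:

    # Check whether a robots.txt blocks a given User-agent ID.
    import urllib.robotparser

    rp = urllib.robotparser.RobotFileParser()
    rp.set_url("https://example.com/robots.txt")
    rp.read()   # fetch and parse the robots.txt file

    # If robots.txt contains:
    #   User-agent: NetCarta-12345
    #   Disallow: /
    # then this returns False for that agent ID:
    print(rp.can_fetch("NetCarta-12345", "https://example.com/page.html"))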
NetResearchServer | | 8/02/2004 16:31:37 |
NRS crawls pages all over the world in order to build full-text search indexes and/or to compile lists of search engine forms. |
NetScoop | | 25/07/2005 1:01:18 |
The NetScoop robot is used to build the database for the NetScoop search engine.
The robot has been used in a research project at the Faculty of Engineering, Tokushima University, Japan, since Dec. 1996. |
newscan-online | | 25/07/2005 1:02:31 |
The newscan-online robot is used to build a database for the newscan-online news search service operated by smart information services. The robot runs daily and visits predefined sites in a random order.
This robot has its roots in pre-release news-filtering software for Lotus Notes from 1995. |
NextopiaBOT | | 7/02/2004 23:16:00 |
|
NHSE Web Forager | | 25/07/2005 1:03:15 |
To generate a Resource Discovery database. |
Nomad | | 25/07/2005 1:04:17 |
Developed in 1995 at Colorado State University. |
NutchCVS | | 7/02/2004 23:18:12 |
When we crawl to populate our index, we advertise the "User-agent" string "NutchOrg". If you see the agent "Nutch" or "NutchCVS", that's probably a developer testing a new version of our robot, or someone running their own instance. |
Occam | | 25/07/2005 1:08:07 |
The robot takes high-level queries, breaks them down into multiple web requests, and answers them by combining disparate data gathered in one minute from numerous web sites, or from the robot's cache.
The robot is a descendant of Rodney, an earlier project at the University of Washington. |
omgilibot | | 31/03/2008 17:10:34 |
Crawls forums. |
OpenIntelligenceData | | 3/09/2005 17:15:02 |
Open Intelligence Data™ is a project by Tortuga Group LLC to provide free tools for collecting information for millions of Internet domains. |
Oracle Ultra Search | | 7/02/2004 23:33:14 |
Ultra Search can be used to search across Collaboration Suite Components, corporate Web servers, databases, mail servers, fileservers and Oracle10g Portal instances. |
Orb Search | | 25/07/2005 1:12:12 |
Orbsearch builds the database for Orb Search Engine. It runs when requested. |
Origin | | 7/02/2004 23:40:28 |
Empty user agent |