
2. Introduction

ht://Check is more than a link checker. It is a console application written in C++ for GNU/Linux systems, and it is derived from ht://Dig, a search engine freely available on the Internet under the GNU GPL.

However, ht://Dig is not needed to install or run ht://Check, which is fully independent. The only relationship between the two applications is that part of ht://Check's code is derived from ht://Dig.

ht://Check retrieves information through HTTP/1.1 and stores it in a MySQL database. It is particularly suitable for small Internet domains or intranets.

Its purpose is to help a Webmaster manage one or more related sites: after a "crawl", ht://Check creates a powerful data source built from the information found in the retrieved documents. The kind of information available to the ht://Check user includes:

The htcheck program itself prints only a brief report. At present, most of the information is provided by the PHP interface that comes with the package, which queries the database built by htcheck during a previous crawl. To use it you need a Web server and PHP with the MySQL connectivity module.

Since ht://Check stores the results of a crawl in a database on a MySQL server, any user can in principle build their own information retrieval interface to this database; you only need to know its structure: its tables and fields, and the relationships among them. Possible solutions include independent scripts written in common scripting languages with MySQL connectivity modules (e.g. Perl and Python), faster programs written in C or C++ using the MySQL API or wrapper libraries (such as MySQL++ or dbconnect), and other Web-driven solutions such as JSP or ColdFusion. There is also an interface to ht://Check for the Roxen Internet Software (http://www.roxen.com/), written by Michael Stenitzer (stenitzer@eva.ac.at).
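As a starting point for such a custom interface, the following is a minimal sketch in Python of how one might discover the tables and fields of a crawl database. It is not part of ht://Check: the function name describe_schema is hypothetical, and it assumes a Python DB-API cursor that returns rows as tuples, connected to the MySQL database that htcheck created. Only the standard MySQL statements SHOW TABLES and SHOW COLUMNS are used, so no knowledge of the actual table names is assumed.

```python
def describe_schema(cursor):
    """Return a dict mapping each table name to its list of column names.

    `cursor` is assumed to be a DB-API cursor (tuple rows) on a connection
    to the MySQL database produced by a crawl. This helper is illustrative
    only and is not shipped with ht://Check.
    """
    cursor.execute("SHOW TABLES")
    tables = [row[0] for row in cursor.fetchall()]

    schema = {}
    for table in tables:
        # Quote the identifier with backticks, doubling any embedded backtick.
        quoted = "`" + table.replace("`", "``") + "`"
        cursor.execute("SHOW COLUMNS FROM " + quoted)
        # The first column of each SHOW COLUMNS row is the field name.
        schema[table] = [row[0] for row in cursor.fetchall()]
    return schema
```

With a MySQL driver installed (for example PyMySQL or mysqlclient), one would open a connection to the crawl database, pass its cursor to describe_schema, and print the result to see which tables and fields are available before writing real queries. The host, user, password and database name depend on your own installation.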

One point that should not be underestimated is that ht://Check can, in principle, give the user a great deal of information about the structure of a Web domain: in short, it can be used for Web Structure Mining.

ht://Check is distributed under the GNU General Public License (GPL). See the Copying section for license information.

The main ht://Check Web site is at http://htcheck.sourceforge.net/.

