How Earth Works

Diagram showing the Earth network topology

Overview

The Earth system consists of three components:

  • The Earth daemon, earthd, running on each computer whose file system is to be indexed
  • A database storing the index
  • A web application for browsing and querying the index from a Web browser

This setup is illustrated in the diagram to the right.

A: Indexing Local File Systems

The Earth daemon runs on each host you want to index, continuously scanning a configured set of directories for updates.
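The idea of repeatedly scanning a directory tree for updates can be illustrated with a minimal sketch. It is not earthd's actual implementation: the function names `snapshot` and `diff` are hypothetical, and comparing two full snapshots is only one simple way to detect changes.

```python
import os

def snapshot(root):
    """Map every file path under root to its (size, mtime) pair."""
    state = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.stat(path)
            except OSError:
                continue  # the file vanished between listing and stat
            state[path] = (st.st_size, st.st_mtime_ns)
    return state

def diff(old, new):
    """Return sorted lists of created, deleted, and modified paths."""
    created = sorted(p for p in new if p not in old)
    deleted = sorted(p for p in old if p not in new)
    modified = sorted(p for p in new if p in old and new[p] != old[p])
    return created, deleted, modified
```

A daemon built along these lines would call `snapshot` on each configured directory at intervals and report the result of `diff` to the index.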

B: Index Storage and Presentation

A single SQL database is used to store the index generated by all the Earth daemons. That SQL database is queried by the Earth Web application in response to client requests, providing various views of the index information.
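As a rough illustration of one shared index serving several hosts, the sketch below uses an in-memory SQLite database as a stand-in for whichever SQL database is deployed. The `files` table, its columns, and the function names are assumptions made for this example, not Earth's real schema.

```python
import sqlite3

def open_index():
    """Create an in-memory index; a real deployment would use a database server."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE files ("
        " host TEXT NOT NULL,"
        " path TEXT NOT NULL,"
        " size INTEGER NOT NULL,"
        " PRIMARY KEY (host, path))"
    )
    return conn

def record(conn, host, path, size):
    """Insert or update one file entry reported by the daemon on `host`."""
    conn.execute(
        "INSERT OR REPLACE INTO files (host, path, size) VALUES (?, ?, ?)",
        (host, path, size),
    )

def usage_by_host(conn):
    """One possible view: file count and total bytes per indexed host."""
    return conn.execute(
        "SELECT host, COUNT(*), SUM(size) FROM files GROUP BY host ORDER BY host"
    ).fetchall()
```

Keying each row on the host and path lets every daemon write to the same table without clashing with entries from other hosts, while the Web application can aggregate across all of them in a single query.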

C: Accessing the Information Using a Web Browser

People across your organization can view and query the index just by pointing their Web browser to the URL of the Earth Web application.

Recommended Setup

Running Earth On a Single Machine

There is no requirement to run the individual components of Earth on different computers. It is fine to run the Earth daemon, the database, the Web application, and the Web browser all on the same host. In fact, this is the setup we use for developing and testing Earth.

Running Earth In a Large Organization

Earth's primary purpose, however, is to index very large file systems (in the terabyte range). In this situation, it is advisable to run the database on a dedicated, fast system, since it will be under high load whenever large numbers of files or directories are created or deleted on an indexed file system.

You should also avoid using the Earth daemon to scan remote file systems. Instead, run the daemon on each host that has local disks you want to index. Usually this means running it on all your file servers, but you might also want to consider indexing workstations or other servers.

