Tuesday, October 18, 2005

Setting up a Grid – The requirements

Setting up a grid involves many steps. Even the simplest grid requires a good network infrastructure, storage and computational resources, middleware, and resource managers. Other features can be added as required.

Infrastructure Requirements: The Network

The network is the heart of any successful grid implementation. Most grid implementations use a high-speed fiber-optic backbone providing speeds of up to 40 Gbps. If network connectivity is unreliable or bandwidth is insufficient, the grid applications deployed on top of it must communicate as little as possible and tolerate high latency, or the deployment will fail. Ordinary data-intensive or low-latency applications need a high-speed fiber-optic network.

At present, grids are built on high-performance networks. According to Berman et al., by 2002 most of those networks had a backbone of roughly 10 Gbps. Examples include the Abilene Network (USA), the SuperJANET backbone (UK), the GEANT network (intra-Europe), APAN (Asia-Pacific), and others. A given institution connects to the backbone over a link of about 1 Gbps and has a LAN of around 100 Mbps. This gives a 10:1:0.1 Gbps ratio between national, organizational, and desktop links.
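To make the 10:1:0.1 Gbps ratio concrete, here is a minimal sketch that estimates how long a dataset would take to cross each tier. The 1 TB dataset size and the function name are assumptions chosen for illustration, and the calculation ignores protocol overhead and contention, so real transfers would be slower.

```python
# Link speeds taken from the text, in gigabits per second.
LINK_GBPS = {
    "national backbone": 10.0,
    "institutional link": 1.0,
    "desktop LAN": 0.1,
}

# Assumed dataset size for illustration: one decimal terabyte.
DATASET_BYTES = 1e12


def transfer_seconds(size_bytes: float, gbps: float) -> float:
    """Ideal transfer time in seconds, ignoring overhead and contention."""
    bits = size_bytes * 8
    return bits / (gbps * 1e9)


for name, gbps in LINK_GBPS.items():
    secs = transfer_seconds(DATASET_BYTES, gbps)
    print(f"{name}: {secs / 3600:.2f} hours")
```

Under these assumptions the slowest link on the path dominates: a dataset that crosses the backbone in minutes can still take most of a day to reach a desktop.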

A Global Terabit Research Network (GTRN) is expected to improve this ratio many times over. GTRN aims to raise the international, national, organizational, optical desktop, and copper desktop link speeds to 1000:1000:100:10:1 Gbps, a 100-fold increase over the current state. The NSF-funded TeraGrid project has one of the most striking network infrastructures, with four locations spread across the USA linked by a 40 Gbps backbone.
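The GTRN targets can be checked against the current figures with a little arithmetic. The sketch below is only illustrative: the mapping of both GTRN desktop tiers onto today's single 100 Mbps desktop figure is an assumption, and the international tier is left out because the text gives no current value for it.

```python
# Current link speeds from the text, in Mbps (10 Gbps, 1 Gbps, 100 Mbps).
CURRENT_MBPS = {"national": 10_000, "organizational": 1_000, "desktop": 100}

# GTRN targets from the text, in Mbps (1000:1000:100:10:1 Gbps).
GTRN_MBPS = {
    "international": 1_000_000,
    "national": 1_000_000,
    "organizational": 100_000,
    "optical desktop": 10_000,
    "copper desktop": 1_000,
}

# Assumed mapping from each GTRN tier to the current tier it replaces.
BASELINE_FOR = {
    "national": "national",
    "organizational": "organizational",
    "optical desktop": "desktop",
    "copper desktop": "desktop",
}


def speedup_factors() -> dict:
    """Return target speed divided by current speed for each mapped tier."""
    return {
        tier: GTRN_MBPS[tier] / CURRENT_MBPS[base]
        for tier, base in BASELINE_FOR.items()
    }


for tier, factor in speedup_factors().items():
    print(f"{tier}: {factor:.0f}x")
```

Under this mapping the 100-fold figure holds for the national, organizational, and optical desktop tiers, while the copper desktop target is 10 times today's 100 Mbps LAN.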

Hardware Requirements: Shared Resources

Resources are the main building blocks of the grid. The whole concept was introduced to share resources, so the resources should be redundant and highly available. Even simple grid systems combine high-speed, high-capacity data storage with a large amount of computational power. A scientific grid additionally includes specialized analysis, visualization, and scientific equipment. Other specialized grids have further components of their own.

Software Requirements: The Grid Middleware

The software layer between the operating system and the applications is called middleware. It provides a variety of services that an application needs in order to function correctly. Middleware has recently re-emerged as a means of integrating software applications running in distributed heterogeneous environments. The term thus refers to software that is common to multiple applications and builds on the network transport services to enable ready development of new applications and network services. CORBA, for example, defines a middleware standard.

In a grid, the middleware hides the heterogeneous nature of the underlying resources and gives users and applications a homogeneous, seamless environment by providing a set of standardized interfaces to a variety of services [9]. With the use of service-oriented architecture in grid computing (see next section for details), this middleware consists of the services commonly used by grid applications, such as authentication and resource access and management. The Globus Toolkit is one example of such middleware.

With so many resources available, prospective users need a way to look them up, and the resources need to be managed properly. This task is performed by resource managers, which manage resources such as processing power by distributing them among the many applications according to their priority. GRAM, the Globus Resource Allocation Manager, is one such resource manager and an integral part of the Globus Toolkit.
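The idea of distributing processing power by priority can be sketched in a few lines. This is not how GRAM works internally; it is a minimal illustration under assumed names (`JobRequest`, `allocate_by_priority`) and an assumed policy in which higher-priority requests are served first until the CPU pool runs out.

```python
from dataclasses import dataclass


@dataclass
class JobRequest:
    name: str
    priority: int     # assumed convention: higher value = more important
    cpus_wanted: int


def allocate_by_priority(requests, total_cpus):
    """Grant CPUs in descending priority order until the pool is empty."""
    remaining = total_cpus
    grants = {}
    for req in sorted(requests, key=lambda r: r.priority, reverse=True):
        granted = min(req.cpus_wanted, remaining)
        grants[req.name] = granted
        remaining -= granted
    return grants


# Hypothetical workload competing for a pool of 100 CPUs.
requests = [
    JobRequest("climate-model", priority=5, cpus_wanted=64),
    JobRequest("render-batch", priority=1, cpus_wanted=32),
    JobRequest("genome-align", priority=3, cpus_wanted=48),
]
print(allocate_by_priority(requests, total_cpus=100))
```

A strict priority order like this can starve low-priority jobs, which is why production resource managers usually combine priority with quotas or fair-share policies.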

Since the main purpose of most grid implementations is distributed processing, a meta-scheduler, which decides which site's local scheduler should receive each job, is also needed most of the time.
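A meta-scheduler sits above the local schedulers of the individual sites and decides where each job should go. The sketch below shows one possible selection rule, assuming hypothetical `Site` records and a policy of picking the site with the shortest queue among those that have enough free CPUs; real meta-schedulers weigh many more factors.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Site:
    name: str
    free_cpus: int
    queued_jobs: int


def choose_site(sites: List[Site], cpus_needed: int) -> Optional[Site]:
    """Pick the eligible site with the shortest queue, or None if none fit."""
    candidates = [s for s in sites if s.free_cpus >= cpus_needed]
    if not candidates:
        return None
    # Ties on queue length go to the site with more free CPUs.
    return min(candidates, key=lambda s: (s.queued_jobs, -s.free_cpus))


# Hypothetical sites reporting their current state.
sites = [
    Site("site-a", free_cpus=16, queued_jobs=2),
    Site("site-b", free_cpus=128, queued_jobs=7),
    Site("site-c", free_cpus=64, queued_jobs=2),
]
best = choose_site(sites, cpus_needed=32)
print(best.name if best else "no site can run the job")
```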

Most grids also require specialized software applications that make the best possible use of the available grid resources. These applications are usually built on the services provided by the middleware, resource managers, job schedulers, and other components.

Software Requirements: Grid Portals - The User Interface for the Grid

To provide easy access to grid services and resources, a web-portal-like interface to the grid, called the grid portal, was introduced. Just as a web portal lets users reach various resources through a web interface, a grid portal provides access to grid resources. A grid portal uses the web browser as a thin client and thus needs minimal setup on users' machines. A typical grid portal authenticates users, permits them to access remote resources, helps them make decisions about scheduling jobs, and lets them access and manipulate grid-enabled databases and file systems. Portal access can also be personalized through profiles, which are created and stored for each portal user.

Many teams have produced applications and projects that help in developing grid portals, including NPACI HotPage, OGCE, the SDSC GridPort Toolkit, the Mississippi Computational Web Portal, the Lattice Portal, the Grid Portal Development Kit, and many others.

Grid Certification Authority

The Grid Security Model, as described in the Globus Grid Security Infrastructure (GSI), is an extension of public key infrastructure based on X.509 certificates. For day-to-day use, a user generates a short-term proxy from his or her long-term certificate. Grid implementations thus require a Certification Authority to issue certificates to users and hosts. Grid certificates are just like the ordinary certificates used on the Internet, and even the same authorities can be used. For security reasons, however, a grid is expected to have its own certification authorities.
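The relationship between the certification authority, the long-term user certificate, and the short-term proxy can be modeled in a few lines. This is a toy sketch only: it tracks names and validity periods and performs no cryptographic signature checks, and the `Cert` class, the 12-hour default lifetime, the subject names, and the "/CN=proxy" suffix are all assumptions made for illustration, not the GSI implementation.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Cert:
    subject: str
    issuer: str
    not_before: datetime
    not_after: datetime

    def valid_at(self, when: datetime) -> bool:
        return self.not_before <= when <= self.not_after


def issue_proxy(user_cert: Cert, now: datetime,
                lifetime: timedelta = timedelta(hours=12)) -> Cert:
    """Model a proxy issued by the user; it never outlives the user cert."""
    expiry = min(now + lifetime, user_cert.not_after)
    return Cert(
        subject=user_cert.subject + "/CN=proxy",  # illustrative naming
        issuer=user_cert.subject,
        not_before=now,
        not_after=expiry,
    )


def chain_is_valid(proxy: Cert, user_cert: Cert,
                   trusted_ca: str, when: datetime) -> bool:
    """Check names and validity periods only (no signature verification)."""
    return (
        proxy.issuer == user_cert.subject
        and user_cert.issuer == trusted_ca
        and proxy.valid_at(when)
        and user_cert.valid_at(when)
    )


# Hypothetical CA and user names.
CA = "/O=ExampleGrid/CN=Example CA"
user = Cert("/O=ExampleGrid/CN=Alice", CA,
            datetime(2005, 1, 1), datetime(2006, 1, 1))
now = datetime(2005, 10, 18, 9, 0)
proxy = issue_proxy(user, now)

print(chain_is_valid(proxy, user, CA, now + timedelta(hours=1)))
print(chain_is_valid(proxy, user, CA, now + timedelta(hours=13)))
```

The short lifetime is the point of the design: if a proxy leaks, it stops being useful within hours, while the long-term certificate and its private key stay on the user's own machine.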

