GrangeNet Briefing Document
The Grid
Grid computing has become the leading candidate for so-called
next-generation computing. It has arisen in response to the convergence of
many current trends, in particular:
- the increase in available bandwidth,
- the increase in computing power on desktops,
- the improvement in virtualizing networks,
- the growing confidence with cluster and distributed computing, and
- the vast increase in the amount of data being created and becoming
available on-line.
All these factors have led to a move to de-localize computing. Researchers doing
major computations or data analyses already use remote systems and/or
obtain data from remote sources, and the vision for the future is that
intelligent software will automatically move computations or data to the
most appropriate location, taking into account both computation and network
costs, before returning a result to the user.
This vision is what has come to be known as Grid computing.
The Grid computing paradigm provides for both local access to remote data
and computational resources and remote access to remote resources.
Depending on the actual work being done,
programs can either import remote data or export computing load.
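
The choice between importing remote data and exporting computing load can be sketched as a simple cost comparison. The following is a minimal illustration only: the `Site` fields, the cost model (transfer time plus compute time, ignoring the cost of returning the result) and all of the numbers are assumptions made for the example, not part of any Grid toolkit.

```python
from dataclasses import dataclass


@dataclass
class Site:
    """A candidate location for running a job (all fields are illustrative)."""
    name: str
    compute_seconds: float         # estimated run time of the job at this site
    link_bytes_per_second: float   # effective transfer rate from the data to this site
    holds_data: bool               # True if the input data already lives here


def total_cost_seconds(site: Site, data_bytes: float) -> float:
    """Estimated time to result: move the data if needed, then compute."""
    transfer = 0.0 if site.holds_data else data_bytes / site.link_bytes_per_second
    return transfer + site.compute_seconds


def choose_site(sites: list[Site], data_bytes: float) -> Site:
    """Pick the site with the lowest estimated time to result."""
    return min(sites, key=lambda s: total_cost_seconds(s, data_bytes))


# Hypothetical sites: a slow desktop far from the data, and a cluster next to it.
desktop = Site("desktop", compute_seconds=7200.0,
               link_bytes_per_second=3e6, holds_data=False)
remote = Site("remote-cluster", compute_seconds=600.0,
              link_bytes_per_second=3e6, holds_data=True)

# For a 1 GB data set, importing the data to the desktop adds transfer time on
# top of the longer compute time, so exporting the computation is cheaper here.
best = choose_site([desktop, remote], data_bytes=1e9)
print(best.name)
```

A real scheduler would also have to weigh queue times, the size of the result, and access policy; the point here is only that both network and computation costs enter the decision.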
The Grid is sometimes called the next-generation Web. The Web makes information
available in a transparent and user-friendly way. The Grid
goes one step further in that it enables members of a dynamic,
multi-institutional virtual organisation to share distributed computing
resources to solve an agreed set of problems in a managed and coordinated
fashion. When using the Web, users are often unaware and unconcerned as to
where the information they are viewing is located. Similarly with the Grid,
users should be unaware whether they are using the
computer or data on their own desktop or any other computer or resource
connected to the international network. Users get the resources they need,
anytime, and from anywhere, with the complexity of the Grid infrastructure
being hidden from them.
Experience and know-how have to be built up in linking tens of thousands of
commodity components into tiers of varying complexity (from tens of
thousands down to a few tens of nodes linked to the Grid). These managed
components include CPUs, disks, network switches, and large-scale mass
storage, plus the staff and other resources needed to make the whole setup
function.
Issues of scale, efficiency and performance, resilience, fault tolerance,
total cost (acquisition, maintenance, operation), usability, and security
have to be taken into account.
The technology needed to implement the grid includes new protocols, services,
and APIs for secure resource access, resource management, fault detection,
and communication. It also introduces application concepts such as
virtual data, smart instruments, collaborative design spaces, and
meta-computations.
To achieve this, a global standard for connectivity needs to be
established. One is being formed at the moment, with most developers using
the Globus Toolkit as the basis for further enhancements. To become a
participant, The University of Melbourne needs to provide comprehensive,
co-ordinated support to our researchers working in this area.
In Australia, the federal government has funded GrangeNet, among others, as
a vehicle for allowing Australian universities to develop both their own and
collaborative Grid projects.
GrangeNet
- What is GrangeNet?
  - Physically, it's a high-speed trunk joining member institutions.
  - It also has a major interest in supporting Grid deployment
  - Follows the lead of the US and EU where there are many large
    government-sponsored projects.
  - All traffic between members is allowed (i.e. deemed educational)
- Why has it been established?
  - Improve collaboration between Universities and between researchers
  - Support distributed visualization
  - Improve data-sharing
  - Support "Capability Computing"
  - Support more efficient computing
- But...
  - Need to deliver to the desktop
  - Capability must come first - users follow
  - Expensive - need to plan ahead
Local Network
- Current
  - Redundant central core
  - Mirrors - e.g. Softdist
  - Proxy servers
  - up to 100Mbit connection (many still 10Mbit)
- Trends
  - Data rate still growing
  - Data generation exploding
  - Video conferencing to desktop
  - Multimedia teaching
Observations
- "There is one big problem with this model, and that is that the vast
  majority of internal company networks simply would not be able to cope
  with the load of disk-data transfers going across them. Many database
  applications are written with the expectation of having 100MB/s bandwidth
  from the computer to its disks; a 'standard' ethernet network can
  reasonably deliver 4Mb/s ( = 0.5MB/s), and an ordinary phone line can
  handle less than 1% of that."
  (A geeks eyeview)
- "The goal of our next generation hardware project, which is already well
  under way, is to achieve 300 MB/sec. throughput from a disk array of 9 to
  15 disks, and to deliver that entire throughput out onto your network
  through multiple Gigabit Ethernet cards. We believe that the hardware
  needed to achieve this goal is evolutionary, not revolutionary, and that
  there is an excellent chance we will complete it before the end of 2002."
  (Raidzone technology roadmap)
- "The combination of Internet and Intranet traffic, distributed servers,
  and the increasing use of multicast applications is changing network
  traffic patterns. In the past, approximately 80 percent of network traffic
  was local, while 20 percent was routed over the backbone. These changing
  traffic patterns are driving the need for a higher-speed communication
  technology."
  "Gigabit Ethernet provides a means to scale these existing networks to
  higher speeds, particularly at the backbone, but Ethernet and fast Ethernet
  will continue to be widely used for enterprise-network desktop connections
  in the foreseeable future. However, this migration is highly dependent on
  running Gigabit Ethernet over Cat5 UTP copper cable."
  (NCAR Office Roadmap)
Network performance
|                         | 10Mbps   | 100Mbps | 1Gbps    | 10Gbps   |
| Peak Transfer Rate      | 1 MB/s   | 10 MB/s | 100 MB/s | 1 GB/s   |
| Effective Transfer Rate | 300 KB/s | 3 MB/s  | 30 MB/s  | 300 MB/s |
| Time to Download        |          |         |          |          |
| 1 MB                    | 3.3s     | 0.33s   | 0.03s    | 0.003s   |
| 10 MB                   | 33s      | 3.3s    | 0.33s    | 0.03s    |
| 1 GB                    | 55m      | 5.5m    | 33s      | 3.3s     |
| 1 TB                    | 38d 5hr  | 3d 20hr | 9hr 10m  | 55m      |
| 1 PB                    | 105yr    | 10.5yr  | 382d     | 38d 5hr  |
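
The download times above follow from dividing the data size by the effective transfer rate. A minimal sketch of that arithmetic is given below, assuming decimal units (1 MB = 10^6 bytes) and the effective rates listed in the table; the function and variable names are illustrative. Because the table is approximate, the computed values can differ slightly from the tabulated ones.

```python
def download_seconds(size_bytes: float, rate_bytes_per_second: float) -> float:
    """Time to move size_bytes at a steady effective rate."""
    return size_bytes / rate_bytes_per_second


def human(seconds: float) -> str:
    """Format a duration using the largest unit that fits."""
    for unit, length in (("yr", 365 * 86400), ("d", 86400), ("hr", 3600), ("m", 60)):
        if seconds >= length:
            return f"{seconds / length:.1f}{unit}"
    return f"{seconds:.3g}s"


# Effective transfer rates from the table, in bytes per second.
EFFECTIVE = {"10Mbps": 300e3, "100Mbps": 3e6, "1Gbps": 30e6, "10Gbps": 300e6}
# Data sizes from the table, in bytes (decimal units assumed).
SIZES = {"1 MB": 1e6, "10 MB": 1e7, "1 GB": 1e9, "1 TB": 1e12, "1 PB": 1e15}

for label, size in SIZES.items():
    row = [human(download_seconds(size, rate)) for rate in EFFECTIVE.values()]
    print(label, row)
```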
Useful Analogies and Statistics (approximate)
- 1 MByte - 1 floppy disk, 1 screen image (raw)
- 10 MByte - Typical Landsat image
- 1 GByte - more than a CD, 1 quarter of a DVD
- 1 TByte - Overnight full backup in Thomas Cherry Machine room
- 1 MB/s - floppy transfer rate
- 5 MB/s - CD transfer rate
- 10-20MB/s - local Hard-drive transfer rate
- 100 MB/s - SAN (fibre-channel) transfer rate
- 170 MB/s - VGA port (1024pixels x 768pixels x 3Bytes x 72Hz)
- Video :-
  - Low-res (320x220 mpeg) 10MB/min
  - Hi-res (800x600 .mov) 200-350MB/min
  - Hi-res (720x576 raw dv) 2GB/min
- Visible Human :-
  - Male (low-res) 15GB
  - Female (Hi-res) 40GB (5189 images up to 7.7MB)
  - Male (Hi-res) 60GB (1871 images @ 15MB)
- Earth Observing System :-
  - Terra - 194 GB/day (raw) 850 GB/day (processed)
  - Landsat7 - 150 GB/day
  - Hubble - 200 GB/yr
  (NASA has over 1 PetaByte of EOS data)
- High-Energy Physics :-
  - LHC - 27 TB/day (operational 2006)
  - Belle - 200 GB/day
- Melbourne Uni Computer Supplies sells ~1800 computers/yr each with a
30 GB hard-drive = 54 TByte of storage (+ 6 TByte of loose discs)
- Total disc capacity at UofM is probably 200 TByte
- We expect 1 TByte discs to become available next year
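
To relate the daily data volumes listed above to network speeds, each one can be converted into the sustained transfer rate needed to keep up with it. The sketch below assumes decimal units (1 GB = 10^9 bytes) and a steady 24-hour transfer; the names are illustrative.

```python
SECONDS_PER_DAY = 86_400

# Daily volumes quoted in the list above, in bytes (decimal units assumed).
DAILY_VOLUME = {
    "Terra (processed)": 850e9,
    "Landsat7": 150e9,
    "LHC": 27e12,
    "Belle": 200e9,
}


def sustained_mb_per_second(bytes_per_day: float) -> float:
    """Average rate in MB/s needed to move one day's data in one day."""
    return bytes_per_day / SECONDS_PER_DAY / 1e6


for source, volume in DAILY_VOLUME.items():
    print(f"{source}: {sustained_mb_per_second(volume):.1f} MB/s")
```

On these assumptions the LHC figure works out to about 312.5 MB/s, slightly above the 300 MB/s effective rate listed for a 10Gbps link in the network performance table.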
Comments
- Snowball Effect - more users generate more users - like WWW
(which also started as a research project to share collaborative data)
- We expect that libraries will be major users (digitised data)
- Access to data is vital - future belongs to the Information Rich
- Pervasive computing - computers in everything and everything
on the net
- Capability Computing - non-data aspects can be done already
- Collaboration/AccessGrid
  - Video conferencing
  - Visual data sharing for collaboration and/or teaching
  - Tele-presence, tele-medicine, tele-teaching, ...
- Digital Rights Management by certificate based authentication
for access. Copyright etc. up to users (watermarks, etc.)
GrangeNet/Grid related projects
- AccessGrid
  - HDTV standard video conferencing + computer sharing
  - Tele-presence
  - Remote teaching
- Replica Catalog
  - ARC is building test-bed for Physics and Linguistics
  - It will require contributions from users to expand
  - Like DNS/WWW: users will deposit data, which is assigned a GUID
    (accession no.); access is via bookmarks or portals (using metadata)
  - Will eventually benefit everybody (research only at first in some cases)
- Other University of Melbourne Projects
  - Atlas Data Challenge - a High Energy Physics project testing the global
    networking and computing infrastructure for the LHC experiment at CERN.
  - Belle - another High Energy Physics project taking data from KEK
    (in Japan). The data will be stored at the APAC mass storage facility
    and be made available using Grid technology for analysis at both UofM
    and Sydney U.
  - VPAC Job Share - a proposal to link the VPAC computer with the ARC
    systems to allow jobs to be migrated between the systems to take
    advantage of different software availability or idle nodes.
  - GridBus - a UofM Computer Science project to develop advanced Grid Job
    Schedulers.
  - Digital Languages Archive - A joint project with Sydney and ANU to
    create a digital archive of material related to Pacific and S-E Asian
    languages.
  - NANO - A project to establish a Telepresence studio to give researchers
    access to micro and nano scale instrumentation at other sites.
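
The Replica Catalog item above describes users depositing data, each deposit being assigned a GUID, and access going through metadata. A toy in-memory sketch of that idea follows. It is not the Globus replica catalog or any other real API: the class, its methods, the metadata keys and the storage URLs are all hypothetical and serve only to illustrate the deposit/lookup pattern.

```python
import uuid


class ReplicaCatalog:
    """Toy in-memory catalog mapping a GUID to metadata and replica locations."""

    def __init__(self) -> None:
        self._entries: dict[str, dict] = {}

    def deposit(self, location: str, metadata: dict[str, str]) -> str:
        """Register a data set and return its newly assigned GUID."""
        guid = str(uuid.uuid4())
        self._entries[guid] = {"metadata": dict(metadata), "replicas": [location]}
        return guid

    def add_replica(self, guid: str, location: str) -> None:
        """Record an additional copy of an already registered data set."""
        self._entries[guid]["replicas"].append(location)

    def locate(self, guid: str) -> list[str]:
        """Return every known location of the data set."""
        return list(self._entries[guid]["replicas"])

    def search(self, **criteria: str) -> list[str]:
        """Return the GUIDs whose metadata matches all the given key/values."""
        return [
            guid
            for guid, entry in self._entries.items()
            if all(entry["metadata"].get(k) == v for k, v in criteria.items())
        ]


catalog = ReplicaCatalog()
# Hypothetical locations and metadata, for illustration only.
guid = catalog.deposit("gsiftp://storage.example.org/belle/run42",
                       {"project": "belle", "type": "raw"})
catalog.add_replica(guid, "gsiftp://mirror.example.org/belle/run42")

matches = catalog.search(project="belle")
print(catalog.locate(matches[0]))
```

A portal or bookmark in this picture would simply store a GUID or a metadata query, leaving the catalog to resolve it to whichever replica is most convenient.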