Mirroring Technology through the World Data Centers

David Clark
WDC Panel
18th International CODATA Conference
October 1, 2002

Why Establish Mirror Sites?

Improves access between geographically separated sites
Encourages data exchange
Encourages new data set compilations
Adds a regional aspect
Builds capacity at mirror sites

Three Types of “Mirrors”

1- Exact copy, i.e. a true mirror
2- Duplicated content; the mirror site is designed locally to reflect regional, cultural, or organizational aspects
3- Includes some aspects of the main site, which acts in a “mirror” mode; local and regional data are added, which can also be mirrored as appropriate

What is mirroring?

What gets mirrored in the Paleoclimatology site from Boulder?
  4000 Web pages (HTML)
  4000 Images (graphics, figures, slide sets)
  100 CGI programs (WebMapper, search forms, model output comparisons)
  12 Java animations (temperature, climate, drought reconstructions)
  110,000 FTP files
What does not get mirrored
  Oracle database searches (metadata queries; but results are localized)
  IDL "on-the-fly" graphics (model output comparisons)
  ArcIMS (GIS) data access

Requirements (ideally)...

Unix server with (good!) Internet access
10 GB disk space (but can be less: “server minimal”)
Software
  Apache web server
  Perl (programming language)
  Java2 (programming language)
  SSH (secure shell)
  rsync (a faster, flexible remote copy program)
Updates through JavaMail-based mirror system
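
As a rough illustration of how the rsync and SSH items in the requirements might be combined to push a staged copy to a mirror, the sketch below assembles an rsync command line in Python. The host name, paths, and exclude pattern are hypothetical placeholders, and the choice of flags is an assumption about a reasonable setup, not a record of the commands used at the WDCs.

```python
import shlex


def build_rsync_command(staging_dir, remote_host, remote_dir,
                        excludes=(), dry_run=True):
    """Return an rsync argument list that pushes staging_dir to a mirror over SSH."""
    cmd = [
        "rsync",
        "-a",          # archive mode: recurse, keep times and permissions
        "-z",          # compress data in transit (helps on slow links)
        "--delete",    # remove files on the mirror that left the staging copy
        "-e", "ssh",   # use SSH as the transport
    ]
    if dry_run:
        cmd.append("--dry-run")  # report what would change, copy nothing
    for pattern in excludes:
        cmd.append("--exclude=" + pattern)
    # Trailing slash: copy the directory's contents, not the directory itself.
    cmd.append(staging_dir.rstrip("/") + "/")
    cmd.append(f"{remote_host}:{remote_dir}")
    return cmd


# Hypothetical host and paths, for illustration only.
command = build_rsync_command(
    "/data/staging/paleo",
    "mirror.example.org",
    "/var/www/paleo",
    excludes=["*.tmp"],
)
print(shlex.join(command))
```

The function defaults to a dry run so the printed command can be inspected first; once the reported changes look right, the same list (built with dry_run=False) could be handed to subprocess.run.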

There will be days...

Server availability
  Internet connectivity: slow to very slow to non-existent
  Electrical power problems: frequent on-battery, occasional shutdown
System administrators
  Security concerns: sudden loss of access to the server
  Unannounced changes, e.g. Domain Name Service reorganizations
Sometimes at the main site!
  Changes that don't get mirrored correctly
  Failure to verify that things work on the mirrors
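
One way to catch changes that did not get mirrored correctly is to compare file digests between the staged copy and the mirror. The sketch below is a minimal example under the assumption that both trees are reachable as local directories (for instance, the mirror copy has been mounted or fetched back); the function names are hypothetical and this is not the verification procedure described in the talk.

```python
import hashlib
from pathlib import Path


def tree_digests(root):
    """Map each file's path (relative to root) to its SHA-256 hex digest."""
    root = Path(root)
    digests = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            rel = path.relative_to(root).as_posix()
            # Reads the whole file at once; fine for a sketch, not for huge files.
            digests[rel] = hashlib.sha256(path.read_bytes()).hexdigest()
    return digests


def compare_trees(staged, mirrored):
    """Return (missing, extra, changed) relative paths of mirrored versus staged."""
    a, b = tree_digests(staged), tree_digests(mirrored)
    missing = sorted(set(a) - set(b))   # staged but absent on the mirror
    extra = sorted(set(b) - set(a))     # on the mirror but not staged
    changed = sorted(p for p in set(a) & set(b) if a[p] != b[p])
    return missing, extra, changed
```

An empty result in all three lists suggests the copy matches the staged material byte for byte; it does not show that CGI programs or links actually work on the mirror, which still needs a separate check.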

How it works...

Analyze our web and FTP sites
  Discover and correct problems, e.g. bad links or absolute addresses
Stage the mirror locally
  Localize headers for each mirror site
  Change FTP hostnames (these are absolute references)
  Change script paths
  Exclude specific pages, text, or images
Copy the staged material to the mirror site
Check that mirroring occurred correctly
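
The analysis and staging steps above can be sketched in a few lines of Python. This example rewrites absolute FTP references to a mirror's FTP host and lists links that still point at the main web host by name. The host names are hypothetical, and the use of plain string replacement and a regular expression (rather than a full HTML parser) is a simplifying assumption, not a description of the actual mirror system.

```python
import re

# Hypothetical host names, used only for illustration.
MAIN_FTP_HOST = "ftp.main.example.org"
MAIN_WEB_HOST = "www.main.example.org"


def localize_page(html, mirror_ftp_host):
    """Point absolute FTP references in a staged page at the mirror's FTP host."""
    return html.replace("ftp://" + MAIN_FTP_HOST, "ftp://" + mirror_ftp_host)


def find_absolute_links(html):
    """List href/src values that still name the main web host explicitly."""
    pattern = re.compile(r'(?:href|src)\s*=\s*"([^"]+)"', re.IGNORECASE)
    return [url for url in pattern.findall(html)
            if url.startswith("http://" + MAIN_WEB_HOST)]


page = (
    '<a href="ftp://ftp.main.example.org/pub/data/file.txt">data</a> '
    '<a href="http://www.main.example.org/paleo/index.html">home</a> '
    '<img src="images/fig1.gif">'
)
localized = localize_page(page, "ftp.mirror.example.org")
print(localized)
print(find_absolute_links(localized))
```

Links reported by find_absolute_links are candidates for conversion to relative addresses at the main site, so that the same page works unchanged on every mirror.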

Examples of Type One Site

Exact mirror copies
  Mostly to aid access in geographically separate locations
WDC pages
Paleoclimatology
STP Sites

Examples of Type Two Site

Content mostly identical
Layout similar or identical
Reflects regional data sets in addition to other data from main site
Implemented to encourage regional data exchange
“Selective Mirroring”
SPIDR site
Paleoclimatology site

Examples of Type Three Site

Content not identical
Layout reflects regional aspects and programs
Implemented to encourage regional data exchange
Builds capacity at mirror site
Paleoclimatology mirror site