Mirroring Technology through the World Data Centers
David Clark
WDC Panel
18th International CODATA Conference
October 1, 2002

Why Establish Mirror Sites?
Improves access between geographically separated sites
Encourages data exchange
Encourages new data set compilations
Adds a regional aspect
Builds capacity at mirror sites

Three Types of “Mirrors”
1- Exact copy, i.e., a true mirror
2- Duplicate content, with the mirror site's layout designed locally to reflect regional, cultural, or organizational aspects
3- Includes some aspects of the main site, which acts in “mirror” mode; local and regional data are added and can in turn be mirrored as appropriate

What is mirroring?
What gets mirrored from the Paleoclimatology site in Boulder?
4000 Web pages (HTML)
4000 Images (graphics, figures, slide sets)
100 CGI programs (WebMapper, search forms, model output comparisons)
12 Java animations (temperature, climate, drought reconstructions)
110,000 FTP files
What does not get mirrored
Oracle database searches (metadata queries, although results are localized)
IDL "on-the-fly" graphics (model output comparisons)
ArcIMS (GIS) data access

Requirements (ideally)...
Unix server with (good!) Internet access
10 GB disk space (but can be less: “server minimal”)
Software
Apache web server
Perl (programming language)
Java2 (programming language)
SSH (secure shell)
rsync (a fast, flexible remote copy program)
Updates through a JavaMail-based mirror system
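The slides list SSH and rsync among the required software but do not show how they are invoked. Below is a minimal sketch, assuming the staged copy lives in a local directory and is pushed to the mirror over SSH. The staging path, mirror host, account, and exclude patterns are hypothetical, and this is not the JavaMail-based update mechanism the slides describe.

```python
import shlex

# Hypothetical values for illustration only; the slides do not give the
# real staging path, mirror host, or account.
STAGING_DIR = "/data/mirror-staging/paleo/"
MIRROR_TARGET = "mirror@mirror.example.org:/var/www/paleo/"


def build_rsync_command(source, target, excludes=()):
    """Return an rsync argument list that pushes `source` to `target` over SSH."""
    cmd = [
        "rsync",
        "--archive",   # recurse and preserve permissions, times, and symlinks
        "--compress",  # compress file data in transit (helps on slow links)
        "--delete",    # delete files on the mirror that are no longer in staging
        "--partial",   # keep partially transferred files so a rerun can reuse them
        "-e", "ssh",   # use SSH as the remote shell
    ]
    for pattern in excludes:
        cmd.append(f"--exclude={pattern}")  # skip pages or files not meant for mirrors
    cmd += [source, target]
    return cmd


cmd = build_rsync_command(STAGING_DIR, MIRROR_TARGET, excludes=["*.tmp"])
print(shlex.join(cmd))  # shell-quoted form, for logging or manual runs
# To perform the transfer: subprocess.run(cmd, check=True)
```

`--compress` and `--partial` are chosen here with slow or unreliable links in mind; whether the production system used these options is not stated in the slides.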

There will be days...
Server availability
Internet connectivity: slow to very slow to non-existent
Electrical power problems: frequently on battery, occasional shutdowns
System administrators
Security concerns: sudden loss of access to the server
Unannounced changes, e.g., Domain Name System (DNS) reorganizations
Sometimes at the main site!
Changes that don't get mirrored correctly
Failure to verify that things work on the mirrors
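Several of these problems (slow or absent connectivity, power interruptions) show up as transfers that fail partway through. One common response, sketched here as an assumption rather than as the method used for these mirrors, is to retry the transfer with exponentially growing waits. The function name, attempt count, and delays are hypothetical.

```python
import subprocess
import time


def run_with_retries(action, attempts=5, base_delay=30.0,
                     retry_on=(OSError, subprocess.CalledProcessError),
                     sleep=time.sleep):
    """Call action() until it succeeds, doubling the wait after each failure.

    Hypothetical helper: attempts and base_delay are illustrative defaults.
    OSError covers network errors (ConnectionError, TimeoutError, ...);
    CalledProcessError covers a failed command run with check=True.
    """
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except retry_on as err:
            if attempt == attempts:
                raise  # give up and let the caller report the failure
            delay = base_delay * 2 ** (attempt - 1)  # 30 s, 60 s, 120 s, ...
            print(f"attempt {attempt} failed ({err}); retrying in {delay:.0f} s")
            sleep(delay)
```

For example, `run_with_retries(lambda: subprocess.run(cmd, check=True))` would retry an rsync push described by an argument list `cmd`. The `sleep` parameter exists so the waits can be replaced in tests.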

How it works...
Analyze our web and FTP sites
Discover and correct problems, e.g. bad links or absolute addresses
Stage the mirror locally
Localize headers for each mirror site
Change FTP hostnames (these are absolute references)
Change script paths
Exclude specific pages, text, or images
Copy the staged material to the mirror site
Check that mirroring occurred correctly
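The staging steps above can be sketched in Python. This is a minimal illustration, not the production tooling: it assumes HTML pages are processed as plain text with naive string replacement, and the main-site FTP host, CGI prefix, and the `<!-- HEADER -->` / `<!-- NOMIRROR -->` marker comments are hypothetical conventions invented for this example. The final check compares SHA-256 manifests of the staged tree and the mirror's tree, which confirms that files arrived intact but not that CGI programs actually run on the mirror.

```python
import hashlib
import re
from pathlib import Path

# Hypothetical main-site values; the slides do not give the real ones.
MAIN_FTP_HOST = "ftp.main.example.gov"
MAIN_CGI_PREFIX = "/cgi-bin/paleo/"

# Absolute http(s)/ftp URLs inside double-quoted href or src attributes.
ABSOLUTE_URL = re.compile(r'(?:href|src)\s*=\s*"((?:https?|ftp)://[^"]+)"', re.IGNORECASE)
HEADER_BLOCK = re.compile(r"<!-- HEADER -->.*?<!-- /HEADER -->", re.DOTALL)
NOMIRROR_BLOCK = re.compile(r"<!-- NOMIRROR -->.*?<!-- /NOMIRROR -->", re.DOTALL)


def find_absolute_refs(html):
    """Return absolute URLs in a page; each is a candidate for localization."""
    return ABSOLUTE_URL.findall(html)


def localize_page(html, mirror_ftp_host, mirror_cgi_prefix, mirror_header):
    """Return a copy of html adapted for one mirror site."""
    # FTP hostnames are absolute references, so point them at the mirror.
    html = html.replace(f"ftp://{MAIN_FTP_HOST}", f"ftp://{mirror_ftp_host}")
    # Point CGI script paths at the mirror's script directory.
    html = html.replace(MAIN_CGI_PREFIX, mirror_cgi_prefix)
    # Swap in the mirror's own header (a function replacement keeps any
    # backslashes in mirror_header from being treated as re escapes).
    html = HEADER_BLOCK.sub(
        lambda _m: f"<!-- HEADER -->{mirror_header}<!-- /HEADER -->", html)
    # Drop material that should not appear on mirrors.
    return NOMIRROR_BLOCK.sub("", html)


def manifest(root):
    """Map each file under root (relative POSIX path) to its SHA-256 digest."""
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def diff_manifests(staged, mirrored):
    """Return (files missing on the mirror, files whose contents differ)."""
    missing = sorted(staged.keys() - mirrored.keys())
    changed = sorted(k for k in staged.keys() & mirrored.keys()
                     if staged[k] != mirrored[k])
    return missing, changed


if __name__ == "__main__":
    page = ('<!-- HEADER -->Main site<!-- /HEADER -->'
            '<a href="ftp://ftp.main.example.gov/pub/data.txt">data</a>'
            '<form action="/cgi-bin/paleo/search.cgi"></form>')
    print(find_absolute_refs(page))
    print(localize_page(page, "ftp.mirror.example.org", "/cgi-bin/mirror/",
                        "Regional mirror"))
```

Running `manifest()` on both the staging host and the mirror and passing the two results to `diff_manifests()` lists anything missing or altered; it does not replace actually exercising the mirrored pages and CGI programs.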

Examples of Type One Sites
Exact mirror copies
Mostly to improve access from geographically separated locations
WDC pages
Paleoclimatology
STP sites

Examples of Type Two Sites
Content mostly identical
Layout similar or identical
Reflects regional data sets in addition to other data from the main site
Implemented to encourage regional data exchange
“Selective Mirroring”
SPIDR site
Paleoclimatology site

Examples of Type Three Sites
Content not identical
Layout reflects regional aspects and programs
Implemented to encourage regional data exchange
Builds capacity at the mirror site
Paleoclimatology mirror site
