2
The Virtual Observatory: The Future of Astrophysics Data Handling

  • David Schade
  • Canadian Astronomy Data Centre
  • Herzberg Institute of Astrophysics
  • National Research Council Canada
  • With support from the Canadian Space Agency
3
Astronomy and Astrophysics

  • Does fairly well in information technology
  • Has excellent online literature services
    • ADS Abstracts
    • Journals
    • Preprints
  • Has a good history of data archiving
  • Has reasonable data access policies
  • BUT
    • As a scientist, I find it frustrating and time-consuming to locate suitable data, and data quality is often sub-standard
4
History

  • NASA has been a driving force in data archiving for astronomy
  • Canada-France-Hawaii Telescope (CFHT) was a pioneer in archiving data from ground-based observatories
  • Digital revolution in astronomy happened in the 1980s
5
History

  • Canadian Astronomy Data Centre was created in 1986
  • Astronomers and computer scientists
  • Supported by the Canadian Space Agency
  • Original mandate: to serve the Hubble Space Telescope
6
Current Collection at CADC
7
  • Many Services
    • Digitized Sky Survey
    • Archive Inter-operability
  • Metadata Catalogues
    • 19 databases
    • 80,000,000 rows
    • 34 gigabytes
  • Data Files
    • 12,000,000 files
    • 20 terabytes
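The holdings listed above imply some average sizes. The short Python sketch below is only a back-of-envelope calculation; it assumes decimal units (1 GB = 10^9 bytes, 1 TB = 10^12 bytes), and the variable names are illustrative, not anything CADC itself uses.

```python
# Average sizes implied by the CADC holdings listed above.
# Assumption: decimal units (1 GB = 1e9 bytes, 1 TB = 1e12 bytes).
# The slide gives only totals, so these are averages, not typical sizes.
GB, TB = 1e9, 1e12

catalogue_rows = 80_000_000   # metadata catalogue rows
catalogue_bytes = 34 * GB     # 34 gigabytes of metadata
data_files = 12_000_000       # data files
data_bytes = 20 * TB          # 20 terabytes of data files

bytes_per_row = catalogue_bytes / catalogue_rows   # 425 bytes per row
mb_per_file = data_bytes / data_files / 1e6        # about 1.67 MB per file

print(f"~{bytes_per_row:.0f} bytes per catalogue row")
print(f"~{mb_per_file:.2f} MB per data file")
```

Averages hide a wide spread of row and file sizes, so these numbers only indicate the overall scale of the collection.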
8
  • Archiving is a word that does not adequately describe the activities, capabilities, and functions of data centres
    • Store, protect, catalogue, facilitate access, lobby for open data policy
    • Lobby for effective handling of data and metadata
    • Develop processing pipelines to add value
    • Execute processing on request
9
    • Do astronomers publish research based on archival data?


10
Scientific Impact of the Multi-Mission Archive at the Space Telescope Science Institute (MAST)
    • ~10% of the most-cited papers in the ISI database are based on MAST archival data
    • Over 600 papers/year with HST and other archives
    • HST Data: Retrieval rate is 4 times the ingestion rate
    • Over 30,000 datasets requested per month (over 8,000 are non-HST data); ~400,000 web hits per month
11
International Virtual Observatory (VO) initiatives
  • Massive homogeneous survey datasets are being created
    • Sloan Digital Sky Survey
    • 2MASS infrared survey
    • Canada-France-Hawaii Telescope Legacy Survey
  • Multi-wavelength survey datasets can be constructed
  • Network bandwidth is increasing
  • Astronomers have embraced many online services
  • Funding agencies are receptive
13
International initiatives: Different strokes for different folks
  • Major initiatives in Canada, the United States, the European community, and the United Kingdom (also Australia, India, and Russia)
  • Each VO group has its own view of what it means to produce a VO and what the priorities should be.
14
Definition
  • The Virtual Observatory will be said to exist when astronomers can successfully execute scientific queries that seamlessly cross archive boundaries and wavelength boundaries, can combine the returned datasets in a way that permits their joint processing, and can achieve this without the need to understand engineering-level details of the instrument that produced the returned datasets.
15
Convergence?
  • Despite the differences in viewpoint at this early stage of the VO game, the approaches will converge as projects become reality.
  • Interoperability
  • Standards
  • Integration
16
Standard Practice
  • Proprietary period of 1-2 years during which only the proposer of the observations may access those data
  • Some data are calibrated, but a large fraction are not
  • Data quality is an issue
  • Metadata completeness is an issue
  • Metadata quality is an issue
17
  • Past history
    • Canada has benefited enormously from open data access (and facility access) policies of the United States
      • Data access: Largely NASA
      • Facility access: NOAO and many others
  • NASA has been very progressive
  • Many facilities have provided no channels for accessing data (e.g., NOAO); some do not save and protect data (e.g., the Keck telescopes of the University of California and the California Institute of Technology)
  • Europe has been very progressive, BUT the archives of the European Southern Observatory are now CLOSED to astronomers outside of Europe.
18
  • Present-day data policies are very mixed:
    • Tension between observatory operations and archiving needs
  • Canada has been progressive
    • Canada-France-Hawaii Telescope has archived data since the 1980s
      • Data quality has been fair
  • Canada and Chile were the leading forces in creating an archive for the Gemini telescopes (partners U.S., U.K., Canada, Argentina, Chile, Brazil, Australia)
  • Canada and France are considering a long (~3 years) proprietary period for the CFHT Legacy Survey


23
CFHT MegaCam
  • A 40-CCD camera
    • 320 Megapixels
    • 1 square degree on the sky
  • Raw Data Rate
    • 720 megabytes per image!
    • 100 gigabytes per night!
    • 20 Terabytes per year!
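The three raw data-rate figures on this slide can be checked against one another. This Python sketch assumes decimal units and treats the quoted figures as exact; the variable names are illustrative.

```python
# Consistency check of the MegaCam raw data figures quoted above.
# Assumption: decimal units (1 MB = 1e6, 1 GB = 1e9, 1 TB = 1e12 bytes).
MB, GB, TB = 1e6, 1e9, 1e12

pixels_per_image = 320e6        # 320 megapixels
bytes_per_image = 720 * MB      # 720 megabytes per image
bytes_per_night = 100 * GB      # 100 gigabytes per night
bytes_per_year = 20 * TB        # 20 terabytes per year

bytes_per_pixel = bytes_per_image / pixels_per_image   # storage per pixel
images_per_night = bytes_per_night / bytes_per_image   # raw images per night
nights_per_year = bytes_per_year / bytes_per_night     # nights at that rate

print(f"{bytes_per_pixel:.2f} bytes/pixel")
print(f"{images_per_night:.0f} images/night")
print(f"{nights_per_year:.0f} nights/year")
```

Under these assumptions each raw image uses 2.25 bytes per pixel (a little more than the 2 bytes a 16-bit pixel occupies), a 100-gigabyte night holds roughly 139 images, and 20 terabytes per year corresponds to about 200 such nights.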
24
CFHT Legacy Survey
  • CFHT Legacy Survey
    • SCIENCE
      • Determine the fate of the universe
    • Data Policy
      • Data are released immediately to the Canadian and French communities and to the world after a proprietary period
25
CFHT Legacy Survey
  • CFHT Legacy Survey
    • Partnership between CFHT (Hawaii), CADC (Victoria), TERAPIX (Paris), CDS (Strasbourg)
    • Science: Supernovae, Weak Lensing, Kuiper Belt
    • 5 years / 500 nights
    • 20 Terabytes per year
    • 50 million objects with high-quality imaging
    • Processed image products and catalogues
    • 100 Terabyte project
26
  • DVD jukeboxes
    • 4.7 Gbytes/disk
    • $16/Gbyte
    • 11.5 Tbytes/m²
    • 6 jukeboxes/year
    • 3,900 disks/year
  • High overhead
    • Operationally
    • Physical space
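The DVD jukebox figures can be combined into yearly totals. This Python sketch assumes decimal units, that the quoted cost per gigabyte applies to every stored gigabyte, and that disks are spread evenly across the jukeboxes; none of these assumptions is stated on the slide.

```python
# Back-of-envelope yearly totals for the DVD jukebox option above.
# Assumptions (not stated on the slide): decimal units (1 TB = 1000 GB),
# the $16/Gbyte cost applies to every stored gigabyte, and disks are
# spread evenly across the jukeboxes bought each year.
GB_PER_DISK = 4.7
DOLLARS_PER_GB = 16
TB_PER_M2 = 11.5
DISKS_PER_YEAR = 3_900
JUKEBOXES_PER_YEAR = 6

gb_per_year = DISKS_PER_YEAR * GB_PER_DISK                # about 18,330 GB
tb_per_year = gb_per_year / 1000                          # about 18.3 TB
cost_per_year = gb_per_year * DOLLARS_PER_GB              # about $293,000
floor_m2_per_year = tb_per_year / TB_PER_M2               # about 1.6 m^2
disks_per_jukebox = DISKS_PER_YEAR // JUKEBOXES_PER_YEAR  # 650 disks

print(f"Capacity: {tb_per_year:.1f} TB/year")
print(f"Cost: ${cost_per_year:,.0f}/year")
print(f"Floor area: {floor_m2_per_year:.1f} m^2/year")
print(f"Disks per jukebox: {disks_per_jukebox}")
```

Under these assumptions, 3,900 disks hold about 18.3 terabytes per year, close to the roughly 20 terabytes per year of MegaCam raw data, at about $293,000 per year and roughly 1.6 m² at the quoted storage density, with about 650 disks per jukebox.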

27
  • Spinning disks
    • 20 Terabytes in each rack
  • Processing
    • Twenty 1.5 GHz CPUs in each rack
  • Cost effective
  • Effective use of space
  • Reliability ???


28
Astronomy and Astrophysics

  • The Virtual Observatory recognizes the value and effectiveness of good information management in astrophysics
  • Astronomy has a good IT foundation to build upon
  • Funding agencies are receptive
  • Data access policies need to be monitored for problems
  • The Virtual Observatory needs to invest in both infrastructure and data
30
Outline
  • History and CADC
    • I will neglect NASA and concentrate on what I know
      • CFHT archived its digital data in the 1980s
      • Photographic plates were taken home but remained the property of the observatory, which never recalled them
    • Hubble Space Telescope opened doors in archiving for optical astronomers
    • Archiving is a word that has outlived its usefulness
      • Archive functions: store, protect, catalogue, facilitate access, lobby for open data policy
      • Non-archive functions: Develop processing pipelines to add value
    • Archive status: Do astronomers do research with archival data? YES: HST examples
      • Deliver the archive over and over / Megan’s publication numbers
31
Outline
  • Virtual Observatory initiatives: IVOA
    • Definition of the VO
    • Different strokes for different folks
    • High-level infrastructure
    • Where’s the data?
    • There also need to be data-centric initiatives
    • THE GOALS ARE WELL-ALIGNED