1
 
2
Overview
  • Market & Technology Trends
  • Spatial Database Technology
  • GeoSpatial DBMS in GeoSciences
  • Life Sciences Data Management Challenges
  • BioSpatial DBMS in Life Sciences
3
Spatial data becoming ubiquitous
  • Location Aware and Enabled Infrastructure
    • Defense, Logistics, Mobile devices
  • Internet Portals: MapQuest, Yahoo, MapPoint.NET
  • Automobiles: by 2006, 80% of new cars will have some telematics navigation access (eyeforauto 2001)
  • Structure Databases: Proteomics, Materials Science
4
Spatial Analysis
Revealing patterns, relationships & trends
5
Overcoming Application “Stovepipes”
  • Specialty GIS servers
    • Data isolation
    • High systems administration and management costs
    • Scalability problems
    • High training costs
    • Complex support problems
  • Information not aligned with Business Processes
  • Applications can’t leverage brute force of large servers
6
Life Sciences: Drug Discovery
  • The Process
7
Many Different Kinds of Data
8
IT Challenges
9
Oracle Platform
10
Integrated NYC Spatial Architecture
11
Managing All the Data in an e-Enterprise
12
 
13
Spatial Database Technology: Manage Location & Structure Data
14
"SELECT STREET_NAME FROM ROADS,"
  • SELECT STREET_NAME FROM ROADS, COUNTIES
  • WHERE SDO_RELATE(road_geom, county_geom, 'MASK=ANYINTERACT QUERYTYPE=WINDOW') = 'TRUE'
  • AND COUNTYNAME = 'PASSAIC';
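The query above asks the database for every road whose geometry interacts with one county's geometry. As a rough illustration of the cheap first step such a query can use, the Python sketch below compares bounding boxes to discard roads that cannot possibly interact with the county. It is a toy under stated assumptions, not Oracle's implementation: the function names, the coordinate lists and the road names are all hypothetical, and an exact geometry test would still have to run on the surviving candidates.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]
Box = Tuple[float, float, float, float]  # (min_x, min_y, max_x, max_y)


def bounding_box(points: List[Point]) -> Box:
    """Smallest axis-aligned rectangle that contains all points."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return (min(xs), min(ys), max(xs), max(ys))


def boxes_interact(a: Box, b: Box) -> bool:
    """True when the two rectangles overlap or touch."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def candidate_roads(roads: Dict[str, List[Point]], county: List[Point]) -> List[str]:
    """Names of roads whose bounding box interacts with the county's box.

    This is only a coarse filter: it can return roads that miss the county's
    true outline, but it never drops a road that really interacts with it.
    """
    county_box = bounding_box(county)
    return sorted(name for name, geom in roads.items()
                  if boxes_interact(bounding_box(geom), county_box))


# Hypothetical toy data: a square county and three roads.
county_outline = [(0, 0), (10, 0), (10, 10), (0, 10)]
roads = {
    "MAIN ST": [(2, 2), (8, 3)],      # entirely inside the county
    "RIVER RD": [(9, 9), (15, 12)],   # crosses the county boundary
    "FAR AVE": [(20, 20), (25, 22)],  # nowhere near the county
}
print(candidate_roads(roads, county_outline))
```

Only the two nearby roads survive the filter, so the more expensive exact comparison runs on a much smaller set.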
15
Vector Map Data in Oracle Tables
16
Sub-surface Geological Analysis
17
Raster/Vector Mapping
18
How Spatial Data Is Stored
19
Performing Location Query in Oracle9i
20
J-Phone J-Navi Launch, May 2000
21
Extensible Database Framework
22
Dealing with large data volumes
  • How large is large?
    • Hundreds of thousands is normal
    • Millions is interesting
    • Tens of millions is serious
    • Hundreds of millions is large
  • What is the problem with large volumes?
    • They mean big structures
      • Cumbersome to manage
    • Long operations
      • Data reload, refresh
      • Index rebuilds
23
Partitioning: Divide and Conquer
  • For manageability
    • Break large problems into manageable pieces
    • Can load / rebuild individual partitions
    • Can load / rebuild multiple partitions concurrently
  • For performance
    • Query parallelism
    • Partition elimination
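Partition elimination means a query touches only the partitions whose key range can contain matching rows. The Python sketch below illustrates the idea with a toy range-partitioned table. The class, its methods and the sample rows are hypothetical names chosen for this example, and a real DBMS does this work inside its query optimizer.

```python
from bisect import bisect_right
from typing import List, Tuple


class RangePartitionedTable:
    """Toy table split into range partitions on an integer key."""

    def __init__(self, upper_bounds: List[int]) -> None:
        # Partition i holds keys below upper_bounds[i] and at or above
        # upper_bounds[i - 1].
        self.upper_bounds = sorted(upper_bounds)
        self.partitions: List[List[Tuple[int, str]]] = [[] for _ in self.upper_bounds]

    def _partition_index(self, key: int) -> int:
        return bisect_right(self.upper_bounds, key)

    def insert(self, key: int, row: str) -> None:
        idx = self._partition_index(key)
        if idx >= len(self.partitions):
            raise ValueError(f"key {key} is above the highest partition bound")
        self.partitions[idx].append((key, row))

    def range_query(self, low: int, high: int) -> Tuple[List[str], List[int]]:
        """Return matching rows and the indexes of the partitions scanned."""
        first = self._partition_index(low)
        last = min(self._partition_index(high), len(self.partitions) - 1)
        scanned = list(range(first, last + 1))  # all other partitions are skipped
        rows = [row for idx in scanned
                for key, row in self.partitions[idx] if low <= key <= high]
        return rows, scanned


table = RangePartitionedTable([100, 200, 300])
for key in (5, 150, 250, 160):
    table.insert(key, f"road {key}")

rows, scanned = table.range_query(120, 180)
print(rows, scanned)
```

In this run only the middle partition is scanned. The same layout is what allows a single partition to be reloaded or reindexed without touching the others.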


24
Oracle9i Spatial Features
  • Spatial Reference System
  • Spatial Operators
  • Versioning/Long Transactions
  • Linear Referencing
  • Quadtree/R-tree index
  • Parallel index creation
  • Geodetic Support
  • Spatial Aggregates
  • Topology *
  • Raster/Grid Management *
  • Spatial Data Mining *


  • * Planned for Release 10i
25
Life Sciences
Data Management Trends
26
Data Storage
Today
  • “To meet the scientific goals we believe we need to add around 80-100 TB of storage each year for the next 5 years”

    P. Butcher,
    The Sanger Centre
27
Increasing Computational Load
28
What does DBMS technology bring?
  • Access and storage of vast quantities of life science data from a variety of sources
  • High throughput loading, indexing, processing and update of information
  • Data integration from a variety of sources
  • Scalability and reliability
  • Find patterns & insights through queries, analyses and data mining
  • Collaboration & security



29
1. Vast quantities of data, types & sources
30
2. High Throughput Processing
31
3. Scalability & Reliability
32
4. Hidden Patterns & Relationships
33
5. Collaboration & Security
34
Some Additional Proteomics Challenges:
  • High-throughput crystallography generating large volumes of complex protein structure data
  • Small molecule (structure) databases growing to tens of millions of compounds
  • 3D and pharmacophore analysis require efficient storage, indices and operators for structure data
  • Integrated visualization & computation tools with DBMS
35
How do spatial databases help?
  •  Object-relational model and extensibility enable 2D data types and indices
  •  Powerful and growing operator set for sophisticated location/structure queries
  •  Validation by Geographic Information Systems (GIS) and CAD Community
  •  Common query language (SQL) that all data banks and tool vendors leverage
  •  Security, reliability, scalability and flexibility
  •  Faster, bigger, better, cheaper


36
 
37
Structural Bioinformatics
and Rational Drug Design
38
Virtual High-throughput Screening Ligand-Protein Docking Simulation
39
Planned Oracle BioSpatial
Types and Functions
40
Managing Protein Structures in DBMS
  • Extend Oracle DBMS with custom 3D structure features
  • Provide BioSpatial types and an object-relational schema for large & small molecule data in Oracle
    • Compliant with mmCIF; SQL interface
  • Provide low-level interfaces consistent with the OMG standard (RCSB)
  • Integration with leading visualization and analytical tools (commercial, shareware)
41
Rich BioSpatial Operators
  • Support the SQL query and computation requirements of biotechs, pharmas and independent software vendors
  • Implement indices and operators in the server to meet requirements
  • Begin with simple operators and those that serve as foundations for extension
  • Integration with 3rd party visualization tools
42
Foundation Operators
  • Sample BioSpatial Operators:
  • Nearest atom(s) to a specified position or residue in a structure
    • Embedded atomic position index
  • Retrieve polypeptide skeleton list
  • On-the-fly bond and bond-order computation
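Two of the operators listed above can be sketched in a few lines of Python: finding the nearest atoms to a position, and inferring bonds from interatomic distances. This is a minimal illustration under stated assumptions, not the planned Oracle operators. The function names are hypothetical, the search is brute force rather than index-based, the covalent radii are approximate published values, the 0.4 Å tolerance is an assumed cutoff, and bond order is not computed.

```python
import heapq
import math
from itertools import combinations
from typing import List, Tuple

Atom = Tuple[str, float, float, float]  # (element, x, y, z), coordinates in angstroms

# Approximate single-bond covalent radii in angstroms (illustrative subset).
COVALENT_RADIUS = {"H": 0.31, "C": 0.76, "N": 0.71, "O": 0.66}


def nearest_atoms(atoms: List[Atom], position: Tuple[float, float, float],
                  k: int = 1) -> List[int]:
    """Indexes of the k atoms closest to position, nearest first (brute force)."""
    return heapq.nsmallest(k, range(len(atoms)),
                           key=lambda i: math.dist(atoms[i][1:], position))


def infer_bonds(atoms: List[Atom], tolerance: float = 0.4) -> List[Tuple[int, int]]:
    """Pairs of atom indexes closer than the sum of their radii plus a tolerance."""
    bonds = []
    for i, j in combinations(range(len(atoms)), 2):
        limit = COVALENT_RADIUS[atoms[i][0]] + COVALENT_RADIUS[atoms[j][0]] + tolerance
        if math.dist(atoms[i][1:], atoms[j][1:]) <= limit:
            bonds.append((i, j))
    return bonds


# Hypothetical test structure: one water molecule.
water: List[Atom] = [
    ("O", 0.00, 0.00, 0.00),
    ("H", 0.96, 0.00, 0.00),
    ("H", -0.24, 0.93, 0.00),
]
print(nearest_atoms(water, (1.0, 0.0, 0.0), k=2))
print(infer_bonds(water))
```

A server-side version would replace the linear scan with a spatial index over atomic positions, which is what the "embedded atomic position index" bullet refers to.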
43
Advanced Operators
  • Protein active site identification
  • Protein surface representation
    • van der Waals; solvation.
  • Surface classification, abstraction
    • Charges; hydrophobicity; H-bond donors/acceptors
    • Extraction of pharmacophore keys
44
Integrate with Existing Tools
  • Current visualization tools based on PDB format parsers
    • Integrate with popular public domain tools and make available
  • Deposition tools
    • Support transition with PDB-to-CIF conversion tool
  • Third-party protein docking and homology applications
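Because existing tools read the fixed-column PDB format, any integration or conversion step starts by parsing its ATOM and HETATM records. The sketch below reads the core fields using the column positions defined by the PDB format. The `Atom` type, the `parse_atoms` function and the two sample records are hypothetical, chosen for this illustration, and fields beyond the coordinates are ignored.

```python
from typing import List, NamedTuple


class Atom(NamedTuple):
    serial: int
    name: str
    res_name: str
    chain_id: str
    res_seq: int
    x: float
    y: float
    z: float


def parse_atoms(pdb_text: str) -> List[Atom]:
    """Extract ATOM/HETATM records; all other record types are skipped."""
    atoms = []
    for line in pdb_text.splitlines():
        if not line.startswith(("ATOM  ", "HETATM")):
            continue
        atoms.append(Atom(
            serial=int(line[6:11]),        # columns 7-11
            name=line[12:16].strip(),      # columns 13-16
            res_name=line[17:20].strip(),  # columns 18-20
            chain_id=line[21],             # column 22
            res_seq=int(line[22:26]),      # columns 23-26
            x=float(line[30:38]),          # columns 31-38
            y=float(line[38:46]),          # columns 39-46
            z=float(line[46:54]),          # columns 47-54
        ))
    return atoms


# Two made-up records laid out in PDB columns, plus a line that is ignored.
sample = "\n".join([
    "REMARK   illustrative sample, not a real entry",
    "ATOM      1  N   MET A   1      27.340  24.430   2.614  1.00  9.67",
    "ATOM      2  CA  MET A   1      26.266  25.413   2.842  1.00 10.38",
])
atoms = parse_atoms(sample)
print(atoms[0])
```

Each parsed record maps naturally onto one row of an atom table, which is the form a PDB-to-CIF conversion or a database load would work from.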
45
Oracle Life Sciences 
Product Directions
  • Better support for life sciences data types
  • Improved support for life science specific analytics
  • Improved support for data import and incremental update
  • Enhanced XML (XDB) & Java support in the Database and Application Server (IAS)
  • Enhanced support for distributed data
  • Partner with ISVs and researchers to deliver “solutions”
  • Customer Advisory Board participation
46