|
1
|
|
|
2
|
- Market & Technology Trends
- Spatial Database Technology
- GeoSpatial DBMS in GeoSciences
- Life Sciences Data Management Challenges
- BioSpatial DBMS in Life Sciences
|
|
3
|
- Location Aware and Enabled Infrastructure
- Defense, Logistics, Mobile devices
- Internet Portals: MapQuest, Yahoo, MapPoint.NET
- Automobiles: by 2006, 80% of new cars will have some telematics navigation access (eyeforauto 2001)
- Structure Databases: Proteomics, Materials Science
|
|
4
|
|
|
5
|
- Specialty GIS servers
- Data isolation
- High systems administration and management costs
- Scalability problems
- High training costs
- Complex support problems
- Information not aligned with Business Processes
- Applications can’t leverage brute force of large servers
|
|
6
|
|
|
7
|
|
|
8
|
|
|
9
|
|
|
10
|
|
|
11
|
|
|
12
|
|
|
13
|
|
|
14
|
- SELECT STREET_NAME FROM ROADS, COUNTIES
- WHERE SDO_RELATE(road_geom, county_geom, 'MASK=ANYINTERACT QUERYTYPE=WINDOW') = 'TRUE'
- AND COUNTYNAME = 'PASSAIC';
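To make the meaning of the ANYINTERACT mask concrete, the following is a minimal Python sketch of the same question the query asks: which roads touch or overlap a named county in any way. It is not how the database evaluates SDO_RELATE; it is a brute-force illustration on plain coordinate lists. The function names (`any_interact`, `streets_in_county`) and the data layout (roads as polylines, counties as simple polygon rings) are assumptions made for this example.

```python
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float]


def _orient(a: Point, b: Point, c: Point) -> int:
    """Sign of the cross product (b - a) x (c - a): 1, -1, or 0 if collinear."""
    v = (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])
    return (v > 0) - (v < 0)


def _within_box(a: Point, b: Point, p: Point) -> bool:
    """True if p lies inside the bounding box of segment a-b."""
    return (min(a[0], b[0]) <= p[0] <= max(a[0], b[0])
            and min(a[1], b[1]) <= p[1] <= max(a[1], b[1]))


def segments_intersect(p1: Point, p2: Point, q1: Point, q2: Point) -> bool:
    """Standard orientation test, including collinear touching cases."""
    o1, o2 = _orient(p1, p2, q1), _orient(p1, p2, q2)
    o3, o4 = _orient(q1, q2, p1), _orient(q1, q2, p2)
    if o1 != o2 and o3 != o4:
        return True
    return ((o1 == 0 and _within_box(p1, p2, q1))
            or (o2 == 0 and _within_box(p1, p2, q2))
            or (o3 == 0 and _within_box(q1, q2, p1))
            or (o4 == 0 and _within_box(q1, q2, p2)))


def point_in_polygon(p: Point, ring: Sequence[Point]) -> bool:
    """Ray-casting test for a simple polygon given as a ring of vertices."""
    inside = False
    for i in range(len(ring)):
        a, b = ring[i], ring[(i + 1) % len(ring)]
        if (a[1] > p[1]) != (b[1] > p[1]):
            x_cross = a[0] + (p[1] - a[1]) * (b[0] - a[0]) / (b[1] - a[1])
            if p[0] < x_cross:
                inside = not inside
    return inside


def any_interact(line: Sequence[Point], ring: Sequence[Point]) -> bool:
    """True if the polyline is not disjoint from the polygon."""
    if any(point_in_polygon(v, ring) for v in line):
        return True
    edges = [(ring[i], ring[(i + 1) % len(ring)]) for i in range(len(ring))]
    return any(segments_intersect(line[i], line[i + 1], a, b)
               for i in range(len(line) - 1) for a, b in edges)


def streets_in_county(roads: List[Tuple[str, List[Point]]],
                      counties: Dict[str, List[Point]],
                      county_name: str) -> List[str]:
    """Names of roads whose geometry interacts with the named county."""
    ring = counties[county_name]
    return [name for name, geom in roads if any_interact(geom, ring)]
```

A real spatial DBMS would answer this through a spatial index rather than comparing every road against the county boundary, which is the point of keeping the operator inside the server.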
|
|
15
|
|
|
16
|
|
|
17
|
|
|
18
|
|
|
19
|
|
|
20
|
|
|
21
|
|
|
22
|
- How large is large?
- Hundreds of thousands is normal
- Millions is interesting
- Tens of millions is serious
- Hundreds of millions is large
- What is the problem with large volumes?
- They mean big structures
- Long operations
- Data reload, refresh
- Index rebuilds
|
|
23
|
- For manageability
- Break large problems into manageable pieces
- Can load / rebuild individual partitions
- Can load / rebuild multiple partitions concurrently
- For performance
- Query parallelism
- Partition elimination
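The partitioning benefits listed above can be illustrated with a small in-memory Python sketch. It models a range-partitioned table in which each partition has its own index, so one partition can be rebuilt alone, several can be rebuilt concurrently, and a range query scans only the partitions whose key range can match (partition elimination). The class name `RangePartitionedTable` and its methods are hypothetical, invented for this illustration; they do not correspond to any database API.

```python
from bisect import bisect_right
from concurrent.futures import ThreadPoolExecutor


class RangePartitionedTable:
    """Toy range-partitioned table: partition i holds keys below upper_bounds[i]."""

    def __init__(self, upper_bounds):
        self.upper_bounds = sorted(upper_bounds)
        self.partitions = [[] for _ in self.upper_bounds]  # lists of (key, row)
        self.indexes = [{} for _ in self.upper_bounds]     # one index per partition

    def _partition_for(self, key):
        # Keys equal to a bound belong to the next partition.
        i = bisect_right(self.upper_bounds, key)
        if i == len(self.upper_bounds):
            raise ValueError(f"key {key} is above the highest partition bound")
        return i

    def insert(self, key, row):
        self.partitions[self._partition_for(key)].append((key, row))

    def rebuild_index(self, i):
        """Rebuild the index of a single partition without touching the others."""
        index = {}
        for key, row in self.partitions[i]:
            index.setdefault(key, []).append(row)
        self.indexes[i] = index
        return i

    def rebuild_all_concurrently(self, workers=4):
        """Rebuild every partition index, several partitions at a time."""
        with ThreadPoolExecutor(max_workers=workers) as pool:
            return list(pool.map(self.rebuild_index, range(len(self.partitions))))

    def range_query(self, lo, hi):
        """Return matching rows and the ids of the partitions actually scanned."""
        first, last = self._partition_for(lo), self._partition_for(hi)
        scanned = list(range(first, last + 1))  # all other partitions are eliminated
        rows = [row for i in scanned
                for key, row in self.partitions[i] if lo <= key <= hi]
        return rows, scanned
```

With bounds of 100, 200 and 300, a query for keys between 140 and 170 touches only the middle partition; the other two are never read.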
|
|
24
|
- Spatial Reference System
- Spatial Operators
- Versioning/Long Transactions
- Linear Referencing
- Quadtree/R-tree index
- Parallel Index create
- Geodetic Support
- Spatial Aggregates
- Topology *
- Raster/Grid Management *
- Spatial Data Mining *
- * Planned Release 10i
|
|
25
|
|
|
26
|
- “To meet the scientific goals we believe we need to add around 80-100TB of storage each year for the next 5 years”
P. Butcher, The Sanger Centre
|
|
27
|
|
|
28
|
- Access and storage of vast quantities of life science data from a variety of sources
- High throughput loading, indexing, processing and update of information
- Data integration from a variety of sources
- Scalability and reliability problems
- Find patterns & insights through queries, analyses and data mining
- Collaboration & security challenges
|
|
29
|
|
|
30
|
|
|
31
|
|
|
32
|
|
|
33
|
|
|
34
|
- High-throughput crystallography generating large volumes of complex protein structure data
- Small molecule (structure) databases growing to tens of millions of compounds
- 3D and pharmacophore analysis require efficient storage, indices and operators of structure data
- Integrated visualization & computation tools with DBMS
|
|
35
|
- Object-relational model and extensibility enable 2D data types and indices
- Powerful and growing operator set for sophisticated location/structure queries
- Validation by Geographic Information Systems (GIS) and CAD Community
- Common query language (SQL) that all data banks and tool vendors leverage
- Security, reliability, scalability and flexibility
- Faster, bigger, better, cheaper
|
|
36
|
|
|
37
|
|
|
38
|
|
|
39
|
|
|
40
|
- Extend Oracle DBMS with custom 3D structure features
- Provide BioSpatial types and an object-relational schema for large & small molecule data in Oracle
- Compliant with mmCIF; SQL interface
- Provide low-level interfaces consistent with the OMG standard (RCSB)
- Integration with leading visualization and analytical tools (commercial, shareware)
|
|
41
|
- Support the SQL query and computation requirements of biotechs, pharmas and independent software vendors
- Implement indices and operators in the server to meet requirements
- Begin with simple operators and those that serve as foundations for extension
- Integration with 3rd party visualization tools
|
|
42
|
- Sample BioSpatial Operators:
- Nearest atom(s) to a specified position or residue in a structure
- Embedded atomic position index
- Retrieve polypeptide skeleton list
- On-the-fly bond and bond-order computation
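Two of the sample operators above, nearest atom(s) to a position and on-the-fly bond computation, can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the BioSpatial implementation: the nearest-atom search is a brute-force scan rather than an index lookup, bonds are inferred with the common heuristic that two atoms are bonded when their distance does not exceed the sum of their covalent radii plus a tolerance, and bond order is not computed at all. The function names, the radii table (approximate values in angstroms for four elements only) and the 0.45 Å tolerance are all choices made for this example.

```python
import heapq
import math

# Approximate covalent radii in angstroms (assumed values for illustration).
COVALENT_RADII = {"H": 0.31, "C": 0.76, "N": 0.71, "O": 0.66}


def nearest_atoms(atoms, position, k=1):
    """Indices of the k atoms closest to position, nearest first.

    atoms is a list of (element, (x, y, z)) tuples.
    """
    return heapq.nsmallest(
        k, range(len(atoms)),
        key=lambda i: math.dist(atoms[i][1], position))


def infer_bonds(atoms, tolerance=0.45):
    """Pairs (i, j), i < j, whose distance is within the covalent-radius cutoff."""
    bonds = []
    for i in range(len(atoms)):
        for j in range(i + 1, len(atoms)):
            elem_i, pos_i = atoms[i]
            elem_j, pos_j = atoms[j]
            cutoff = COVALENT_RADII[elem_i] + COVALENT_RADII[elem_j] + tolerance
            if math.dist(pos_i, pos_j) <= cutoff:
                bonds.append((i, j))
    return bonds
```

The pairwise loop is quadratic in the number of atoms, which is acceptable for a small molecule but is exactly the cost that an embedded atomic position index is meant to avoid for large structures.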
|
|
43
|
- Protein active site identification
- Protein surface representation
- van der Waals; solvation.
- Surface classification, abstraction
- Charges; hydrophobicity; H-bond donors/acceptors
- Extraction of pharmacophore keys
|
|
44
|
- Current visualization tools based on PDB format parsers
- Integrate with popular public domain tools and make available
- Deposition tools
- Support transition with PDB-to-CIF conversion tool
- 3rd party protein docking and homology applications
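As a rough idea of what a PDB-to-CIF conversion involves at the atom level, the Python sketch below reads one fixed-column PDB ATOM/HETATM record and maps its fields onto mmCIF `_atom_site` item names. It is a minimal illustration, not the conversion tool mentioned above: the function name `pdb_atom_to_cif_row` is hypothetical, only a handful of items are mapped, and the mapping of the PDB chain ID and residue number onto the `auth_` items is an assumption of this example.

```python
def pdb_atom_to_cif_row(line):
    """Map one PDB ATOM/HETATM record to a dict keyed by mmCIF _atom_site items.

    Column positions follow the fixed-width PDB coordinate record layout
    (1-based columns in the format description, 0-based slices here).
    """
    record = line[0:6].strip()
    if record not in ("ATOM", "HETATM"):
        raise ValueError("not a coordinate record: " + repr(record))
    return {
        "_atom_site.group_PDB": record,
        "_atom_site.id": int(line[6:11]),                 # atom serial number
        "_atom_site.label_atom_id": line[12:16].strip(),  # atom name
        "_atom_site.label_comp_id": line[17:20].strip(),  # residue name
        "_atom_site.auth_asym_id": line[21].strip(),      # chain identifier
        "_atom_site.auth_seq_id": int(line[22:26]),       # residue sequence number
        "_atom_site.Cartn_x": float(line[30:38]),
        "_atom_site.Cartn_y": float(line[38:46]),
        "_atom_site.Cartn_z": float(line[46:54]),
        "_atom_site.type_symbol": line[76:78].strip(),    # element symbol
    }
```

A full converter would also have to carry over header, sequence and connectivity records; the sketch covers only the coordinate rows that visualization tools depend on most directly.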
|
|
45
|
- Better support for life sciences data types
- Improved support for life science specific analytics
- Improved support for data import and incremental update
- Enhanced XML (XDB) & Java support in the Database and Application Server (IAS)
- Enhanced support for distributed data
- Partner with ISVs and researchers to deliver “solution”
- Customer Advisory Board participation
|
|
46
|
|