|
1
|
- Peter Linstrom
- National Institute of Standards and Technology
- Gaithersburg, MD, USA
|
|
2
|
- Nature of physical property data
- Historical record or interpretation
- Limitations of automated systems
- Problem areas
- Summary
|
|
3
|
- A numeric tuple which applies to a physical system
- Describing how the numeric value was obtained from the system is
difficult
- Identification of techniques, equipment, ancillary data used in
calculations and calibrations.
|
|
4
|
- Describing the system is difficult
- Identification of sample: chemical species and concentrations
- Recording the numeric tuple is easy
|
|
5
|
- Two (non-exclusive) goals:
- 1.) Historical record
- What was measured, computed, or estimated?
- How was this done?
- 2.) Basic knowledge
- What do we know about this property?
- What is the probable range of the numeric value?
|
|
6
|
- Historical record
- Does not change
- Applies to specific physical systems and measurements
- Basic knowledge
- Built on analysis of the historical record
- Applies to an “idealized” physical system
- Improves through scientific processes
|
|
7
|
- 1998 Roux et al, Faraday Trans.
- Stability of dimethyl benzenedicarboxylates
|
|
8
|
- 2 C6H5CO2Me → C6H4(CO2Me)2 + C6H6
- Endothermicity of gas phase reaction:
- ortho 52.3 kJ/mol
- meta 29.2 kJ/mol
- para 30.4 kJ/mol
- Quite different from dinitro and dicyano benzenes
|
|
9
|
- 2002 Roux et al, Phys. Chem. Chem. Phys.
- ΔfH° methyl benzoate gas (kJ/mol):
- 1971 Hall et al -299.8
- 1980 Guthrie et al -269.3 ± 5.1
- 1994 Pedley -287.9 ± 2.4
- 1998 Maksimuk et al -277.74 ± 1.2
- 2002 Roux et al -276.1 ± 4.0
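The values listed above can be held as an immutable historical record, with any summary computed separately as interpretation. The following is a minimal Python sketch of that separation. The names `Measurement`, `RECORD`, and `naive_range` are hypothetical, and the bare spread of reported values is only a crude stand-in for a real critical evaluation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)  # frozen: the historical record does not change
class Measurement:
    year: int
    source: str
    value: float                  # kJ/mol
    uncertainty: Optional[float]  # kJ/mol; None when none is listed


# Gas-phase enthalpy of formation of methyl benzoate, as listed above.
RECORD = (
    Measurement(1971, "Hall et al", -299.8, None),
    Measurement(1980, "Guthrie et al", -269.3, 5.1),
    Measurement(1994, "Pedley", -287.9, 2.4),
    Measurement(1998, "Maksimuk et al", -277.74, 1.2),
    Measurement(2002, "Roux et al", -276.1, 4.0),
)


def naive_range(record):
    """Interpretation step (deliberately crude): spread of reported values."""
    values = [m.value for m in record]
    return min(values), max(values)


low, high = naive_range(RECORD)
print(f"reported values span {low} to {high} kJ/mol")
```

A better interpretation could replace `naive_range` at any time without touching `RECORD`, which is the point of keeping the two apart.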
|
|
10
|
- 2 C6H5CO2Me → C6H4(CO2Me)2 + C6H6
- Revised endothermicity of gas phase reaction:
- ortho 28.7 kJ/mol
- meta 5.6 kJ/mol
- para 6.8 kJ/mol
- Similar to dinitro and dicyano benzenes
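The revised values follow arithmetically from a change in the methyl benzoate enthalpy of formation. The sketch below checks this. It assumes that the 1998 analysis used the 1994 Pedley value, that the revision used the 2002 Roux et al value, and that nothing else in the reaction enthalpy changed. These assumptions are inferred from the numbers and are not stated on the slides.

```python
# Reaction: 2 C6H5CO2Me -> C6H4(CO2Me)2 + C6H6
# dH_rxn = dfH(diester) + dfH(benzene) - 2 * dfH(methyl benzoate)
# Assumption: only the methyl benzoate value changed between the analyses.
OLD_MB = -287.9  # kJ/mol, 1994 Pedley (assumed basis of the 1998 values)
NEW_MB = -276.1  # kJ/mol, 2002 Roux et al

# Methyl benzoate enters with coefficient -2, so the shift in dH_rxn is:
shift = -2 * (NEW_MB - OLD_MB)  # kJ/mol

original = {"ortho": 52.3, "meta": 29.2, "para": 30.4}
revised = {isomer: round(dh + shift, 1) for isomer, dh in original.items()}
print(revised)
```

Under these assumptions the shift is -23.6 kJ/mol for every isomer, which reproduces the revised endothermicities of 28.7, 5.6, and 6.8 kJ/mol.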
|
|
11
|
- Automated systems require data with well defined semantics
- Portions of physical property data are recorded in natural language
(literature)
- Need procedures to map information to a form appropriate for automated
systems
|
|
12
|
- Mapping of information to computer friendly semantics may involve
- Loss of information
- A judgement on the part of the archivist (introduction of information
not explicitly contained in the original source)
- Blurring of the line between historical record and interpretation
|
|
13
|
- Some options for expressing information
- Develop taxonomy of codes
- Token value pairs
- Incorporate into database design
- Text comments (loss of data processing capability)
- Ignore the information
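The trade-off among these options can be sketched in a few lines of Python. Everything here is hypothetical (the `MethodCode` values, the `PropertyRecord` fields, the `phase` token, and the placeholder numbers). The sketch only illustrates that coded fields and token value pairs stay queryable while a text comment does not.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List


class MethodCode(Enum):  # hypothetical taxonomy of codes
    CALORIMETRY = "CAL"
    ESTIMATE = "EST"


@dataclass
class PropertyRecord:
    value: float
    units: str
    method: MethodCode                                     # taxonomy code
    tokens: Dict[str, str] = field(default_factory=dict)   # token value pairs
    comment: str = ""                                      # free text


def select(records: List[PropertyRecord], token: str, wanted: str):
    """Filter on a token value pair; text comments are not searched."""
    return [r for r in records if r.tokens.get(token) == wanted]


# Placeholder records: the second carries its phase only as a text comment.
records = [
    PropertyRecord(1.0, "kJ/mol", MethodCode.CALORIMETRY, {"phase": "gas"}),
    PropertyRecord(2.0, "kJ/mol", MethodCode.ESTIMATE, comment="gas phase value"),
]
gas = select(records, "phase", "gas")
print(len(gas))
```

Only the first record is found, even though a human reader would see that both describe the gas phase. This is the loss of data processing capability that comes with text comments.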
|
|
14
|
|
|
15
|
- Proper design of systems for expressing data requires significant domain
knowledge
- Definition of appropriate taxonomies, codes, etc.
- Knowledge of what will be important to future investigators
- Knowledge of what can be safely ignored
|
|
16
|
- Chemical identification
- Taxonomies for methods
- Describing domain-specific meta-data
|
|
17
|
- Identification of pure species can be difficult
- Identification of mixtures is a superset of the problem for single
species
- Chemical nomenclature is too complex for most data systems to handle
|
|
18
|
- Registry of species
- Simplifies identification to an integer number
- Maintained by third parties
- Species may not be in registry
- Identification may not be precise (isomers)
- Deprecated entries
- Users consult secondary sources – errors propagate
|
|
19
|
- Chemical structure
- No third party
- Less ambiguity, but more complex semantics
- Expensive to draw or look up
- Costs decreasing with modern technology
|
|
20
|
- Purity / uncertainty of composition
- May not be known
- Purification / synthesis technique may be provided
- Often omitted from database
|
|
21
|
- Classification of the manner in which a value was obtained
- Instrument type and model form natural divisions
- Appropriate resolution determined by archivist
- How does one handle unique methods?
- Science is not static – taxonomies will grow
|
|
22
|
- Lias, et al, Ionization Potential Database
- Compiled over many years
- Taxonomy for basic measurement types
- Additional codes added to supplement taxonomy for new
methods which cross existing hierarchical boundaries (e.g. electron
impact and laser spectroscopy)
|
|
23
|
- Meta-data recognized by archivist (domain specialist) as significant
- Need method to encode in computer friendly format
- Taxonomies
- Token value pairs
|
|
24
|
- Afeefy, Liebman, and Stein – Neutral Thermochemistry Archive
- Meta-data options expanded as archive grew
- Correction to current CODATA heats of formation: done, not-done, or
not-possible
- Data disagrees with previously published data: acknowledged by
author(s), or not acknowledged
|
|
25
|
- Two pairs of trade-offs
- Historical record vs. interpretation
- Semantic complexity vs. loss of information
- Important for archivists and researchers to be aware of the compromises
that are made
|