1
CODATA 18th International Conference
  • CODATA 18th International Conference: Frontiers of Scientific and Technical Data (29 September - 3 October, 2002)



  • Prototype of TRC Integrated Information System for Physicochemical Properties of Organic Compounds: Evaluated Data, Models and Knowledge



  • Xinjian Yan, Qian Dong, Xiangrong Hong, Robert D. Chirico, Michael Frenkel


  • Thermodynamics Research Center (TRC)
  • National Institute of Standards and Technology
2
Introduction
  • Requirement: Industrial and scientific development requires high-quality data and models


  • Key Point: A high-quality data system needs strong support from a comprehensive knowledge base


  • Aim: Develop a system with high-quality data and models, fully supported by domain knowledge
3
 
4
 
5
How Knowledge Supports Data and Model Analysis
6
Data Background - TRC Databases
  • Databases:
  • Source Database, Table Database, Density Database, Vapor Pressure Database, Ideal Gas Database, etc.
  • A Comprehensive Physicochemical Data System:
  • The Source Database covers more than 100 physical and chemical properties, with over 2 million experimental records for 32,000 chemical systems (pure compounds, mixtures, and reaction systems)
7
Information for Recommended Data (RD)
  • Detailed information is crucial for a good understanding of data. The following information has been prepared for recommended data (and also for experimental data).


    • The uncertainty of the RD
    • The number of data points used to obtain the RD
    • The discreteness (scatter) of the data used to derive the RD
    • A description of how the RD were selected
    • The grade of the RD
8
Data Processing for RD
  • For compounds with multiple values, a weighted average is used to obtain the recommended data
  • For compounds with only one or two values, the data are checked against:
  •     A. Theory and thermodynamic relationships
  •     B. Values from models
  •     C. Other well-characterized sources
  •     D. Data for similar compounds
  • For doubtful data, the original articles are reviewed
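A minimal sketch of one common weighted-average scheme, in which each value is weighted by the inverse of its variance. The slides do not say which weights TRC uses, so the weighting rule, the function name `weighted_mean`, and the sample numbers below are illustrative assumptions rather than the system's actual method.

```python
import math
from typing import Sequence, Tuple


def weighted_mean(values: Sequence[float],
                  uncertainties: Sequence[float]) -> Tuple[float, float]:
    """Return the inverse-variance weighted mean and its standard uncertainty.

    Assumption: weights are 1 / u**2, where u is the standard uncertainty
    of each value. This is a common choice, not one stated in the slides.
    """
    if not values or len(values) != len(uncertainties):
        raise ValueError("values and uncertainties must be non-empty and equal in length")
    if any(u <= 0 for u in uncertainties):
        raise ValueError("uncertainties must be positive")

    weights = [1.0 / (u * u) for u in uncertainties]
    total_weight = sum(weights)
    mean = sum(w * v for w, v in zip(weights, values)) / total_weight
    # Uncertainty of the weighted mean under the same assumption.
    return mean, math.sqrt(1.0 / total_weight)


# Made-up values and uncertainties, for illustration only.
mean, u_mean = weighted_mean([10.0, 20.0], [1.0, 2.0])
print(f"recommended value = {mean:.2f}, uncertainty = {u_mean:.2f}")
```

The more precise value dominates the result, which is the behaviour one would want when combining several measurements of differing quality.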



9
Criteria and Methods of Evaluating Models
  • Predictive ability
  • Complexity of the compounds used in developing and testing models
  • Diversity of the compounds used in developing and testing models
  • Reliability of each parameter (how many data were used to obtain it, and how good they were)
  • Similarity analysis
10
 
11
 
12
Complexity of Organic Compounds - Definition

  • Group           Complexity      Complexity      Note
  •                 (groups = 1)    (groups > 1)
  • CH              1               1               CH3-CH(CH3)-CH3 = 2
  • C               2               2
  • C=C (double bond) 2             2
  • =C=             2               2
  • C*C (triple bond) 2             2
  • F, Cl, Br, I    3               5               2 (when groups > 4)
  • CN              3               4
  • N               3               4
  • NC              3               4
  • S               3               4
  • SH              3               4
  • CHO             4               10
  • CO              4               10
  • COO             4               10
  • COOH            4               10
  • N=              4               10
  • NH              4               10
  • NH=             4               10
  • NH2             4               10
  • NO2             4               10
  • O               4               10
  • OH              4               10              OH-CH2-CH2-OH = 18
  • SO              4               10
  • SO2             4               10
  • Ring            3               5               Including fused rings


  • Terminals / complexity: 6 (C = 1), 3 (C = 2), 1 (C = 3)


  • C atoms         Complexity
  • 1-10            1
  • 11-20           2
  • 21-30           3
  • 31-40           4
  • 41-50           5
  • > 50            6
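The scoring rules can be sketched in code under one reading of the table. This reading is an assumption inferred from the two worked examples on the slide: the first occurrence of a group scores the "=1" value, each further occurrence scores the ">1" value, the carbon-count band is always added, and the terminal term applies only to compounds with one to three carbon atoms. The names `GROUP_SCORES`, `carbon_band`, and `complexity` are hypothetical, and the halogen rule for more than four groups is left out because the slide does not define it precisely.

```python
from typing import Dict, Tuple

# (score for the first occurrence, score for each further occurrence).
# Assumed reading of the "=1" and ">1" columns; halogens are omitted.
GROUP_SCORES: Dict[str, Tuple[int, int]] = {
    "CH": (1, 1), "C": (2, 2), "C=C": (2, 2), "=C=": (2, 2), "C*C": (2, 2),
    "CN": (3, 4), "N": (3, 4), "NC": (3, 4), "S": (3, 4), "SH": (3, 4),
    "CHO": (4, 10), "CO": (4, 10), "COO": (4, 10), "COOH": (4, 10),
    "N=": (4, 10), "NH": (4, 10), "NH=": (4, 10), "NH2": (4, 10),
    "NO2": (4, 10), "O": (4, 10), "OH": (4, 10), "SO": (4, 10),
    "SO2": (4, 10), "ring": (3, 5),
}

# Terminal term, keyed by the number of carbon atoms in the compound.
TERMINAL_SCORES: Dict[int, int] = {1: 6, 2: 3, 3: 1}


def carbon_band(n_carbon: int) -> int:
    """Score for the carbon count: 1-10 -> 1, 11-20 -> 2, ..., > 50 -> 6."""
    if n_carbon < 1:
        raise ValueError("an organic compound needs at least one carbon atom")
    return min((n_carbon - 1) // 10 + 1, 6)


def complexity(groups: Dict[str, int], n_carbon: int) -> int:
    """Sum group, terminal, and carbon-count scores under the assumed reading."""
    score = carbon_band(n_carbon) + TERMINAL_SCORES.get(n_carbon, 0)
    for name, count in groups.items():
        if count < 0:
            raise ValueError(f"negative count for group {name!r}")
        if count == 0:
            continue
        first, further = GROUP_SCORES[name]
        score += first + further * (count - 1)
    return score


# The two worked examples from the slide.
print(complexity({"CH": 1}, n_carbon=4))  # CH3-CH(CH3)-CH3
print(complexity({"OH": 2}, n_carbon=2))  # OH-CH2-CH2-OH
```

Under this reading the sketch reproduces both slide examples (2 and 18), which supports the interpretation but does not confirm it for groups the examples do not exercise.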


13
Example of Complexity for Compounds Having Critical Temperature (Tc) Data
  •                       CN      AC
  • Tc before 1996*       500     14
  • Tc after 1995**       100     21


  • CN - Compound Number; AC - Average Complexity


  • * 500 compounds with critical temperatures reported before 1996.
  • ** 100 new compounds reported between 1996 and 2001.
14
Example of Using Information from Similar Compounds to Judge the Uncertainty of Values Estimated by Models
15
Knowledge is the key to evaluating and understanding scientific data and models
  • A scientific experiment is a complicated process
  • Experimental data tend to carry uncertainty or error
  • Evaluating scientific data is extremely difficult; there is no way to guarantee absolute correctness
  • Establishing the true value of a physicochemical property requires repeated experimental examination
  • The same problems apply to models
16
Domain Knowledge
  • Thermophysical theory and concepts
  • Experimental and theoretical research methods
  • Evaluation of and comments on experimental data
  • Physical and chemical characteristics of compounds
  • Models (introduction, evaluation, and comments)
  • Molecular structure and interaction information
  • Terminology
  • Units
  • ……
17
Example of Knowledge and the Selection of Ethanol’s Recommended Critical Temperature
18
Example of Knowledge and the Selection of Ethanol’s Recommended Critical Temperature
19
Example of a Knowledge Support System
20
Summary
  • Uncertainty is everywhere
  • Our knowledge of uncertainty is very limited
  • Our awareness of uncertainty is low
  • Knowledge is crucial to reducing uncertainty
  • Building a high-quality information system requires a strong ability to analyze the uncertainty of data, models, and text information