In my talk, I will briefly introduce our considerations and practice in developing a high-quality scientific database.
Without high-quality data, there are no reliable models and no reliable simulations of chemical processes.
Our viewpoint is that a high-quality data system needs strong support from a comprehensive knowledge base.
We intend to develop an integrated system with high-quality data and models fully supported by domain knowledge.
By this we mean a system in which high-quality data, models and knowledge are fused together, not just three different things placed side by side, because they depend on each other.
Here is the framework of our Integrated Information System. It mainly contains evaluated data, models, knowledge, and the functions that deduce and then recommend the best data and models for users. It also provides valuable information that helps users understand the data it provides.
Recommended data are mainly extracted from the TRC Source database.
As for the models, before we include them in the software, we carefully analyze their capabilities and save their features in a knowledge base, which is used to recommend the best models.
The knowledge is extracted from literature, experimental data, and models.
From the computer-application point of view, there are two kinds of knowledge: structured and unstructured. In this context, unstructured knowledge is in text format, accompanied by a title, keywords, etc., and can be provided directly to users. Structured knowledge is in a computer-readable format and is used for inference.
This slide shows the background of our data.
We firmly hold that
This slide shows how we obtain the recommended data.
What are the criteria for judging a model, for saying that one is good and another is not? How should we understand what a model can do? Big problems exist here: very few articles discuss model evaluation.
This slide displays an example of
In this slide, I define a way to calculate the complexity of organic compounds based on molecular structure features, such as molecular size, branching, composition and rings. I assign a number to each structure feature. The complexity of a compound is obtained by summing up these numbers.
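To make the summation concrete, here is a minimal sketch in Python. The feature names, the per-feature numbers in `WEIGHTS`, and the choice to multiply each number by a feature count are all assumptions made for illustration; the actual numbers used on the slide are not given in these notes.

```python
from dataclasses import dataclass

# Hypothetical numbers assigned to each structure feature (illustration only).
WEIGHTS = {
    "heavy_atoms": 1,   # molecular size
    "branches": 2,      # branching points
    "heteroatoms": 2,   # composition (atoms other than C and H)
    "rings": 3,         # ring count
}


@dataclass(frozen=True)
class StructureFeatures:
    heavy_atoms: int
    branches: int
    heteroatoms: int
    rings: int


def complexity(features: StructureFeatures) -> int:
    """Sum the assigned number for every occurrence of every feature."""
    return sum(weight * getattr(features, name) for name, weight in WEIGHTS.items())


ethanol = StructureFeatures(heavy_atoms=3, branches=0, heteroatoms=1, rings=0)
isobutane = StructureFeatures(heavy_atoms=4, branches=1, heteroatoms=0, rings=0)
cyclohexane = StructureFeatures(heavy_atoms=6, branches=0, heteroatoms=0, rings=1)

for name, feats in [("ethanol", ethanol), ("isobutane", isobutane), ("cyclohexane", cyclohexane)]:
    print(name, complexity(feats))
```

With these assumed numbers, a compound with rings or branches scores higher than a linear compound of the same size, which is the ordering the complexity measure is meant to capture.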
The first line means that about 500 compounds have critical temperatures reported before 1996.
The second line means that around 100 compounds have critical temperatures reported between 1996 and 2001.
What does this information mean? It is like asking a 7th-grade student to answer questions meant for 8th- or 9th-grade students. You can imagine that most students would have trouble answering, because the questions are beyond their abilities. The example I gave earlier shows this kind of problem in models.
This slide shows the critical temperatures of ethanol from different measurements. You can see two groups of data, one around 514 K and the other around 516 K. Which one is closer to the true value for pure ethanol? Knowledge and experts tell us that ethanol is very difficult to separate from water. If the ethanol sample contains water, the measured Tc is very likely higher than the value for pure ethanol. Experts suggest that 514 K is the correct one.
This example shows the importance of knowledge.
Beyond that, this example also implies the importance of inference, because the problem described above also holds for similar systems and for other properties. We can get more information based on that knowledge and inference.
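The reasoning in the ethanol example can be sketched as a small rule-based inference in Python. This is only an illustration under stated assumptions: the `RULES` table, the function names, the gap threshold, and the measurement values are all hypothetical placeholders, not data or code from our system.

```python
from statistics import mean

# Hypothetical structured knowledge: how an impurity biases a measured property.
RULES = {("water", "Tc"): "raises"}


def split_into_groups(values, gap):
    """Group sorted values; start a new group when neighbours differ by more than gap."""
    ordered = sorted(values)
    if not ordered:
        return []
    groups = [[ordered[0]]]
    for value in ordered[1:]:
        if value - groups[-1][-1] > gap:
            groups.append([value])
        else:
            groups[-1].append(value)
    return groups


def recommend(values, impurity, prop, gap=1.0):
    """Pick the group least affected by the impurity and return its mean, or None."""
    groups = split_into_groups(values, gap)
    effect = RULES.get((impurity, prop))
    if not groups or effect is None:
        return None  # no data or no applicable knowledge
    chosen = groups[0] if effect == "raises" else groups[-1]
    return mean(chosen)


# Placeholder values in kelvin, chosen only to mimic two groups near 514 K and 516 K.
measured_tc = [513.9, 514.0, 514.1, 515.9, 516.0, 516.2]
print(recommend(measured_tc, "water", "Tc"))
```

Because the rule is keyed on the impurity and the property rather than on ethanol itself, the same entry could be reused for similar systems, and further entries could cover other properties, which is the point about inference made above.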
This slide shows the knowledge provided for users.