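Given the wide interest in applied chemistry, our editorial team has collected and organized the paper "Application of Dimensionality Reduction and Modeling of High-Dimensional Data in Processes" for your reference and study.
Paper ID: HG213. Word count: 20,751. Pages: 55.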
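Abstract: The processing of high-dimensional data, such as biological data, network data, and financial market transaction data, faces two issues. The first is the curse of dimensionality: growth in dimension makes model identification and rule discovery in high-dimensional data very challenging. The second is that growth in dimension also brings a "blessing of dimensionality": the rich information contained in high-dimensional data can yield new possibilities for solving problems. How to represent high-dimensional data in a low-dimensional space, and thereby discover its intrinsic structure, is one of the key problems in high-dimensional information processing. Dimensionality reduction is an effective means of overcoming the curse of dimensionality, and this thesis examines its application to a process in depth. The work consists of three parts:
1. A brief review of the state of development of high-dimensional data processing and some of the problems it involves, such as the curse of dimensionality, the peculiarities of high-dimensional spaces, and the intrinsic dimension of high-dimensional data. This part also introduces the principles of a commonly used linear dimensionality reduction method, principal component analysis (PCA), and of a commonly used nonlinear method, locally linear embedding (LLE).
2. An application example of dimensionality reduction in a terephthalic acid distillation process. This part gives the procedure and conclusions of applying PCA, a linear method, to the distillation process, and then applies LLE, a nonlinear method, to the same process.
3. A comparison of three methods for predicting the dependent variables: a BP neural network alone, PCA combined with a BP neural network, and LLE combined with a BP neural network. For water content, LLE with BP performs best, PCA with BP second, and BP alone worst. For acetic acid content, BP alone performs best, PCA with BP second, and LLE with BP worst.
Keywords: high-dimensional data, dimensionality reduction, principal component analysis, LLE, BP neural network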
Abstract: The processing of high-dimensional data, such as biological data, network data, and financial market trading data, faces two problems. First, the curse of dimensionality: the expansion in dimension poses great challenges for pattern recognition and rule discovery in high-dimensional data. Second, the growth in dimension also brings a "blessing of dimensionality": the wealth of information hidden in high-dimensional data can create new possibilities for solving problems. This thesis consists of three parts:
1. A brief overview of the state of development of high-dimensional data processing and some of the issues it involves, such as the curse of dimensionality, the peculiarities of high-dimensional spaces, and the intrinsic dimension of high-dimensional data. The principles of a commonly used linear dimensionality reduction method, principal component analysis, and of a commonly used nonlinear method, locally linear embedding (LLE), are also introduced.
2. An application example of dimensionality reduction in a terephthalic acid distillation process. The procedure and conclusions of applying principal component analysis, a linear method, to this distillation process are given, and LLE, a nonlinear method, is applied to reduce the dimension of the same process.
3. A comparison of three methods for predicting the dependent variables: the BP neural network, principal component analysis with a BP neural network, and LLE with a BP neural network. When water content is the dependent variable, LLE with BP is best, principal component analysis with BP is second, and BP is worst. When acetic acid content is the dependent variable, BP is best, principal component analysis with BP is second, and LLE with BP is worst.
Key words: high-dimensional data; dimensionality reduction; principal component analysis; LLE; BP neural network
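The abstract names principal component analysis as the linear dimensionality reduction step applied before the BP neural network. The following is a minimal sketch of that step only, under stated assumptions: the thesis's distillation data is not reproduced here, so the example uses synthetic data (five correlated variables driven by two hidden factors), and the helper names `center`, `covariance`, `top_eigenpair`, and `pca` are hypothetical. Power iteration with deflation is used only to keep the sketch free of third-party libraries; it is not claimed to be the author's procedure.

```python
import math
import random


def center(rows):
    """Subtract each column's mean so every variable has zero mean."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    return [[r[j] - means[j] for j in range(d)] for r in rows]


def covariance(rows):
    """Sample covariance matrix of rows that are already centered."""
    n, d = len(rows), len(rows[0])
    return [[sum(r[i] * r[j] for r in rows) / (n - 1) for j in range(d)]
            for i in range(d)]


def top_eigenpair(mat, iters=500):
    """Largest eigenvalue and eigenvector of a symmetric matrix (power iteration)."""
    d = len(mat)
    vec = [1.0 / math.sqrt(d)] * d
    for _ in range(iters):
        nxt = [sum(mat[i][j] * vec[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in nxt))
        if norm == 0.0:
            break
        vec = [x / norm for x in nxt]
    # Rayleigh quotient of the unit vector gives the eigenvalue estimate.
    val = sum(vec[i] * mat[i][j] * vec[j] for i in range(d) for j in range(d))
    return val, vec


def pca(rows, n_components):
    """Return (scores, explained_variance_ratios) for the leading components."""
    x = center(rows)
    cov = covariance(x)
    d = len(cov)
    total = sum(cov[i][i] for i in range(d))  # total variance = trace
    components, ratios = [], []
    for _ in range(n_components):
        val, vec = top_eigenpair(cov)
        components.append(vec)
        ratios.append(val / total)
        # Deflation: remove the component just found before the next pass.
        cov = [[cov[i][j] - val * vec[i] * vec[j] for j in range(d)]
               for i in range(d)]
    # Project the centered data onto the retained components.
    scores = [[sum(r[j] * c[j] for j in range(d)) for c in components]
              for r in x]
    return scores, ratios


# Synthetic stand-in data (an assumption, not the thesis's measurements):
# five observed variables generated from two hidden factors plus small noise.
rng = random.Random(0)
data = []
for _ in range(200):
    t1, t2 = rng.gauss(0, 1), rng.gauss(0, 1)
    data.append([
        t1 + 0.05 * rng.gauss(0, 1),
        2 * t1 + 0.05 * rng.gauss(0, 1),
        t2 + 0.05 * rng.gauss(0, 1),
        t1 - t2 + 0.05 * rng.gauss(0, 1),
        0.5 * t2 + 0.05 * rng.gauss(0, 1),
    ])

scores, ratios = pca(data, 2)
print("explained variance ratios:", ratios)
```

In a workflow like the one the abstract describes, the low-dimensional `scores` would replace the original variables as inputs to the predictive model. The number of retained components is typically chosen from the cumulative explained variance ratio.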
Contents
Chinese Abstract
English Abstract
Contents
1. Overview
2. Introduction
2.1 Statement of the Dimensionality Reduction Problem
2.1.1 Classification of Dimensionality Reduction
2.1.2 Definition of Dimensionality Reduction
2.1.3 The Curse of Dimensionality
2.1.4 Peculiarities of High-Dimensional Spaces
2.1.5 Intrinsic Dimension
2.2 Overview of Dimensionality Reduction Methods
2.2.1 Linear Dimensionality Reduction
2.2.2 Nonlinear Dimensionality Reduction
2.3 Basic Principles of BP Neural Networks
2.3.1 The BP Neuron
2.3.2 The BP Network
2.3.3 The Steepest Descent Method
3. Application of Dimensionality Reduction Methods in Chemical Processes
3.1 Application of Principal Component Analysis
3.1.1 Process Description
3.1.2 Notation
3.1.3 Model Assumptions
3.1.4 Variable Settings
3.1.5 Model Construction
3.1.6 Model Analysis
3.2 Application of LLE
3.2.1 Determining the Number of Neighbors and the Embedding Dimension for LLE
3.2.2 Determining the Number of Hidden Layers in the Neural Network
3.3 Comparison of Linear and Nonlinear Dimensionality Reduction
4. Conclusions and Outlook
Acknowledgments
References
Appendix