Application of Electronic Technique
Knowledge graph visualization fusion method for multi-source heterogeneous data
Liang Hao1, Fu Da2
1. Shenzhen Pengrui Information Technology Co., Ltd.; 2. Beijing Jingneng Energy Technology Research Co., Ltd.
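Abstract: To address redundant and conflicting data and missing associations, a knowledge graph visualization fusion method for multi-source heterogeneous data is studied, with the aim of improving the reliability of data fusion. The Web Ontology Language (OWL) is used to build domain ontology libraries and a global ontology library for the multi-source heterogeneous data, so that knowledge entity extraction and knowledge fusion are carried out within the same framework. A long short-term memory network-conditional random field (LSTM-CRF) model, constrained by the ontology libraries, extracts knowledge entities that conform to the domain definitions from the multi-source heterogeneous data. A knowledge fusion model based on hierarchical filtering then visually fuses the extracted entities, resolving redundant and inconsistent information across the sources and forming an accurate, complete, and reliable visualization fusion knowledge graph, which helps to discover potential data associations and fill in missing ones. Experimental results show that as the proportion of missing data increases, both the scaling coefficient and the attribute coverage begin to decline; their lowest values are 0.86 and 0.87 respectively, both significantly above the corresponding thresholds. When processing four data sources, the proposed method achieves a visual clarity of 93%~97% and an information fusion degree of 92%~96%, both better than the comparison methods. This indicates that the method can effectively extract knowledge entities from multi-source heterogeneous data, build a knowledge graph, and achieve visualization fusion of multi-source heterogeneous data; under different proportions of missing data, the scaling coefficient and attribute coverage of the fusion remain high, that is, the visualization fusion performs well; the method also improves the data visualization effect and the degree of information integration.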
CLC number: TP391  Document code: A  DOI: 10.16157/j.issn.0258-7998.245966
Chinese citation format: 梁浩,付达. 面向多源异构数据的知识图谱可视化融合方法[J]. 电子技术应用,2025,51(6):47-53.
English citation format: Liang Hao, Fu Da. Knowledge graph visualization fusion method for heterogeneous data from multiple sources[J]. Application of Electronic Technique, 2025, 51(6): 47-53.
Knowledge graph visualization fusion method for heterogeneous data from multiple sources
Liang Hao1, Fu Da2
1. Plant Resource Technology Co., Ltd.; 2. Beijing Jingneng Energy Technology Research Co., Ltd.
Abstract: To solve the problems of redundant and conflicting data and missing associations, a knowledge graph visualization fusion method for multi-source heterogeneous data is studied to improve the reliability of data fusion. Domain ontology libraries and a global ontology library for the multi-source heterogeneous data are established using the Web Ontology Language (OWL), so that knowledge entity extraction and knowledge fusion are carried out under the same framework. Using a Long Short-Term Memory network (LSTM) and Conditional Random Field (CRF) model, knowledge entities conforming to the domain definition are extracted from the multi-source heterogeneous data under the constraint of the ontology library. A knowledge fusion model based on hierarchical filtering is used to visually fuse the extracted knowledge entities, resolve redundant and inconsistent information in the multi-source heterogeneous data, and form an accurate, complete, and reliable visualization fusion knowledge graph, which helps to find potential data associations and fill in missing ones. The experimental results show that as the proportion of missing data increases, the scaling coefficient and attribute coverage begin to decrease; their lowest values are 0.86 and 0.87 respectively, both significantly higher than the corresponding thresholds. When dealing with four data sources, the proposed method achieves a visual clarity of 93%~97% and an information fusion degree of 92%~96%, both better than the comparison methods. This shows that the method can effectively extract the knowledge entities of multi-source heterogeneous data, establish the knowledge graph, and realize the visualization fusion of multi-source
Key words: multi-source heterogeneous data; knowledge graph; visualization fusion; ontology library; long short-term memory network; conditional random field

Introduction

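In practical applications, data often come from many different sources and are heterogeneous, diverse, and complex, which poses major challenges for data processing, analysis, and application[1]. Multi-source heterogeneous data fusion methods have emerged in response; they aim to integrate and present data from different sources, in different formats and with different structures, so as to give users intuitive, comprehensive, and in-depth insight into the data[2].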

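Multi-source heterogeneous data fusion not only helps to break down data silos and interconnect data[3], but can also significantly improve the efficiency and accuracy of data processing, providing strong data support for decision support, scientific research, industrial innovation, and other fields. For example, Mo Huiling et al. performed data fusion within a federated learning framework: each participant extracts data features using tensor Tucker decomposition; a central server collects and aggregates the model parameters from the participants to form a global model; and the global model is optimized over multiple iterations to complete the fusion[4]. Heterogeneous data, however, contain redundant or conflicting information, and Tucker decomposition and the federated learning framework cannot fully avoid its influence, which degrades the fusion result. Wang Shu et al. used information entropy to assess the relative importance of each evidence source, computed divergence to obtain evidence credibility and refine the evidence, derived the amount of differential information, determined the final weight of each data source, and then fused the data[5]. The information entropy approach focuses mainly on assessing the amount of information and has no direct means of identifying redundancy between data, so redundant data are retained during fusion, increasing processing complexity and computational cost. Kuang Guangsheng et al. used a graph clustering algorithm to identify similarity in the data and then fused similar data items[6]. Graph clustering relies mainly on similarity relations between data; when associations are missing from the dataset, the algorithm cannot accurately assign the affected items to the same cluster, so the fusion result does not fully reflect the true relationships in the data. Gong et al. proposed a multi-granularity, visually guided, entity-level fusion method for named entity recognition based on a multimodal heterogeneous graph, which builds a comprehensive multimodal representation by integrating cross-modal semantic interaction between text and vision at different visual granularities[7]. It uses a multimodal heterogeneous graph to describe precisely the semantic relations between entity-level words and visual objects, and a heterogeneous graph attention network to achieve fine-grained cross-modal semantic interaction, which significantly improves recognition accuracy; its implementation, however, is relatively complex, which may limit efficiency in application.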

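Data redundancy and conflict are common problems in multi-source data fusion. Through steps such as deduplication and error correction, together with the construction of a relation network, a knowledge graph can reduce redundancy and conflict and improve the accuracy and reliability of data fusion. By building a relation network among entities, a knowledge graph can also uncover potential associations between data and thereby fill in missing associations. For these reasons, this paper studies a knowledge graph visualization fusion method for multi-source heterogeneous data, which makes full use of the available data resources, avoids wasting data, and improves data utilization.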
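
The deduplication and conflict-detection role described above can be illustrated with a minimal sketch. This is not the paper's hierarchical filtering model, whose details are not given in this excerpt; the function names `normalize` and `fuse_triples`, the normalization rule, and the sample records are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def normalize(text: str) -> str:
    """Hypothetical normalization rule: trim, lowercase, collapse whitespace."""
    return " ".join(text.strip().lower().split())


def fuse_triples(sources):
    """Merge (entity, attribute, value) records from several sources.

    Returns (graph, conflicts):
      graph     maps entity -> attribute -> set of distinct values
      conflicts lists (entity, attribute, sorted values) where sources disagree
    """
    graph = defaultdict(lambda: defaultdict(set))
    for records in sources.values():
        for entity, attribute, value in records:
            # Records that differ only in case or spacing collapse into one entry.
            graph[normalize(entity)][normalize(attribute)].add(normalize(value))
    conflicts = [
        (entity, attribute, sorted(values))
        for entity, attributes in graph.items()
        for attribute, values in attributes.items()
        if len(values) > 1
    ]
    return graph, conflicts


# Made-up records from two hypothetical sources (assumption, not from the paper).
sources = {
    "source_a": [("Pump 1", "status", "running"), ("Pump 1", "location", "Hall A")],
    "source_b": [("pump  1", "Status", "Running"), ("Pump 1", "location", "Hall B")],
}
graph, conflicts = fuse_triples(sources)
```

In this sketch the two `status` records collapse into a single value (redundancy removed), while the two differing `location` values are reported in `conflicts` for a later resolution step. A real system would replace the string normalization with ontology-guided entity alignment.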


For the full text of this article, please download:

http://m.ihrv.cn/resource/share/2000006561


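Author information:

Liang Hao1, Fu Da2

(1. Shenzhen Pengrui Information Technology Co., Ltd., Shenzhen, Guangdong 518055;

2. Beijing Jingneng Energy Technology Research Co., Ltd., Beijing 100020)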


