A Review of Data Lineage Relationship Construction Methods
Cyber Security and Data Governance (网络安全与数据治理)
Lv Lin1, Tian Qingye2, Jiao Dongdong1, Guo Jinlei1, Fang Zhiqi1, Chen Rui1
1. National Computer System Engineering Research Institute of China; 2. Intelligence Technology of CEC Co., Ltd.
Abstract: With the rapid growth of data volume, managing and using data has become a serious challenge. As a core component of data governance, data lineage plays an important role, for example in improving data quality and safeguarding data security. This paper surveys methods for constructing data lineage relationships, including the system tracking method, the SQL parsing-based method, the inverse function method, the annotation method, and the machine learning method. It analyzes the advantages, disadvantages, and application scenarios of each method and discusses future research directions, providing a reference for the application of data lineage relationships and for subsequent research.
CLC number: TP301; Document code: A; DOI: 10.19358/j.issn.2097-1788.2025.12.001
Citation format: Lv Lin, Tian Qingye, Jiao Dongdong, et al. A review of data lineage relationship construction methods[J]. Cyber Security and Data Governance, 2025, 44(12): 1-5.
Key words: data lineage; metadata; data governance; big data
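Introduction
In today's data-driven era, data has become one of an enterprise's core assets [1]. As data volumes grow exponentially, managing and using that data has become a challenge [2]. Data lineage, a core component of data governance, traces the path data takes through its full life cycle, from collection and processing to storage and consumption. In doing so it reveals how data evolves and which data depends on which, helps optimize resource allocation, and thereby improves corporate decision-making. In recent years, academia has developed a variety of methods for constructing data lineage. These methods still differ significantly, however, in aspects such as degree of automation and granularity. This paper reviews the current methods for constructing data lineage relationships, compares their advantages, disadvantages, and application scenarios, and discusses future research directions, providing a reference for the application of data lineage relationships and for subsequent research.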
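To make the idea of "dependency chains" concrete, the following is a minimal sketch that treats lineage as a directed graph and traces it in both directions. It is an illustration only, not a method taken from the paper: the `LineageGraph` class and the dataset names (`raw_orders`, `daily_sales`, and so on) are hypothetical.

```python
from collections import defaultdict, deque


class LineageGraph:
    """Toy lineage graph (hypothetical): an edge source -> target
    means that `target` is derived from `source`."""

    def __init__(self):
        self._upstream = defaultdict(set)    # dataset -> its direct sources
        self._downstream = defaultdict(set)  # dataset -> its direct dependents

    def add_edge(self, source, target):
        """Record that `target` is produced from `source`."""
        self._upstream[target].add(source)
        self._downstream[source].add(target)

    @staticmethod
    def _reach(start, neighbors):
        """Breadth-first search returning every node reachable from `start`."""
        seen, queue = set(), deque([start])
        while queue:
            node = queue.popleft()
            for nxt in neighbors.get(node, ()):
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        return seen

    def upstream(self, dataset):
        """All datasets that `dataset` depends on, directly or indirectly."""
        return self._reach(dataset, self._upstream)

    def downstream(self, dataset):
        """All datasets affected by a change to `dataset`."""
        return self._reach(dataset, self._downstream)


# Assumed example pipeline: collection -> processing -> consumption.
graph = LineageGraph()
graph.add_edge("raw_orders", "clean_orders")
graph.add_edge("clean_orders", "daily_sales")
graph.add_edge("raw_customers", "daily_sales")
graph.add_edge("daily_sales", "sales_dashboard")

print(sorted(graph.upstream("sales_dashboard")))
print(sorted(graph.downstream("raw_orders")))
```

The upstream query corresponds to tracing where a result came from (useful for data quality checks), while the downstream query corresponds to impact analysis when a source changes. Real construction methods differ mainly in how the edges are obtained, which this sketch leaves out.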
The full text of this article can be downloaded at:
http://m.ihrv.cn/resource/share/2000006893
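Author information:
Lv Lin1, Tian Qingye2, Jiao Dongdong1, Guo Jinlei1, Fang Zhiqi1, Chen Rui1
(1. National Computer System Engineering Research Institute of China, Beijing 100083, China;
2. Intelligence Technology of CEC Co., Ltd., Beijing 102200, China)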

