Application of Electronic Technique
HTTP/HTTPS protocol asset discovery based on clustering
Ma Yan1,2, Su Majing1,2, Yao Wangjun1,2, Quan Xiaowen3, Liu Hong1,2
1. China Information Security Research Institute Co., Ltd.; 2. National Computer System Engineering Research Institute of China; 3. WebRAY Tech (Beijing) Co., Ltd.
CLC number: TP393.08; Document code: A; DOI: 10.16157/j.issn.0258-7998.256341
Chinese citation format: 马琰, 苏马婧, 姚旺君, 等. 基于聚类的HTTP/HTTPS协议资产发现[J]. 电子技术应用, 2025, 51(11): 98-106.
English citation format: Ma Yan, Su Majing, Yao Wangjun, et al. HTTP/HTTPS protocol asset discovery based on clustering[J]. Application of Electronic Technique, 2025, 51(11): 98-106.
Abstract: Network probing and scanning is an essential method for discovering network assets, with HTTP/HTTPS protocols representing a significant proportion of the discovery results and serving as a key source for identifying Internet assets. As the network environment becomes increasingly complex, the variety and volume of assets utilizing the HTTP/HTTPS protocol have grown rapidly, which poses challenges for traditional network asset identification methods based on fingerprinting rules. These conventional approaches suffer from low recognition efficiency and poor adaptability, making them inadequate for identifying HTTP/HTTPS protocol assets. Therefore, this paper proposes a novel method for discovering HTTP/HTTPS protocol assets. The approach processes HTTP/HTTPS response data through an automated rule generator, performs pre-filtering of the raw data based on term frequency statistics and similarity information, and applies a text encoding model to encode the HTTP/HTTPS response body and fuse the features. By integrating an unsupervised clustering algorithm, this method enables the discovery of HTTP/HTTPS protocol assets. Experimental results show that the proposed method significantly improves the efficiency of HTTP/HTTPS protocol asset discovery, accelerates asset labeling, and enables the discovery of unknown assets without prior knowledge.
Key words: network asset discovery; HTTP/HTTPS protocols; automated rule generation; unsupervised clustering; Word2Vec; DBSCAN
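The abstract says that raw HTTP/HTTPS responses first go through an automated rule generator and a pre-filter based on term-frequency statistics. The authors also describe the generator as refining the data by hierarchical grouping. Their actual algorithm is only in the full text, so what follows is a minimal, stdlib-only Python sketch built on several assumptions:

- Each response is reduced to a dict of header-field values.
- `HEADER_FIELDS` is a hypothetical three-field stand-in for the 21 fields the authors selected.
- `min_support`, `generate_rules`, `matches`, and `prefilter` are illustrative names and parameters, not the paper's.
- The similarity-based part of the pre-filter is not modelled.

```python
from collections import defaultdict

# Hypothetical field list: the paper selects 21 response-header fields,
# which are not enumerated in this excerpt.
HEADER_FIELDS = ["server", "x-powered-by", "www-authenticate"]


def generate_rules(responses, fields=HEADER_FIELDS, min_support=2, prefix=()):
    """Hierarchically group responses by header values and emit coarse rules.

    A rule is a tuple of (field, value) pairs. A value becomes part of a rule
    only if at least `min_support` responses in the current group share it
    (a simple frequency statistic); that group is then refined on the
    remaining fields to yield finer rules.
    """
    if not fields:
        return []
    field, rest = fields[0], fields[1:]
    groups = defaultdict(list)
    for r in responses:
        groups[r.get(field)].append(r)

    rules, leftovers = [], []
    for value, members in groups.items():
        if value is not None and len(members) >= min_support:
            rule = prefix + ((field, value),)
            rules.append(rule)
            rules.extend(generate_rules(members, rest, min_support, rule))
        else:
            leftovers.extend(members)  # missing or rare value on this field
    # Responses not grouped on this field are retried on the next field.
    rules.extend(generate_rules(leftovers, rest, min_support, prefix))
    return rules


def matches(response, rule):
    """True if the response carries every (field, value) pair of the rule."""
    return all(response.get(f) == v for f, v in rule)


def prefilter(responses, rules):
    """Keep only responses covered by at least one rule (shared features)."""
    return [r for r in responses if any(matches(r, rule) for rule in rules)]


# Toy usage with made-up header values.
data = [
    {"server": "nginx", "x-powered-by": "PHP/7.4"},
    {"server": "nginx", "x-powered-by": "PHP/7.4"},
    {"server": "nginx"},
    {"server": "Boa/0.94"},
    {"server": "lighttpd", "x-powered-by": "Express"},
    {"server": "Apache", "x-powered-by": "Express"},
]
rules = generate_rules(data)
kept = prefilter(data, rules)  # drops the lone {"server": "Boa/0.94"} response
```

On this toy data the sketch yields three rules: `server=nginx`, `server=nginx` with `x-powered-by=PHP/7.4`, and `x-powered-by=Express`. Only the single Boa response is filtered out. Pooling the leftovers of each level, rather than discarding them, is a design choice of this sketch: a response without a frequent `Server` value can still be grouped by a later field.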

Introduction

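Driven by digital transformation, the variety and number of network assets are growing exponentially, and network security faces increasingly complex challenges. Network assets include not only traditional network devices (such as network cameras and firewalls) but also various content management systems and network services. At present, network asset identification relies mainly on matching static fingerprint rules. This approach performs well on known asset types, but its limitations are equally clear. First, building and maintaining fingerprint rules depends on expert experience and substantial manual effort. Second, methods built on a static fingerprint library are slow to respond to new devices, so the identification rate for unknown asset types drops markedly. These shortcomings limit the effectiveness and adaptability of current asset identification techniques based on fingerprint-rule matching.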

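To address these problems, this paper proposes a method for discovering network assets that use the HTTP/HTTPS protocol. An automated rule generator derives fingerprint rules from the HTTP/HTTPS data collected by active probing and uses them to filter that data. An unsupervised clustering method then partitions the asset data according to shared features, so that assets are discovered automatically. The method can find unknown assets and speeds up labeling.

The proposed automated rule generator follows a hierarchical grouping strategy. It refines the dataset step by step, extracts highly discriminative feature fields, and builds fingerprint rules capable of coarse classification, which are used to filter out data that shares no common asset features. Because HTTP/HTTPS response headers vary widely, the authors carried out a statistical analysis of a large-scale dataset of probing results. Drawing on expert experience, they selected 21 response header fields from which the automated filtering rules are generated, and designed the rule generator around them. For the data that survives this pre-filtering, they designed a multi-feature-fusion asset clustering algorithm for HTTP/HTTPS response body information. The algorithm encodes features with Word2Vec [1], turning the processed data into feature vectors. It then combines feature fusion with DBSCAN [2] clustering to group the data efficiently in a multi-dimensional feature space and thereby discover potential assets. Finally, experiments verify the effectiveness of the method. It not only makes HTTP/HTTPS asset discovery more efficient but also discovers unknown assets effectively, which in turn speeds up fingerprint labeling and rule extraction.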
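The method described above encodes response information with Word2Vec, fuses several features, and clusters the result with DBSCAN. Below is a minimal, dependency-free Python sketch of that final stage, written under explicit assumptions:

- To stay self-contained, it swaps Word2Vec for a simpler, different encoding: plain L2-normalised term-count vectors over a shared vocabulary.
- `fuse` is a hypothetical fusion step that concatenates a weighted title vector with a weighted body vector.
- `eps`, `min_pts`, and the toy samples are illustrative, not the paper's settings.

The DBSCAN part is the standard algorithm, here using cosine distance.

```python
import math
import re


def tokens(text):
    """Lower-case alphanumeric tokens; HTML tags reduce to their names."""
    return re.findall(r"[a-z0-9]+", text.lower())


def encode(text, vocab):
    """L2-normalised term-count vector over `vocab`.

    Stand-in for the paper's Word2Vec encoding (this is NOT Word2Vec):
    averaged word embeddings would replace this function in a real pipeline.
    """
    vec = [0.0] * len(vocab)
    for tok in tokens(text):
        if tok in vocab:
            vec[vocab[tok]] += 1.0
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else vec


def fuse(title, body, vocab, w_title=0.5):
    """Hypothetical fusion: weighted concatenation of title and body vectors."""
    t, b = encode(title, vocab), encode(body, vocab)
    return [w_title * x for x in t] + [(1 - w_title) * x for x in b]


def cosine_distance(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0 or nb == 0:
        return 1.0  # empty text: treat as maximally dissimilar
    return 1.0 - dot / (na * nb)


def dbscan(points, eps, min_pts, dist=cosine_distance):
    """Textbook DBSCAN. Returns one label per point; -1 marks noise."""
    NOISE = -1
    labels = [None] * len(points)  # None = not yet visited

    def neighbours(i):
        # Includes i itself, as in the usual DBSCAN definition.
        return [j for j in range(len(points)) if dist(points[i], points[j]) <= eps]

    cluster = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may later be relabelled as a border point
            continue
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins, but is not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nbrs = neighbours(j)
            if len(nbrs) >= min_pts:  # j is a core point: keep expanding
                queue.extend(nbrs)
        cluster += 1
    return labels


# Toy usage: (title, body) pairs, made up for illustration.
samples = [
    ("Login - IP Camera", "<html>ip camera login web viewer</html>"),
    ("Login - IP Camera", "<html>ip camera login web viewer plugin</html>"),
    ("Login - IP Camera", "<html>ip camera login web viewer</html>"),
    ("Welcome to nginx!", "<html>welcome to nginx default page</html>"),
    ("Welcome to nginx!", "<html>welcome to nginx default page</html>"),
    ("Welcome to nginx!", "<html>welcome to nginx default page test</html>"),
    ("Router admin", "<html>firmware upgrade required</html>"),
]
vocab = {tok: i for i, tok in enumerate(
    sorted({t for pair in samples for text in pair for t in tokens(text)}))}
points = [fuse(title, body, vocab) for title, body in samples]
labels = dbscan(points, eps=0.3, min_pts=2)  # → [0, 0, 0, 1, 1, 1, -1]
```

Here the camera pages and the nginx pages each form a cluster, and the lone router page is labelled noise (`-1`). No cluster needs to be named in advance, which is what lets a density-based method surface asset types that have no existing fingerprint. Raising `min_pts` to 4 turns every point into noise, because neither group has four members.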


For the full text of this article, please download:

http://m.ihrv.cn/resource/share/2000006847


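Author information:

Ma Yan1,2, Su Majing1,2, Yao Wangjun1,2, Quan Xiaowen3, Liu Hong1,2

(1. China Information Security Research Institute Co., Ltd., Beijing 102200;

2. National Computer System Engineering Research Institute of China, Beijing 100083;

3. WebRAY Tech (Beijing) Co., Ltd., Beijing 100084)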



This content is original to the AET website; reproduction without authorization is prohibited.