Research on construction technology of intelligent question-answering system based on knowledge bases of data center

Cyber Security and Data Governance

Wang Liang, Zhang Qiang, Wei Yunxiao

The 91001 Unit of the Chinese People's Liberation Army

Abstract: With the rapid development of information technology, data centers have accumulated massive amounts of data. Extracting valuable information from this data efficiently has become a key constraint on the military effectiveness of data centers. Intelligent question-answering systems, which understand users' natural-language questions and return accurate answers, have broad application prospects in the data center field. This paper proposes a construction technology for intelligent question-answering systems based on data center knowledge bases. First, focusing on users' diverse data needs in different application scenarios within the military field, a data retrieval demand model is established. Second, for data retrieval demand models of different dimensions, a diverse knowledge retrieval correlation algorithm is proposed: parameters such as the correlation degree and recommendation degree between a retrieval demand and the candidate datasets are defined, and the matching degree between each candidate dataset and the retrieval demand is computed from these parameters to find the most relevant candidate datasets. Then, the selected candidate datasets are passed to the large language model together with the user's retrieval demand, and a high-quality answer is generated through a retrieval-augmented generation algorithm. Finally, experiments show that the technique performs well in answering data center-related questions, which helps improve the comprehensive utilization and service quality of multi-source knowledge bases in data centers.

Key words: data center; multi-source knowledge bases; large language model; intelligent question-answering

CLC number: TP181
Document code: A
DOI: 10.19358/j.issn.2097-1788.2026.04.009

Citation: Wang Liang, Zhang Qiang, Wei Yunxiao. Research on construction technology of intelligent question-answering system based on knowledge bases of data center[J]. Cyber Security and Data Governance, 2026, 45(4): 68-73.

Introduction

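As the core carrier for storing and processing data and information resources, the data center hosts multi-source heterogeneous knowledge bases from every business domain and has accumulated massive data and information resources. In business application scenarios, information is still obtained from these knowledge bases mainly by manually searching documents or databases, which creates a significant efficiency bottleneck and cannot meet the practical demand for efficient information access. In recent years, large language model (LLM) technology has achieved breakthroughs in natural language processing (NLP) [1-3]. With their large parameter scale and strong semantic learning capability, LLMs can perform deep semantic understanding and knowledge mining over massive text data, and they perform well across natural language tasks. How to use LLM-related technology to improve the retrieval effectiveness of the data center's multi-source heterogeneous knowledge bases, and to make it more convenient and practical for different user groups to obtain information from massive data, has become a key problem to be solved in data center service support. Intelligent question-answering systems offer an effective solution to this problem [4-5]: through natural-language interaction they accurately understand the user's query intent and quickly return accurate answers, significantly improving the efficiency of information access.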

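Many researchers have studied intelligent data retrieval and recommendation extensively and have achieved a number of results. Chen Xiaoyun et al. [6] proposed an incremental association rule mining method that fuses Apriori with IOGA, effectively improving the answer accuracy of an AI data-query chatbot. Zhang Chao et al. [7] addressed the retrieval efficiency of unstructured data with a multimodal retrieval method based on intelligent semantic recognition, achieving precise retrieval. Yuan Fengyuan et al. [8] proposed the FFSREGNN method, which fuses features through a graph neural network and an attention mechanism to generate effective semantic representations. Yang Yunqiang [9] built an intelligent question-answering system on a knowledge graph, addressing the difficulties of knowledge acquisition and semantic understanding. Chaomurilige et al. [10] proposed the MMSAF model, which optimizes recommendation through high-order semantic enhancement and adaptive modality fusion. Xu Huihui [11] built an algorithmic framework on the BERT model and verified its value for commonsense question answering. Li Junyan et al. [12] used an LLM to integrate multimodal remote sensing information into a local knowledge base, adopted a hybrid retrieval-augmented generation algorithm combined with semantic mapping from a lightweight Embedding model to form a retrieval-reasoning pipeline, and built an intelligent question-answering system for efficient management and intelligent application of remote sensing information. Dong Yongtao et al. [13] used large language models and retrieval-augmented generation to build an intelligent question-answering system for equipment faults, improving fault diagnosis efficiency.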

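Military data centers mainly carry business data for the military domain. This data is highly classified and highly specialized, with complex associations and diverse forms, so conventional LLM retrieval techniques lack domain adaptability: they parse domain-specific terminology with insufficient precision, and they struggle to meet the core requirements of local data storage and protection of sensitive classified information. To address this, and targeting the massive local multi-source heterogeneous knowledge resources of the data center, this paper proposes a knowledge-base-driven scheme for constructing intelligent question answering in the military domain. First, for the differing data query needs of users across military business scenarios, a standardized representation model of data retrieval demands is built. On this basis, a multi-dimensional knowledge association retrieval mechanism is designed; by quantifying indicators such as the degree of association and the adaptation weight between a retrieval demand and the candidate datasets, it matches and filters highly relevant candidate data. The filtered knowledge data and the user's original query are then fed together into the large language model, which, combined with a retrieval-augmented generation strategy, produces accurate and reliable answers. System experiments were carried out in a local area network environment under strict security controls on sensitive classified data. The results show that the proposed method performs comparatively well in question answering and helps improve the integrated utilization efficiency and intelligent service level of multi-source heterogeneous knowledge bases.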
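The retrieve-then-generate flow described above can be illustrated with a minimal Python sketch. It is not the authors' algorithm: this page does not give the paper's formulas, so everything here is an assumption made for illustration. The association degree is stood in for by Jaccard overlap of keywords, the recommendation degree is a pre-assigned value in [0, 1], the matching degree is a weighted sum with hypothetical weights `w_assoc` and `w_rec`, and the names `Candidate`, `matching_degree`, `top_k`, and `build_prompt` are invented. The sketch stops at assembling the prompt and does not call any language model.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Candidate:
    """A candidate dataset in the knowledge base (hypothetical structure)."""
    name: str
    keywords: set[str]
    recommendation: float  # assumed prior score in [0, 1]
    text: str


def association(query_terms: set[str], cand: Candidate) -> float:
    """Jaccard overlap of query terms and dataset keywords (assumed stand-in)."""
    union = query_terms | cand.keywords
    return len(query_terms & cand.keywords) / len(union) if union else 0.0


def matching_degree(query_terms: set[str], cand: Candidate,
                    w_assoc: float = 0.7, w_rec: float = 0.3) -> float:
    """Weighted sum of association and recommendation; weights are hypothetical."""
    return w_assoc * association(query_terms, cand) + w_rec * cand.recommendation


def top_k(query_terms: set[str], candidates: list[Candidate], k: int = 2) -> list[Candidate]:
    """Return the k candidates with the highest matching degree."""
    ranked = sorted(candidates,
                    key=lambda c: matching_degree(query_terms, c),
                    reverse=True)
    return ranked[:k]


def build_prompt(question: str, selected: list[Candidate]) -> str:
    """Combine the selected datasets and the user's question into one prompt."""
    context = "\n".join(f"[{i}] {c.text}" for i, c in enumerate(selected, 1))
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )
```

In a local deployment of the kind the text describes, the string returned by `build_prompt` would be passed to a locally hosted model. Which model and interface to use is outside what this page specifies.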

The full text of this article can be downloaded at:

http://m.ihrv.cn/resource/share/2000007061
