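%0 Journal Article
%@ 2564-1891
%I JMIR Publications
%V 2
%N 2
%P e38839
%T Data Exploration and Classification of News Article Reliability: Deep Learning Study
%A Zhan,Kevin
%A Li,Yutong
%A Osmani,Rafay
%A Wang,Xiaoyu
%A Cao,Bo
%+ Department of Psychiatry, University of Alberta, 4-142 KATZ Group Centre for Pharmacy and Health Research, 87 Avenue and 114 Street, Edmonton, AB, T6G 2E1, Canada, 1 403 926 6628, yutong5@ualberta.ca
%K COVID-19
%K deep learning
%K news article reliability
%K false information
%K infodemic
%K ensemble model
%D 2022
%7 22.9.2022
%9 Original Paper
%J JMIR Infodemiology
%G English
%X Background: During the COVID-19 pandemic, we are exposed to large amounts of information each day. The World Health Organization defines this "infodemic" as the mass spread of misleading or false information during a pandemic. The spread of misinformation during the infodemic ultimately leads to misunderstandings of public health orders or direct opposition to public policies. Although there have been efforts to combat the spread of misinformation, current manual fact-checking methods are insufficient to combat the infodemic. Objective: We propose the use of natural language processing (NLP) and machine learning (ML) techniques to build a model that can be used to identify unreliable news articles online. Methods: First, we preprocessed the ReCOVery data set to obtain 2029 English news articles tagged with COVID-19 keywords from January to May 2020, each labeled as reliable or unreliable. Data exploration was conducted to determine the major differences between reliable and unreliable articles. We built an ensemble deep learning model that uses the body text, as well as features such as sentiment, Empath-derived lexical categories, and readability, to classify reliability. Results: We found that reliable news articles have a higher proportion of neutral sentiment, while unreliable articles have a higher proportion of negative sentiment. Additionally, our analysis demonstrated that reliable articles are easier to read than unreliable articles, in addition to having different lexical categories and keywords. Our new model was evaluated to achieve the following performance metrics: 0.906 area under the curve (AUC), 0.835 specificity, and 0.945 sensitivity. These values are above the baseline performance of the original ReCOVery model. Conclusions: This paper identified novel differences between reliable and unreliable news articles; moreover, the model was trained using state-of-the-art deep learning techniques. We aim to be able to use our findings to help researchers and the public audience more easily identify false information and unreliable media in their everyday lives.
%M 36193330
%R 10.2196/38839
%U https://infodemiology.www.mybigtv.com/2022/2/e38839
%U https://doi.org/10.2196/38839
%U http://www.ncbi.nlm.nih.gov/pubmed/36193330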