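%T Prediction of Emerging Themes in the Rapidly Expanding COVID-19 Literature With Unsupervised Word Embeddings and Machine Learning: Evidence-Based Study
%A Pal,Ridam
%A Chopra,Harshita
%A Awasthi,Raghav
%A Bandhey,Harsh
%A Nagori,Aditya
%A Sethi,Tavpritesh
%+ Department of Computational Biology, Indraprastha Institute of Information Technology Delhi, Third Floor, New Academic Block, Okhla Industrial Estate, Phase III, New Delhi, 110020, India, 91 9779908630, tavpriteshsethi@iiitd.ac.in
%K COVID-19
%K named entity recognition
%K unsupervised word embeddings
%K machine learning
%K natural language preprocessing
%D 2022
%7 2.11.2022
%9 Original Paper
%J J Med Internet Res
%G English
%X Background: Evidence from peer-reviewed literature is the foundation for designing responses to global threats such as COVID-19. In a large and rapidly growing corpus, such as the COVID-19 publications, assimilating and synthesizing information is challenging. A robust computational pipeline that evaluates multiple aspects, such as network topological features, communities, and their temporal trends, can make this process more efficient. Objective: We aimed to show that new knowledge can be captured and tracked using temporal changes in the underlying unsupervised word embeddings of the literature. Upcoming themes can further be predicted by applying machine learning to the evolving associations between words. Methods: Frequently occurring medical entities were extracted from the abstracts of more than 150,000 COVID-19 articles in the World Health Organization database, collected monthly starting from February 2020. Word embeddings trained on each month's literature were used to construct networks of entities with cosine similarities as edge weights. Topological features of the subsequent month's network were forecast based on prior patterns, and new links were predicted using supervised machine learning. Community detection and alluvial diagrams were used to track biomedical themes that evolved over the months. Results: We found that thromboembolic complications were detected as an emerging theme as early as August 2020. A shift toward the symptoms of long COVID complications was observed during March 2021, and neurological complications gained significance in June 2021. A prospective validation of the link prediction models achieved an area under the receiver operating characteristic curve of 0.87. Predictive modeling revealed predisposing conditions, symptoms, cross-infection, and neurological complications as dominant research themes in COVID-19 publications based on the patterns observed in previous months. Conclusions: Machine learning–based prediction of emerging links can contribute toward steering research by capturing themes represented by groups of medical entities, based on patterns of semantic relationships over time.
%M 36040993
%R 10.2196/34067
%U //www.mybigtv.com/2022/11/e34067
%U https://doi.org/10.2196/34067
%U http://www.ncbi.nlm.nih.gov/pubmed/36040993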
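
The Methods summary in the abstract (monthly word embeddings, entity networks weighted by cosine similarity, newly appearing links) can be illustrated with a minimal sketch. This is not the authors' pipeline. The toy two-dimensional vectors, the 0.5 similarity threshold, and the function names `build_entity_network` and `new_links` are all assumptions made for illustration. The study predicts new links with supervised machine learning, whereas this sketch only compares the networks of two consecutive months.

```python
import math
from itertools import combinations


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def build_entity_network(embeddings, threshold=0.5):
    """Map each entity pair to its cosine similarity, keeping pairs at or above
    the threshold. The threshold value is an illustrative assumption."""
    edges = {}
    for a, b in combinations(sorted(embeddings), 2):
        weight = cosine_similarity(embeddings[a], embeddings[b])
        if weight >= threshold:
            edges[(a, b)] = weight
    return edges


def new_links(previous_edges, current_edges):
    """Entity pairs linked in the current month but not in the previous one."""
    return sorted(set(current_edges) - set(previous_edges))


# Hypothetical toy embeddings for two consecutive months (not real data).
month_1 = {"covid-19": [1.0, 0.0], "fever": [0.9, 0.1], "thrombosis": [0.0, 1.0]}
month_2 = {"covid-19": [1.0, 0.2], "fever": [0.9, 0.1], "thrombosis": [0.6, 0.8]}

network_1 = build_entity_network(month_1)
network_2 = build_entity_network(month_2)
print(new_links(network_1, network_2))
```

In this toy example, the vector for "thrombosis" moves closer to the other two entities in the second month, so its links appear as new edges. A real analysis would replace the hand-written vectors with embeddings trained on each month's abstracts.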