@Article{info:doi/10.2196/34067,
author="Pal, Ridam and Chopra, Harshita and Awasthi, Raghav and Bandhey, Harsh and Nagori, Aditya and Sethi, Tavpritesh",
title="Predicting Emerging Themes in Rapidly Expanding COVID-19 Literature With Unsupervised Word Embeddings and Machine Learning: Evidence-Based Study",
journal="J Med Internet Res",
year="2022",
month="11",
day="2",
volume="24",
number="11",
pages="e34067",
keywords="COVID-19; named entity recognition; unsupervised word embeddings; machine learning",
abstract="Background: Evidence from peer-reviewed literature is the cornerstone for designing responses to global threats such as COVID-19. Assimilating and synthesizing information from massive and rapidly growing corpora, such as COVID-19 publications, is challenging. Leveraging a robust computational pipeline that evaluates multiple aspects, such as network topological features, communities, and their temporal trends, can make this process more efficient. Objective: We aimed to show that new knowledge can be captured and tracked using temporal changes in the underlying unsupervised word embeddings of the literature. Further, upcoming themes can be predicted by applying machine learning to the evolving associations between words. Methods: Frequently occurring medical entities were extracted from the abstracts of more than 150,000 COVID-19 articles published in the World Health Organization database, collected at monthly intervals starting in February 2020. Word embeddings trained on each month's literature were used to construct networks of entities with cosine similarities as edge weights. Topological features of the subsequent month's network were forecasted based on prior patterns, and new links were predicted using supervised machine learning. Community detection and alluvial diagrams were used to track biomedical themes that evolved over the months. Results: We found that thromboembolic complications were detected as an emerging theme as early as August 2020. A shift toward the symptoms of long COVID complications was observed during March 2021, and neurological complications gained significance in June 2021. A prospective validation of the link prediction models achieved an area under the receiver operating characteristic curve of 0.87. Predictive modeling revealed predisposing conditions, symptoms, cross-infection, and neurological complications as dominant research themes in COVID-19 publications based on the patterns observed in previous months. Conclusions: Machine learning--based prediction of emerging links can contribute toward steering research by capturing themes represented by groups of medical entities, based on patterns of semantic relationships over time.",
issn="1438-8871",
doi="10.2196/34067",
url="https://doi.org/10.2196/34067",
url="http://www.ncbi.nlm.nih.gov/pubmed/36040993"
}