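%T Using a Secure, Continually Updating, Web Source Processing Pipeline to Support the Real-Time Data Synthesis and Analysis of Scientific Literature: Development and Validation Study
%A Vaghela,Uddhav
%A Rabinowicz,Simon
%A Bratsos,Paris
%A Martin,Guy
%A Fritzilas,Epameinondas
%A Markar,Sheraz
%A Purkayastha,Sanjay
%A Stringer,Karl
%A Singh,Harshdeep
%A Llewellyn,Charlie
%A Dutta,Debabrata
%A Clarke,Jonathan M
%A Howard,Matthew
%A Serban,Ovidiu
%A Kinross,James
%+ Data Science Institute, Imperial College London, William Penney Laboratory, South Kensington Campus, London, United Kingdom, o.serban@imperial.ac.uk
%K structured data synthesis
%K data science
%K critical analysis
%K web crawl data
%K pipeline
%K database
%K literature
%K research
%K COVID-19
%K infodemic
%K decision making
%K data
%K data synthesis
%K misinformation
%K infrastructure
%K methodology
%D 2021
%7 6.5.2021
%9 Original Paper
%J J Med Internet Res
%G English
%X Background: The scale and quality of the global scientific response to the COVID-19 pandemic have unquestionably saved lives. However, the COVID-19 pandemic has also triggered an unprecedented “infodemic”; the velocity and volume of data production have overwhelmed many key stakeholders, such as clinicians and policy makers, who have been unable to process structured and unstructured data for evidence-based decision making. Solutions that aim to alleviate this data synthesis–related challenge are unable to capture heterogeneous web data in real time to produce corresponding answers, and they are not based on high-quality information when responding to a free-text query. Objective: The main objective of this project is to build a generic, real-time, continuously updating curation platform that can support the data synthesis and analysis of a scientific literature framework. Our secondary objective is to validate this platform and the curation methodology for COVID-19–related medical literature by expanding the COVID-19 Open Research Dataset through the addition of new, unstructured data. Methods: To create an infrastructure that addresses our objectives, the PanSurg Collaborative at Imperial College London developed a unique data pipeline based on a web crawler extraction methodology. This data pipeline uses a novel curation methodology that adopts a human-in-the-loop approach to characterize quality, relevance, and key evidence across a range of scientific literature sources. Results: REDASA (Realtime Data Synthesis and Analysis) is now one of the world’s largest and most up-to-date sources of COVID-19–related evidence; it consists of 104,000 documents. By capturing curators’ critical appraisal methodologies through the discrete labeling and rating of information, REDASA rapidly developed a foundational, pooled, data science data set of over 1400 articles in under 2 weeks. These articles provide COVID-19–related information and represent around 10% of all papers about COVID-19. Conclusions: This data set can act as ground truth for the future implementation of a live, automated systematic review.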
The three benefits of REDASA’s design are as follows: (1) it adopts a user-friendly, human-in-the-loop methodology by embedding an efficient, user-friendly curation platform into a natural language processing search engine; (2) it provides a curated data set in the JavaScript Object Notation format for experienced academic reviewers’ critical appraisal choices and decision-making methodologies; and (3) due to the wide scope and depth of its web crawling method, REDASA has already captured one of the world’s largest COVID-19–related data corpora for searches and curation.
%M 33835932
%R 10.2196/25714
%U //www.mybigtv.com/2021/5/e25714
%U https://doi.org/10.2196/25714
%U http://www.ncbi.nlm.nih.gov/pubmed/33835932