Researchers develop question-answering dataset for evaluating NLP tools for COVID-19

April 28, 2020
by Cole McCollum
Very important

In a preprint paper, researchers from the University of Waterloo and other organizations used the COVID-19 Open Research Dataset (CORD-19) to develop a question-answering dataset for evaluating the information retrieval capabilities of NLP tools. The dataset included 124 question-answer pairs, which paired keyword searches and natural language questions with the text in a document that contains the answer. Because the dataset was too small for training models from scratch, the best-performing models tested against the benchmark used transfer learning to fine-tune unsupervised NLP models. Clients should expect efforts like this one and the forthcoming effort from NIST to play a key role in improving AI tools used for COVID-19 research.

