Researchers develop question-answering dataset for evaluating NLP tools for COVID-19

April 28, 2020
by Cole McCollum
Very important

In a preprint paper, researchers from the University of Waterloo and other organizations used the COVID-19 Open Research Dataset (CORD-19) to develop a question-answering dataset for evaluating the information retrieval capabilities of NLP tools. The dataset includes 124 question-answer pairs, made up of keyword queries and natural language questions, each linked to the text in a document that contains the answer. Because the dataset is too small to train models from scratch, the best-performing models tested against the benchmark relied on transfer learning, fine-tuning NLP models that had first been pretrained without supervision. Clients should expect efforts like this one, along with the forthcoming effort from NIST, to play a key role in improving the AI tools used for COVID-19 research.
