VitaLITy: Promoting Serendipitous Discovery of Academic Literature

Faculty: 
Emily Wall
Students: 
Arpit Narechania

There are a few prominent practices for conducting academic literature reviews, such as searching for specific keywords on Google Scholar or following citations from initial seed paper(s). While these approaches serve a critical purpose, it remains challenging to identify relevant literature when (1) different work uses the same terminology for different concepts (e.g., in electronics, a "transformer" is a device that transfers energy between circuits, whereas in computing, it is a type of deep learning model commonly applied to unstructured text data) or (2) similar work uses different terminology (e.g., work on "bias" in visualization seldom mentions "uncertainty" even though bias sometimes emerges when people make decisions under uncertainty).

We developed a visual analytics system, VitaLITy, to promote serendipitous discovery of academic papers, wherein users may "stumble upon" relevant literature when other search approaches fail. VitaLITy (1) uses transformer language models to help users find semantically similar papers given a list of seed paper(s) or a working abstract, (2) visualizes the embedding space in an interactive 2-D scatterplot, and (3) summarizes metadata about the paper corpus (e.g., keywords, co-authors, citation counts, and publication years).
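To make the idea of finding semantically similar papers from seed paper(s) concrete, here is a minimal sketch of similarity search over paper embeddings. It is an illustration under stated assumptions, not the VitaLITy implementation: the three-dimensional vectors are hand-written stand-ins for transformer embeddings, the paper titles are invented, averaging the seed vectors into one query is an assumed design choice, and the function names (cosine_similarity, mean_vector, similar_papers) are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def mean_vector(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def similar_papers(corpus, seed_titles, top_k=3):
    """Rank non-seed papers by cosine similarity to the averaged seed embedding.

    Assumption: `corpus` maps a paper title to its embedding vector. In a real
    system the vectors would come from a transformer language model.
    """
    query = mean_vector([corpus[title] for title in seed_titles])
    scored = [
        (cosine_similarity(query, vector), title)
        for title, vector in corpus.items()
        if title not in seed_titles
    ]
    scored.sort(reverse=True)  # highest similarity first
    return [title for _, title in scored[:top_k]]


# Toy embeddings (invented for illustration only).
toy_corpus = {
    "Bias in visual analytics": [0.9, 0.1, 0.0],
    "Decision making under uncertainty": [0.8, 0.3, 0.1],
    "Power transformer design": [0.0, 0.1, 0.9],
    "Attention-based language models": [0.1, 0.9, 0.2],
}

ranked = similar_papers(toy_corpus, ["Bias in visual analytics"], top_k=2)
print(ranked)
```

In this toy corpus, the "uncertainty" paper ranks first for the "bias" seed even though the two titles share no keywords, which is the kind of match that keyword search tends to miss.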

We also curated a comprehensive dataset comprising papers from 38 popular visualization publication venues (e.g., ACM CHI, IEEE VIS) using custom web-scrapers. We have open-sourced the VitaLITy system, dataset, and web-scrapers at https://vitality-vis.github.io/ so that the research community can grow the list of supported venues and potentially expand into other fields (e.g., biology).

Lab: 
Director: 
Alex Endert
Faculty: 
Alex Endert
Our goal is to help people make sense of data. We research and develop interactive visualizations that couple machine learning with visual interfaces of data for exploration and sensemaking.