CAMBRIDGE — Researchers from MIT and the MIT-IBM Computing Research Lab developed and released an open-source dataset called ChartNet to improve vision-language models’ ability to interpret charts. The dataset includes more than a million varied chart images and was used to train a series of open-source models that outperformed larger commercial systems on tasks like data extraction and chart summarization.
ChartNet provides each chart image with accompanying materials: the code used to generate it, a textual description, a table of numerical information, and question-and-answer pairs. The researchers built the dataset using a two-step synthetic data generation pipeline that first translates chart images into code and then augments that code to modify attributes such as chart type, data values, topic, and colors.
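To make the augmentation step concrete, the following is a minimal Python sketch of the idea, not the researchers' actual pipeline. It assumes a seed chart has already been translated into a small data table, then varies the chart type, color, and data values to emit new plotting code, each bundled with a description, a table, and a question-and-answer pair. The names `ChartRecord`, `make_record`, and `augment` are hypothetical and chosen only for illustration. The first step of the pipeline, translating a chart image into code, is not covered here.

```python
import itertools
import random
from dataclasses import dataclass, field

# Hypothetical record layout mirroring the four materials named in the article.
@dataclass
class ChartRecord:
    code: str                        # plotting code that would render the chart
    description: str                 # short textual description
    table: list[tuple[str, float]]   # (label, value) rows
    qa_pairs: list[tuple[str, str]] = field(default_factory=list)

# Assumed augmentation axes: chart name -> matplotlib Axes method, plus colors.
CHART_METHODS = {"bar": "bar", "line": "plot", "scatter": "scatter"}
COLORS = ["tab:blue", "tab:orange", "tab:green"]

def make_record(chart_name, color, topic, table):
    """Build one record: generated plotting code plus its annotations."""
    labels = [label for label, _ in table]
    values = [value for _, value in table]
    method = CHART_METHODS[chart_name]
    code = (
        "import matplotlib.pyplot as plt\n"
        "fig, ax = plt.subplots()\n"
        f"ax.{method}({labels!r}, {values!r}, color={color!r})\n"
        f"ax.set_title({topic!r})\n"
    )
    top_label, _ = max(table, key=lambda row: row[1])
    description = f"A {chart_name} chart of {topic} with {len(table)} categories."
    qa_pairs = [("Which category has the highest value?", top_label)]
    return ChartRecord(code, description, table, qa_pairs)

def augment(seed_table, topic, n_value_variants=2, rng_seed=0):
    """Vary chart type, color, and data values around a single seed table."""
    rng = random.Random(rng_seed)
    records = []
    for chart_name, color in itertools.product(CHART_METHODS, COLORS):
        for _ in range(n_value_variants):
            # Rescale each seed value by a random factor to get new data.
            table = [
                (label, round(value * rng.uniform(0.5, 1.5), 2))
                for label, value in seed_table
            ]
            records.append(make_record(chart_name, color, topic, table))
    return records

seed = [("Q1", 10.0), ("Q2", 14.0), ("Q3", 9.0)]
variants = augment(seed, "quarterly revenue")
print(len(variants))  # 3 chart types x 3 colors x 2 value variants = 18
```

Even this toy version turns one seed into 18 annotated variants. Adding further axes, such as topic or more value variants, multiplies the count, which is consistent with the article's description of producing hundreds of augmentations per seed.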
“We can start from a single chart that we use as a seed and come up with hundreds of augmentations of it. This is how we were able to build a dataset with more than a million diverse images,” said Jovana Kondic, an MIT electrical engineering and computer science graduate student. She is the lead author of a paper on ChartNet, which will be presented at the IEEE Computer Vision and Pattern Recognition Conference.
Kondic added, “We developed ChartNet to be a one-stop shop for chart understanding, covering basically anything that an AI model and a practitioner who is training that model might need. We hope our work motivates researchers to achieve state-of-the-art performance with smaller models that don’t require infinite amounts of computation.” She also noted, “A vision-language model, unlike our brains, may need to see thousands of examples during training to reliably recognize something as a line chart.”
Dhiraj Joshi, a senior scientist at IBM Research, said in a podcast interview, “The finance industry thrives on charts. If vision-language models can extract information out of charts, like descriptions of trends, that facilitates a lot of workflows that happen downstream.”
The release of ChartNet addresses a major bottleneck in the field: the lack of high-quality training data for teaching AI systems to interpret visual data accurately. The open-source dataset is designed to support improvements in business trend analysis and scientific figure interpretation. Kondic's co-authors on the paper include Pengyuan Li, Dhiraj Joshi, Isaac Sanchez, Aude Oliva, and Rogerio Feris, who are affiliated with MIT, the MIT-IBM Computing Research Lab, and IBM Research.