OneChart

Purify the Chart Structural Extraction via One Auxiliary Token

MEGVII Technology
*Equal Contribution

Introduction

Chart parsing poses a significant challenge due to the diversity of styles, values, texts, and so forth. Even advanced large vision-language models (LVLMs) with billions of parameters struggle to handle such tasks satisfactorily. To address this, we propose OneChart, a reliable tool specifically designed for the structural extraction of charts. It captures essential components such as chart titles, sources, and aligned numerical data, and outputs them in Python-dict format, which can effectively facilitate downstream chart reasoning tasks.
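To make the output format concrete, the sketch below parses a Python-dict string of the kind described above and uses it to answer a simple downstream question. The key names (`title`, `source`, `values`) and the numbers are made up for illustration and are not OneChart's documented schema; `parse_chart_dict` is a hypothetical helper. `ast.literal_eval` is used because it evaluates only Python literals, so a malformed string cannot execute code.

```python
import ast

# Hypothetical model output: key names and values are illustrative
# assumptions, not OneChart's documented output schema.
raw_output = (
    "{'title': 'Quarterly revenue', 'source': 'Company report', "
    "'values': {'Q1': 12.5, 'Q2': 14.0, 'Q3': 13.2, 'Q4': 15.8}}"
)


def parse_chart_dict(text: str) -> dict:
    """Safely turn a Python-dict string into a dict (literals only, no code execution)."""
    parsed = ast.literal_eval(text)
    if not isinstance(parsed, dict):
        raise ValueError("expected a Python dict literal")
    return parsed


chart = parse_chart_dict(raw_output)
values = chart["values"]

# Downstream reasoning becomes a plain dictionary operation.
best_quarter = max(values, key=values.get)
print(chart["title"], best_quarter, values[best_quarter])
```

Once the chart is a dictionary, questions such as "which quarter is highest?" no longer require reading values off the image.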

We train a specialized chart encoder using a large amount of synthetic chart data in both English and Chinese. To enhance the numerical parts of the textual output, we introduce an auxiliary token along with an additional decoder. Through causal attention, the auxiliary token allows subsequent text tokens to capture enhanced numerical features. Furthermore, with the aid of this auxiliary token, we devise a reliable check mechanism during inference that provides a self-consistency distance for the generated content.
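One plausible form of the self-consistency check is sketched below, assuming the numbers parsed from the generated text and the numbers predicted by the auxiliary token's extra decoder are available as aligned lists. The mean relative difference, the `1e-6` floor, the `0.1` threshold, and the function names are all assumptions for illustration, not the exact distance used by OneChart.

```python
import math


def self_consistency_distance(text_values, aux_values):
    """Mean relative difference between two aligned lists of numbers.

    text_values: numbers parsed from the generated text.
    aux_values:  numbers predicted by the auxiliary token's decoder.
    (This particular distance is an illustrative assumption.)
    """
    if not text_values or len(text_values) != len(aux_values):
        return math.inf  # empty or misaligned predictions are treated as unreliable
    total = 0.0
    for t, a in zip(text_values, aux_values):
        scale = max(abs(t), abs(a), 1e-6)  # floor avoids division by zero
        total += abs(t - a) / scale
    return total / len(text_values)


def is_reliable(text_values, aux_values, threshold=0.1):
    """Flag the output as reliable when the two predictions agree closely.

    The default threshold of 0.1 is a hypothetical choice.
    """
    return self_consistency_distance(text_values, aux_values) <= threshold


print(is_reliable([12.5, 14.0, 13.2], [12.4, 14.1, 13.0]))  # close agreement
print(is_reliable([12.5, 40.0], [12.4, 14.1]))              # second value disagrees
```

A large distance means the text decoder and the auxiliary decoder disagree, which is a signal to treat the extracted numbers with caution.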

Moreover, we present a large-scale chart-to-dict benchmark. Its charts span a broad spectrum of topics and types and include content in both English and Chinese. Experiments reveal that OneChart achieves SOTA performance in structural extraction despite having only 0.2B parameters. On charts without numerical annotations, it improves Average Precision (AP) by 19.1% to 29.4% over the next-best methods. As a chart parsing agent, it also brings accuracy gains of 11.2% for LLaVA-1.6 and 32.6% for LLaVA-1.5 on the downstream ChartQA benchmark.

Explorer

Explore OneChart through the demo.
(For bar/line charts, the open-source version supports only three or fewer legends.)

ChartY Benchmark

Traditional chart QA benchmarks often limit their scope to querying small, isolated pieces of information from charts, such as individual values, which may not effectively gauge a model's ability to extract and understand the full spectrum of data presented in a chart. In contrast, we aim to establish a benchmark centered on the Structural Extraction (SE) task, which directly assesses a model's accuracy in converting chart images into structured Python-dict representations. It consists of five parts: one Chinese part, ChartY-zh (2,048 samples), and four English parts, ChartQA-SE (1,509 samples), PlotQA-SE (33,657 samples), ChartX-SE (2,360 samples), and ChartY-en (4,000 samples).
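The per-language and overall totals implied by these counts can be checked with a short tally; the dictionary below only restates the per-part sample numbers given above, and `totals_by_language` is a hypothetical helper.

```python
# Per-part sample counts for ChartY, as listed above: (language, count).
CHARTY_PARTS = {
    "ChartY-zh": ("zh", 2_048),
    "ChartQA-SE": ("en", 1_509),
    "PlotQA-SE": ("en", 33_657),
    "ChartX-SE": ("en", 2_360),
    "ChartY-en": ("en", 4_000),
}


def totals_by_language(parts):
    """Sum sample counts per language code."""
    totals = {}
    for lang, count in parts.values():
        totals[lang] = totals.get(lang, 0) + count
    return totals


by_lang = totals_by_language(CHARTY_PARTS)
print(by_lang)                # {'zh': 2048, 'en': 41526}
print(sum(by_lang.values()))  # 43574
```

The English parts therefore make up 41,526 of the 43,574 samples, with PlotQA-SE alone contributing 33,657.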

Examples

Examples from the five parts of ChartY.

Leaderboard

Scores on the different parts and subsets of ChartY.

Average Precision (AP) is evaluated using SCRM (Structuring Chart-oriented Representation Metric) and reported under different levels of tolerance.
P: some charts have numerical values labeled on the image; N: no charts have numerical values labeled on the image; Y: all charts have numerical values labeled on the image.
CQA: ChartQA-SE, PLT: PlotQA-SE, XB: ChartX-SE-bar, XBN: ChartX-SE-barnum, XL: ChartX-SE-line, XLN: ChartX-SE-linenum, XP: ChartX-SE-pie,
ENBL: ChartY-en-barline, ENP: ChartY-en-pie, ZHBL: ChartY-zh-barline, ZHP: ChartY-zh-pie.
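To illustrate why AP depends on the tolerance level, here is a deliberately simplified precision metric: a predicted value counts as correct if its key exactly matches a ground-truth key and its relative error is within `rel_tol`, and per-chart precision is averaged over charts. This is not SCRM; SCRM's actual matching rules and tolerance levels are not reproduced here, and the function names and the (series, label) key format are hypothetical.

```python
def values_match(pred, gold, rel_tol):
    """True if pred is within a relative tolerance of gold."""
    if gold == 0:
        return abs(pred) <= rel_tol  # fall back to absolute error when gold is zero
    return abs(pred - gold) / abs(gold) <= rel_tol


def chart_precision(pred, gold, rel_tol):
    """Fraction of predicted entries that match the ground truth.

    pred, gold: dicts mapping a hypothetical (series, label) key to a number.
    """
    if not pred:
        return 0.0
    hits = sum(
        1
        for key, value in pred.items()
        if key in gold and values_match(value, gold[key], rel_tol)
    )
    return hits / len(pred)


def average_precision(pairs, rel_tol):
    """Mean per-chart precision over a list of (pred, gold) pairs."""
    if not pairs:
        return 0.0
    return sum(chart_precision(p, g, rel_tol) for p, g in pairs) / len(pairs)


gold = {("revenue", "Q1"): 10.0, ("revenue", "Q2"): 20.0}
pred = {("revenue", "Q1"): 10.4, ("revenue", "Q2"): 20.0}
print(average_precision([(pred, gold)], rel_tol=0.0))   # strict: Q1 (4% off) fails
print(average_precision([(pred, gold)], rel_tol=0.05))  # looser: both entries match
```

Loosening the tolerance can only keep or raise the score, which is why a leaderboard reports AP at several levels rather than a single number.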

BibTeX

@misc{chen2024onechart,
      title={OneChart: Purify the Chart Structural Extraction via One Auxiliary Token}, 
      author={Jinyue Chen and Lingyu Kong and Haoran Wei and Chenglong Liu and Zheng Ge and Liang Zhao and Jianjian Sun and Chunrui Han and Xiangyu Zhang},
      year={2024},
      eprint={2404.09987},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}