Chat BCG: Can AI Read Your Slide Deck?
Abstract
Multimodal models like GPT4o and Gemini Flash are exceptional at inference and summarization tasks, which approach human-level in performance. However, we find that these models underperform compared to humans when asked to do very specific ’reading and estimation’ tasks, particularly in the context of visual charts in business decks. This paper evaluates the accuracy of GPT 4o and Gemini Flash-1.5 in answering straightforward questions about data on labeled charts (where data is clearly annotated on the graphs), and unlabeled charts (where data is not clearly annotated and has to be inferred from the X and Y axis).
On labeled charts, we found that both GPT 4o and Gemini Flash are consistently inaccurate on specific types of charts (multiple figures in one chart, stacked charts, waterfall charts). Human error rates are estimated to be under 5%, while GPT-4o and Gemini Flash have error rates of 16% and 14% 111Measured as % of data points perfectly matching source of truth value., respectively. Both LLMs often make similar mistakes, such as consistently misreading a ’3’ as an ’8’ or mislabelling a negative number as positive. Across charts, neither model consistently outperforms the other.
Similiarly, on unlabeled charts, both GPT and Gemini have similiar failure points. The models were tasked with estimating a number from an unlabeled chart by reading the X and Y axis. As expected, perfectly matched answers were low, with error rates as high as 79% for Gemini and 83% for GPT 4o. Here, we measured the magnitude of the error for each model, compared to a (human read) source of truth value. The errors were substantial, with average deviations of 53% for Gemini Flash-1.5 and 55% for GPT-4o 222Measured as mean absolute percentage difference (see methodology for formula details)., compared to a human margin of error estimated at 10-20%. Driving the deviations are consistently small errors in estimation, but also bigger deviations driven by models misreading labels/numbers. For instance, GPT 4o in one instance misread 2015 as 2009 and estimated the number for the wrong year.
We conclude that these models aren’t currently capable of reading a deck accurately end-to-end if it contains any complex or unlabeled charts. Even if a user created a deck of only labeled charts, the model would only be able to read 7-8 out of 15 labeled charts perfectly end-to-end.
1 Introduction
With the advanced vision capabilities of GPT-4o and Gemini Flash, an important question arises regarding the accuracy of these functionalities in practical business applications. Our assumption was that multimodal models are good at reading and summarizing charts. When given an image of a slide deck, they do a good job of summarizing key insights from it, often including relevant data points.
Existing research into this question has evaluated the efficacy of LLM’s when parsing tables [3], concluding that the LLMs were highly sensitive to input prompts which drive performance. Other works also evaluate LLMs ability to reason and read mathematical graphs [2] and find that GPT models outperform alternatives.
This paper aims to explore whether multimodal models perform well on a variant of this skill - answering straightforward questions that require the models to pick out a number from a slide deck. We test this by asking the models questions about data directly printed on charts (if labeled) or asking the model to estimate data points from the chart (if unlabeled). We ensure that the model does not need to perform any mathematical calculations. We then measure the accuracy of responses across different types of charts to answer the following specific questions:
-
•
How accurately can multimodal models with advanced vision capabilities read data from labeled charts? Is there a consistent accuracy advantage for one model over the other?
-
•
How accurately can the models estimate numerical data from unlabeled charts? On average, how ’incorrect’ is their estimation?
In the next section, we dive into error rates across different chart types, broken into two categories - labeled and unlabeled.
2 How accurate are multimodal models in reading charts?
2.1 Methodology
We sampled 31 charts and classified them into 2 broad categories:
-
•
Labeled (15 charts): This category includes simple bar charts, line charts, and multi-bar charts where each data point is explicitly printed on the chart. We also included some more complex labeled charts, such as waterfall charts, stacked charts, bubble charts, and connected bar charts.
-
•
Unlabeled (16 charts): This category is primarily focused on charts that do not have explicitly printed data points and require some estimation of data by ’reading the position’ relative to the X and Y axis. It does not include charts that do not have any scale at all. The dataset includes simpler unlabeled charts such as bar charts, line charts, mixed charts, and bubble charts, as well as some more complex unlabeled charts such as moon charts and dot charts.
For each chart, we created a dataset of questions. These questions are restricted to 3 types:
-
•
Identify a specific data point
-
•
Identify the largest/lowest data point
-
•
Count the number of data points
The goal of these questions is to test the ability to read and interpret data directly from the chart, without requiring any intensive computation.
We then evaluate the responses of both the models on two metrics:
-
•
Match Rate %: This metric is used for labeled charts where the model is reading data printed on the chart, but reported for both. It was calculated as -
-
•
Mean Absolute Error (MAE): This metric is reported for unlabeled charts where the model is ’estimating’ the number from the graph. It measures the average magnitude of the errors in a set of predictions, without considering their direction. This was calculated as -
where:
-
–
is the number of data points,
-
–
is the actual value for the -th data point,
-
–
is the predicted value for the -th data point.
-
–
-
•
Mean Absolute Percentage Error (MAPE): This metric is reported for unlabeled charts where the model is ’estimating’ the number from the graph. It can be read as how divergent the model’s answer was from the correct answer, normalized to a percentage. This was calculated as -
where:
-
–
is the number of data points,
-
–
is the actual value for the -th data point,
-
–
is the predicted value for the -th data point.
-
–
These metrics are then calculated at an aggregated level across all the charts and reported for labeled and unlabeled datasets separately.
2.1.1 Methodology Walk-Through for Labeled Chart
For instance, in the labeled chart in Figure 1 (below), we generated questions and measured the match % as given in Table 1
Question | Correct Answer | GPT Answer | Gemini Answer |
---|---|---|---|
what % of the employee workforce is white in FY23? | 55 | 55 | 55 |
what % of the employee workforce is female in FY21? | 49 | 49 | 51 |
what % of the employee workforce is ’two or more races’ in FY22? | 3 | 3 | 8 |
what % of the employee workforce elected not to provide their race in FY21? | 4 | 4 | 4 |
what % of the employee workforce is Asian in FY23? | 22 | 22 | 7 |
Match % | 100% | 60% |
2.1.2 Methodology Walk-Through for Unlabeled Chart
For unlabeled charts (see Figure 2 below), we generated questions and estimated the MAPE, as given in Table 2
Question | Correct Answer | GPT Answer | Gemini Answer |
---|---|---|---|
How many deals happened in 2018? | 30 | 20 | 10 |
How much funding (in millions) was deployed in 2021? | 1500 | 1500 | 1500 |
What was the deal count in 2019? | 40 | 20 | 40 |
How much funding (in millions) was deployed in 2017? | 100 | 0.5 | 10 |
What was the highest deal count between 2017 and 2022? | 110 | 120 | 110 |
What was the highest funding (in millions) between 2017 and 2020? | 1450 | 1000 | 300 |
MAPE % | 37% | 39% |
2.2 Summary Findings
Using the above methodology, the following table reports the overall match % and MAPE % for labeled and unlabeled charts across the sample of 31 charts:
Chart Type | GPT Match % | Gemini Match % |
Labeled Charts | 84% | 86% |
Chart Type | GPT MAPE % | Gemini MAPE % |
Unlabeled Charts | 55% | 53% |
2.3 Labeled Charts: What is the match rate across charts?
Across 15 charts and a total of 71 questions, GPT 4o reads 8 charts perfectly end-to-end and Gemini Flash-1.5 read 7 charts perfectly. In charts where it makes errors, the models are never incorrect on all the questions. Both models make large errors on a few data points, usually due to misreading a label. Since these aren’t estimation errors, the range of errors is erratic, as evident from Table 5. Both models struggle with charts where multiple figures are present.
Chart Name | Gemini Match % | GPT Match % | Gemini Error Range | GPT Error Range |
---|---|---|---|---|
stacked bar chart | 100% | 80% | (0, 0) | (0, 5) |
bar and table chart | 100% | 100% | (0, 0) | (0, 0) |
piechart | 100% | 100% | (0, 0) | (0, 0) |
simple table | 100% | 100% | (0, 0) | (0, 0) |
icons on chart | 100% | 100% | (0, 0) | (0, 0) |
stacked piechart | 100% | 100% | (0, 0) | (0, 0) |
world map chart | 100% | 100% | (0, 0) | (0, 0) |
bar chart diagonal labels | 100% | 75% | (0, 0) | (0, 2) |
connected bar graphs | 86% | 100% | (0, 20) | (0, 0) |
horizontal stacked bar chart | 80% | 100% | (0, 15) | (0, 0) |
positive-negative bar charts | 80% | 80% | (0, 878) | (0, 178) |
waterfall chart | 80% | 60% | (0, 194) | (0, 882) |
eight multi-bar charts | 75% | 50% | (0, 1) | (0, 22) |
two figures: stacked multi-bar | 67% | 67% | (0, 4) | (0, 4) |
two figures - multi-bars | 20% | 40% | (0, 52) | (0, 46) |
Overall | 86% | 84% |
GPT 4o failure points
In some cases, 4o misunderstands the data associated with a given label in stacked charts. In ’two figures: stacked multi-bar’, when asked the Athleta store count in 2012, it returned the count for Microsoft stores instead. Similarly, in ’eight multi-bar charts’, it mistakenly returns the diabetes rate for Japan instead of the obesity rate
In other cases, 4o misidentifies numbers on lower resolution charts. In ’stacked bar chart’, for instance, GPT 4o misreads ’3’ as ’8’ when asked to identify the % of public sector publications by KPMG, a mistake that Gemini did not make.
Gemini failure points
Similiar to 4o, Gemini errors are driven by misinterpreting the label but also contain an element of misinterpreting the question, which GPT 4o doesn’t demonstrate as frequently. In ’two figures - multi-bars’, for instance, when asked what % of financing sources come from private investment (other than EPR), it answers from the wrong bar graph. 4o on the other hand answers from the correct table, but the wrong label within that table. While outside the scope of these error rates, we also noticed that 4o was better at answering some reasoning / calculation questions compared to Gemini
2.4 Unlabeled Charts: What is the magnitude of error across charts?
Across 82 questions in 15 unlabeled charts, both GPT 4o and Gemini Flash are at par on the mean absolute percentage error at 55%.
Chart Name | Gemini MAPE % | GPT MAPE % |
stacked bar low res | 226 | 131 |
bubble chart | 147 | 177 |
two figures chart | 113 | 155 |
stacked bar and pie chart | 62 | 101 |
single line chart | 49 | 37 |
bar and line | 45 | 68 |
line on bar 2 | 39 | 37 |
simple line on bar | 35 | 8 |
tagged dots chart | 33 | 29 |
range line chart | 32 | 10 |
comparison bar charts | 29 | 24 |
seven line chart | 20 | 18 |
moon chart | 13 | 13 |
dots chart | 12 | 37 |
black line chart | 8 | 7 |
three line chart | 6 | 24 |
Overall | 52.79 | 54.64 |
Observations
Both Gemini and GPT 4o struggle with charts that have multiple unlabeled figures, with both reaching >100% error rates (113% and 155% respectively). This is driven by high magnitude errors across many/all of the questions for that chart. Another common failure point is stacked charts, where the estimation of a middle figure requires estimating all the labels in the chart (error rates here reach upto 85% for a high resolution charts and a staggering 230% for a low resolution chart)
Both models also struggle with stacked bar charts, where estimating the number may require isolating a sub-bar. In the ’stacked bar and pie chart’ for instance, both models responded with the total value of the bar instead of the value for a specific sub-bar label (50,000 vs. correct answer of 35,000)
Both models perform best on line charts with deviation rates as low as 6% on a complex three line chart.
A detailed breakdown of the chart-wise questions and errors are given in the appendix.
3 Conclusion
While both GPT-4o and Gemini Flash 1.5 exhibit many advanced capabilities in reading charts, our evaluation reveals some consistent limitations in their current performance. Despite being better than other alternatives, these models still require human oversight to achieve acceptable accuracy levels. For labeled charts, the models demonstrate an average error rate of 15%, which may not be suitable for high-stakes business applications. The performance on unlabeled charts is particularly inconsistent, with error rates exceeding 100% for more complex visuals. When assessed using a comprehensive set of 30 charts, representing a typical business deck, the models are only able to read 7-8 charts with 100% match accuracy (all questions correct). Thus, for any use case demanding high precision, these models are not yet ready to operate without human intervention.
4 Appendix
4.1 Labeled Charts: Questions and Model Responses
Chart Type | Question | Correct Answer | Gemini Flash Answer | GPT 4o Answer |
---|---|---|---|---|
two figures: stacked multi-bar | ||||
How many stores did Warby Parker open by 2013? | 13 | 13 | 13 | |
In 2012, how many stores did Athleta have? | 35 | 35 | 31 | |
In 2013, how many stores did Boston Proper have? | 4 | 8 | 4 | |
In 2009, how many stores did Microsoft have? | 2 | 2 | 0 | |
What is the % of capex to store related investments in 2012? | 71 | 71 | 71 | |
What was the compound annual growth rate of the % of capex to store related investments from 2011 to 2013? | 6 | 4 | 6 | |
Perfect Match % | 67 | 67 | ||
stacked bar chart | ||||
What % of retail and distribution publications did PwC publish from 2013-15? | 6 | 6 | 6 | |
How many industry publications did KPMG have from 2013-15? | 626 | 626 | 626 | |
What % of retail and distribution publications did EY publish from 2013-15? | 4 | 4 | 4 | |
What % of public sector publications did Deloitte publish from 2013-15? | 9 | 9 | 9 | |
What % of public sector publications did KPMG publish from 2013-15? | 3 | 3 | 8 | |
Perfect Match % | 100 | 80 | ||
connected bar graphs | ||||
What is the projected total global fintech revenue (in $ billion) for 2030? | 1500 | 1500 | 1500 | |
How much of the global fintech revenue in 2021 came from the banking sector? | 225 | 245 | 225 | |
By how many times is the global fintech revenue expected to grow from 2021 to 2030? | 6 | 6 | 6 | |
What is the projected fintech penetration of banking revenues (%) for 2030? | 13 | 13 | 13 | |
What was the fintech penetration of banking valuations (%) in 2021? | 9 | 9 | 9 | |
What is the projected revenue for the insurance sector in 2030 in billion dollars? | 200 | 200 | 200 | |
What was the total global fintech revenue in 2021 in billion dollars? | 245 | 245 | 245 | |
Perfect Match % | 86 | 100 | ||
eight multi-bar charts | ||||
What is the prevalence of high blood pressure in France? | 28.5 | 28.5 | 28.5 | |
How much more prevalent is diabetes in France compared to India? | 2.7 | 2.2 | 3.9 | |
What is the prevalence of obesity in Brazil? | 22.3 | 22.3 | 22.3 | |
What is the prevalence of obesity in Japan? | 4.4 | 4.4 | 26.7 | |
Perfect Match % | 75 | 50 | ||
two figures - multi-bars | ||||
What percentage of financing sources that supported the implementation of the solution came from private investment (other than EPR)? | 42 | 0 | 16 | |
What is the percentage of governmental funding that supported the implementation of the solution? | 16 | 21 | 16 | |
What percentage of financing for scaling the solution came from blended financing? | 21 | 0 | 21 | |
How much did philanthropic funding contribute to the scaling of the solution in percentage terms? | 21 | 21 | 0 | |
What is the combined percentage of multilateral funding for both the implementation and scaling of the solution? | 11 | 63 | 57 | |
Perfect Match % | 20 | 40 | ||
bar and table chart | ||||
What is the overall unbanked percentage? | 27 | 27 | 27 | |
What is the cash usage in APAC? | 59 | 59 | 59 | |
How many underbanked adults in Europe? | 357 | 357 | 357 | |
How many more unbanked adults in APAC than in Europe? | 766 | 766 | 766 | |
Perfect Match % | 100 | 100 | ||
piechart | ||||
What percentage of TMT executives are ready to disclose Scope 1 emissions? | 69 | 69 | 69 | |
What percentage of TMT executives are ready to disclose all other indirect emissions that occur in the company’s value chain? | 34 | 34 | 34 | |
What percentage of executives overall are ready to disclose Scope 3 emissions? | 37 | 37 | 37 | |
What is the difference in the % of TMT executives willing to disclose Scope 2 emissions and % of executives overall willing to disclose Scope 2 emissions? | 7 | 7 | 7 | |
Perfect Match % | 100 | 100 | ||
horizontal stacked bar chart | ||||
What % of the employee workforce is white in F&23? | 55 | 55 | 55 | |
What % of the employee workforce is female in F&21? | 49 | 49 | 49 | |
What % of the employee workforce is ’two or more races’ in F&22? | 3 | 3 | 3 | |
What % of the employee workforce elected not to provide their race in FY21? | 4 | 4 | 4 | |
What % of the employee workforce is Asian in F&23? | 22 | 7 | 22 | |
Perfect Match % | 80 | 100 | ||
simple table | ||||
What is the trades per month in market C? | 10 | 10 | 10 | |
What are the net proceeds in market A? | 46 | 46 | 46 | |
What is the price in Market B? | 48 | 48 | 48 | |
What is the annual volume in market B? | 12000 | 12000 | 12000 | |
What is the net proceed in market B? | 43 | 43 | 43 | |
Perfect Match % | 100 | 100 | ||
icons on charts | ||||
What was the NPS of Netflix? | 68 | 68 | 68 | |
What was the NPS of Amazon? | 62 | 62 | 62 | |
What was the NPS of Spotify? | 54 | 54 | 54 | |
What was the NPS of Life? | 26 | 26 | 26 | |
What was the NPS of Apple? | 68 | 68 | 68 | |
Perfect Match % | 100 | 100 | ||
positive-negative bar charts | ||||
What is the change in medical claims spent per participant (in $) for ’Hello Heart Users’ in the surgery/inpatient service category? | -482 | -482 | -482 | |
What is the change in medical claims spent per participant (in $) for ’Matched Control’ group in the diagnostics service category? | 580 | -298 | 580 | |
What is the change in medical claims spent per participant (in $) for ’Matched Control’ group in the physician visit service category? | 40 | 40 | 218 | |
What is the reduction in total medical costs per participant per year (in $)? | 1865 | 1865 | 1865 | |
What is the total change in medical claims spent per participant (in $) for ’Hello Heart Users’? | -880 | -880 | -880 | |
Perfect Match % | 80 | 80 | ||
waterfall chart | ||||
What is the starting IH09? (in $ millions) | 1911 | 1911 | 1911 | |
What is the deduction for investing activities (in $ millions)? | 97 | -97 | -97 | |
What is the addition for cash capex (in $ millions)? | 441 | 441 | -441 | |
What is the pension contribution (in $ millions)? | -187 | -187 | -187 | |
What is the cash from operations amount (in $ millions)? | 285 | 285 | 285 | |
Perfect Match % | 80 | 60 | ||
world map chart | ||||
What is the % of women in the Americas? | 49 | 49 | 49 | |
What is the number of people in Asia Pacific? | 56386 | 56386 | 56386 | |
What is the number of people in EMA? | 142368 | 142368 | 142368 | |
What is the total number of member firms? | 143 | 143 | 143 | |
What is the overall % of women globally? | 48.5 | 48.5 | 48.5 | |
What is the % of women in Asia-Pacific? | 54 | 54 | 54 | |
Perfect Match % | 100 | 100 | ||
stacked piechart | ||||
What is the % share of digital engagement in the personal care category? | 17 | 17 | 17 | |
What is the % share of digital engagement in the computer software category? | 98 | 98 | 98 | |
What is the % share of digital engagement in the food and drink category? | 3 | 3 | 3 | |
What % of middle east and africa uses digital banking? | 17 | 17 | 17 | |
What is the % share of digital engagement in the consumer banking category? | 58 | 58 | 58 | |
Perfect Match % | 100 | 100 | ||
bar chart diagonal labels | ||||
What is the accuracy % for resort in Graph Layer = 2 | 86.7 | 86.7 | 88.4 | |
What is the accuracy % for viaduct in Graph Layer = 1 | 84 | 84 | 84 | |
What is the accuracy % for forest in Graph Layer = 3 | 84.9 | 84.9 | 84.9 | |
What is the accuracy % for forest in Graph Layer = 1 | 83.9 | 83.9 | 83.9 | |
Perfect Match % | 100 | 75 |
4.2 Unlabeled Charts: Questions and Model Responses
Chart Type | Question | Correct Answer | Gemini Answer | OpenAI Answer |
---|---|---|---|---|
bar and line | ||||
What was the highest revenue growth rate % seen post the financial crisis? | 11 | 10 | 8 | |
What was the lowest revenue growth rate % seen before the financial crisis? | 5 | 10 | 8 | |
What was the lowest R&D growth rate % seen before the financial crisis? | 10 | 5 | 5 | |
What was the revenue growth rate in 2009? | 4 | 4 | 0 | |
What was the revenue growth rate in 2019? | 11 | 10 | 6 | |
What was the R&D growth rate in 2015? | 7.5 | 0 | -2 | |
Mean Absolute Error (MAE) | 3.25 | 4.92 | ||
Mean Absolute Percentage Error (MAPE) % | 44.70 | 68.23 | ||
bubble chart | ||||
What is the change in disease burden between 2020 and 2040 for nutritional deficiencies? | -20 | -20 | -20 | |
What is the minimum change in disease burden between 2020 and 2040 for infectious diseases? | -5 | -40 | -40 | |
What is the highest increase in disease burden expected from a disease in the ’other diseases’ category? | 60 | 40 | 50 | |
What is the change in disease burden between 2020 and 2040 for neglected tropical disease and malaria? | -30 | -30 | -40 | |
How many bubbles which represents an ’infectious disease’ have a change in disease burden that ranges from -20 to -40? | 3 | 3 | 4 | |
Mean Absolute Error (MAE) | 11 | 15.20 | ||
Mean Absolute Percentage Error (MAPE) % | 146.67 | 176.67 | ||
line on bar 2 | ||||
How many deals happened in 2018? | 30 | 10 | 20 | |
How much funding (in millions) was deployed in 2021? | 1500 | 1500 | 1500 | |
What was the deal count in 2019? | 40 | 40 | 20 | |
How much funding (in millions) was deployed in 2017? | 100 | 10 | 0.5 | |
What was the highest deal count between 2017 and 2022? | 110 | 110 | 120 | |
What was the highest funding (in millions) between 2017 and 2020? | 1450 | 300 | 1000 | |
Mean Absolute Error (MAE) | 210 | 98.25 | ||
Mean Absolute Percentage Error (MAPE) % | 39.33 | 37.16 | ||
tagged dots chart | ||||
What % of survey respondents think Advanced IT will be a widely used skill in the future? | 35 | 26 | 30 | |
What % of survey respondents think Basic IT will be a future skill needed? | 15 | 26 | 30 | |
What percentage of respondents think ’Entrepreneurship’ will be an expected future skill needed? | 27 | 18 | 20 | |
How many categories of skills are included in the limited but growing quadrant? | 3 | 4 | 3 | |
How many skills are in the ’limited and stable’ quadrant of this chart? | 4 | 5 | 5 | |
What percentage of respondents said that ’complex information processing’ will be an expected future skill needed? | 23 | 25 | 30 | |
What percentage of respondents said that ’complex information processing’ is a most widely used skill today? | 22 | 15 | 20 | |
Mean Absolute Error (MAE) | 5.71 | 5.29 | ||
Mean Absolute Percentage Error (MAPE) % | 33.03 | 29.25 | ||
moon chart | ||||
What is the relative applicability of Machine Learning (AI) (as a %) in the negotiation stage of the procurement process? | 100 | 100 | 100 | |
What is the relative applicability of Machine Learning (AI) (as a %) in the invoice checking stage of the procurement process? | 25 | 25 | 25 | |
What is the relative applicability of Automation (as a %) in the payment processing stage of the procurement process? | 100 | 100 | 100 | |
What is the relative applicability of Automation (as a %) in the RFX process stage of the procurement process? | 75 | 25 | 25 | |
What is the relative applicability of cognitive agents (as a %) in the ’Buyer’ stage of the procurement process? | 25 | 25 | 25 | |
Mean Absolute Error (MAE) | 10 | 10 | ||
Mean Absolute Percentage Error (MAPE) % | 13.33 | 13.33 | ||
black line chart | ||||
What was the traction in Sep ’15 (in $ thousands)? | 60 | 55 | 60 | |
What was the traction in Nov ’15 (in $ thousands)? | 75 | 80 | 80 | |
What was the traction in Jul ’15 (in $ thousands)? | 48 | 40 | 60 | |
What was the highest traction between May ’15 and May ’16 (in $ thousands)? | 140 | 130 | 137.3 | |
What was the % increase in traction in the ’Last 30 days’? | 22 | 22 | 22 | |
Mean Absolute Error (MAE) | 5.60 | 3.93 | ||
Mean Absolute Percentage Error (MAPE) % | 7.76 | 6.71 | ||
two figures chart | ||||
What % of respondents say their frequency of use is 1-3 times a month because of airline miles? | 4 | 18 | 25 | |
What % of respondents say their duration of most used credit card is 10 years or more because of low interest rates? | 23 | 10 | 20 | |
What % of respondents say their frequency of use is 1-2 times a week because of unlimited cash back? | 14 | 13 | 10 | |
What is the highest % of respondents who say their duration of most used credit card is 1 year but less than 2 years? | 12 | 28 | 35 | |
What is the lowest % of respondents who say their frequency of use is 7-10 times a week? | 6 | 5 | 5 | |
Mean Absolute Error (MAE) | 9 | 10.40 | ||
Mean Absolute Percentage Error (MAPE) % | 112.73 | 154.99 | ||
seven line chart | ||||
What were the influenza rates in Feb 2016/17? | 28000 | 25000 | 30000 | |
What were the influenza rates in Apr 2014/15? | 8000 | 6000 | 5000 | |
What were the influenza rates in Jan 2015/16? | 10000 | 10000 | 10000 | |
What was the average influenza rate in Mar between 2014-19? | 27000 | 15000 | 20000 | |
Mean Absolute Error (MAE) | 4250 | 3000 | ||
Mean Absolute Percentage Error (MAPE) % | 20.04 | 17.64 | ||
single line chart | ||||
What was the growth projection on 12/29/2003? | 80000 | 150000 | 100000 | |
What was the growth projection on 5/17/2004? | 210000 | 400000 | 300000 | |
What was the actual data on 4/19/2004? | 480000 | 400000 | 102000 | |
What was the highest actual growth between 5/03 - Present? | 900000 | 900000 | 900000 | |
Mean Absolute Error (MAE) | 85000 | 122000 | ||
Mean Absolute Percentage Error (MAPE) % | 48.66 | 36.65 | ||
stacked bar low res | ||||
What were Origin’s ’load and bank guarantee - undrawn’ (in $ million) in 2011? | 800 | 1500 | 400 | |
What were Origin’s ’US private placement’ (in $ million) in 2014? | 180 | 400.0 | 0.0 | |
What were Origin’s ’load and bank guarantee - undrawn’ (in $ million) in 2010? | 90 | 400 | 200 | |
What were Origin’s ’load and bank guarantee - drawn’ (in $ million) in 2012? | 1100 | 1500 | 200 | |
What was Origin’s highest US private placement (in $ million) between 2010 and 2020? | 250 | 1600 | 1000 | |
Mean Absolute Error (MAE) | 596 | 468 | ||
Mean Absolute Percentage Error (MAPE) % | 226.11 | 130.81 | ||
three lines chart | ||||
What was the total return to shareholders by non resilient companies in 2015? | 120 | 120 | 150 | |
What was the total return to shareholders by resilient companies in 2013? | 230 | 240 | 175 | |
What was the total return to shareholders by resilient companies at the end of recovery phase? | 150 | 150 | 200 | |
What was the total return to shareholders by S&P 500 companies at the end of growth phase? | 210 | 250 | 200 | |
What was the total return to shareholders by S&P 500 companies at the end of downturn phase? | 75 | 70 | 100 | |
What was the total return to shareholders by resilient companies in 2014? | 250 | 250 | 200 | |
What was the total return to shareholders by non-resilient companies in 2010? | 80 | 90 | 100 | |
Mean Absolute Error (MAE) | 9.29 | 34.29 | ||
Mean Absolute Percentage Error (MAPE) % | 6.08 | 23.62 | ||
stacked bar and pie chart | ||||
What is the lead supply ’Today’ in Type 1 channel? | 35000 | 50000 | 50000 | |
What is the lead supply in ’12 months’ in Type 1 channel? | 90000 | 100000 | 200000 | |
What is the lead supply in ’12 months’ in Type 4 channel? | 25000 | 30 | 50000 | |
What is the lead supply in ’Today’ in Type 4 channel? | 10000 | 20 | 50000 | |
What % of loan distribution is with Lender 4? | 9 | 18 | 5 | |
What % of loan distribution is with Lender 2? | 34 | 34 | 34 | |
What % of loan distribution is with ’Other’ Lender? | 5 | 9 | 5 | |
Mean Absolute Error (MAE) | 8566.14 | 27143.43 | ||
Mean Absolute Percentage Error (MAPE) % | 61.95 | 101.36 | ||
dots chart | ||||
What’s the percentage of the population above empowerment line in Israel in 2022? | 73 | 70 | 75 | |
What’s the GDP per capita 2022 in the US? (in $ thousands) | 75 | 65 | 70 | |
What’s the GDP per capita 2022 in Ireland (in $ thousands)? | 105 | 90 | 80 | |
What’s the percentage of the population above empowerment line in Ireland in 2022? | 73 | 70 | 75 | |
What’s the highest percentage of the population above empowerment line for a country with empowerment line set at $12 PPP floor? | 40 | 50 | 100 | |
Mean Absolute Error (MAE) | 8.20 | 18.80 | ||
Mean Absolute Percentage Error (MAPE) % | 12.17 | 37.19 | ||
comparison bar charts | ||||
What was the instrument handling performance score when using Osso VR Training? | 4 | 3 | 5 | |
What was the total performance score when using Standard training? | 7.5 | 8 | 6 | |
What was the knowledge of instruments performance score when using Osso VR Training? | 3.5 | 2 | 5 | |
What was the time and motion performance score when using standard training? | 2.5 | 1 | 2 | |
What was the total performance score when using Osso VR training? | 17.5 | 19 | 20 | |
Mean Absolute Error (MAE) | 1.20 | 1.40 | ||
Mean Absolute Percentage Error (MAPE) % | 28.62 | 24.43 | ||
simple line on bar | ||||
What was the number of mocap systems (in thousands) in the market in 2024? | 75 | 87.5 | 80 | |
What was the number of mocap systems (in thousands) in the market in 2022? | 15.0 | 20 | 20 | |
What was the number of assets created by users (in million) per year in 2022? | 22.5 | 0.02 | 22.5 | |
What was the number of assets created by users (in million) per year in 2023? | 44 | 40 | 45 | |
What was the highest number of assets created by users (in million) between 2021 and 2024? | 80 | 67.5 | 80 | |
Mean Absolute Error (MAE) | 11.30 | 2.20 | ||
Mean Absolute Percentage Error (MAPE) % | 34.93 | 8.45 | ||
range line chart | ||||
What percentage of respondents prefer in-store experience during the purchase step of shopping journey for the furniture category? | 78 | 39 | 80 | |
What percentage of respondents prefer in-store experience during the discovery step of shopping journey for the computer and electronics category? | 30 | 57 | 30 | |
What percentage of respondents prefer in-store experience during the pickup step of shopping journey for the furniture category? | 35 | 38 | 30 | |
What is the average in-store preference (percentage of respondents) during the trial step of the shopping journey? | 80 | 50 | 60 | |
What is the lowest in-store preference % in the pickup step of the shopping journey? | 35 | 36 | 30 | |
What is the highest in-store shopping preference % in the trial step of the shopping journey? | 85 | 80 | 80 | |
Mean Absolute Error (MAE) | 17.50 | 6.17 | ||
Mean Absolute Percentage Error (MAPE) % | 32.47 | 10.34 |
References
- [1] B. Enhanced Chart Understanding in Vision and Language Task via Cross-modal Pre-training on Plot Table Pairs, arXiv, 2023. https://arxiv.org/pdf/2305.18641.
- [2] D. Evaluating Large Language Models on Graphs: Performance Insights and Comparative Analysis, arXiv, 2023. https://arxiv.org/pdf/2308.11224.
- [3] E. Table Meets LLM: Can Large Language Models Understand Structured Table Data? A Benchmark and Empirical Study, arXiv, 2023. https://arxiv.org/pdf/2305.13062.
- [4] F. Beyond Accuracy: Evaluating the Reasoning Behavior of Large Language Models - A Survey, arXiv, 2024. https://arxiv.org/pdf/2404.01869.
- [5] Y. UniChart, "A Universal Vision-language Pretrained Model for Chart Comprehension and Reasoning", arXiv, 2023. https://arxiv.org/pdf/2305.14761.
- [6] X. Zhang, Y. Xie, H. Zhang, et al. "A Survey on Evaluation of Large Language Models." arXiv preprint arXiv:2307.03109, 2023. Available: https://arxiv.org/pdf/2307.03109