We put 5 AI models to the KPI test: Here's what happened


AI is everywhere these days, seamlessly becoming part of our daily lives. Just look at how many apps use AI - from streaming platforms giving personalised recommendations, to voice assistants that understand and carry out our commands, to smart home devices taking care of household chores. Beyond entertainment and convenience, AI is making waves in crucial areas like healthcare, where it helps diagnose diseases, and finance, where it detects fraud and manages risk. This isn't just a fad - AI is changing the game across industries and revolutionising our lives.

AI helps us work smarter in the business world by providing insights and tools that enhance decision-making, streamline operations, and drive better results. A key area where AI can make a big difference is in setting and achieving key performance indicators (KPIs). AI helps businesses set more accurate and meaningful KPIs aligned with their specific goals. By using AI to improve KPI setting, companies can focus on the right metrics, optimise their performance, and stay agile. 

We decided to run an experiment to see which of five AI models - ChatGPT, Claude, Gemini, Perplexity, and Copilot - is the best at helping set customer service KPIs. Read on for all the details and results…

Methodology

To ensure a thorough and unbiased evaluation of AI models for determining and setting KPIs, we used the following methodology: 

1. Selection of prompts

Develop a set of clear and concise prompts related to customer service. For this experiment, these prompts will be as follows: 

  • Test 1 - Identifying KPIs: "What are 10 KPIs I should be tracking for customer service in my business?" 
  • Test 2 - Clarifying a KPI definition: "I don’t understand [select metric]. Explain what this is and why it’s important." 
  • Test 3 - Identifying tools for tracking KPIs: "What tools can I use to track my KPIs effectively against my goals?" 
  • Test 4 - KPI benchmarks and targets: "Provide a list of benchmarks and realistic targets for each of these KPIs, including sources/citations."

2. Testing and analysis

Input the same prompts into each AI model, ensuring consistency across all tests. The first prompt will be entered into a new chat for each AI model, and the subsequent prompts will be entered into the same chat to maintain context clarity, keep conversations focused, and avoid information spillover. Record and analyse the responses generated and compare the performance of each AI model.

3. Evaluation and scoring

Evaluation criteria will vary for each test and will be outlined in the results below. Numerical scores will be given for each criterion and tallied for an overall score in each test. The overall scores for each test will be added up to give a total score and ranking. The AI model with the highest total score will be the winner.
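The tallying is straightforward, but for concreteness here is a minimal sketch of the scheme. The Test 1 criterion scores are taken from the results below; the code structure itself is illustrative only:

```python
# Criterion scores per model per test (Test 1 figures from the results below).
# In the full experiment, each model would have entries for all four tests.
test_scores = {
    "ChatGPT":    {"Test 1": [4, 4, 3]},   # = 11/15
    "Claude":     {"Test 1": [4, 4, 4]},   # = 12/15
    "Gemini":     {"Test 1": [4, 5, 5]},   # = 14/15
    "Perplexity": {"Test 1": [4, 4, 3]},   # = 11/15
    "Copilot":    {"Test 1": [4, 5, 4]},   # = 13/15
}

def total_score(tests):
    """Sum criterion scores within each test, then sum the test totals."""
    return sum(sum(criteria) for criteria in tests.values())

# Highest total score wins.
ranking = sorted(test_scores, key=lambda m: total_score(test_scores[m]), reverse=True)
```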

Test 1 - Identifying KPIs

Prompt: What are 10 KPIs I should be tracking for customer service in my business?

Purpose

This prompt is fundamental for establishing a baseline understanding of which KPIs are essential for monitoring the effectiveness and efficiency of customer service operations.

Insights

Consistencies and differences in KPIs

All AI models agreed on core KPIs like First Response Time, Average Resolution Time, CSAT, and NPS, suggesting these metrics are widely regarded as essential for evaluating customer service performance.

ChatGPT and Claude kept it general, while Gemini broke it down into resolution rates, response times, customer effort, efficiency and volume, and loyalty and advocacy. Perplexity added in Call Abandonment Rate and Knowledge Base Views, focusing on call centre stats and self-service. Copilot stressed Consistent Resolutions across channels, showing the importance of a consistent customer experience.
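The two timing metrics every model agreed on can be computed directly from ticket timestamps. A minimal sketch, using hypothetical ticket records (the field names are our own, not from any particular helpdesk tool):

```python
from datetime import datetime

# Hypothetical ticket records; field names are illustrative only.
tickets = [
    {"created": datetime(2024, 5, 1, 9, 0),
     "first_reply": datetime(2024, 5, 1, 9, 45),
     "resolved": datetime(2024, 5, 1, 12, 0)},
    {"created": datetime(2024, 5, 1, 10, 0),
     "first_reply": datetime(2024, 5, 1, 10, 15),
     "resolved": datetime(2024, 5, 2, 10, 0)},
]

def avg_hours(deltas):
    """Average a list of timedeltas, expressed in hours."""
    return sum(d.total_seconds() for d in deltas) / len(deltas) / 3600

# First Response Time: created -> first agent reply.
first_response_time = avg_hours([t["first_reply"] - t["created"] for t in tickets])

# Average Resolution Time: created -> resolved.
average_resolution_time = avg_hours([t["resolved"] - t["created"] for t in tickets])
```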

Unique KPIs 

ChatGPT: Service Level Agreement (SLA) Compliance, Quality Assurance (QA) Score

Claude: Ticket Volume, Ticket Backlog, Employee Engagement

Gemini: Average Handle Time (AHT)

Perplexity: Cost per Resolution, Agent Utilisation Rate

Copilot: Consistent Resolutions, Cost Per Conversation

Level of detail and explanation

ChatGPT and Claude give short descriptions of each KPI without adding much context. Gemini goes into more detail, explaining how to understand the KPIs and what they mean for customer service. Perplexity backs up its KPI suggestions with citations and references. Copilot uses practical examples to show why certain KPIs matter.

Evaluation criteria

Understanding of the prompt: 

How well does the AI comprehend the nuances and context of the queries? [Poor understanding (1), Limited (2), Moderate (3), Good (4), Excellent (5)]

Accuracy of insights and recommendations: 

How closely do the AI-generated outputs align with established best practices and industry standards? [Highly inaccurate (1), Moderately inaccurate (2), Neutral (3), Mostly accurate (4), Highly accurate (5)]

Effectiveness in guiding KPI setting: 

How well does the AI assist the user in defining relevant and actionable KPIs? [Ineffective (1), Somewhat effective (2), Moderately effective (3), Effective (4), Highly effective (5)]

Outputs and scores

ChatGPT

  • Understanding of the prompt: 4 (Good)

ChatGPT demonstrates a good understanding of the prompt by providing a comprehensive list of customer service KPIs.

  • Accuracy of insights and recommendations: 4 (Mostly accurate)

The KPIs mentioned by ChatGPT align well with industry standards and best practices.

  • Effectiveness in guiding KPI setting: 3 (Moderately effective)

While ChatGPT provides a solid list of KPIs, it lacks detailed explanations or guidance on how to prioritise and implement them effectively.

Total score = 11/15

Claude

  • Understanding of the prompt: 4 (Good)

Claude understands the prompt well, delivering a well-structured list of customer service KPIs.

  • Accuracy of insights and recommendations: 4 (Mostly accurate)

The KPIs provided by Claude are mostly accurate and in line with industry expectations.

  • Effectiveness in guiding KPI setting: 4 (Effective)

Claude offers helpful context on how to use the KPIs to gain insights and make data-driven decisions, enhancing its effectiveness in guiding KPI setting.

Total score = 12/15

Gemini

  • Understanding of the prompt: 4 (Good)

Gemini demonstrates a good understanding of the prompt, categorising the KPIs and providing detailed explanations.

  • Accuracy of insights and recommendations: 5 (Highly accurate)

The KPIs and insights provided by Gemini are highly accurate and closely align with industry best practices.

  • Effectiveness in guiding KPI setting: 5 (Highly effective)

Gemini's categorised approach and detailed explanations make it highly effective in guiding users to set relevant and actionable KPIs.

Total score = 14/15

Perplexity

  • Understanding of the prompt: 4 (Good)

Perplexity demonstrates a good understanding of the prompt by providing a diverse set of customer service KPIs.

  • Accuracy of insights and recommendations: 4 (Mostly accurate)

The KPIs mentioned by Perplexity are mostly accurate and supported by citations from reputable sources.

  • Effectiveness in guiding KPI setting: 3 (Moderately effective)

While Perplexity offers a solid list of KPIs, it lacks guidance on how to prioritise and implement them effectively in a specific business context.

Total score = 11/15

Copilot

  • Understanding of the prompt: 4 (Good)

Copilot exhibits a good understanding of the prompt, providing a mix of common and unique KPIs with practical examples.

  • Accuracy of insights and recommendations: 5 (Highly accurate)

The KPIs and insights provided by Copilot are highly accurate and closely align with industry standards and best practices.

  • Effectiveness in guiding KPI setting: 4 (Effective)

Copilot's use of practical examples and emphasis on consistency across channels make it effective in guiding users to set meaningful KPIs.

Total score = 13/15

Test 1 chart

Test 1 WINNER = Gemini 🏆

Test 2 - Clarifying a KPI definition

Since all AI models listed Net Promoter Score (NPS) as a key metric, we will use this for our second test. 

Prompt: I don’t understand Net Promoter Score (NPS). Explain what this is and why it’s important.

Purpose

Understanding each KPI is crucial for proper implementation and analysis. This prompt tests the AI’s capability to break down complex concepts into understandable terms.

Insights

Consistency in NPS definition and calculation

All AI models provide a clear and consistent definition of NPS. They emphasise that NPS measures customer loyalty and the likelihood of recommending a company's products or services. The models unanimously explain how NPS is calculated by subtracting the percentage of detractors from the percentage of promoters.

All models consistently categorise NPS responses into three groups: Promoters (score 9-10), Passives (score 7-8), and Detractors (score 0-6). The descriptions of each category are similar across the models, highlighting the potential impact of each group on business growth and reputation.
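The calculation the models describe is simple enough to express in a few lines. A minimal sketch with hypothetical survey responses:

```python
def net_promoter_score(responses):
    """NPS = % promoters (9-10) minus % detractors (0-6), on a -100 to +100 scale.

    Passives (7-8) count toward the total but neither add nor subtract.
    """
    promoters = sum(1 for r in responses if r >= 9)
    detractors = sum(1 for r in responses if r <= 6)
    return round(100 * (promoters - detractors) / len(responses))

# Hypothetical survey responses: 4 promoters, 3 passives, 3 detractors.
scores = [10, 9, 8, 7, 6, 10, 3, 9, 8, 5]
nps = net_promoter_score(scores)  # 40% promoters - 30% detractors = 10
```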

Importance of NPS

The AI models collectively stress the significance of NPS in measuring customer loyalty, identifying brand advocates, and driving business growth through word-of-mouth referrals. They also mention the role of NPS in benchmarking against industry standards or competitors and providing insights for improving the customer experience.

Simplicity and actionability

Gemini and Copilot specifically highlight the simplicity of NPS, as it is based on a single question, making it easy for customers to answer and businesses to track. Claude and Perplexity emphasise the actionability of NPS, as it enables businesses to identify areas for improvement and make data-driven decisions to enhance customer satisfaction.

Additional observations

ChatGPT and Gemini mention the potential range of NPS scores from -100 to +100, providing context for interpreting the results. Perplexity includes citations and references to support its explanations, adding credibility to the information. Copilot uses a mathematical formula to illustrate the NPS calculation, enhancing clarity for users.

Consistency in key takeaways

All models conclude that a higher NPS indicates stronger customer loyalty, while a lower score suggests potential issues that must be addressed. They consistently emphasise the importance of tracking NPS over time and using the insights gained to improve customer experience and business performance.

Evaluation criteria

Accuracy of NPS definition and calculation:

Does the AI model provide an accurate definition of NPS and correctly explain the calculation of NPS by subtracting the percentage of detractors from the percentage of promoters? Are the NPS response categories (Promoters, Passives, Detractors) accurately described? [Inaccurate (1), Partially accurate (2), Mostly accurate (3), Fully accurate (4)]

Clarity and comprehensiveness of NPS explanation:

Is the explanation of NPS clear, concise, and easy to understand? Does the AI model provide a comprehensive overview of NPS, including its purpose and significance for businesses? Are the key concepts and implications of NPS thoroughly explained? [Unclear and incomplete (1), Somewhat clear but lacks depth (2), Clear and moderately comprehensive (3), Exceptionally clear and comprehensive (4)]

Relevance and usefulness of NPS insights:

Does the AI model provide relevant insights into how NPS can be used to drive business success? Are the insights and recommendations practical and actionable for businesses looking to implement NPS as a key performance indicator? Does the model offer valuable information on interpreting and acting upon NPS results? [Irrelevant and impractical (1), Somewhat relevant but limited usefulness (2), Relevant and moderately useful (3), Highly relevant and extremely useful (4)]

Credibility and supportiveness of information

Does the AI model provide credible information about NPS, backed by industry knowledge or relevant citations? Are the claims and recommendations made by the model well-supported and trustworthy? Does the model include additional resources or references to enhance the credibility of its responses? [Lacks credibility and support (1), Somewhat credible but limited support (2), Credible and moderately supported (3), Highly credible and well-supported (4)]

Outputs and scores

ChatGPT

  • Accuracy of NPS definition and calculation: 4 (Fully accurate)

ChatGPT provides an accurate definition of NPS and correctly explains the calculation process, including categorising responses.

  • Clarity and comprehensiveness of NPS explanation: 4 (Exceptionally clear and comprehensive)

The explanation of NPS is clear, concise, and easy to understand. It covers its purpose and significance for businesses comprehensively.

  • Relevance and usefulness of NPS insights: 3 (Relevant and moderately useful)

ChatGPT offers relevant insights into how NPS can drive business success, but the recommendations could be more practical and actionable.

  • Credibility and supportiveness of information: 3 (Credible and moderately supported)

The information provided is credible and aligned with industry knowledge, but there are no direct citations or additional resources to enhance credibility.

Total score = 14/16

Claude

  • Accuracy of NPS definition and calculation: 4 (Fully accurate)

Claude accurately defines NPS, explains the calculation process, and describes the response categories correctly.

  • Clarity and comprehensiveness of NPS explanation: 4 (Exceptionally clear and comprehensive)

The explanation of NPS is clear, concise, and comprehensive, covering its purpose, significance, and key concepts thoroughly.

  • Relevance and usefulness of NPS insights: 4 (Highly relevant and extremely useful)

Claude provides highly relevant insights into how NPS can drive business success, with practical and actionable recommendations for implementing NPS as a KPI.

  • Credibility and supportiveness of information: 3 (Credible and moderately supported)

The information provided is credible and aligns with industry knowledge, but there are no direct citations or additional resources to support the claims.

Total score = 15/16

Gemini

  • Accuracy of NPS definition and calculation: 4 (Fully accurate)

Gemini accurately defines NPS, explains the calculation process, and describes the response categories correctly.

  • Clarity and comprehensiveness of NPS explanation: 3 (Clear and moderately comprehensive)

The explanation of NPS is clear and easy to understand, but it could provide more depth in exploring the key concepts and implications.

  • Relevance and usefulness of NPS insights: 3 (Relevant and moderately useful)

Gemini offers relevant insights into how NPS can be used to understand customer sentiment and identify areas for improvement, but the recommendations could be more actionable.

  • Credibility and supportiveness of information: 3 (Credible and moderately supported)

The information provided is credible and aligns with industry knowledge, but there are no direct citations or additional resources to enhance credibility.

Total score = 13/16

Perplexity

  • Accuracy of NPS definition and calculation: 4 (Fully accurate)

Perplexity accurately defines NPS, explains the calculation process, and describes the response categories correctly.

  • Clarity and comprehensiveness of NPS explanation: 4 (Exceptionally clear and comprehensive)

The explanation of NPS is clear, comprehensive, and easy to understand. It thoroughly covers its purpose, significance, and key concepts.

  • Relevance and usefulness of NPS insights: 4 (Highly relevant and extremely useful)

Perplexity provides highly relevant insights into how NPS can track customer loyalty, benchmark against competitors, and drive business success with practical recommendations.

  • Credibility and supportiveness of information: 4 (Highly credible and well-supported)

The information provided is highly credible, backed by relevant citations and additional resources, enhancing the trustworthiness of the responses.

Total score = 16/16

Copilot

  • Accuracy of NPS definition and calculation: 4 (Fully accurate)

Copilot accurately defines NPS, explains the calculation process, and describes the response categories correctly.

  • Clarity and comprehensiveness of NPS explanation: 3 (Clear and moderately comprehensive)

The explanation of NPS is clear and easy to understand, but it could provide more depth in exploring the key concepts and implications.

  • Relevance and usefulness of NPS insights: 3 (Relevant and moderately useful)

Copilot offers relevant insights into how NPS can be used to gauge customer loyalty and identify areas for improvement, but the recommendations could be more actionable.

  • Credibility and supportiveness of information: 3 (Credible and moderately supported)

The information provided is credible and aligns with industry knowledge, but there are no direct citations or additional resources to enhance credibility.

Total score = 13/16

Test 2 chart

Test 2 WINNER = Perplexity 🏆

Test 3 - Identifying tools for tracking KPIs

Prompt: What tools can I use to track my KPIs effectively against my goals?

Purpose

Tools for tracking KPIs are essential for data collection, analysis, and reporting. The right tools can significantly enhance your ability to monitor performance, identify trends, and make data-driven decisions. This prompt evaluates the AI’s knowledge of the available tools and their functionalities.

Insights

Tools

The AI models recommend different tools for tracking KPIs against goals, such as goal-tracking platforms, business intelligence (BI) tools, spreadsheet software, project management tools, and specialised KPI tracking software. While some models mention common tools like Tableau, Power BI, and spreadsheet software, each model also introduces unique tools, providing a comprehensive overview of available options.

Gemini categorises the tools into basic, intermediate, and advanced levels based on their functionalities and complexity, making it easier for users to select the most appropriate tool. Perplexity groups the tools into categories such as dashboarding tools, spreadsheet software, and dedicated KPI tracking software, focusing on their primary functionalities.

Claude and Copilot emphasise the importance of goal-tracking platforms that are specifically designed for setting, tracking, and managing goals and KPIs across organisations. Gemini and ChatGPT also mention goal-tracking platforms but provide more specific examples like Tability and Weekdone, highlighting their unique features and benefits.

Integration and customisation

All models stress the importance of selecting tools that integrate well with existing data sources and systems to ensure seamless data collection and analysis. Claude and Gemini mention the need for customisation options to adapt the tools to unique business requirements and effectively visualise progress against goals.

Collaboration and communication

Claude, Gemini, and Perplexity highlight the importance of collaboration and communication features in goal-tracking tools to facilitate teamwork, accountability, and sharing of insights. Copilot mentions Hive as a tool that allows collaboration on tasks, chat updates, and meeting notes, emphasising the importance of collaborative features in goal-tracking platforms.

Best practices and considerations

Perplexity provides a list of best practices for effective KPI tracking, such as selecting relevant KPIs tied to business goals, using visual dashboards, regularly reviewing and updating KPIs, fostering a culture of accountability, and balancing simplicity and depth in reporting. Claude and Gemini also mention factors to consider when selecting tools, such as alignment with goal-setting requirements, ease of use, and scalability.

Evaluation criteria

Comprehensiveness:

Does the AI model provide a wide range of tools and platforms, covering various categories such as goal-tracking platforms, BI tools, spreadsheet software, project management tools, and specialised KPI tracking software? Are the recommended tools diverse enough to cater to different business needs and preferences? [Limited (1), Moderate (2), Comprehensive (3)]

Relevance and specificity:

Are the recommended tools and platforms relevant to the task of tracking KPIs against goals? Does the AI model provide specific examples of tools and platforms, mentioning their unique features and benefits? Does the model offer insights into how these tools can be effectively utilised for goal tracking? [Irrelevant (1), Somewhat relevant (2), Highly relevant and specific (3)]

Categorisation and organisation:

Does the AI model categorise or organise the recommended tools and platforms in a logical and easily understandable manner? Are the categories or groupings meaningful and helpful for users to select the most appropriate tool based on their needs? [Poorly organised (1), Moderately organised (2), Well-structured and organised (3)]

Factors and considerations:

Does the AI model discuss important factors and considerations for selecting goal-tracking tools, such as integration with existing systems, customisation options, collaboration features, and ease of use? Are the mentioned factors and considerations relevant and helpful for users to make informed decisions? [No factors mentioned (1), Some factors mentioned (2), Comprehensive factors and considerations (3)]

Best practices and insights:

Does the AI model provide valuable best practices and insights for effective KPI tracking and goal management? Are the best practices and insights practical, actionable, and aligned with industry standards? Do the insights go beyond just recommending tools and offer guidance on how to leverage them effectively? [No best practices or insights (1), Some best practices and insights (2), Comprehensive and valuable best practices and insights (3)]

Outputs and scores

ChatGPT

  • Comprehensiveness: 2 (Moderate)

ChatGPT provides a range of tools, including goal-tracking platforms, BI platforms, dashboarding tools, project management software, and integrated business systems. However, the list is not as extensive as some other models.

  • Relevance and specificity: 2 (Somewhat relevant)

The recommended tools are relevant to tracking KPIs against goals, but ChatGPT does not provide many specific examples or insights into how these tools can be effectively utilised for goal tracking.

  • Categorisation and organisation: 2 (Moderately organised)

ChatGPT categorises the tools into different types, but the categories are not as well-defined or structured as in some other models.

  • Factors and considerations: 2 (Some factors mentioned)

ChatGPT briefly mentions factors like ease of use, customisation options, integration capabilities, and scalability, but does not provide a comprehensive discussion of these factors.

  • Best practices and insights: 1 (No best practices or insights)

ChatGPT does not provide any best practices or insights for effective KPI tracking and goal management.

Total score = 9/15

Claude

  • Comprehensiveness: 3 (Comprehensive)

Claude provides a wide range of tools, including goal-tracking platforms, BI and analytics tools, spreadsheet software, project and task management tools, and specialised KPI tracking software.

  • Relevance and specificity: 3 (Highly relevant and specific)

The recommended tools are highly relevant to tracking KPIs against goals, and Claude provides specific examples of tools and their key features.

  • Categorisation and organisation: 3 (Well-structured and organised)

Claude organises the tools into clear categories, making it easy for users to understand and select the most appropriate tool based on their needs.

  • Factors and considerations: 3 (Comprehensive factors and considerations)

Claude discusses important factors and considerations for selecting goal-tracking tools, such as alignment with goal-setting requirements, integration with existing systems, ease of use, customisation options, and collaboration features.

  • Best practices and insights: 2 (Some best practices and insights)

Claude provides some insights into how goal-tracking platforms can help ensure transparency, accountability, and alignment throughout the organisation, but does not offer a comprehensive set of best practices.

Total score = 14/15

Gemini

  • Comprehensiveness: 3 (Comprehensive)

Gemini provides a diverse range of tools, including basic tools like spreadsheets and project management tools, intermediate tools like BI tools, goal-tracking platforms, and advanced tools like CRM and performance management software.

  • Relevance and specificity: 3 (Highly relevant and specific)

The recommended tools are highly relevant to tracking KPIs against goals, and Gemini provides specific examples of tools and their functionalities.

  • Categorisation and organisation: 3 (Well-structured and organised)

Gemini categorises the tools into basic, intermediate, and advanced levels, making it easy for users to select the most appropriate tool based on their needs and capabilities.

  • Factors and considerations: 3 (Comprehensive factors and considerations)

Gemini discusses important factors and considerations for selecting goal-tracking tools, such as data integration, customisation, and collaboration.

  • Best practices and insights: 1 (No best practices or insights)

Gemini does not provide any best practices or insights for effective KPI tracking and goal management.

Total score = 13/15

Perplexity

  • Comprehensiveness: 2 (Moderate)

Perplexity provides a range of tools, including dashboarding tools, spreadsheet software, and dedicated KPI tracking software. However, the list is not as extensive as some other models.

  • Relevance and specificity: 3 (Highly relevant and specific)

The recommended tools are highly relevant to tracking KPIs against goals, and Perplexity provides specific examples of tools and their functionalities.

  • Categorisation and organisation: 2 (Moderately organised)

Perplexity categorises the tools into different types, but the categories are not as well-defined or structured as in some other models.

  • Factors and considerations: 2 (Some factors mentioned)

Perplexity briefly mentions factors like ease of use, integration with existing systems, collaboration features, customisation options, and pricing model, but does not provide a comprehensive discussion of these factors.

  • Best practices and insights: 3 (Comprehensive and valuable best practices and insights)

Perplexity provides a comprehensive list of best practices for effective KPI tracking, including selecting relevant KPIs tied to business goals, using visual dashboards, regularly reviewing and updating KPIs, fostering a culture of accountability, and balancing simplicity and depth in reporting.

Total score = 12/15

Copilot

  • Comprehensiveness: 2 (Moderate)

Copilot provides a range of goal-tracking platforms and tools, but the list is primarily focused on productivity and habit-building apps rather than comprehensive business tools.

  • Relevance and specificity: 2 (Somewhat relevant)

While the recommended tools are related to goal tracking, they are not as specifically relevant to tracking KPIs in a business context.

  • Categorisation and organisation: 1 (Poorly organised)

Copilot does not provide any clear categorisation or organisation of the recommended tools.

  • Factors and considerations: 1 (No factors mentioned)

Copilot does not discuss any factors or considerations for selecting goal-tracking tools.

  • Best practices and insights: 1 (No best practices or insights)

Copilot does not provide any best practices or insights for effective KPI tracking and goal management.

Total score = 7/15

Test 3 chart

Test 3 WINNER = Claude 🏆

Test 4 - KPI benchmarks and targets

Prompt: Provide a list of benchmarks and realistic targets for each of these KPIs, including sources/citations.

Purpose

Setting benchmarks and realistic targets is crucial for performance management and goal setting. It helps in evaluating where your business stands in comparison to industry standards and in setting achievable objectives. This prompt assesses the AI’s ability to provide not only the benchmarks and targets but also credible sources for this information. It tests the AI's capability to deliver well-researched, accurate, and actionable data.

Insights

Benchmarks and targets

The AI models offer various benchmarks and realistic targets for each KPI, providing a general guideline for businesses to evaluate their performance. However, the specific benchmarks and targets may vary slightly between models, possibly due to differences in data sources and industry focus. All models stress the importance of tailoring benchmarks and targets to specific industries, company sizes, and business goals.
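The distinction the models draw between a benchmark (the industry reference point) and a target (the goal you set against it) lends itself to a simple comparison. A sketch with entirely hypothetical figures, noting that some KPIs improve as they go up (e.g. CSAT) and others as they go down (e.g. response time):

```python
# Hypothetical benchmarks, targets, and actuals; figures are illustrative only.
kpis = {
    "CSAT (%)": {
        "benchmark": 80, "target": 85, "actual": 78, "higher_is_better": True,
    },
    "First Response Time (h)": {
        "benchmark": 12, "target": 8, "actual": 7, "higher_is_better": False,
    },
}

def gap_to_target(kpi):
    """Positive when the actual value meets or beats the target."""
    sign = 1 if kpi["higher_is_better"] else -1
    return sign * (kpi["actual"] - kpi["target"])

status = {
    name: ("on track" if gap_to_target(k) >= 0 else "behind target")
    for name, k in kpis.items()
}
```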

Sources and citations

ChatGPT, Claude, and Copilot provide specific sources for their benchmarks, such as Zendesk Benchmark Reports, HubSpot Research, and industry-specific studies. Perplexity includes citations but does not mention the specific sources directly in the response. Gemini mentions sources but provides invalid URLs, making it difficult to verify the information.

Level of detail

Gemini and Claude provide a comprehensive list of KPIs along with benchmarks, targets, and sources, giving a well-rounded overview. Gemini takes it a step further by categorising the KPIs into resolution rates, response times, customer effort, efficiency and volume, and loyalty and advocacy, offering a structured approach to understanding the metrics. Perplexity and Copilot offer a more concise list of KPIs with benchmarks and targets, focusing on the essential metrics.

Additional insights

Gemini emphasises the importance of regularly monitoring KPIs, analysing trends, and adjusting goals as needed for continuous improvement. Copilot highlights the need to optimise costs while maintaining quality when setting targets for the cost per conversation. Perplexity stresses the importance of setting realistic yet challenging targets based on industry averages and benchmarks of top performers.

Formatting and presentation

ChatGPT, Claude, and Gemini use bullet points and headings to organise the information, making it easier to read and understand. Perplexity presents the information in a numbered list format, which is clear but lacks visual separation between KPIs. Copilot uses a combination of bullet points and bold text to highlight key information.

Evaluation criteria

Comprehensiveness:

Does the AI model cover a wide range of relevant customer service KPIs? Are the KPIs well-defined and explained clearly? [Limited (1), Moderate (2), Comprehensive (3)]

Benchmark and target quality:

Are the provided benchmarks and targets realistic and aligned with industry standards? Does the AI model offer a clear distinction between benchmarks and realistic targets? Are the benchmarks and targets specific and measurable? [Poor (1), Average (2), High (3)]

Sources and credibility:

Does the AI model provide credible sources to support the benchmarks and targets? Are the sources relevant and authoritative in the customer service industry? [No sources (1), Some sources but lacking credibility (2), Credible and relevant sources (3)]

Clarity and organisation:

Is the information presented in a clear, concise, and well-organised manner? Does the AI model use formatting techniques (e.g., bullet points, headings) to enhance readability? [Unclear and disorganised (1), Somewhat clear and organised (2), Very clear and well-organised (3)]

Actionable insights:

Does the AI model offer actionable insights or recommendations beyond just providing benchmarks and targets? Are the insights valuable for businesses looking to improve their customer service performance? [No actionable insights (1), Some actionable insights (2), Highly actionable and valuable insights (3)]

Outputs and scores

ChatGPT

  • Comprehensiveness: 3 (Comprehensive)

ChatGPT covers a wide range of relevant customer service KPIs, including CSAT (customer satisfaction), NPS (Net Promoter Score), CES (customer effort score), FRT (first response time), ART (average resolution time), FCR (first contact resolution), SLA compliance, customer retention rate, churn rate, and CLV (customer lifetime value). The KPIs are well-defined and explained clearly.

  • Benchmark and target quality: 3 (High)

The provided benchmarks and targets are realistic and aligned with industry standards. ChatGPT offers a clear distinction between benchmarks and realistic targets for each KPI. The benchmarks and targets are specific and measurable.

  • Sources and credibility: 3 (Credible and relevant sources)

ChatGPT provides credible sources to support the benchmarks and targets, such as Zendesk Benchmark Reports, Net Promoter Network, and Harvard Business Review. The sources are relevant and authoritative in the customer service industry.

  • Clarity and organisation: 3 (Very clear and well-organised)

The information is presented in a clear, concise, and well-organised manner. ChatGPT uses formatting techniques like bullet points and bold text to enhance readability.

  • Actionable insights: 2 (Some actionable insights)

While ChatGPT provides some insights on improving customer experiences and loyalty, the actionable recommendations are limited.

Total score = 14/15

Claude

  • Comprehensiveness: 3 (Comprehensive)

Claude covers a wide range of relevant customer service KPIs, including FRT, ART, CSAT, NPS, ticket volume, ticket backlog, FCR, customer retention rate, churn rate, and employee engagement. The KPIs are well-defined and explained clearly.

  • Benchmark and target quality: 3 (High)

The provided benchmarks and targets are realistic and aligned with industry standards. Claude offers a clear distinction between benchmarks and realistic targets for each KPI. The benchmarks and targets are specific and measurable.

  • Sources and credibility: 3 (Credible and relevant sources)

Claude provides credible sources to support the benchmarks and targets, such as HubSpot Research, Zendesk Benchmark, and Gallup State of the Global Workplace. The sources are relevant and authoritative in the customer service industry.

  • Clarity and organisation: 3 (Very clear and well-organised)

The information is presented in a clear, concise, and well-organised manner. Claude uses formatting techniques like bullet points and numbering to enhance readability.

  • Actionable insights: 2 (Some actionable insights)

Claude provides some guidance on tailoring the benchmarks and targets to specific business contexts but lacks in-depth actionable insights.

Total score = 14/15

Gemini

  • Comprehensiveness: 3 (Comprehensive)

Gemini covers a wide range of relevant customer service KPIs, categorised into resolution rates, response times, customer effort, efficiency and volume, and loyalty and advocacy. The KPIs are well-defined and explained clearly.

  • Benchmark and target quality: 2 (Average)

While Gemini provides benchmarks and targets for most KPIs, some benchmarks are missing or not clearly defined. The targets are mostly specific and measurable, but some lack clarity.

  • Sources and credibility: 1 (No sources)

Gemini cites sources, but the URLs it provides are invalid, so the information cannot be verified — effectively no better than providing no sources at all.

  • Clarity and organisation: 3 (Very clear and well-organised)

The information is presented in a clear, concise, and well-organised manner. Gemini uses formatting techniques like headings, bullet points, and bold text to enhance readability.

  • Actionable insights: 3 (Highly actionable and valuable insights)

Gemini provides highly actionable insights and recommendations, such as analysing trends, identifying areas for improvement, and setting achievable targets. The insights are valuable for businesses looking to improve their customer service performance.

Total score = 12/15

Perplexity

  • Comprehensiveness: 3 (Comprehensive)

Perplexity covers a wide range of relevant customer service KPIs, including first response time, average resolution time, FCR, CSAT, NPS, call abandonment rate, agent utilisation rate, cost per resolution, knowledge base views, and churn rate. The KPIs are well-defined and explained clearly.

  • Benchmark and target quality: 3 (High)

The provided benchmarks and targets are realistic and aligned with industry standards. Perplexity offers a clear distinction between benchmarks and realistic targets for each KPI. The benchmarks and targets are specific and measurable.

  • Sources and credibility: 2 (Some sources but lacking credibility)

Perplexity includes numbered citations but does not name the underlying sources in the response itself, making their credibility difficult to assess.

  • Clarity and organisation: 2 (Somewhat clear and organised)

The information is presented in a numbered list format, which is clear but lacks visual separation between KPIs. The formatting could be improved to enhance readability.

  • Actionable insights: 2 (Some actionable insights)

Perplexity provides some guidance on adjusting targets based on specific business contexts but lacks in-depth actionable insights.

Total score = 12/15

Copilot

  • Comprehensiveness: 2 (Moderate)

Copilot covers a range of customer service KPIs but misses some important ones like customer retention rate and churn rate. The KPIs are well-defined and explained clearly.

  • Benchmark and target quality: 3 (High)

The provided benchmarks and targets are realistic and aligned with industry standards. Copilot offers a clear distinction between benchmarks and realistic targets for each KPI. The benchmarks and targets are specific and measurable.

  • Sources and credibility: 3 (Credible and relevant sources)

Copilot provides credible sources to support the benchmarks and targets, such as Zendesk Benchmark Report, Freshdesk Benchmark Report, and HubSpot Customer Service Benchmark. The sources are relevant and authoritative in the customer service industry.

  • Clarity and organisation: 3 (Very clear and well-organised)

The information is presented in a clear, concise, and well-organised manner. Copilot uses formatting techniques like bullet points and bold text to enhance readability.

  • Actionable insights: 2 (Some actionable insights)

Copilot provides some guidance on continuously optimising costs while maintaining quality but lacks in-depth actionable insights.

Total score = 13/15

Test 4 chart

Test 4 WINNERS = ChatGPT & Claude 🏆

Final results

High-level takeaways

ChatGPT

ChatGPT did a great job giving thorough and accurate info, backed up with credible sources and clear organisation. It explained NPS well. To improve, it could offer more actionable insights and practical tips for putting KPIs and goal-tracking tools into practice.

Claude

Claude consistently aced all the tests, scoring high in comprehensiveness, accuracy, relevance, and clarity. It excelled in giving specific examples and actionable recommendations. To step it up, it could dive deeper into best practices and insights for effective KPI tracking and goal management.

Gemini

Gemini shone in organising and structuring information, making it user-friendly and practical. It offered actionable insights and detailed explanations. However, it could improve by including credible sources to back up the information and ensuring consistency in benchmark and target quality.

Perplexity

Perplexity did exceptionally well in explaining NPS, offering relevant insights, and citing credible sources. It also provided comprehensive best practices for effective KPI tracking. To enhance its performance, it could improve the clarity and organisation of the information presented and offer more specific recommendations for goal-tracking tools.

Copilot

Copilot did well in providing accurate and relevant information, backed by credible sources. It excelled in offering clear and well-organised responses. However, it lacked comprehensiveness in covering all relevant KPIs and goal-tracking tools, and it provided limited actionable insights and best practices.

Overall results chart

Claude emerged as the top performer, consistently providing comprehensive, accurate, and actionable information across all tests. ChatGPT followed closely, with strong performance in most areas. Gemini and Perplexity had mixed results, excelling in some aspects but lacking in others. Copilot, while providing clear and accurate information, had the lowest overall score due to limitations in comprehensiveness and actionable insights.

The wrap

From our experiment, we observed significant variability in the performance of the AI models we tested. While Claude emerged as the top performer overall, each tool provided valuable insights and was helpful in its own way.

It's important to remember that the quality of an AI's output largely depends on the specificity, clarity, and context of the prompts it receives. When using AI to help establish KPIs, well-defined and contextually appropriate prompts are crucial to getting relevant, actionable output. Get the input right, and you can fully leverage AI to establish meaningful, data-driven KPIs that accurately reflect your business goals and drive performance improvements.

Generate OKRs and SMART goals with AI

Say hello to a smarter way to set and track your goals. With Tability’s AI-powered goals generator, you can create meaningful goals that align with your business objectives in seconds. This feature is available inside Tability, or you can try out our free tool. Start achieving your goals with confidence 👉 try Tability today for free.

Enjoy this post?

You might also like:

🎯 SMART marketing goals: 50 ChatGPT prompts to try now

🎯 How to leverage generative AI to set awesome OKRs

🎯 100+ examples of KPIs and success metrics for every business and function


Jeremy Yancey

Head of Content, Tability
