Cisco Speeds Up Generative AI with Amazon SageMaker Inference


This post is co-authored with Travis Mehlinger and Karthik Raghunathan from Cisco.

Webex by Cisco is a leading provider of cloud-based collaboration solutions, including video meetings, calling, messaging, events, polling, asynchronous video, and customer experience solutions like contact center and purpose-built collaboration devices. Webex’s focus on delivering inclusive collaboration experiences fuels its innovation, which uses artificial intelligence (AI) and machine learning (ML) to remove the barriers of geography, language, personality, and familiarity with technology. Its solutions are underpinned with security and privacy by design. Webex works with the world’s leading business and productivity apps—including AWS.

Cisco’s Webex AI (WxAI) team plays a crucial role in enhancing these products with AI-driven features and functionalities, and over the past year has increasingly focused on capabilities powered by large language models (LLMs) that improve user productivity and experiences. Notably, the team’s work extends to Webex Contact Center, a cloud-based omni-channel contact center solution that empowers organizations to deliver exceptional customer experiences. By integrating LLMs, the WxAI team enables advanced capabilities such as intelligent virtual assistants, natural language processing (NLP), and sentiment analysis, allowing Webex Contact Center to provide more personalized and efficient customer support. However, as these LLMs grew to hundreds of gigabytes in size, the WxAI team faced challenges in efficiently allocating resources and starting applications with the embedded models. To optimize its AI/ML infrastructure, Cisco migrated its LLMs to Amazon SageMaker Inference, improving speed, scalability, and price-performance.

Enhancing collaboration and customer engagement with generative AI: Webex’s AI-powered solutions

In this section, we discuss Cisco’s AI-powered use cases.

Meeting summaries and insights

For Webex Meetings, the platform uses generative AI to automatically summarize meeting recordings and transcripts. This extracts the key takeaways and action items, helping distributed teams stay informed even if they missed a live session. The AI-generated summaries provide a concise overview of important discussions and decisions, allowing employees to quickly get up to speed. Beyond summaries, Webex’s generative AI capabilities also surface intelligent insights from meeting content. This includes identifying action items, highlighting critical decisions, and generating personalized meeting notes and to-do lists for each participant. These insights help make meetings more productive and hold attendees accountable.

Enhancing contact center experiences

Webex is also applying generative AI to its contact center solutions, enabling more natural, human-like conversations between customers and agents. The AI can generate contextual, empathetic responses to customer inquiries, as well as automatically draft personalized emails and chat messages. This helps contact center agents work more efficiently while maintaining a high level of customer service.

Webex customers realize positive outcomes with generative AI

Webex’s adoption of generative AI is driving tangible benefits for customers. Clients using the platform’s AI-powered meeting summaries and insights have reported productivity gains. Webex customers using the platform’s generative AI for contact centers have handled hundreds of thousands of calls with more natural, empathetic conversations between agents and clients, improving customer satisfaction and reducing handle times. Webex’s strategic integration of generative AI is empowering users to work smarter and deliver exceptional experiences.

For more details on how Webex is harnessing generative AI to enhance collaboration and customer engagement, see Webex | Exceptional Experiences for Every Interaction on the Webex blog.

Using SageMaker Inference to optimize resources for Cisco

Cisco’s WxAI team is dedicated to delivering advanced collaboration experiences powered by cutting-edge ML. The team develops a comprehensive suite of AI and ML features for the Webex ecosystem, including audio intelligence capabilities like noise removal and speaker voice optimization, language intelligence for transcription and translation, and video intelligence features like virtual backgrounds. At the forefront of WxAI’s innovations is the AI-powered Webex Assistant, a virtual assistant that provides voice-activated control and seamless meeting support in multiple languages. To build these sophisticated capabilities, WxAI uses LLMs, which can be hundreds of gigabytes in size.

Initially, WxAI embedded LLM models directly into the application container images running on Amazon Elastic Kubernetes Service (Amazon EKS). However, as the models grew larger and more complex, this approach faced significant scalability and resource utilization challenges. Operating the resource-intensive LLMs inside the applications required provisioning substantial compute capacity, which slowed resource allocation and application startup. This inefficiency hampered WxAI’s ability to rapidly develop, test, and deploy new AI-powered features for the Webex portfolio. To address these challenges, the WxAI team turned to SageMaker Inference—a fully managed AI inference service that allows seamless deployment and scaling of models independently from the applications that use them. By decoupling the LLM hosting from the Webex applications, WxAI could provision the necessary compute resources for the models without impacting the core collaboration and communication capabilities.
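To make the decoupled pattern concrete, the following is a minimal sketch of hosting an LLM on its own SageMaker endpoint with the SageMaker Python SDK. The bucket path, IAM role, container versions, and instance type are illustrative assumptions, not details from Cisco’s deployment.

```python
# Minimal sketch: host an LLM on a dedicated SageMaker endpoint,
# decoupled from the application containers on EKS.
# All names (bucket, role, versions, instance type) are illustrative.
from sagemaker.huggingface import HuggingFaceModel

role = "arn:aws:iam::123456789012:role/SageMakerExecutionRole"  # hypothetical role

model = HuggingFaceModel(
    model_data="s3://example-bucket/llm/model.tar.gz",  # hypothetical model artifact
    role=role,
    transformers_version="4.37",
    pytorch_version="2.1",
    py_version="py310",
)

# The endpoint provisions and scales its own GPU capacity, so the
# application images no longer carry the model weights.
predictor = model.deploy(
    initial_instance_count=1,
    instance_type="ml.g5.2xlarge",
)

# Applications now call the model over HTTPS instead of loading it in-process.
print(predictor.predict({"inputs": "Summarize this meeting transcript: ..."}))
```

With this separation, the application containers stay small and start quickly, while the model endpoint can be scaled, updated, or rolled back on its own schedule.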

“The applications and the models work and scale fundamentally differently, with entirely different cost considerations; by separating them rather than lumping them together, it’s much simpler to solve issues independently.”

– Travis Mehlinger, Principal Engineer at Cisco

This architectural shift has enabled Webex to harness the power of generative AI across its suite of collaboration and customer engagement solutions.

Solution overview: Improving efficiency and reducing costs by migrating to SageMaker Inference

To address the scalability and resource utilization challenges of embedding LLMs directly into their applications, the WxAI team migrated to SageMaker Inference. By taking advantage of this fully managed service for deploying LLMs, Cisco unlocked significant performance and cost-optimization opportunities. Key benefits include the ability to deploy multiple LLMs behind a single endpoint for faster scaling and improved response latencies, as well as cost savings. Additionally, the WxAI team implemented an LLM proxy to simplify access to LLMs for Webex teams, enable centralized data collection, and reduce operational overhead. With SageMaker Inference, Cisco can efficiently manage and scale its LLM deployments across the Webex portfolio while maintaining optimal performance and cost-effectiveness.
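As an illustration of the multiple-models-per-endpoint pattern, the sketch below invokes one specific model hosted as a SageMaker inference component behind a shared endpoint. The endpoint and component names are hypothetical, and the post does not describe the internals of Cisco’s LLM proxy, which would sit in front of a call like this to route requests and log usage centrally.

```python
# Minimal sketch: invoke one of several LLMs hosted behind a single
# SageMaker endpoint as inference components.
# The endpoint and component names are hypothetical.
import json

import boto3

runtime = boto3.client("sagemaker-runtime")

response = runtime.invoke_endpoint(
    EndpointName="shared-llm-endpoint",           # one endpoint, many models
    InferenceComponentName="meeting-summarizer",  # selects a specific model
    ContentType="application/json",
    Body=json.dumps({"inputs": "Draft a reply to this customer message: ..."}),
)
print(json.loads(response["Body"].read()))
```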

Conclusion

By using AWS services like SageMaker Inference and Amazon Bedrock for generative AI, Cisco’s WxAI team has been able to optimize their AI/ML infrastructure, enabling them to build and deploy AI-powered features more efficiently, reliably, and cost-effectively. This strategic approach has unlocked significant benefits for Cisco in deploying and scaling its generative AI capabilities for the Webex platform. Cisco’s journey with generative AI, as showcased in this post, offers valuable lessons and insights for other users of SageMaker Inference.

Recognizing the impact of generative AI, Cisco has played a crucial role in shaping the future of these capabilities within SageMaker Inference. By providing valuable insights and hands-on collaboration, Cisco has helped AWS develop a range of powerful features that are making generative AI more accessible and scalable for organizations. From optimizing infrastructure costs and performance to streamlining model deployment and scaling, Cisco’s contributions have been instrumental in enhancing the SageMaker Inference service.

Refer to the AWS Case Study: Accelerating LLMs Using Amazon SageMaker with Cisco for a comprehensive overview of Cisco’s generative AI journey on AWS, the challenges they faced, the solutions they implemented, and the strategic impact of their collaboration with the SageMaker Inference team.

Moving forward, the Cisco-AWS partnership aims to drive further advancements in areas like conversational and generative AI inference. As generative AI adoption accelerates across industries, Cisco’s Webex platform is designed to scale and streamline user experiences through the use cases discussed in this post and beyond. You can expect ongoing innovation from this collaboration as Cisco and AWS continue to push the boundaries of what’s possible with SageMaker Inference.

For more information on Webex Contact Center’s Topic Analytics feature and related AI capabilities, refer to The Webex Advantage: Navigating Customer Experience in the Age of AI on the Webex blog.

About the Authors

Travis Mehlinger is a Principal Software Engineer in the Webex Collaboration AI group, where he helps teams develop and operate cloud-centered AI and ML capabilities to support Webex AI features for customers around the world. In his spare time, Travis enjoys cooking barbecue, playing video games, and traveling around the US and UK to race go-karts.

Karthik Raghunathan is the Senior Director for Speech, Language, and Video AI in the Webex Collaboration AI Group. He leads a multidisciplinary team of software engineers, machine learning engineers, data scientists, computational linguists, and designers who develop advanced AI-driven features for the Webex collaboration portfolio. Prior to Cisco, Karthik held research positions at MindMeld (acquired by Cisco), Microsoft, and Stanford University.
