Enhanced Visibility with Amazon Kendra Document-Level Reporting
Amazon Kendra is an intelligent enterprise search service powered by machine learning. By aggregating content from various repositories into a centralized index, it lets you quickly search your enterprise data and find accurate answers.
Amazon Kendra offers secure connectors to more than 40 data sources. The latest release improves visibility into the document processing lifecycle during data source sync jobs: a comprehensive document-level report, integrated into the sync history, provides granular indexing status, metadata, and ACL details for every processed document. Administrators can use this report to investigate and resolve ingestion and access issues efficiently.
Lifecycle of a Document in a Data Source Sync Run Job
Understanding how a document moves through a data source sync in Amazon Kendra makes the document-level report easier to interpret. A sync comprises three stages (crawling, syncing, and indexing) in which documents are extracted from the source, submitted for ingestion, and made searchable in the Amazon Kendra index.
Crawling Stage
During the crawling stage, documents are extracted from the data source, and their metadata is captured. Documents are then compared against the index to determine if they need to be added, modified, or deleted. The document-level report includes details on document processing status, error messages, ACLs, and metadata for each document.
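To make the add, modify, or delete decision concrete, here is a minimal Python sketch of how crawled documents might be compared against what is already indexed. It is an illustration only, not Amazon Kendra's internal logic: the checksum-based comparison, the function names, and the dictionary shapes are all assumptions made for this example.

```python
import hashlib


def checksum(text: str) -> str:
    """Return a SHA-256 digest of the document text (hypothetical change detector)."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def classify_changes(crawled: dict, indexed_checksums: dict) -> dict:
    """Classify documents as add / modify / delete / unchanged.

    crawled:            document ID -> text extracted during this crawl
    indexed_checksums:  document ID -> checksum stored from the previous sync
    Both structures are assumptions for illustration.
    """
    actions = {"add": [], "modify": [], "delete": [], "unchanged": []}

    for doc_id, text in crawled.items():
        if doc_id not in indexed_checksums:
            actions["add"].append(doc_id)        # new in the source
        elif indexed_checksums[doc_id] != checksum(text):
            actions["modify"].append(doc_id)     # content changed since last sync
        else:
            actions["unchanged"].append(doc_id)  # nothing to do

    for doc_id in indexed_checksums:
        if doc_id not in crawled:
            actions["delete"].append(doc_id)     # no longer present in the source

    # Sort for stable, readable output.
    return {action: sorted(ids) for action, ids in actions.items()}


if __name__ == "__main__":
    previous = {"doc-1": checksum("v1"), "doc-2": checksum("same"), "doc-3": checksum("old")}
    current = {"doc-1": "v2", "doc-2": "same", "doc-4": "brand new"}
    print(classify_changes(current, previous))
```

In this sketch, a document that disappears from the source is classified for deletion, which mirrors the three outcomes described above.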
Syncing Stage
In the syncing stage, documents are sent to Amazon Kendra ingestion service APIs for processing. Validation checks are performed, and documents are marked as successful, failed, or skipped based on their sync status.
Indexing Stage
During indexing, documents are parsed, processed, and persisted in the index. Success and failure statuses are captured for each document, and details are emitted as Amazon CloudWatch events for real-time visibility.
Key Features and Benefits of Document-Level Reports
- Enhanced Sync Run History Page – A new Actions column provides access to the document-level report for each sync run.
- Dedicated Log Stream – A log stream named SYNC_RUN_HISTORY_REPORT contains detailed document reports.
- Comprehensive Document Information – Reports include document ID, title, status, error messages, ACLs, metadata, hashed document ID, and timestamp for thorough troubleshooting.
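As a rough sketch of how these reports could be read programmatically, the Python below pages through CloudWatch Logs events whose stream name starts with SYNC_RUN_HISTORY_REPORT and tallies documents by status. Several details are assumptions rather than facts from this post: that each log event message is a JSON object, that the status lives in a field called `Status` (a hypothetical name; check an actual report entry), and that the log group follows the pattern `/aws/kendra/<index-id>`. The `filter_log_events` call and its `logGroupName`, `logStreamNamePrefix`, and `nextToken` parameters are part of the boto3 CloudWatch Logs client.

```python
import json
from collections import Counter

REPORT_STREAM_PREFIX = "SYNC_RUN_HISTORY_REPORT"


def fetch_report_entries(logs_client, log_group_name, stream_prefix=REPORT_STREAM_PREFIX):
    """Yield report entries parsed from CloudWatch Logs events.

    logs_client is expected to behave like boto3.client("logs").
    Events whose message is not a JSON object are skipped.
    """
    kwargs = {"logGroupName": log_group_name, "logStreamNamePrefix": stream_prefix}
    while True:
        page = logs_client.filter_log_events(**kwargs)
        for event in page.get("events", []):
            try:
                entry = json.loads(event["message"])
            except json.JSONDecodeError:
                continue  # not JSON; ignore in this sketch
            if isinstance(entry, dict):
                yield entry
        token = page.get("nextToken")
        if not token:
            break
        kwargs["nextToken"] = token  # request the next page


def summarize_by_status(entries, status_field="Status"):
    """Count entries per status; status_field is a hypothetical field name."""
    return Counter(entry.get(status_field, "UNKNOWN") for entry in entries)


if __name__ == "__main__":
    import boto3  # requires AWS credentials and a configured region

    client = boto3.client("logs")
    # Placeholder log group; replace <index-id> with your own index ID.
    entries = fetch_report_entries(client, "/aws/kendra/<index-id>")
    print(summarize_by_status(entries))
```

Passing the client in as a parameter keeps the functions easy to exercise with a stub before pointing them at a real account.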
Conclusion
The document-level report in Amazon Kendra improves observability of the document processing lifecycle. It provides detailed information on each document's status, metadata, and ACLs, so administrators can troubleshoot and resolve sync issues efficiently.
To get started with Amazon Kendra and explore its features, check out the Getting Started guide and best practices for creating data source connectors.
About the Authors
Aneesh Mohan is a Senior Solutions Architect at Amazon Web Services (AWS) with expertise in architecting solutions for mission-critical workloads. He is dedicated to designing innovative solutions that meet customers’ unique needs.
Ashwin Shukla is a Software Development Engineer II at Amazon, focusing on developing enterprise software solutions. He plays a key role in designing foundational features for Amazon Q for Business.