Enhanced data sync visibility with Amazon Kendra’s document-level sync reports

SeniorTechInfo

Enhanced Visibility with Amazon Kendra Document-Level Reporting

Amazon Kendra is an intelligent enterprise search service powered by machine learning. It aggregates content from multiple repositories into a centralized index so that you can quickly search your enterprise data and find accurate answers.

Amazon Kendra can connect securely to more than 40 data sources. The latest release improves visibility into the document processing lifecycle during data source sync jobs by adding a document-level report to the sync history. The report gives the indexing status, metadata, and ACL details for every document processed in a sync run, so administrators can investigate and resolve ingestion and access issues efficiently.

Lifecycle of a Document in a Data Source Sync Run Job

Understanding how a document moves through a data source sync in Amazon Kendra makes the report easier to interpret. A sync has three stages: crawling, syncing, and indexing. Across these stages, documents are extracted from the data source, synced, and made searchable within the Amazon Kendra environment.

Crawling Stage

During the crawling stage, documents are extracted from the data source, and their metadata is captured. Documents are then compared against the index to determine if they need to be added, modified, or deleted. The document-level report includes details on document processing status, error messages, ACLs, and metadata for each document.
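
The comparison against the index can be pictured as a diff between what the crawler found and what the index already holds. The following is a minimal sketch under stated assumptions, not Amazon Kendra's implementation: the function name `plan_sync` and the use of content fingerprints to detect changes are hypothetical, chosen only to illustrate the add, modify, and delete decision.

```python
from typing import Dict, List


def plan_sync(crawled: Dict[str, str], indexed: Dict[str, str]) -> Dict[str, List[str]]:
    """Classify document IDs by comparing crawled fingerprints to indexed ones.

    Both arguments map a document ID to a content fingerprint. The fingerprint
    is a hypothetical stand-in; the article does not say how changes are detected.
    """
    plan: Dict[str, List[str]] = {"add": [], "modify": [], "delete": [], "unchanged": []}
    for doc_id, fingerprint in crawled.items():
        if doc_id not in indexed:
            plan["add"].append(doc_id)        # new in the data source
        elif indexed[doc_id] != fingerprint:
            plan["modify"].append(doc_id)     # content changed since last sync
        else:
            plan["unchanged"].append(doc_id)  # nothing to do
    # Anything in the index that the crawler no longer sees is a deletion.
    plan["delete"] = [doc_id for doc_id in indexed if doc_id not in crawled]
    return plan
```

Calling `plan_sync` with the crawl results and the current index contents returns one list of document IDs per action, which is the kind of per-document decision the report records.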

Syncing Stage

In the syncing stage, documents are sent to Amazon Kendra ingestion service APIs for processing. Validation checks are performed, and documents are marked as successful, failed, or skipped based on their sync status.
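
To make the three sync outcomes concrete, here is a small sketch of how a validation step might label a document. Everything in it is an assumption for illustration: the `CandidateDocument` type, the `sync_status` function, the specific checks, and the `MAX_BYTES` placeholder are hypothetical and do not describe the actual checks performed by the ingestion service.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

MAX_BYTES = 50 * 1024 * 1024  # hypothetical placeholder limit, for illustration only


@dataclass
class CandidateDocument:
    doc_id: str
    size_bytes: int
    changed: bool  # False when the crawl found no difference from the index


def sync_status(doc: CandidateDocument) -> Tuple[str, Optional[str]]:
    """Return (status, error_message) for one document (hypothetical rules)."""
    if not doc.changed:
        return "SKIPPED", None  # nothing new to send
    if not doc.doc_id:
        return "FAILED", "missing document ID"
    if doc.size_bytes > MAX_BYTES:
        return "FAILED", "document exceeds size limit"
    return "SUCCESS", None
```

Returning the error message alongside the status mirrors the report's pairing of a status with an error message for each document.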

Indexing Stage

During indexing, documents are parsed, processed, and persisted in the index. Success and failure statuses are captured for each document, and details are emitted as Amazon CloudWatch events for real-time visibility.

Key Features and Benefits of Document-Level Reports

  • Enhanced Sync Run History Page – A new Actions column provides access to the document-level report for each sync run.
  • Dedicated Log Stream – A log stream named SYNC_RUN_HISTORY_REPORT contains detailed document reports.
  • Comprehensive Document Information – Reports include document ID, title, status, error messages, ACLs, metadata, hashed document ID, and timestamp for thorough troubleshooting.
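
Because the reports land in a log stream, they can also be read programmatically. The sketch below pages through log events whose stream name starts with SYNC_RUN_HISTORY_REPORT and tallies documents by status. It rests on several assumptions: each log message is taken to be a JSON object, the field names `DocumentId`, `Status`, and `ErrorMessage` are hypothetical, and the log group name is passed in by the caller rather than assumed. The client is expected to behave like the boto3 CloudWatch Logs client, whose `filter_log_events` call accepts `logGroupName`, `logStreamNamePrefix`, and `nextToken`.

```python
import json
from collections import Counter
from typing import Any, Dict, Iterable, Iterator


def iter_report_entries(logs_client: Any, log_group: str) -> Iterator[Dict[str, Any]]:
    """Yield parsed report entries from SYNC_RUN_HISTORY_REPORT log streams."""
    kwargs = {
        "logGroupName": log_group,
        "logStreamNamePrefix": "SYNC_RUN_HISTORY_REPORT",
    }
    while True:
        page = logs_client.filter_log_events(**kwargs)
        for event in page.get("events", []):
            try:
                entry = json.loads(event["message"])
            except json.JSONDecodeError:
                continue  # skip messages that are not JSON
            if isinstance(entry, dict):
                yield entry
        token = page.get("nextToken")
        if not token:
            break
        kwargs["nextToken"] = token  # request the next page


def summarize(entries: Iterable[Dict[str, Any]]) -> Dict[str, Any]:
    """Count entries per status and collect failed documents with their errors."""
    by_status: Counter = Counter()
    failures = []
    for entry in entries:
        status = entry.get("Status", "UNKNOWN")  # hypothetical field name
        by_status[status] += 1
        if status == "FAILED":
            failures.append((entry.get("DocumentId"), entry.get("ErrorMessage")))
    return {"by_status": dict(by_status), "failures": failures}
```

With boto3 installed and credentials configured, `logs_client` would be `boto3.client("logs")`. Passing the client in keeps the parsing logic easy to test against recorded events before pointing it at a live index.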

Conclusion

The document-level report in Amazon Kendra improves observability of the document processing lifecycle and makes sync problems easier to troubleshoot. It provides detailed information on each document's status, metadata, and ACLs, so administrators can find and fix syncing issues efficiently.

To get started with Amazon Kendra and explore its features, check out the Getting Started guide and best practices for creating data source connectors.


About the Authors

Aneesh Mohan is a Senior Solutions Architect at Amazon Web Services (AWS) with expertise in architecting solutions for mission-critical workloads. He is dedicated to designing innovative solutions that meet customers’ unique needs.

Ashwin Shukla is a Software Development Engineer II at Amazon, focusing on developing enterprise software solutions. He plays a key role in designing foundational features for Amazon Q for Business.
