OpenSearch Index State Management (ISM) Guide
1. Overview
This document outlines the standard procedure for managing index lifecycles within the OpenSearch cluster. The objective is to automate data retention by backing up indices to S3 and deleting them once they are older than 90 days.
2. Prerequisites
Before implementing the policy, make sure the S3 snapshot repository is registered correctly.
Verify Snapshot Repository
Confirm that the cluster can communicate with the S3 bucket. The repository uses the following settings:
- Target Bucket: opensearch-index-backup-org-name-co-in
- Region: ap-south-1
- Role ARN: arn:aws:iam::372360814385:role/PartnerOpenSearch-Role
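A minimal sketch, in Python, of the two requests involved: registering the repository with the settings above and asking the cluster to verify it. The repository name `s3-index-backup` is a hypothetical placeholder because this guide does not name the repository. The script only builds and prints the requests in Dev Tools form; on Amazon OpenSearch Service the registration call may need to be SigV4-signed by a principal allowed to pass the role, which is outside this sketch.

```python
import json

# Hypothetical repository name; the guide does not specify one.
REPO = "s3-index-backup"


def register_repository_request(repo: str) -> tuple[str, str, dict]:
    """Build the PUT _snapshot/<repo> request for the S3 repository."""
    body = {
        "type": "s3",
        "settings": {
            "bucket": "opensearch-index-backup-org-name-co-in",
            "region": "ap-south-1",
            "role_arn": "arn:aws:iam::372360814385:role/PartnerOpenSearch-Role",
        },
    }
    return "PUT", f"_snapshot/{repo}", body


def verify_repository_request(repo: str) -> tuple[str, str, None]:
    """Build the POST _snapshot/<repo>/_verify request (it takes no body)."""
    return "POST", f"_snapshot/{repo}/_verify", None


if __name__ == "__main__":
    for method, path, body in (register_repository_request(REPO),
                               verify_repository_request(REPO)):
        # Print in Dev Tools console form so the request can be pasted directly.
        print(method, path)
        if body is not None:
            print(json.dumps(body, indent=2))
```

A successful verify call lists the nodes that could access the bucket; an error here usually points to the role or bucket policy rather than to ISM.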
3. Implementation Steps
Step 1: Create/Update the ISM Policy
This policy defines three states: Hot (Active), Snapshot (Backup), and Delete (Cleanup).
Critical Production Rule: Snapshot names must be lowercase. Using uppercase letters (e.g., 90DEL) will trigger an illegal_argument_exception and cause the snapshot action to fail.
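The policy body can be sketched as a Python dict covering the three states described above, so the lowercase rule is checked before anything is sent. The policy ID `delete-after-90d`, the repository name `s3-index-backup`, and the `ism_template` pattern `logs-*` with priority 100 are hypothetical; the snapshot name `autosnap-90del` is inferred from the prefix mentioned in section 4B.

```python
import json

# Hypothetical identifiers; substitute the values used in your cluster.
POLICY_ID = "delete-after-90d"
REPO = "s3-index-backup"
SNAPSHOT_NAME = "autosnap-90del"  # must be lowercase


def check_snapshot_name(name: str) -> str:
    """Reject snapshot names containing uppercase letters."""
    if name != name.lower():
        raise ValueError(f"snapshot name {name!r} must be lowercase")
    return name


def build_policy(repo: str, snapshot: str, min_age: str = "90d") -> dict:
    """Return the body for PUT _plugins/_ism/policies/<policy_id>."""
    check_snapshot_name(snapshot)
    return {
        "policy": {
            "description": "Snapshot to S3, then delete indices older than 90 days",
            "default_state": "hot",
            "states": [
                {   # Hot: no actions; leave once the index reaches min_age.
                    "name": "hot",
                    "actions": [],
                    "transitions": [
                        {"state_name": "snapshot", "conditions": {"min_index_age": min_age}}
                    ],
                },
                {   # Snapshot: back the index up, then move to delete once the action completes.
                    "name": "snapshot",
                    "actions": [{"snapshot": {"repository": repo, "snapshot": snapshot}}],
                    "transitions": [{"state_name": "delete"}],
                },
                {   # Delete: remove the index from the cluster.
                    "name": "delete",
                    "actions": [{"delete": {}}],
                    "transitions": [],
                },
            ],
            # Hypothetical pattern: attach the policy automatically to matching new indices.
            "ism_template": [{"index_patterns": ["logs-*"], "priority": 100}],
        }
    }


if __name__ == "__main__":
    print(f"PUT _plugins/_ism/policies/{POLICY_ID}")
    print(json.dumps(build_policy(REPO, SNAPSHOT_NAME), indent=2))
```

Because the snapshot state's transition has no conditions, the index moves to delete only after the snapshot action succeeds; if the snapshot fails, the index stays in the snapshot state and is not deleted.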
Step 2: Apply Policy to Existing Indices
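Attaching the policy to existing indices uses the ISM add API: `POST _plugins/_ism/add/<index_pattern>` with the policy ID in the body. A minimal sketch that builds the request and reads the `failures` and `failed_indices` fields of the response; the policy ID `delete-after-90d` and the `logs-*` pattern are hypothetical.

```python
import json

POLICY_ID = "delete-after-90d"  # hypothetical policy ID


def add_policy_request(index_pattern: str, policy_id: str) -> tuple[str, str, dict]:
    """Build POST _plugins/_ism/add/<index_pattern> for indices that already exist."""
    return "POST", f"_plugins/_ism/add/{index_pattern}", {"policy_id": policy_id}


def failed_indices(response: dict) -> list[str]:
    """Return the names of indices the add call could not attach the policy to."""
    if not response.get("failures"):
        return []
    return [item.get("index_name", "?") for item in response.get("failed_indices", [])]


if __name__ == "__main__":
    method, path, body = add_policy_request("logs-*", POLICY_ID)  # hypothetical pattern
    print(method, path)
    print(json.dumps(body, indent=2))
```

Indices that already have a policy are typically reported under `failed_indices`; use the change policy command for those instead.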
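For indices that already exist in the cluster, the policy must be attached manually before ISM manages them. An ISM template in a policy (if one is defined) only matches indices created after the policy is in place.
4. Verification & Status Commands (The “Get” Commands)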
Use these commands to verify that the setup is working as expected.
A. Check if Policy is Attached Correctly
In the explain output, verify that each index is “Enabled” and linked to the correct policy_id.
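To check many indices at once, the explain response can be scanned programmatically. This sketch assumes the response shape of `GET _plugins/_ism/explain/<index_name>`: one object per index with `policy_id` and `enabled`, plus a `total_managed_indices` counter at the top level. The sample data is illustrative, not real output from this cluster, and the index and policy names are hypothetical.

```python
def misconfigured_indices(explain: dict, expected_policy: str) -> list[str]:
    """Return indices from an ISM explain response that are not enabled
    or not linked to expected_policy."""
    problems = []
    for name, entry in explain.items():
        # Top-level counters such as "total_managed_indices" are not index entries.
        if not isinstance(entry, dict):
            continue
        if entry.get("policy_id") != expected_policy or entry.get("enabled") is not True:
            problems.append(name)
    return sorted(problems)


if __name__ == "__main__":
    # Illustrative, trimmed response; values are made up for the example.
    sample = {
        "logs-2024.01.01": {"index": "logs-2024.01.01", "policy_id": "delete-after-90d", "enabled": True},
        "logs-2024.01.02": {"index": "logs-2024.01.02", "policy_id": None, "enabled": None},
        "total_managed_indices": 1,
    }
    print(misconfigured_indices(sample, "delete-after-90d"))  # ['logs-2024.01.02']
```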
B. Verify Snapshot Creation in S3
List the snapshots in the repository to confirm that backups are actually being created. Look for the autosnap-90del- prefix.
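Listing snapshots uses `GET _snapshot/<repository>/_all` (or `GET _cat/snapshots/<repository>?v` for a plain table). A sketch that filters the JSON response to the policy's snapshots and reports each one's state; the repository name `s3-index-backup` is hypothetical, and any state other than `SUCCESS` deserves a closer look.

```python
PREFIX = "autosnap-90del-"
REPO = "s3-index-backup"  # hypothetical repository name
LIST_REQUEST = ("GET", f"_snapshot/{REPO}/_all")


def policy_snapshots(response: dict, prefix: str = PREFIX) -> dict[str, str]:
    """Map snapshot name -> state for snapshots whose name starts with prefix."""
    return {
        snap["snapshot"]: snap.get("state", "UNKNOWN")
        for snap in response.get("snapshots", [])
        if snap.get("snapshot", "").startswith(prefix)
    }


def not_successful(snapshots: dict[str, str]) -> list[str]:
    """Names of policy snapshots that did not finish with state SUCCESS."""
    return sorted(name for name, state in snapshots.items() if state != "SUCCESS")
```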
C. Check Current Active Snapshot
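`GET _snapshot/_status` without a repository returns only the snapshots that are currently running, each with a `shards_stats` object counting shards by phase. A sketch that turns those counts into a percentage; shards differ in size, so this is an approximation (the `stats` object in the same response has byte-level detail).

```python
STATUS_REQUEST = ("GET", "_snapshot/_status")


def snapshot_progress(status: dict) -> dict[str, float]:
    """Percent of shards finished for each running snapshot."""
    progress = {}
    for snap in status.get("snapshots", []):
        shards = snap.get("shards_stats", {})
        total = shards.get("total", 0)
        done = shards.get("done", 0)
        # Guard against division by zero when shard counts are not populated yet.
        progress[snap["snapshot"]] = round(100.0 * done / total, 1) if total else 0.0
    return progress
```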
If a backup is currently running, the snapshot status API (GET _snapshot/_status) shows its progress as the number of shards completed out of the total.
5. Debugging & Troubleshooting
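Problem: Index is stuck in “Hot” even though it is older than 90 days
Solution: ISM evaluates transitions only when its background job runs (every 5 minutes by default), so allow for that delay first. If the index still does not move, check the explain output. A failed action can be restarted (“kickstarted”) with the retry command from the maintenance table; missing or failed metadata is covered by the next problem.
Problem: “Index has no metadata information”
Solution: This happens if the index was removed from ISM but not re-added. Run the add command from Step 2 again.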
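The two problems above can be told apart from the explain output. A rough triage sketch: no `policy_id` means the index is unmanaged and needs the Step 2 add request again; a failed action (the `failed` flag under `action` or `retry_info`, as these fields commonly appear in explain output) calls for the retry API; otherwise ISM is most likely still waiting for its next run or for the `min_index_age` condition. The policy ID passed in is hypothetical.

```python
from typing import Optional

# (method, path, body) in Dev Tools form.
Request = tuple[str, str, Optional[dict]]


def next_step(index: str, entry: dict, policy_id: str) -> Optional[Request]:
    """Suggest a follow-up request for one index entry from the explain API."""
    if not entry.get("policy_id"):
        # Unmanaged index (e.g. removed from ISM): attach the policy again.
        return "POST", f"_plugins/_ism/add/{index}", {"policy_id": policy_id}
    action = entry.get("action") or {}
    retry_info = entry.get("retry_info") or {}
    if action.get("failed") or retry_info.get("failed"):
        # A failed action does not resume on its own; ask ISM to retry it.
        return "POST", f"_plugins/_ism/retry/{index}", None
    # Nothing has failed: ISM is probably waiting for its next run or a condition.
    return None
```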
Problem: “Version Conflict” when updating policy
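Solution: Policy updates use optimistic concurrency control, so the update request must include the policy's current sequence number and primary term. Read them with GET _plugins/_ism/policies/<policy_id> and pass them as if_seq_no and if_primary_term on the PUT request; alternatively, delete the old policy and create it again. Indices already managed under the old version keep using it until you run the change_policy command from the maintenance table.
Problem: Snapshot failed (Lowercase Error)
Solution: Check the explain output for a cause field. Ensure the snapshot name in the policy is strictly lowercase.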
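For the version-conflict case, a sketch of a conditional update: read `_seq_no` and `_primary_term` from the `GET _plugins/_ism/policies/<policy_id>` response and put them on the `PUT` as `if_seq_no` and `if_primary_term`. The policy ID and the sample GET response in the script are illustrative.

```python
import json


def update_policy_request(policy_id: str, current: dict, new_body: dict) -> tuple[str, str, dict]:
    """Build a conditional policy update.

    `current` is the JSON returned by GET _plugins/_ism/policies/<policy_id>;
    `new_body` is the full {"policy": {...}} document to store.
    """
    seq_no = current["_seq_no"]
    primary_term = current["_primary_term"]
    path = (f"_plugins/_ism/policies/{policy_id}"
            f"?if_seq_no={seq_no}&if_primary_term={primary_term}")
    return "PUT", path, new_body


if __name__ == "__main__":
    # Illustrative GET response, trimmed to the fields this sketch needs.
    current = {"_id": "delete-after-90d", "_seq_no": 7, "_primary_term": 1, "policy": {}}
    method, path, body = update_policy_request(
        "delete-after-90d", current, {"policy": {"description": "updated"}})
    print(method, path)
    print(json.dumps(body, indent=2))
```

If another change lands between the GET and the PUT, the numbers no longer match and the update is rejected, which is the intended protection; fetch the policy again and retry.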
6. Maintenance Summary Table
| Task | Command |
|---|---|
| Check Health | GET _plugins/_ism/explain/<index_name> |
| Force Retry | POST _plugins/_ism/retry/<index_pattern> |
| Swap Policy | POST _plugins/_ism/change_policy/<index_pattern> { "policy_id": "..." } |
| Stop ISM | POST _plugins/_ism/remove/<index_pattern> |
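The Swap Policy row accepts more than `policy_id`. A sketch of the request with the two optional fields of the change policy API: `state`, the state the index moves to after the change, and `include`, which limits the change to indices currently in a given state. The policy ID and index pattern used with it are hypothetical. ISM applies the change in the background; if it alters the state an index is currently in, the switch happens when that state finishes.

```python
from typing import Optional


def change_policy_request(index_pattern: str, policy_id: str,
                          state: Optional[str] = None,
                          only_in_state: Optional[str] = None) -> tuple[str, str, dict]:
    """Build POST _plugins/_ism/change_policy/<index_pattern>."""
    body: dict = {"policy_id": policy_id}
    if state is not None:
        # State the index moves to once the new policy is applied.
        body["state"] = state
    if only_in_state is not None:
        # Change only the indices that are currently in this state.
        body["include"] = [{"state": only_in_state}]
    return "POST", f"_plugins/_ism/change_policy/{index_pattern}", body
```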
