I'm working on a project that uses Auto Scaling groups, where instances are created and terminated frequently. To monitor these instances, I use a Lambda function that EventBridge triggers on instance creation. The function gathers the instance information and reads the instance's tags to get the instance name for alarm creation.
I've set up a fallback that uses the instance ID for the alarm name if the instance name isn't available, but this shouldn't happen, because a step in the instance's user data sets its name. Despite this, a few alarms still end up named after instance IDs instead of instance names. What can I do to resolve this race condition?
1 Answer
It sounds like your user data script runs only after the instance is up, which creates a race condition: the Lambda can fire before the Name tag has been set. You might want to modify your Lambda function so that, if it doesn't find a name, it waits a few seconds and then retries fetching the tags. After a few unsuccessful attempts, it can fall back to using the instance ID for the alarm name.
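A minimal sketch of that retry loop, assuming the name is stored in a tag with the key `Name` and that `ec2` is a boto3 EC2 client. The function name `get_name_tag` and its parameters (`attempts`, `delay_seconds`, `sleep`) are made up for illustration and are not from your code:

```python
import time


def get_name_tag(ec2, instance_id, attempts=5, delay_seconds=5.0, sleep=time.sleep):
    """Return the instance's Name tag, retrying while it is missing.

    `ec2` is assumed to be a boto3 EC2 client (or anything exposing the
    same describe_tags call). Falls back to the instance ID when the tag
    never appears within the allowed attempts.
    """
    for attempt in range(attempts):
        response = ec2.describe_tags(
            Filters=[
                {"Name": "resource-id", "Values": [instance_id]},
                {"Name": "key", "Values": ["Name"]},
            ]
        )
        tags = response.get("Tags", [])
        if tags and tags[0].get("Value"):
            return tags[0]["Value"]
        # Tag not set yet: wait before the next attempt, but not after the last one.
        if attempt < attempts - 1:
            sleep(delay_seconds)
    # Hypothetical fallback, mirroring the behaviour described in the question.
    return instance_id
```

Inside the Lambda handler this could be called as `get_name_tag(boto3.client("ec2"), instance_id)`. Keep `attempts * delay_seconds` comfortably below the function's configured timeout, otherwise the invocation is cut off before the fallback runs.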
Actually, I already added a 60-second wait for when the Name tag is missing, but I still see instance IDs in about 10% of the alarms. Do you know if there's a way to delay the start of the entire script to prevent this?