Maybe you’re trying to work with AWS, but AWS does not seem to be working with you. The content of your requests looks good on your end. Maybe AWS is even returning a 200, letting you know everything is OK. But something is still not working. If only you could see into AWS’s servers to find out what is going on and why things aren’t working as you envision.
If this is you, then you need AWS CloudTrail.
Why CloudTrail?
AWS is essentially a collection of APIs. Calls to these APIs are recorded in CloudTrail.
To grasp the significance of CloudTrail, it’s crucial to understand that AWS is fundamentally a set of APIs. AWS pioneered this approach, with their API-first culture dating back to at least 2002 when Jeff Bezos issued his legendary API Mandate memo.
When you interact with AWS, your computer makes API calls to AWS endpoints. If you send the correct data, AWS performs the requested actions internally (e.g., launching an Amazon EC2 instance or creating an Amazon S3 bucket). This follows a basic client-server architecture.
What Is Logged in CloudTrail?
Most of these API calls are categorized as “Management Events” and are logged in CloudTrail. This lets you see whether AWS received the API call and how it responded. Sometimes the caller doesn’t have access, and other times the data being sent isn’t quite right; CloudTrail will tell you which. CloudTrail also records calls from your AWS resources to the AWS control plane, so in addition to requests from humans, you can see requests from services, such as an Auto Scaling group describing instance health.
There’s another, smaller category of events called Data Events. These include actions like PutObject calls to S3 buckets. By default, these are not logged to CloudTrail.
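To make the two categories concrete, here is a minimal Python sketch that tallies records by category. It assumes records shaped like CloudTrail log entries, which carry an `eventCategory` field. The sample records are invented for illustration and heavily trimmed, and a `PutObject` record like the one shown would only exist if data event logging had been turned on.

```python
from collections import Counter

# Illustrative, heavily trimmed records (not real log output).
SAMPLE_RECORDS = [
    {"eventSource": "ec2.amazonaws.com", "eventName": "RunInstances", "eventCategory": "Management"},
    {"eventSource": "s3.amazonaws.com", "eventName": "CreateBucket", "eventCategory": "Management"},
    {"eventSource": "s3.amazonaws.com", "eventName": "PutObject", "eventCategory": "Data"},
]


def count_by_category(records):
    """Count records per eventCategory; records without the field count as 'Unknown'."""
    return Counter(r.get("eventCategory", "Unknown") for r in records)


if __name__ == "__main__":
    for category, total in count_by_category(SAMPLE_RECORDS).items():
        print(f"{category}: {total}")
```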
How Can You Use CloudTrail to Troubleshoot AWS Issues?
When tasked with AWS Identity and Access Management (IAM) troubleshooting, I typically follow this order:
Check the CloudTrail Event history
Examine CloudWatch logs
If necessary, use Athena
Let’s explore why CloudTrail is an excellent tool for monitoring AWS activity and how we can leverage these three tools to work with CloudTrail.
CloudTrail Event History
The CloudTrail console is the easiest place to view CloudTrail events, but its search capabilities are limited.
If you know the username or have one of the other “Lookup attributes” (and that attribute is unique), you can find the API call, as long as it was made within the last 90 days.
The drawback of viewing events here is that the search functionality is restricted to just a few attributes, and you can’t combine them. So complicated searches are out of the question.
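The same lookup is available programmatically through the CloudTrail `LookupEvents` API, with the same restriction of one attribute per request. The sketch below is one possible way to wrap it, not a prescribed approach: `build_lookup_request` and `lookup` are hypothetical helper names, and `testUser` as a user name is an assumption. The request builder runs on its own; the `lookup` function additionally needs boto3 installed and AWS credentials configured.

```python
from datetime import datetime, timedelta, timezone

# Attribute keys accepted by the LookupEvents API.
VALID_KEYS = {
    "EventId", "EventName", "ReadOnly", "Username",
    "ResourceType", "ResourceName", "EventSource", "AccessKeyId",
}
MAX_LOOKBACK_DAYS = 90  # Event history only covers the last 90 days.


def build_lookup_request(key, value, days_back=1, now=None):
    """Build keyword arguments for the boto3 CloudTrail lookup_events call."""
    if key not in VALID_KEYS:
        raise ValueError(f"unsupported lookup attribute: {key}")
    if not 0 < days_back <= MAX_LOOKBACK_DAYS:
        raise ValueError("days_back must be between 1 and 90")
    end = now or datetime.now(timezone.utc)
    return {
        # The API accepts exactly one attribute per request.
        "LookupAttributes": [{"AttributeKey": key, "AttributeValue": value}],
        "StartTime": end - timedelta(days=days_back),
        "EndTime": end,
    }


def lookup(key, value, days_back=1):
    """Return the first page of matching events (needs boto3 and credentials)."""
    import boto3  # imported lazily so the builder above works without boto3

    client = boto3.client("cloudtrail")
    return client.lookup_events(**build_lookup_request(key, value, days_back))["Events"]
```

For example, `lookup("Username", "testUser", days_back=7)` would return the first page of events recorded for that user over the past week.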
Example
To illustrate the CloudTrail Event History and CloudWatch approaches, let’s walk through a practical example. I created a user with no permissions and then tried to list S3 buckets (almost any operation would have resulted in an error):
aws s3 ls --profile testUser
An error occurred (AccessDenied) when calling the ListBuckets operation: Access Denied
In this example, I’m making a ListBuckets request to the S3 API. Each of these API actions can be authorized or denied by IAM. IAM policies can allow an action like this over specific resources, for particular principals, and under certain conditions. The full list of S3 actions is available in the AWS Service Authorization Reference.
This interaction is logged in CloudTrail, so you can see the API call, the user who made it, and the resulting error. When troubleshooting, you could find this event in the CloudTrail Event History by using the user name or the event name (ListBuckets) as a lookup attribute; the error code appears in the event record itself, not among the lookup attributes.
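As a rough illustration of what such a record contains, the Python sketch below pulls out the caller, the API action, and the outcome. The record is a hypothetical, trimmed stand-in for the denied ListBuckets call (the timestamp, account ID, and user name are invented); the field names follow the CloudTrail record format, where `errorCode` is present only when a call fails.

```python
import json

# Hypothetical, trimmed record for illustration; real records carry more fields.
SAMPLE_EVENT = json.loads("""
{
  "eventTime": "2024-01-01T12:00:00Z",
  "eventSource": "s3.amazonaws.com",
  "eventName": "ListBuckets",
  "errorCode": "AccessDenied",
  "errorMessage": "Access Denied",
  "userIdentity": {
    "type": "IAMUser",
    "userName": "testUser",
    "arn": "arn:aws:iam::123456789012:user/testUser"
  }
}
""")


def summarize(record):
    """Return one line saying who called what, and whether AWS rejected it."""
    identity = record.get("userIdentity", {})
    caller = identity.get("userName") or identity.get("arn") or "unknown"
    # Successful calls have no errorCode field at all.
    outcome = record.get("errorCode", "Success")
    return (
        f'{record["eventTime"]} {caller} '
        f'{record["eventSource"]}:{record["eventName"]} -> {outcome}'
    )


if __name__ == "__main__":
    print(summarize(SAMPLE_EVENT))
```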
Amazon CloudWatch
CloudTrail can stream to CloudWatch Logs, which are searchable and ideal for viewing recent or real-time events.
In most cases, we create a CloudWatch Log Group named cloudtrail2cwl that receives CloudTrail logs but only stores them for a week. You can check which log group (if any) receives CloudTrail events from the CloudTrail console.
Live Tail
Before running the aws s3 ls operation above, I started a Live Tail of the cloudtrail2cwl Log Group. To get calls my user made, I added the filter pattern:
After waiting a minute or two, I could see my API call in CloudTrail, revealing that I didn’t have the necessary permissions.
Logs Insights
But let’s say I wanted to see these calls after the fact. To get historical events, I can search this log group in CloudWatch Logs Insights, using a similar query to find events for a specific user:
This query revealed two instances where access was denied within the search period.
If you’re unsure about which fields to use, the “Discovered fields” on the right side of the screen can help. The field names are descriptive, but you may need to combine this information with a few example log events or the CloudTrail log event reference.
The query syntax offers several operations, but for most purposes, fields and filter will suffice. The fields command selects the fields you want to view, and the filter command specifies which results you want, similar to a SQL WHERE condition.
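To show how fields and filter fit together, here is a small Python sketch that composes a Logs Insights query string. It is not the query used above; it assumes the log events expose `userIdentity.userName` and `errorCode` as discovered fields, and `build_insights_query` and `run_query` are hypothetical helper names. Quoting via `json.dumps` is a convenience that should be fine for simple names. `run_query` additionally needs boto3 and AWS credentials.

```python
import json


def build_insights_query(user_name, error_code=None, limit=20):
    """Compose a Logs Insights query from fields, filter, sort, and limit."""
    # json.dumps yields a double-quoted, escaped string literal.
    conditions = [f"userIdentity.userName = {json.dumps(user_name)}"]
    if error_code:
        conditions.append(f"errorCode = {json.dumps(error_code)}")
    return "\n".join([
        "fields @timestamp, eventSource, eventName, errorCode, errorMessage",
        "| filter " + " and ".join(conditions),
        "| sort @timestamp desc",
        f"| limit {int(limit)}",
    ])


def run_query(log_group, query, start_epoch, end_epoch):
    """Start the query and poll until it finishes (needs boto3 and credentials)."""
    import time

    import boto3  # imported lazily so the builder above works without boto3

    logs = boto3.client("logs")
    query_id = logs.start_query(
        logGroupName=log_group,
        startTime=start_epoch,  # seconds since the epoch
        endTime=end_epoch,
        queryString=query,
    )["queryId"]
    while True:
        response = logs.get_query_results(queryId=query_id)
        if response["status"] not in ("Scheduled", "Running"):
            return response
        time.sleep(1)


if __name__ == "__main__":
    print(build_insights_query("testUser", "AccessDenied"))
```

Passing the resulting string to `run_query("cloudtrail2cwl", query, start, end)` would search the log group described earlier over the chosen time window.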
Amazon Athena
CloudTrail can deliver logs to S3, where Athena can search them. Athena lets you use SQL to search, but it requires more setup.
Athena enables SQL queries over unstructured data in S3. If you’re storing your CloudTrail data in S3, then Athena can query it. However, CloudTrail can generate a substantial amount of data, and if the Athena table isn’t partitioned, it will have to search all the data to find what you want. This can lead to long query times, especially on busy AWS accounts.
For example, the last time I used Athena, my initial queries against an unpartitioned table took over ten minutes, and this was not even on a particularly busy AWS account. To improve performance, I created a partitioned table.
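Partitioning helps because CloudTrail writes its log files under key prefixes organized by account, region, and date, so a table partitioned on those values lets Athena read only the prefixes a query needs. The Python sketch below builds the prefix for a single day, assuming the standard layout for a single-account trail; the account ID and region in the usage note are placeholders, and `cloudtrail_day_prefix` is a hypothetical helper name.

```python
from datetime import date


def cloudtrail_day_prefix(account_id, region, day, bucket_prefix=""):
    """Return the S3 key prefix holding one day's logs for one account and region."""
    parts = [
        "AWSLogs", account_id, "CloudTrail", region,
        f"{day.year:04d}", f"{day.month:02d}", f"{day.day:02d}",
    ]
    if bucket_prefix:
        # Trails may be configured with an optional prefix ahead of AWSLogs/.
        parts.insert(0, bucket_prefix.strip("/"))
    return "/".join(parts) + "/"


if __name__ == "__main__":
    print(cloudtrail_day_prefix("123456789012", "us-east-2", date(2023, 8, 2)))
```

A query limited to one day and one region only has to touch one such prefix instead of the whole bucket.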
If there’s no Athena table or your query takes over five minutes, you should probably ask for help. However, the documentation is quite comprehensive and can guide you through setting up and optimizing Athena for CloudTrail queries.
Set-Up
Athena, when set up correctly, allows you to quickly and easily find the API calls using a query like this:
select *
from use2_cloudtrail_logs_pp
where eventname = 'CreateStack'
and eventtime >= '2023-08-02T00:00:00Z'
and eventtime < '2023-08-03T00:00:00Z';
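If you run this kind of query often, the one-day window can be generated instead of typed by hand. The Python sketch below rebuilds the query above for any event name and date; `build_event_query` is a hypothetical helper, and it assumes `eventtime` is stored as an ISO 8601 UTC string, in which case plain string comparison orders timestamps correctly.

```python
import re
from datetime import date, timedelta

_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def build_event_query(table, event_name, day):
    """Build a one-day query for a single event name against a CloudTrail table."""
    # Both values are interpolated into SQL, so restrict them to safe characters.
    if not _IDENTIFIER.match(table) or not _IDENTIFIER.match(event_name):
        raise ValueError("table and event_name must be plain identifiers")
    next_day = day + timedelta(days=1)  # handles month and year boundaries
    return (
        "select *\n"
        f"from {table}\n"
        f"where eventname = '{event_name}'\n"
        f"and eventtime >= '{day.isoformat()}T00:00:00Z'\n"
        f"and eventtime < '{next_day.isoformat()}T00:00:00Z';"
    )


if __name__ == "__main__":
    print(build_event_query("use2_cloudtrail_logs_pp", "CreateStack", date(2023, 8, 2)))
```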
Conclusion
CloudTrail is an indispensable tool in the AWS ecosystem, offering a window into the API interactions that form the backbone of your cloud infrastructure. By leveraging CloudTrail in conjunction with CloudWatch and Athena, you can gain powerful insights into your AWS environment, efficiently troubleshoot issues, and maintain robust security practices.
Recap
CloudTrail Event History provides a quick, user-friendly way to view recent API activity.
CloudWatch Logs offer more advanced search capabilities and real-time monitoring.
Athena enables complex SQL queries for deep dives into historical data.
As you become more proficient with these tools, troubleshooting AWS issues becomes more straightforward and less time-consuming. Effective use of CloudTrail not only helps in resolving problems but also in proactively monitoring your AWS environment for potential security risks or operational anomalies.
As AWS continues to evolve, mastering tools like CloudTrail remains crucial for any cloud professional. Whether you’re an administrator, developer, or security specialist, the skills you’ve learned here will serve as a solid foundation for managing and optimizing your AWS infrastructure.
Keep exploring, stay curious, and happy troubleshooting!