Everything You Wanted to Know About Data Perimeters in AWS: Part Deux

We know what a data perimeter is. Now what?

In part one of our blog Data Perimeters in AWS, we examined the different types of data perimeter mechanisms. In this second part, we will look at options for providing authorized and accountable access through these perimeters. In particular, we will look at two scenarios: AWS RDS and AWS S3.

AWS RDS

The diagram below shows the various perimeters that need to be addressed in our setup.

Example of an RDS instance that will run MySQL and use IAM Authentication. 

In the example above, we are setting up an RDS instance that will run MySQL and use IAM Authentication. In this model, users are defined in the database engine, but those users can only be used by principals in IAM (either Users or Roles).

AWS Organization (1)

Starting with the outermost perimeter—the “AWS Organization”—we want to ensure that our principals have the option of using the API needed for RDS IAM Authentication. To do this, we start with a Service Control Policy (SCP). SCPs are typically used to deny access to AWS APIs at the Organization, Organizational Unit, or Account level. Upon creation, the Root OU and sub-OUs have an SCP that grants all access to AWS APIs; customer-supplied SCPs are intended to scale those permissions back. In our scenario, we will want to ensure that the API rds-db:connect is not blocked. Note that even if an API is not blocked by an SCP, it still needs to be explicitly granted to an identity in order for that identity to use it.
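As a rough illustration of the check described above, the sketch below scans an SCP document for Deny statements whose Action patterns match rds-db:connect. It is a simplification under stated assumptions: it ignores Resource, Condition, and NotAction elements, and the example SCP (including its Sid) is hypothetical rather than taken from any real organization.

```python
from fnmatch import fnmatchcase


def _as_list(value):
    """Policy fields may hold a single string or a list of strings."""
    return [value] if isinstance(value, str) else list(value)


def scp_denies(scp: dict, action: str) -> bool:
    """Return True if any Deny statement in the SCP matches the action.

    Simplification: Resource, Condition, and NotAction are not evaluated.
    """
    statements = scp["Statement"]
    if isinstance(statements, dict):  # a single statement may appear without a list
        statements = [statements]
    for statement in statements:
        if statement.get("Effect") != "Deny":
            continue
        for pattern in _as_list(statement.get("Action", [])):
            # Action names are case-insensitive and may contain * and ? wildcards.
            if fnmatchcase(action.lower(), pattern.lower()):
                return True
    return False


# Hypothetical SCP that blocks RDS deletions but leaves rds-db:connect alone.
example_scp = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "DenyRdsDeletes",
            "Effect": "Deny",
            "Action": ["rds:Delete*"],
            "Resource": "*",
        }
    ],
}

print(scp_denies(example_scp, "rds-db:connect"))       # False
print(scp_denies(example_scp, "rds:DeleteDBInstance"))  # True
```

A result of False only means the SCP does not block the action; the identity still needs an explicit grant before it can connect.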

IAM (2)

Next in our perimeter list is IAM. This includes any access provided to the identity at the account level. We are using IAM Authentication, so we need to create a local database user and an IAM role and policy. We will create our database user with the following command:

CREATE USER 'iamAuth'@'%' IDENTIFIED WITH AWSAuthenticationPlugin AS 'RDS';

Instead of using local database authentication, AWS provides a plugin (AWSAuthenticationPlugin) to authenticate against IAM. Note that the local database user does not have a password set. This is intentional. On the IAM side, I have created a role and will add this as an inline policy:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "AllowConnection",
            "Effect": "Allow",
            "Action": "rds-db:*",
            "Resource": "arn:aws:rds-db:us-west-2:115437351579:dbuser:db-IVQVHTD2HLTFVVOERJI5PFYB3E/iamAuth"
        }
    ]
}
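The Resource ARN in a policy like this follows the pattern arn:aws:rds-db:region:account-id:dbuser:DbiResourceId/db-user-name. The sketch below assembles that ARN from its parts and emits an equivalent policy. It scopes the action to rds-db:connect rather than the wildcard, which is a design choice of this example and not something the setup above requires; the function names are made up for illustration.

```python
import json


def rds_db_user_arn(region: str, account_id: str, db_resource_id: str, db_user: str) -> str:
    """Build the rds-db ARN that identifies one database user on one instance."""
    return f"arn:aws:rds-db:{region}:{account_id}:dbuser:{db_resource_id}/{db_user}"


def connect_policy(resource_arn: str) -> dict:
    """Return an identity policy that allows IAM database authentication only."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AllowConnection",
                "Effect": "Allow",
                "Action": "rds-db:connect",
                "Resource": resource_arn,
            }
        ],
    }


# Values copied from the policy shown in this article.
arn = rds_db_user_arn(
    region="us-west-2",
    account_id="115437351579",
    db_resource_id="db-IVQVHTD2HLTFVVOERJI5PFYB3E",
    db_user="iamAuth",
)
policy_json = json.dumps(connect_policy(arn), indent=4)
print(policy_json)
```

Note that the ARN uses the instance's resource ID (the db-... value), not the instance name, and that the trailing segment must match the database user name created in MySQL.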

Another possible perimeter in IAM is the Permissions Boundary. This is an IAM policy attached to an identity that limits the identity’s API access regardless of other attached or inline policies. As an example, if I have a role with the AdministratorAccess policy attached, and I then attach the AmazonS3ReadOnlyAccess as a Permissions Boundary, that role will have effective permissions of AmazonS3ReadOnlyAccess. Permissions Boundaries are often enforced by deployment controls to ensure that developers do not create over-privileged identities.
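The AdministratorAccess example can be pictured as an intersection: an action is usable only when both the identity policy and the Permissions Boundary allow it. The following is a minimal sketch of that idea, assuming allow-only policies reduced to lists of action patterns. The two pattern lists are simplified stand-ins for the managed policies, not their actual contents.

```python
from fnmatch import fnmatchcase


def matches_any(action: str, patterns: list[str]) -> bool:
    """True if the action matches at least one pattern (case-insensitive)."""
    return any(fnmatchcase(action.lower(), p.lower()) for p in patterns)


def effective_allow(action: str, identity_allows: list[str], boundary_allows: list[str]) -> bool:
    """An action must be allowed by the identity policy AND by the boundary."""
    return matches_any(action, identity_allows) and matches_any(action, boundary_allows)


# Simplified stand-ins (assumptions for illustration only).
admin_policy = ["*"]                             # stands in for AdministratorAccess
s3_read_only_boundary = ["s3:Get*", "s3:List*"]  # stands in for AmazonS3ReadOnlyAccess

for action in ("s3:GetObject", "s3:PutObject", "ec2:RunInstances"):
    print(action, effective_allow(action, admin_policy, s3_read_only_boundary))
```

Only s3:GetObject comes out as allowed, even though the identity policy on its own would permit all three actions.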

Those two configurations make up the Account perimeter. They also satisfy part of the Resource perimeter. We will dive deeper into that later on.

Network Connectivity (3)

MySQL is only usable over a network connection, and MySQL on RDS is no different. For the Network perimeter, the MySQL IP address needs to be reachable by the resources that should be able to connect. Depending on the environment, this could be achieved via Direct Connect, Transit Gateway, VPC Peering, VPC, or a combination of some or all of them. Note that any resource that can connect to the MySQL address can potentially read data in the database.

Network Controls (4)

The Network Connectivity control operates at layer 3 of the OSI model (Network), whereas Network Controls operate at layer 4 (Transport). They do blend together a bit at the edges and could be considered a single control, but they are configured in different places in AWS so I have kept them separate. The tool for this perimeter is the VPC Network Access Control List (NACL). The NACL is a stateless firewall; traffic must be explicitly allowed both in and out. Because it is stateless, it is very common to allow the ephemeral port range (1024-65535) for return traffic so that communication works in both directions. For our RDS example, we need to allow the MySQL port (3306/TCP) in and we can decide which IP addresses or IP ranges we wish to allow. The tighter the scope, the more secure the instance is. Ideally, this should be limited to just those application servers or network nodes that have a legitimate need. To keep a more secure footing, you can also restrict outbound access to those same resources. Please note that a NACL provides the ability to both allow and deny traffic from specific IPs and ports.
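To make the stateless behavior concrete, here is a small simulation of NACL evaluation for the database subnet: rules are checked in ascending rule-number order, the first match decides, and anything unmatched is denied. It is a sketch under assumptions: TCP only, a hypothetical application subnet of 10.0.1.0/24, a hypothetical blocked host, and an ephemeral range of 1024-65535 that you would narrow to match your clients.

```python
from dataclasses import dataclass
from ipaddress import ip_address, ip_network


@dataclass(frozen=True)
class NaclRule:
    number: int     # lower numbers are evaluated first
    cidr: str       # source CIDR for inbound rules, destination CIDR for outbound
    port_from: int
    port_to: int
    allow: bool     # NACL rules can allow or deny


def nacl_permits(rules: list[NaclRule], peer_ip: str, port: int) -> bool:
    """The first matching rule decides; no match falls through to deny."""
    for rule in sorted(rules, key=lambda r: r.number):
        in_range = ip_address(peer_ip) in ip_network(rule.cidr)
        if in_range and rule.port_from <= port <= rule.port_to:
            return rule.allow
    return False  # models the implicit final deny-all rule


APP_SUBNET = "10.0.1.0/24"  # hypothetical application subnet

inbound = [
    NaclRule(90, "10.0.1.99/32", 3306, 3306, allow=False),  # hypothetical blocked host
    NaclRule(100, APP_SUBNET, 3306, 3306, allow=True),      # MySQL from the app subnet
]
outbound = [
    NaclRule(100, APP_SUBNET, 1024, 65535, allow=True),     # replies to ephemeral ports
]


def round_trip_ok(client_ip: str, client_port: int) -> bool:
    """Both directions must be allowed explicitly because the NACL keeps no state."""
    request = nacl_permits(inbound, client_ip, 3306)        # client -> database
    reply = nacl_permits(outbound, client_ip, client_port)  # database -> client
    return request and reply


print(round_trip_ok("10.0.1.25", 50000))  # True
print(round_trip_ok("10.0.1.99", 50000))  # False: the deny rule is evaluated first
print(round_trip_ok("10.0.2.25", 50000))  # False: outside the allowed subnet
```

Removing the outbound rule makes every round trip fail, even though the inbound request is still allowed, which is the practical consequence of statelessness.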

Security Group (5)

The VPC NACL discussed above applies to any subnets it is linked to and all the resources within. A Security Group is also a firewall, but it is stateful and applied to a resource. The Security Group can also be applied to many resources, and they don’t have to be in the same subnet. (They do have to be in the same VPC; a Security Group is VPC-specific.) Because the Security Group is stateful, incoming traffic linked to an outgoing request does not have to be explicitly allowed, and the same holds for replies to an allowed incoming request. Security Groups can only allow traffic and cannot deny. In our scenario, we would need to explicitly allow the MySQL port (3306/TCP) from the IP addresses that need it. Outbound traffic can be restricted for a more secure posture.
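The contrast with the NACL can be sketched as a toy connection tracker: once an inbound request is admitted, the reply on that same connection passes without any outbound rule. This is an illustrative model only. The class and method names are invented here, the subnet is hypothetical, and outbound rules for connections initiated by the database are not modeled.

```python
from ipaddress import ip_address, ip_network


class SecurityGroup:
    """Toy model of a stateful, allow-only firewall attached to a resource."""

    def __init__(self, inbound_allows: list[tuple[str, int]]):
        # Each rule is (source CIDR, destination port). Deny rules cannot be expressed.
        self.inbound_allows = inbound_allows
        self._tracked: set[tuple[str, int, int]] = set()

    def admit_inbound(self, src_ip: str, src_port: int, dst_port: int) -> bool:
        """Admit a new inbound connection if any allow rule matches, and track it."""
        allowed = any(
            ip_address(src_ip) in ip_network(cidr) and dst_port == port
            for cidr, port in self.inbound_allows
        )
        if allowed:
            self._tracked.add((src_ip, src_port, dst_port))
        return allowed

    def admit_reply(self, dst_ip: str, dst_port: int, src_port: int) -> bool:
        """Replies on a tracked connection pass without an explicit outbound rule."""
        return (dst_ip, dst_port, src_port) in self._tracked


# Hypothetical rule: MySQL from the application subnet only.
db_sg = SecurityGroup([("10.0.1.0/24", 3306)])

print(db_sg.admit_inbound("10.0.1.25", 50000, 3306))  # True
print(db_sg.admit_reply("10.0.1.25", 50000, 3306))    # True
print(db_sg.admit_inbound("10.0.2.9", 50000, 3306))   # False
```

Because the only way to exclude a source is to leave it out of the allow rules, blocking one specific address inside an otherwise allowed range is a job for the NACL rather than the Security Group.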

Resource (6)

The last perimeter is the Resource perimeter. For a MySQL RDS instance, this consists of the authentication and authorization data. The identity has to be able to authenticate to the database engine and be authorized for access to the data. Authentication was addressed in the IAM control above, but authorization remains to be configured. In the model of least privilege, an identity should have access to only the data it needs and for only as long as it needs it. In MySQL, the first part is handled via grants and the second is out of scope for this article. Service accounts need special care as it can take longer to notice if they have been compromised.
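As one way to picture a narrowly scoped grant, the sketch below builds a MySQL GRANT statement for a single table and a short list of privileges. The schema and table names (appdb, orders) are hypothetical, the helper is not part of any library, and the privilege whitelist is deliberately small for the sake of the example.

```python
import re

_IDENTIFIER = re.compile(r"^[A-Za-z0-9_]+$")
_HOST = re.compile(r"^[A-Za-z0-9_.%-]+$")
_ALLOWED_PRIVILEGES = {"SELECT", "INSERT", "UPDATE", "DELETE"}


def build_grant(privileges: set[str], schema: str, table: str, user: str, host: str = "%") -> str:
    """Return a table-level GRANT statement, rejecting unexpected input."""
    for name in (schema, table, user):
        if not _IDENTIFIER.match(name):
            raise ValueError(f"unsafe identifier: {name!r}")
    if not _HOST.match(host):
        raise ValueError(f"unsafe host: {host!r}")
    unknown = set(privileges) - _ALLOWED_PRIVILEGES
    if not privileges or unknown:
        raise ValueError(f"unsupported privileges: {sorted(unknown)}")
    privilege_list = ", ".join(sorted(privileges))
    return f"GRANT {privilege_list} ON `{schema}`.`{table}` TO '{user}'@'{host}';"


# Read-only access to one hypothetical table for the IAM-authenticated user.
print(build_grant({"SELECT"}, "appdb", "orders", "iamAuth"))
```

Granting per table instead of using a schema-wide or global wildcard keeps the database user's reach aligned with what the application actually reads.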

AWS S3

The available perimeter controls for S3 are shown below.

Perimeter controls for S3

AWS Organization (1)

Much like our RDS scenario above, we need to ensure that there are no SCPs or Permissions Boundaries denying access to the necessary S3 APIs.

IAM (2)

For S3, identity permissions as defined in IAM are the most common way to provide access to identities within the account (Roles, Users, and Groups). There are Amazon-managed policies like AmazonS3ReadOnlyAccess and AmazonS3FullAccess, and custom policies can also be created as either managed policies or inline policies. IAM policies for S3 allow for exceptional granularity, down to specific objects. Unless a resource-level control in the same account grants the access (see the Resource perimeter below), an identity must have an identity policy attached granting it access before it can reach any object in S3.
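The granularity mentioned above can be illustrated with a custom identity policy that limits reads to one key prefix in one bucket. The sketch below generates such a policy; the bucket name and prefix are hypothetical, and the helper function is only a convenience for this example.

```python
import json


def s3_prefix_read_policy(bucket: str, key_prefix: str) -> dict:
    """Identity policy allowing list and read access under a single key prefix."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "ListPrefixOnly",
                "Effect": "Allow",
                "Action": "s3:ListBucket",
                "Resource": f"arn:aws:s3:::{bucket}",  # ListBucket targets the bucket ARN
                "Condition": {"StringLike": {"s3:prefix": [f"{key_prefix}*"]}},
            },
            {
                "Sid": "ReadObjectsUnderPrefix",
                "Effect": "Allow",
                "Action": "s3:GetObject",
                "Resource": f"arn:aws:s3:::{bucket}/{key_prefix}*",  # GetObject targets object ARNs
            },
        ],
    }


# Hypothetical bucket and prefix.
policy = s3_prefix_read_policy("example-reports-bucket", "finance/2024/")
print(json.dumps(policy, indent=4))
```

Replacing the trailing wildcard with a full object key would narrow the read statement to a single object.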

Resource (3)

S3 buckets have resource-level access control defined with bucket policies and ACLs. Bucket policies are similar to IAM identity policies in structure and syntax. They can grant or deny access to principals and include conditional statements to tune a policy for a specific need. They are most often used to grant access to an S3 bucket to an external entity. ACLs are a legacy access control mechanism largely supplanted by bucket policies. ACLs can be applied at the bucket and object level and have been used to grant access to external entities. It is a best practice to use bucket policies instead of ACLs. Note that for an IAM identity in the same account as the bucket, an attached identity policy that grants access to S3 is sufficient; no further access control needs to be specified at the resource level. If an IAM identity has a policy attached that explicitly denies access to S3, that deny overrides any access granted at the resource level.
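The interaction between identity policies and bucket policies can be summarized as a small decision function. This sketch assumes the request comes from a principal in the bucket's own account and that the matching statements have already been reduced to their effects. SCPs, Permissions Boundaries, session policies, ACLs, and cross-account requests (where both sides must allow) are left out.

```python
def s3_decision(identity_effects: list[str], bucket_effects: list[str]) -> str:
    """Combine the effects of all statements that matched a same-account request."""
    matched = list(identity_effects) + list(bucket_effects)
    if "Deny" in matched:
        return "Deny"   # an explicit deny in either policy type always wins
    if "Allow" in matched:
        return "Allow"  # one allow, from either policy type, is enough
    return "Deny"       # nothing matched: implicit deny


# Identity policy allows, bucket policy is silent.
print(s3_decision(["Allow"], []))        # Allow
# Bucket policy allows, but the identity policy explicitly denies.
print(s3_decision(["Deny"], ["Allow"]))  # Deny
# No statement matches the request at all.
print(s3_decision([], []))               # Deny
```

The second case is the one called out above: an explicit deny attached to the identity cannot be undone by a grant on the bucket.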

Accountability

Authentication and authorization have been addressed above, but what about accountability? There are log sources for each data perimeter:

  • AWS Organizations – CloudTrail
  • IAM – CloudTrail and AWS Config
  • Network Connectivity – CloudTrail, AWS Config, and VPC Flow Logs
  • Network Controls – CloudTrail, AWS Config, and VPC Flow Logs
  • Security Groups – CloudTrail, AWS Config, and VPC Flow Logs
  • Resource (RDS) – CloudWatch Logs. RDS allows database engine logs to be exported to CloudWatch Logs.
  • Resource (S3) – CloudTrail and S3 Server Access Logging.

If these logs are managed and processed by the right tools, they can provide great insight into the activities occurring with your data. Ideally, you will have a tool that can take in all of these inputs and produce a correlated output that shows a full picture of the activity.

Over-Permissioned Identities? Try DSPM

Your organization has data. And that data is being used. Managing and securing this vast load of data, potentially millions of data objects across thousands of data stores, is complex. Multiply this by a near-endless combination of roles and permissions for every employee, contractor, vendor, and machine identity across the organization, and identity and access management can become tangled. Even with a fully staffed and funded IT, data, or security team, it can be a struggle to properly inventory, classify, control, and protect critical data assets, while also securing the data from attacks, insider threats, third-party supply chain attacks, vendor threats, and data breaches. Add to this increasing governance and compliance concerns and you have the makings of a perfect data security storm.

The majority of organizations have identities with too much access to data. The least privilege model states that an identity should have exactly the data access it needs (and no more), for exactly the duration of time it needs it. In practice, this is very difficult to do. With the adoption of role-based access control (RBAC), identities in an organization are given access based on their role (e.g., Finance or Business Development). This often results in too much access, as not every Finance employee is going to need access to every finance document. In order to remedy over-permissioning, you need to know where to start: who has access to what resources, are they actively accessing those resources, and do they need access to those resources?

If you are struggling to understand what kind of data you have, where your data is, and who is using it, you’re in good company because virtually every company has the same challenge. Today’s businesses are required to store and secure different types of data across complex cloud and on-premises infrastructures. The key to enhanced Zero Trust and improved data security is in realizing that data protection can no longer begin and end at the perimeter or the devices being used.

While there is no magic formula for data security, a data security posture management (DSPM) solution can be an immense help in understanding your datascape and enable your teams to implement controls that stave off data incidents. A DSPM solution can tell you:

  • Where your data is located.
  • What kind of data you have.
  • Who can access your data.
  • How your data is being used and by whom.

To learn more about DSPM or see a DSPM solution in action, please reach out. We’d love to show you how DataGuard can help improve data security for AWS RDS and AWS S3.