Amazon Bedrock implementation leveraging AWS Lake Formation Tag Based Access Control (LF-TBAC)

While an easy access to pretrained LLM’s revolutionized applications, they have also led us to scenarios which are difficult and complicated. The challenges that LLM’s pose in terms of a safe and secure deployments, can often become the biggest hurdle in LLM implementation.

Some of the critical points in a typical LLM deployment are -

  1. LLM’s do not differentiate among users; and respond to every question leveraging their training data.
  2. Organizations often implement LLM use cases using Retrieval Augmented Generation (RAG) technique which optimizes LLM responses by providing it access to the organization data.
  3. Prompt engineering furthers the RAG implementation by ensuring that the LLM responses are limited to the organization data that it has access to.
  4. While this “fencing” of LLM responses using RAG and Prompt engineering is useful, it also leads to a scenario where every user now has access to all the data that the LLM has access to. This means all users have access to all data – Public, Internal Restricted and Confidential; which may lead to unintended consequences.

In this blog, let us look at how we can implement an LLM use case in AWS using Amazon Bedrock and AWS Lake Formation. We will use the AWS Lake Formation Tag Based Access Control ((LF-TBAC) to ensure data access based on user entitlement.

For this example, let us use a scenario where LLM responses are fenced around the user data entitlements in terms of the data classification of - Public, Internal, Restricted and Confidential.


Steps to setup Lake Formation Tag Based Access Control:

  1. Using Lake formation Tags (LF-TBAC), define tags and the values for those tags. In this example let’s define 4 tags with values – Public, Internal Restricted and Confidential.
  2. Assign these 4 tags to your resources – Data Lake Database, tables, columns. The tags are hierarchical and so if you tag a DB, all tables and columns in that DB by default will inherit the same tag. This means if you tag as DB as Confidential, all tables and table columns under that DB will inherit the same tag as confidential. You can override this if required, but this is the default scenario.
  3. Create 4 IAM Roles with similar names as the tag values – IAM-Role-Public, IAM-Role-Internal, IAM-Role-Restricted and IAM-Role-Confidential.
  4. Next define the policies for the tag and scale this permission model. Grant permissions to principals/IAM Roles to assign LF-Tags to resources. You can add Public tag permissions to IAM-Role-Public etc.

Your Lake Formation Tag Based Access Control is now ready to authorize access to catalog resource and S3 objects. When users access the data, LF authorizes access based on the permissions set using the tags. 

Solution Architecture: 
The diagram below shows the architecture we can use to build an application to use LLM’s provided in Amazon Bedrock; and also leverage AWS Lake Formation for ensuring user data entitlement.



(1)   Users are assigned Roles and Policies based on their entitlements. A user can have access to all data including data classified as Restricted or a user can have access only to data classified as Public.

(2)   Users are granted permission based on the IAM Roles and Policies.

(3)   AWS Lambda can be used to capture the User inputs. This is first checked against the Amazon ElastiCache for Redis. If the cache is empty or outdated, AWS Lambda calls AWS Lake Formation.

(4)   AWS Lake Formation is made of the Amazon S3 based Data Lake, Permission set and the Data Catalog. The data in the data lake are tagged based on the underlying data classification.
AWS Lake Formation uses the Tag Based Access Control to compare the Data Tags with the User Role Tags and data is returned if the Data Tag and the Role Tag are matched. For example – if the User Role is Public, Lake formation will return only data tagged as Public.

(5)   The AWS Lambda sends this Lake Formation response to Amazon Bedrock. Amazon Bedrock implementation here uses RAG and this ensures that Amazon Bedrock responses are limited to the data returned from the Lake Formation.

(6)   Once after the Amazon Bedrock response is sent to the customer, the responses are added to the Amazon ElastiCache for Redis.


Conclusion:

Using Lake Formation -TBAC is one way to ensure that LLM responses to users are based on the user data entitlements only. This means, a user with Public entitlement will see LLM responses with data classified as Public only, while a User with Restricted access can see LLM responses built using Restricted data.


~Narendra V Joshi 




Comments