Posts

Amazon Bedrock implementation leveraging AWS Lake Formation Tag Based Access Control (LF-TBAC)

While easy access to pretrained LLMs has revolutionized applications, it has also led to scenarios that are difficult and complicated. The challenges that LLMs pose in terms of safe and secure deployment can often become the biggest hurdle in an LLM implementation. One of the critical points in a typical LLM deployment is that LLMs do not differentiate among users; they respond to every question using their training data. Organizations often implement LLM use cases with the Retrieval Augmented Generation (RAG) technique, which improves LLM responses by giving the model access to organization data. Prompt engineering furthers the RAG implementation by ensuring that the LLM responses are limited to the organization data it has access to. While this “fencing” of LLM responses using RAG and prompt engineering is useful, it also leads to a scenario where every user now has access to all the data that the LLM has access to. This means all users have access to all data – Pub
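To make the "all users see all data" problem concrete, here is a minimal sketch of a user-scoped RAG call on Amazon Bedrock. It assumes a hypothetical knowledge base ID, a hypothetical access_tag metadata field on the indexed documents, and a hypothetical helper that maps a user to the tag values they may read; the retrieve and invoke_model calls are from boto3's bedrock-agent-runtime and bedrock-runtime clients.

```python
# Minimal sketch: retrieve only chunks the calling user may see, then ask the model.
import json
import boto3

retriever = boto3.client("bedrock-agent-runtime")
bedrock = boto3.client("bedrock-runtime")

def allowed_tags_for(user_id):
    # Hypothetical helper: map the user to the tag values they are entitled to read.
    return {"alice": ["public", "sales"], "bob": ["public"]}.get(user_id, ["public"])

def answer(user_id, question):
    tags = allowed_tags_for(user_id)
    resp = retriever.retrieve(
        knowledgeBaseId="KB_ID_PLACEHOLDER",   # hypothetical knowledge base ID
        retrievalQuery={"text": question},
        retrievalConfiguration={"vectorSearchConfiguration": {"numberOfResults": 10}},
    )
    # Keep only chunks whose (hypothetical) access_tag metadata is in the user's allowed set.
    context = "\n".join(
        r["content"]["text"]
        for r in resp["retrievalResults"]
        if r.get("metadata", {}).get("access_tag", "public") in tags
    )
    body = json.dumps({
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": 512,
        "messages": [{"role": "user",
                      "content": f"Answer only from this context:\n{context}\n\nQuestion: {question}"}],
    })
    out = bedrock.invoke_model(modelId="anthropic.claude-3-sonnet-20240229-v1:0", body=body)
    return json.loads(out["body"].read())
```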

Processing Notification and Queue data in AWS

Batch ingestion (typically large datasets) and real-time ingestion (trickle feeds) are the most common forms of ingesting data. In this blog, let us look at processing data that has been ingested via notifications and queues. Notifications are typically used to inform users of events, while queues are typically used to store and process messages in a specific order. Examples of notifications: an email notification that you have received a new message; a push notification on your phone that a new app update is available; a pop-up notification on your computer that a new file has been downloaded. Examples of queues: a queue of print jobs waiting to be printed; a queue of emails waiting to be sent; a queue of tasks waiting to be executed by a background worker process. The table below summarizes some key concepts of notifications and queues: Feature Notification Queue Purpose To inform
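As a minimal sketch of the two patterns on AWS (topic ARN and queue URL are placeholders, and the print call stands in for real processing), the following publishes a notification to Amazon SNS and then polls an Amazon SQS queue with boto3:

```python
# Minimal sketch, assuming an existing SNS topic and SQS queue.
import boto3

sns = boto3.client("sns")
sqs = boto3.client("sqs")

TOPIC_ARN = "arn:aws:sns:us-east-1:111122223333:example-topic"                 # placeholder
QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/111122223333/example-queue"   # placeholder

# Notification: fan out an event to whoever is subscribed to the topic.
sns.publish(TopicArn=TOPIC_ARN, Subject="New file ingested", Message="s3://bucket/key arrived")

# Queue: pull messages, process them, and delete each one only after successful processing.
while True:
    resp = sqs.receive_message(QueueUrl=QUEUE_URL, MaxNumberOfMessages=10, WaitTimeSeconds=20)
    for msg in resp.get("Messages", []):
        print("processing:", msg["Body"])   # stand-in for real work
        sqs.delete_message(QueueUrl=QUEUE_URL, ReceiptHandle=msg["ReceiptHandle"])
```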

A case for Narrow, Purpose-built LLMs

The days of using the weather as a conversation starter seem to be long gone, at least within tech circles. Generative AI has not only overtaken the weather as the conversation starter but is often turning out to be the only conversation topic. Everyone wants to know what everyone else is building with Gen AI, what proofs of concept (POCs) are being considered, and so on. Large Language Model (LLM) names and their capabilities are talked about with an excitement trailing only the Thanksgiving deals. Yet a new report about Gen AI adoption and usage, published last week, seems to contradict this weather-replacing hype. The report indicated a decrease in traffic to ChatGPT and other Gen AI services for the first time in several months. It further showed that this decreasing trend applied both to new traffic and to traffic from existing users of these Gen AI services. What might be causing this gap between the hype and the actual usage? When Gen AI opened to the public, it was that shi

Accessing Data from an AWS Data Lake using AWS Lake Formation – Part 1 – Data Filtering-based Access Control

A data lake is a repository to store your data. Like a database, a data lake is expected to hold data in an organized manner, provide tools for data processing and data access, and have well-defined methods for authentication and authorization. Unlike a database, a data lake is expected to hold structured, semi-structured, or unstructured data, and is envisioned to hold this data forever. It is these subtle differences that make data lakes a better fit for analytic needs such as deriving patterns, making detailed comparisons, building an exhaustive story, and so on. A database can also be used for analytic use cases, but it is not meant for storing large volumes of data indefinitely or for storing unstructured data, and this in turn limits how detailed a result it can provide. Whether you need a data lake is a question that depends on the volume of data in your organization. If your organization's data is small; you do not have much history or you do not have a need to store large history;
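On the data-filtering access control the post title refers to, here is a minimal sketch using boto3's lakeformation client. The database, table, column, account ID, and analyst role ARN are placeholders; it defines a data cells filter (row filter plus an excluded column) and grants SELECT on it to one principal.

```python
# Minimal sketch, assuming a Glue database "sales_db" and a table "orders" already exist.
import boto3

lf = boto3.client("lakeformation")
CATALOG_ID = "111122223333"                                   # placeholder account ID
ANALYST_ARN = "arn:aws:iam::111122223333:role/AnalystRole"    # placeholder principal

# Define a data cells filter: only EU rows, and hide the customer_email column.
lf.create_data_cells_filter(
    TableData={
        "TableCatalogId": CATALOG_ID,
        "DatabaseName": "sales_db",
        "TableName": "orders",
        "Name": "eu_rows_no_pii",
        "RowFilter": {"FilterExpression": "region = 'EU'"},
        "ColumnWildcard": {"ExcludedColumnNames": ["customer_email"]},
    }
)

# Grant the analyst role SELECT through that filter only.
lf.grant_permissions(
    Principal={"DataLakePrincipalIdentifier": ANALYST_ARN},
    Resource={"DataCellsFilter": {
        "TableCatalogId": CATALOG_ID,
        "DatabaseName": "sales_db",
        "TableName": "orders",
        "Name": "eu_rows_no_pii",
    }},
    Permissions=["SELECT"],
)
```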

AWS Instance Role and Instance Profile

In this blog, let us try to understand what an Instance Profile and an Instance Role are in AWS. An Instance Profile is an identifier for an Amazon EC2 instance, while an Instance Role defines what the user (in this case the Amazon EC2 instance that assumes the role) can accomplish. In a way, think of the Instance Profile as a designation, such as Architect, while the Instance Role defines the roles and responsibilities of the person who holds that designation. You may now be wondering why this understanding, and the distinction between the two, is important. The answer lies in how AWS services create these, or expect us to create these. If you have always used the AWS console to create the Instance Role, there is a chance you may not know what an Instance Profile is, as the console automatically creates the Instance Profile too and gives it the same name as the one you used for the Instance Role. Now when you use the console to create and launch the Amazon EC2 attaching the role you cre
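To see the two objects as separate things, here is a minimal sketch with boto3's IAM client (role and profile names are placeholders): outside the console you create the role and the instance profile yourself, then attach the role to the profile.

```python
# Minimal sketch: role and instance profile are two distinct IAM objects that you link explicitly.
import json
import boto3

iam = boto3.client("iam")

trust_policy = {
    "Version": "2012-10-17",
    "Statement": [{"Effect": "Allow",
                   "Principal": {"Service": "ec2.amazonaws.com"},
                   "Action": "sts:AssumeRole"}],
}

# 1) The Instance Role: what the EC2 instance is allowed to do (permission policies attach here).
iam.create_role(RoleName="MyAppRole", AssumeRolePolicyDocument=json.dumps(trust_policy))

# 2) The Instance Profile: the container/identifier the EC2 instance is actually launched with.
iam.create_instance_profile(InstanceProfileName="MyAppProfile")

# 3) Link the two; an instance profile holds exactly one role.
iam.add_role_to_instance_profile(InstanceProfileName="MyAppProfile", RoleName="MyAppRole")
```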

Amazon Redshift and the “Low/Zero-Code - Low/Zero-ETL” narrative

Amazon Redshift is a column-oriented, fully managed, petabyte-scale data warehouse that makes it easy and cost-effective to analyze all your data. It achieves efficient query performance through a combination of massively parallel processing, columnar data storage, data compression, and ML-powered system optimizations. In this blog, let's look at how Amazon Redshift supports the “Low/Zero-Code – Low/Zero-ETL” narrative. Before we get started, let's first understand what ETL means and why we do it. ETL stands for Extract, Transform, and Load, and it means exactly what each of these terms sounds like: we Extract data from a source, Transform the data based on the need, and Load it to a target. For example, you can use a Cobol program to Extract data from a GDG, Transform the data in a series of Cobol routines, and Load it into a DB2 table. Another example is where you use an ETL tool such as Informatica or Talend to extract data from a table or a file, Transform the data and then L
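For a modern AWS equivalent of the same three steps, here is a minimal sketch (bucket, key, cluster, database user, and IAM role names are all placeholders): extract a CSV from S3, apply a small transform, stage the result back to S3, and load it into Redshift with a COPY issued through the Redshift Data API.

```python
# Minimal sketch of Extract, Transform, and Load against placeholder resources.
import csv
import io
import boto3

s3 = boto3.client("s3")
rsd = boto3.client("redshift-data")

# Extract: read the raw file from the source bucket.
raw = s3.get_object(Bucket="my-raw-bucket", Key="orders/2024/01/orders.csv")["Body"].read().decode("utf-8")

# Transform: keep only completed orders and normalize the amount to cents.
out = io.StringIO()
writer = csv.writer(out)
for r in csv.DictReader(io.StringIO(raw)):
    if r["status"] == "COMPLETED":
        writer.writerow([r["order_id"], int(float(r["amount"]) * 100)])

# Load: stage the transformed file and COPY it into the target table.
s3.put_object(Bucket="my-curated-bucket", Key="orders/orders_clean.csv", Body=out.getvalue())
rsd.execute_statement(
    ClusterIdentifier="my-redshift-cluster",   # placeholder cluster
    Database="analytics",
    DbUser="etl_user",                         # placeholder database user
    Sql=("COPY public.orders_clean FROM 's3://my-curated-bucket/orders/orders_clean.csv' "
         "IAM_ROLE 'arn:aws:iam::111122223333:role/RedshiftCopyRole' CSV;"),
)
```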