# AWS SAA - Services - Databases
### Database - Agenda and Introduction
- Database Types ![[AWS-SAA-DB-Types.png]]
- Reference [[AWS Cloud Practitioner#Core AWS Services - Database|Core AWS Services - Databases]]
### AWS RDS
- **Amazon RDS (Relational Database Service)** is a **fully managed database service** that simplifies the setup, operation, and scaling of **relational databases** in the cloud. It supports multiple database engines, including **MySQL, PostgreSQL, MariaDB, SQL Server, Oracle, and Amazon Aurora**. RDS automates **database provisioning, backups, patching, and scaling**, while offering features like **multi-AZ deployments, read replicas, and encryption** for high availability and security. It is ideal for applications requiring **scalable, managed relational databases without administrative overhead**.
- AWS **RDS Instance Types** are categorized based on workload requirements. Below are the **General Purpose** and **Memory Optimized** instance types:
#### 1. General Purpose (Balanced Performance & Cost)
Designed for a broad range of workloads with moderate performance needs.
- **db.t3** – Burstable performance instances with baseline CPU credits.
- **db.t4g** – ARM-based (Graviton2) burstable instances for cost efficiency.
- **db.m5** – Intel-based instances with a balance of compute, memory, and networking.
- **db.m6g** – Graviton2-based general-purpose instances for improved efficiency.
#### 2. Memory Optimized (High-Performance Databases Needing More RAM)
Designed for workloads that require **high memory-to-CPU ratio**, like analytics and large caching applications.
- **db.r5** – High-memory instances for large-scale databases and analytics.
- **db.r6g** – Graviton2-based memory-optimized instances for better cost efficiency.
- **db.r6i** – Intel-based high-memory instances with increased performance.
- **db.x2g** – Memory-optimized, Graviton2-based for extremely high RAM needs.
- **db.x2idn / db.x2iedn** – High-performance memory-optimized instances for enterprise workloads.
#### AWS RDS Deployment Types
1. **Single-AZ Deployment**
- A **single database instance** running in one Availability Zone (AZ).
- Suitable for **development, testing, and non-critical workloads**.
- Lower cost but no automatic failover in case of failure.
2. **Multi-AZ Deployment (High Availability)**
- A **primary database instance** is synchronously replicated to a **standby instance** in a different AZ.
- Provides **automatic failover** in case of instance failure.
- Used for **production workloads** requiring high availability.
3. **Read Replicas (Scalability & Performance Optimization)**
- Asynchronous replicas of the primary database used to **offload read traffic**.
- Can be deployed across **multiple AZs or Regions**.
- Helps improve **read performance** for applications with heavy read workloads.
4. **Multi-Region Deployment**
- A **primary database in one region** with **read replicas in another region**.
- Provides **disaster recovery and global application scaling**.
- Supports **manual promotion** of a read replica to primary in case of failure.
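A minimal boto3 sketch of the Multi-AZ and Read Replica options above; the identifiers, instance class, and credentials are placeholders rather than values from this course:
```python
import boto3

rds = boto3.client("rds", region_name="us-east-1")

# Multi-AZ primary: RDS keeps a synchronous standby in a second AZ and fails over automatically.
rds.create_db_instance(
    DBInstanceIdentifier="app-db",             # hypothetical identifier
    Engine="postgres",
    DBInstanceClass="db.m5.large",
    AllocatedStorage=100,
    MasterUsername="dbadmin",
    MasterUserPassword="change-me-please",     # placeholder only
    MultiAZ=True,
    StorageType="gp3",
    BackupRetentionPeriod=7,
)

# Read replica: an asynchronous copy used to offload read traffic from the primary.
rds.create_db_instance_read_replica(
    DBInstanceIdentifier="app-db-replica-1",
    SourceDBInstanceIdentifier="app-db",
)
```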
#### Blue/Green Deployments
- **AWS RDS Blue/Green Deployments** is a **fully managed feature** that simplifies **safe database updates** by creating a **blue (current) environment** and a **green (staging) environment**. The **green environment** is an identical copy of the production database where updates, schema changes, and new configurations can be tested without affecting live traffic. When ready, a **zero-downtime switchover** redirects traffic from the blue to the green environment, ensuring a **fast rollback option** in case of issues. This deployment strategy enhances **resiliency, minimizes risk, and improves availability** for database upgrades and maintenance.
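A rough boto3 sketch of that workflow, assuming an existing source instance ARN; the names, target version, and timeout are illustrative only:
```python
import boto3

rds = boto3.client("rds")

# Create the green (staging) environment as a copy of the blue (production) database.
bg = rds.create_blue_green_deployment(
    BlueGreenDeploymentName="app-db-upgrade",                      # hypothetical name
    Source="arn:aws:rds:us-east-1:123456789012:db:app-db",         # ARN of the blue database
    TargetEngineVersion="15.4",                                    # version to validate in green
)

# After testing the green environment, switch production traffic over to it.
rds.switchover_blue_green_deployment(
    BlueGreenDeploymentIdentifier=bg["BlueGreenDeployment"]["BlueGreenDeploymentIdentifier"],
    SwitchoverTimeout=300,   # seconds allowed before the switchover attempt is abandoned
)
```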
#### AWS RDS Storage Types
- AWS RDS offers different storage types optimized for various workloads:
1. **General Purpose (SSD) – GP3 & GP2**
- **GP3**: Latest generation, offers **provisioned performance** independent of storage size. Cost-effective and suitable for **most workloads**.
- **GP2**: Legacy option with **baseline and burstable IOPS**, scaling with storage size.
2. **Provisioned IOPS (SSD) – IO1 & IO2**
- Designed for **high-performance, I/O-intensive applications**.
- Supports **low-latency, high-throughput workloads** like OLTP databases.
- **IO2** provides better durability than IO1.
3. **Magnetic (Legacy HDD) – Standard Storage**
- Older storage type, **not recommended for new deployments**.
- Suitable for **small, infrequent access workloads**.
- Each storage type balances **performance, cost, and durability**, allowing users to choose based on their database workload needs.
#### AWS RDS Configurations Overview
- AWS RDS provides multiple configuration options to optimize **performance, availability, security, and cost-efficiency**:
1. **Instance Configuration**
- Choose between **General Purpose (t3, m5)** or **Memory-Optimized (r5, x2g)** instance types.
- Configure **CPU, RAM, and storage** based on workload needs.
2. **Storage Configuration**
- Select **GP3 (cost-effective SSD), IO1/IO2 (high-performance SSD), or Magnetic (legacy HDD)**.
- Enable **storage auto-scaling** to dynamically adjust capacity.
3. **Availability & Scaling**
- Deploy as **Single-AZ** (basic availability) or **Multi-AZ** (automatic failover).
- Use **Read Replicas** for **horizontal scaling** of read workloads.
- Implement **Blue/Green Deployments** for seamless database upgrades.
4. **Security Configuration**
- Enable **IAM authentication, SSL/TLS encryption, and AWS KMS** for data security.
- Use **VPC security groups and network ACLs** to control access.
5. **Backup & Maintenance**
- Enable **automated backups** with retention up to **35 days**.
- Use **manual snapshots** for long-term storage.
- Configure **maintenance windows** for scheduled updates.
6. **Monitoring & Performance Optimization**
- Use **Amazon CloudWatch** for real-time monitoring.
- Enable **Performance Insights** to analyze database workload performance.
- Configure **Enhanced Monitoring** for OS-level metrics.
- These configurations help **optimize performance, reliability, and security** for AWS RDS databases.
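A hedged boto3 sketch that ties several of these settings together on an existing instance; the identifier and monitoring role ARN are assumptions:
```python
import boto3

rds = boto3.client("rds")

rds.modify_db_instance(
    DBInstanceIdentifier="app-db",                      # hypothetical instance
    BackupRetentionPeriod=35,                           # maximum automated-backup retention
    PreferredMaintenanceWindow="sun:05:00-sun:06:00",   # weekly UTC maintenance window
    MaxAllocatedStorage=500,                            # storage auto-scaling ceiling (GiB)
    EnablePerformanceInsights=True,
    MonitoringInterval=60,                              # Enhanced Monitoring granularity (seconds)
    MonitoringRoleArn="arn:aws:iam::123456789012:role/rds-monitoring-role",
    ApplyImmediately=False,                             # apply during the maintenance window
)
```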
#### AWS RDS Summary
**Amazon Relational Database Service (RDS)** is a **fully managed database service** that simplifies the setup, operation, and scaling of **relational databases** in the cloud. It supports multiple database engines, including **MySQL, PostgreSQL, MariaDB, Oracle, SQL Server, and Amazon Aurora**.
##### Key Features:
- **Deployment Options:** Single-AZ, Multi-AZ for high availability, Read Replicas for scalability, and Blue/Green Deployments for safe updates.
- **Storage Types:** General Purpose (GP3, GP2), Provisioned IOPS (IO1, IO2), and Magnetic (legacy).
- **Security & Compliance:** IAM authentication, SSL/TLS encryption, KMS for data encryption, and VPC isolation.
- **Backup & Maintenance:** Automated backups (up to 35 days), manual snapshots, and maintenance windows.
- **Performance Monitoring:** CloudWatch, Performance Insights, and Enhanced Monitoring.
##### Use Cases:
- Web and mobile applications requiring scalable relational databases.
- Enterprise applications with high availability and disaster recovery needs.
- Data warehousing and analytics with Amazon Aurora and RDS Read Replicas.
- RDS enables **easy database management** with **automatic scaling, patching, and failover capabilities**, making it a reliable and efficient solution for cloud-based relational databases. 🚀
### AWS Aurora
**Amazon Aurora** is a **fully managed, high-performance relational database service** designed for **high availability, scalability, and cost efficiency**. It is compatible with **MySQL and PostgreSQL**, offering up to **5x better performance** than standard MySQL and **3x better performance** than PostgreSQL, while maintaining compatibility with existing database tools and applications.
#### Key Features:
- **Highly Available & Durable** – Multi-AZ architecture with **automatic failover** and **6-way replication** across 3 Availability Zones.
- **Scalability** – Auto-scales storage up to **128 TB** with read replicas for performance optimization.
- **Performance & Cost Efficiency** – Faster query performance with **serverless and provisioned options**.
- **Security** – Supports **IAM authentication, encryption (at-rest and in-transit), and VPC isolation**.
- **Global Database** – Enables cross-region replication with **low-latency global access**.
#### Use Cases:
- **Enterprise applications** needing high availability and performance.
- **SaaS applications** requiring automatic scaling and multi-region replication.
- **Analytics & reporting** with read replicas for heavy read workloads.
Aurora provides **enhanced performance, reliability, and security** while being **fully managed**, reducing database maintenance overhead.
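A minimal boto3 sketch of how Aurora separates the cluster (the shared storage volume) from its compute instances; identifiers, instance class, and credentials are placeholders:
```python
import boto3

rds = boto3.client("rds")

# The cluster owns the shared, 6-way-replicated storage volume.
rds.create_db_cluster(
    DBClusterIdentifier="app-aurora",
    Engine="aurora-postgresql",
    MasterUsername="dbadmin",
    MasterUserPassword="change-me-please",   # placeholder only
)

# Writer instance attached to the cluster.
rds.create_db_instance(
    DBInstanceIdentifier="app-aurora-writer",
    DBClusterIdentifier="app-aurora",
    Engine="aurora-postgresql",
    DBInstanceClass="db.r6g.large",
)

# A second instance in the same cluster becomes an Aurora Replica (reader).
rds.create_db_instance(
    DBInstanceIdentifier="app-aurora-reader-1",
    DBClusterIdentifier="app-aurora",
    Engine="aurora-postgresql",
    DBInstanceClass="db.r6g.large",
)
```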
#### AWS Aurora Architecture Overview
**Amazon Aurora** is designed with a **distributed, fault-tolerant, and highly scalable architecture** to ensure **high performance, availability, and durability**.
##### Key Architectural Components:
1. **Storage Layer (Distributed & Auto-Scaling)**
- Aurora **automatically scales storage** up to **128 TB** per database instance.
- **6-way replication** across **3 Availability Zones (AZs)** ensures fault tolerance.
- **Log-based storage architecture** optimizes durability and crash recovery.
2. **Compute Layer (Decoupled & Scalable)**
- Aurora runs on **separate compute instances**, allowing independent scaling.
- Supports **Aurora Auto Scaling** for adjusting read replicas based on demand.
- **Aurora Serverless** enables automatic compute scaling based on usage.
3. **Replication & High Availability**
- **Multi-AZ support** with automatic failover to a standby replica in case of failure.
- **Aurora Replicas (up to 15)** for **read scaling** and disaster recovery.
- **Global Database** allows replication across regions with sub-second latency.
4. **Performance Optimizations**
- **Parallel query execution** and **low-latency data access** improve query performance.
- **Buffer cache survives instance restarts**, reducing recovery time.
##### Use Cases:
- **Mission-critical applications** requiring high availability and resilience.
- **SaaS and enterprise applications** needing global scalability.
- **Big data analytics** with high throughput and multiple read replicas.
Aurora’s **unique architecture** combines the scalability and cost-effectiveness of open-source databases with the reliability and performance of commercial databases.
#### AWS Aurora Types
##### 1. Aurora Standard (Provisioned)
- Traditional **provisioned** database instances with manually allocated compute resources.
- Ideal for **predictable workloads** requiring **high availability and performance**.
- Supports **Multi-AZ deployments**, **replicas**, and **automatic failover**.
##### 2. Aurora Serverless v2
- **Auto-scales compute resources** based on application demand.
- Eliminates the need to manage database capacity manually.
- Cost-efficient for **variable or unpredictable workloads**.
- Supports **Multi-AZ for high availability** and **replica scaling**.
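A brief boto3 sketch of a Serverless v2 cluster: capacity is declared on the cluster as a min/max range of Aurora Capacity Units (ACUs), and instances use the special `db.serverless` class. Names and limits below are assumptions:
```python
import boto3

rds = boto3.client("rds")

rds.create_db_cluster(
    DBClusterIdentifier="app-aurora-sv2",         # hypothetical name
    Engine="aurora-postgresql",
    MasterUsername="dbadmin",
    MasterUserPassword="change-me-please",        # placeholder only
    ServerlessV2ScalingConfiguration={"MinCapacity": 0.5, "MaxCapacity": 16},
)

rds.create_db_instance(
    DBInstanceIdentifier="app-aurora-sv2-writer",
    DBClusterIdentifier="app-aurora-sv2",
    Engine="aurora-postgresql",
    DBInstanceClass="db.serverless",              # scales within the ACU range above
)
```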
##### 3. Aurora Global Database
- **Multi-region replication** with sub-second latency.
- Allows fast **disaster recovery** and **low-latency access** for global applications.
- Supports **failover across Regions** by promoting a secondary Region's cluster to primary.
#### Aurora v1 and v2 Differences
![[AWS_RDS_Aurora_v1v2_Differences.png]]
- Aurora Serverless v1 is essentially phased out; Serverless v2 is the recommended option.
### RDS - RDS proxy
**AWS RDS Proxy** is a **fully managed database proxy** that improves the **scalability, security, and availability** of **Amazon RDS** and **Aurora databases**. It **pools and manages database connections**, reducing the overhead on databases from **excessive open connections** and improving performance for **serverless applications** and **highly concurrent workloads**.
#### **Key Features:**
- **Connection Pooling** – Efficiently reuses and manages connections to prevent database overload.
- **High Availability** – Supports **Multi-AZ failover** for minimal downtime.
- **Security Integration** – Works with **IAM authentication and AWS Secrets Manager** for credential management.
- **Compatibility** – Supports **MySQL, PostgreSQL, and Aurora** databases.
- **Cost Optimization** – Reduces database compute costs by minimizing idle connections.
#### **Use Cases:**
- **Lambda-based serverless applications** that frequently open and close database connections.
- **Web applications with high concurrent user traffic** requiring **efficient connection management**.
- **Applications needing high availability** with **automatic failover support**.
AWS RDS Proxy helps improve **application responsiveness, reliability, and cost-efficiency** for **database-heavy workloads**. 🚀
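A hedged boto3 sketch of wiring a proxy in front of an existing instance; the role ARN, secret ARN, subnet IDs, and names are assumptions for illustration:
```python
import boto3

rds = boto3.client("rds")

# The proxy authenticates to the database with credentials held in Secrets Manager.
rds.create_db_proxy(
    DBProxyName="app-db-proxy",
    EngineFamily="POSTGRESQL",
    Auth=[{"AuthScheme": "SECRETS",
           "SecretArn": "arn:aws:secretsmanager:us-east-1:123456789012:secret:app-db"}],
    RoleArn="arn:aws:iam::123456789012:role/rds-proxy-role",
    VpcSubnetIds=["subnet-0abc1234", "subnet-0def5678"],
    RequireTLS=True,
)

# Register the existing instance as the proxy's target; applications then
# connect to the proxy endpoint instead of the database endpoint.
rds.register_db_proxy_targets(
    DBProxyName="app-db-proxy",
    DBInstanceIdentifiers=["app-db"],
)
```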
### RDS - Demo
- A brief overview & demo of the RDS service & features.
### Redshift - Main
**Amazon Redshift** is a **fully managed, petabyte-scale data warehouse service** designed for **fast and scalable analytics**. It enables businesses to efficiently **run complex queries on large datasets** using standard SQL and integrates seamlessly with AWS services like **S3, Glue, and Athena**.
#### Key Features:
- **Columnar Storage & Compression** – Optimized for analytical queries by reducing I/O operations.
- **Massively Parallel Processing (MPP)** – Distributes queries across multiple nodes for faster execution.
- **Redshift Spectrum** – Allows querying **data stored in S3** without loading it into Redshift.
- **Concurrency Scaling** – Automatically adds capacity to handle spikes in workload demand.
- **Security & Compliance** – Supports **encryption, VPC isolation, and IAM authentication**.
#### Use Cases:
- **Big data analytics & business intelligence** (BI) workloads.
- **ETL processing** and data lake integration with **AWS Glue**.
- **Machine learning model training** with Redshift ML.
AWS Redshift is **cost-effective, scalable, and optimized for performance**, making it ideal for **high-speed analytical processing** in enterprise environments.
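As a small illustration, the Redshift Data API lets you run SQL without managing JDBC/ODBC connections; the cluster, database, user, and table names below are assumptions:
```python
import time
import boto3

rsd = boto3.client("redshift-data")

# Submit a query asynchronously to a provisioned cluster.
stmt = rsd.execute_statement(
    ClusterIdentifier="analytics-cluster",    # hypothetical cluster
    Database="dev",
    DbUser="awsuser",
    Sql="SELECT order_date, COUNT(*) FROM sales GROUP BY order_date ORDER BY order_date;",
)

# Poll for completion, then fetch the result set.
while rsd.describe_statement(Id=stmt["Id"])["Status"] not in ("FINISHED", "FAILED", "ABORTED"):
    time.sleep(1)
rows = rsd.get_statement_result(Id=stmt["Id"])["Records"]
```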
#### **AWS Redshift - Main Components**
![[AWS_Redshift_Components-0.png]]
##### **1. Cluster**
- A **Redshift cluster** is the fundamental unit, consisting of **one or more nodes**.
- It contains a **leader node** and **compute nodes** to process queries and store data.
##### **2. Nodes**
- **Leader Node**: Manages query coordination, planning, and execution.
- **Compute Nodes**: Store and process data in parallel using **Massively Parallel Processing (MPP)**.
- **Node Types**: Dense Compute (DC2) for high-performance workloads, Dense Storage (DS2) for large datasets, and RA3 for scalable storage.
##### **3. Slices**
- Each **compute node** is divided into **slices** that handle portions of the workload.
- Queries are **distributed across slices** for faster execution.
##### **4. Columnar Storage**
- Stores data **by column rather than row**, reducing I/O and improving query performance.
- Uses **compression** to optimize storage and query speed.
##### **5. Workload Management (WLM)**
- Allows users to define **query queues and priorities** to optimize performance.
- Ensures efficient resource allocation for different workloads.
##### **6. Redshift Spectrum**
- Enables **direct querying of data in Amazon S3** without loading it into Redshift.
- Expands Redshift capabilities for **big data lake analytics**.
##### **7. Security & Encryption**
- Supports **VPC isolation, IAM authentication, and encryption (KMS & HSM)**.
- Ensures **data at rest and in transit** remains secure.
### Redshift - Serverless
**AWS Redshift Serverless** is a **fully managed, on-demand data warehouse service** that allows users to run **analytics at any scale without managing infrastructure**. It **automatically provisions, scales, and optimizes compute capacity** based on query workloads, eliminating the need for manual cluster management.
#### **Key Features:**
- **No Cluster Management** – Automatically scales resources up or down based on demand.
- **Pay-Per-Use Pricing** – Charges based on the actual compute used rather than pre-provisioned capacity.
- **Automatic Scaling & Optimization** – Adapts resources dynamically to match workload requirements.
- **Seamless Data Integration** – Works with **Amazon S3, Glue, Athena, and BI tools** for ETL and analysis.
- **Security & Compliance** – Supports **IAM authentication, VPC isolation, and encryption** for secure data processing.
#### **Use Cases:**
- **Ad-hoc analytics** without the need for dedicated Redshift clusters.
- **Data warehousing for unpredictable or variable workloads**.
- **Business intelligence (BI) and data lake queries using Redshift Spectrum**.
Redshift Serverless is ideal for **organizations needing scalable, cost-efficient, and fully automated data analytics** without the operational overhead of managing infrastructure.
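The Data API shown earlier also works against a serverless workgroup — pass `WorkgroupName` instead of a cluster identifier and database user (names here are assumptions):
```python
import boto3

rsd = boto3.client("redshift-data")

stmt = rsd.execute_statement(
    WorkgroupName="analytics-wg",    # hypothetical serverless workgroup
    Database="dev",
    Sql="SELECT COUNT(*) FROM sales;",
)
```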
### DynamoDB
- Reference [[NoSQL Databases]].
**Amazon DynamoDB** is a **fully managed NoSQL database service** designed for **high-performance, scalability, and low-latency applications**. It supports **key-value and document data models** and delivers **millisecond response times** at any scale. DynamoDB is **serverless**, meaning it automatically **scales, replicates data across multiple Availability Zones (AZs), and handles performance tuning** without manual intervention.
#### **Key Features:**
- **Fully managed & serverless** – No infrastructure management required.
- **Scalability & high availability** – Automatically scales throughput and storage.
- **Low-latency performance** – Single-digit millisecond response times.
- **Flexible data model** – Supports **key-value and document-based storage**.
- **Built-in security & compliance** – IAM authentication, encryption, and VPC integration.
- **Advanced capabilities** – DynamoDB Streams, Global Tables, and On-Demand mode for dynamic scaling.
#### **Use Cases:**
- **Real-time applications** – Gaming leaderboards, IoT device tracking, session storage.
- **Serverless applications** – Backend for Lambda, API Gateway, and mobile apps.
- **High-traffic workloads** – E-commerce, financial transactions, and personalized content delivery.
#### Summary
**Amazon DynamoDB** is a **highly scalable, fully managed NoSQL database service** that provides **fast, consistent, and low-latency performance**. It supports **key-value and document-based data models**, making it ideal for **high-traffic and real-time applications**. With **built-in security, automatic scaling, and global replication**, DynamoDB is a **powerful solution for modern cloud-native applications** requiring **high availability and reliability**.
#### **AWS DynamoDB Key Types**
- **Primary Key**
- A unique identifier for each item in a DynamoDB table.
- Two types of primary keys:
- **Partition Key (Hash Key)**
- **Composite Key (Partition Key + Sort Key)**
- **Partition Key (Hash Key)**
- A single attribute that uniquely identifies each item in the table.
- Used for fast lookup and determines how data is distributed across storage nodes.
- Example: A `UserID` as the partition key in a **Users table**.
- **Composite Primary Key (Partition Key + Sort Key)**
- A combination of **Partition Key** and **Sort Key** to create a unique identifier.
- Allows multiple items with the same **Partition Key**, but each must have a unique **Sort Key**.
- Example: A `CustomerID` as the partition key and `OrderDate` as the sort key in an **Orders table**.
- **Sort Key (Range Key)**
- Used with a **Partition Key** to enable efficient querying of sorted data.
- Supports range queries (e.g., retrieving all orders for a customer within a specific date range).
- **Global Secondary Index (GSI)**
- An index with a different partition key than the table’s primary key.
- Allows flexible querying by duplicating data into an alternative index.
- Example: A **Users table** with a primary key `UserID` but a GSI on `EmailAddress` to allow lookups by email.
- **Local Secondary Index (LSI)**
- An index that shares the **Partition Key** with the main table but has a different **Sort Key**.
- Enables efficient querying of **sorted data within a partition**.
- Example: An **Orders table** with `CustomerID` as the partition key and an LSI on `OrderStatus` to filter orders by status.
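A minimal boto3 sketch mirroring the Orders example above — a composite primary key plus an LSI on `OrderStatus`; table and attribute values are illustrative:
```python
import boto3
from boto3.dynamodb.conditions import Key

dynamodb = boto3.resource("dynamodb")

table = dynamodb.create_table(
    TableName="Orders",
    AttributeDefinitions=[
        {"AttributeName": "CustomerID", "AttributeType": "S"},
        {"AttributeName": "OrderDate", "AttributeType": "S"},
        {"AttributeName": "OrderStatus", "AttributeType": "S"},
    ],
    KeySchema=[
        {"AttributeName": "CustomerID", "KeyType": "HASH"},    # partition key
        {"AttributeName": "OrderDate", "KeyType": "RANGE"},    # sort key
    ],
    LocalSecondaryIndexes=[{
        "IndexName": "StatusIndex",
        "KeySchema": [
            {"AttributeName": "CustomerID", "KeyType": "HASH"},
            {"AttributeName": "OrderStatus", "KeyType": "RANGE"},
        ],
        "Projection": {"ProjectionType": "ALL"},
    }],
    BillingMode="PAY_PER_REQUEST",
)
table.wait_until_exists()

# Range query on the sort key: all 2024 orders for one customer.
resp = table.query(
    KeyConditionExpression=Key("CustomerID").eq("C-1001")
    & Key("OrderDate").between("2024-01-01", "2024-12-31")
)
```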
#### DynamoDB Streams
**DynamoDB Streams** is a **change data capture (CDC) feature** that captures a **time-ordered sequence of item-level modifications** in a DynamoDB table. It allows applications to **react to data changes in near real-time** by triggering AWS services like **Lambda, Kinesis, or EC2-based consumers**.
##### **Key Features**
- Captures **INSERT, UPDATE, and DELETE** operations on a DynamoDB table.
- Stores change records for **up to 24 hours**.
- Each change appears **exactly once** in the stream, in modification order per item; consumers such as Lambda process records with **at-least-once** semantics.
- Enables event-driven architectures with **AWS Lambda triggers**.
- Supports different stream view types:
- **Keys only** – Captures only primary keys.
- **New image** – Captures the entire new version of the item.
- **Old image** – Captures the item's previous version before the change.
- **New and old images** – Captures both before and after images.
##### **Use Cases**
- **Event-driven processing** – Triggering Lambda functions for real-time updates.
- **Replication & synchronization** – Copying changes to another database or storage system.
- **Auditing & logging** – Keeping a history of changes for compliance.
- **Analytics & monitoring** – Processing data changes for reporting and insights.
DynamoDB Streams enables **efficient real-time processing** of database changes, making it a powerful tool for **serverless applications, analytics, and data replication**.
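A short sketch of the pieces involved: enabling the stream on a table, plus a Lambda handler that receives batches of change records (the event source mapping between the stream and the function is configured separately). Table and field names are assumptions:
```python
import boto3

dynamodb = boto3.client("dynamodb")

# Enable the stream, capturing both the old and new item images.
dynamodb.update_table(
    TableName="Orders",
    StreamSpecification={"StreamEnabled": True, "StreamViewType": "NEW_AND_OLD_IMAGES"},
)

# Lambda handler invoked by the stream's event source mapping.
def handler(event, context):
    for record in event["Records"]:
        if record["eventName"] == "MODIFY":
            old_image = record["dynamodb"].get("OldImage", {})
            new_image = record["dynamodb"].get("NewImage", {})
            print(f"Item changed: {old_image} -> {new_image}")
```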
#### **DynamoDB Table Classes**
DynamoDB offers two **table classes** to optimize cost and performance based on workload patterns:
##### **1. Standard Table Class**
- **Best for**: Frequently accessed data with unpredictable or high read/write traffic.
- **Storage Cost**: **Higher** compared to the Infrequent Access table class.
- **Request Cost**: **Lower** for read/write operations, making it cost-effective for high-traffic applications.
- **Use Cases**:
- Web and mobile applications with **consistent and high traffic**.
- **Gaming leaderboards**, real-time analytics, and e-commerce applications.
##### **2. Standard-Infrequent Access (Standard-IA) Table Class**
- **Best for**: Tables with low read/write activity but large data storage needs.
- **Storage Cost**: **Lower**, making it ideal for long-term data retention.
- **Request Cost**: **Higher** per read/write operation.
- **Use Cases**:
- **Archival or historical data** that is infrequently accessed.
- **Compliance and audit logs** that need long-term storage.
##### **Key Differences**
| Feature | Standard Table Class | Standard-IA Table Class |
| ---------------- | --------------------------- | ------------------------------- |
| **Storage Cost** | Higher | Lower |
| **Request Cost** | Lower | Higher |
| **Use Case** | High-traffic apps | Infrequent data access |
| **Best For** | Web, gaming, real-time apps | Archival, logs, historical data |
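The table class is just a table setting — a hedged one-liner with a hypothetical table name:
```python
import boto3

dynamodb = boto3.client("dynamodb")

# Switch an existing table to Standard-IA (or pass TableClass when calling create_table).
dynamodb.update_table(
    TableName="AuditLogs",                       # hypothetical table
    TableClass="STANDARD_INFREQUENT_ACCESS",     # default class is "STANDARD"
)
```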
### DynamoDB - Demo
- A brief overview & demo of DynamoDB & features.
### LAB - Scaling DynamoDB using Autoscaling
- Task Request - Part 0:
- Open the DynamoDB console at https://console.aws.amazon.com/dynamodb/.
- In the navigation pane on the left side of the console, choose Dashboard.
- On the right side of the console, choose Create Table.
- Enter the table details as follows:
- a. For the table name, enter Music.
- b. For the partition key, enter Artist.
- c. Enter SongTitle as the sort key.
- d. Leave Default settings selected.
- Choose Create to create the table.
- Task Request - Part 1:
- Open the DynamoDB console at https://console.aws.amazon.com/dynamodb/.
- In the navigation pane on the left side of the console, choose Tables.
- In the table list, choose the Music table.
- Select Explore Table Items.
- In the Items view, choose Create item.
- Enter the following values for your item:
- a. For Artist, enter **KodeKloud** as the value.
- b. For SongTitle, enter **AWS Certified Solutions Architect**.
- Task Request - Part 2:
- Delete the Music table.
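The console steps above create and delete the table; the auto scaling referenced in the lab title can also be configured programmatically through Application Auto Scaling. A hedged sketch, assuming the Music table uses provisioned (not on-demand) capacity:
```python
import boto3

autoscaling = boto3.client("application-autoscaling")

# Register the table's read capacity as a scalable target.
autoscaling.register_scalable_target(
    ServiceNamespace="dynamodb",
    ResourceId="table/Music",
    ScalableDimension="dynamodb:table:ReadCapacityUnits",
    MinCapacity=5,
    MaxCapacity=100,
)

# Target tracking: keep consumed reads at roughly 70% of provisioned capacity.
autoscaling.put_scaling_policy(
    PolicyName="MusicReadScaling",
    ServiceNamespace="dynamodb",
    ResourceId="table/Music",
    ScalableDimension="dynamodb:table:ReadCapacityUnits",
    PolicyType="TargetTrackingScaling",
    TargetTrackingScalingPolicyConfiguration={
        "TargetValue": 70.0,
        "PredefinedMetricSpecification": {
            "PredefinedMetricType": "DynamoDBReadCapacityUtilization"
        },
    },
)
```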
### DynamoDB Accelerator (DAX)
**Amazon DynamoDB Accelerator (DAX)** is a **fully managed, in-memory caching service** designed to **accelerate read performance** for DynamoDB. It **reduces response times from milliseconds to microseconds** by caching frequently accessed data, making it ideal for **high-read and low-latency applications**.
#### **Key Features**
- **In-memory cache** – Stores frequently accessed data for ultra-fast lookups.
- **Microsecond response times** – Speeds up read-heavy workloads.
- **Fully managed** – No need to manage caching infrastructure.
- **Highly scalable** – Supports millions of requests per second.
- **Write-through caching** – Writes update DynamoDB and the cache together; strongly consistent reads bypass DAX and are served directly by DynamoDB.
- **Integration with existing DynamoDB API** – Requires minimal code changes.
#### **Use Cases**
- **Real-time applications** – Gaming, social media feeds, and recommendation engines.
- **High-traffic APIs** – E-commerce product catalogs and IoT data retrieval.
- **Latency-sensitive workloads** – Applications requiring sub-millisecond response times.
DAX enhances **DynamoDB performance and scalability**, making it a powerful solution for **low-latency, high-read applications** without impacting database consistency.
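A hedged boto3 sketch of provisioning a small DAX cluster; the role ARN, subnet group, and sizing are assumptions. Applications then read through the cluster endpoint using the DAX SDK client (for Python, the `amazondax` package) in place of the plain DynamoDB client:
```python
import boto3

dax = boto3.client("dax")

dax.create_cluster(
    ClusterName="orders-dax",
    NodeType="dax.t3.small",
    ReplicationFactor=3,      # one primary node plus two read replicas spread across AZs
    IamRoleArn="arn:aws:iam::123456789012:role/dax-dynamodb-access",
    SubnetGroupName="dax-subnet-group",
)
```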
### OpenSearch
**Amazon OpenSearch Service** is a **fully managed search and analytics service** that enables users to perform **fast, scalable searches on large datasets**. It is based on **OpenSearch (a fork of Elasticsearch)** and provides real-time indexing, full-text search, and data visualization capabilities.
#### **Key Features**
- **Full-Text Search** – Supports powerful text-based querying across structured and unstructured data.
- **Real-Time Analytics** – Enables log monitoring, observability, and security analytics.
- **Scalability & High Availability** – Supports **multi-AZ deployments, automated scaling, and backups**.
- **Built-in Security** – Integrates with **IAM, VPC, encryption, and fine-grained access control**.
- **Integration with AWS Services** – Works with **Kinesis, CloudWatch, S3, and DynamoDB Streams**.
- **OpenSearch Dashboards** – A Kibana-derived visualization tool for intuitive data exploration and monitoring.
#### **Use Cases**
- **Log & Event Monitoring** – Analyzing application logs and security data.
- **Enterprise Search** – Powering search functionality for websites and internal data.
- **Real-Time Data Analytics** – Processing and visualizing business and IoT data.
- **Security Analytics** – Detecting anomalies in security logs.
AWS OpenSearch offers **powerful, real-time search and analytics capabilities**, making it ideal for **log management, observability, and search-driven applications**.
#### Types of OpenSearch
- AWS OpenSearch Service
- AWS OpenSearch Serverless
#### **AWS OpenSearch Components Overview**
AWS OpenSearch consists of several key components that enable **search, analytics, and real-time data processing**.
##### **1. Cluster**
- A collection of **OpenSearch nodes** working together.
- Manages **indexing, searching, and storing data**.
- Can be deployed in a **single-AZ or multi-AZ** setup for high availability.
##### **2. Nodes**
- **Master Node** – Manages cluster health, node allocation, and metadata.
- **Data Node** – Stores indexed data and processes search queries.
- **Ingest Node** – Pre-processes data before indexing (used for transformations).
##### **3. Indices**
- Logical collections of **documents** similar to tables in a relational database.
- Each index contains **shards**, which store subsets of data.
- Supports **replication** for fault tolerance and performance optimization.
##### **4. Shards & Replicas**
- **Shards** – Break data into smaller segments for parallel processing.
- **Replicas** – Duplicate shards for **fault tolerance** and **load balancing**.
- The number of **shards and replicas** can be configured per index.
##### **5. Query DSL (Domain-Specific Language)**
- OpenSearch uses **JSON-based queries** for powerful **search and filtering**.
- Supports **full-text search, structured queries, aggregations, and ranking**.
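A small example of that Query DSL using the `opensearch-py` client; the domain endpoint and credentials are placeholders, and fine-grained access control with a master user is assumed purely for illustration:
```python
from opensearchpy import OpenSearch

client = OpenSearch(
    hosts=[{"host": "search-app-logs-xxxx.us-east-1.es.amazonaws.com", "port": 443}],
    http_auth=("admin", "change-me-please"),   # placeholder credentials
    use_ssl=True,
)

# Index a document, then run a full-text match query with a terms aggregation.
client.index(index="app-logs", id="1",
             body={"service": "checkout", "level": "ERROR", "msg": "payment timeout"})
resp = client.search(
    index="app-logs",
    body={
        "query": {"match": {"msg": "timeout"}},
        "aggs": {"by_level": {"terms": {"field": "level.keyword"}}},
    },
)
```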
##### **6. OpenSearch Dashboards**
- A visualization tool similar to **Kibana**, used for monitoring and analyzing data.
- Provides **interactive charts, graphs, and dashboards** for log and metric analysis.
##### **7. Security & Access Control**
- Supports **IAM authentication, VPC integration, fine-grained access control, and encryption**.
- Offers **role-based access (RBAC)** to restrict data access at different levels.
##### **8. Integration with AWS Services**
- **Kinesis Data Firehose** – Streams real-time log data to OpenSearch.
- **CloudWatch Logs** – Sends AWS logs for monitoring and analysis.
- **S3 & DynamoDB Streams** – Enables real-time indexing from AWS data sources.
#### **AWS OpenSearch Integrations**
AWS OpenSearch integrates seamlessly with various AWS services to enhance **search, analytics, and monitoring capabilities**.
##### **1. Data Ingestion & Streaming**
- **Amazon Kinesis Data Firehose** – Streams real-time log and event data into OpenSearch for indexing.
- **AWS Glue** – ETL service that transforms and loads structured/unstructured data into OpenSearch.
- **Amazon S3** – Index and analyze logs, documents, and large datasets stored in S3.
- **Amazon DynamoDB Streams** – Captures table changes and sends them to OpenSearch for indexing.
##### **2. Monitoring & Observability**
- **Amazon CloudWatch Logs** – Streams AWS service logs to OpenSearch for monitoring and alerting.
- **AWS Lambda** – Processes and sends logs, metrics, and events to OpenSearch for real-time analysis.
- **AWS X-Ray** – Traces application requests and integrates with OpenSearch for distributed tracing analysis.
##### **3. Security & Access Control**
- **AWS IAM** – Manages authentication and access control for OpenSearch.
- **AWS Key Management Service (KMS)** – Encrypts OpenSearch data at rest.
- **Amazon Cognito** – Provides user authentication and access control for OpenSearch Dashboards.
##### **4. Machine Learning & AI**
- **Amazon SageMaker** – Uses OpenSearch as a data source for training and deploying ML models.
- **Kendra Search Integration** – Enhances OpenSearch-based search applications with AI-driven enterprise search.
##### **5. Business Intelligence & Analytics**
- **Amazon QuickSight** – Connects to OpenSearch for interactive data visualization and analytics.
- **OpenSearch Dashboards** – Built-in visualization tool for creating dashboards and monitoring search trends.
##### **6. Application & API Integration**
- **Amazon API Gateway** – Enables RESTful API search endpoints backed by OpenSearch.
- **Amazon EventBridge** – Routes AWS events to OpenSearch for real-time processing.
AWS OpenSearch's deep integration with **AWS services** makes it a **powerful tool for real-time analytics, log management, security monitoring, and machine learning applications**.
### OpenSearch - Demo
- A brief overview & demo of the OpenSearch service & features.
- Reference OpenSearch documentation - Getting Started
### ElastiCache
AWS ElastiCache is a **fully managed, in-memory caching service** that improves application performance by reducing **latency and database load**. It supports two caching engines: **Memcached** and **Redis**, both designed to **accelerate read-heavy and compute-intensive workloads**.
#### Key Features
- Low-latency, high-throughput performance for real-time applications.
- Fully managed, including patching, monitoring, and scaling.
- Supports Multi-AZ replication for high availability and disaster recovery.
- Integration with AWS services like RDS, DynamoDB, and Lambda.
- Enhanced security with VPC isolation, IAM authentication, and encryption.
#### ElastiCache Engines
- **[[AWS SAA - Services - Databases#Memcached Overview|Memcached]]** – Simple, high-speed caching for ephemeral data storage.
- **[[AWS SAA - Services - Databases#Redis Overview|Redis]]** – Advanced caching with **persistence, data replication, and pub/sub messaging**.
#### Use Cases
- Database query caching to reduce load on relational or NoSQL databases.
- Real-time session storage for gaming, e-commerce, and web applications.
- Leaderboards, recommendation engines, and machine learning inference caching.
- Distributed locking and messaging with Redis for microservices architectures.
AWS ElastiCache helps **optimize performance, scale applications, and reduce database costs**, making it ideal for **real-time applications and in-memory processing**.
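A minimal cache-aside sketch with `redis-py` against a (hypothetical) ElastiCache endpoint; the database lookup is a stand-in stub:
```python
import json
import redis

# TLS-enabled ElastiCache Redis endpoint (placeholder hostname).
cache = redis.Redis(host="my-cache.xxxxxx.use1.cache.amazonaws.com", port=6379, ssl=True)

def fetch_product_from_db(product_id: str) -> dict:
    # Stand-in for a relational/NoSQL query; replace with a real lookup.
    return {"id": product_id, "name": "example"}

def get_product(product_id: str) -> dict:
    """Cache-aside: check the cache first, fall back to the database, then populate the cache."""
    key = f"product:{product_id}"
    cached = cache.get(key)
    if cached:
        return json.loads(cached)
    product = fetch_product_from_db(product_id)
    cache.setex(key, 300, json.dumps(product))   # cache for 5 minutes
    return product
```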
#### Redis Overview
Redis (**Remote Dictionary Server**) is an **open-source, in-memory data store** known for **high-performance caching, persistence, and advanced data structures**. It supports **key-value storage**, real-time analytics, and **pub/sub messaging**.
##### Key Features
- **Persistence** – Supports AOF (Append-Only File) and RDB (Snapshot) persistence options.
- **Data Structures** – Supports strings, lists, sets, sorted sets, hashes, bitmaps, and more.
- **Replication & Clustering** – Multi-AZ, read replicas, and automatic failover for high availability.
- **Transactions & Scripting** – Uses Lua scripting and atomic transactions.
##### Use Cases
- **Session storage** for web applications and gaming.
- **Message brokering** via **pub/sub** messaging.
- **Leaderboard rankings** for real-time applications.
- **Real-time analytics** and machine learning caching.
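As a quick illustration of Redis data structures, a sorted set makes a natural leaderboard (endpoint and scores are illustrative):
```python
import redis

r = redis.Redis(host="localhost", port=6379)   # or an ElastiCache Redis endpoint

# ZADD updates scores; ZREVRANGE returns the highest-ranked members.
r.zadd("leaderboard", {"alice": 1200, "bob": 950, "carol": 1425})
top3 = r.zrevrange("leaderboard", 0, 2, withscores=True)
print(top3)   # e.g. [(b'carol', 1425.0), (b'alice', 1200.0), (b'bob', 950.0)]
```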
#### Memcached Overview
Memcached is a **high-performance, distributed memory caching system** designed for **simple key-value storage and rapid data retrieval**. It is best suited for **stateless, ephemeral caching** with **horizontal scaling**.
##### Key Features
- **Simple key-value caching** with no complex data structures.
- **Multi-threaded architecture** for handling large workloads efficiently.
- **Horizontal scaling** across multiple nodes for distributed caching.
- **No persistence** – Data is lost when the server restarts.
##### Use Cases
- **Database query caching** to reduce load and response time.
- **Object caching** for web applications and API responses.
- **Session caching** for user authentication and shopping carts.
- **Dynamic content caching** for frequently accessed data.
#### Redis vs. Memcached
| Feature | Redis | Memcached |
| ---------------------------- | ------------------------------------------- | -------------------------- |
| **Persistence** | Yes (AOF, RDB) | No |
| **Data Structures** | Advanced (Lists, Sets, Hashes, Sorted Sets) | Simple (Key-Value) |
| **Replication & Clustering** | Yes | No |
| **Scaling** | Read Replicas, Clustering | Horizontal Scaling |
| **Best Use Case** | Complex caching, messaging, analytics | Simple, high-speed caching |
**Redis is best for advanced caching, real-time analytics, and messaging**, while **Memcached is ideal for lightweight, high-speed caching** with simple key-value storage.
### MemoryDB for Redis
AWS MemoryDB for Redis is a **fully managed, durable, in-memory database service** designed for **ultra-fast performance with high availability and persistence**. Unlike ElastiCache for Redis, MemoryDB **keeps the full dataset in memory and persists every write to a durable Multi-AZ transaction log**, ensuring **low-latency performance** while maintaining data integrity.
#### Key Features
- **In-memory performance** with sub-millisecond latency.
- **Multi-AZ durability** with automatic replication and failover.
- **Full Redis compatibility** with support for native Redis data structures and commands.
- **Strong consistency** using distributed transactional log storage.
- **Encryption & Security** with AWS IAM, KMS encryption, and VPC integration.
#### Use Cases
- **Caching with persistence** – Ideal for applications needing high-speed data access with durability.
- **Real-time applications** – Leaderboards, session storage, and financial transaction processing.
- **Event-driven microservices** – Message queues and real-time analytics.
- **Gaming & AI inference** – Low-latency access for fast decision-making.
AWS MemoryDB for Redis provides **the speed of Redis with built-in durability**, making it ideal for applications requiring **high-performance, real-time processing with persistent data storage**.
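A hedged boto3 sketch of provisioning a MemoryDB cluster; the ACL, subnet group, and sizing are assumptions:
```python
import boto3

memorydb = boto3.client("memorydb")

memorydb.create_cluster(
    ClusterName="orders-memorydb",
    NodeType="db.r6g.large",
    ACLName="open-access",                  # default ACL; production should define scoped users
    NumShards=1,
    NumReplicasPerShard=1,                  # replicas are placed in other AZs
    SubnetGroupName="memorydb-subnet-group",
    TLSEnabled=True,
)
```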