# AWS SAA - Services - Databases
### Database - Agenda and Introduction
- Database Types ![[AWS-SAA-DB-Types.png]]
- Reference [[AWS Cloud Practitioner#Core AWS Services - Database|Core AWS Services - Databases]]
### AWS RDS
- **Amazon RDS (Relational Database Service)** is a **fully managed database service** that simplifies the setup, operation, and scaling of **relational databases** in the cloud. It supports multiple database engines, including **MySQL, PostgreSQL, MariaDB, SQL Server, Oracle, and Amazon Aurora**. RDS automates **database provisioning, backups, patching, and scaling**, while offering features like **multi-AZ deployments, read replicas, and encryption** for high availability and security. It is ideal for applications requiring **scalable, managed relational databases without administrative overhead**.
- AWS **RDS Instance Types** are categorized based on workload requirements. Below are the **General Purpose** and **Memory Optimized** instance types:
#### 1. General Purpose (Balanced Performance & Cost)
Designed for a broad range of workloads with moderate performance needs.
- **db.t3** – Burstable performance instances with baseline CPU credits.
- **db.t4g** – ARM-based (Graviton2) burstable instances for cost efficiency.
- **db.m5** – Intel-based instances with a balance of compute, memory, and networking.
- **db.m6g** – Graviton2-based general-purpose instances for improved efficiency.
#### 2. Memory Optimized (High-Performance Databases Needing More RAM)
Designed for workloads that require **high memory-to-CPU ratio**, like analytics and large caching applications.
- **db.r5** – High-memory instances for large-scale databases and analytics.
- **db.r6g** – Graviton2-based memory-optimized instances for better cost efficiency.
- **db.r6i** – Intel-based high-memory instances with increased performance.
- **db.x2g** – Memory-optimized, Graviton2-based for extremely high RAM needs.
- **db.x2idn / db.x2iedn** – High-performance memory-optimized instances for enterprise workloads.
#### AWS RDS Deployment Types
1. **Single-AZ Deployment**
- A **single database instance** running in one Availability Zone (AZ).
- Suitable for **development, testing, and non-critical workloads**.
- Lower cost but no automatic failover in case of failure.
2. **Multi-AZ Deployment (High Availability)**
- A **primary database instance** is synchronously replicated to a **standby instance** in a different AZ.
- Provides **automatic failover** in case of instance failure.
- Used for **production workloads** requiring high availability.
3. **Read Replicas (Scalability & Performance Optimization)**
- Asynchronous replicas of the primary database used to **offload read traffic**.
- Can be deployed across **multiple AZs or Regions**.
- Helps improve **read performance** for applications with heavy read workloads.
4. **Multi-Region Deployment**
- A **primary database in one region** with **read replicas in another region**.
- Provides **disaster recovery and global application scaling**.
- Supports **manual promotion** of a read replica to primary in case of failure.
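A minimal boto3 sketch of the Multi-AZ and Read Replica options above; the identifiers, instance class, and credentials are placeholders rather than values from this course:
```python
import boto3

rds = boto3.client("rds", region_name="us-east-1")

# Multi-AZ primary: RDS keeps a synchronous standby in a second AZ and fails over automatically.
rds.create_db_instance(
    DBInstanceIdentifier="app-db",             # hypothetical identifier
    Engine="postgres",
    DBInstanceClass="db.m5.large",
    AllocatedStorage=100,
    MasterUsername="dbadmin",
    MasterUserPassword="change-me-please",     # placeholder only
    MultiAZ=True,
    StorageType="gp3",
    BackupRetentionPeriod=7,
)

# Read replica: an asynchronous copy used to offload read traffic from the primary.
rds.create_db_instance_read_replica(
    DBInstanceIdentifier="app-db-replica-1",
    SourceDBInstanceIdentifier="app-db",
)
```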
#### Blue/Green Deployments
- **AWS RDS Blue/Green Deployments** is a **fully managed feature** that simplifies **safe database updates** by creating a **blue (current) environment** and a **green (staging) environment**. The **green environment** is an identical copy of the production database where updates, schema changes, and new configurations can be tested without affecting live traffic. When ready, a **zero-downtime switchover** redirects traffic from the blue to the green environment, ensuring a **fast rollback option** in case of issues. This deployment strategy enhances **resiliency, minimizes risk, and improves availability** for database upgrades and maintenance.
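A rough boto3 sketch of that workflow, assuming an existing source instance ARN; the names, target version, and timeout are illustrative only:
```python
import boto3

rds = boto3.client("rds")

# Create the green (staging) environment as a copy of the blue (production) database.
bg = rds.create_blue_green_deployment(
    BlueGreenDeploymentName="app-db-upgrade",                      # hypothetical name
    Source="arn:aws:rds:us-east-1:123456789012:db:app-db",         # ARN of the blue database
    TargetEngineVersion="15.4",                                    # version to validate in green
)

# After testing the green environment, switch production traffic over to it.
rds.switchover_blue_green_deployment(
    BlueGreenDeploymentIdentifier=bg["BlueGreenDeployment"]["BlueGreenDeploymentIdentifier"],
    SwitchoverTimeout=300,   # seconds allowed before the switchover attempt is abandoned
)
```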
#### AWS RDS Storage Types
- AWS RDS offers different storage types optimized for various workloads:
1. **General Purpose (SSD) – GP3 & GP2**
- **GP3**: Latest generation, offers **provisioned performance** independent of storage size. Cost-effective and suitable for **most workloads**.
- **GP2**: Legacy option with **baseline and burstable IOPS**, scaling with storage size.
2. **Provisioned IOPS (SSD) – IO1 & IO2**
- Designed for **high-performance, I/O-intensive applications**.
- Supports **low-latency, high-throughput workloads** like OLTP databases.
- **IO2** provides better durability than IO1.
3. **Magnetic (Legacy HDD) – Standard Storage**
- Older storage type, **not recommended for new deployments**.
- Suitable for **small, infrequent access workloads**.
- Each storage type balances **performance, cost, and durability**, allowing users to choose based on their database workload needs.
#### AWS RDS Configurations Overview
- AWS RDS provides multiple configuration options to optimize **performance, availability, security, and cost-efficiency**:
1. **Instance Configuration**
- Choose between **General Purpose (t3, m5)** or **Memory-Optimized (r5, x2g)** instance types.
- Configure **CPU, RAM, and storage** based on workload needs.
2. **Storage Configuration**
- Select **GP3 (cost-effective SSD), IO1/IO2 (high-performance SSD), or Magnetic (legacy HDD)**.
- Enable **storage auto-scaling** to dynamically adjust capacity.
3. **Availability & Scaling**
- Deploy as **Single-AZ** (basic availability) or **Multi-AZ** (automatic failover).
- Use **Read Replicas** for **horizontal scaling** of read workloads.
- Implement **Blue/Green Deployments** for seamless database upgrades.
4. **Security Configuration**
- Enable **IAM authentication, SSL/TLS encryption, and AWS KMS** for data security.
- Use **VPC security groups and network ACLs** to control access.
5. **Backup & Maintenance**
- Enable **automated backups** with retention up to **35 days**.
- Use **manual snapshots** for long-term storage.
- Configure **maintenance windows** for scheduled updates.
6. **Monitoring & Performance Optimization**
- Use **Amazon CloudWatch** for real-time monitoring.
- Enable **Performance Insights** to analyze database workload performance.
- Configure **Enhanced Monitoring** for OS-level metrics.
- These configurations help **optimize performance, reliability, and security** for AWS RDS databases.
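A hedged boto3 sketch that ties several of these settings together on an existing instance; the identifier and monitoring role ARN are assumptions:
```python
import boto3

rds = boto3.client("rds")

rds.modify_db_instance(
    DBInstanceIdentifier="app-db",                      # hypothetical instance
    BackupRetentionPeriod=35,                           # maximum automated-backup retention
    PreferredMaintenanceWindow="sun:05:00-sun:06:00",   # weekly UTC maintenance window
    MaxAllocatedStorage=500,                            # storage auto-scaling ceiling (GiB)
    EnablePerformanceInsights=True,
    MonitoringInterval=60,                              # Enhanced Monitoring granularity (seconds)
    MonitoringRoleArn="arn:aws:iam::123456789012:role/rds-monitoring-role",
    ApplyImmediately=False,                             # apply during the maintenance window
)
```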
#### AWS RDS Summary
**Amazon Relational Database Service (RDS)** is a **fully managed database service** that simplifies the setup, operation, and scaling of **relational databases** in the cloud. It supports multiple database engines, including **MySQL, PostgreSQL, MariaDB, Oracle, SQL Server, and Amazon Aurora**.
##### Key Features:
- **Deployment Options:** Single-AZ, Multi-AZ for high availability, Read Replicas for scalability, and Blue/Green Deployments for safe updates.
- **Storage Types:** General Purpose (GP3, GP2), Provisioned IOPS (IO1, IO2), and Magnetic (legacy).
- **Security & Compliance:** IAM authentication, SSL/TLS encryption, KMS for data encryption, and VPC isolation.
- **Backup & Maintenance:** Automated backups (up to 35 days), manual snapshots, and maintenance windows.
- **Performance Monitoring:** CloudWatch, Performance Insights, and Enhanced Monitoring.
##### Use Cases:
- Web and mobile applications requiring scalable relational databases.
- Enterprise applications with high availability and disaster recovery needs.
- Data warehousing and analytics with Amazon Aurora and RDS Read Replicas.
- RDS enables **easy database management** with **automatic scaling, patching, and failover capabilities**, making it a reliable and efficient solution for cloud-based relational databases. 🚀
### AWS Aurora
**Amazon Aurora** is a **fully managed, high-performance relational database service** designed for **high availability, scalability, and cost efficiency**. It is compatible with **MySQL and PostgreSQL**, offering up to **5x better performance** than standard MySQL and **3x better performance** than PostgreSQL, while maintaining compatibility with existing database tools and applications.
#### Key Features:
- **Highly Available & Durable** – Multi-AZ architecture with **automatic failover** and **6-way replication** across 3 Availability Zones.
- **Scalability** – Auto-scales storage up to **128 TB** with read replicas for performance optimization.
- **Performance & Cost Efficiency** – Faster query performance with **serverless and provisioned options**.
- **Security** – Supports **IAM authentication, encryption (at-rest and in-transit), and VPC isolation**.
- **Global Database** – Enables cross-region replication with **low-latency global access**.
#### Use Cases:
- **Enterprise applications** needing high availability and performance.
- **SaaS applications** requiring automatic scaling and multi-region replication.
- **Analytics & reporting** with read replicas for heavy read workloads.
Aurora provides **enhanced performance, reliability, and security** while being **fully managed**, reducing database maintenance overhead.
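A minimal boto3 sketch of how Aurora separates the cluster (the shared storage volume) from its compute instances; identifiers, instance class, and credentials are placeholders:
```python
import boto3

rds = boto3.client("rds")

# The cluster owns the shared, 6-way-replicated storage volume.
rds.create_db_cluster(
    DBClusterIdentifier="app-aurora",
    Engine="aurora-postgresql",
    MasterUsername="dbadmin",
    MasterUserPassword="change-me-please",   # placeholder only
)

# Writer instance attached to the cluster.
rds.create_db_instance(
    DBInstanceIdentifier="app-aurora-writer",
    DBClusterIdentifier="app-aurora",
    Engine="aurora-postgresql",
    DBInstanceClass="db.r6g.large",
)

# A second instance in the same cluster becomes an Aurora Replica (reader).
rds.create_db_instance(
    DBInstanceIdentifier="app-aurora-reader-1",
    DBClusterIdentifier="app-aurora",
    Engine="aurora-postgresql",
    DBInstanceClass="db.r6g.large",
)
```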
#### AWS Aurora Architecture Overview
**Amazon Aurora** is designed with a **distributed, fault-tolerant, and highly scalable architecture** to ensure **high performance, availability, and durability**.
##### Key Architectural Components:
1. **Storage Layer (Distributed & Auto-Scaling)**
- Aurora **automatically scales storage** up to **128 TB** per database instance.
- **6-way replication** across **3 Availability Zones (AZs)** ensures fault tolerance.
- **Log-based storage architecture** optimizes durability and crash recovery.
2. **Compute Layer (Decoupled & Scalable)**
- Aurora runs on **separate compute instances**, allowing independent scaling.
- Supports **Aurora Auto Scaling** for adjusting read replicas based on demand.
- **Aurora Serverless** enables automatic compute scaling based on usage.
3. **Replication & High Availability**
- **Multi-AZ support** with automatic failover to a standby replica in case of failure.
- **Aurora Replicas (up to 15)** for **read scaling** and disaster recovery.
- **Global Database** allows replication across regions with sub-second latency.
4. **Performance Optimizations**
- **Parallel query execution** and **low-latency data access** improve query performance.
- **Buffer cache survives instance restarts**, reducing recovery time.
##### Use Cases:
- **Mission-critical applications** requiring high availability and resilience.
- **SaaS and enterprise applications** needing global scalability.
- **Big data analytics** with high throughput and multiple read replicas.
Aurora’s **unique architecture** combines the scalability and cost-effectiveness of open-source databases with the reliability and performance of commercial databases.
#### AWS Aurora Types
##### 1. Aurora Standard (Provisioned)
- Traditional **provisioned** database instances with manually allocated compute resources.
- Ideal for **predictable workloads** requiring **high availability and performance**.
- Supports **Multi-AZ deployments**, **replicas**, and **automatic failover**.
##### 2. Aurora Serverless v2
- **Auto-scales compute resources** based on application demand.
- Eliminates the need to manage database capacity manually.
- Cost-efficient for **variable or unpredictable workloads**.
- Supports **Multi-AZ for high availability** and **replica scaling**.
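A brief boto3 sketch of a Serverless v2 cluster: capacity is declared on the cluster as a min/max range of Aurora Capacity Units (ACUs), and instances use the special `db.serverless` class. Names and limits below are assumptions:
```python
import boto3

rds = boto3.client("rds")

rds.create_db_cluster(
    DBClusterIdentifier="app-aurora-sv2",         # hypothetical name
    Engine="aurora-postgresql",
    MasterUsername="dbadmin",
    MasterUserPassword="change-me-please",        # placeholder only
    ServerlessV2ScalingConfiguration={"MinCapacity": 0.5, "MaxCapacity": 16},
)

rds.create_db_instance(
    DBInstanceIdentifier="app-aurora-sv2-writer",
    DBClusterIdentifier="app-aurora-sv2",
    Engine="aurora-postgresql",
    DBInstanceClass="db.serverless",              # scales within the ACU range above
)
```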
##### 3. Aurora Global Database
- **Multi-region replication** with sub-second latency.
- Allows fast **disaster recovery** and **low-latency access** for global applications.
- Supports **failover across Regions** by promoting a secondary Region's cluster to primary.
#### Aurora v1 and v2 Differences
![[AWS_RDS_Aurora_v1v2_Differences.png]]
- Aurora Serverless v1 is essentially phased out; Serverless v2 is the recommended option.
### RDS - RDS proxy
**AWS RDS Proxy** is a **fully managed database proxy** that improves the **scalability, security, and availability** of **Amazon RDS** and **Aurora databases**. It **pools and manages database connections**, reducing the overhead on databases from **excessive open connections** and improving performance for **serverless applications** and **highly concurrent workloads**.
#### **Key Features:**
- **Connection Pooling** – Efficiently reuses and manages connections to prevent database overload.
- **High Availability** – Supports **Multi-AZ failover** for minimal downtime.
- **Security Integration** – Works with **IAM authentication and AWS Secrets Manager** for credential management.
- **Compatibility** – Supports **MySQL, PostgreSQL, and Aurora** databases.
- **Cost Optimization** – Reduces database compute costs by minimizing idle connections.
#### **Use Cases:**
- **Lambda-based serverless applications** that frequently open and close database connections.
- **Web applications with high concurrent user traffic** requiring **efficient connection management**.
- **Applications needing high availability** with **automatic failover support**.
AWS RDS Proxy helps improve **application responsiveness, reliability, and cost-efficiency** for **database-heavy workloads**. 🚀
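A hedged boto3 sketch of wiring a proxy in front of an existing instance; the role ARN, secret ARN, subnet IDs, and names are assumptions for illustration:
```python
import boto3

rds = boto3.client("rds")

# The proxy authenticates to the database with credentials held in Secrets Manager.
rds.create_db_proxy(
    DBProxyName="app-db-proxy",
    EngineFamily="POSTGRESQL",
    Auth=[{"AuthScheme": "SECRETS",
           "SecretArn": "arn:aws:secretsmanager:us-east-1:123456789012:secret:app-db"}],
    RoleArn="arn:aws:iam::123456789012:role/rds-proxy-role",
    VpcSubnetIds=["subnet-0abc1234", "subnet-0def5678"],
    RequireTLS=True,
)

# Register the existing instance as the proxy's target; applications then
# connect to the proxy endpoint instead of the database endpoint.
rds.register_db_proxy_targets(
    DBProxyName="app-db-proxy",
    DBInstanceIdentifiers=["app-db"],
)
```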
### RDS - Demo
- A brief overview & demo of the RDS service & features.
### Redshift - Main
**Amazon Redshift** is a **fully managed, petabyte-scale data warehouse service** designed for **fast and scalable analytics**. It enables businesses to efficiently **run complex queries on large datasets** using standard SQL and integrates seamlessly with AWS services like **S3, Glue, and Athena**.
#### Key Features:
- **Columnar Storage & Compression** – Optimized for analytical queries by reducing I/O operations.
- **Massively Parallel Processing (MPP)** – Distributes queries across multiple nodes for faster execution.
- **Redshift Spectrum** – Allows querying **data stored in S3** without loading it into Redshift.
- **Concurrency Scaling** – Automatically adds capacity to handle spikes in workload demand.
- **Security & Compliance** – Supports **encryption, VPC isolation, and IAM authentication**.
#### Use Cases:
- **Big data analytics & business intelligence** (BI) workloads.
- **ETL processing** and data lake integration with **AWS Glue**.
- **Machine learning model training** with Redshift ML.
AWS Redshift is **cost-effective, scalable, and optimized for performance**, making it ideal for **high-speed analytical processing** in enterprise environments.
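As a small illustration, the Redshift Data API lets you run SQL without managing JDBC/ODBC connections; the cluster, database, user, and table names below are assumptions:
```python
import time
import boto3

rsd = boto3.client("redshift-data")

# Submit a query asynchronously to a provisioned cluster.
stmt = rsd.execute_statement(
    ClusterIdentifier="analytics-cluster",    # hypothetical cluster
    Database="dev",
    DbUser="awsuser",
    Sql="SELECT order_date, COUNT(*) FROM sales GROUP BY order_date ORDER BY order_date;",
)

# Poll for completion, then fetch the result set.
while rsd.describe_statement(Id=stmt["Id"])["Status"] not in ("FINISHED", "FAILED", "ABORTED"):
    time.sleep(1)
rows = rsd.get_statement_result(Id=stmt["Id"])["Records"]
```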
#### **AWS Redshift - Main Components**
![[AWS_Redshift_Components-0.png]]
##### **1. Cluster**
- A **Redshift cluster** is the fundamental unit, consisting of **one or more nodes**.
- It contains a **leader node** and **compute nodes** to process queries and store data.
##### **2. Nodes**
- **Leader Node**: Manages query coordination, planning, and execution.
- **Compute Nodes**: Store and process data in parallel using **Massively Parallel Processing (MPP)**.
- **Node Types**: Dense Compute (DC2) for high-performance workloads, Dense Storage (DS2) for large datasets, and RA3 for scalable storage.
##### **3. Slices**
- Each **compute node** is divided into **slices** that handle portions of the workload.
- Queries are **distributed across slices** for faster execution.
##### **4. Columnar Storage**
- Stores data **by column rather than row**, reducing I/O and improving query performance.
- Uses **compression** to optimize storage and query speed.
##### **5. Workload Management (WLM)**
- Allows users to define **query queues and priorities** to optimize performance.
- Ensures efficient resource allocation for different workloads.
##### **6. Redshift Spectrum**
- Enables **direct querying of data in Amazon S3** without loading it into Redshift.
- Expands Redshift capabilities for **big data lake analytics**.
##### **7. Security & Encryption**
- Supports **VPC isolation, IAM authentication, and encryption (KMS & HSM)**.
- Ensures **data at rest and in transit** remains secure.
### Redshift - Serverless
**AWS Redshift Serverless** is a **fully managed, on-demand data warehouse service** that allows users to run **analytics at any scale without managing infrastructure**. It **automatically provisions, scales, and optimizes compute capacity** based on query workloads, eliminating the need for manual cluster management.
#### **Key Features:**
- **No Cluster Management** – Automatically scales resources up or down based on demand.
- **Pay-Per-Use Pricing** – Charges based on the actual compute used rather than pre-provisioned capacity.
- **Automatic Scaling & Optimization** – Adapts resources dynamically to match workload requirements.
- **Seamless Data Integration** – Works with **Amazon S3, Glue, Athena, and BI tools** for ETL and analysis.
- **Security & Compliance** – Supports **IAM authentication, VPC isolation, and encryption** for secure data processing.
#### **Use Cases:**
- **Ad-hoc analytics** without the need for dedicated Redshift clusters.
- **Data warehousing for unpredictable or variable workloads**.
- **Business intelligence (BI) and data lake queries using Redshift Spectrum**.
Redshift Serverless is ideal for **organizations needing scalable, cost-efficient, and fully automated data analytics** without the operational overhead of managing infrastructure.
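The Data API shown earlier also works against a serverless workgroup — pass `WorkgroupName` instead of a cluster identifier and database user (names here are assumptions):
```python
import boto3

rsd = boto3.client("redshift-data")

stmt = rsd.execute_statement(
    WorkgroupName="analytics-wg",    # hypothetical serverless workgroup
    Database="dev",
    Sql="SELECT COUNT(*) FROM sales;",
)
```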
### DynamoDB
- Reference [[NoSQL Databases]].
**Amazon DynamoDB** is a **fully managed NoSQL database service** designed for **high-performance, scalability, and low-latency applications**. It supports **key-value and document data models** and delivers **millisecond response times** at any scale. DynamoDB is **serverless**, meaning it automatically **scales, replicates data across multiple Availability Zones (AZs), and handles performance tuning** without manual intervention.
#### **Key Features:**
- **Fully managed & serverless** – No infrastructure management required.
- **Scalability & high availability** – Automatically scales throughput and storage.
- **Low-latency performance** – Single-digit millisecond response times.
- **Flexible data model** – Supports **key-value and document-based storage**.
- **Built-in security & compliance** – IAM authentication, encryption, and VPC integration.
- **Advanced capabilities** – DynamoDB Streams, Global Tables, and On-Demand mode for dynamic scaling.
#### **Use Cases:**
- **Real-time applications** – Gaming leaderboards, IoT device tracking, session storage.
- **Serverless applications** – Backend for Lambda, API Gateway, and mobile apps.
- **High-traffic workloads** – E-commerce, financial transactions, and personalized content delivery.
#### Summary
**Amazon DynamoDB** is a **highly scalable, fully managed NoSQL database service** that provides **fast, consistent, and low-latency performance**. It supports **key-value and document-based data models**, making it ideal for **high-traffic and real-time applications**. With **built-in security, automatic scaling, and global replication**, DynamoDB is a **powerful solution for modern cloud-native applications** requiring **high availability and reliability**.
#### **AWS DynamoDB Key Types**
- **Primary Key**
- A unique identifier for each item in a DynamoDB table.
- Two types of primary keys:
- **Partition Key (Hash Key)**
- **Composite Key (Partition Key + Sort Key)**
- **Partition Key (Hash Key)**
- A single attribute that uniquely identifies each item in the table.
- Used for fast lookup and determines how data is distributed across storage nodes.
- Example: A `UserID` as the partition key in a **Users table**.
- **Composite Primary Key (Partition Key + Sort Key)**
- A combination of **Partition Key** and **Sort Key** to create a unique identifier.
- Allows multiple items with the same **Partition Key**, but each must have a unique **Sort Key**.
- Example: A `CustomerID` as the partition key and `OrderDate` as the sort key in an **Orders table**.
- **Sort Key (Range Key)**
- Used with a **Partition Key** to enable efficient querying of sorted data.
- Supports range queries (e.g., retrieving all orders for a customer within a specific date range).
- **Global Secondary Index (GSI)**
- An index with a different partition key than the table’s primary key.
- Allows flexible querying by duplicating data into an alternative index.
- Example: A **Users table** with a primary key `UserID` but a GSI on `EmailAddress` to allow lookups by email.
- **Local Secondary Index (LSI)**
- An index that shares the **Partition Key** with the main table but has a different **Sort Key**.
- Enables efficient querying of **sorted data within a partition**.
- Example: An **Orders table** with `CustomerID` as the partition key and an LSI on `OrderStatus` to filter orders by status.
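A minimal boto3 sketch mirroring the Orders example above — a composite primary key plus an LSI on `OrderStatus`; table and attribute values are illustrative:
```python
import boto3
from boto3.dynamodb.conditions import Key

dynamodb = boto3.resource("dynamodb")

table = dynamodb.create_table(
    TableName="Orders",
    AttributeDefinitions=[
        {"AttributeName": "CustomerID", "AttributeType": "S"},
        {"AttributeName": "OrderDate", "AttributeType": "S"},
        {"AttributeName": "OrderStatus", "AttributeType": "S"},
    ],
    KeySchema=[
        {"AttributeName": "CustomerID", "KeyType": "HASH"},    # partition key
        {"AttributeName": "OrderDate", "KeyType": "RANGE"},    # sort key
    ],
    LocalSecondaryIndexes=[{
        "IndexName": "StatusIndex",
        "KeySchema": [
            {"AttributeName": "CustomerID", "KeyType": "HASH"},
            {"AttributeName": "OrderStatus", "KeyType": "RANGE"},
        ],
        "Projection": {"ProjectionType": "ALL"},
    }],
    BillingMode="PAY_PER_REQUEST",
)
table.wait_until_exists()

# Range query on the sort key: all 2024 orders for one customer.
resp = table.query(
    KeyConditionExpression=Key("CustomerID").eq("C-1001")
    & Key("OrderDate").between("2024-01-01", "2024-12-31")
)
```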
#### DynamoDB Streams
**DynamoDB Streams** is a **change data capture (CDC) feature** that captures a **time-ordered sequence of item-level modifications** in a DynamoDB table. It allows applications to **react to data changes in near real-time** by triggering AWS services like **Lambda, Kinesis, or EC2-based consumers**.
##### **Key Features**
- Captures **INSERT, UPDATE, and DELETE** operations on a DynamoDB table.
- Stores change records for **up to 24 hours**.
- Each change appears **exactly once** in the stream, in modification order per item; consumers such as Lambda process records with **at-least-once** semantics.
- Enables event-driven architectures with **AWS Lambda triggers**.
- Supports different stream view types:
- **Keys only** – Captures only primary keys.
- **New image** – Captures the entire new version of the item.
- **Old image** – Captures the item's previous version before the change.
- **New and old images** – Captures both before and after images.
##### **Use Cases**
- **Event-driven processing** – Triggering Lambda functions for real-time updates.
- **Replication & synchronization** – Copying changes to another database or storage system.
- **Auditing & logging** – Keeping a history of changes for compliance.
- **Analytics & monitoring** – Processing data changes for reporting and insights.
DynamoDB Streams enables **efficient real-time processing** of database changes, making it a powerful tool for **serverless applications, analytics, and data replication**.
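A short sketch of the pieces involved: enabling the stream on a table, plus a Lambda handler that receives batches of change records (the event source mapping between the stream and the function is configured separately). Table and field names are assumptions:
```python
import boto3

dynamodb = boto3.client("dynamodb")

# Enable the stream, capturing both the old and new item images.
dynamodb.update_table(
    TableName="Orders",
    StreamSpecification={"StreamEnabled": True, "StreamViewType": "NEW_AND_OLD_IMAGES"},
)

# Lambda handler invoked by the stream's event source mapping.
def handler(event, context):
    for record in event["Records"]:
        if record["eventName"] == "MODIFY":
            old_image = record["dynamodb"].get("OldImage", {})
            new_image = record["dynamodb"].get("NewImage", {})
            print(f"Item changed: {old_image} -> {new_image}")
```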
#### **DynamoDB Table Classes**
DynamoDB offers two **table classes** to optimize cost and performance based on workload patterns:
##### **1. Standard Table Class**
- **Best for**: Frequently accessed data with unpredictable or high read/write traffic.
- **Storage Cost**: **Higher** compared to the Infrequent Access table class.
- **Request Cost**: **Lower** for read/write operations, making it cost-effective for high-traffic applications.
- **Use Cases**:
- Web and mobile applications with **consistent and high traffic**.
- **Gaming leaderboards**, real-time analytics, and e-commerce applications.
##### **2. Standard-Infrequent Access (Standard-IA) Table Class**
- **Best for**: Tables with low read/write activity but large data storage needs.
- **Storage Cost**: **Lower**, making it ideal for long-term data retention.
- **Request Cost**: **Higher** per read/write operation.
- **Use Cases**:
- **Archival or historical data** that is infrequently accessed.
- **Compliance and audit logs** that need long-term storage.
##### **Key Differences**
| Feature | Standard Table Class | Standard-IA Table Class |
| ---------------- | --------------------------- | ------------------------------- |
| **Storage Cost** | Higher | Lower |
| **Request Cost** | Lower | Higher |
| **Use Case** | High-traffic apps | Infrequent data access |
| **Best For** | Web, gaming, real-time apps | Archival, logs, historical data |
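The table class is just a table setting — a hedged one-liner with a hypothetical table name:
```python
import boto3

dynamodb = boto3.client("dynamodb")

# Switch an existing table to Standard-IA (or pass TableClass when calling create_table).
dynamodb.update_table(
    TableName="AuditLogs",                       # hypothetical table
    TableClass="STANDARD_INFREQUENT_ACCESS",     # default class is "STANDARD"
)
```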
### DynamoDB - Demo
- A brief overview & demo of DynamoDB & features.
### LAB - Scaling DynamoDB using Autoscaling
- Task Request - Part 0:
- Open the DynamoDB console at https://console.aws.amazon.com/dynamodb/.
- In the navigation pane on the left side of the console, choose Dashboard.
- On the right side of the console, choose Create Table.
- Enter the table details as follows:
- a. For the table name, enter Music.
- b. For the partition key, enter Artist.
- c. Enter SongTitle as the sort key.
- d. Leave Default settings selected.
- Choose Create to create the table.
- Task Request - Part 1:
- Open the DynamoDB console at https://console.aws.amazon.com/dynamodb/.
- In the navigation pane on the left side of the console, choose Tables.
- In the table list, choose the Music table.
- Select Explore Table Items.
- In the Items view, choose Create item.
- Enter the following values for your item:
- a. For Artist, enter **KodeKloud** as the value.
- b. For SongTitle, enter **AWS Certified Solutions Architect**.
- Task Request - Part 2:
- Delete the Music table.
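The console steps above create and delete the table; the auto scaling referenced in the lab title can also be configured programmatically through Application Auto Scaling. A hedged sketch, assuming the Music table uses provisioned (not on-demand) capacity:
```python
import boto3

autoscaling = boto3.client("application-autoscaling")

# Register the table's read capacity as a scalable target.
autoscaling.register_scalable_target(
    ServiceNamespace="dynamodb",
    ResourceId="table/Music",
    ScalableDimension="dynamodb:table:ReadCapacityUnits",
    MinCapacity=5,
    MaxCapacity=100,
)

# Target tracking: keep consumed reads at roughly 70% of provisioned capacity.
autoscaling.put_scaling_policy(
    PolicyName="MusicReadScaling",
    ServiceNamespace="dynamodb",
    ResourceId="table/Music",
    ScalableDimension="dynamodb:table:ReadCapacityUnits",
    PolicyType="TargetTrackingScaling",
    TargetTrackingScalingPolicyConfiguration={
        "TargetValue": 70.0,
        "PredefinedMetricSpecification": {
            "PredefinedMetricType": "DynamoDBReadCapacityUtilization"
        },
    },
)
```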
### DynamoDB Accelerator (DAX)
**Amazon DynamoDB Accelerator (DAX)** is a **fully managed, in-memory caching service** designed to **accelerate read performance** for DynamoDB. It **reduces response times from milliseconds to microseconds** by caching frequently accessed data, making it ideal for **high-read and low-latency applications**.
#### **Key Features**
- **In-memory cache** – Stores frequently accessed data for ultra-fast lookups.
- **Microsecond response times** – Speeds up read-heavy workloads.
- **Fully managed** – No need to manage caching infrastructure.
- **Highly scalable** – Supports millions of requests per second.
- **Write-through caching** – Writes update DynamoDB and the cache together; strongly consistent reads bypass DAX and are served directly by DynamoDB.
- **Integration with existing DynamoDB API** – Requires minimal code changes.
#### **Use Cases**
- **Real-time applications** – Gaming, social media feeds, and recommendation engines.
- **High-traffic APIs** – E-commerce product catalogs and IoT data retrieval.
- **Latency-sensitive workloads** – Applications requiring sub-millisecond response times.
DAX enhances **DynamoDB performance and scalability**, making it a powerful solution for **low-latency, high-read applications** without impacting database consistency.
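A hedged boto3 sketch of provisioning a small DAX cluster; the role ARN, subnet group, and sizing are assumptions. Applications then read through the cluster endpoint using the DAX SDK client (for Python, the `amazondax` package) in place of the plain DynamoDB client:
```python
import boto3

dax = boto3.client("dax")

dax.create_cluster(
    ClusterName="orders-dax",
    NodeType="dax.t3.small",
    ReplicationFactor=3,      # one primary node plus two read replicas spread across AZs
    IamRoleArn="arn:aws:iam::123456789012:role/dax-dynamodb-access",
    SubnetGroupName="dax-subnet-group",
)
```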
### OpenSearch
**Amazon OpenSearch Service** is a **fully managed search and analytics service** that enables users to perform **fast, scalable searches on large datasets**. It is based on **OpenSearch (a fork of Elasticsearch)** and provides real-time indexing, full-text search, and data visualization capabilities.
#### **Key Features**
- **Full-Text Search** – Supports powerful text-based querying across structured and unstructured data.
- **Real-Time Analytics** – Enables log monitoring, observability, and security analytics.
- **Scalability & High Availability** – Supports **multi-AZ deployments, automated scaling, and backups**.
- **Built-in Security** – Integrates with **IAM, VPC, encryption, and fine-grained access control**.
- **Integration with AWS Services** – Works with **Kinesis, CloudWatch, S3, and DynamoDB Streams**.
- **OpenSearch Dashboards** – A Kibana-derived visualization tool for intuitive data exploration and monitoring.
#### **Use Cases**
- **Log & Event Monitoring** – Analyzing application logs and security data.
- **Enterprise Search** – Powering search functionality for websites and internal data.
- **Real-Time Data Analytics** – Processing and visualizing business and IoT data.
- **Security Analytics** – Detecting anomalies in security logs.
AWS OpenSearch offers **powerful, real-time search and analytics capabilities**, making it ideal for **log management, observability, and search-driven applications**.
#### Types of OpenSearch
- AWS OpenSearch Service
- AWS OpenSearch Serverless
#### **AWS OpenSearch Components Overview**
AWS OpenSearch consists of several key components that enable **search, analytics, and real-time data processing**.
##### **1. Cluster**
- A collection of **OpenSearch nodes** working together.
- Manages **indexing, searching, and storing data**.
- Can be deployed in a **single-AZ or multi-AZ** setup for high availability.
##### **2. Nodes**
- **Master Node** – Manages cluster health, node allocation, and metadata.
- **Data Node** – Stores indexed data and processes search queries.
- **Ingest Node** – Pre-processes data before indexing (used for transformations).
##### **3. Indices**
- Logical collections of **documents** similar to tables in a relational database.
- Each index contains **shards**, which store subsets of data.
- Supports **replication** for fault tolerance and performance optimization.
##### **4. Shards & Replicas**
- **Shards** – Break data into smaller segments for parallel processing.
- **Replicas** – Duplicate shards for **fault tolerance** and **load balancing**.
- The number of **shards and replicas** can be configured per index.
##### **5. Query DSL (Domain-Specific Language)**
- OpenSearch uses **JSON-based queries** for powerful **search and filtering**.
- Supports **full-text search, structured queries, aggregations, and ranking**.
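A small example of that Query DSL using the `opensearch-py` client; the domain endpoint and credentials are placeholders, and fine-grained access control with a master user is assumed purely for illustration:
```python
from opensearchpy import OpenSearch

client = OpenSearch(
    hosts=[{"host": "search-app-logs-xxxx.us-east-1.es.amazonaws.com", "port": 443}],
    http_auth=("admin", "change-me-please"),   # placeholder credentials
    use_ssl=True,
)

# Index a document, then run a full-text match query with a terms aggregation.
client.index(index="app-logs", id="1",
             body={"service": "checkout", "level": "ERROR", "msg": "payment timeout"})
resp = client.search(
    index="app-logs",
    body={
        "query": {"match": {"msg": "timeout"}},
        "aggs": {"by_level": {"terms": {"field": "level.keyword"}}},
    },
)
```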
##### **6. OpenSearch Dashboards**
- A visualization tool similar to **Kibana**, used for monitoring and analyzing data.
- Provides **interactive charts, graphs, and dashboards** for log and metric analysis.
##### **7. Security & Access Control**
- Supports **IAM authentication, VPC integration, fine-grained access control, and encryption**.
- Offers **role-based access (RBAC)** to restrict data access at different levels.
##### **8. Integration with AWS Services**
- **Kinesis Data Firehose** – Streams real-time log data to OpenSearch.
- **CloudWatch Logs** – Sends AWS logs for monitoring and analysis.
- **S3 & DynamoDB Streams** – Enables real-time indexing from AWS data sources.
#### **AWS OpenSearch Integrations**
AWS OpenSearch integrates seamlessly with various AWS services to enhance **search, analytics, and monitoring capabilities**.
##### **1. Data Ingestion & Streaming**
- **Amazon Kinesis Data Firehose** – Streams real-time log and event data into OpenSearch for indexing.
- **AWS Glue** – ETL service that transforms and loads structured/unstructured data into OpenSearch.
- **Amazon S3** – Index and analyze logs, documents, and large datasets stored in S3.
- **Amazon DynamoDB Streams** – Captures table changes and sends them to OpenSearch for indexing.
##### **2. Monitoring & Observability**
- **Amazon CloudWatch Logs** – Streams AWS service logs to OpenSearch for monitoring and alerting.
- **AWS Lambda** – Processes and sends logs, metrics, and events to OpenSearch for real-time analysis.
- **AWS X-Ray** – Traces application requests and integrates with OpenSearch for distributed tracing analysis.
##### **3. Security & Access Control**
- **AWS IAM** – Manages authentication and access control for OpenSearch.
- **AWS Key Management Service (KMS)** – Encrypts OpenSearch data at rest.
- **Amazon Cognito** – Provides user authentication and access control for OpenSearch Dashboards.
##### **4. Machine Learning & AI**
- **Amazon SageMaker** – Uses OpenSearch as a data source for training and deploying ML models.
- **Kendra Search Integration** – Enhances OpenSearch-based search applications with AI-driven enterprise search.
##### **5. Business Intelligence & Analytics**
- **Amazon QuickSight** – Connects to OpenSearch for interactive data visualization and analytics.
- **OpenSearch Dashboards** – Built-in visualization tool for creating dashboards and monitoring search trends.
##### **6. Application & API Integration**
- **Amazon API Gateway** – Enables RESTful API search endpoints backed by OpenSearch.
- **Amazon EventBridge** – Routes AWS events to OpenSearch for real-time processing.
AWS OpenSearch's deep integration with **AWS services** makes it a **powerful tool for real-time analytics, log management, security monitoring, and machine learning applications**.
### OpenSearch - Demo
- A brief overview & demo of the OpenSearch service & features.
- Reference OpenSearch documentation - Getting Started
### ElastiCache
AWS ElastiCache is a **fully managed, in-memory caching service** that improves application performance by reducing **latency and database load**. It supports two caching engines: **Memcached** and **Redis**, both designed to **accelerate read-heavy and compute-intensive workloads**.
#### Key Features
- Low-latency, high-throughput performance for real-time applications.
- Fully managed, including patching, monitoring, and scaling.
- Supports Multi-AZ replication for high availability and disaster recovery.
- Integration with AWS services like RDS, DynamoDB, and Lambda.
- Enhanced security with VPC isolation, IAM authentication, and encryption.
#### ElastiCache Engines
- **[[AWS SAA - Services - Databases#Memcached Overview|Memcached]]** – Simple, high-speed caching for ephemeral data storage.
- **[[AWS SAA - Services - Databases#Redis Overview|Redis]]** – Advanced caching with **persistence, data replication, and pub/sub messaging**.
#### Use Cases
- Database query caching to reduce load on relational or NoSQL databases.
- Real-time session storage for gaming, e-commerce, and web applications.
- Leaderboards, recommendation engines, and machine learning inference caching.
- Distributed locking and messaging with Redis for microservices architectures.
AWS ElastiCache helps **optimize performance, scale applications, and reduce database costs**, making it ideal for **real-time applications and in-memory processing**.
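A minimal cache-aside sketch with `redis-py` against a (hypothetical) ElastiCache endpoint; the database lookup is a stand-in stub:
```python
import json
import redis

# TLS-enabled ElastiCache Redis endpoint (placeholder hostname).
cache = redis.Redis(host="my-cache.xxxxxx.use1.cache.amazonaws.com", port=6379, ssl=True)

def fetch_product_from_db(product_id: str) -> dict:
    # Stand-in for a relational/NoSQL query; replace with a real lookup.
    return {"id": product_id, "name": "example"}

def get_product(product_id: str) -> dict:
    """Cache-aside: check the cache first, fall back to the database, then populate the cache."""
    key = f"product:{product_id}"
    cached = cache.get(key)
    if cached:
        return json.loads(cached)
    product = fetch_product_from_db(product_id)
    cache.setex(key, 300, json.dumps(product))   # cache for 5 minutes
    return product
```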
#### Redis Overview
Redis (**Remote Dictionary Server**) is an **open-source, in-memory data store** known for **high-performance caching, persistence, and advanced data structures**. It supports **key-value storage**, real-time analytics, and **pub/sub messaging**.
##### Key Features
- **Persistence** – Supports AOF (Append-Only File) and RDB (Snapshot) persistence options.
- **Data Structures** – Supports strings, lists, sets, sorted sets, hashes, bitmaps, and more.
- **Replication & Clustering** – Multi-AZ, read replicas, and automatic failover for high availability.
- **Transactions & Scripting** – Uses Lua scripting and atomic transactions.
##### Use Cases
- **Session storage** for web applications and gaming.
- **Message brokering** via **pub/sub** messaging.
- **Leaderboard rankings** for real-time applications.
- **Real-time analytics** and machine learning caching.
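As a quick illustration of Redis data structures, a sorted set makes a natural leaderboard (endpoint and scores are illustrative):
```python
import redis

r = redis.Redis(host="localhost", port=6379)   # or an ElastiCache Redis endpoint

# ZADD updates scores; ZREVRANGE returns the highest-ranked members.
r.zadd("leaderboard", {"alice": 1200, "bob": 950, "carol": 1425})
top3 = r.zrevrange("leaderboard", 0, 2, withscores=True)
print(top3)   # e.g. [(b'carol', 1425.0), (b'alice', 1200.0), (b'bob', 950.0)]
```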
#### Memcached Overview
Memcached is a **high-performance, distributed memory caching system** designed for **simple key-value storage and rapid data retrieval**. It is best suited for **stateless, ephemeral caching** with **horizontal scaling**.
##### Key Features
- **Simple key-value caching** with no complex data structures.
- **Multi-threaded architecture** for handling large workloads efficiently.
- **Horizontal scaling** across multiple nodes for distributed caching.
- **No persistence** – Data is lost when the server restarts.
##### Use Cases
- **Database query caching** to reduce load and response time.
- **Object caching** for web applications and API responses.
- **Session caching** for user authentication and shopping carts.
- **Dynamic content caching** for frequently accessed data.
#### Redis vs. Memcached
| Feature | Redis | Memcached |
| ---------------------------- | ------------------------------------------- | -------------------------- |
| **Persistence** | Yes (AOF, RDB) | No |
| **Data Structures** | Advanced (Lists, Sets, Hashes, Sorted Sets) | Simple (Key-Value) |
| **Replication & Clustering** | Yes | No |
| **Scaling** | Read Replicas, Clustering | Horizontal Scaling |
| **Best Use Case** | Complex caching, messaging, analytics | Simple, high-speed caching |
**Redis is best for advanced caching, real-time analytics, and messaging**, while **Memcached is ideal for lightweight, high-speed caching** with simple key-value storage.
### MemoryDB for Redis
AWS MemoryDB for Redis is a **fully managed, durable, in-memory database service** designed for **ultra-fast performance with high availability and persistence**. Unlike ElastiCache for Redis, MemoryDB **keeps the full dataset in memory and persists every write to a durable Multi-AZ transaction log**, ensuring **low-latency performance** while maintaining data integrity.
#### Key Features
- **In-memory performance** with sub-millisecond latency.
- **Multi-AZ durability** with automatic replication and failover.
- **Full Redis compatibility** with support for native Redis data structures and commands.
- **Strong consistency** using distributed transactional log storage.
- **Encryption & Security** with AWS IAM, KMS encryption, and VPC integration.
#### Use Cases
- **Caching with persistence** – Ideal for applications needing high-speed data access with durability.
- **Real-time applications** – Leaderboards, session storage, and financial transaction processing.
- **Event-driven microservices** – Message queues and real-time analytics.
- **Gaming & AI inference** – Low-latency access for fast decision-making.
AWS MemoryDB for Redis provides **the speed of Redis with built-in durability**, making it ideal for applications requiring **high-performance, real-time processing with persistent data storage**.
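A hedged boto3 sketch of provisioning a MemoryDB cluster; the ACL, subnet group, and sizing are assumptions:
```python
import boto3

memorydb = boto3.client("memorydb")

memorydb.create_cluster(
    ClusterName="orders-memorydb",
    NodeType="db.r6g.large",
    ACLName="open-access",                  # default ACL; production should define scoped users
    NumShards=1,
    NumReplicasPerShard=1,                  # replicas are placed in other AZs
    SubnetGroupName="memorydb-subnet-group",
    TLSEnabled=True,
)
```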