What is Amazon Redshift?
Amazon Redshift is a fast, scalable data warehouse that makes it simple and cost-effective to analyze all your data across your data warehouse and data lake.
Amazon Redshift is a fully managed cloud-based data warehouse product designed for large-scale data set storage and analysis. It is also used to perform large-scale database migrations.
Redshift’s column-oriented database is designed to connect to SQL-based clients and business intelligence tools, making data available to users in real time. Based on PostgreSQL 8, Redshift delivers fast performance and efficient querying that help teams make sound business analyses and decisions.
Amazon reference
What is a Redshift Cluster?
An Amazon Redshift data warehouse is a collection of computing resources called nodes, which are organized into a group called a cluster. Each cluster runs an Amazon Redshift engine and contains one or more databases.
Benefits
- Deepest integration with your data lake and AWS services
- Best performance
- Most scalable
- Best Value
- Easy to manage
- Most secure and compliant
Read a more detailed reference here.
Getting Started
- Log in to the AWS Console
- Enter Username and Password
- Open Services and go to Analytics
- Search for Amazon Redshift
- Open the Redshift dashboard
On the right side, you will see the option to create a Cluster.
On the left side, you can see other dashboard options like Dashboard, Clusters, Queries, etc.
In this article, I am not going into too much depth. I'll only show how easy it is to get started with Amazon Redshift, following these steps:
- How to create a cluster
- How to create a schema and tables
- How to load data and integrate ETL
- How to run queries and integrate BI tools
- How to monitor and tune queries
Click the Create cluster button and provide the configuration details, such as the cluster identifier. I am using the free trial.
Provide the database configuration: database name, port, username, and password.
Wait until the cluster is created. Its status is shown under the CLUSTERS icon.
Once the cluster status is Available, go to the Editor tab and connect to the created cluster. Enter a database name, username, and password, then click Connect.
The query editor looks like this.
I have already copied a CSV file to my S3 bucket. Next, I am going to create a table and load the data into it from the S3 bucket. My database looks like this.
Let us run two commands in the editor: one to create a new table and another to copy data from the S3 bucket into the Redshift table. Run the two queries manually, one after the other.
Queries
Create Table
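-- TIMESTAMP and VARCHAR are the documented Redshift type names;
-- Redshift converts NVARCHAR to VARCHAR.
CREATE TABLE orders (
    OrderDate TIMESTAMP NULL,     -- date of the order
    Region    VARCHAR(255) NULL,  -- sales region
    Rep       VARCHAR(255) NULL,  -- sales representative
    Item      VARCHAR(255) NULL,  -- item sold
    Units     FLOAT NULL,         -- number of units
    Total     FLOAT NULL          -- order total
);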
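To confirm that the table was created with the expected columns, it can help to query the PG_TABLE_DEF catalog view. The following is a minimal sketch, assuming the table was created in the public schema, which is on the default search path. PG_TABLE_DEF only lists tables in schemas that are on the search path.

```sql
-- List the columns of the orders table as Redshift stored them.
-- "column" and "notnull" are quoted because they are reserved words.
SELECT "column", type, encoding, "notnull"
FROM pg_table_def
WHERE schemaname = 'public'
  AND tablename = 'orders';
```

If the query returns no rows, the table is probably in a schema that is not on the search path.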
Copy data from S3
COPY orders (OrderDate, Region, Rep, Item, Units, Total)
FROM 's3://rajsamplebucket/SalesOrders.csv'
IAM_ROLE '<Role-ARN>'
CSV
IGNOREHEADER 1;
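Note. Make sure the given role is associated with the cluster and is allowed to read the bucket. The AmazonS3ReadOnlyAccess managed policy is enough for COPY. AmazonS3FullAccess also works but grants more than the load needs.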
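If the COPY command fails, or loads fewer rows than expected, the STL_LOAD_ERRORS system table usually explains why. The query below is a sketch, assuming a provisioned cluster and a user who is allowed to read that table. It shows the most recent load errors with the file, line, and column that caused each one.

```sql
-- Show the most recent load errors, newest first.
-- TRIM removes the padding from the fixed-width character columns.
SELECT starttime,
       TRIM(filename)   AS file_name,
       line_number,
       TRIM(colname)    AS column_name,
       err_code,
       TRIM(err_reason) AS reason
FROM stl_load_errors
ORDER BY starttime DESC
LIMIT 10;
```

A common cause with CSV files is a date or number format that does not match the column type of the target table.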
Let's check whether the table now has data.
Run this command in the editor.
SELECT *
FROM public.orders
LIMIT 10;
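Once the sample rows look right, a row count and a simple aggregate are a quick way to confirm the load and to produce a result that is easy to chart. This is only a sketch. It assumes that Region holds a sales region and that Units and Total hold the quantity and amount of each order, as the column names suggest.

```sql
-- How many rows were loaded?
SELECT COUNT(*) AS row_count
FROM public.orders;

-- Units and sales per region, largest sales first.
SELECT region,
       SUM(units) AS total_units,
       SUM(total) AS total_sales
FROM public.orders
GROUP BY region
ORDER BY total_sales DESC;
```

Redshift folds unquoted identifiers to lowercase, so region and Region refer to the same column.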
You can visualize data using the Visualize button.
Line Chart
Bar Chart
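The outline at the start of the article lists monitoring and tuning queries as a step. One possible starting point, sketched here on the assumption of a provisioned cluster, is the STL_QUERY system table together with EXPLAIN. Regular users see only their own queries in STL_QUERY.

```sql
-- The ten most recent queries with their run time in milliseconds.
SELECT query,
       TRIM(querytxt) AS sql_text,
       starttime,
       DATEDIFF(ms, starttime, endtime) AS elapsed_ms
FROM stl_query
ORDER BY starttime DESC
LIMIT 10;

-- Show the execution plan for a query without running it.
EXPLAIN
SELECT region, SUM(total) AS total_sales
FROM public.orders
GROUP BY region;
```

The plan shows how Redshift intends to scan and aggregate the table, which is useful when deciding whether a sort key or distribution key would help.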
Conclusion
In this article, we learned how to get started with Amazon Redshift: how to create a cluster, how to create a table, and how to load data from S3 into a Redshift table.