Amazon Redshift Data warehouse setup and bring data from Apps to Data warehouse

  • We are setting up new data warehouse with Amazon Redshift and building integrations to bring data from enterprise apps' specific databases to data warehouse.

    Would like to hear from experts are there any best practices to do these integrations consistent, reliable and long term easy maintainable.

    Appreciated for your thoughts.

  • Thanks for posting your issue and hopefully someone will answer soon.

    This is an automated bump to increase visibility of your question.

  • I do not have a good answer for you. However, I've reached out to see if I can find one.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • There is vast AWS documentation about the AWS tools, learning in progress will update in the forums. Thanks much.

    AWS-Datalake-Architecture

  • Maybe a little too late to answer, but I will still answer it anyway.

    Based on my experience, I would recommend following

    You can get the maximum benefit of Redshift if your data is time-series data.

    Incremental Loading: Implement an incremental loading strategy by adding/uploading only the new data in Redshift. Update operations in Redshift are expensive compared to copy command (uploading) or inserting new records.

    Consider using Redshift-specific transformations like SORT KEY and DISTKEY to improve query efficiency.

    Data Loading: Utilize the COPY command to efficiently load data into Redshift. Try to keep the number of files the same as the number of slices in the Redshift cluster to get the most out of to copy command. Batch load data whenever possible to minimize the impact on system resources.

    Scalability and Performance: Design the integration processes to scale with increasing data volumes. Optimize Redshift's cluster configuration, utilize distribution styles and sort keys appropriately, and consider implementing workload management (WLM) to manage concurrent queries effectively. Redshift Spectrum is also a good option that separates storage from CPU and the same with new RA3 instances.

    Ensure compliance with relevant regulations and standards, such as GDPR by adhering to data encryption, access controls, and data governance practices especially if you have servers in the Euro region and you are reporting PII data in any other non-Euro region

Viewing 5 posts - 1 through 4 (of 4 total)

You must be logged in to reply to this topic. Login to reply