Data warehousing in Microsoft Azure

Azure Data Warehousing Solution 

Components 

Azure Blob Storage 

Azure Blob Storage is massively scalable, cost-effective object storage for any type of unstructured data: images, videos, audio, documents, and more.

Data Factory  

Azure Data Factory is a hybrid data integration service that allows you to create, schedule, and orchestrate your ETL/ELT workflows.

Azure Databricks 

Azure Databricks is a fast, easy, and collaborative analytics platform based on Apache Spark, an open-source parallel processing framework and fast cluster computing system.

Azure Synapse Analytics 

Azure Synapse Analytics is a fast, flexible, and trusted cloud data warehouse that lets you scale compute and storage elastically and independently, with a massively parallel processing architecture.

Power BI 

Power BI is a suite of business analytics tools that delivers insights throughout an organization. Connect to hundreds of data sources, simplify data preparation, and drive ad hoc analysis. Produce the required reports, then publish them for the organization to consume on the web and across mobile devices.

The Whole Process in the Azure Portal UI

Navigate to the Microsoft Azure portal.

Create Storage Account 

To create a new storage account, we first have to create a new resource group. A resource group is a logical container that holds all the resources for the solution we are building.

Keeping related resources together in a resource group makes them easier to maintain.

It also makes it easy to assign properties to resources at the resource group level.

Resources can be moved from one subscription to another by creating an ARM template of the resource group and then deploying that template in the target subscription.

The Standard performance tier is low cost; when dealing with large volumes of data, Premium is the better option.

There are three blob access tiers: Hot, Cool, and Archive. Hot is used when data will be accessed frequently, and it is the highest-cost tier. When data will not be accessed frequently, use the lower-cost Cool tier. The Archive tier is used when data will not be accessed for a long time and is being stored for future use.

After creation, the deployment will start.
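This storage account setup can also be scripted. Below is a minimal sketch using the Azure SDK for Python, assuming the azure-identity, azure-mgmt-resource, and azure-mgmt-storage packages; the subscription ID, resource group, region, and account name are illustrative placeholders.

```python
# A minimal sketch; all names below are placeholders.
from azure.identity import DefaultAzureCredential
from azure.mgmt.resource import ResourceManagementClient
from azure.mgmt.storage import StorageManagementClient

subscription_id = "<your-subscription-id>"
credential = DefaultAzureCredential()

# The resource group is the logical container for everything in this solution.
resource_client = ResourceManagementClient(credential, subscription_id)
resource_client.resource_groups.create_or_update("demo-rg", {"location": "eastus"})

# A general-purpose v2 account on the Standard tier, defaulting blobs to Hot.
storage_client = StorageManagementClient(credential, subscription_id)
poller = storage_client.storage_accounts.begin_create(
    "demo-rg",
    "demostorageacct",  # must be globally unique, 3-24 lowercase letters/digits
    {
        "location": "eastus",
        "kind": "StorageV2",
        "sku": {"name": "Standard_LRS"},
        "access_tier": "Hot",
    },
)
account = poller.result()  # blocks until the deployment completes
```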

Create a Container (Blob Storage) 

Create a container in the storage account. After creating it, we can upload files from the local system. Upon clicking a file we can see its properties, and we can edit the file in the container itself: change its tier, delete it, and so on.

When using the container, it is not possible to upload folders; only file uploads are supported.

If we want a hierarchical structure, we have to create it manually.

The folders are virtual: when the data inside them is deleted, the folder does not remain behind as an empty folder.
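For reference, here is a minimal sketch of the same container operations with the azure-storage-blob package; the connection string, container, and file names are placeholders.

```python
# A minimal sketch; the connection string and names are placeholders.
from azure.storage.blob import BlobServiceClient, StandardBlobTier

service = BlobServiceClient.from_connection_string("<storage-connection-string>")
container = service.create_container("demo-container")

# "Folders" are only name prefixes: uploading this blob makes a virtual
# folder raw/ appear, and it vanishes once the last blob under it is deleted.
with open("sales.csv", "rb") as data:
    container.upload_blob(name="raw/sales.csv", data=data)

# Blob properties can be edited in place: change the access tier, or delete.
blob = container.get_blob_client("raw/sales.csv")
blob.set_standard_blob_tier(StandardBlobTier.COOL)  # move to the Cool tier
# blob.delete_blob()                                # remove the blob entirely
```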

Load Data from a Storage Account to a SQL Database 

We load data from the storage account to a SQL database by using Azure Data Factory.

Create SQL Database 

A SQL server is needed before a SQL database can be created, so create the SQL server from the same page. SQL server creation requires admin credentials.

Select the resource group we created before. Once that is done, configure the database.

Be very careful when configuring the database, because the cost depends on the DTUs (database processing capacity) and the data max size (database size).
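A hedged sketch of the same server and database creation with the azure-mgmt-sql package follows; the server name, admin credentials, and sizing values are illustrative assumptions.

```python
# A minimal sketch; server name, credentials, and sizing are placeholders.
from azure.identity import DefaultAzureCredential
from azure.mgmt.sql import SqlManagementClient

sql_client = SqlManagementClient(DefaultAzureCredential(), "<your-subscription-id>")

# A logical SQL server must exist before the database, and it needs
# admin credentials.
sql_client.servers.begin_create_or_update(
    "demo-rg", "demo-sql-server",
    {
        "location": "eastus",
        "administrator_login": "sqladmin",
        "administrator_login_password": "<strong-password>",
    },
).result()

# The SKU (DTUs) and max size drive the cost, so choose them deliberately.
sql_client.databases.begin_create_or_update(
    "demo-rg", "demo-sql-server", "demo-db",
    {
        "location": "eastus",
        "sku": {"name": "S0", "tier": "Standard"},  # 10 DTUs
        "max_size_bytes": 2 * 1024**3,              # 2 GB cap
    },
).result()
```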

After creating the database, copy the server name; we will use it to connect to the database from SQL Server Management Studio.

To connect to the database server from Management Studio, the server name and admin login credentials are needed.

Before connecting to the database, check whether your client IP address has been added to the server firewall.

Create a table using SQL Server Management Studio; after that, load the data using Data Factory.
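The table can also be created from code. Below is a minimal sketch with pyodbc standing in for Management Studio; the server, database, credentials, and table schema are assumptions for illustration.

```python
# A minimal sketch, assuming the ODBC Driver 17 for SQL Server is installed.
import pyodbc

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=demo-sql-server.database.windows.net;"
    "DATABASE=demo-db;UID=sqladmin;PWD=<strong-password>"
)
cursor = conn.cursor()

# Target table for the Data Factory copy; columns are assumed to match
# the CSV file uploaded to blob storage.
cursor.execute("""
    CREATE TABLE dbo.Sales (
        SaleId  INT NOT NULL PRIMARY KEY,
        Product NVARCHAR(100),
        Amount  DECIMAL(10, 2),
        SoldOn  DATE
    )
""")
conn.commit()
conn.close()
```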

Create Data Factory 

Create a data factory and set the resource group to the one we created before.

We can select a repository: "Azure DevOps" or "GitHub". After creating the data factory we can create pipelines; a data factory is organized around pipelines.

For this demo we use the Copy Data pipeline. We can schedule it or run it once now; when the schedule option is selected, all the required scheduling details must be provided.

In the source step, we need to create a connection string. To do so, select Azure Blob Storage under linked services and load the data.

For the destination data store we also need a connection string; use Azure SQL Database and provide the admin login credentials to create it.
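For reference, both linked services can also be created with the azure-mgmt-datafactory package. This is a hedged sketch; the factory name, linked service names, and connection strings are placeholders.

```python
# A minimal sketch; all names and connection strings are placeholders.
from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient
from azure.mgmt.datafactory.models import (
    AzureBlobStorageLinkedService,
    AzureSqlDatabaseLinkedService,
    LinkedServiceResource,
    SecureString,
)

adf = DataFactoryManagementClient(DefaultAzureCredential(), "<your-subscription-id>")

# Source: the blob storage account holding the uploaded file.
adf.linked_services.create_or_update(
    "demo-rg", "demo-adf", "BlobSource",
    LinkedServiceResource(
        properties=AzureBlobStorageLinkedService(
            connection_string=SecureString(value="<storage-connection-string>")
        )
    ),
)

# Sink: the Azure SQL database, using the admin credentials.
adf.linked_services.create_or_update(
    "demo-rg", "demo-adf", "SqlSink",
    LinkedServiceResource(
        properties=AzureSqlDatabaseLinkedService(
            connection_string=SecureString(
                value="Server=tcp:demo-sql-server.database.windows.net,1433;"
                      "Database=demo-db;User ID=sqladmin;Password=<strong-password>;"
            )
        )
    ),
)
```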

To create this connection string, the SQL server firewall must allow the connection: enable "Allow Azure services and resources to access this server".
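That checkbox is equivalent to a server firewall rule whose start and end addresses are both 0.0.0.0. A minimal sketch with azure-mgmt-sql, using placeholder names:

```python
# A minimal sketch; the 0.0.0.0-0.0.0.0 rule is what the portal checkbox creates.
from azure.identity import DefaultAzureCredential
from azure.mgmt.sql import SqlManagementClient

sql_client = SqlManagementClient(DefaultAzureCredential(), "<your-subscription-id>")
sql_client.firewall_rules.create_or_update(
    "demo-rg", "demo-sql-server", "AllowAllWindowsAzureIps",
    {"start_ip_address": "0.0.0.0", "end_ip_address": "0.0.0.0"},
)
```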

Microsoft Azure also provides fault tolerance settings so that problematic rows can be skipped instead of failing the whole copy.

Once the deployment is done, we can verify the load by viewing the table in SQL Server Management Studio. In this way we can create multiple pipelines for data processing; many other activities are available (Data Flow, Append, Delete, etc.).

Similarly, with a Data Flow activity we can process the data by creating a data flow and adding a source; we can also see the transformations available between the source and the destination.

Clicking Validate All validates all the activities and flags anything that is missing. After validation, publish all the pipelines, then trigger the pipeline once it has been saved to the account.
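A published pipeline can also be triggered and monitored from code. This is a minimal sketch with azure-mgmt-datafactory, assuming a factory named demo-adf and a pipeline named CopyBlobToSql (both placeholders):

```python
# A minimal sketch; the factory and pipeline names are placeholders.
import time
from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient

adf = DataFactoryManagementClient(DefaultAzureCredential(), "<your-subscription-id>")

# Trigger the pipeline once, then poll the run until it finishes.
run = adf.pipelines.create_run("demo-rg", "demo-adf", "CopyBlobToSql")
while True:
    status = adf.pipeline_runs.get("demo-rg", "demo-adf", run.run_id)
    if status.status not in ("Queued", "InProgress"):
        break
    time.sleep(15)
print(status.status)  # "Succeeded" or "Failed"
```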

Create Azure Databricks 

Create an Azure Databricks workspace in the same resource group we created before.

We create a single workspace, in which multiple team members can work. The Databricks deployment will take some time to complete.

After deployment we can see a page with a Launch Workspace button; clicking it navigates to the Databricks workspace.

Create a Cluster 

Create a cluster in Standard mode. All the details can be set on the cluster: an auto-termination option so it shuts down automatically when no one is using it, plus the worker type and worker limits.
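The same cluster can be created through the Databricks Clusters REST API. This is a hedged sketch; the workspace URL, personal access token, Spark version label, and VM size are placeholders that vary by workspace and region.

```python
# A minimal sketch, assuming the requests package.
import requests

host = "https://adb-<workspace-id>.azuredatabricks.net"
token = "<databricks-personal-access-token>"

resp = requests.post(
    f"{host}/api/2.0/clusters/create",
    headers={"Authorization": f"Bearer {token}"},
    json={
        "cluster_name": "demo-cluster",
        "spark_version": "7.3.x-scala2.12",
        "node_type_id": "Standard_DS3_v2",      # worker/driver VM size
        "autoscale": {"min_workers": 2, "max_workers": 8},
        "autotermination_minutes": 30,          # shut down when idle
    },
    timeout=30,
)
print(resp.json())  # contains the new cluster_id
```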

Create a Notebook 

Attach the cluster we created before; after creating the notebook, we can write code in it.
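As a starting point, a notebook cell like the following reads the uploaded file from blob storage with Spark; the account name, key, container, and path are placeholders, and the `spark` session object is provided automatically by the Databricks runtime.

```python
# A minimal notebook-cell sketch; names and the account key are placeholders.
spark.conf.set(
    "fs.azure.account.key.demostorageacct.blob.core.windows.net",
    "<storage-account-key>",
)

df = (
    spark.read
    .option("header", "true")
    .csv("wasbs://demo-container@demostorageacct.blob.core.windows.net/raw/sales.csv")
)
display(df)  # preview the data in the notebook
```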

Create an Azure Synapse Database 

It is essentially the former SQL Data Warehouse.

The selected server is the same one on which we created the SQL database. It will take some time to initialize. To connect to the Synapse database from SQL Server Management Studio, the server name, admin login credentials, and database name are required.

After connecting in SQL Server Management Studio, we can see the database; inside it, the tables hold the dimension details from which we can design the data warehouse.
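For illustration, a dimension table in the Synapse database can be created over the same pyodbc connection pattern; the database name, credentials, table definition, and the REPLICATE distribution choice are assumptions typical for small dimension tables.

```python
# A minimal sketch; the Synapse database name and schema are illustrative.
import pyodbc

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=demo-sql-server.database.windows.net;"
    "DATABASE=demo-synapse-dw;UID=sqladmin;PWD=<strong-password>"
)
cursor = conn.cursor()

# Small dimension tables are usually replicated to every distribution,
# which avoids data movement when joining to large, hash-distributed
# fact tables.
cursor.execute("""
    CREATE TABLE dbo.DimProduct (
        ProductKey  INT NOT NULL,
        ProductName NVARCHAR(100),
        Category    NVARCHAR(50)
    )
    WITH (DISTRIBUTION = REPLICATE, CLUSTERED COLUMNSTORE INDEX)
""")
conn.commit()
conn.close()
```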

Create Synapse Analytics Workspace 

Add a new Azure Synapse Analytics workspace by creating a new storage account and file system, and provide the admin login credentials. After validation, the workspace is created.

In the Azure Synapse Analytics workspace we can see a "Workspace web URL"; clicking it navigates to the Synapse Studio workspace page.

Link Power BI workspace to your Synapse workspace

  • Starting from Synapse Studio, click Manage.
  • Under External Connections, click Linked services.
  • Click + New.
  • Click Power BI and click Continue.
  • Enter a name for the linked service and select a workspace from the dropdown list.
  • Click Create.

View Power BI workspace in Synapse Studio

Once your workspaces are linked, you can browse your Power BI datasets and edit or create Power BI reports from Synapse Studio.

  • Click Develop.
  • Expand Power BI and the workspace you wish to use.

New reports can be created by clicking + at the top of the Develop tab. Existing reports can be edited by clicking on the report name. Any saved changes will be written back to the Power BI workspace.
