Azure Data Lake Analytics – Gaining the Azure Data Engineer Associate Certification

Azure Data Lake Analytics runs on-demand data analytics jobs in parallel. The parallelism is achieved using Microsoft Dryad, which executes computations expressed as directed acyclic graphs (DAGs). A DAG is a graph in which every edge has a direction and the edges never form a loop, so each step can be ordered after the steps it depends on, and steps that do not depend on one another can run at the same time. Behind the scenes, the service builds on Apache Hadoop technology, using Apache YARN to manage resources across clusters. Azure Data Lake Analytics also supports the U-SQL language, which combines SQL with C#. An example of U-SQL is shown in the following snippet:

// Read five decimal columns from a tab-separated file into a rowset.
@alphareading =
    EXTRACT AF3Alpha decimal,
            T7Alpha  decimal,
            PzAlpha  decimal,
            T8Alpha  decimal,
            AF4Alpha decimal
    FROM "/brainjammer/playingguitar/reading001.tsv"
    USING Extractors.Tsv();

// Write the rowset back out as a comma-separated file.
OUTPUT @alphareading
    TO "/output/brainjammer/playingguitar.csv"
    USING Outputters.Csv();
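
For readers who do not have a U-SQL environment available, the following Python sketch mirrors what the script above does: it reads tab-separated rows, binds each field to a named decimal column, and writes the rows back out comma-separated. It is only an illustration of the EXTRACT and OUTPUT steps, not how Azure Data Lake Analytics executes a job. The function name `tsv_to_csv` and the sample values are made up for this example, and the input is assumed to have no header row.

```python
import csv
import io
from decimal import Decimal

# Column names taken from the EXTRACT clause of the U-SQL script.
COLUMNS = ["AF3Alpha", "T7Alpha", "PzAlpha", "T8Alpha", "AF4Alpha"]


def tsv_to_csv(tsv_text: str) -> str:
    """Parse headerless TSV rows into typed records, then emit CSV text."""
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    # EXTRACT step: bind each field to a named, decimal-typed column.
    rows = [dict(zip(COLUMNS, map(Decimal, fields))) for fields in reader if fields]

    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    # OUTPUT step: write the rowset back out, comma-separated.
    for row in rows:
        writer.writerow([row[name] for name in COLUMNS])
    return out.getvalue()


# Made-up sample values, for illustration only (not real brainjammer readings).
sample = "1.5\t2.25\t3.0\t4.75\t5.5\n0.5\t0.25\t1.0\t2.0\t3.5\n"
print(tsv_to_csv(sample))
```

The sketch works on in-memory strings so that it runs anywhere; swapping `io.StringIO` for real file handles would be the natural next step.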


Note also that Azure Data Lake Analytics works with the Gen1 version of Azure Data Lake Storage, whereas Gen2 is the current version. If you specifically need U-SQL or Gen1 storage, choose this product; otherwise, choose Azure Synapse Analytics to perform your data analytics on Azure, because Azure Data Lake Analytics is being retired.
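
To make the DAG idea from the start of this section more concrete, here is a minimal Python sketch that groups the steps of a job into stages whose members could run in parallel. It uses the standard library `graphlib` module (Python 3.9 or later). The step names and the `parallel_stages` helper are hypothetical, and this is not a description of how Dryad actually schedules work; it only shows why an acyclic dependency graph makes parallel execution straightforward.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical job steps: each key depends on the steps in its set.
job = {
    "extract_a": set(),
    "extract_b": set(),
    "join": {"extract_a", "extract_b"},
    "aggregate": {"join"},
    "output": {"aggregate"},
}


def parallel_stages(graph):
    """Group DAG nodes into stages; nodes within a stage do not depend on each other."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()  # raises CycleError if the graph contains a loop
    stages = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # every step that can run right now
        stages.append(ready)
        sorter.done(*ready)  # mark the stage finished to unlock its dependents
    return stages


print(parallel_stages(job))
# [['extract_a', 'extract_b'], ['join'], ['aggregate'], ['output']]
```

The two extract steps land in the same stage because neither depends on the other, which is the property a parallel engine exploits. If the graph contained a loop, `prepare()` would raise `CycleError`, since no valid execution order exists.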

Power BI/Power BI Embedded

Power BI is a tool used to visualize data. The Power BI desktop application supports connecting to numerous datastores. Once connected, many visualizations can then be applied to the data. The following is a list of a few of the built‐in visualizations:

  • Area charts
  • Stacked column charts
  • Maps
  • Matrices
  • Key performance indicators

The data accessed from Power BI has typically already been through the data analytics and data modeling process. The result of those activities is then visualized using this product. Reporting and the creation of dashboards are also common uses for Power BI.

Power BI Embedded is an online service for delivering customer-facing analytics, reports, and dashboards through a web application or website. The benefit is that the people who want or need to consume the results of your data analytics solutions do not each need a Power BI license. Instead, the results can be placed online and accessed without installing any software.

Azure Storage Products

Azure Storage is a group of products that provide a secure, scalable, and highly available solution for storing your data. This product grouping offers numerous capabilities and serves as a place to store blobs and files rather than rows and columns of data in a database management system (DBMS). An Azure Storage account also provides a NoSQL store and a messaging queue, both of which have higher-scale alternatives. Read on to learn about each Azure Storage feature, its purpose, and its possible alternatives.
