Azure Data Lake Storage – Gaining the Azure Data Engineer Associate Certification
Azure Data Lake Storage (ADLS) is a fundamental piece of most enterprise data analytics solutions running on Azure. This product is optimized for Big Data analytics workloads. ADLS accomplishes this by providing storage capacity of up to multiple exabytes of data and supplying access to that data at a throughput of hundreds of gigabytes per second. ADLS Gen2 supports the open source platforms described in Table 1.4.
TABLE 1.4 ADLS‐supported platforms
Platform | Supported version |
Azure Databricks | 5.1+ |
Cloudera | 6.1+ |
Hadoop | 3.2+ |
HDInsight | 3.6+ |
Hortonworks | 3.1.x+ |
ADLS Gen2 also integrates with many Azure products, such as Azure Data Factory, Azure Event Hubs, Azure Machine Learning, Azure Stream Analytics, IoT Hub, Power BI, and Azure SQL Database. Additional information and capabilities include the following:
- Gen1 vs. Gen2
- Hadoop Distributed File System (HDFS)
- ACL and POSIX security model
- Hierarchical namespaces
Gen1 vs. Gen2
ADLS Gen1 will be retired on February 29, 2024. Therefore, we don’t recommend that you build any new solutions on that version. As mentioned earlier, Azure Data Lake Analytics uses Gen1; therefore, we don’t recommend building new data analytics solutions with that product either. ADLS Gen2 supports all the capabilities that exist in ADLS Gen1. The significant change is that Gen2 is now aligned with and built on Azure Blob Storage. Building on top of Azure Blob Storage (described later) makes ADLS Gen2 more cost-effective and provides diagnostic logging capabilities and access tiers.
Hadoop Distributed File System
If you have used HDFS in the past, you can expect the same experience when using ADLS. The similarity lies in how you and the operating system interact with data files: reading, writing, copying, renaming, and deleting are the main activities you would expect to be able to perform. The Azure Blob Filesystem (ABFS) driver is available in Apache Hadoop environments such as Azure Synapse Analytics, Azure Databricks, and Azure HDInsight. ABFS offers major performance improvements over the previous Windows Azure Storage Blob (WASB) driver when renaming and deleting files. Examples of HDFS commands to create a directory, to copy data from local storage to a cluster, and to list the contents of a directory are shown here:
hdfs dfs -mkdir /brainjammer/
hdfs dfs -copyFromLocal meditation.json /brainjammer/
hdfs dfs -ls /brainjammer/
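The commands above use paths on the cluster's default file system. To point the same commands at an ADLS Gen2 account through the ABFS driver, you can pass a fully qualified URI of the form abfss://<container>@<account>.dfs.core.windows.net/<path>. The following is a minimal sketch, not a definitive setup: the abfs_uri helper, the container name data, and the storage account name examplelake are hypothetical, and it assumes that authentication to the storage account is already configured on the cluster.

```shell
# Build an ABFS URI: abfss://<container>@<account>.dfs.core.windows.net/<path>
# abfs_uri is a hypothetical helper written for this example.
abfs_uri() {
  # $1 = container (file system), $2 = storage account, $3 = path
  # ${3#/} strips one leading slash so the URI has a single separator.
  printf 'abfss://%s@%s.dfs.core.windows.net/%s\n' "$1" "$2" "${3#/}"
}

# Hypothetical names; replace with your own container and storage account.
target="$(abfs_uri data examplelake brainjammer/)"
echo "$target"

# Run the earlier commands against ADLS Gen2, but only when the hdfs
# client is installed on this machine.
if command -v hdfs >/dev/null 2>&1; then
  hdfs dfs -mkdir -p "$target"
  hdfs dfs -copyFromLocal meditation.json "$target"
  hdfs dfs -ls "$target"
fi
```

The abfss scheme uses TLS; abfs is the unencrypted variant. Building the URI in one place keeps the container and account names out of each individual command.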