Author: Terica Anderson

Create an Azure SQL DB – CREATE DATABASE dbName; GO

FIGURE 2.3 The Select SQL Deployment Option blade

2. Select the subscription and resource group where you want the database to reside. Enter a database name (I used brainjammer). Under the Server drop‐down box, click the Create New link to create a database server. Enter the server name (I used csharpguitar), and enter a server…


Comma‐Separated Values – CREATE DATABASE dbName; GO

A comma‐separated values (CSV) file is just that: a file that contains data values separated by commas. Sometimes, the first row in a CSV file identifies the column names:

Scenario,Counter,Electrode,THETA,ALPHA,GAMMA
TikTok,5,AF3,9.681,3.849,0.738
TikTok,6,Pz,8.392,4.142,1.106

Loading that file into memory using Python would look something like the following. First you import the csv library, then open the file and…
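As a minimal sketch of the approach this excerpt begins to describe (not the author's original listing), the following uses Python's standard csv module to read the sample rows. The rows are held in an in‐memory string so the block runs on its own; the names SAMPLE and load_rows, and the file name mentioned in the comment, are illustrative assumptions.

```python
import csv
import io

# Sample rows from the excerpt, kept in memory so the sketch is self-contained.
# With a real file you would instead use something like
# open("some_file.csv", newline="") (hypothetical file name).
SAMPLE = (
    "Scenario,Counter,Electrode,THETA,ALPHA,GAMMA\n"
    "TikTok,5,AF3,9.681,3.849,0.738\n"
    "TikTok,6,Pz,8.392,4.142,1.106\n"
)


def load_rows(text):
    """Return one dict per data row, keyed by the header row's column names."""
    reader = csv.DictReader(io.StringIO(text))
    return list(reader)


rows = load_rows(SAMPLE)
for row in rows:
    # The csv module yields every value as a string,
    # so numeric columns are converted explicitly.
    print(row["Electrode"], float(row["THETA"]))
```

Using DictReader rather than the plain reader is a design choice here: it treats the first row as column names, which matches the header row in the sample.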


Optimized Row Columnar – CREATE DATABASE dbName; GO

Optimized Row Columnar (ORC) is a columnar file format used in the Hadoop ecosystem, like Parquet. Both ORC and Parquet files are often referred to as self‐describing, which means the information that describes the data in the file is contained within the file itself. Metadata typically accompanies the file. For example, in Windows, when you right‐click…


Data File Formats – CREATE DATABASE dbName; GO

Data comes in numerous forms, as stated earlier. The most common types and the types that are important to know about and be comfortable with for the exam are as follows:

Let’s take a closer look at each of those formats, beginning with the most common, JSON.

JavaScript Object Notation

The JavaScript Object Notation (JSON)…


Volume – CREATE DATABASE dbName; GO

The amount of data being created has increased exponentially over the last decade and continues to do so. Two decades ago the typical amount of storage on a computer was measured in megabytes. Today, a few terabytes aren’t uncommon, especially when you recognize that there are approximately 2.5 quintillion bytes of data created each day.…


Apache Parquet File Format – CREATE DATABASE dbName; GO

Apache Parquet files are used in the Hadoop ecosystem. JSON, CSV, and XML are useful for sharing data between applications, whereas Parquet files perform better for temporarily storing intermediate data between different stages in an application. To get an idea of how to work with this file type, consider the following code…


Extensible Markup Language – CREATE DATABASE dbName; GO

The Extensible Markup Language (XML) file format has been around for many years. It was a significant advancement over its predecessors, such as CSV and TXT files. What XML has that CSV and TXT files do not is the ability to strongly type the contents of the file. For example, numbers can be represented…


Data Structures, Types, and Concepts – CREATE DATABASE dbName; GO

Now it is time to delve a bit deeper into some data concepts. Up to now you may have noticed that the chapter content has been introductory and perhaps not so much about data. In this section, we will focus on data structures, data types, and some general, yet complex, data concepts. You will need…


A Historical Look at Data – CREATE DATABASE dbName; GO

Humans have been collecting and storing data for thousands of years. The earliest example is a tally stick, which was a bone that people scratched lines into when counting supplies or tracking business activities of some kind. You have probably also heard of an abacus, which was the first dedicated device created for the purpose…


Tags – Gaining the Azure Data Engineer Associate Certification

When you provision an Azure resource in the Azure portal, one common step is requesting tags (see Figure 1.30). This gives you the option to add a queryable identifier to the resource. For example, you can mark it as production, test, or development, or perhaps identify a contact person for the given resource. Regardless, this…
