Extensible Markup Language

The Extensible Markup Language (XML) file format has been around for many years. It was a significant advance over predecessors such as CSV and TXT files. What XML offers that CSV and TXT files do not is the ability to strongly type the contents of the file. For example, a number can be declared as an integer or as a string, depending on how the consuming code is expected to interpret it. An XML Document Type Definition (DTD) or XML Schema Definition (XSD) file is used alongside the XML file to describe its structure; of the two, XSD is the one that provides data types such as xs:int and xs:decimal. For example, here is the content of a brainwave.dtd file, whose declarations are written in XSD syntax:

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="brainwave">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="Scenario" type="xs:string"/>
        <xs:element name="Counter" type="xs:int"/>
        <xs:element name="Electrode" type="xs:string"/>
        <xs:element name="THETA" type="xs:decimal"/>
        <xs:element name="ALPHA" type="xs:decimal"/>
        <xs:element name="GAMMA" type="xs:decimal"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>

Additionally, XML elements can be nested, so arrays and objects can be built directly from the file structure. That is awkward, or not possible at all, when receiving rows of comma‐ or semicolon‐delimited data in CSV or TXT files. Here is an example of an XML file that references the XML DTD file:

<!DOCTYPE brainwave SYSTEM "brainwave.dtd">
<Session>
  <Scenario>TikTok</Scenario>
  <Counter>5</Counter>
  <Electrode>AF3</Electrode>
  <THETA>9.681</THETA>
  <ALPHA>3.849</ALPHA>
  <GAMMA>0.738</GAMMA>
</Session>

You can save the XML content into a file using the classes and methods found in the System.Xml.Serialization namespace. The following C# code illustrates how to achieve this:

public class Brainwave
{
    public string Scenario;
    public int Counter;
    public string Electrode;
    public decimal THETA;
    public decimal ALPHA;
    public decimal GAMMA;

    // Serializes a new Brainwave instance to brainwave.xml in the working directory.
    public static void WriteXML()
    {
        Brainwave brainwave = new Brainwave();
        System.Xml.Serialization.XmlSerializer writer =
            new System.Xml.Serialization.XmlSerializer(typeof(Brainwave));
        System.IO.FileStream file = System.IO.File.Create("brainwave.xml");
        writer.Serialize(file, brainwave);
        file.Close();
    }
}
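
For comparison, a similar document can be produced outside .NET. The following is a minimal Python sketch, not the method used above, built on the standard-library xml.etree.ElementTree module. The function name brainwave_to_xml and the sample reading are assumptions chosen to mirror the Session example shown earlier.

```python
import xml.etree.ElementTree as ET


def brainwave_to_xml(reading: dict) -> str:
    """Build a <Session> element from a dict and return it as an XML string."""
    session = ET.Element("Session")
    for name, value in reading.items():
        # Every value is written as text; the XML document itself holds only strings.
        ET.SubElement(session, name).text = str(value)
    return ET.tostring(session, encoding="unicode")


# Hypothetical sample reading (values copied from the XML example).
reading = {"Scenario": "TikTok", "Counter": 5, "Electrode": "AF3",
           "THETA": 9.681, "ALPHA": 3.849, "GAMMA": 0.738}
xml_text = brainwave_to_xml(reading)
print(xml_text)
```

Because the values are converted with str(), the type information lives only in the schema, not in the generated file.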

Once the file is written, it can be sent to other applications, which can parse and use the data within it. XML is not the most efficient format for Big Data analytics, but because it has been in use for so long, many existing systems are likely built on it. It is therefore a format worth knowing and understanding, but avoid it when designing a new data analytics solution. The best uses of XML include the following:

  • When the data requires validation
  • When the file contains a mixture of content
  • When working with Windows Communication Foundation (WCF)
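
To illustrate how a receiving application might parse such a file and apply the types declared in the schema, here is a minimal Python sketch. It assumes the Session document shown earlier, with the DOCTYPE line omitted for simplicity. The FIELD_TYPES map is a hand-written stand-in for the schema, not something generated from it, and parse_session is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET
from decimal import Decimal

XML_TEXT = """<Session>
  <Scenario>TikTok</Scenario>
  <Counter>5</Counter>
  <Electrode>AF3</Electrode>
  <THETA>9.681</THETA>
  <ALPHA>3.849</ALPHA>
  <GAMMA>0.738</GAMMA>
</Session>"""

# Assumption: a manual type map mirroring the xs:string / xs:int / xs:decimal
# declarations; a real validator would read these from the schema file.
FIELD_TYPES = {"Scenario": str, "Counter": int, "Electrode": str,
               "THETA": Decimal, "ALPHA": Decimal, "GAMMA": Decimal}


def parse_session(xml_text: str) -> dict:
    """Parse a <Session> document and convert each child element's text."""
    root = ET.fromstring(xml_text)
    return {child.tag: FIELD_TYPES[child.tag](child.text) for child in root}


session = parse_session(XML_TEXT)
print(session["Counter"] + 1)  # Counter is an int, so arithmetic works
```

With this approach, a Counter value that is not a whole number fails during conversion instead of flowing silently into later processing, which is the practical benefit of typed content.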

Yet Another Markup Language

Yet Another Markup Language (YAML), whose official name is now the recursive acronym YAML Ain’t Markup Language, is one of the newer file formats. You will find it frequently in the context of container configuration and deployments, for example with Docker. Describing infrastructure in such files is commonly referred to as infrastructure as code, and YAML is also often used to define build and deployment pipelines in Azure DevOps and GitHub Actions. However, YAML can also be used to store data, in much the same way as JSON. Here is an example of YAML syntax:

Session:
  # This is a comment
  Scenario: TikTok
  Counter: 5
  Electrode: AF3
  THETA: 9.681
  ALPHA: 3.849
  GAMMA: 0.738
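
Since the text compares YAML with JSON, it may help to see the same record in JSON form. The sketch below uses only Python's standard json module; the variable names are illustrative, and the dictionary is simply a hand-typed copy of the YAML example.

```python
import json

# Hand-typed copy of the YAML record above.
session = {"Session": {"Scenario": "TikTok", "Counter": 5, "Electrode": "AF3",
                       "THETA": 9.681, "ALPHA": 3.849, "GAMMA": 0.738}}

json_text = json.dumps(session, indent=2)
print(json_text)
```

The JSON output needs braces, quoted keys, and commas, and JSON has no comment syntax. The YAML version above carries the same data without any of them.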

One benefit of YAML, which you might see immediately, is that its block style needs no curly or square brackets. YAML can be considered a superset of JSON with a few extra features such as comments, anchors, and aliases. Simply precede a line in the file with a # character and it is treated as a comment. Anchors and aliases remove the need to duplicate data throughout the file: a value is defined once at an anchor and referenced elsewhere through an alias. This also simplifies updates, because a change needs to happen in only a single place instead of several. For example:

Session:
  # This is a comment
  - Scenario: &scenario TikTok
    Counter: 5
    Electrode: AF3
    THETA: 9.681
    ALPHA: 3.849
    GAMMA: 0.738
  - Scenario: *scenario
    Counter: 6
    Electrode: Pz
    THETA: 8.392
    ALPHA: 4.142
    GAMMA: 1.106

Notice that the first instance of Scenario: is followed by an ampersand, &, then the name of the anchor, followed by the value. Wherever the same value is needed later, enter the anchor name preceded by an asterisk, *, instead of repeating the actual value. YAML is not yet a mainstream choice for Big Data analytics, but its use is growing and it is worth calling out. The best use of YAML is when you have been using JSON but want the additional features that YAML provides.
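
To make the anchor and alias mechanism concrete, here is a toy Python sketch of the substitution a YAML parser performs. It is not a YAML parser: it handles only scalar anchors and aliases that sit on a single key: value line, and it relies on nothing beyond the standard re module. The shortened, list-style sample document and the name expand_aliases are assumptions made for illustration. A real YAML library, such as PyYAML, resolves aliases automatically when loading a file.

```python
import re

# Assumption: a shortened, list-style variant of the example above.
YAML_TEXT = """\
Session:
  - Scenario: &scenario TikTok
    Counter: 5
  - Scenario: *scenario
    Counter: 6
"""

# "key: &name value" defines an anchor; "key: *name" refers back to it.
ANCHOR = re.compile(r"^(?P<head>.*:\s*)&(?P<name>\w+)\s+(?P<value>.+)$")
ALIAS = re.compile(r"^(?P<head>.*:\s*)\*(?P<name>\w+)\s*$")


def expand_aliases(text: str) -> str:
    """Replace each scalar alias with the value recorded at its anchor."""
    anchors = {}
    out = []
    for line in text.splitlines():
        m = ANCHOR.match(line)
        if m:
            # Remember the anchored value and drop the "&name" marker.
            anchors[m["name"]] = m["value"]
            line = m["head"] + m["value"]
        else:
            m = ALIAS.match(line)
            if m:
                # Substitute the value stored under the anchor name.
                line = m["head"] + anchors[m["name"]]
        out.append(line)
    return "\n".join(out)


expanded = expand_aliases(YAML_TEXT)
print(expanded)
```

Changing TikTok on the anchor line changes every expanded alias as well, which is the single-place update described above.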
