Extensible Markup Language – CREATE DATABASE dbName; GO
The Extensible Markup Language (XML) file format has been around for many years. It was a significant advance over predecessors such as CSV and TXT files. What XML offers that CSV and TXT files do not is the ability to declare the types of the file’s contents. Without such a declaration, a value like 5 could be read as either an integer or a string, depending on the code that interprets it. An XML Document Type Definition (DTD) or XML Schema Definition (XSD) file is used in combination with the XML file to describe its structure; of the two, XSD is the one that can assign data types such as xs:int or xs:decimal to individual elements. For example, here is a schema for the brainwave data, written in XSD syntax:
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="Session">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="Scenario" type="xs:string"/>
        <xs:element name="Counter" type="xs:int"/>
        <xs:element name="Electrode" type="xs:string"/>
        <xs:element name="THETA" type="xs:decimal"/>
        <xs:element name="ALPHA" type="xs:decimal"/>
        <xs:element name="GAMMA" type="xs:decimal"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
Additionally, XML supports building arrays and objects from the file structure, which is awkward or even impossible with rows of comma- or semicolon-delimited data in CSV or TXT files. Here is an example of an XML file that references a definition file through a DOCTYPE declaration:
<!DOCTYPE Session SYSTEM "brainwave.dtd">
<Session>
  <Scenario>TikTok</Scenario>
  <Counter>5</Counter>
  <Electrode>AF3</Electrode>
  <THETA>9.681</THETA>
  <ALPHA>3.849</ALPHA>
  <GAMMA>0.738</GAMMA>
</Session>
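To make the typing point concrete, here is a minimal Python sketch that parses a session record like the one shown above and converts each element's text to a declared type. The standard-library parser used here does not validate against a DTD or XSD, so the `FIELD_TYPES` mapping is a hand-written stand-in for the schema, and the names `FIELD_TYPES`, `XML_TEXT`, and `parse_session` are assumptions made for illustration. The DOCTYPE line is left out to keep the example small.

```python
import xml.etree.ElementTree as ET
from decimal import Decimal

# Hand-written stand-in for the schema's type declarations (assumption:
# xs:int -> int, xs:decimal -> Decimal, xs:string -> str).
FIELD_TYPES = {
    "Scenario": str,
    "Counter": int,
    "Electrode": str,
    "THETA": Decimal,
    "ALPHA": Decimal,
    "GAMMA": Decimal,
}

XML_TEXT = """<Session>
  <Scenario>TikTok</Scenario>
  <Counter>5</Counter>
  <Electrode>AF3</Electrode>
  <THETA>9.681</THETA>
  <ALPHA>3.849</ALPHA>
  <GAMMA>0.738</GAMMA>
</Session>"""


def parse_session(xml_text):
    """Return a dict of typed values for one session element."""
    root = ET.fromstring(xml_text)
    # Every element's text arrives as a string; the mapping decides the type.
    return {child.tag: FIELD_TYPES[child.tag](child.text) for child in root}


session = parse_session(XML_TEXT)
print(session["Counter"] + 1)               # integer arithmetic after conversion
print(session["THETA"] + session["ALPHA"])  # exact decimal arithmetic
```

Without the mapping, `Counter` would remain the string "5", which is the ambiguity a schema is meant to remove.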
You can save an object as XML content in a file using the classes and methods found in the System.Xml.Serialization namespace. The following C# code illustrates how to achieve this:
public class Brainwave
{
    public string Scenario;
    public int Counter;
    public string Electrode;
    public decimal THETA;
    public decimal ALPHA;
    public decimal GAMMA;

    // Serializes a Brainwave instance to brainwave.xml.
    public static void WriteXML()
    {
        Brainwave brainwave = new Brainwave();
        System.Xml.Serialization.XmlSerializer writer =
            new System.Xml.Serialization.XmlSerializer(typeof(Brainwave));
        // The using statement closes the file even if Serialize throws.
        using (System.IO.FileStream file = System.IO.File.Create("brainwave.xml"))
        {
            writer.Serialize(file, brainwave);
        }
    }
}
Once the file is written, it can be sent to other applications, which can parse and use the data within it. Although this format is not the most efficient for Big Data analytics, its long history means that many existing systems likely use it as a basis for their solutions. Therefore, it is a valid format to know and understand, but avoid it when designing a new data analytics solution. The best uses of XML include the following:
- When the data requires validation
- When the file contains a mixture of content
- When working with Windows Communication Foundation (WCF)
Yet Another Markup Language
Yet Another Markup Language (YAML) is one of the newer file formats in common use; its name is now officially expanded as the recursive acronym “YAML Ain’t Markup Language.” You will find it frequently in the context of Docker container configuration and deployments. That approach is commonly referred to as infrastructure as code, and YAML is also often used to define build and deployment pipelines with Azure DevOps and GitHub Actions. However, YAML can also be used to store data, in much the same way as JSON. Here is an example of YAML syntax:
Session:
  # This is a comment
  Scenario: TikTok
  Counter: 5
  Electrode: AF3
  THETA: 9.681
  ALPHA: 3.849
  GAMMA: 0.738
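For comparison, the same record can be written as JSON, whose data model YAML mirrors here. The sketch below uses Python's standard `json` module; the variable names are illustrative, and nesting the six fields under `Session` is an assumption about the intended structure.

```python
import json

# The YAML document above, expressed as the equivalent Python structure.
# Assumption: the six fields are nested under the Session key.
record = {
    "Session": {
        "Scenario": "TikTok",
        "Counter": 5,
        "Electrode": "AF3",
        "THETA": 9.681,
        "ALPHA": 3.849,
        "GAMMA": 0.738,
    }
}

# JSON needs braces, quotes, and commas for what YAML expresses with
# indentation alone, and it has no comment syntax.
json_text = json.dumps(record, indent=2)
print(json_text)

# Round trip: parsing the JSON text gives back the same structure.
assert json.loads(json_text) == record
```

The comment line from the YAML version has no counterpart in the JSON output, because JSON does not define comments.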
One benefit of this format, which you might see immediately, is that there are no curly or square brackets. YAML (as of version 1.2) can be considered a superset of JSON with a few extra features such as comments, anchoring, and aliasing. Simply precede a line in the file with a # character and it is considered a comment. Anchoring and aliasing remove the need to duplicate data throughout the file: a value is defined once with an anchor and reused elsewhere through an alias. This also simplifies updates, because the change needs to happen in only one place instead of several. For example:
Session:
  # This is a comment
  - Scenario: &scenario TikTok
    Counter: 5
    Electrode: AF3
    THETA: 9.681
    ALPHA: 3.849
    GAMMA: 0.738
  - Scenario: *scenario
    Counter: 6
    Electrode: Pz
    THETA: 8.392
    ALPHA: 4.142
    GAMMA: 1.106
Notice that the first instance of Scenario: is followed by an ampersand, &, then the name of the identifier, then the value. In all later locations where this same value is needed, enter the identifier preceded by an asterisk, *, instead of the actual value. YAML is not yet a primary choice for Big Data analytics, but it is gaining ground and is worth knowing. The best uses of YAML include cases where you have been using JSON but want the additional features available in YAML.
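The way an alias resolves to its anchored value can be sketched in a few lines of Python. This is not a YAML parser: it handles only flat `key: value` lines with scalar anchors, an assumption chosen to keep the example self-contained, and `resolve_aliases` is a hypothetical helper name. A real project would rely on a YAML library instead.

```python
def resolve_aliases(lines):
    """Expand scalar anchors (&name) and aliases (*name) in 'key: value' lines.

    Toy illustration only: no nesting, lists, quoting, or type conversion.
    """
    anchors = {}   # anchor name -> anchored value
    resolved = []  # (key, value) pairs in document order
    for line in lines:
        text = line.strip()
        if not text or text.startswith("#"):
            continue  # skip blank lines and comments
        key, _, value = text.partition(":")
        value = value.strip()
        if value.startswith("&"):
            # "&scenario TikTok" defines the anchor and carries the value.
            name, _, value = value[1:].partition(" ")
            anchors[name] = value
        elif value.startswith("*"):
            # "*scenario" is replaced by the value stored under that anchor.
            value = anchors[value[1:]]
        resolved.append((key.strip(), value))
    return resolved


sample = [
    "Scenario: &scenario TikTok",
    "Counter: 5",
    "# second reading reuses the anchored value",
    "Scenario: *scenario",
    "Counter: 6",
]
pairs = resolve_aliases(sample)
print(pairs)
```

Changing `TikTok` on the anchored line is enough to change every place the alias is used, which is the single-point-of-update benefit described above.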