YAML Tutorial for Quick Revision

This tutorial covers the most important YAML topics so that you can learn YAML, or revise it, quickly.

Introduction to YAML

YAML (YAML Ain't Markup Language) is a human-friendly data serialization standard for all programming languages. Originally called "Yet Another Markup Language," it was invented by Clark Evans in 2001 and later renamed to emphasize its data-oriented nature.

History and Adoption

YAML quickly gained popularity due to its simplicity and readability. Major companies and projects that use YAML include:

  • Google (Kubernetes configuration)
  • Amazon (AWS CloudFormation)
  • Microsoft (Azure Pipelines)
  • Travis CI (build configuration)
  • Docker (Docker Compose files)

Pros and Cons

Pros:

  • Human-readable and easy to understand
  • Supports complex data structures
  • Language-independent
  • Widely supported across programming languages

Cons:

  • Sensitive to indentation errors
  • Can be verbose for large datasets
  • Some advanced features can be confusing for beginners

This tutorial will cover the basics of YAML and its various features, making it easier for you to work with configuration files, data storage, and more.

YAML Syntax Basics

YAML is designed to be easy to read and write. Let's explore its fundamental syntax: key-value pairs, indentation, and lists.

Key-Value Pairs

The most basic YAML structure is a key-value pair:

key: value

Indentation

YAML uses indentation to represent nesting. Indentation must use spaces; tab characters are not allowed for indentation:

parent:
  child: value
  another_child: another value

Lists

Lists in YAML are represented using hyphens:

languages:
  - Python
  - Java
  - PHP

Combining Structures

You can combine these basic structures to create more complex documents:

best_movies:
  - title: The Shawshank Redemption
    year: 1994
    director: Frank Darabont
  - title: The Godfather
    year: 1972
    director: Francis Ford Coppola
  - title: Pulp Fiction
    year: 1994
    director: Quentin Tarantino

Remember, consistency in indentation is crucial in YAML. Each level of indentation typically uses 2 spaces, but the most important thing is to be consistent throughout your document.

Data Types in YAML

YAML supports various data types, allowing for rich and expressive data representation. Let's explore the main data types in YAML:

Strings

Strings can be represented with or without quotes:

unquoted_string: Hello, World!
single_quoted_string: 'Hello, World!'
double_quoted_string: "Hello, World!"
multi_line_string: |
  This is a multi-line
  string in YAML.
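
Quotes become necessary when an unquoted value would otherwise be read as YAML syntax. The fragment below is a small sketch of a few common cases; the key names are made up for illustration, and edge cases can differ between parsers.

```yaml
# Key names are illustrative only.
needs_quotes_colon: "Note: a colon followed by a space would otherwise start a mapping"
needs_quotes_hash: "Issue #42"        # unquoted, ' #' would start a comment
leading_special: "@handle"            # '@' and '`' are reserved at the start of a plain value
escapes_work: "Tab:\tNewline:\n"      # escape sequences are processed in double quotes only
escapes_literal: 'Tab:\t stays as typed'
apostrophe_in_single: 'It''s doubled inside single quotes'
```

When in doubt, quoting a string is harmless: the loaded value is the same text either way.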

Numbers

YAML supports integers, floating-point numbers, and scientific notation:

integer: 42
float: 3.14159
scientific_notation: 6.022e23
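
Because anything that looks like a number is loaded as a number, values such as version strings or postal codes usually need quotes. A minimal sketch, with hypothetical key names:

```yaml
app_version: "1.10"     # unquoted, 1.10 would load as the float 1.1
zip_code: "02134"       # unquoted, the leading zero may be dropped or read as octal
phone: "5551234"        # digits only, but meant as text
port: 8080              # a real integer, left unquoted
ratio: 0.75             # a real float, left unquoted
```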

Booleans

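Boolean values can be written in several ways. true and false work everywhere; yes and no are treated as booleans only by YAML 1.1 parsers, while YAML 1.2 parsers read them as strings: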

boolean_true: true
boolean_false: false
boolean_yes: yes
boolean_no: no
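
Words such as yes, no, on, and off can surprise you when they are meant as text. The sketch below (key names are illustrative) shows how quoting removes the ambiguity; whether the unquoted forms become booleans depends on the YAML version your parser follows.

```yaml
# Unquoted, these may load as booleans under YAML 1.1 rules:
country_unquoted: NO        # Norway's country code may become false
answer_unquoted: yes        # may become true

# Quoted, they are always strings:
country: "NO"
answer: "yes"

# true and false are booleans in both YAML 1.1 and 1.2:
enabled: true
```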

Null Values

Null values can be represented in multiple ways:

explicit_null: null
implicit_null: ~
empty_value:

Lists (Sequences)

Lists of movies are represented using hyphens:

top_movies:
  - title: The Shawshank Redemption
    year: 1994
  - title: The Godfather
    year: 1972
  - title: Pulp Fiction
    year: 1994

Dictionaries (Mappings)

Movie details use key-value pairs:

movie:
  title: Inception
  year: 2010
  director: Christopher Nolan

Complex Example

Here's a more complex example combining various data types:

movie:
  title: "Inception"
  release_year: 2010
  rating: 8.8
  is_scifi: true
  main_actor: null
  cast:
    - Leonardo DiCaprio
    - Joseph Gordon-Levitt
  plot: |
    A thief who enters the dreams of others
    to steal secrets from their subconscious.

Creating YAML Documents

Creating valid YAML documents requires understanding the structure and formatting rules. Here are the key points to remember:

Document Start and End

YAML documents can start with --- and end with ... (optional):

---
name: Example Document
version: 1.0
description: This is an example of a YAML document.
...

Indentation

Use consistent indentation (usually 2 spaces) for nested structures:

parent:
  child1: value1
  child2:
    grandchild: value2

Comments

Use # for comments:

# This is a comment
key: value # This is an inline comment

Multiple Documents

You can have multiple documents in a single file:

---
document: 1
---
document: 2
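
A multi-document file is often used to keep related but independent configurations together. The following sketch uses invented application names. Note that each document stands alone: anchors defined in one document are not visible in the next.

```yaml
# Document 1: settings for the first service
---
app: movie-api
log_level: info
# Document 2: a separate, independent document in the same file
---
app: movie-worker
log_level: debug
...
```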

Complex Example

Here's a more complex example of a YAML document:

---
movie_collection:
  name: Classic Films
  curator: Mike Alan
  last_updated: 2024-09-28
  films:
    - title: Casablanca
      year: 1942
      director: Michael Curtiz
    - title: Citizen Kane
      year: 1941
      director: Orson Welles
    - title: Gone with the Wind
      year: 1939
      director: Victor Fleming
  total_films: 3
  description: |
    A collection of classic films from the golden age of Hollywood.
    These films have stood the test of time and continue to inspire
    filmmakers and audiences alike.
...

Remember to validate your YAML documents using a YAML parser or linter to ensure they are correctly formatted and free of syntax errors.

YAML Comments

Understand how to use comments in YAML to annotate your documents without affecting the data structure. Comments are crucial for improving code readability and providing context for other developers.

# This is a single-line comment
key: value # This is an inline comment

# Multi-line comments can be achieved by using multiple single-line comments
# Like this
# And this

# Movie Database Configuration
movies:
  # Classic movies section
  - title: The Godfather
    year: 1972 # Release year of this iconic film
    director: Francis Ford Coppola
  # Modern movies section
  - title: Inception
    year: 2010 # Christopher Nolan's mind-bending masterpiece
    director: Christopher Nolan
# Add more movies as needed

YAML Scalars

Explore the different scalar types in YAML, including strings, numbers, booleans, and null values. Scalars are the basic building blocks of YAML documents.

string_unquoted: Hello World
string_quoted: "Hello, World!"
string_single_quoted: 'Hello, World!'
number_integer: 123
number_float: 3.14159
boolean_true: true
boolean_false: false
null_value: null
null_value_alternative: ~

movie:
  title: "The Matrix"
  release_year: 1999
  rating: 8.7
  is_classic: true
  description: >
    A computer programmer discovers
    that reality as he knows it
    is a simulation created by machines.
  tagline: |
    Welcome to the Real World.
  box_office: 463.5e6 # Scientific notation for 463.5 million
  sequel: null
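
The | and > indicators used above can take an extra character that controls what happens to the newline at the end of the text. A short sketch of the variants (key names are illustrative):

```yaml
literal_clip: |       # default: keeps line breaks, exactly one trailing newline
  line one
  line two

literal_strip: |-     # '-' removes the trailing newline
  line one
  line two

literal_keep: |+      # '+' keeps every trailing blank line
  line one
  line two

folded_strip: >-      # folds line breaks into spaces, no trailing newline
  these lines become
  a single line
```

The strip form is handy for values such as commands or tokens, where a stray trailing newline can cause trouble.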

YAML Sequences

Learn how to define sequences (arrays) in YAML and how to access their elements. Sequences are ordered collections of values.

names:
  - Annie
  - Joe
  - Mike

numbers: [1, 2, 3, 4, 5]

mixed_sequence:
  - String
  - 41
  - true
  - null
  - [nested, sequence]

top_rated_movies:
  - title: The Shawshank Redemption
    year: 1994
    rating: 9.3
  - title: The Godfather
    year: 1972
    rating: 9.2
  - title: The Dark Knight
    year: 2008
    rating: 9.0
  - title: 12 Angry Men
    year: 1957
    rating: 9.0
  - title: Schindler's List
    year: 1993
    rating: 9.0

genres: [Action, Comedy, Drama, Sci-Fi, Thriller]
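
YAML itself has no syntax for accessing an element; that happens in your programming language after the document is loaded. What YAML does offer is a few more ways to write a sequence. A brief sketch with made-up data:

```yaml
# A sequence of sequences (rows of a small table)
matrix:
  - [1, 2, 3]
  - [4, 5, 6]

# An explicitly empty sequence
empty_list: []

# A flow sequence may span several lines
watchlist: [
    Inception,
    Interstellar
  ]

# The hyphens may sit at the same indentation as the parent key
compact_items:
- first
- second
```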

YAML Mappings

YAML mappings, also known as dictionaries or associative arrays, are key-value pairs that allow you to represent structured data. Let's explore how to create and use mappings in YAML.

Basic Mappings

A basic mapping consists of keys and their corresponding values:

name: Alan Olan
age: 33
occupation: Software Developer

In this example, "name", "age", and "occupation" are keys, and their corresponding values are "Alan Olan", 33, and "Software Developer".

Nested Mappings

Mappings can be nested to represent more complex structures:

person:
  name: Jane Smith
  age: 28
  address:
    street: 123 Main St
    city: Uptown
    country: China
  hobbies:
    - reading
    - hiking
    - photography

Here, "person" is the top-level key. Its value is a mapping with the keys "name", "age", "address", and "hobbies". The "address" key itself contains a nested mapping, and "hobbies" contains a list.

Flow Style Mappings

For compact representation, you can use flow style mappings:

person: {name: Alice, age: 25, job: Engineer}

This is equivalent to the block style but more compact. It's useful for simple, short mappings.
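
Flow style borrows JSON's braces and brackets, so it can be nested, and JSON-formatted data is generally accepted by YAML 1.2 parsers. As a rough sketch (the data is made up for illustration), the two entries below describe the same structure, and the first is written exactly as JSON would write it:

```yaml
# JSON-style flow syntax with quoted keys and strings
person_flow: {"name": "Alice", "age": 25, "skills": ["YAML", "JSON"]}

# The same structure in block style
person_block:
  name: Alice
  age: 25
  skills:
    - YAML
    - JSON
```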

Complex Example: Movie Database

Let's look at a more complex example using a movie database:

movie:
  title: Inception
  director: Christopher Nolan
  release_year: 2010
  cast:
    lead: Leonardo DiCaprio
    supporting:
      - Joseph Gordon-Levitt
      - Ellen Page
      - Tom Hardy
  genres:
    - Sci-Fi
    - Action
    - Thriller
  ratings:
    imdb: 8.8
    rotten_tomatoes: 87%
  box_office:
    budget: $160 million
    revenue: $836 million

This example demonstrates nested mappings, lists within mappings, and various data types working together to represent complex data structures. Note that values such as 87% and $160 million are loaded as plain strings; YAML has no percentage or currency type.

Key Points to Remember

  • Keys in a mapping must be unique.
  • Indentation is crucial for nesting in block style.
  • You can mix and match mappings, sequences, and scalars.
  • Use colons and spaces consistently: key: value
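  • Multi-word keys are valid with or without quotes: release year: 2010 or "release year": 2010. Quotes are required only when the key contains characters such as ": " or " #".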

Mappings are a powerful feature in YAML, allowing you to represent complex, hierarchical data structures in a human-readable format. They are widely used in configuration files, data serialization, and various applications that require structured data representation.
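
As a small illustration of how keys behave, the sketch below shows a few key forms side by side. The keys are invented for this example, and how non-string keys are exposed depends on the library you load the file with.

```yaml
release year: 2010            # spaces in an unquoted key are allowed
"release: year": 2010         # quotes are needed when the key contains ': '
"yes": quoted so the key stays a string
123: numeric keys are allowed too
```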

Nested Structures in YAML

Nested structures in YAML allow you to represent complex, hierarchical data. They combine mappings, sequences, and scalars to create multi-level data representations. Let's explore how to create and use nested structures effectively.

Basic Nesting

Nesting is achieved through indentation. Here's a simple example:

company:
  name: Example Corp
  founded: 2005
  location:
    city: San Francisco
    country: USA

Here, "location" is nested under "company", and "city" and "country" are nested under "location".

Combining Mappings and Sequences

You can nest sequences within mappings and vice versa:

company:
  name: Example Corp
  departments:
    - name: Engineering
      employees: 100
    - name: Marketing
      employees: 50
  projects:
    active:
      - Project A
      - Project B
    completed:
      - Project C

This example shows a sequence of departments (each with its own mapping) and a mapping of project lists.

Deep Nesting

YAML allows for deep nesting to represent complex hierarchies:

university:
  name: Example University
  faculties:
    - name: Science
      departments:
        - name: Physics
          programs:
            - name: Bachelor of Science
              duration: 4 years
            - name: Master of Science
              duration: 2 years
        - name: Chemistry
          programs:
            - name: Bachelor of Science
              duration: 3 years
    - name: Arts
      departments:
        - name: Literature
        - name: History

This structure represents a university with faculties, departments, and programs, demonstrating how deep nesting can represent complex organizational structures.

Mixing Styles

You can mix block and flow styles for readability and compactness:

person:
  name: Alan Klan
  age: 30
  address: {street: 125 Main St, city: Chinatown, country: China}
  hobbies: [reading, hiking, photography]

Here, we use block style for the main structure, flow style for the address mapping, and flow style for the hobbies sequence.

Key Points for Nested Structures

  • Consistent indentation is crucial for proper nesting.
  • Each level of nesting is typically indented by 2 spaces.
  • You can nest mappings within sequences and vice versa.
  • There's no limit to nesting depth, but keep it reasonable for readability.
  • Use comments to explain complex structures: # This is a comment

Nested structures in YAML provide a powerful way to represent complex data hierarchies in a human-readable format. They are extensively used in configuration files, data serialization, and various applications that require structured data representation. By mastering nested structures, you can effectively model and work with complex data in your YAML documents.

YAML Anchors and Aliases

YAML anchors and aliases are powerful features that allow you to reuse content within your YAML document, reducing repetition and making your files more maintainable. Let's explore how to use these features effectively.

Anchors (&)

An anchor is a way to mark a node for future reference. It's defined using the '&' character:

base_config: &base
  version: 1.0
  database: mysql
  server: localhost

Here, we've created an anchor named 'base' for the 'base_config' mapping.

Aliases (*)

An alias is a way to refer to an anchored node. It's defined using the '*' character:

development:
  <<: *base
  environment: dev

production:
  <<: *base
  environment: prod
  server: production.example.com

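The '<<' key is a merge key: it merges the keys of the referenced mapping into the current one. In this example, both 'development' and 'production' inherit from 'base_config', and 'production' replaces the inherited 'server' value. Merge keys come from YAML 1.1 and are not part of the YAML 1.2 core schema, although most popular parsers support them.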
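
An alias does not have to be combined with a merge key. Used on its own, it simply repeats the anchored node, which can be a scalar or a sequence as well as a mapping. A minimal sketch with hypothetical names:

```yaml
default_ports: &ports [80, 443]
admin_email: &admin admin@example.com

web:
  ports: *ports          # same sequence as default_ports
  contact: *admin        # same scalar as admin_email
api:
  ports: *ports
  contact: *admin
```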

Overriding Inherited Properties

You can override inherited properties by redefining them:

base_user: &base_user
  name: Alex Joe
  role: user
  permissions: [read, write]

admin_user:
  <<: *base_user
  name: Admin User
  role: admin
  permissions: [read, write, delete]

Here, 'admin_user' inherits from 'base_user' but overrides the 'name', 'role', and 'permissions' properties.
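
A merge key can also take a list of aliases. According to the merge key specification, when the same key appears in more than one merged mapping, the mapping listed earlier wins, and keys written directly in the target win over both. The names below are made up for illustration:

```yaml
defaults: &defaults
  role: user
  active: true

audit: &audit
  log_access: true
  role: auditor

auditor_user:
  <<: [*defaults, *audit]   # role comes from *defaults because it is listed first
  name: Sam Example
```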

Complex Example: Server Configurations

Let's look at a more complex example using server configurations:

default_settings: &default
  timeout: 30
  max_connections: 100
  logging:
    level: info
    format: json

web_server: &web
  <<: *default
  port: 80
  protocol: http

development:
  <<: *web
  host: localhost
  logging:
    level: debug

production:
  <<: *web
  host: example.com
  protocol: https
  port: 443
  max_connections: 1000

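In this example, we define default settings, then a web server configuration that inherits from the defaults. The development and production configurations then inherit from the web server configuration, each with its own specific overrides. Note that merging is shallow: because 'development' defines its own 'logging' mapping, that mapping replaces the inherited one entirely, so 'format: json' is not carried over.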
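
Merge keys only combine the top level of a mapping. One possible way to override a single nested key while keeping the rest is to anchor the nested mapping separately and merge it again. A sketch, with anchor names chosen for illustration:

```yaml
default_logging: &default_logging
  level: info
  format: json

default_settings: &default
  timeout: 30
  logging: *default_logging

development:
  <<: *default
  logging:
    <<: *default_logging   # re-merge the nested defaults explicitly
    level: debug           # override only this key; format: json is kept
```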

Key Points for Anchors and Aliases

  • Anchors are defined with '&', aliases are referenced with '*'.
  • Use '<<:' to merge mappings.
  • Overridden properties take precedence over inherited ones.
  • Anchors and aliases can significantly reduce duplication in your YAML files.
  • Be cautious with deep nesting of anchors and aliases to maintain readability.

YAML anchors and aliases are powerful tools for creating reusable and maintainable configuration files. They allow you to define common configurations once and reuse them throughout your document, making it easier to manage complex configurations and reduce errors from duplicated code.

YAML Tags

YAML tags are a powerful feature that allows you to specify the type of data being represented. They provide a way to explicitly declare how a particular node should be interpreted. Let's explore YAML tags in depth.

Basic Tag Usage

Tags come before the value. The built-in YAML types use the '!!' prefix:

integer: !!int 42
float: !!float 3.14159
string: !!str "Hello, World!"
boolean: !!bool true
null_value: !!null null

These tags explicitly specify the data type of each value.
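
Explicit tags are most useful when you want a type other than the one the parser would pick on its own. The sketch below assumes a parser that honors the standard tags; the key names are illustrative.

```yaml
zip_code: !!str 90210        # forced to the string "90210"
build_number: !!str 007      # keeps the leading zeros as text
ratio: !!float 1             # forced to the float 1.0
count: !!int "42"            # quoted, but converted to the integer 42
```

In everyday files, plain quoting ("90210") achieves the same result as !!str and is usually easier to read.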

Complex Data Types

YAML also provides tags for more complex data types:

timestamp: !!timestamp 2023-05-14T12:34:56Z
set: !!set
  ? Java
  ? PHP
  ? Ruby
ordered_map: !!omap
  - first: 1
  - second: 2
  - third: 3

Here, we use tags for a timestamp, a set (unique unordered collection), and an ordered map.

Binary Data

YAML can represent binary data using the !!binary tag:

gif_file: !!binary |
  R0lGODlhDAAMAIQAAP//9/X17unp5WZmZgAAAOfn515eXvPz7Y6OjuDg4J+fn5
  OTk6enp56enmlpaWNjY6Ojo4SEhP/++f/++f/++f/++f/++f/++f/++f/++f/+
  +f/++f/++f/++f/++f/++SH+Dk1hZGUgd2l0aCBHSU1QACwAAAAADAAMAAAFLC
  AgjoEwnuNAFOhpEMTRiggcz4BNJHrv/zCFcLiwMWYNG84BwwEeECcgggoBADs=

This represents base64-encoded binary data, typically used for small images or other binary files.

Custom Tags

You can define custom tags for application-specific data types:

%TAG ! tag:example.com,2023:
---
product: !electronics
  name: Smartphone
  brand: TechCo
  price: 599.99

user: !person
  name: Alex Grek
  age: 30

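Here, the %TAG directive maps the '!' prefix to 'tag:example.com,2023:', so '!electronics' and '!person' become application-specific tags. The application that loads the file must know how to handle them.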

Tag Resolution

YAML processors use tag resolution to determine the data type when no explicit tag is provided:

implicit_integer: 22  # Resolved as !!int
implicit_float: 5.39  # Resolved as !!float
implicit_string: Hello  # Resolved as !!str
implicit_datetime: 2024-09-27  # Often resolved as !!timestamp

The YAML processor attempts to infer the correct data type based on the value's format.
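
Automatic resolution is convenient, but it can turn text into a date or a number when you did not intend it. A short sketch of values worth quoting (key names are made up; the exact result depends on the parser and the YAML version it follows):

```yaml
release_date: 2024-09-27       # may load as a date or timestamp object
release_label: "2024-09-27"    # always a string
start_time: 12:30:00           # YAML 1.1 parsers may read this as a base-60 integer
start_label: "12:30:00"        # always a string
```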

Key Points for YAML Tags

  • Tags provide explicit type information for YAML nodes.
  • Built-in tags like !!str, !!int, !!float are widely supported.
  • Complex types like !!timestamp, !!set, and !!binary offer advanced functionality.
  • Custom tags allow for application-specific data types.
  • Tag resolution can infer types when tags are not explicitly provided.
  • Use tags judiciously to balance between explicitness and readability.

YAML tags are a powerful feature that allows for precise control over data representation and interpretation. By using tags effectively, you can ensure that your YAML data is correctly understood and processed by applications, while also providing clear documentation of your data structures.

YAML and JSON Comparison

YAML (YAML Ain't Markup Language) and JSON (JavaScript Object Notation) are both popular data serialization formats. While they share some similarities, they also have distinct differences. Let's compare these two formats in detail.

Syntax Comparison

Here's a side-by-side comparison of YAML and JSON syntax:

# YAML
person:
  name: Mike Alan
  age: 32
  city: Beijing
  hobbies:
    - reading
    - hiking
    - photography

// JSON
{
  "person": {
    "name": "Mike Alan",
    "age": 32,
    "city": "Beijing",
    "hobbies": [
      "reading",
      "hiking",
      "photography"
    ]
  }
}

// JSON
{
  "movie": {
    "title": "The Matrix",
    "year": 1999,
    "directors": [
      "Lana Wachowski",
      "Lilly Wachowski"
    ],
    "rating": 8.7
  }
}

# YAML
movie:
  title: The Matrix
  year: 1999
  directors:
    - Lana Wachowski
    - Lilly Wachowski
  rating: 8.7

Using YAML in Configuration Files

Understand how YAML is used in configuration files across various applications.

server:
  host: localhost
  port: 8080

movie_database:
  connection:
    host: movie-db.example.com
    port: 5432
    username: movie_user
    password: secret_password
  settings:
    max_connections: 100
    timeout: 30
  logging:
    level: info
    file: /var/log/movie_db.log

YAML Best Practices

Learn best practices for writing clean and maintainable YAML documents.

# Use consistent indentation
# Avoid tabs, use spaces instead
key: value

# Use meaningful keys
movie:
  title: Inception
  release_year: 2010
  director: Christopher Nolan

# Group related data
cast:
  lead_actor: Leonardo DiCaprio
  supporting_actors:
    - Joseph Gordon-Levitt
    - Ellen Page

# Use comments for clarity
ratings:
  imdb: 8.8  # Out of 10
  rotten_tomatoes: 87  # Percentage

Advanced YAML Features

Explore advanced features of YAML, such as complex data types and custom tags.

# Ordered map
!!omap
- key1: value1
- key2: value2

# Set type
!!set
? Star Wars
? The Lord of the Rings
? Harry Potter

# Ordered map with movie release years
!!omap
- The Godfather: 1972
- Pulp Fiction: 1994
- Fight Club: 1999

# Custom tag
%TAG !movie! tag:movies.example.com,2023:
---
!movie!sci-fi
title: Blade Runner
year: 1982
director: Ridley Scott

Error Handling in YAML

Learn how to handle errors in YAML parsing and validation.

# Example of a syntax error
key: value
key2 value2 # Missing colon

# Correct syntax
movie:
  title: "The Shawshank Redemption"
  year: 1994

# Indentation error
movie:
  title: "The Godfather"
    year: 1972  # This line is incorrectly indented

# Type mismatch
release_year: "1999"  # Should be an integer, not a string

# Duplicate key
movie:
  title: "Pulp Fiction"
  title: "Reservoir Dogs"  # Duplicate key: invalid per the YAML spec; some parsers reject it, others silently keep the last value
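
For comparison, here is one possible corrected version of each faulty snippet above. Turning the duplicate key into a list is only one way to resolve it, chosen here for illustration.

```yaml
# Missing colon fixed: every key is followed by ': '
key: value
key2: value2

# Indentation fixed: sibling keys share the same column
movie:
  title: "The Godfather"
  year: 1972

# Type fixed: unquoted digits load as an integer
release_year: 1999

# Duplicate key fixed: use a list when there are several values
movies:
  - title: "Pulp Fiction"
  - title: "Reservoir Dogs"
```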

YAML Libraries in Different Languages

Discover popular YAML libraries available in various programming languages.

# Python
import yaml

# JavaScript
const yaml = require('js-yaml');

# Ruby
require 'yaml'

# Java
import org.yaml.snakeyaml.Yaml;

# C#
using YamlDotNet.Serialization;

# Go
import "gopkg.in/yaml.v2"

YAML in CI/CD Pipelines

Understand how YAML is used in CI/CD pipelines for configuration and automation.

jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout code
        uses: actions/checkout@v2

# GitLab CI/CD pipeline for a movie review app
stages:
  - build
  - test
  - deploy

build_app:
  stage: build
  script:
    - npm install
    - npm run build

run_tests:
  stage: test
  script:
    - npm run test

deploy_to_production:
  stage: deploy
  script:
    - ssh user@movie-review-server 'deploy-script.sh'
  only:
    - master
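
Anchors and merge keys are a common way to cut repetition in pipeline files. The sketch below assumes GitLab CI/CD, where a job whose name starts with a dot is hidden and never runs on its own. The job names, the image, and the lint script are hypothetical.

```yaml
# Hidden job (leading dot): it is not run, it only holds shared settings
.node_job: &node_job
  image: node:20
  before_script:
    - npm ci

run_tests:
  <<: *node_job
  stage: test
  script:
    - npm run test

lint:
  <<: *node_job
  stage: test
  script:
    - npm run lint
```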

YAML Security Considerations

Learn about security considerations when using YAML, including potential vulnerabilities.

# Avoid executing arbitrary code
# Validate input data

# Potential security risk (arbitrary code execution)
# PyYAML-specific tag: runs a shell command if loaded with an unsafe loader; prefer yaml.safe_load
some_key: !!python/object/apply:os.system ["ls -l"]

# Safe alternative
command: "ls -l"

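# Always validate and sanitize user input
# (template placeholder, quoted so the line is valid YAML before rendering)
user_input: "{{ sanitize(user_provided_value) }}"

# Explicit tags make the intended type unambiguous, but they do not protect a secret
password: !!str "my_secure_password"

# Quote tokens and keys so they always load as strings
api_key: ABC123  # Works, but a value such as 123456 or 1e3 would load as a number
api_key: "ABC123"  # Better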

YAML CheatSheet

A quick reference guide for YAML syntax and common usage patterns.

Key-Value Pair

Syntax:
key: value

Example:
title: The Godfather

Nested Mapping

Syntax:
key:
  subkey: value

Example:
movie:
  title: Inception

List

Syntax:
- item1
- item2

Example:
genres:
  - Drama
  - Crime

Nested List

Syntax:
-
  - subitem1
  - subitem2

Example:
cast:
  -
    - Leonardo DiCaprio
    - Dom Cobb

Multiline String

Syntax:
key: |
  Line 1
  Line 2

Example:
synopsis: |
  A thief who enters the dreams of others
  to steal secrets from their subconscious.

Folded String

Syntax:
key: >
  Long line 1
  Long line 2

Example:
tagline: >
  Your mind is the scene of the crime.
  The dream is real.

Anchor and Alias

Syntax:
key: &anchor
  subkey: value
alias: *anchor

Example:
director: &nolan
  name: Christopher Nolan
movie_director: *nolan

Merge Key

Syntax:
base: &base
  key1: value1
merged:
  <<: *base
  key2: value2

Example:
base_movie: &base_movie
  studio: Warner Bros.
inception:
  <<: *base_movie
  title: Inception

Complex Movie Database Example

# Movie Database
studio: &studio
  name: Warner Bros. Pictures
  founded: 1923

director: &nolan
  name: Christopher Nolan
  nationality: British-American

movies:
  - title: Inception
    year: 2010
    studio: *studio
    director: *nolan
    genre: !!set
      ? Sci-Fi
      ? Action
      ? Thriller
    cast:
      - name: Leonardo DiCaprio
        role: Dom Cobb
      - name: Joseph Gordon-Levitt
        role: Arthur
    synopsis: |
      A skilled thief is offered a chance to regain his old life as payment for a task considered
      to be impossible: "inception",
      the implantation of another person's idea into a target's subconscious.
    box_office: 836.8e6
    ratings:
      imdb: 8.8
      rotten_tomatoes: 87%
  - title: Interstellar
    year: 2014
    studio:
      <<: *studio
      co_production: Paramount Pictures
    director: *nolan
    genre: !!set
      ? Sci-Fi
      ? Drama
      ? Adventure
    cast:
      - name: Matthew McConaughey
        role: Joseph Cooper
      - name: Anne Hathaway
        role: Dr. Amelia Brand
    tagline: >
      Mankind was born on Earth.
      It was never meant to die here.
    box_office: 701.8e6
    ratings:
      imdb: 8.6
      rotten_tomatoes: 72%

awards: !!omap
  - Inception:
      - Academy Award for Best Cinematography
      - Academy Award for Best Visual Effects
  - Interstellar:
      - Academy Award for Best Visual Effects

release_dates:
  Inception: !!timestamp 2010-07-16
  Interstellar: !!timestamp 2014-11-07
