YAML with Perl

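For reasons of my own, I've been studying YAML.
I'm reading through the YAML 1.1 document (the blockquote parts are my very rough translation).
The parser is YAML.pm v0.66, running on Perl v5.8.7 (on Cygwin).
Sample script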

#!/usr/bin/perl
use strict;
use warnings;

use YAML;
use Data::Dumper;

my @datareflist = Load(<<'EOF');
# some YAML code goes here
EOF

print ">||\n##### yaml #####\n";
print Dump @datareflist;
print "\n##### data #####\n";
print Dumper @datareflist;
print "||<\n";

Collections

YAML uses indentation to show the scope of a block collection.
A block sequence marks each entry with a leading dash and space ('- ').
A mapping separates each key from its value with a colon and space (': ').

Example 2.1. Sequence of Scalars(ball players)

- Mark McGwire
- Sammy Sosa
- Ken Griffey
##### yaml #####
---
- Mark McGwire
- Sammy Sosa
- Ken Griffey

##### data #####
$VAR1 = [
          'Mark McGwire',
          'Sammy Sosa',
          'Ken Griffey'
        ];

Example 2.2. Mapping Scalars to Scalars(player statistics)

hr:  65    # Home runs
avg: 0.278 # Batting average
rbi: 147   # Runs Batted In
##### yaml #####
---
avg: '0.278 # Batting average'
hr: '65    # Home runs'
rbi: '147   # Runs Batted In'

##### data #####
$VAR1 = {
          'hr' => '65    # Home runs',
          'avg' => '0.278 # Batting average',
          'rbi' => '147   # Runs Batted In'
        };

Example 2.3. Mapping Scalars to Sequences (ball clubs in each league)

american:
  - Boston Red Sox
  - Detroit Tigers
  - New York Yankees
national:
  - New York Mets
  - Chicago Cubs
  - Atlanta Braves
##### yaml #####
---
american:
  - Boston Red Sox
  - Detroit Tigers
  - New York Yankees
national:
  - New York Mets
  - Chicago Cubs
  - Atlanta Braves

##### data #####
$VAR1 = {
          'american' => [
                          'Boston Red Sox',
                          'Detroit Tigers',
                          'New York Yankees'
                        ],
          'national' => [
                          'New York Mets',
                          'Chicago Cubs',
                          'Atlanta Braves'
                        ]
        };

Example 2.4. Sequence of Mappings (players’ statistics)

-
  name: Mark McGwire
  hr:   65
  avg:  0.278
-
  name: Sammy Sosa
  hr:   63
  avg:  0.288
##### yaml #####
---
- avg: 0.278
  hr: 65
  name: Mark McGwire
- avg: 0.288
  hr: 63
  name: Sammy Sosa

##### data #####
$VAR1 = [
          {
            'hr' => '65',
            'avg' => '0.278',
            'name' => 'Mark McGwire'
          },
          {
            'hr' => '63',
            'avg' => '0.288',
            'name' => 'Sammy Sosa'
          }
        ];

YAML also has flow styles, which use explicit indicators instead of indentation to denote scope.
A flow sequence is written as a comma-separated list inside '[]'.
A flow mapping is written the same way, as a list inside '{}'.

Example 2.5. Sequence of Sequences

- [name        , hr, avg  ]
- [Mark McGwire, 65, 0.278]
- [Sammy Sosa  , 63, 0.288]
##### yaml #####
---
-
  - name
  - hr
  - avg
-
  - Mark McGwire
  - 65
  - 0.278
-
  - Sammy Sosa
  - 63
  - 0.288

##### data #####
$VAR1 = [
          [
            'name',
            'hr',
            'avg'
          ],
          [
            'Mark McGwire',
            '65',
            '0.278'
          ],
          [
            'Sammy Sosa',
            '63',
            '0.288'
          ]
        ];

Example 2.6. Mapping of Mappings

Mark McGwire: {hr: 65, avg: 0.278}
Sammy Sosa: {
    hr: 63,
    avg: 0.288
  }

As written, this makes YAML.pm raise an error. It looks like spreading the contents of '{}' over multiple lines doesn't work.

YAML Error: Inconsistent indentation level
   Code: YAML_PARSE_ERR_INCONSISTENT_INDENTATION
   Line: 3
   Document: 1
##### yaml #####
---
Mark McGwire:
  avg: 0.278
  hr: 65
Sammy Sosa:
  avg: 0.288
  hr: 63

##### data #####
$VAR1 = {
          'Mark McGwire' => {
                              'hr' => '65',
                              'avg' => '0.278'
                            },
          'Sammy Sosa' => {
                            'hr' => '63',
                            'avg' => '0.288'
                          }
        };

Structures

YAML uses three dashes ('---') to separate the documents within a stream.
Three dots ('...') mark the end of a document without starting a new one, for use in communication channels.
Comment lines begin with an octothorpe (also called hash, sharp, or number sign: '#').

Example 2.7. Two Documents in a Stream (each with a leading comment)

# Ranking of 1998 home runs
---
- Mark McGwire
- Sammy Sosa
- Ken Griffey

# Team ranking
---
- Chicago Cubs
- St Louis Cardinals
##### yaml #####
---
- Mark McGwire
- Sammy Sosa
- Ken Griffey
---
- Chicago Cubs
- St Louis Cardinals

##### data #####
$VAR1 = [
          'Mark McGwire',
          'Sammy Sosa',
          'Ken Griffey'
        ];
$VAR2 = [
          'Chicago Cubs',
          'St Louis Cardinals'
        ];

Example 2.8. Play by Play Feed from a Game

---
time: 20:03:20
player: Sammy Sosa
action: strike (miss)
...
---
time: 20:03:47
player: Sammy Sosa
action: grand slam
...

Does YAML.pm not understand '...'?

YAML Error: Invalid element in map
   Code: YAML_LOAD_ERR_BAD_MAP_ELEMENT
   Line: 5
   Document: 1
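
One possible workaround is to strip the '...' lines before handing the text to Load. The sketch below uses only core Perl; strip_document_end_markers is a hypothetical helper, not part of YAML.pm. It would also remove a '...' line sitting inside a block scalar, and I have not confirmed that YAML.pm v0.66 accepts the cleaned text.

```perl
use strict;
use warnings;

# Hypothetical helper: drop lines that consist solely of the '...'
# document end marker. Naive: it does not know about block scalars.
sub strip_document_end_markers {
    my ($yaml) = @_;
    $yaml =~ s/^\.\.\.[ \t]*\n//mg;
    return $yaml;
}

my $feed = <<'EOF';
---
time: 20:03:20
player: Sammy Sosa
action: strike (miss)
...
---
time: 20:03:47
player: Sammy Sosa
action: grand slam
...
EOF

my $cleaned = strip_document_end_markers($feed);
print $cleaned;   # the two '---' documents remain, without the '...' lines
```

The cleaned string would then go to Load in place of the heredoc in the sample script.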

Repeated nodes are first identified by an anchor (marked with the ampersand '&').
After that they are aliased (referenced with an asterisk '*').

Example 2.9. Single Document with Two Comments

---
hr: # 1998 hr ranking
  - Mark McGwire
  - Sammy Sosa
rbi:
  # 1998 rbi ranking
  - Sammy Sosa
  - Ken Griffey

YAML.pm apparently doesn't recognize a comment whose '#' isn't at the start of the line, as in 'hr: # 1998 hr ranking'. In Example 2.2, too, the text I meant as comments got lumped into the string values.

YAML Error: Inconsistent indentation level
   Code: YAML_PARSE_ERR_INCONSISTENT_INDENTATION
   Line: 3
   Document: 1
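
A rough way around this is to remove the comments before calling Load. The following is a minimal sketch in core Perl; strip_yaml_comments is a hypothetical helper, and it is deliberately naive: it would also chop a '#' that appears inside a quoted scalar (such as the "quoted" value in Example 2.17), so it only suits input like Examples 2.2 and 2.9.

```perl
use strict;
use warnings;

# Hypothetical helper: strip YAML comments with two regexes.
# Assumption: no '#' preceded by whitespace occurs inside quoted scalars.
sub strip_yaml_comments {
    my ($yaml) = @_;
    $yaml =~ s/^[ \t]*#.*\n//mg;   # lines that contain only a comment
    $yaml =~ s/[ \t]+#.*$//mg;     # trailing comments after content
    return $yaml;
}

my $source = <<'EOF';
---
hr: # 1998 hr ranking
  - Mark McGwire
  - Sammy Sosa
rbi:
  # 1998 rbi ranking
  - Sammy Sosa
  - Ken Griffey
EOF

my $cleaned = strip_yaml_comments($source);
print $cleaned;
```

With the comments gone, what is left is the same kind of plain block mapping as Example 2.3.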

Example 2.10. Node for "Sammy Sosa" appears twice in this document

---
hr:
  - Mark McGwire
  # Following node labeled SS
  - &SS Sammy Sosa
rbi:
  - *SS # Subsequent occurrence
  - Ken Griffey
##### yaml #####
---
hr:
  - Mark McGwire
  - Sammy Sosa
rbi:
  - Sammy Sosa
  - Ken Griffey

##### data #####
$VAR1 = {
          'hr' => [
                    'Mark McGwire',
                    'Sammy Sosa'
                  ],
          'rbi' => [
                     'Sammy Sosa',
                     'Ken Griffey'
                   ]
        };

A question mark and space ('? ') indicate a complex mapping key.
Within a block collection, a "key: value" pair can start immediately after '- ', ': ', or '? '.

Example 2.11. Mapping between Sequences

? - Detroit Tigers
  - Chicago cubs
:
  - 2001-07-23

? [ New York Yankees,
    Atlanta Braves ]
: [ 2001-07-02, 2001-08-12,
    2001-08-14 ]

YAML.pm won't parse this as written either. I dumbed it down to this:

? [Detroit Tigers, Chicago cubs]
: [2001-07-23]
? [ New York Yankees, Atlanta Braves ]
: [ 2001-07-02, 2001-08-12, 2001-08-14 ]
##### yaml #####
---
ARRAY(0x1037ad2c):
  - 2001-07-23

##### data #####
$VAR1 = {
          'ARRAY(0x1037ad2c)' => [
                                   '2001-07-23'
                                 ]
        };

Why does it read only the first entry? A bug?

? [ New York Yankees, Atlanta Braves ]
: [ 2001-07-02, 2001-08-12, 2001-08-14 ]
? [ Detroit Tigers, Chicago cubs ]
: [ 2001-07-23 ]

After swapping the order, both entries came out properly. Smells like a bug.

##### yaml #####
---
ARRAY(0x1037acb8):
  - 2001-07-02
  - 2001-08-12
  - 2001-08-14
ARRAY(0x103851ac):
  - 2001-07-23

##### data #####
$VAR1 = {
          'ARRAY(0x103851ac)' => [
                                   '2001-07-23'
                                 ],
          'ARRAY(0x1037acb8)' => [
                                   '2001-07-02',
                                   '2001-08-12',
                                   '2001-08-14'
                                 ]
        };
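
The 'ARRAY(0x...)' keys are what Perl itself produces whenever a reference is used as a hash key: hash keys are always strings, so the array reference is stringified and the original list is lost. The sketch below shows this in core Perl without YAML.pm; the variable names are made up for illustration. If the temporary array behind a key is freed after stringification, a later key could end up with the same address and collide, which might explain the missing entry above, but that is only a guess and not verified.

```perl
use strict;
use warnings;

my $teams = [ 'Detroit Tigers', 'Chicago cubs' ];

my %games;
$games{$teams} = [ '2001-07-23' ];    # $teams is stringified when used as a key

my ($key) = keys %games;
print "$key\n";                        # something like ARRAY(0x...)
print ref($key) ? "reference\n" : "plain string\n";

# One possible substitute: build a readable string key from the elements.
my $joined = join ', ', @$teams;
$games{$joined} = delete $games{$key};
print "$joined => @{ $games{$joined} }\n";
```

Joining the elements is just one choice; any stable string built from the key's contents would do.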

Example 2.12. In-Line Nested Mapping

---
# products purchased
- item    : Super Hoop
  quantity: 1
- item    : Basketball
  quantity: 4
- item    : Big Shoes
  quantity: 1
##### yaml #####
---
- item: Super Hoop
  quantity: 1
- item: Basketball
  quantity: 4
- item: Big Shoes
  quantity: 1

##### data #####
$VAR1 = [
          {
            'quantity' => '1',
            'item' => 'Super Hoop'
          },
          {
            'quantity' => '4',
            'item' => 'Basketball'
          },
          {
            'quantity' => '1',
            'item' => 'Big Shoes'
          }
        ];

Scalars

Scalar content can be written in block form using the literal style ('|'), in which every line break is significant.
There is also the folded style ('>'), in which each line break is folded into a space, except those ending an empty or "more indented" line.

Example 2.13. In literals, newlines are preserved

# ASCII Art
--- |
  \//||\/||
  // ||  ||__
##### yaml #####
--- "\\//||\\/||\n// ||  ||__\n"

##### data #####
$VAR1 = '\\//||\\/||
// ||  ||__
';

Example 2.14. In the plain scalar, newlines become spaces

---
  Mark McGwire's
  year was crippled
  by a knee injury.

This fails in YAML.pm too. Does that mean a plain scalar can't be written here??

YAML Error: Expected separator '---'
   Code: YAML_PARSE_ERR_NO_SEPARATOR
   Line: 2
   Document: 2

Example 2.15. Folded newlines are preserved for "more indented" and blank lines

>
 Sammy Sosa completed another
 fine season with great stats.

   63 Home Runs
   0.288 Batting Average

 What a year!

This one doesn't work as-is either. I rewrote just the first line.

--- >
 Sammy Sosa completed another
 fine season with great stats.

   63 Home Runs
   0.288 Batting Average

 What a year!
##### yaml #####
--- |
Sammy Sosa completed another fine season with great stats.

  63 Home Runs
  0.288 Batting Average

What a year!

##### data #####
$VAR1 = 'Sammy Sosa completed another fine season with great stats.

  63 Home Runs
  0.288 Batting Average

What a year!
';

Example 2.16. Indentation determines scope

name: Mark McGwire
accomplishment: >
  Mark set a major league
  home run record in 1998.
stats: |
  65 Home Runs
  0.278 Batting Average
##### yaml #####
---
accomplishment: "Mark set a major league home run record in 1998.\n"
name: Mark McGwire
stats: |
  65 Home Runs
  0.278 Batting Average

##### data #####
$VAR1 = {
          'accomplishment' => 'Mark set a major league home run record in 1998.
',
          'stats' => '65 Home Runs
0.278 Batting Average
',
          'name' => 'Mark McGwire'
        };

YAML's flow scalars come in a plain style and quoted styles.
The double-quoted style provides escape sequences.
The single-quoted style is handy when no escaping is needed.
All flow scalars can span multiple lines; line breaks are always folded.

Example 2.17. Quoted Scalars

unicode: "Sosa did fine.\u263A"
control: "\b1998\t1999\t2000\n"
hexesc:  "\x13\x10 is \r\n"

single: '"Howdy!" he cried.'
quoted: ' # not a ''comment''.'
tie-fighter: '|\-*-/|'
##### yaml #####
---
control: "\\b1998\t1999\t2000\n"
hexesc: "\x13\x10 is \r\n"
quoted: " # not a 'comment'."
single: '"Howdy!" he cried.'
tie-fighter: '|\-*-/|'
unicode: Sosa did fine.\u263A

##### data #####
$VAR1 = {
          'single' => '"Howdy!" he cried.',
          'tie-fighter' => '|\\-*-/|',
          'unicode' => 'Sosa did fine.\\u263A',
          'quoted' => ' # not a \'comment\'.',
          'hexesc' => ' is 
',
          'control' => '\\b1998	1999	2000
'
        };

Example 2.18. Multi-line Flow Scalars

plain:
  This unquoted scalar
  spans many lines.

quoted: "So does this
  quoted scalar.\n"

This also makes YAML.pm error out as written. I guess it doesn't accept multi-line plain scalars.

Tags

In YAML, untagged nodes are given a type that depends on the application.
As a general example, the "seq", "map", and "str" types from the YAML tag repository are used.
There are also "int", "float", "null", "bool", "set", and so on.

Example 2.19. Integers

canonical: 12345
decimal: +12,345
sexagesimal: 3:25:45
octal: 014
hexadecimal: 0xC
##### yaml #####
---
canonical: 12345
decimal: '+12,345'
hexadecimal: 0xC
octal: 014
sexagesimal: 3:25:45

##### data #####
$VAR1 = {
          'octal' => '014',
          'sexagesimal' => '3:25:45',
          'canonical' => '12345',
          'hexadecimal' => '0xC',
          'decimal' => '+12,345'
        };

Looks like everything is just being read as plain strings.
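
Since Load hands everything back as strings, any numeric interpretation has to happen on the Perl side. Here is a minimal sketch in core Perl that covers only the five forms in Example 2.19; yaml_int is a hypothetical helper, not something YAML.pm provides, and it makes no attempt to cover the full YAML int type.

```perl
use strict;
use warnings;

# Hypothetical helper: convert the string forms from Example 2.19
# into Perl numbers. Returns an empty list / undef for anything else.
sub yaml_int {
    my ($text) = @_;
    (my $s = $text) =~ tr/,//d;                  # '+12,345' -> '+12345'
    my $sign = 1;
    $sign = -1 if $s =~ s/^([-+])// && $1 eq '-';
    return $sign * hex($s) if $s =~ /^0x[0-9a-fA-F]+$/;
    return $sign * oct($s) if $s =~ /^0[0-7]+$/;
    if ($s =~ /^\d+(?::[0-5]?\d)+$/) {           # sexagesimal (base 60)
        my $n = 0;
        $n = $n * 60 + $_ for split /:/, $s;
        return $sign * $n;
    }
    return $sign * $s if $s =~ /^\d+$/;
    return;                                      # not a form handled here
}

for my $text ('12345', '+12,345', '3:25:45', '014', '0xC') {
    printf "%-8s => %d\n", $text, yaml_int($text);
}
```

The first three forms come out as 12345 and the last two as 12, which is a reminder that simply adding 0 to '014' in Perl would give 14, not 12.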

Example 2.20. Floating Point

canonical: 1.23015e+3
exponential: 12.3015e+02
sexagesimal: 20:30.15
fixed: 1,230.15
negative infinity: -.inf
not a number: .NaN
##### yaml #####
---
canonical: 1.23015e+3
exponential: 12.3015e+02
fixed: '1,230.15'
negative infinity: -.inf
not a number: .NaN
sexagesimal: 20:30.15

##### data #####
$VAR1 = {
          'sexagesimal' => '20:30.15',
          'canonical' => '1.23015e+3',
          'negative infinity' => '-.inf',
          'fixed' => '1,230.15',
          'not a number' => '.NaN',
          'exponential' => '12.3015e+02'
        };

Example 2.21. Miscellaneous

null: ~
true: y
false: n
string: '12345'
##### yaml #####
---
false: n
null: ~
string: 12345
true: y

##### data #####
$VAR1 = {
          'false' => 'n',
          'string' => '12345',
          'true' => 'y',
          'null' => undef
        };

Example 2.22. Timestamps

canonical: 2001-12-15T02:59:43.1Z
iso8601: 2001-12-14t21:59:43.10-05:00
spaced: 2001-12-14 21:59:43.10 -5
date: 2002-12-14
##### yaml #####
---
canonical: 2001-12-15T02:59:43.1Z
date: 2002-12-14
iso8601: 2001-12-14t21:59:43.10-05:00
spaced: 2001-12-14 21:59:43.10 -5

##### data #####
$VAR1 = {
          'canonical' => '2001-12-15T02:59:43.1Z',
          'date' => '2002-12-14',
          'iso8601' => '2001-12-14t21:59:43.10-05:00',
          'spaced' => '2001-12-14 21:59:43.10 -5'
        };

Yep, everything ends up as plain strings again.

Explicit typing is done with a tag, using the exclamation point ('!').
Global tags are URIs and can be specified in shorthand using a handle.
Application-specific local tags can also be used.

Example 2.23. Various Explicit Tags

---
not-date: !!str 2002-04-28

picture: !!binary |
 R0lGODlhDAAMAIQAAP//9/X
 17unp5WZmZgAAAOfn515eXv
 Pz7Y6OjuDg4J+fn5OTk6enp
 56enmleECcgggoBADs=

application specific tag: !something |
 The semantics of the tag
 above may be different for
 different documents.
##### yaml #####
---
application specific tag: |
  The semantics of the tag
  above may be different for
  different documents.
not-date: 2002-04-28
picture: |
  R0lGODlhDAAMAIQAAP//9/X
  17unp5WZmZgAAAOfn515eXv
  Pz7Y6OjuDg4J+fn5OTk6enp
  56enmleECcgggoBADs=

##### data #####
$VAR1 = {
          'not-date' => '2002-04-28',
          'application specific tag' => 'The semantics of the tag
above may be different for
different documents.
',
          'picture' => 'R0lGODlhDAAMAIQAAP//9/X
17unp5WZmZgAAAOfn515eXv
Pz7Y6OjuDg4J+fn5OTk6enp
56enmleECcgggoBADs=
'
        };
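
The dump shows that the !!binary value comes back as the raw base64 text, so decoding is left to the caller. A minimal sketch using the core MIME::Base64 module follows; the $picture string is copied from the data above, and decode_base64 silently skips the embedded newlines.

```perl
use strict;
use warnings;
use MIME::Base64 qw(decode_base64);

# The 'picture' value exactly as Load returned it (still base64 text).
my $picture = <<'EOF';
R0lGODlhDAAMAIQAAP//9/X
17unp5WZmZgAAAOfn515eXv
Pz7Y6OjuDg4J+fn5OTk6enp
56enmleECcgggoBADs=
EOF

my $bytes  = decode_base64($picture);
my $header = substr $bytes, 0, 6;      # the file signature

printf "%d bytes, header: %s\n", length $bytes, $header;
```

The decoded bytes start with the GIF89a signature. The sample in the spec looks abbreviated, so the result may well not be a complete image.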

Example 2.24. Global Tags

%TAG ! tag:clarkevans.com,2002:
--- !shape
  # Use the ! handle for presenting
  # tag:clarkevans.com,2002:circle
- !circle
  center: &ORIGIN {x: 73, y: 129}
  radius: 7
- !line
  start: *ORIGIN
  finish: { x: 89, y: 102 }
- !label
  start: *ORIGIN
  color: 0xFFEEBB
  text: Pretty vector drawing.

This raised an error.

YAML Error: Can't parse inline implicit value ''
   Code: YAML_PARSE_ERR_BAD_INLINE_IMPLICIT
   Line: 1
   Document: 1

Can it not parse the %TAG line at all?

Example 2.25. Unordered Sets

# sets are represented as a
# mapping where each key is
# associated with the empty string
--- !!set
? Mark McGwire
? Sammy Sosa
? Ken Griff

This one errors out as well.

YAML Error: Invalid element in map
   Code: YAML_LOAD_ERR_BAD_MAP_ELEMENT
   Line: 6
   Document: 1

Example 2.26. Ordered Mappings

# ordered maps are represented as
# a sequence of mappings, with
# each mapping having one key
--- !!omap
- Mark McGwire: 65
- Sammy Sosa: 63
- Ken Griffy: 58
##### yaml #####
--- !!omap
- Mark McGwire: 65
- Sammy Sosa: 63
- Ken Griffy: 58

##### data #####
$VAR1 = [
          {
            'Mark McGwire' => '65'
          },
          {
            'Sammy Sosa' => '63'
          },
          {
            'Ken Griffy' => '58'
          }
        ];
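
The !!omap document arrives as an array of single-key hashes, so the order survives in the array even though each hash is unordered on its own. A minimal sketch of walking that structure in order, in core Perl; $omap is written out by hand here to match the dump above, and @pairs is just an illustrative name.

```perl
use strict;
use warnings;

# Same shape as the Data::Dumper output for Example 2.26.
my $omap = [
    { 'Mark McGwire' => '65' },
    { 'Sammy Sosa'   => '63' },
    { 'Ken Griffy'   => '58' },
];

# Flatten each single-key hash into a [key, value] pair, keeping array order.
my @pairs = map { my ($k) = keys %$_; [ $k, $_->{$k} ] } @$omap;

printf "%-13s %s\n", @$_ for @pairs;
```

This relies on every hash holding exactly one key; a hash with several keys would lose all but one of them here.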