At some point in their growth, every Perl hacker has to manage
complex data structures such as hashes of hashes, lists of lists, and so on.
They usually get the hang of it with help from perllol, perldsc,
some good books, Usenet, #perl and whatever other resources they can
find. But one subtle Perl feature seems to trip many of them up, and
that is the subject of this tutorial.

Let's say you create a data structure like this:

	$HoH =  {
		'foo'   => {
				'x'     => 23,
		},
		'bar'   => {
				'y'     => 18,
		},
	} ;

We can print this using Data::Dumper.
   
	use Data::Dumper ;


	print Dumper $HoH ;

and we see:
   
	$VAR1 = {
		  'foo' => {
			     'x' => 23
			   },
		  'bar' => {
			     'y' => 18
			   }
	};

which is what we expect.
   
But now we try to see if there is an entry for $HoH->{'baz'}{'z'}, which
we know doesn't exist. And we are smart enough to test it with exists:
   
	print "baz->z doesn't exist\n" unless exists $HoH->{'baz'}{'z'} ;

	print Dumper $HoH ;

But when we look at the data structure again we see:
   
	$VAR1 = {
		  'foo' => {
			     'x' => 23
			   },
		  'baz' => {},
		  'bar' => {
			     'y' => 18
			   }
	};

Where did that 'baz' entry come from? We never created it. Or did we?

What happened is that Perl saw that $HoH->{'baz'} was being used as a
hash reference (referring to a hash with 'z' as the key) and that
$HoH->{'baz'} was not defined (in fact it didn't even exist), so Perl
created it for you. That is called autovivification, which means
bringing to life automagically!
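A quick way to convince yourself of this is to check what exists creates at each depth. This is a minimal sketch that, unlike the examples in this tutorial, uses strict and a lexical $HoH: a one-level exists creates nothing, while a two-level exists creates the intermediate 'baz' hash but never the 'z' key it tests.

```perl
use strict;
use warnings;

my $HoH = {
    'foo' => { 'x' => 23 },
    'bar' => { 'y' => 18 },
};

# One level deep: exists just looks in %$HoH, so nothing is created.
print "no baz\n" unless exists $HoH->{'baz'};
my $keys_after_top = keys %$HoH;       # still 2

# Two levels deep: $HoH->{'baz'} is dereferenced as a hash ref, so Perl
# autovivifies it as {} before exists ever looks for 'z'.
print "no baz->z\n" unless exists $HoH->{'baz'}{'z'};
my $keys_after_nested = keys %$HoH;    # now 3, 'baz' => {} was added
```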

Here is the same concept but with anonymous arrays instead of hashes:

   
	$LoL =  [
		[ 2, 4, 6 ],
		[ 3, 5, 7 ],
	] ;

	print Dumper $LoL ;

	$VAR1 = [
		  [
		    2,
		    4,
		    6
		  ],
		  [
		    3,
		    5,
		    7
		  ]
	];


	print "[2][1] isn't defined\n" unless defined $LoL->[2][1] ;

	[2][1] isn't defined

	print Dumper $LoL ;

	$VAR1 = [
		  [
		    2,
		    4,
		    6
		  ],
		  [
		    3,
		    5,
		    7
		  ],
		  []
	];

Notice the anonymous array created in $LoL->[2]! It got autovivified
because the code dereferenced the undefined $LoL->[2] as an array
reference, so Perl created the array for you.
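Autovivifying a row past the end of the outer array also stretches that array, leaving undef in any gap. In this small sketch the index 5 is arbitrary, and it uses strict and a lexical $LoL, unlike the examples above:

```perl
use strict;
use warnings;

my $LoL = [
    [ 2, 4, 6 ],
    [ 3, 5, 7 ],
];

my $before = @$LoL;    # 2 rows

# $LoL->[5] is undefined, so defined() on $LoL->[5][0] first autovivifies
# $LoL->[5] as [], which stretches @$LoL to 6 elements.
print "[5][0] isn't defined\n" unless defined $LoL->[5][0];

my $after = @$LoL;     # 6 rows; elements 2 through 4 are undef
```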

Here is another example which is a common idiom and confuses some
newbies:
   
	$list_ref = undef ;
	push @{$list_ref}, 1 .. 4 ;

	print Dumper $list_ref ;

	$VAR1 = [
		  1,
		  2,
		  3,
		  4
	];

Note that undef is only assigned to $list_ref for this example. In
normal code it would probably be a my'ed variable and start out
undefined. Without autovivification you would have to assign an empty
anonymous array to $list_ref first.
   
   
	$list_ref = [] ;
	push @{$list_ref}, 1 .. 4 ;

A variant on that would be:
   
	push @{$list_ref ||= []}, 1 .. 4 ;

That initializes $list_ref to [] if it is false (most likely it was
undefined as in the above cases).

It is still cleaner, and definitely faster, to let Perl do the defined
test and the initialization with [] for you.
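The same push idiom is what makes building a hash of arrays so short. This sketch groups words by their first letter; the word list and the %by_letter name are only illustrations. Each $by_letter{...} slot starts out nonexistent, and push autovivifies an anonymous array into it the first time that letter is seen.

```perl
use strict;
use warnings;

my @words = qw( apple avocado banana blueberry cherry );
my %by_letter;

for my $word (@words) {
    # $by_letter{$letter} is undef the first time, so push autovivifies []
    push @{ $by_letter{ substr $word, 0, 1 } }, $word;
}

for my $letter ( sort keys %by_letter ) {
    print "$letter: @{ $by_letter{$letter} }\n";
}
```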

Autovivification even works on references to scalars:
   
	my $scalar_ref = undef ;
	${$scalar_ref} = 'i am referred to' ;

	print "ref $scalar_ref value [${$scalar_ref}]\n" ;
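The address printed by that example changes from run to run, so this sketch checks the reference type with ref instead. It also assumes use strict, which does not prevent this kind of autovivification:

```perl
use strict;
use warnings;

my $scalar_ref;                        # starts out undef
${$scalar_ref} = 'i am referred to';   # autovivifies an anonymous scalar

my $type = ref $scalar_ref;            # 'SCALAR'
print "ref $type value [${$scalar_ref}]\n";
```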

Now is the time for some explanation of what is happening under the
hood. Autovivification of references only occurs when you dereference
an undefined value. If the value is defined but is not a reference,
Perl will use it as a symbolic reference, which is almost never what
you want (if it is a reference of the wrong type, Perl dies with an
error such as "Not an ARRAY reference"). Remember, symbolic references
are black magic and should only be used in very few cases and never by
newbies. You should be using strict, which disables symbolic
references and thereby detects the error of dereferencing a variable
whose value is neither undef nor a proper reference.
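Here is a small sketch of the difference strict makes; the variable names are only illustrations. An undefined $maybe_ref is autovivified as usual, but a defined string is refused instead of being treated as the name of a global hash:

```perl
use strict;
use warnings;

my $maybe_ref;                    # undef: autovivification is allowed
$maybe_ref->{'key'} = 1;          # now a HASH ref

my $not_a_ref = 'foo';            # defined, but only a string
my $ok = eval {
    # Without strict this would quietly store into the global %foo.
    $not_a_ref->{'key'} = 1;
    1;
};
my $error = $ok ? '' : $@;        # the message mentions "strict refs"
print "caught: $error" if $error;
```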

So Perl first evaluates a dereference expression and sees that the
current reference value is undefined. It notes the type of dereference
(scalar, array or hash), allocates a new anonymous value of that type
and makes a reference to it. Perl then stores that new reference where
the undefined value was stored. Then the dereference operation in
progress continues. If you do a nested dereference expression, then
each level from top to bottom can cause its own autovivification.
Look at this:
   
	$deep_ref = undef ;

	$deep_ref->{'foo'}{'bar'}[1]{'baz'} = 1 ;

	print Dumper $deep_ref ;


	$VAR1 = {
		  'foo' => {
			     'bar' => [
					undef,
					{
					  'baz' => 1
					}
				      ]
			   }
	};

Four anonymous references were created there by autovivification
working from the top level with $deep_ref all the way down to the hash
that has 'baz' for its only key.
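To see the four creations one at a time, here is the same assignment written out by hand, creating each level before stepping into it. It is only a sketch: it assumes Perl 5.10 or later for the //= (defined-or assignment) operator, and it is not literally what the interpreter executes internally.

```perl
use strict;
use warnings;

my $deep_ref;

# Each //= creates a level only if it is still undefined,
# which is the check autovivification does for you.
$deep_ref                      //= {};    # 1: top-level hash
$deep_ref->{'foo'}             //= {};    # 2: hash under 'foo'
$deep_ref->{'foo'}{'bar'}      //= [];    # 3: array under 'bar'
$deep_ref->{'foo'}{'bar'}[1]   //= {};    # 4: hash at index 1
$deep_ref->{'foo'}{'bar'}[1]{'baz'} = 1;
```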

The $deep_ref example illustrates the power and primary use of
autovivification. Without autovivification, if you wanted to assign to
the lowest level hash before the higher levels existed, you would have
to write the loop yourself, testing each level and creating it as
needed on the way down. The sub would have to take a list of pairs: a
reference type and an index or key. You could simplify it by
restricting it to one type:
   
	sub deep_hash_assign {

	    my( $ref_ref, $val, @keys ) = @_ ;

	    unless ( @keys ) {
		warn "deep_hash_assign: no keys" ;
		return ;
	    }

	    foreach my $key ( @keys ) {

		my $ref = ${$ref_ref} ;

	# this is the autoviv step: create a new hash at an undefined level
		unless ( defined( $ref ) ) {

		    $ref = {} ;
		    ${$ref_ref} = $ref ;
		}

	# this checks we have a valid hash ref as the current value

		unless ( ref $ref eq 'HASH' ) {

		    warn "deep_hash_assign: not a hash ref at $key in @keys" ;
		    return ;
		}

	# add the key if it is missing, then point to the next level down

		$ref->{ $key } = undef unless exists( $ref->{ $key } ) ;
		$ref_ref = \$ref->{ $key } ;

	    }

	    ${$ref_ref} = $val ;
	}


	$deep_ref2 = undef ;

	deep_hash_assign( \$deep_ref2, 17, qw( foo bar baz ) ) ;

	print Dumper $deep_ref2 ;

	$deep_ref2 = undef ;

	deep_hash_assign( \$deep_ref2, 17 ) ;

As you can see, that sub is not very robust, is clumsy to use and is
probably a lot slower than having Perl do it for you. It also can't
handle a mix of hashes and arrays; to do that you would have to
specify hash or array along with each key or index.
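Here is a rough sketch of what that mixed version could look like. The deep_assign name and its [ TYPE, KEY ] step format are hypothetical, made up for this illustration and not taken from any module. It rebuilds the same structure as the $deep_ref example, which shows how much bookkeeping autovivification saves.

```perl
use strict;
use warnings;
use Data::Dumper;

# Hypothetical helper: each step is [ TYPE, KEY ], where TYPE is
# 'HASH' or 'ARRAY' and KEY is the hash key or array index.
sub deep_assign {
    my ( $ref_ref, $val, @steps ) = @_;

    unless (@steps) {
        warn "deep_assign: no steps";
        return;
    }

    foreach my $step (@steps) {
        my ( $type, $key ) = @{$step};

        unless ( $type eq 'HASH' or $type eq 'ARRAY' ) {
            warn "deep_assign: bad type '$type'";
            return;
        }

        # the hand-written autoviv step: create this level if undefined
        unless ( defined ${$ref_ref} ) {
            ${$ref_ref} = $type eq 'HASH' ? {} : [];
        }

        my $ref = ${$ref_ref};

        # refuse to walk through a level of the wrong kind
        unless ( ref $ref eq $type ) {
            warn "deep_assign: expected a $type ref at '$key'";
            return;
        }

        # point at the slot for the next level down
        $ref_ref = $type eq 'HASH' ? \$ref->{$key} : \$ref->[$key];
    }

    ${$ref_ref} = $val;
}

my $deep_ref3;
deep_assign( \$deep_ref3, 1,
    [ HASH => 'foo' ], [ HASH => 'bar' ], [ ARRAY => 1 ], [ HASH => 'baz' ] );

print Dumper $deep_ref3;
```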

So autovivification saves code and trouble when assigning deep into a
data structure, but why does it also happen when using exists and
defined? Many people think that exists and defined should return false
at the first missing level without creating anything. Let's look at
exists and defined again with this code:

  
	%hash = (
		'foo'   => 3,
	) ;

	print Dumper \%hash ;

	if ( exists( $hash{'bar'}{'baz'} ) ) {
		print "{'bar'}{'baz'} exists\n" ;
	}

	print Dumper \%hash ;

If you run that, the second dump shows a new 'bar' => {} entry in
%hash. Where did it come from? Well, the way Perl works, exists and
defined do not provide any special context to their expressions. So if
their expression would autovivify, it happens before the exists or
defined test occurs. (Only the intermediate levels are created: exists
never adds the 'baz' key it is testing.) This issue has been argued
heavily in various fora including p5p, but it won't be changed because
too much code works with the current behavior. It is the way Perl
treats it and you can't directly get around it. Perl 6 has been
discussing this and may support non-autovivifying tests, possibly
controlled by a pragma. But there are still gray areas. If you take a
reference deep into a tree where autovivification would be triggered
and pass it to exists, should that stop it from happening? Similarly,
if you pass a potentially autovivifying expression to a sub which may
only call defined on it, should that work as it does now?
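Although you can't switch the behavior off inside a single expression, you can avoid it indirectly: test one level at a time and let && stop at the first missing level, so the deeper dereference never runs. A minimal sketch using the %hash from above (the 'array' key and index 2 are only illustrations):

```perl
use strict;
use warnings;

my %hash = ( 'foo' => 3 );

# exists $hash{'bar'} is false, so && never evaluates the deeper
# lookup and $hash{'bar'} is not autovivified.
if ( exists $hash{'bar'} && exists $hash{'bar'}{'baz'} ) {
    print "{'bar'}{'baz'} exists\n";
}

# The same guard for defined on an array element: make sure the
# level really is an array ref before indexing into it.
if ( ref $hash{'array'} eq 'ARRAY' && defined $hash{'array'}[2] ) {
    print "{'array'}[2] is defined\n";
}
```

The deep_exists sub that follows does the same thing in a loop, for any number of levels.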

Here is a sub you can use to test for the existence of a key at any
level of a hash of hashes without triggering autovivification:
   
	sub deep_exists {

	    my( $hash_ref, @keys ) = @_ ;

	    unless ( @keys ) {

		warn "deep_exists: no keys" ;
		return ;
	    }

	    foreach my $key ( @keys ) {

		unless( ref $hash_ref eq 'HASH' ) {

		    warn "deep_exists: $hash_ref is not a HASH ref" ;
		    return ;
		}


		return 0 unless exists( $hash_ref->{$key} ) ;

		$hash_ref = $hash_ref->{$key} ;
	    }

	    return 1 ;
	}

	%exist_hash = (

	    'foo'    => {
		'bar'    => 3
	    }
	) ;

	print "\$exist_hash{foo}{bar} exists\n"
		if deep_exists( \%exist_hash, qw( foo bar ) ) ;


	print "\$exist_hash{foo}{bar}{baz} doesn't exist\n"
		unless deep_exists( \%exist_hash, qw( foo bar baz ) ) ;

	print Dumper \%exist_hash ;

	$VAR1 = {
		  'foo' => {
			     'bar' => 3
			   }
	};

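Notice that the data structure did not get modified: we never
triggered autovivification, because the sub returns as soon as a level
is missing or isn't a hash ref. It returns 0 on normal failure and
undef on detecting an error. The second call above is actually the
error case, since $exist_hash{foo}{bar} is 3 rather than a hash ref,
so it also warns; undef is still false, so the message prints.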

That sub only works on hashes of hashes and it tests with exists. Here
it is modified to work with both hashes and arrays, using defined for
the test:
   
	sub deep_defined {

	    my( $ref, @keys ) = @_ ;

	    unless ( @keys ) {

		warn "deep_defined: no keys" ;
		return ;
	    }

	    foreach my $key ( @keys ) {

		if( ref $ref eq 'HASH' ) {

	# fail when the value at this key is missing or undefined

		    return unless defined( $ref->{$key} ) ;

		    $ref = $ref->{$key} ;
		    next ;
		}

		if( ref $ref eq 'ARRAY' ) {

	# fail when the index is out of range or is not defined

		    return unless 0 <= $key && $key < @{$ref} ;

		    return unless defined( $ref->[$key] ) ;

		    $ref = $ref->[$key] ;
		    next ;
		}

	# fail when the current level is not a hash or array ref

		return ;
	    }

	    return 1 ;
	}

	my $defined_tree = {

	    'foo'    => [

		{
		    'bar'    => 3,
		    'baz'    => 'four',
		},
		{
		    'bar'    => 5,
		    'baz'    => 'six',
		}
	    ],
	    'oof'    => [

		{
		    'bar'    => 7,
		    'baz'    => 'eight',
		},
		{
		    'bar'    => 9,
		}
	    ],
	} ;

	print "\$defined_tree->{foo}[0]{bar} is defined\n"
		if deep_defined( $defined_tree, 'foo', 0, 'bar' ) ;


	print "\$defined_tree->{oof}[1]{baz} isn't defined\n"
		unless deep_defined( $defined_tree, 'oof', 1, 'baz' ) ;

	print "\$defined_tree->{goof}[1]{baz} isn't defined\n"
		unless deep_defined( $defined_tree, 'goof', 1, 'baz' ) ;

	print Dumper $defined_tree ;

	$defined_tree->{foo}[0]{bar} is defined
	$defined_tree->{oof}[1]{baz} isn't defined
	$defined_tree->{goof}[1]{baz} isn't defined

	$VAR1 = {
		  'oof' => [
			     {
			       'baz' => 'eight',
			       'bar' => 7
			     },
			     {
			       'bar' => 9
			     }
			   ],
		  'foo' => [
			     {
			       'baz' => 'four',
			       'bar' => 3
			     },
			     {
			       'baz' => 'six',
			       'bar' => 5
			     }
			   ]
	};

As you can see, it works and doesn't autovivify anything, since it
returns as soon as it finds an undefined value or something that isn't
a hash or array ref. It is a cleaner subroutine than deep_hash_assign
since it checks what kind of reference there is at each level and does
the right thing.
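One practical difference between deep_exists and deep_defined shows up at the last level: a key that is present but holds undef passes an exists test and fails a defined test, so the two subs can disagree. A tiny illustration (the %leaf name is just for this example):

```perl
use strict;
use warnings;

my %leaf = ( 'present_but_undef' => undef );

my $exists  = exists  $leaf{'present_but_undef'} ? 1 : 0;    # 1: the key is there
my $defined = defined $leaf{'present_but_undef'} ? 1 : 0;    # 0: its value is undef

print "exists: $exists, defined: $defined\n";
```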

So to review the concept, autovivification happens when Perl
automatically creates a reference of the appropriate type when an
undefined scalar value is dereferenced. It is a useful feature and is
used in many programs. If Perl didn't do it, you would have to resort
to clumsier code and special subroutines to create the new levels of
your data structures. Some complain it shouldn't happen with exists or
defined, but the subs to work around that are not tricky to create or
use. There is interest in having those two operations not autovivify
in Perl 6, but that is not certain.