# Data Containers

This notebook shows how to combine data into different types of "containers" (arrays, dictionaries, tuples, ...) inside your program.

## Load Packages and Extra Functions

In [1]:
using Printf

include("jlFiles/printmat.jl")

printyellow (generic function with 1 method)

# Arrays

are used everywhere in finance and statistics/econometrics.

## Vectors, Matrices and High-dimensional Arrays

can be created in many ways: the code below demonstrates just a few of them. See the tutorial on Arrays for (many) more details.

To access an array element, use `A[2]` or similar. You can also change an array element, as in `B[2,1] = -999`.

Notice that `D = [A B]` creates an independent copy, so later changing elements of `B` does not affect `D`. However, if we define `E = B`, then `E` is just another name for the same array, so changing an element of `B` will show up in `E` too.
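
The difference between copying and aliasing can be checked directly. The following is a minimal sketch with the same `A` and `B` as in the cells below; the names `E` and `F` are introduced here only for illustration.

```julia
A = [100,101]
B = [1 2; 0 10]

D = [A B]        #concatenation allocates a new array
E = B            #E is just another name for the same array as B
F = copy(B)      #F is an independent copy

B[2,1] = -999    #change an element of B

println("E[2,1] is ",E[2,1])     #-999, since E and B are the same array
println("D[2,2] is ",D[2,2])     #0, D is not affected
println("F[2,1] is ",F[2,1])     #0, F is not affected
println("E === B: ",E === B)     #true
```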



In [2]:
A = [100,101]           #a vector
printmat(A)             #or display(A)

B = [1 2;               #a matrix
     0 10]
printmat(B)

D = [A B]               #a 2x3 matrix
printmat(D)

   100    
   101    

     1         2    
     0        10    

   100         1         2    
   101         0        10    



In [3]:
println("A[2] is ",A[2])          #access an element

B[2,1] = -999                     #change an element
println("\nB is now")
printmat(B)

println("\nD is not affected")
printmat(D)                       #D is not changed when B is

A[2] is 101

B is now
     1         2    
  -999        10    


D is not affected
   100         1         2    
   101         0        10    



In [4]:
C = rand(4,3,2)         #a 4x3x2 array
display(C)

4×3×2 Array{Float64, 3}:
[:, :, 1] =
 0.0822035  0.727781   0.616084
 0.700477   0.148329   0.0486485
 0.497247   0.0274626  0.233258
 0.908952   0.0319028  0.63753

[:, :, 2] =
 0.991477  0.725542  0.505041
 0.999608  0.801481  0.636564
 0.841383  0.160885  0.691288
 0.473578  0.14428   0.724725
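
Higher-dimensional arrays are indexed in the same way as vectors and matrices, with one index per dimension. A small sketch (the values are random, so only sizes and comparisons are shown):

```julia
C = rand(4,3,2)                        #a 4x3x2 array

println(size(C))                       #(4, 3, 2)
println(size(C[:,:,1]))                #(4, 3), the first "page" is a 4x3 matrix
println(C[4,3,2] == C[end,end,end])    #true, both pick the last element
```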

## Arrays of Arrays (or other types)

You can store very different things (a mixture of numbers, matrices, strings) in an array. For instance, if `a` is a vector, `str` is a string and `C` is a matrix, then `x = [a,str,C]` puts them into a vector.

If you later change elements of the matrix `C` then it will affect `x` (discussed at the end of the notebook). 

In [5]:
a   = 1:10
str = "Hazel"
C   = [11 12;21 22]
x  = [a,str,C]        #element 1 of x is a

foreach(display,x)    #loops over the elements of x

1:10

"Hazel"

2×2 Matrix{Int64}:
 11  12
 21  22
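
To reach into one of the stored objects, index twice: first into `x`, then into the element. A short sketch, using the same `a`, `str` and `C` as above:

```julia
a   = 1:10
str = "Hazel"
C   = [11 12;21 22]
x   = [a,str,C]

println(typeof(x))        #Vector{Any}, since the elements have different types
println(x[2])             #Hazel
println(x[3][2,1])        #21, element [2,1] of the matrix stored in x[3]
println(x[1][end])        #10, last element of the range stored in x[1]
```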

# Tuples and Named Tuples

are very useful for collecting very different types of data (a number, a string, and a couple of vectors, say). 

Once created, you cannot change tuples (they are immutable). (Exception: *changing elements of an array* that belongs to the tuple will affect the tuple too.)

Tuples are often used as inputs or outputs of functions.

The next few cells show how to create (named) tuples, how to extract parts of them and what happens when you try to change them.

In [6]:
a   = 1:10              #how to create tuples and named tuples
str = "Hazel"
C   = [11 12;21 22]

t = (a,str,C)           #a tuple, or tuple(a,str,C)
display(t)

nt = (a=a,str=str,C=C)  #a named tuple, (a2=a,str2=str,C2=C) would also work
display(nt)

nt_ = (;a,str,C)        #also a named tuple (Julia 1.5+), names are given by variables
display(nt_)

(1:10, "Hazel", [11 12; 21 22])

(a = 1:10, str = "Hazel", C = [11 12; 21 22])

(a = 1:10, str = "Hazel", C = [11 12; 21 22])

In [7]:
(a2,str2,C2) = t                   #extract the tuple into variables ("destructuring")
println("a2 and str2 are: $a2 $str2 \n")

println("t[3] is ",t[3],"\n")            #can index into (tuple) t

println("nt.C is ",nt.C)                 #we can use nt.C as a name (nt is a named tuple)

#(a3,str3...) = t                        #in Julia 1.6+, str3 will be a tuple 

a2 and str2 are: 1:10 Hazel 

t[3] is [11 12; 21 22]

nt.C is [11 12; 21 22]


In [8]:
println("t[1] is ",t[1],"\n")
#t[1] = -999                        #cannot change the tuple, uncomment to get an error
#t[4] = 34                          #cannot add elements, uncomment to get an error

println("nt.C is ",nt.C,"\n")
#nt.a = -999                        #cannot change the tuple
#nt.D = 34                          #cannot add elements

t[1] is 1:10

nt.C is [11 12; 21 22]
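
As mentioned above, tuples are often used as function outputs. A minimal sketch follows; `SummaryStats` is a hypothetical function written only for this illustration.

```julia
function SummaryStats(x)                 #hypothetical function, for illustration only
    avg = sum(x)/length(x)
    return (avg=avg,lo=minimum(x),hi=maximum(x))    #output is a named tuple
end

s = SummaryStats([1,5,3])
println(s.avg)                           #3.0
println(s.hi)                            #5

(avg2,lo2,hi2) = SummaryStats([1,5,3])   #destructuring assigns by position
println("$lo2 $hi2")                     #1 5
```

Returning a named tuple lets the caller choose between access by name (`s.avg`) and destructuring by position.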



## Create a Tuple Dynamically (extra)

when the values (and perhaps also the names) are created dynamically in the program.

Suppose `values` and `names` in the next cell may differ in length from one run of the program to the next. Using `tuple(values...)` and `NamedTuple{names}(values)` allows you to still create tuples/named tuples.

In [9]:
values = [a,str,C]

t2 = tuple(values...)                        #or (values...,)
display(t2)

names  = (:a, :b, :c)
nt2    = NamedTuple{names}(values)           #or (;zip(names,values)...)
display(nt2)

(1:10, "Hazel", [11 12; 21 22])

(a = 1:10, b = "Hazel", c = [11 12; 21 22])

# Dictionaries

are a flexible way to collect different types of data. Dictionaries can (in contrast to tuples) be changed. Also, changing elements of an array that belongs to the dictionary will affect the dictionary too.

A dictionary is organised as (key,value) pairs, where the key is the name of the element. You can loop over the elements (see below) and also change/add elements in a loop.

In [10]:
a   = 1:10
str = "Hazel"
C   = [11 12;21 22]

D = Dict(:a=>a,:str=>str,:C=>C)       #dictionary, "a" instead of :a works too

println("D[:C] is ",D[:C])

D[:a] = -999                          #can change an element

D[:verse2] = "Stardust"               #can add an element

display(D)

Dict{Symbol, Any} with 4 entries:
  :a      => -999
  :verse2 => "Stardust"
  :str    => "Hazel"
  :C      => [11 12; 21 22]

D[:C] is [11 12; 21 22]


In [11]:
for (key,value) in D                #loop over a dictionary
    println("$key: $value")
end

a: -999
verse2: Stardust
str: Hazel
C: [11 12; 21 22]
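
Changing elements in a loop, and looking up keys that may not exist, can be sketched as follows. `haskey` and `get` are standard functions in Julia's `Base`; the dictionary `P` is made up for this illustration.

```julia
P = Dict(:apple=>1.5,:pear=>2.0)      #hypothetical prices

for key in collect(keys(P))           #collect the keys first, then loop
    P[key] = 2*P[key]                 #change each element
end

println(P[:apple])                    #3.0
println(haskey(P,:plum))              #false
println(get(P,:plum,NaN))             #NaN, the default value since :plum is not a key
```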


## From a Dict to a NamedTuple and Back Again (extra)

In [12]:
nt = (;D...)      #create a named tuple from a dict
display(nt)

D2 = Dict(pairs(nt))  #create a dict from a named tuple
display(D2)

(a = -999, verse2 = "Stardust", str = "Hazel", C = [11 12; 21 22])

Dict{Symbol, Any} with 4 entries:
  :a      => -999
  :verse2 => "Stardust"
  :str    => "Hazel"
  :C      => [11 12; 21 22]

## A Potential Pitfall in Adding to a Dict (extra)

If you have created a dict with only numbers by
```
D = Dict(:aa=>1)
```
then you cannot add, e.g., a string by `D[:cc] = "hello"`, since `D` is set up to accept only values of type `Int`.

In [13]:
D = Dict(:aa=>1)
#D[:cc] = "hello"            #error since D only accepts Int

D = Dict{Any,Any}(:aa=>1)    #this works
D[:cc] = "hello"
display(D)

Dict{Any, Any} with 2 entries:
  :aa => 1
  :cc => "hello"
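
A middle way is to restrict the keys but not the values, for instance with `Dict{Symbol,Any}`. The sketch below also catches the error from the failed assignment, which should be a `MethodError` since there is no conversion from `String` to `Int`. The names `D1`, `D3` and `err` are illustrative.

```julia
D1 = Dict(:aa=>1)                   #values must be Int
err = try
    D1[:cc] = "hello"               #fails: cannot convert a String to an Int
    nothing
catch e
    e
end
println(err isa MethodError)        #true

D3 = Dict{Symbol,Any}(:aa=>1)       #keys must be symbols, values can be anything
D3[:cc] = "hello"
println(length(D3))                 #2
```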

## Create a Dictionary Dynamically (extra)

See below for examples.

Remark: if you have the names as an array of strings (`names = ["a","b","c"]`), but want symbol names (`:a` etc), then use `Symbol.(names)`.

In [14]:
names  = (:a, :b, :c)           #or ["a","b","c"]
values = [a,str,C]

D = Dict(zip(names,values))
display(D)

Dict{Symbol, Any} with 3 entries:
  :a => 1:10
  :b => "Hazel"
  :c => [11 12; 21 22]

In [15]:
D = Dict()                   #empty dictionary
for i = 1:length(values)     #loop
    D[names[i]] = values[i]  #add this to the dictionary
end
display(D)

Dict{Any, Any} with 3 entries:
  :a => 1:10
  :b => "Hazel"
  :c => [11 12; 21 22]
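
The remark about string names can be sketched as follows: `Symbol.(strnames)` broadcasts `Symbol` over the vector of strings. The names `strnames`, `vals`, `Ds` and `Dy` are illustrative.

```julia
strnames = ["a","b","c"]
vals     = [1:10,"Hazel",[11 12;21 22]]

Ds = Dict(zip(strnames,vals))             #keys are strings
Dy = Dict(zip(Symbol.(strnames),vals))    #keys are symbols

println(Ds["b"])       #Hazel
println(Dy[:b])        #Hazel
```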

# Your Own Tailor-Made Data Type

It is sometimes convenient to define your own `struct` as a container. The `struct` command creates an immutable type (you cannot change it, except for elements of arrays that belong to it). There is also a `mutable struct` approach.

In [16]:
a   = 1:10
str = "Hazel"
C   = [11 12;21 22]

struct MyType            #change to `mutable struct` to be able to change it later
   x                     #can be anything
   s::String             #has to be a String
   z::Array              #has to be an Array
end

x1 = MyType(a,str,C)    #has to specify all arguments

println("x1: ",x1)
println("x1.s: ",x1.s)

#x1 = MyType(1:10,10,[1;2])      #error since 10 is not a string
#x1.x = 3                        #error since we cannot change

x1: MyType(1:10, "Hazel", [11 12; 21 22])
x1.s: Hazel
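
The `mutable struct` variant mentioned above can be sketched as follows. `MyMutableType` is a hypothetical name, chosen to avoid clashing with `MyType`.

```julia
mutable struct MyMutableType     #hypothetical mutable version of MyType
    x                            #can be anything
    s::String                    #has to be a String
end

m1 = MyMutableType(1:10,"Hazel")
m1.x = 3                         #allowed, since the type is mutable
m1.s = "Stardust"                #allowed, but the new value must still be a String
println(m1.x," ",m1.s)           #3 Stardust
```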


## A Potential Pitfall in Using Arrays in Structures (extra)

It is also possible to specify array types (for instance, `z::Array{Float64}` instead of just `z::Array`). This has the effect of converting (if possible) an input array to Float64. While this might have its uses, it also comes with a potential drawback: the conversion breaks the link between the input array and the array inside `MyType`.
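
A minimal sketch of this pitfall, assuming a hypothetical type `MyTypeF` with a field `z::Array{Float64}`. An integer matrix is converted (and thus copied) on construction, whereas a matrix that already holds `Float64` values is stored as is and stays linked to the input.

```julia
struct MyTypeF                #hypothetical type, for illustration only
    z::Array{Float64}
end

Ci = [11 12;21 22]            #integers, will be converted to Float64
Cf = [11.0 12.0;21.0 22.0]    #already Float64, no conversion needed

fi = MyTypeF(Ci)
ff = MyTypeF(Cf)

Ci[1,1] = -999
Cf[1,1] = -999

println(fi.z[1,1])            #11.0, the link to Ci is broken
println(ff.z[1,1])            #-999.0, ff.z and Cf are the same array
```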

### DataFrames and Other Things

See [DataFrames.jl](https://juliadata.github.io/DataFrames.jl/stable/) for how to work with DataFrames.

# A Potential Pitfall when Using an Array in another Data Container (extra)

Suppose you create an array of arrays  (or a tuple or a dictionary) called `y`, and that the array `C` is one of the elements.

If you later change *elements* of `C` then it will affect `y` as well (and vice versa). This happens with *arrays*, since they are designed to conserve memory space. For instance, even if `C` is a very large array (several GB, say), creating `y=["hello",C]` will require very little additional memory space.

If you want an independent copy, use `copy(C)`, for instance, `y=["hello",copy(C)]`.

In contrast, if you assign a new value to `C` (for instance, an array of a different shape, or `C = 0` as in the last cell), then it will *not* affect `y` (but you don't save any memory).

In [17]:
a   = 1:10
str = "Hazel"
C   = [11 12;21 22]

x = [a,str,C]
t = (a,str,C)
d = Dict(:a=>a,:str=>str,:C=>C)
e = MyType(a,str,C)

C[1,1] = -999                  #changing an element of C affects x,t,d,e

display(x)
display(t)
display(d)
display(e)

3-element Vector{Any}:
 1:10
 "Hazel"
 [-999 12; 21 22]

(1:10, "Hazel", [-999 12; 21 22])

Dict{Symbol, Any} with 3 entries:
  :a   => 1:10
  :str => "Hazel"
  :C   => [-999 12; 21 22]

MyType(1:10, "Hazel", [-999 12; 21 22])

In [18]:
C = 0               #assigning a new value to C does not affect x,t,d,e
display(t)

(1:10, "Hazel", [-999 12; 21 22])
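
For comparison, the following sketch stores `copy(C)` instead of `C`. Note that `copy` is shallow: it copies the array itself, which suffices for an array of numbers. The names `y1` and `y2` are illustrative.

```julia
C = [11 12;21 22]

y1 = ["hello",C]           #stores a reference to C
y2 = ["hello",copy(C)]     #stores an independent copy

C[1,1] = -999

println(y1[2][1,1])        #-999, y1 is affected
println(y2[2][1,1])        #11, y2 is not
```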