HSG-MCS-HS21_Julia/JuliaTutorial-master/Tutorial_03b_DataContainers.ipynb
2021-11-15 21:14:51 +01:00

{
"cells": [
{
"cell_type": "markdown",
"metadata": {
"collapsed": true
},
"source": [
"# Data Containers\n",
"\n",
"This notebook shows how to combine data into different types of \"containers\" (arrays, dictionaries, tuples, ...) inside your program."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Load Packages and Extra Functions"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {
"scrolled": true
},
"outputs": [
{
"data": {
"text/plain": [
"printyellow (generic function with 1 method)"
]
},
"execution_count": 1,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"using Printf\n",
"\n",
"include(\"jlFiles/printmat.jl\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Arrays\n",
"\n",
"are used everywhere in finance and statistics/econometrics."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Vectors, Matrices and High-dimensional Arrays\n",
"\n",
"can be created in many ways: the code below demonstrates just a few of them. See the tutorial on Arrays for (many) more details.\n",
"\n",
"To access an array element, use `A[2]` or similar. You can also change an array element, as in `B[2,1] = -999`.\n",
"\n",
"Notice that `D = [A B]` creates an independent copy, so later changing `B` does not affect `D`. However, if we define `E = B`, then a change of `B` will affect both itself and `E`.\n",
"\n"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"scrolled": true
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" 100 \n",
" 101 \n",
"\n",
" 1 2 \n",
" 0 10 \n",
"\n",
" 100 1 2 \n",
" 101 0 10 \n",
"\n"
]
}
],
"source": [
"A = [100,101] #a vector\n",
"printmat(A) #or display(A)\n",
"\n",
"B = [1 2; #a matrix\n",
" 0 10]\n",
"printmat(B)\n",
"\n",
"D = [A B] #a 2x3 matrix\n",
"printmat(D)"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {
"scrolled": true
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"A[2] is 101\n",
"\n",
"B is now\n",
" 1 2 \n",
" -999 10 \n",
"\n",
"\n",
"D is not affected\n",
" 100 1 2 \n",
" 101 0 10 \n",
"\n"
]
}
],
"source": [
"println(\"A[2] is \",A[2]) #access an element\n",
"\n",
"B[2,1] = -999 #change an element\n",
"println(\"\\nB is now\")\n",
"printmat(B)\n",
"\n",
"println(\"\\nD is not affected\")\n",
"printmat(D) #D is not changed when B is"
]
},
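{
"cell_type": "markdown",
"metadata": {},
"source": [
"The `E = B` case mentioned above is not run in the cells so far. A minimal sketch of it could look as follows. The names `B0`, `E0` and `F0` are hypothetical, chosen only to avoid overwriting variables used elsewhere in this notebook.\n",
"\n",
"```julia\n",
"B0 = [1 2; 0 10]\n",
"E0 = B0              #E0 and B0 refer to the same array\n",
"F0 = copy(B0)        #F0 is an independent copy\n",
"\n",
"B0[2,1] = -999       #change an element of B0\n",
"\n",
"println(E0[2,1])     #-999, E0 is affected\n",
"println(F0[2,1])     #0, F0 is not affected\n",
"println(E0 === B0)   #true, they are the same object\n",
"```\n"
]
},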
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"4×3×2 Array{Float64, 3}:\n",
"[:, :, 1] =\n",
" 0.0822035 0.727781 0.616084\n",
" 0.700477 0.148329 0.0486485\n",
" 0.497247 0.0274626 0.233258\n",
" 0.908952 0.0319028 0.63753\n",
"\n",
"[:, :, 2] =\n",
" 0.991477 0.725542 0.505041\n",
" 0.999608 0.801481 0.636564\n",
" 0.841383 0.160885 0.691288\n",
" 0.473578 0.14428 0.724725"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"C = rand(4,3,2) #a 4x3x2 array\n",
"display(C)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Arrays of Arrays (or other types)\n",
"\n",
"You can store very different things (a mixture of numbers, matrices, strings) in an array. For instance, if `a` is a vector, `str` is a string and `C` is a matrix, then `x = [a,str,C]` puts them into a vector.\n",
"\n",
"If you later change elements of the matrix `C` then it will affect `x` (discussed at the end of the notebook). "
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {
"scrolled": true
},
"outputs": [
{
"data": {
"text/plain": [
"1:10"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/plain": [
"\"Hazel\""
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/plain": [
"2×2 Matrix{Int64}:\n",
" 11 12\n",
" 21 22"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"a = 1:10\n",
"str = \"Hazel\"\n",
"C = [11 12;21 22]\n",
"x = [a,str,C] #element 1 of x is a\n",
"\n",
"foreach(display,x) #loops over the elements of x"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Tuples and Named Tuples\n",
"\n",
"are very useful for collecting very different types of data (a number, a string, and a couple of vectors, say). \n",
"\n",
"Once created, you cannot change tuples (they are immutable). (Exception: *changing elements of an array* that belongs to the tuple will affect the tuple too.)\n",
"\n",
"Tuples are often used as inputs or outputs of functions.\n",
"\n",
"The next few cells show how to create (named) tuples, how to extract parts of them and what happens when you try to change them."
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {
"scrolled": true
},
"outputs": [
{
"data": {
"text/plain": [
"(1:10, \"Hazel\", [11 12; 21 22])"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/plain": [
"(a = 1:10, str = \"Hazel\", C = [11 12; 21 22])"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/plain": [
"(a = 1:10, str = \"Hazel\", C = [11 12; 21 22])"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"a = 1:10 #how to create tuples and named tuples\n",
"str = \"Hazel\"\n",
"C = [11 12;21 22]\n",
"\n",
"t = (a,str,C) #a tuple, or tuple(a,str,C)\n",
"display(t)\n",
"\n",
"nt = (a=a,str=str,C=C) #a named tuple, (a2=a,str2=str,C2=C) would also work\n",
"display(nt)\n",
"\n",
"nt_ = (;a,str,C) #also a named tuple (Julia 1.5+), names are given by variables\n",
"display(nt_)"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"a2 and str2 are: 1:10 Hazel \n",
"\n",
"t[3] is [11 12; 21 22]\n",
"\n",
"nt.C is [11 12; 21 22]\n"
]
}
],
"source": [
"(a2,str2,C2) = t #extract the tuple into variables (\"destructuring\")\n",
"println(\"a2 and str2 are: $a2 $str2 \\n\")\n",
"\n",
"println(\"t[3] is \",t[3],\"\\n\") #can index into (tuple) t\n",
"\n",
"println(\"nt.C is \",nt.C) #we can use nt.C as a name (nt is a named tuple)\n",
"\n",
"#(a3,str3...) = t #in Julia 1.6, str3 will be a tuple "
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {
"scrolled": true
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"t[1] is 1:10\n",
"\n",
"nt.C is [11 12; 21 22]\n",
"\n"
]
}
],
"source": [
"println(\"t[1] is \",t[1],\"\\n\")\n",
"#t[1] = -999 #cannot change the tuple, uncomment to get an error\n",
"#t[4] = 34 #cannot add elements, uncomment to get an error\n",
"\n",
"println(\"nt.C is \",nt.C,\"\\n\")\n",
"#nt.a = -999 #cannot change the tuple\n",
"#nt.D = 34 #cannot add elements"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Create a Tuple Dynamically (extra)\n",
"\n",
"when the values (and perhaps also the names) are created dynamically in the program.\n",
"\n",
"Suppose `values` and `names` in the next cell may differ in length from one run of the program to the next. Using `tuple(values...)` and `NamedTuple{names}(values)` allows you to still create tuples/named tuples."
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"(1:10, \"Hazel\", [11 12; 21 22])"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/plain": [
"(a = 1:10, b = \"Hazel\", c = [11 12; 21 22])"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"values = [a,str,C]\n",
"\n",
"t2 = tuple(values...) #or (values...,)\n",
"display(t2)\n",
"\n",
"names = (:a, :b, :c)\n",
"nt2 = NamedTuple{names}(values) #or (;zip(names,values)...)\n",
"display(nt2)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Dictionaries\n",
"\n",
"are a flexible way to collect different types of data. Dictionaries can (in contrast to tuples) be changed. Also, changing elements of an array that belongs to the dictionary will affect the dictionary too.\n",
"\n",
"A dictionary is organised as (key,value) pairs, where the key is the name of the element. You can loop over the elements (see below) and also change/add elements in a loop."
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {
"scrolled": true
},
"outputs": [
{
"data": {
"text/plain": [
"Dict{Symbol, Any} with 4 entries:\n",
" :a => -999\n",
" :verse2 => \"Stardust\"\n",
" :str => \"Hazel\"\n",
" :C => [11 12; 21 22]"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"D[:C] is [11 12; 21 22]\n"
]
}
],
"source": [
"a = 1:10\n",
"str = \"Hazel\"\n",
"C = [11 12;21 22]\n",
"\n",
"D = Dict(:a=>a,:str=>str,:C=>C) #dictionary, \"a\" instead of :a works too\n",
"\n",
"println(\"D[:C] is \",D[:C])\n",
"\n",
"D[:a] = -999 #can change an element\n",
"\n",
"D[:verse2] = \"Stardust\" #can add an element\n",
"\n",
"display(D)"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {
"scrolled": true
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"a: -999\n",
"verse2: Stardust\n",
"str: Hazel\n",
"C: [11 12; 21 22]\n"
]
}
],
"source": [
"for (key,value) in D #loop over a dictionary\n",
" println(\"$key: $value\")\n",
"end"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## From a Dict to a NamedTuple and Back Again (extra)"
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"(a = -999, verse2 = \"Stardust\", str = \"Hazel\", C = [11 12; 21 22])"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/plain": [
"Dict{Symbol, Any} with 4 entries:\n",
" :a => -999\n",
" :verse2 => \"Stardust\"\n",
" :str => \"Hazel\"\n",
" :C => [11 12; 21 22]"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"nt = (;D...) #create a named tuple from a dict\n",
"display(nt)\n",
"\n",
"D2 = Dict(pairs(nt)) #create a dict from a named tuple\n",
"display(D2)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## A Potential Pitfall in Adding to a Dict (extra)\n",
"\n",
"If you have created a dict with only numbers by \n",
"```\n",
"D = Dict(:aa=>1)\n",
"``` \n",
"then you cannot add e.g. a string by `D[:cc] = \"hello\"`, since `D` is set up to accept only values of type `Int`."
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"Dict{Any, Any} with 2 entries:\n",
" :aa => 1\n",
" :cc => \"hello\""
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"D = Dict(:aa=>1)\n",
"#D[:cc] = \"hello\" #error since D only accepts Int\n",
"\n",
"D = Dict{Any,Any}(:aa=>1) #this works\n",
"D[:cc] = \"hello\"\n",
"display(D)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Create a Dictionary Dynamically (extra)\n",
"\n",
"See below for examples.\n",
"\n",
"Remark: if you have the names as an array of strings (`names = [\"a\",\"b\",\"c\"]`), but want symbol names (`:a` etc), then use `Symbol.(names)`."
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"Dict{Symbol, Any} with 3 entries:\n",
" :a => 1:10\n",
" :b => \"Hazel\"\n",
" :c => [11 12; 21 22]"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"names = (:a, :b, :c) #or [\"a\",\"b\",\"c\"]\n",
"values = [a,str,C]\n",
"\n",
"D = Dict(zip(names,values))\n",
"display(D)"
]
},
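{
"cell_type": "markdown",
"metadata": {},
"source": [
"The remark about `Symbol.(names)` could be sketched like this. The names `names_str`, `names_sym` and `D3`, as well as the values `[1,2,3]`, are hypothetical and only for illustration.\n",
"\n",
"```julia\n",
"names_str = [\"a\",\"b\",\"c\"]             #names as strings\n",
"names_sym = Symbol.(names_str)        #broadcast Symbol over the vector: [:a, :b, :c]\n",
"\n",
"D3 = Dict(zip(names_sym,[1,2,3]))     #dictionary with symbols as keys\n",
"println(D3[:b])                       #2\n",
"```\n"
]
},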
{
"cell_type": "code",
"execution_count": 15,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"Dict{Any, Any} with 3 entries:\n",
" :a => 1:10\n",
" :b => \"Hazel\"\n",
" :c => [11 12; 21 22]"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"D = Dict() #empty dictionary\n",
"for i = 1:length(values) #loop\n",
" D[names[i]] = values[i] #add this to the dictionary\n",
"end\n",
"display(D)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Your Own Tailor Made Data Type\n",
"\n",
"It is sometimes convenient to define your own `struct` as a container. The `struct` command creates an immutable type (you cannot change it, except for elements of arrays that belong to it). There is also a `mutable struct` approach."
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"x1: MyType(1:10, \"Hazel\", [11 12; 21 22])\n",
"x1.s: Hazel\n"
]
}
],
"source": [
"a = 1:10\n",
"str = \"Hazel\"\n",
"C = [11 12;21 22]\n",
"\n",
"struct MyType #change to `mutable struct` to be able to change it later\n",
" x #can be anything\n",
" s::String #has to be a String\n",
" z::Array #has to be an Array\n",
"end\n",
"\n",
"x1 = MyType(a,str,C) #has to specify all arguments\n",
"\n",
"println(\"x1: \",x1)\n",
"println(\"x1.s: \",x1.s)\n",
"\n",
"#x1 = MyType(1:10,10,[1;2]) #error since 10 is not a string\n",
"#x1.x = 3 #error since x1 is immutable"
]
},
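{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a rough illustration of the `mutable struct` approach, the sketch below defines a separate type so that `MyType` is left untouched. The name `MyMutableType` and its fields are hypothetical.\n",
"\n",
"```julia\n",
"mutable struct MyMutableType      #hypothetical name\n",
"    x                             #can be anything\n",
"    s::String                     #has to be a String\n",
"end\n",
"\n",
"m1 = MyMutableType(1:10,\"Hazel\")\n",
"\n",
"m1.x = 3                          #allowed, since the type is mutable\n",
"m1.s = \"Stardust\"                 #allowed, but the new value must still be a String\n",
"println(m1)\n",
"```\n",
"\n",
"Field types are still enforced: `m1.s = 10` would give an error, since 10 cannot be converted to a `String`."
]
},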
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## A Potential Pitfall in Using Arrays in Structures (extra)\n",
"\n",
"It is also possible to specify array types (for instance, `z::Array{Float64}` instead of just `z::Array`). This has the effect of converting (if possible) an input array to Float64. While this might have its uses, it also comes with a potential drawback: the conversion breaks the link between the input array and the array inside `MyType`."
]
},
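{
"cell_type": "markdown",
"metadata": {},
"source": [
"The broken link described above can be seen in a small experiment. This is only a sketch: the type name `MyTypeF` and the variables `Ci`, `Cf`, `xi` and `xf` are hypothetical.\n",
"\n",
"```julia\n",
"struct MyTypeF                    #hypothetical name\n",
"    z::Array{Float64}             #has to be (or be convertible to) a Float64 array\n",
"end\n",
"\n",
"Ci = [11 12;21 22]                #Int array, will be converted to Float64\n",
"Cf = [11.0 12.0;21.0 22.0]        #already Float64, no conversion needed\n",
"\n",
"xi = MyTypeF(Ci)\n",
"xf = MyTypeF(Cf)\n",
"\n",
"Ci[1,1] = -999                    #change the input arrays afterwards\n",
"Cf[1,1] = -999\n",
"\n",
"println(xi.z[1,1])                #11.0: xi.z is a converted copy, not linked to Ci\n",
"println(xf.z[1,1])                #-999.0: xf.z is the same array as Cf\n",
"```\n"
]
},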
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### DataFrames and Other Things\n",
"\n",
"See [DataFrames.jl](https://juliadata.github.io/DataFrames.jl/stable/) for how to work with DataFrames."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# A Potential Pitfall when Using an Array in another Data Container (extra)\n",
"\n",
"Suppose you create an array of arrays (or a tuple or a dictionary) called `y`, and that the array `C` is one of the elements.\n",
"\n",
"If you later change *elements* of `C` then it will affect `y` as well (and vice versa). This happens with *arrays*, since they are designed to conserve memory space. For instance, even if `C` is a very large array (several GB, say), creating `y=[\"hello\",C]` will require very little additional memory space.\n",
"\n",
"If you want an independent copy, use `copy(C)`, for instance, `y=[\"hello\",copy(C)]`.\n",
"\n",
"In contrast, if you assign a new value to the name `C` (for instance, `C = 0` or an array of a different shape), then `y` is *not* affected: it still holds the original array (so no memory is freed)."
]
},
{
"cell_type": "code",
"execution_count": 17,
"metadata": {
"scrolled": true
},
"outputs": [
{
"data": {
"text/plain": [
"3-element Vector{Any}:\n",
" 1:10\n",
" \"Hazel\"\n",
" [-999 12; 21 22]"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/plain": [
"(1:10, \"Hazel\", [-999 12; 21 22])"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/plain": [
"Dict{Symbol, Any} with 3 entries:\n",
" :a => 1:10\n",
" :str => \"Hazel\"\n",
" :C => [-999 12; 21 22]"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/plain": [
"MyType(1:10, \"Hazel\", [-999 12; 21 22])"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"a = 1:10\n",
"str = \"Hazel\"\n",
"C = [11 12;21 22]\n",
"\n",
"x = [a,str,C]\n",
"t = (a,str,C)\n",
"d = Dict(:a=>a,:str=>str,:C=>C)\n",
"e = MyType(a,str,C)\n",
"\n",
"C[1,1] = -999 #changing an element of C affects x,t,d,e\n",
"\n",
"display(x)\n",
"display(t)\n",
"display(d)\n",
"display(e)"
]
},
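{
"cell_type": "markdown",
"metadata": {},
"source": [
"For comparison with the cell above, here is a minimal sketch of the `copy(C)` approach. The names `C1`, `y_linked` and `y_indep` are hypothetical.\n",
"\n",
"```julia\n",
"C1 = [11 12;21 22]\n",
"\n",
"y_linked = [\"hello\",C1]           #holds a reference to C1\n",
"y_indep  = [\"hello\",copy(C1)]     #holds an independent copy of C1\n",
"\n",
"C1[1,1] = -999                    #change an element of C1\n",
"\n",
"println(y_linked[2][1,1])         #-999, affected by the change\n",
"println(y_indep[2][1,1])          #11, not affected\n",
"```\n",
"\n",
"The price of the independent copy is the extra memory needed to store a second array."
]
},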
{
"cell_type": "code",
"execution_count": 18,
"metadata": {
"scrolled": true
},
"outputs": [
{
"data": {
"text/plain": [
"(1:10, \"Hazel\", [-999 12; 21 22])"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"C = 0 #assigning a new value to C does not affect x,t,d,e\n",
"display(t)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"@webio": {
"lastCommId": null,
"lastKernelId": null
},
"anaconda-cloud": {},
"kernelspec": {
"display_name": "Julia 1.6.0",
"language": "julia",
"name": "julia-1.6"
},
"language_info": {
"file_extension": ".jl",
"mimetype": "application/julia",
"name": "julia",
"version": "1.6.0"
}
},
"nbformat": 4,
"nbformat_minor": 1
}