HSG-MCS-HS21_Julia/JuliaTutorial-master/Tutorial_10_Strings.ipynb

487 lines
11 KiB
Plaintext
Raw Permalink Normal View History

2021-11-15 20:14:51 +00:00
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Strings\n",
"\n",
"This notebook demonstrates some basic string commands."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Load Packages and Extra Functions"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"printyellow (generic function with 1 method)"
]
},
"execution_count": 1,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"using Printf\n",
"\n",
"include(\"jlFiles/printmat.jl\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# String Basics\n",
"\n",
"The next few cells show how to\n",
"\n",
"1. combine several strings into one string by `string(str1,str2)` or `str1 * str2`.\n",
"\n",
"2. test if a string contains a specific substring\n",
"\n",
"3. replace part of a string with something else\n",
"\n",
"4. split a string into a vector of words (and then to join them back into a string again)\n",
"\n",
"5. sort a vector of words in alphabetical order"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Hello world!\n",
"Where are you?\n"
]
}
],
"source": [
"str1 = \"Hello\"\n",
"str2 = \"world!\\n\"\n",
"str3 = \"Where are you?\"\n",
"\n",
"str3b = string(str1,\" \",str2,str3) #combine into one string\n",
"println(str3b)"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {
"scrolled": true
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Highway 62 Revisited\n",
"\u001b[34m\u001b[1mcontains the word Highway\u001b[22m\u001b[39m\n",
"\n",
"\u001b[34m\u001b[1mNew, better string after a replacement: \u001b[22m\u001b[39m\n",
"Highway 61 Revisited\n"
]
}
],
"source": [
"str4 = \"Highway 62 Revisited\"\n",
"\n",
"if occursin(\"Highway\",str4)\n",
" println(str4)\n",
" printblue(\"contains the word Highway\")\n",
"end\n",
"\n",
"str4 = replace(str4,\"62\" => \"61\")\n",
"printblue(\"\\nNew, better string after a replacement: \")\n",
"println(str4)"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\u001b[34m\u001b[1msplit a string into a vector of words:\u001b[22m\u001b[39m\n",
" Highway\n",
" 61\n",
" Revisited\n",
"\n",
"\n",
"\u001b[34m\u001b[1mand join the words again into a string:\u001b[22m\u001b[39m\n",
"Highway 61 Revisited\n"
]
}
],
"source": [
"words = split(str4)\n",
"printblue(\"split a string into a vector of words:\")\n",
"printmat(words)\n",
"\n",
"printblue(\"\\nand join the words again into a string:\")\n",
"println(join(words,\" \"))"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\u001b[34m\u001b[1msort the words alphabetically:\u001b[22m\u001b[39m\n",
" 61\n",
" Highway\n",
" Revisited\n",
"\n"
]
}
],
"source": [
"printblue(\"sort the words alphabetically:\")\n",
"printmat(sort(words,lt=isless))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Reading an Entire File as a String\n",
"\n",
"The next cell reads a file into one single string. It keeps the formatting (spaces, line breaks etc). "
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [],
"source": [
"txtFile = \"Data/FileWithText.txt\"\n",
"\n",
"fh1 = open(txtFile) #open the file, can then refer to it as fh1\n",
" str = read(fh1,String) #read as string\n",
"close(fh1)"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"String\n",
"\n",
"Dogs are nicer\n",
"than cats.\n",
" \n",
" This\n",
" is a\n",
"fairly short file.\n"
]
}
],
"source": [
"println(typeof(str),\"\\n\")\n",
"\n",
"println(str) #Printing the string read from a file"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Reading all Lines of a File into an Array of Strings\n",
"\n",
"The next cell reads a file into an array of strings: one string per line of the file. The second cell joins the lines into one string."
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Dogs are nicer\n",
"than cats.\n",
" \n",
" This\n",
" is a\n",
"fairly short file.\n",
"\n"
]
}
],
"source": [
"fh2 = open(txtFile)\n",
" lines = readlines(fh2)\n",
"close(fh2)\n",
"\n",
"printmat(lines)"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Dogs are nicer\n",
"than cats.\n",
" \n",
" This\n",
" is a\n",
"fairly short file.\n"
]
}
],
"source": [
"linesJoined = join(lines,\"\\n\") #join the lines of the array,\n",
"println(linesJoined) # \"\\n\" to create line breaks"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Strings and Indexing (extra)\n",
"\n",
"can be tricky when the string contains non-ascii characters.\n",
"\n",
"Notice that you cannot change a string by indexing. For instance, `str[1] = \"D\"` does not work. However, you can *read* strings by indexing, if you are careful.\n",
"\n",
"The next cell gives two versions of a string. Try running the subsequent cells for both versions."
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"\"Dx = -0.9x\""
]
},
"execution_count": 10,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"str = \"Dx = -0.9x\"\n",
"#str = \"Δx = -0.9x\" #uncomment this and re-run the cells below"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"D\n",
"x\n"
]
}
],
"source": [
"println(str[1]) #works\n",
"println(str[2]) #might not work, depending on the contents of string"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"If `str[2]` does not work (it does not if the first character is Δ), then that is due to the fact that the first character takes more than one byte to store. Julia has commands to get around this. For instance, see the next cell."
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"x\n"
]
}
],
"source": [
" #this should work in all cases\n",
"println(str[nextind(str,1)]) #nextind() gives the starting point of the next character"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Looping over All Characters in a String\n",
"\n",
"in a way that works even if there are some non-ascii characters."
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"1 D\n",
"2 x\n",
"3 \n",
"4 =\n",
"5 \n",
"6 -\n",
"7 0\n",
"8 .\n",
"9 9\n",
"10 x\n"
]
}
],
"source": [
"i = 1\n",
"for c in str #alternatively, while i <= lastindex(str)\n",
" #global i #only needed in script \n",
" println(i,\" \",c)\n",
" i = nextind(str,i) #nextind() gives the starting point of the next character\n",
"end"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Creating a Long String (extra)\n",
"\n",
"can be done with `string()`, but it is often quicker to write to an `IOBuffer()`. \n",
"\n",
"Both approaches are demonstrated below by combining a vector of words into a string. (This is just meant as an illustration since `join(txt,\" \")` would here be a better way to achieve the same thing.)"
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {},
"outputs": [],
"source": [
"txt = [\"The\",\"highway\",\"is\",\"for\",\"gamblers,\",\"better\",\"use\",\"your\",\"sense\\n\", #a vector of words from a song\n",
" \"Take\",\"what\",\"you\",\"have\",\"gathered\",\"from\",\"coincidence\"];"
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" The highway is for gamblers, better use your sense\n",
" Take what you have gathered from coincidence\n"
]
}
],
"source": [
"BabyBlue1 = \"\" #an empty string\n",
"for i = 1:length(txt)\n",
" #global BabyBlue1 #only needed in script\n",
" BabyBlue1 = string(BabyBlue1,\" \",txt[i]) #add to the string\n",
"end\n",
"println(BabyBlue1)"
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" The highway is for gamblers, better use your sense\n",
" Take what you have gathered from coincidence\n"
]
}
],
"source": [
"iob = IOBuffer() #an IOBuffer\n",
"for i = 1:length(txt)\n",
" write(iob,\" \",txt[i]) #write to the buffer\n",
"end\n",
"BabyBlue2 = String(take!(iob)) #convert to a string\n",
"println(BabyBlue2)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"@webio": {
"lastCommId": null,
"lastKernelId": null
},
"anaconda-cloud": {},
"kernelspec": {
"display_name": "Julia 1.6.1",
"language": "julia",
"name": "julia-1.6"
},
"language_info": {
"file_extension": ".jl",
"mimetype": "application/julia",
"name": "julia",
"version": "1.6.1"
}
},
"nbformat": 4,
"nbformat_minor": 1
}