HSG-MCS-HS21_Julia/Problemsets/Solutions/PS09c_WarAndPeace_Solution.ipynb

{
"cells": [
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"using Downloads"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Download a Book from the Internet\n",
"\n",
"Download *War and Peace* from Project Gutenberg and read it into a string in Julia. Then report the number of letters, words, lines, etc. (see below)."
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"tags": []
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"File to download: http://www.gutenberg.org/files/2600/2600-0.txt\n",
"\n",
"check the subfolder Results\n",
"\n"
]
}
],
"source": [
"mkpath(\"Results\")   # create the subfolder Results if it does not exist yet\n",
"\n",
"url = \"http://www.gutenberg.org/files/2600/2600-0.txt\"\n",
"\n",
"println(\"File to download: \",url)\n",
"Downloads.download(url,\"Results/WarAndPeace.txt\")\n",
"\n",
"println(\"\\ncheck the subfolder Results\\n\")"
]
},
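{
"cell_type": "markdown",
"metadata": {},
"source": [
"Optional: the cell above downloads the book (more than 3 million characters) again on every run. A minimal sketch that skips the download when the file already exists; `fetch_once` is a hypothetical helper name, and it only checks that the file exists, not that it is complete."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"using Downloads\n",
"\n",
"# Hypothetical helper: download `url` to `dest` only if `dest` does not exist yet.\n",
"function fetch_once(url::AbstractString, dest::AbstractString)\n",
"    d = dirname(dest)\n",
"    isempty(d) || mkpath(d)                        # create the target folder if needed\n",
"    isfile(dest) || Downloads.download(url, dest)  # skip the download if the file is there\n",
"    return dest\n",
"end\n",
"\n",
"# usage with the URL and file name used above:\n",
"# fetch_once(\"http://www.gutenberg.org/files/2600/2600-0.txt\", \"Results/WarAndPeace.txt\")"
]
},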
{
"cell_type": "code",
"execution_count": 3,
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"str = read(\"Results/WarAndPeace.txt\", String);   # read the whole file into one String (opens and closes it)"
]
},
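{
"cell_type": "markdown",
"metadata": {},
"source": [
"The counts below include Project Gutenberg's header and license text (hence \"including pre-amble, etc\"). A minimal sketch for cutting them off, assuming the book text sits between lines starting with `*** START OF` and `*** END OF`; that marker wording is an assumption to check against the downloaded file. `strip_gutenberg` is a hypothetical helper: without a start marker it returns the string unchanged, and without an end marker it keeps everything after the start marker."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Hypothetical helper: keep only the text between the \"*** START OF\" line\n",
"# and the \"*** END OF\" line (marker wording assumed, check the file).\n",
"function strip_gutenberg(s::AbstractString)\n",
"    r = findfirst(\"*** START OF\", s)\n",
"    r === nothing && return s                    # no start marker: unchanged\n",
"    i = findnext('\\n', s, last(r))               # end of the start-marker line\n",
"    i === nothing && return s\n",
"    e = findnext(\"*** END OF\", s, i)            # start of the end-marker line\n",
"    stop = e === nothing ? lastindex(s) : prevind(s, first(e))\n",
"    return s[nextind(s, i):stop]\n",
"end\n",
"\n",
"# usage on the book: body = strip_gutenberg(str); length(split(body))"
]
},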
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Task 1\n",
"\n",
"1. Count the number of letters (i.e. characters) and the unique letters in `str`. Hint: `length()` and `unique()`\n",
"\n",
"2. Count the number of words and lines. Hint: `split(str)` and `split(str,\"\\n\")`\n",
"\n",
"3. Count the number of unique words."
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"number of letters and unique letters in W&P: 3293555 113\n",
"\n",
"unique letters: ['\\ufeff', 'T', 'h', 'e', ' ', 'P', 'r', 'o', 'j', 'c', 't', 'G', 'u', 'n', 'b', 'g', 'B', 'k', 'f', 'W', 'a', 'd', ',', 'y', 'L', 'l', 's', '\\r', '\\n', 'i', 'w', 'U', 'S', 'm', 'p', 'v', '.', 'Y', '-', 'I', ':', 'A', 'M', 'R', 'D', '2', '0', '1', '[', '#', '6', ']', 'J', '9', 'E', 'C', 'F', '8', 'V', '*', 'O', 'H', 'N', 'K', '/', '5', 'X', '7', '3', '“', '', '—', '', '!', '?', '”', 'á', 'é', 'ë', 'í', ';', 'x', '(', ')', 'z', 'q', 'À', 'ó', 'ú', 'è', 'î', 'ô', 'à', 'ç', 'Q', 'â', 'ê', 'ï', 'Z', '4', 'ý', 'ö', 'ä', 'ü', 'Á', 'œ', 'É', '=', 'æ', '\"', '%', '\\'', '$']\n",
"\n",
"ASCII letters: ['T', 'h', 'e', ' ', 'P', 'r', 'o', 'j', 'c', 't', 'G', 'u', 'n', 'b', 'g', 'B', 'k', 'f', 'W', 'a', 'd', ',', 'y', 'L', 'l', 's', '\\r', '\\n', 'i', 'w', 'U', 'S', 'm', 'p', 'v', '.', 'Y', '-', 'I', ':', 'A', 'M', 'R', 'D', '2', '0', '1', '[', '#', '6', ']', 'J', '9', 'E', 'C', 'F', '8', 'V', '*', 'O', 'H', 'N', 'K', '/', '5', 'X', '7', '3', '!', '?', ';', 'x', '(', ')', 'z', 'q', 'Q', 'Z', '4', '=', '\"', '%', '\\'', '$']\n"
]
}
],
"source": [
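"n = length(str)         # number of characters (Unicode code points), not bytes\n",
"letters = unique(str)   # distinct characters, in order of first appearance\n",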
"println(\"number of letters and unique letters in W&P: \", n,\" \",length(letters))\n",
"\n",
"println(\"\\nunique letters: \",letters)\n",
"\n",
"vv = isascii.(letters)  # true for characters in the 7-bit ASCII range\n",
"println(\"\\nASCII letters: \",letters[vv])"
]
},
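{
"cell_type": "markdown",
"metadata": {},
"source": [
"The \"letters\" counted above are really characters: `length` counts Unicode characters (code points), including spaces, punctuation, `\\r`, `\\n` and the byte-order mark `'\\ufeff'` at the very start of the file. A small sketch contrasting characters, UTF-8 bytes (`sizeof`) and alphabetic characters only (`count(isletter, s)`); `char_stats` and `strip_bom` are illustrative names, not library functions."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Characters vs. bytes vs. letters for any string.\n",
"char_stats(s::AbstractString) = (characters = length(s),           # Unicode code points\n",
"                                 bytes      = sizeof(s),           # UTF-8 code units\n",
"                                 letters    = count(isletter, s))  # alphabetic only\n",
"\n",
"# Drop a leading byte-order mark, if present.\n",
"strip_bom(s::AbstractString) = isempty(s) || first(s) != '\\ufeff' ? s : s[nextind(s, 1):end]\n",
"\n",
"println(char_stats(\"Borodinó, 1812\"))   # ó is one character but two bytes\n",
"# usage on the book: char_stats(strip_bom(str))"
]
},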
{
"cell_type": "code",
"execution_count": 5,
"metadata": {
"tags": []
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"number of words in W&P, including pre-amble, etc: 566334\n",
"number of lines in W&P, including pre-amble, etc: 66033\n"
]
}
],
"source": [
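"words = split(str)   # split on whitespace; punctuation stays attached to the words\n",
"println(\"number of words in W&P, including pre-amble, etc: \",length(words))\n",
"\n",
"lines = split(str,r\"\\r?\\n\")   # split at \\n and drop the \\r of Windows line endings\n",
"println(\"number of lines in W&P, including pre-amble, etc: \",length(lines))"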
]
},
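{
"cell_type": "markdown",
"metadata": {},
"source": [
"The file appears to use Windows line endings (`'\\r'` shows up among the unique letters), so every piece returned by `split(str,\"\\n\")` keeps a trailing `\\r`, and if the text ends with a newline the last piece is an empty string. A small sketch of the difference on a two-line example; `readlines` removes both `\\n` and `\\r\\n`."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"sample = \"first line\\r\\nsecond line\\r\\n\"\n",
"\n",
"parts = split(sample, \"\\n\")             # [\"first line\\r\", \"second line\\r\", \"\"]\n",
"clean = readlines(IOBuffer(sample))    # [\"first line\", \"second line\"]\n",
"\n",
"println(length(parts), \" pieces from split, \", length(clean), \" lines from readlines\")\n",
"# usage on the book: length(readlines(\"Results/WarAndPeace.txt\"))"
]
},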
{
"cell_type": "code",
"execution_count": 6,
"metadata": {
"tags": []
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"How many unique words are there in the file?\n",
"Number of unique words: 41971\n"
]
}
],
"source": [
"println(\"How many unique words are there in the file?\")\n",
"UniqueWords = unique(words)\n",
"\n",
"println(\"Number of unique words: \",length(UniqueWords))"
]
},
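{
"cell_type": "markdown",
"metadata": {},
"source": [
"`split(str)` keeps punctuation and case, so `Borodinó`, `Borodinó,` and `Borodinó.` are counted as different words above. A sketch of one possible normalization (an assumption, not the exercise's definition of a word): lowercase the text and split on every non-letter character. Note that it also splits contractions such as `there’s` into `there` and `s`."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Lowercase, then split on any character that is not a letter; drop empty pieces.\n",
"normalized_words(s::AbstractString) = split(lowercase(s), !isletter; keepempty=false)\n",
"\n",
"ws = normalized_words(\"War and Peace. War, peace!\")\n",
"println(ws, \" -> \", length(unique(ws)), \" unique\")\n",
"# usage on the book: length(unique(normalized_words(str)))"
]
},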
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Task 2\n",
"\n",
"1. How often is Borodinó mentioned? Hint: `occursin.(,words)`\n",
"\n",
"2. Print all lines that contain the word Borodinó. Hint: `occursin(,line)`"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {
"tags": []
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"How often is Borodinó mentioned?\n",
"108\n",
"108\n"
]
}
],
"source": [
"println(\"How often is Borodinó mentioned?\")\n",
"\n",
"println(sum(occursin.(\"Borodinó\",words)))   # substring match: also counts \"Borodinó,\", \"Borodinó.\" etc.\n",
"\n",
"println(sum(z->occursin(\"Borodinó\",z),words))   # same result without allocating a temporary array"
]
},
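{
"cell_type": "markdown",
"metadata": {},
"source": [
"Broadcasting `occursin` over `words` counts the *words* that contain the name, which equals the number of mentions only if no word contains it twice. `count(needle, text)` counts all non-overlapping occurrences in the string (string patterns need Julia 1.3 or later). A small comparison on a made-up sentence:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"text = \"Borodinó. The battle of Borodinó-Borodinó!\"   # made-up example\n",
"ws_demo = split(text)\n",
"\n",
"per_word = count(w -> occursin(\"Borodinó\", w), ws_demo)  # words containing the name\n",
"total    = count(\"Borodinó\", text)                        # every occurrence\n",
"println(per_word, \" words, \", total, \" occurrences\")\n",
"# usage on the book: count(\"Borodinó\", str)"
]
},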
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"print the line numbers and the lines that contain the word Borodinó:\n",
"\n",
"38971 they reached Borodinó, seventy miles from Moscow. From Vyázma Napoleon\n",
"41375 the twenty-sixth the battle of Borodinó itself took place.\n",
"41377 Why and how were the battles of Shevárdino and Borodinó given and\n",
"41378 accepted? Why was the battle of Borodinó fought? There was not the least\n",
"41399 Before the battle of Borodinó our strength in proportion to the French\n",
"41416 In giving and accepting battle at Borodinó, Kutúzov acted involuntarily\n",
"41427 On the other question, how the battle of Borodinó and the preceding\n",
"41434 position at Borodinó.\n",
"41438 to it, from Borodinó to Utítsa, at the very place where the battle was\n",
"41445 the field of Borodinó.\n",
"41451 during the retreat passed many positions better than Borodinó. They did\n",
"41457 and that the position at Borodinó (the one where the battle was fought),\n",
"41463 Borodinó to the left of, and at a right angle to, the highroad (that\n",
"41481 later, when reports on the battle of Borodinó were written at leisure,\n",
"41486 the battle of Borodinó was fought by us on an entrenched position\n",
"41493 village of Nóvoe, and the center at Borodinó at the confluence of the\n",
"41496 To anyone who looks at the field of Borodinó without thinking of how\n",
"41503 to Borodinó (he could not have seen that position because it did not\n",
"41514 Borodinó—a plain no more advantageous as a position than any other plain\n",
"41531 and chief action of the battle of Borodinó was already lost on the\n",
"41551 distinct from the main course of the battle.) So the battle of Borodinó\n",
"41554 army and people) it has been described. The battle of Borodinó was not\n",
"41557 Shevárdino Redoubt, the Russians fought the battle of Borodinó on an\n",
"41749 hundred paces in front of the knoll and below it. This was Borodinó.\n",
"41780 “Borodinó,” the other corrected him.\n",
"41814 entrenchments. There, you see? There’s our center, at Borodinó, just\n",
"41855 A church procession was coming up the hill from Borodinó. First along\n",
"42128 Borodinó and thence turned to the left, passing an enormous number of\n",
"42133 him than any other spot on the plain of Borodinó.\n",
"42643 On August 25, the eve of the battle of Borodinó, M. de Beausset, prefect\n",
"42949 The fourth order was: The vice-King will occupy the village (Borodinó)\n",
"42957 given him, he was to advance from the left through Borodinó to the\n",
"42962 not be executed. After passing through Borodinó the vice-King was driven\n",
"42982 Many historians say that the French did not win the battle of Borodinó\n",
"42993 battle of Borodinó, and if this or that other arrangement depended on\n",
"43015 men at Borodinó was not due to Napoleon’s will, though he ordered the\n",
"43022 At the battle of Borodinó Napoleon shot at no one and killed no one.\n",
"43026 The French soldiers went to kill and be killed at the battle of Borodinó\n",
"43065 than previous ones because the battle of Borodinó was the first Napoleon\n",
"43078 Napoleon at the battle of Borodinó fulfilled his office as\n",
"43286 Pierre most of all was the view of the battlefield itself, of Borodinó\n",
"43289 Above the Kolochá, in Borodinó and on both sides of it, especially to\n",
"43297 riverbanks and in Borodinó. A white church could be seen through the\n",
"43298 mist, and here and there the roofs of huts in Borodinó as well as dense\n",
"43301 the whole space. Just as in the mist-enveloped hollow near Borodinó, so\n",
"43386 bridge across the Kolochá between Górki and Borodinó, which the French\n",
"43387 (having occupied Borodinó) were attacking in the first phase of the\n",
"43819 The chief action of the battle of Borodinó was fought within the seven\n",
"43820 thousand feet between Borodinó and Bagratión’s flèches. Beyond that\n",
"43825 battlefield. On the field between Borodinó and the flèches, beside the\n",
"43834 troops advanced on Borodinó from their left.\n",
"43838 to Borodinó, so that Napoleon could not see what was happening there,\n",
"43886 Borodinó had been occupied and the bridge over the Kolochá was in the\n",
"43890 as soon in fact as the adjutant had left Borodinó—the bridge had been\n",
"44222 times repulsed. In the center the French had not got beyond Borodinó,\n",
"44301 of the field of Borodinó.\n",
"44821 hundreds of years the peasants of Borodinó, Górki, Shevárdino, and\n",
"44903 Russians at Borodinó. The French invaders, like an infuriated animal\n",
"44909 wound it had received at Borodinó. The direct consequence of the battle\n",
"44910 of Borodinó was Napoleon’s senseless flight from Moscow, his retreat\n",
"44913 which at Borodinó for the first time the hand of an opponent of stronger\n",
"45069 from Smolénsk to Borodinó. The French army pushed on to Moscow, its\n",
"45078 consolidated. At Borodinó a collision took place. Neither army was\n",
"45097 Russian army were convinced that the battle of Borodinó was a victory.\n",
"45185 the twenty-sixth at Borodinó, and each day and hour and minute of the\n",
"45186 retreat from Borodinó to Filí.\n",
"45449 After the battle of Borodinó the abandonment and burning of Moscow was\n",
"45491 that could happen. They went away even before the battle of Borodinó and\n",
"45873 Borodinó.\n",
"45881 Toward the end of the battle of Borodinó, Pierre, having run down\n",
"46434 and commotion. Every day thousands of men wounded at Borodinó were\n",
"46442 Some said there had been another battle after Borodinó at which the\n",
"47598 to the second of September, that is from the battle of Borodinó to the\n",
"48315 ever since the battle of Borodinó, for all the generals who came to\n",
"48353 if after the battle of Borodinó, when the surrender of Moscow became\n",
"49092 particularly of the battle of Borodinó and of that vague sense of his\n",
"50087 ambulance station on the field of Borodinó. His feverish state and the\n",
"50105 Borodinó. They were accompanied by a doctor, Prince Andrew’s valet, his\n",
"50835 battle of Borodinó, there was a soiree, the chief feature of which was\n",
"51280 A few days before the battle of Borodinó, Nicholas received the\n",
"51740 The dreadful news of the battle of Borodinó, of our losses in killed and\n",
"51747 When he received the news of the battle of Borodinó and the abandonment\n",
"53596 The historians consider that, next to the battle of Borodinó and the\n",
"53703 whole campaign and by the battle of Borodinó, the Russian army—when\n",
"53711 Borodinó had been a victory, he alone—who as commander in chief might\n",
"53715 The beast wounded at Borodinó was lying where the fleeing hunter had\n",
"54244 or deliberately deceive themselves. No battle—Tarútino, Borodinó, or\n",
"54858 of Borodinó. He had sought it in philanthropy, in Freemasonry, in the\n",
"55314 day long. At the battle of Borodinó, when Bagratión was killed and nine\n",
"55319 And the quiet little Dokhtúrov rode thither, and Borodinó became the\n",
"55543 The undecided question as to whether the wound inflicted at Borodinó was\n",
"55658 That army could not recover anywhere. Since the battle of Borodinó\n",
"55805 The Battle of Borodinó, with the occupation of Moscow that followed it\n",
"55840 history: to say that the field of battle at Borodinó remained in the\n",
"55844 After the French victory at Borodinó there was no general engagement nor\n",
"55855 The period of the campaign of 1812 from the battle of Borodinó to the\n",
"55895 retreats after battles, the blow dealt at Borodinó and the renewed\n",
"57692 done at Mozháysk after the battle of Borodinó.\n",
"58038 French had given battle at Borodinó, did not achieve its purpose when it\n",
"58063 enemy in full strength at Borodinó—defeated at Krásnoe and the Berëzina\n",
"58772 activity in 1812, never once swerving by word or deed from Borodinó to\n",
"58819 Beginning with the battle of Borodinó, from which time his disagreement\n",
"58820 with those about him began, he alone said that the battle of Borodinó\n",
"58845 this enemy of decisive action, gave battle at Borodinó, investing the\n",
"58848 contradiction to everyone else, declared till his death that Borodinó\n",
"59762 Borodinó for more than a month had recently died in the Rostóvs’ house\n",
"60187 one at Borodinó.\n",
"61413 the cold in his head at Borodinó to the sparks which set Moscow on\n"
]
}
],
"source": [
"println(\"print the line numbers and the lines that contain the word Borodinó:\\n\")\n",
"for (i,line) in enumerate(lines)\n",
" if occursin(\"Borodinó\",line)\n",
" println(i,\" \",line)\n",
" end\n",
"end"
]
},
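{
"cell_type": "markdown",
"metadata": {},
"source": [
"For a large file one can also stream it line by line instead of splitting the whole string: `eachline` yields the lines with the `\\n` or `\\r\\n` removed, and `enumerate` supplies the line numbers. A sketch; `matching_lines` is a hypothetical helper name."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Stream a file line by line and collect (line number, line) pairs that contain `needle`.\n",
"function matching_lines(io::IO, needle::AbstractString)\n",
"    hits = Tuple{Int,String}[]\n",
"    for (i, line) in enumerate(eachline(io))   # eachline strips \\n and \\r\\n\n",
"        occursin(needle, line) && push!(hits, (i, line))\n",
"    end\n",
"    return hits\n",
"end\n",
"\n",
"demo = IOBuffer(\"one\\r\\nat Borodinó\\r\\nthree\\r\\nfrom Borodinó\\r\\n\")\n",
"for (i, line) in matching_lines(demo, \"Borodinó\")\n",
"    println(i, \" \", line)\n",
"end\n",
"# usage on the book: open(io -> matching_lines(io, \"Borodinó\"), \"Results/WarAndPeace.txt\")"
]
},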
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Task 3\n",
"\n",
"1. Change Borodinó everywhere to Berëzina and then count the occurrences"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {
"tags": []
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Change Borodinó everywhere to Berëzina and then count the occurrences\n",
"125\n",
"125\n"
]
}
],
"source": [
"println(\"Change Borodinó everywhere to Berëzina and then count the occurrences\")\n",
"str2 = replace(str,\"Borodinó\"=>\"Berëzina\");\n",
"words2 = split(str2)\n",
"println(sum(occursin.(\"Berëzina\",words2)))\n",
"\n",
"println(sum(z->occursin(\"Berëzina\",z),words2))   # same result without allocating a temporary array"
]
},
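{
"cell_type": "markdown",
"metadata": {},
"source": [
"After the replacement the count is 125, not the 108 found for Borodinó in Task 2. A plausible explanation is that Berëzina already occurs in the original text (line 58063 in the output above mentions it), so the new count mixes old and new occurrences. A sketch for checking this by counting both names before replacing; `count_after_replace` is a hypothetical helper, and it counts occurrences in the string, which may differ slightly from the word-based counts above."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Count `old_name` and `new_name` before replacing, and `new_name` afterwards.\n",
"function count_after_replace(text::AbstractString, old_name::AbstractString, new_name::AbstractString)\n",
"    before_old = count(old_name, text)    # occurrences to be replaced\n",
"    before_new = count(new_name, text)    # target name already present\n",
"    after      = count(new_name, replace(text, old_name => new_name))\n",
"    return (before_old = before_old, before_new = before_new, after = after)\n",
"end\n",
"\n",
"println(count_after_replace(\"Borodinó ... Berëzina ... Borodinó\", \"Borodinó\", \"Berëzina\"))\n",
"# usage on the book: count_after_replace(str, \"Borodinó\", \"Berëzina\")"
]
},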
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"@webio": {
"lastCommId": null,
"lastKernelId": null
},
"anaconda-cloud": {},
"kernel_info": {
"name": "julia-1.2"
},
"kernelspec": {
"display_name": "Julia 1.7.0",
"language": "julia",
"name": "julia-1.7"
},
"language_info": {
"file_extension": ".jl",
"mimetype": "application/julia",
"name": "julia",
"version": "1.7.0"
},
"nteract": {
"version": "0.24.1"
}
},
"nbformat": 4,
"nbformat_minor": 4
}