From 07ed9a0cce7ec23aa4cf0bcf69684f0a72a2d784 Mon Sep 17 00:00:00 2001
From: Amit Rathi
Date: Tue, 2 Jul 2019 16:14:12 +0530
Subject: [PATCH 1/3] Delete and merge cells

---
 .../02.06-Boolean-Arrays-and-Masks.ipynb | 45 ++++---------------
 1 file changed, 8 insertions(+), 37 deletions(-)

diff --git a/notebooks/02.06-Boolean-Arrays-and-Masks.ipynb b/notebooks/02.06-Boolean-Arrays-and-Masks.ipynb
index f3bfb6800..5511e5fd9 100644
--- a/notebooks/02.06-Boolean-Arrays-and-Masks.ipynb
+++ b/notebooks/02.06-Boolean-Arrays-and-Masks.ipynb
@@ -8,28 +8,15 @@
     "\n",
     "*This notebook contains an excerpt from the [Python Data Science Handbook](http://shop.oreilly.com/product/0636920034919.do) by Jake VanderPlas; the content is available [on GitHub](https://github.com/jakevdp/PythonDataScienceHandbook).*\n",
     "\n",
-    "*The text is released under the [CC-BY-NC-ND license](https://creativecommons.org/licenses/by-nc-nd/3.0/us/legalcode), and code is released under the [MIT license](https://opensource.org/licenses/MIT). If you find this content useful, please consider supporting the work by [buying the book](http://shop.oreilly.com/product/0636920034919.do)!*"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "\n",
-    "< [Computation on Arrays: Broadcasting](02.05-Computation-on-arrays-broadcasting.ipynb) | [Contents](Index.ipynb) | [Fancy Indexing](02.07-Fancy-Indexing.ipynb) >"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "# Comparisons, Masks, and Boolean Logic"
+    "*The text is released under the [CC-BY-NC-ND license](https://creativecommons.org/licenses/by-nc-nd/3.0/us/legalcode), and code is released under the [MIT license](https://opensource.org/licenses/MIT). If you find this content useful, please consider supporting the work by [buying the book](http://shop.oreilly.com/product/0636920034919.do)!*\n",
+    "Added above deleting 2 cells"
    ]
   },
   {
    "cell_type": "markdown",
    "metadata": {},
    "source": [
+    "Added below deleting 2 cells\n",
     "This section covers the use of Boolean masks to examine and manipulate values within NumPy arrays.\n",
     "Masking comes up when you want to extract, modify, count, or otherwise manipulate values in an array based on some criterion: for example, you might wish to count all values greater than a certain value, or perhaps remove all outliers that are above some threshold.\n",
     "In NumPy, Boolean masking is often the most efficient way to accomplish these types of tasks."
@@ -68,27 +55,9 @@
     "# use pandas to extract rainfall inches as a NumPy array\n",
     "rainfall = pd.read_csv('data/Seattle2014.csv')['PRCP'].values\n",
     "inches = rainfall / 254.0 # 1/10mm -> inches\n",
-    "inches.shape"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "The array contains 365 values, giving daily rainfall in inches from January 1 to December 31, 2014.\n",
-    "\n",
-    "As a first quick visualization, let's look at the histogram of rainy days, which was generated using Matplotlib (we will explore this tool more fully in [Chapter 4](04.00-Introduction-To-Matplotlib.ipynb)):"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 2,
-   "metadata": {},
-   "outputs": [],
-   "source": [
+    "inches.shape\n",
     "%matplotlib inline\n",
-    "import matplotlib.pyplot as plt\n",
-    "import seaborn; seaborn.set() # set plot styles"
+    "import matplotlib.pyplot as plt"
    ]
   },
   {
@@ -108,6 +77,7 @@
     }
    ],
    "source": [
+    "import seaborn; seaborn.set() # set plot styles\n",
     "plt.hist(inches, 40);"
    ]
   },
@@ -1177,6 +1147,7 @@
  ],
  "metadata": {
   "anaconda-cloud": {},
+  "celltoolbar": "Raw Cell Format",
   "kernelspec": {
    "display_name": "Python 3",
    "language": "python",
@@ -1192,7 +1163,7 @@
    "name": "python",
    "nbconvert_exporter": "python",
    "pygments_lexer": "ipython3",
-   "version": "3.6.5"
+   "version": "3.6.8"
   }
  },
  "nbformat": 4,

From f0daf11ef30dd5d56d2507c99e347a073a8558b8 Mon Sep 17 00:00:00 2001
From: Amit Rathi
Date: Wed, 3 Jul 2019 12:12:37 +0530
Subject: [PATCH 2/3] Changes larger than git diff context of 3 lines

---
 .../02.06-Boolean-Arrays-and-Masks.ipynb | 31 +-------------------
 1 file changed, 1 insertion(+), 30 deletions(-)

diff --git a/notebooks/02.06-Boolean-Arrays-and-Masks.ipynb b/notebooks/02.06-Boolean-Arrays-and-Masks.ipynb
index 5511e5fd9..836281f71 100644
--- a/notebooks/02.06-Boolean-Arrays-and-Masks.ipynb
+++ b/notebooks/02.06-Boolean-Arrays-and-Masks.ipynb
@@ -797,36 +797,7 @@
     }
    ],
    "source": [
-    "x[x < 5]"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "What is returned is a one-dimensional array filled with all the values that meet this condition; in other words, all the values in positions at which the mask array is ``True``.\n",
-    "\n",
-    "We are then free to operate on these values as we wish.\n",
-    "For example, we can compute some relevant statistics on our Seattle rain data:"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 29,
-   "metadata": {},
-   "outputs": [
-    {
-     "name": "stdout",
-     "output_type": "stream",
-     "text": [
-      "Median precip on rainy days in 2014 (inches): 0.194881889764\n",
-      "Median precip on summer days in 2014 (inches): 0.0\n",
-      "Maximum precip on summer days in 2014 (inches): 0.850393700787\n",
-      "Median precip on non-summer rainy days (inches): 0.200787401575\n"
-     ]
-    }
-   ],
-   "source": [
+    "x[x < 5]\n",
     "# construct a mask of all rainy days\n",
     "rainy = (inches > 0)\n",
     "\n",

From ffe7e02689646abe4be2f4fd9b2aff91f90ab320 Mon Sep 17 00:00:00 2001
From: Amit Rathi
Date: Fri, 5 Jul 2019 23:09:45 +0530
Subject: [PATCH 3/3] Structural changes to see outputs

---
 notebooks/02.08-Sorting.ipynb | 173 ++++----------------------------------
 1 file changed, 19 insertions(+), 154 deletions(-)

diff --git a/notebooks/02.08-Sorting.ipynb b/notebooks/02.08-Sorting.ipynb
index fdb02d56c..82b007b37 100644
--- a/notebooks/02.08-Sorting.ipynb
+++ b/notebooks/02.08-Sorting.ipynb
@@ -113,32 +113,7 @@
      "output_type": "execute_result"
     }
    ],
-   "source": [
-    "x = np.array([2, 1, 4, 3, 5])\n",
-    "bogosort(x)"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "This silly sorting method relies on pure chance: it repeatedly applies a random shuffling of the array until the result happens to be sorted.\n",
-    "With an average scaling of $\\mathcal{O}[N \\times N!]$, (that's *N* times *N* factorial) this should–quite obviously–never be used for any real computation.\n",
-    "\n",
-    "Fortunately, Python contains built-in sorting algorithms that are *much* more efficient than either of the simplistic algorithms just shown. We'll start by looking at the Python built-ins, and then take a look at the routines included in NumPy and optimized for NumPy arrays."
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "## Fast Sorting in NumPy: ```np.sort``` and ``np.argsort``\n",
-    "\n",
-    "Although Python has built-in `sort` and ``sorted`` functions to work with lists, we won't discuss them here because NumPy's ``np.sort`` function turns out to be much more efficient and useful for our purposes.\n",
-    "By default ``np.sort`` uses an $\\mathcal{O}[N\\log N]$, *quicksort* algorithm, though *mergesort* and *heapsort* are also available. For most applications, the default quicksort is more than sufficient.\n",
-    "\n",
-    "To return a sorted version of the array without modifying the input, you can use ``np.sort``:"
-   ]
+   "source": []
   },
   {
    "cell_type": "code",
@@ -159,7 +134,10 @@
    ],
    "source": [
     "x = np.array([2, 1, 4, 3, 5])\n",
-    "np.sort(x)"
+    "bogosort(x)\n",
+    "x = np.array([2, 1, 4, 3, 5])\n",
+    "np.sort(x)\n",
+    "# 2 lines got copied below, top one has array output (intact) this one should have error output"
    ]
   },
   {
@@ -274,55 +252,12 @@
    "source": [
     "rand = np.random.RandomState(42)\n",
     "X = rand.randint(0, 10, (4, 6))\n",
-    "print(X)"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 10,
-   "metadata": {},
-   "outputs": [
-    {
-     "data": {
-      "text/plain": [
-       "array([[2, 1, 4, 0, 1, 5],\n",
-       "       [5, 2, 5, 4, 3, 7],\n",
-       "       [6, 3, 7, 4, 6, 7],\n",
-       "       [7, 6, 7, 4, 9, 9]])"
-      ]
-     },
-     "execution_count": 10,
-     "metadata": {},
-     "output_type": "execute_result"
-    }
-   ],
-   "source": [
+    "print(X)\n",
     "# sort each column of X\n",
-    "np.sort(X, axis=0)"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 11,
-   "metadata": {},
-   "outputs": [
-    {
-     "data": {
-      "text/plain": [
-       "array([[3, 4, 6, 6, 7, 9],\n",
-       "       [2, 3, 4, 6, 7, 7],\n",
-       "       [1, 2, 4, 5, 7, 7],\n",
-       "       [0, 1, 4, 5, 5, 9]])"
-      ]
-     },
-     "execution_count": 11,
-     "metadata": {},
-     "output_type": "execute_result"
-    }
-   ],
-   "source": [
+    "np.sort(X, axis=0)\n",
     "# sort each row of X\n",
-    "np.sort(X, axis=1)"
+    "np.sort(X, axis=1)\n",
+    "# source from following 2 cells got appended here, output as is"
    ]
   },
   {
@@ -416,22 +351,6 @@
     "Using the standard convention, we'll arrange these in a $10\\times 2$ array:"
    ]
   },
-  {
-   "cell_type": "code",
-   "execution_count": 14,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "X = rand.rand(10, 2)"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "To get an idea of how these points look, let's quickly scatter plot them:"
-   ]
-  },
   {
    "cell_type": "code",
    "execution_count": 15,
@@ -449,10 +368,12 @@
     }
    ],
    "source": [
+    "X = rand.rand(10, 2)\n",
     "%matplotlib inline\n",
     "import matplotlib.pyplot as plt\n",
     "import seaborn; seaborn.set() # Plot styling\n",
-    "plt.scatter(X[:, 0], X[:, 1], s=100);"
+    "plt.scatter(X[:, 0], X[:, 1], s=100);\n",
+    "# line is copied down, should have image output (previous cell had array output, got removed)"
    ]
   },
   {
@@ -573,67 +494,6 @@
     "dist_sq.diagonal()"
    ]
   },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "It checks out!\n",
-    "With the pairwise square-distances converted, we can now use ``np.argsort`` to sort along each row. The leftmost columns will then give the indices of the nearest neighbors:"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 21,
-   "metadata": {},
-   "outputs": [
-    {
-     "name": "stdout",
-     "output_type": "stream",
-     "text": [
-      "[[0 3 9 7 1 4 2 5 6 8]\n",
-      " [1 4 7 9 3 6 8 5 0 2]\n",
-      " [2 1 4 6 3 0 8 9 7 5]\n",
-      " [3 9 7 0 1 4 5 8 6 2]\n",
-      " [4 1 8 5 6 7 9 3 0 2]\n",
-      " [5 8 6 4 1 7 9 3 2 0]\n",
-      " [6 8 5 4 1 7 9 3 2 0]\n",
-      " [7 9 3 1 4 0 5 8 6 2]\n",
-      " [8 5 6 4 1 7 9 3 2 0]\n",
-      " [9 7 3 0 1 4 5 8 6 2]]\n"
-     ]
-    }
-   ],
-   "source": [
-    "nearest = np.argsort(dist_sq, axis=1)\n",
-    "print(nearest)"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "Notice that the first column gives the numbers 0 through 9 in order: this is due to the fact that each point's closest neighbor is itself, as we would expect.\n",
-    "\n",
-    "By using a full sort here, we've actually done more work than we need to in this case. If we're simply interested in the nearest $k$ neighbors, all we need is to partition each row so that the smallest $k + 1$ squared distances come first, with larger distances filling the remaining positions of the array. We can do this with the ``np.argpartition`` function:"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 22,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "K = 2\n",
-    "nearest_partition = np.argpartition(dist_sq, K + 1, axis=1)"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "In order to visualize this network of neighbors, let's quickly plot the points along with lines representing the connections from each point to its two nearest neighbors:"
-   ]
-  },
   {
    "cell_type": "code",
    "execution_count": 2,
@@ -652,6 +512,10 @@
     }
    ],
    "source": [
+    "nearest = np.argsort(dist_sq, axis=1)\n",
+    "print(nearest)\n",
+    "K = 2\n",
+    "nearest_partition = np.argpartition(dist_sq, K + 1, axis=1)\n",
     "plt.scatter(X[:, 0], X[:, 1], s=100)\n",
     "\n",
     "# draw lines from each point to its two nearest neighbors\n",
@@ -661,7 +525,8 @@
     "    for j in nearest_partition[i, :K+1]:\n",
     "        # plot a line from X[i] to X[j]\n",
     "        # use some zip magic to make it happen:\n",
-    "        plt.plot(*zip(X[j], X[i]), color='black')"
+    "        plt.plot(*zip(X[j], X[i]), color='black')\n",
+    "# source from earlier 2 cells got merged here. Earlier cell+outputs deleted, error output for this one."
    ]
   },
   {
@@ -729,7 +594,7 @@
    "name": "python",
    "nbconvert_exporter": "python",
    "pygments_lexer": "ipython3",
-   "version": "3.6.4"
+   "version": "3.6.8"
   }
  },
  "nbformat": 4,