<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta charset="utf-8" />
<title>Installing NLTK Data — NLTK 3.5 documentation</title>
<link rel="stylesheet" href="_static/agogo.css" type="text/css" />
<link rel="stylesheet" href="_static/pygments.css" type="text/css" />
<script id="documentation_options" data-url_root="./" src="_static/documentation_options.js"></script>
<script src="_static/jquery.js"></script>
<script src="_static/underscore.js"></script>
<script src="_static/doctools.js"></script>
<script src="_static/language_data.js"></script>
<link rel="index" title="Index" href="genindex.html" />
<link rel="search" title="Search" href="search.html" />
<link rel="next" title="Contribute to NLTK" href="contribute.html" />
<link rel="prev" title="Installing NLTK" href="install.html" />
</head><body>
<div class="header-wrapper" role="banner">
<div class="header">
<div class="headertitle"><a
href="index.html">NLTK 3.5 documentation</a></div>
<div class="rel" role="navigation" aria-label="related navigation">
<a href="install.html" title="Installing NLTK"
accesskey="P">previous</a> |
<a href="contribute.html" title="Contribute to NLTK"
accesskey="N">next</a> |
<a href="py-modindex.html" title="Python Module Index"
>modules</a> |
<a href="genindex.html" title="General Index"
accesskey="I">index</a>
</div>
</div>
</div>
<div class="content-wrapper">
<div class="content">
<div class="document">
<div class="documentwrapper">
<div class="bodywrapper">
<div class="body" role="main">
<div class="section" id="installing-nltk-data">
<h1>Installing NLTK Data<a class="headerlink" href="#installing-nltk-data" title="Permalink to this headline">¶</a></h1>
<p>NLTK comes with many corpora, toy grammars, trained models, etc. A complete list is posted at: <a class="reference external" href="http://nltk.org/nltk_data/">http://nltk.org/nltk_data/</a></p>
<p>To install the data, first install NLTK (see <a class="reference external" href="http://nltk.org/install.html">http://nltk.org/install.html</a>), then use NLTK’s data downloader as described below.</p>
<p>Apart from individual data packages, you can download the entire collection (using “all”), or just the data required for the examples and exercises in the book (using “book”), or just the corpora and no grammars or trained models (using “all-corpora”).</p>
<div class="section" id="interactive-installer">
<h2>Interactive installer<a class="headerlink" href="#interactive-installer" title="Permalink to this headline">¶</a></h2>
<p><em>For central installation on a multi-user machine, do the following from an administrator account.</em></p>
<p>Run the Python interpreter and type the commands:</p>
<div class="doctest highlight-default notranslate"><div class="highlight"><pre><span></span><span class="gp">>>> </span><span class="kn">import</span> <span class="nn">nltk</span>
<span class="gp">>>> </span><span class="n">nltk</span><span class="o">.</span><span class="n">download</span><span class="p">()</span>
</pre></div>
</div>
<p>A new window should open, showing the NLTK Downloader. Click on the File menu and select Change Download Directory. For central installation, set this to <code class="docutils literal notranslate"><span class="pre">C:\nltk_data</span></code> (Windows), <code class="docutils literal notranslate"><span class="pre">/usr/local/share/nltk_data</span></code> (Mac), or <code class="docutils literal notranslate"><span class="pre">/usr/share/nltk_data</span></code> (Unix). Next, select the packages or collections you want to download.</p>
<p>If you did not install the data to one of the above central locations, you will need to set the <code class="docutils literal notranslate"><span class="pre">NLTK_DATA</span></code> environment variable to specify the location of the data. (On a Windows machine, right-click “My Computer”, then select <code class="docutils literal notranslate"><span class="pre">Properties</span> <span class="pre">></span> <span class="pre">Advanced</span> <span class="pre">></span> <span class="pre">Environment</span> <span class="pre">Variables</span> <span class="pre">></span> <span class="pre">User</span> <span class="pre">Variables</span> <span class="pre">></span> <span class="pre">New...</span></code>.)</p>
<p>Test that the data has been installed as follows. (This assumes you downloaded the Brown Corpus):</p>
<div class="doctest highlight-default notranslate"><div class="highlight"><pre><span></span><span class="gp">>>> </span><span class="kn">from</span> <span class="nn">nltk.corpus</span> <span class="kn">import</span> <span class="n">brown</span>
<span class="gp">>>> </span><span class="n">brown</span><span class="o">.</span><span class="n">words</span><span class="p">()</span>
<span class="go">['The', 'Fulton', 'County', 'Grand', 'Jury', 'said', ...]</span>
</pre></div>
</div>
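<p>If the test above fails, it can help to check where the data actually sits on disk. The following is a minimal sketch using only the standard library; <code class="docutils literal notranslate"><span class="pre">candidate_dirs</span></code> and <code class="docutils literal notranslate"><span class="pre">locate</span></code> are hypothetical helpers, not part of NLTK. It assumes that <code class="docutils literal notranslate"><span class="pre">NLTK_DATA</span></code> may list several directories separated by the platform's path separator, and that a package may be present either unzipped or as a <code class="docutils literal notranslate"><span class="pre">.zip</span></code> archive.</p>

```python
import os
from pathlib import Path

# The central locations named in the text above, one per platform.
CENTRAL_DIRS = [r"C:\nltk_data", "/usr/local/share/nltk_data", "/usr/share/nltk_data"]


def candidate_dirs(environ=os.environ):
    """Return directories to check: NLTK_DATA entries first, then the central ones."""
    # Assumption: NLTK_DATA may hold several directories joined by os.pathsep.
    from_env = [d for d in environ.get("NLTK_DATA", "").split(os.pathsep) if d]
    return from_env + CENTRAL_DIRS


def locate(resource, dirs):
    """Return the first existing path for a resource such as 'corpora/brown', or None."""
    for base in dirs:
        folder = Path(base) / resource
        # Assumption: a package may be stored unzipped or as '<name>.zip'.
        for path in (folder, Path(str(folder) + ".zip")):
            if path.exists():
                return path
    return None
```

<p>Calling <code class="docutils literal notranslate"><span class="pre">locate("corpora/brown",</span> <span class="pre">candidate_dirs())</span></code> returns the matching path, or <code class="docutils literal notranslate"><span class="pre">None</span></code> if the Brown Corpus is in none of the checked locations. Inside NLTK, <code class="docutils literal notranslate"><span class="pre">nltk.data.find('corpora/brown')</span></code> performs the real lookup over <code class="docutils literal notranslate"><span class="pre">nltk.data.path</span></code> and raises <code class="docutils literal notranslate"><span class="pre">LookupError</span></code> when the resource is missing.</p>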
<div class="section" id="installing-via-a-proxy-web-server">
<h3>Installing via a proxy web server<a class="headerlink" href="#installing-via-a-proxy-web-server" title="Permalink to this headline">¶</a></h3>
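<p>If your web connection uses a proxy server, specify the proxy address as follows. For an authenticating proxy, also specify a username and password. If the proxy is set to <code class="docutils literal notranslate"><span class="pre">None</span></code>, <code class="docutils literal notranslate"><span class="pre">nltk.set_proxy</span></code> will attempt to detect the system proxy.</p>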
<div class="doctest highlight-default notranslate"><div class="highlight"><pre><span></span><span class="gp">>>> </span><span class="kn">import</span> <span class="nn">nltk</span>
<span class="gp">>>> </span><span class="n">nltk</span><span class="o">.</span><span class="n">set_proxy</span><span class="p">(</span><span class="s1">'http://proxy.example.com:3128'</span><span class="p">,</span> <span class="s1">'USERNAME'</span><span class="p">,</span> <span class="s1">'PASSWORD'</span><span class="p">)</span>
<span class="gp">>>> </span><span class="n">nltk</span><span class="o">.</span><span class="n">download</span><span class="p">()</span>
</pre></div>
</div>
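<p>Before relying on automatic detection, it may be useful to see which proxy Python itself finds. This sketch assumes that detection draws on the same environment variables and system settings that the standard library's <code class="docutils literal notranslate"><span class="pre">urllib.request.getproxies()</span></code> reads; <code class="docutils literal notranslate"><span class="pre">detected_http_proxy</span></code> is a hypothetical helper, not part of NLTK.</p>

```python
from urllib.request import getproxies


def detected_http_proxy():
    """Return the HTTP proxy URL found in the environment or system settings, or None."""
    # getproxies() returns a dict mapping scheme names (e.g. 'http') to proxy URLs.
    return getproxies().get("http")
```

<p>If <code class="docutils literal notranslate"><span class="pre">detected_http_proxy()</span></code> returns <code class="docutils literal notranslate"><span class="pre">None</span></code>, no system HTTP proxy is visible to Python, and passing the address explicitly as shown above is the safer choice.</p>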
</div>
</div>
<div class="section" id="command-line-installation">
<h2>Command line installation<a class="headerlink" href="#command-line-installation" title="Permalink to this headline">¶</a></h2>
<p>The downloader searches for an existing <code class="docutils literal notranslate"><span class="pre">nltk_data</span></code> directory in which to install NLTK data. If none exists, it attempts to create one in a central location (when run from an administrator account) or otherwise in the user’s filespace. If necessary, run the download command from an administrator account, or using sudo. The recommended system location is <code class="docutils literal notranslate"><span class="pre">C:\nltk_data</span></code> (Windows), <code class="docutils literal notranslate"><span class="pre">/usr/local/share/nltk_data</span></code> (Mac), or <code class="docutils literal notranslate"><span class="pre">/usr/share/nltk_data</span></code> (Unix). You can use the <code class="docutils literal notranslate"><span class="pre">-d</span></code> flag to specify a different location; if you do, be sure to set the <code class="docutils literal notranslate"><span class="pre">NLTK_DATA</span></code> environment variable accordingly.</p>
<p>To download the entire collection, run the command <code class="docutils literal notranslate"><span class="pre">python</span> <span class="pre">-m</span> <span class="pre">nltk.downloader</span> <span class="pre">all</span></code>. To ensure central installation, run the command <code class="docutils literal notranslate"><span class="pre">sudo</span> <span class="pre">python</span> <span class="pre">-m</span> <span class="pre">nltk.downloader</span> <span class="pre">-d</span> <span class="pre">/usr/local/share/nltk_data</span> <span class="pre">all</span></code>.</p>
<p>Windows: Use the “Run…” option on the Start menu. Windows Vista users need to first turn on this option, using <code class="docutils literal notranslate"><span class="pre">Start</span> <span class="pre">-></span> <span class="pre">Properties</span> <span class="pre">-></span> <span class="pre">Customize</span></code> to check the box to activate the “Run…” option.</p>
<p>Test the installation: Check that the user environment and privileges are set correctly by logging in to a user account,
starting the Python interpreter, and accessing the Brown Corpus (see the previous section).</p>
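<p>For scripted setups, the same command line can be assembled and run from Python. This is a minimal sketch, not an NLTK API: <code class="docutils literal notranslate"><span class="pre">downloader_command</span></code> and <code class="docutils literal notranslate"><span class="pre">run_downloader</span></code> are hypothetical helpers that wrap the <code class="docutils literal notranslate"><span class="pre">python</span> <span class="pre">-m</span> <span class="pre">nltk.downloader</span></code> invocation and the <code class="docutils literal notranslate"><span class="pre">-d</span></code> flag described above.</p>

```python
import subprocess
import sys


def downloader_command(packages, download_dir=None):
    """Build the argv list for `python -m nltk.downloader [-d DIR] PACKAGE...`."""
    # sys.executable keeps the download tied to the interpreter that has NLTK installed.
    cmd = [sys.executable, "-m", "nltk.downloader"]
    if download_dir is not None:
        cmd += ["-d", str(download_dir)]
    return cmd + list(packages)


def run_downloader(packages, download_dir=None):
    """Run the downloader in a subprocess; raises CalledProcessError on a non-zero exit."""
    subprocess.run(downloader_command(packages, download_dir), check=True)
```

<p>For example, <code class="docutils literal notranslate"><span class="pre">run_downloader(["brown"],</span> <span class="pre">"/usr/local/share/nltk_data")</span></code> would fetch only the Brown Corpus into the central Mac location; it needs NLTK installed, network access, and write permission on that directory.</p>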
</div>
<div class="section" id="manual-installation">
<h2>Manual installation<a class="headerlink" href="#manual-installation" title="Permalink to this headline">¶</a></h2>
<p>Create a folder <code class="docutils literal notranslate"><span class="pre">nltk_data</span></code>, e.g. <code class="docutils literal notranslate"><span class="pre">C:\nltk_data</span></code>, or <code class="docutils literal notranslate"><span class="pre">/usr/local/share/nltk_data</span></code>,
and subfolders <code class="docutils literal notranslate"><span class="pre">chunkers</span></code>, <code class="docutils literal notranslate"><span class="pre">grammars</span></code>, <code class="docutils literal notranslate"><span class="pre">misc</span></code>, <code class="docutils literal notranslate"><span class="pre">sentiment</span></code>, <code class="docutils literal notranslate"><span class="pre">taggers</span></code>, <code class="docutils literal notranslate"><span class="pre">corpora</span></code>,
<code class="docutils literal notranslate"><span class="pre">help</span></code>, <code class="docutils literal notranslate"><span class="pre">models</span></code>, <code class="docutils literal notranslate"><span class="pre">stemmers</span></code>, <code class="docutils literal notranslate"><span class="pre">tokenizers</span></code>.</p>
<p>Download individual packages from <code class="docutils literal notranslate"><span class="pre">http://nltk.org/nltk_data/</span></code> (see the “download” links).
Unzip them to the appropriate subfolder. For example, the Brown Corpus, found at:
<code class="docutils literal notranslate"><span class="pre">https://raw.githubusercontent.com/nltk/nltk_data/gh-pages/packages/corpora/brown.zip</span></code>
is to be unzipped to <code class="docutils literal notranslate"><span class="pre">nltk_data/corpora/brown</span></code>.</p>
<p>Set your <code class="docutils literal notranslate"><span class="pre">NLTK_DATA</span></code> environment variable to point to your top level <code class="docutils literal notranslate"><span class="pre">nltk_data</span></code> folder.</p>
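<p>The manual steps above can be sketched with the standard library alone. The helper names <code class="docutils literal notranslate"><span class="pre">make_data_tree</span></code> and <code class="docutils literal notranslate"><span class="pre">unpack_package</span></code> are hypothetical, and the sketch assumes that a package archive such as <code class="docutils literal notranslate"><span class="pre">brown.zip</span></code> contains a single top-level folder named after the package and has already been downloaded.</p>

```python
import zipfile
from pathlib import Path

# Subfolders listed in the text above.
SUBFOLDERS = ["chunkers", "grammars", "misc", "sentiment", "taggers",
              "corpora", "help", "models", "stemmers", "tokenizers"]


def make_data_tree(root):
    """Create the nltk_data folder and its subfolders; existing folders are left alone."""
    root = Path(root)
    for name in SUBFOLDERS:
        (root / name).mkdir(parents=True, exist_ok=True)
    return root


def unpack_package(zip_path, root, subfolder):
    """Unzip a downloaded package archive into root/subfolder and return that folder."""
    target = Path(root) / subfolder
    # Assumption: the archive holds one top-level folder (e.g. 'brown/'),
    # so extracting into 'corpora' yields 'corpora/brown'.
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(target)
    return target
```

<p>With <code class="docutils literal notranslate"><span class="pre">brown.zip</span></code> saved locally, <code class="docutils literal notranslate"><span class="pre">unpack_package("brown.zip",</span> <span class="pre">make_data_tree("nltk_data"),</span> <span class="pre">"corpora")</span></code> would produce <code class="docutils literal notranslate"><span class="pre">nltk_data/corpora/brown</span></code>.</p>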
</div>
</div>
</div>
</div>
</div>
</div>
<div class="sidebar">
<h3>Table of Contents</h3>
<ul class="current">
<li class="toctree-l1"><a class="reference internal" href="news.html">NLTK News</a></li>
<li class="toctree-l1"><a class="reference internal" href="install.html">Installing NLTK</a></li>
<li class="toctree-l1 current"><a class="current reference internal" href="#">Installing NLTK Data</a></li>
<li class="toctree-l1"><a class="reference internal" href="contribute.html">Contribute to NLTK</a></li>
<li class="toctree-l1"><a class="reference external" href="https://github.com/nltk/nltk/wiki/FAQ">FAQ</a></li>
<li class="toctree-l1"><a class="reference external" href="https://github.com/nltk/nltk/wiki">Wiki</a></li>
<li class="toctree-l1"><a class="reference internal" href="api/nltk.html">API</a></li>
<li class="toctree-l1"><a class="reference external" href="http://www.nltk.org/howto">HOWTO</a></li>
</ul>
<div role="search">
<h3 style="margin-top: 1.5em;">Search</h3>
<form class="search" action="search.html" method="get">
<input type="text" name="q" />
<input type="submit" value="Go" />
</form>
</div>
</div>
<div class="clearer"></div>
</div>
</div>
<div class="footer-wrapper">
<div class="footer">
<div class="left">
<div role="navigation" aria-label="related navigation">
<a href="install.html" title="Installing NLTK"
>previous</a> |
<a href="contribute.html" title="Contribute to NLTK"
>next</a> |
<a href="py-modindex.html" title="Python Module Index"
>modules</a> |
<a href="genindex.html" title="General Index"
>index</a>
</div>
<div role="note" aria-label="source link">
<br/>
<a href="_sources/data.rst.txt"
rel="nofollow">Show Source</a>
</div>
</div>
<div class="right">
<div class="footer" role="contentinfo">
© Copyright 2020, NLTK Project.
Last updated on Apr 13, 2020.
Created using <a href="http://sphinx-doc.org/">Sphinx</a> 2.4.4.
</div>
</div>
<div class="clearer"></div>
</div>
</div>
</body>
</html>