datalab is a command-line Python client for NSF NOIRLab's Astro Data Lab.
It provides easy access to various Astro Data Lab functionalities including:
- synchronous and asynchronous database queries (TAP)
- your remote file storage (VOSpace)
- your remote database tables (MyDB)
>>> from dl import queryClient as qc
>>> result = qc.query(sql='SELECT TOP 5 ra,dec from smash_dr1.object')
>>> print(result)
ra,dec
296.0702105660565,-75.58008799398345
296.0689079309987,-75.57850708319104
296.0695746063349,-75.5771115243687
296.0734998386567,-75.57729189836104
296.074467291614,-75.57941799334213

- An Astro Data Lab user account (You can request an account on the Astro Data Lab website).
- Python 3 (Python 3.11 recommended. Python >=3.9 required.)
- fuse or OSX-FUSE (only if you want to mount the remote storage as a local filesystem)
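As a quick sanity check of the Python requirement above, the following minimal sketch compares the running interpreter against the 3.9 minimum. The helper name `python_is_supported` and the constant `MIN_PYTHON` are illustrative only and are not part of the datalab package.

```python
import sys

# Minimum version taken from the requirement list above (Python >=3.9).
MIN_PYTHON = (3, 9)


def python_is_supported(version_info=None):
    """Return True if the given (major, minor, ...) version meets MIN_PYTHON."""
    if version_info is None:
        version_info = sys.version_info
    return tuple(version_info[:2]) >= MIN_PYTHON


if __name__ == "__main__":
    current = ".".join(str(part) for part in sys.version_info[:3])
    verdict = "supported" if python_is_supported() else "too old for datalab"
    print(f"Python {current}: {verdict}")
```

Running the file directly prints the interpreter version and whether it meets the minimum.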
The astro-datalab package installs the datalab command-line
client and additional Data Lab Python libraries that allow you to use
Astro Data Lab functionality locally on your computer (for instance, in
IPython).
The easiest way to install the datalab client is via pip:
pip install --upgrade astro-datalab
Note: You should periodically update your client via the command above to ensure you are using the latest version. Older versions of the client may not be supported.
You can also install the datalab client from source on
GitHub via the steps below:
- Clone the repository and enter the directory:
  git clone git@github.com:astro-datalab/datalab.git && cd datalab
- Ensure you have the latest version of pip and setuptools:
  python -m pip install --upgrade pip setuptools
- Build the package:
  python -m pip install build
  python -m build
- Install the package:
  pip install dist/astro_datalab-<version>-py3-none-any.whl
  If you want it installed in your private (per-user) Python package directory (because you maintain multiple Python instances on your machine), you can use the --user flag:
  pip install --user dist/astro_datalab-<version>-py3-none-any.whl
Note: Replace <version> in the pip install command with the actual version number of the astro_datalab package, such as 2.23.0.
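Rather than typing the version by hand, the wheel produced by the build step can be located programmatically. This is a minimal sketch, assuming the wheel follows the astro_datalab-<version>-py3-none-any.whl naming shown above; the helper `find_built_wheel` is hypothetical and not part of the package.

```python
from pathlib import Path


def find_built_wheel(dist_dir="dist"):
    """Return the most recently modified astro_datalab wheel in dist_dir, or None."""
    wheels = list(Path(dist_dir).glob("astro_datalab-*-py3-none-any.whl"))
    if not wheels:
        return None
    # Pick by modification time rather than by name, since a plain string
    # sort would not order version numbers correctly (e.g. 2.9 vs 2.23).
    return max(wheels, key=lambda path: path.stat().st_mtime)


if __name__ == "__main__":
    wheel = find_built_wheel()
    if wheel is None:
        print("No wheel found in dist/; run 'python -m build' first.")
    else:
        print(f"pip install {wheel}")
```

Run from the repository root after the build step, this prints the exact pip command for the wheel that was just built.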
If you intend to mount the virtual storage as a local filesystem, you will need to create an empty .netrc file in your home directory:
touch ~/.netrc
If the pip installation instructions above fail with a complaint about a missing library,
libcurl4-openssl-dev, please install it using your software/package manager.
Users with macOS ARM architecture (M1/M2) may encounter issues when running or importing the datalab package. This is often due to a mismatch between the version of libcurl available at runtime and the version that pycurl was compiled against.
Common error message:
ImportError: pycurl: libcurl link-time version (7.77.0) is older than compile-time version (8.4.0)
To check if this issue exists, you can run:
python -c "import dl; print(dl.__version__)"
If you encounter the above error, follow these steps to resolve it:
- Update your macOS and Xcode Tools.
- Update your conda, if you have one.
- Update Homebrew and curl:
  brew update
  brew upgrade curl
- Uninstall and reinstall pycurl:
  pip uninstall pycurl
  pip install --no-cache-dir pycurl
- If the above steps do not work, use conda to install pycurl:
  conda install -c conda-forge pycurl
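After working through the steps above, it can help to confirm that pycurl now imports cleanly. The sketch below is a hedged example: it relies only on the mismatch surfacing as an ImportError when pycurl is imported, and on the `pycurl.version` string. The helper name `check_pycurl` is hypothetical.

```python
def check_pycurl():
    """Try to import pycurl and report the result.

    Returns (True, version_string) on success, or (False, error_message) if
    the import fails, e.g. with the libcurl link-time/compile-time mismatch.
    """
    try:
        import pycurl
    except ImportError as exc:
        return False, str(exc)
    # pycurl.version is a string naming the PycURL and libcurl versions in use.
    return True, pycurl.version


if __name__ == "__main__":
    ok, detail = check_pycurl()
    print(("pycurl OK: " if ok else "pycurl import failed: ") + detail)
```

If the import still fails, the printed message should contain the same link-time versus compile-time text quoted above.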
With version 2.20.0, the datalab package changed internal service
URLs to point to our new noirlab.edu domain. If you have an older version of
datalab installed, your local configuration file will need to be reinitialized
in order to use our new domain name (datalab.noirlab.edu).
To refresh the config, simply remove the old configuration file. The next time you
run a datalab command, a new configuration file will be generated:
rm $HOME/.datalab/dl.conf
Any datalab command will create a new config file, e.g.:
datalab version
In some cases you might need to go through the login process, e.g.:
datalab login
To check the currently installed version of datalab:
datalab --version
Task Version: 2.20.1
To get a list of available datalab commands (tasks):
datalab --help
Usage:
% datalab <task> [task_options]
where <task> is one of:
cp - copy a file in Data Lab
dropdb - Drop a user MyDB table
get - get a file from Data Lab
listdb - List the user MyDB tables
ln - link a file in Data Lab
login - Login to the Data Lab
logout - Logout of the Data Lab
ls - list a location in Data Lab
mkdir - create a directory in Data Lab
mv - move a file in Data Lab
mydb_copy - Rename a user MyDB table
mydb_create - Create a user MyDB table
mydb_drop - Drop a user MyDB table
mydb_import - Import data into a user MyDB table
mydb_index - Index data in a MyDB table
mydb_insert - Insert data into a user MyDB table
mydb_list - List the user MyDB tables
mydb_rename - Rename a user MyDB table
mydb_truncate - Truncate a user MyDB table
profiles - List the available Query Manager profiles
put - Put a file into Data Lab
qresults - Get the async query results
qstatus - Get an async query job status
query - Query a remote data service in the Data Lab
rm - delete a file in Data Lab
rmdir - delete a directory in Data Lab
schema - Print data service schema info
services - Print available data services
status - Report on the user status
svc_urls - Print service URLs in use
tag - tag a file in Data Lab
version - Print task version
whoami - Print the current active user
You can get summaries of the arguments to a task with the help
option:
datalab login help
The 'login' task takes the following parameters:
user - Username of account in Data Lab [required]
password - Password for account in Data Lab [required]
mount - Mountpoint of remote Virtual Storage [optional]
verbose - print verbose level log messages [optional]
debug - print debug log level messages [optional]
warning - print warning level log messages [optional]
The datalab command will prompt you for required arguments if you do not
provide them on the command line, e.g.:
datalab login
user (default: None): foousername
password (default: None): foouserpassword
Welcome to the Data Lab, foousername
Documentation for the datalab commands can also be found in the
docs/ directory.
Once the client is installed, some Data Lab Python modules can be imported and used locally in your Python programs, e.g.:
ipython
In [1]: from dl import queryClient as qc
In [2]: result = qc.query(sql='SELECT ra,dec from smash_dr1.object LIMIT 10')
In [3]: print(result)
ra,dec
175.215070742307,-38.4897863179213
175.241595469141,-38.4163769993698
175.25128999751,-38.4393292753547
175.265049366394,-38.424371697545
175.265160854504,-38.4915114547051
175.277267094536,-38.431267581266
175.302055158646,-38.4674421358985
175.328056295831,-38.4350989294865
175.334968899953,-38.4547709884234
175.34222308206,-38.4433633662239
A comprehensive user manual explains the many features of Data Lab.
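In the Python examples above, the query result prints as CSV text with a header line. A minimal sketch of turning such text into Python values with the standard csv module follows. It assumes every column is numeric, as ra and dec are here; the helper `parse_csv_result` is hypothetical and not part of the datalab package.

```python
import csv
import io


def parse_csv_result(result):
    """Convert CSV text (header line plus rows) into a list of dicts of floats."""
    reader = csv.DictReader(io.StringIO(result))
    return [{name: float(value) for name, value in row.items()} for row in reader]


# Sample text copied from the first rows of the example output above;
# in real use this would be the string returned by qc.query().
sample = (
    "ra,dec\n"
    "175.215070742307,-38.4897863179213\n"
    "175.241595469141,-38.4163769993698\n"
)

rows = parse_csv_result(sample)
for row in rows:
    print(row["ra"], row["dec"])
```

Each row becomes a dictionary keyed by column name, which makes it easy to pass coordinates on to other code. Queries that return non-numeric columns would need a different conversion step.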