Codecov Report
✅ All modified and coverable lines are covered by tests.

@@            Coverage Diff             @@
##             main     #238      +/-   ##
==========================================
- Coverage   72.62%   72.38%   -0.24%
==========================================
  Files          19       19
  Lines        1673     1673
  Branches      211      211
==========================================
- Hits         1215     1211       -4
- Misses        376      380       +4
  Partials       82       82
|
|
The NASA proposal has the best text to pull from: https://docs.google.com/document/d/13I8EMbolHw1F4P7-QRJ6x_wYdZOFLhHw-pEaOB4BZvQ/edit?tab=t.0#heading=h.jfxzkej6mv63 |
…noreply@anthropic.com>
- Draft State of the field section comparing tech stacks of existing astronomical compilations (AstroCats, MOCA, species, CATS, Hypatia, Ultracool Sheet, NASA Exoplanet Archive, WISeREP, HITRAN)
- Add paper.bib with citations for all referenced works
- Add @ucsheet Zenodo dataset entry for the Ultracool Sheet
- Expand AI usage disclosure to meet required disclosure standards

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
Solid first draft. @dr-rodriguez, take a look! |
| --- | ||
| ## Summary | ||
|
|
||
| We introduce the AstroDB Toolkit to fill a gap in the data management/sharing ecosystem for astronomers by providing a robust toolkit to empower astronomers to build databases of astronomical sources. The AstroDB Toolkit is an open-source, openly developed tool that greatly lowers the technology burden on the astronomers and empowers them to make databases of astronomical sources using a common, interoperable framework. The Toolkit uses GitHub’s features and its collaborative workflow. |
| We introduce the AstroDB Toolkit to fill a gap in the data management/sharing ecosystem for astronomers by providing a robust toolkit to empower astronomers to build databases of astronomical sources. The AstroDB Toolkit is an open-source, openly developed tool that greatly lowers the technology burden on the astronomers and empowers them to make databases of astronomical sources using a common, interoperable framework. The Toolkit uses GitHub’s features and its collaborative workflow. | |
| We introduce the AstroDB Toolkit to fill a gap in the data management/sharing ecosystem for astronomers working on catalogs of astronomical sources. The AstroDB Toolkit is an open-source, openly developed tool that lowers the technology burden on astronomers and empowers them to make databases of astronomical sources using a common, interoperable framework. The Toolkit uses GitHub’s features and its collaborative workflow. |
| ## Statement of need | ||
|
|
||
| The purpose of the AstroDB Toolkit is to fill a gap in the data management/sharing ecosystem for astronomers by providing a robust toolkit to empower astronomers to build databases of astronomical sources. | ||
| In the “big data” era of large, long-standing missions such as TESS, Kepler, and JWST, astronomers find themselves managing increasingly large and unwieldy collections of target parameters, both observed and modeled. |
| In the “big data” era of large, long-standing missions such as TESS, Kepler, and JWST, astronomers find themselves managing increasingly large and unwieldy collections of target parameters, both observed and modeled. | |
| In the “big data” era of large, long-standing missions such as TESS, Kepler, and JWST, astronomers find themselves managing increasingly large and unwieldy collections of astrophysical parameters, both observed and derived. |
|
|
||
| ## Statement of need | ||
|
|
||
| The purpose of the AstroDB Toolkit is to fill a gap in the data management/sharing ecosystem for astronomers by providing a robust toolkit to empower astronomers to build databases of astronomical sources. |
This entire sentence feels like a repetition of the summary. I think we can remove it and start with "In the big data era".
|
|
||
| The purpose of the AstroDB Toolkit is to fill a gap in the data management/sharing ecosystem for astronomers by providing a robust toolkit to empower astronomers to build databases of astronomical sources. | ||
| In the “big data” era of large, long-standing missions such as TESS, Kepler, and JWST, astronomers find themselves managing increasingly large and unwieldy collections of target parameters, both observed and modeled. | ||
| Currently, astronomers reinvent the wheel, spending time making technology choices, database design decisions, and web applications as opposed to being able to focus on the analysis and physical interpretation of the actual data. |
| Currently, astronomers reinvent the wheel, spending time making technology choices, database design decisions, and web applications as opposed to being able to focus on the analysis and physical interpretation of the actual data. | |
| Currently, astronomers reinvent the wheel, spending time making technology choices, database design decisions, and web applications as opposed to focusing on the analysis and physical interpretation of actual data. |
| Currently, astronomers reinvent the wheel, spending time making technology choices, database design decisions, and web applications as opposed to being able to focus on the analysis and physical interpretation of the actual data. | ||
| The AstroDB Toolkit greatly lowers the technology burden on the astronomers and empowers them to make databases of astronomical sources using a common, interoperable framework. | ||
|
|
||
| The AstroDB Toolkit aims to serve the needs of individual astronomers and small-to-medium sized collaborations who need a data management system for hundreds to thousands of sources. The Toolkit bridges the divide where a shared Google sheet is insufficient, but the dataset is either still living (e.g., follow-up observations are underway, new parameters being derived) or otherwise not appropriate for an institutional archive. |
| The AstroDB Toolkit aims to serve the needs of individual astronomers and small-to-medium sized collaborations who need a data management system for hundreds to thousands of sources. The Toolkit bridges the divide where a shared Google sheet is insufficient, but the dataset is either still living (e.g., follow-up observations are underway, new parameters being derived) or otherwise not appropriate for an institutional archive. | |
| The AstroDB Toolkit aims to serve the needs of individual astronomers and small-to-medium sized collaborations who need a data management system for hundreds to thousands of sources. The Toolkit bridges the divide where a shared spreadsheet is insufficient, but the dataset is either still living (e.g., follow-up observations are underway, new parameters being derived) or otherwise not appropriate for an institutional archive. |
|
|
||
| One of the key design requirements for an AstroDB Toolkit-powered database is support for collaborative editing of the holdings using a GitHub workflow. As a result, the Toolkit creates databases that are fundamentally a set of plain text JSON files that describe each object. When users make changes to the properties of an object, such as adding a new spectrum or updating a value for a radial velocity, these changes are human-readable as a simple diff between two JSON files and can be reviewed via pull requests. This JSON document store architecture allows for a community to maintain a database, review changes as they come in, and use automated tools to validate the database. The Astrodbkit package has tools to readily transform data between the document store and relational database to facilitate managing local, private data as well as external applications, such as a hosted website. | ||
|
|
||
| By exporting a database to a JSON document store, we can use git and GitHub to handle version control for our database as well as curate commits via pull requests. |
| By exporting a database to a JSON document store, we can use git and GitHub to handle version control for our database as well as curate commits via pull requests. |
|
|
||
| By exporting a database to a JSON document store, we can use git and GitHub to handle version control for our database as well as curate commits via pull requests. | ||
|
|
||
| An individual user may contain their own copy of any database. They may make changes in their local branch and push to their copy on GitHub. By issuing a pull request, they request their changes be adopted into the main branch of the database. Because the database is stored as individual JSON documents, reviewers can see exactly which objects have been updated and can comment on the changes if needed. |
| An individual user may contain their own copy of any database. They may make changes in their local branch and push to their copy on GitHub. By issuing a pull request, they request their changes be adopted into the main branch of the database. Because the database is stored as individual JSON documents, reviewers can see exactly which objects have been updated and can comment on the changes if needed. | |
| An individual user may deploy their own copy of any AstroDB database. They may make changes in their local branch and push to their copy on GitHub. By issuing a pull request, they request their changes be adopted into the main branch of the database. Because the database is stored as individual JSON documents, reviewers can see exactly which objects have been updated and can comment on the changes if needed. |
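To illustrate the reviewability claim in this paragraph, here is a minimal sketch of how an update to one object's JSON document could appear as a line-level diff. The table names, field names, values, and file paths are hypothetical placeholders, not the Toolkit's actual schema, and the diff is produced with Python's standard `difflib` rather than by git itself.

```python
import copy
import difflib
import json


def to_lines(document: dict) -> list[str]:
    """Serialize one object's document as stable, line-oriented JSON."""
    return json.dumps(document, indent=2, sort_keys=True).splitlines()


# Hypothetical document for a single source as it exists on the main branch.
before = {
    "Sources": [{"source": "Example Object 1", "ra": 10.5, "dec": -20.25}],
    "RadialVelocities": [
        {"radial_velocity_km_s": 12.3, "reference": "Ref2020"}
    ],
}

# A contributor updates the radial velocity and its reference.
after = copy.deepcopy(before)
after["RadialVelocities"][0]["radial_velocity_km_s"] = 12.9
after["RadialVelocities"][0]["reference"] = "Ref2024"

# Unified diff of the two serialized documents, similar in spirit to what
# a reviewer would see in a pull request.
diff = list(
    difflib.unified_diff(
        to_lines(before),
        to_lines(after),
        fromfile="main/example_object_1.json",
        tofile="pull-request/example_object_1.json",
        lineterm="",
    )
)
print("\n".join(diff))
```

Sorting keys and using a fixed indent keeps the serialization deterministic, so only lines whose values actually changed show up in the diff.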
|
|
||
| As part of the pull request process, automatic tests implemented via GitHub Actions can be run to verify the integrity of the database. This ensures no changes took place that break the functionality of the database and also include verification for the data that has been added. | ||
|
|
||
| Finally, when the pull request is accepted, additional automated tasks can be performed to regenerate the database and push it to external users of the database, such as a graphical user interface. |
| Finally, when the pull request is accepted, additional automated tasks can be performed to regenerate the database and push it to external users of the database, such as a graphical user interface. | |
| Finally, when the pull request is accepted, additional automated tasks can be performed to regenerate the database and push it to external users of the database, such as a hosted website. |
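As a rough sketch of the kind of integrity check that could run during the pull request process, the function below scans a directory of JSON documents and reports files that fail to parse, lack a source name, or duplicate an existing source. The `Sources` and `source` keys, the file names, and the specific checks are assumptions made for illustration; they are not taken from the Toolkit's test suite.

```python
import json
import tempfile
from pathlib import Path


def validate_documents(directory: Path) -> list[str]:
    """Return human-readable problems; an empty list means every check passed."""
    problems = []
    seen = {}  # source name -> first file that declared it
    for path in sorted(directory.glob("*.json")):
        try:
            document = json.loads(path.read_text(encoding="utf-8"))
        except json.JSONDecodeError as error:
            problems.append(f"{path.name}: invalid JSON ({error.msg})")
            continue
        sources = document.get("Sources") if isinstance(document, dict) else None
        first = sources[0] if isinstance(sources, list) and sources else None
        name = first.get("source") if isinstance(first, dict) else None
        if not name:
            problems.append(f"{path.name}: missing source name")
        elif name in seen:
            problems.append(f"{path.name}: duplicate of {seen[name]}")
        else:
            seen[name] = path.name
    return problems


# Demonstration on a throwaway document store with two deliberate errors.
with tempfile.TemporaryDirectory() as tmp:
    store = Path(tmp)
    good = json.dumps({"Sources": [{"source": "Example Object 1"}]})
    (store / "a.json").write_text(good, encoding="utf-8")
    (store / "b.json").write_text(good, encoding="utf-8")  # duplicate name
    (store / "c.json").write_text("{not valid json", encoding="utf-8")
    problems = validate_documents(store)

for problem in problems:
    print(problem)
```

In a continuous-integration job, a non-empty `problems` list would typically be turned into a non-zero exit status so that the pull request is flagged before review.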
|
|
||
| ### Spectra and non-tabular data | ||
|
|
||
| Spectra, images, and other non-tabular data are stored as pointers to cloud-hosted files. The files are hosted in a variety of places, including institutional repositories. However, we have found the best host to be Amazon Simple Storage Service (Amazon S3). In order to enable the database to fully function without an internet connection and/or to point to files not hosted in the cloud, we allow pointers to local files using an environment variable to indicate a local path. |
We don't really explain why the best host is Amazon S3; we may be asked to justify that or list what else we tested.
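The local-file pointers described in the spectra paragraph could work roughly as sketched below. The environment variable name `EXAMPLE_SPECTRA_DIR`, the function `resolve_pointer`, and the pointer format are all hypothetical; the paragraph states only that an environment variable indicates the local path, so this is one plausible reading and not the Toolkit's implementation.

```python
import os
from urllib.parse import urlparse

# Schemes treated as remote (cloud-hosted) pointers in this sketch.
REMOTE_SCHEMES = {"http", "https", "s3"}


def resolve_pointer(pointer: str) -> str:
    """Return a usable location for a stored spectrum pointer.

    Remote URLs are returned unchanged. Anything else is treated as a local
    path that may contain an environment variable such as $EXAMPLE_SPECTRA_DIR.
    """
    if urlparse(pointer).scheme in REMOTE_SCHEMES:
        return pointer
    expanded = os.path.expandvars(pointer)
    if "$" in expanded:
        # os.path.expandvars leaves unset variables untouched.
        raise KeyError(f"environment variable in {pointer!r} is not set")
    return os.path.normpath(expanded)


# Hypothetical setup: the user keeps spectra in a local directory.
os.environ["EXAMPLE_SPECTRA_DIR"] = "/data/spectra"

local_path = resolve_pointer("$EXAMPLE_SPECTRA_DIR/example_object_1.fits")
remote_url = resolve_pointer("https://example.org/spectra/example_object_1.fits")
print(local_path)
print(remote_url)
```

Storing the variable reference instead of an absolute path keeps the database portable: each user sets the variable to wherever their copy of the files lives.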
|
|
||
| The current poster-child of the AstroDB Toolkit is SIMPLE, the Substellar and IMaged PLanet Explorer Archive of Complex Objects [https://simple-bd-archive.org/](https://simple-bd-archive.org/). | ||
|
|
||
| The newest useage of the Toolkit is focused on Dwarf Galaxies. There are tens if not hundreds of galaxies in the Local Group, with a diverse range of applicable science problems. Many of these scientific applications depend on having as complete as possible a list of properties for these galaxies such as: total luminosity, half-light radius, and redshift/systemic velocity. |
| The newest useage of the Toolkit is focused on Dwarf Galaxies. There are tens if not hundreds of galaxies in the Local Group, with a diverse range of applicable science problems. Many of these scientific applications depend on having as complete as possible a list of properties for these galaxies such as: total luminosity, half-light radius, and redshift/systemic velocity. | |
| The newest usage of the Toolkit is focused on Dwarf Galaxies. There are tens if not hundreds of galaxies in the Local Group, with a diverse range of applicable science problems. Many of these scientific applications depend on having as complete as possible a list of properties for these galaxies including total luminosity, half-light radius, and redshift/systemic velocity. |
Let's write a short JOSS paper!