Scuffed Google Scholar (SGS) is an attempt to create a small version of an application that indexes academic journals. It includes tables to store papers, authors and tags.
I worked alone for this project. My group on Canvas is Database Design 23,
i'm in CS3200 Section 04, and my name is Nithin Chintala.
Please view the uml
SGS attempts to solve the problem of having a local database of paper to search through academic papers. There is a plethura of academic work online, but same may want to search through a manually currated set that can be created and maintained.
SGS implements a simple relational database that can store relation between the core object: papers. SGS provides a simple UI to insert papers and view their relations with one another.
The typical user for SGS would be someone in academia looking to having a personal copy of papers they need for their research. Other users simply looking to research papers are also a prime target user.
The two core domain objects are a paper and tag. A papaer describes an academic work that may
be published in a Journal, have multiple references, have multiple tags and have multiple authors.
A tag is simply a descriptor. It can be anything really, it simply described how a group of papers
can be related. So the tag proteins may be used on papers about protein science.
- What are the most cited papers in the database?
CREATE VIEW most_cited AS
SELECT *
FROM (SELECT citee_id, count(citer_id) AS num_cited FROM `references` GROUP BY citee_id) subquery
ORDER BY subquery.num_cited DESC;- What papers cited this paper in their work?
-- citee_id = <id> is refering to "this paper"
SELECT papers.*
FROM papers, `references`
WHERE papers.id = subquery.citer_id
AND `references`.citee_id = 5;- What are popular papers within a certain tag?
-- keywords.tag_id = <id> refers to a "certain tag"
SELECT *
FROM most_cited, keywords
WHERE most_cited.citee_id = keywords.paper_id
AND keywords.tag_id = 4A user represents someone who may write papers. A user stores the first name, last name, username,
password, email and data of birth. The email was added because it is something necessary
when contacting acadaemics.
There are two domain object, a paper and a tag.
A paper represents an academic writing. It contains information to uniquely identify an academic
writing and optionally provide a doi.
A tag represents categories or other broad ideas that can group multiple papers togethr.
The users and papers table are related to one another in a many to many relationship.
That is, one paper can have many authors, and one author can have many papers. This is
reflective of how actual academic relationships are as well.
Since MySQL was used to implement this, the relationship is reified in an authors table.
Whenever a paper or author is deleted all of its respective mappings in the authors table
are also deleted.
A paper can be cited by any other paper. This relationship is best understood with the idea
of a citeer and citee. A citer cites a citee. That is, if paper A used paper B in
its finding, then A would be a citer citing paper B, the citee. To implement this in sql,
the papers table has a reified references table to describe the self-referential many-to-many
relationship. Deleting a paper deletes all the records in the mapping table that refers to the
deleted paper.
Finally, the papers and tags tables have a many-to-many relationship. One paper can have
many tags, and one tag can have many papers. This is reified in the keywords table. Deleting
a paper or a tag removes all the mapping records that relate to it.
The portable enum is the journal table that is used
in the papers table. A journal is simply an official place where papers are published.
Currently the available journals in the databse are: "Cell", "Nature" and "Science". A
journal can be NULL meaning the paper has not been published in an approved journal.
There is a list and edit page for papers, tags and users. The list pages can be accessed
from the top header anywhere on the website. Once on the list page you can navigate to the edit
page by clicking on the associated title for each entry in the list. Furthermore, on the list
and edit page for any table a link to the related records in other tables are provided.
Lastly, the homepage allows you to search for papers related to a certain tag serving as an extremely simple version of Google Scholar's search engine.
This was my CS3200 Final project. Also I use Google Scholar a lot and it has a simple UI and seemed like a cool way to practice making relational databases.
I just wanted to get better with the language, so I decided to use it to make this. Also Go is a pretty cool language, though I have to say not having generics in CRUD based applications feels AWFUL.
You can create all the tables by running the sql in sql/all.sql.
To build the server just run go build.
To populate the databse run go run scripts/populate.go.
I intially wanted to scrape BibTeX files from Google Scholar to populate my databse, but quickly learned that this is explicitily made incredibly difficult. There is a way, but it essentially gets flagged and banned immediately.
So, instead of actual academic papers I just made random data in scripts/populate.go