
Prereq graph #90

Open
mystor wants to merge 16 commits into ChrisCooper:master from mystor:prereq-graph

@mystor

mystor commented Aug 7, 2013

Fixes #89

Features

Parses prerequisite text to determine prerequisites, corequisites, recommended courses, and exclusions

This produces JSON, which is stored in the parsed_reqs field on the Course object.
The following is BCHM 410's JSON:

{
    "prerequisite": {
        "items": [
            "BCHM 313",
            "BCHM 315",
            "BCHM 316",
            "BCHM 317",
            {
                "items": [
                    "BCHM 218",
                    "MBIO 218"
                ],
                "type": "or"
            }
        ],
        "type": "and"
    }
}

And BIOL 410's:

{
    "prerequisite": {
        "items": [
            "BIOL 302",
            "BIOL 303"
        ],
        "type": "or"
    },
    "recommend": {
        "items": [
            "BIOL 335"
        ],
        "type": "and"
    }
}
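Because the format nests "and"/"or" groups whose leaves are course codes, checking a requirement against a set of completed courses is a short recursive walk. This is a minimal sketch, not code from this PR: is_satisfied is a hypothetical helper, and it assumes the JSON has already been loaded into Python dicts (e.g. with json.loads).

```python
from typing import Union

# A requirement is either a course code (leaf) or a {"type": "and"/"or", "items": [...]}
# group, matching the parsed_reqs JSON above.
Requirement = Union[str, dict]


def is_satisfied(req: Requirement, completed: set) -> bool:
    """Return True if the completed courses satisfy the requirement tree."""
    if isinstance(req, str):
        return req in completed
    results = (is_satisfied(item, completed) for item in req["items"])
    if req["type"] == "and":
        return all(results)
    if req["type"] == "or":
        return any(results)
    raise ValueError(f"unknown requirement type: {req['type']!r}")


# BCHM 410's prerequisite tree, copied from the JSON above.
bchm_410 = {
    "items": [
        "BCHM 313", "BCHM 315", "BCHM 316", "BCHM 317",
        {"items": ["BCHM 218", "MBIO 218"], "type": "or"},
    ],
    "type": "and",
}

taken = {"BCHM 313", "BCHM 315", "BCHM 316", "BCHM 317", "MBIO 218"}
print(is_satisfied(bchm_410, taken))                 # True: MBIO 218 covers the OR
print(is_satisfied(bchm_410, taken - {"MBIO 218"}))  # False: neither BCHM 218 nor MBIO 218
```

BIOL 410's "recommend" group has the same shape, so the same function could also report whether the recommended courses have been taken.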

Generates DOT (Graphviz) code for a specific course's prerequisite graph

BCHM 410:

digraph "Prerequisite Chart" {
    node [
        shape=ellipse,
        style=filled,
        fontname="'Droid Sans', sans-serif",
        fontcolor=white
    ]
    "CHEM 212" [ label="CHEM 212", color="forestgreen", fontcolor="white" ]
    "CHEM 211" [ label="CHEM 211", color="forestgreen", fontcolor="white" ]
    "MBIO 218" [ label="MBIO 218", color="forestgreen", fontcolor="white" ]
    "103cf374be597a18316c995d77dc233cc9f2f8424fef71aaaad300d8" [ label="OR", color="white", fontcolor="black" ]
    "7b0299e0d1705984bb66efa68397bba64034b5c6148efbfe89bdb24e" [ label="OR", color="white", fontcolor="black" ]
    "CHEM 120" [ label="CHEM 120", color="forestgreen", fontcolor="white" ]
    "8c8297db09ba9091969d5eaf4a3a71da74bdaea2acb33545d49ec890" [ label="OR", color="white", fontcolor="black" ]
    "0820ab6003b0dae94287c1a2303a5f8e9a73c374df55eb488ecb429e" [ label="OR", color="white", fontcolor="black" ]
    "9779b5fa943774d7fdc8df89af690c1b4a20865c449f14333dd55419" [ label="OR", color="white", fontcolor="black" ]
    "1e38d9b7235deb45b9b2175131bd4d40a08003c5fd20721b6f58eb77" [ label="OR", color="white", fontcolor="black" ]
    "df9383e911ce34dae4d5787a64fdd9bef1e27ada00502b0fdfc1524f" [ label="OR", color="white", fontcolor="black" ]
    "BIOL 103" [ label="BIOL 103", color="forestgreen", fontcolor="white" ]
    "BIOL 102" [ label="BIOL 102", color="forestgreen", fontcolor="white" ]
    "834a67ab3b9f48dd81c37627a63b49c23231b9fca468f1abcc777325" [ label="AND", color="white", fontcolor="black" ]
    "316a2523747fb6766af0801a68a4214004fedb70d1970c22dda5b94f" [ label="AND", color="white", fontcolor="black" ]
    "BCHM 410" [ label="BCHM 410", color="forestgreen", fontcolor="white" ]
    "BCHM 313" [ label="BCHM 313", color="forestgreen", fontcolor="white" ]
    "CHEM 223" [ label="CHEM 223", color="forestgreen", fontcolor="white" ]
    "BCHM 315" [ label="BCHM 315", color="forestgreen", fontcolor="white" ]
    "BCHM 316" [ label="BCHM 316", color="forestgreen", fontcolor="white" ]
    "CHEM 222" [ label="CHEM 222", color="forestgreen", fontcolor="white" ]
    "CHEM 113" [ label="CHEM 113", color="forestgreen", fontcolor="white" ]
    "CHEM 112" [ label="CHEM 112", color="forestgreen", fontcolor="white" ]
    "CHEM 114" [ label="CHEM 114", color="forestgreen", fontcolor="white" ]
    "CHEM 116" [ label="CHEM 116", color="forestgreen", fontcolor="white" ]
    "BCHM 317" [ label="BCHM 317", color="forestgreen", fontcolor="white" ]
    "CHEM 281" [ label="CHEM 281", color="forestgreen", fontcolor="white" ]
    "CHEM 282" [ label="CHEM 282", color="forestgreen", fontcolor="white" ]
    "BIOL 205" [ label="BIOL 205", color="forestgreen", fontcolor="white" ]
    "APSC 112" [ label="APSC 112", color="forestgreen", fontcolor="white" ]
    "a86dcd4240996171ae503222384d7d57e9f7a3fccf06792be1f20295" [ label="AND", color="white", fontcolor="black" ]
    "APSC 131" [ label="APSC 131", color="forestgreen", fontcolor="white" ]
    "APSC 132" [ label="APSC 132", color="forestgreen", fontcolor="white" ]
    "APSC 171" [ label="APSC 171", color="forestgreen", fontcolor="white" ]
    "BCHM 218" [ label="BCHM 218", color="forestgreen", fontcolor="white" ]
    "APSC 111" [ label="APSC 111", color="forestgreen", fontcolor="white" ]
    "df9383e911ce34dae4d5787a64fdd9bef1e27ada00502b0fdfc1524f" -> "CHEM 212"
    "9779b5fa943774d7fdc8df89af690c1b4a20865c449f14333dd55419" -> "CHEM 211"
    "BCHM 218" -> "103cf374be597a18316c995d77dc233cc9f2f8424fef71aaaad300d8"
    "MBIO 218" -> "103cf374be597a18316c995d77dc233cc9f2f8424fef71aaaad300d8"
    "CHEM 112" -> "7b0299e0d1705984bb66efa68397bba64034b5c6148efbfe89bdb24e"
    "CHEM 114" -> "7b0299e0d1705984bb66efa68397bba64034b5c6148efbfe89bdb24e"
    "a86dcd4240996171ae503222384d7d57e9f7a3fccf06792be1f20295" -> "7b0299e0d1705984bb66efa68397bba64034b5c6148efbfe89bdb24e"
    "834a67ab3b9f48dd81c37627a63b49c23231b9fca468f1abcc777325" -> "8c8297db09ba9091969d5eaf4a3a71da74bdaea2acb33545d49ec890"
    "CHEM 282" -> "8c8297db09ba9091969d5eaf4a3a71da74bdaea2acb33545d49ec890"
    "CHEM 211" -> "0820ab6003b0dae94287c1a2303a5f8e9a73c374df55eb488ecb429e"
    "CHEM 212" -> "0820ab6003b0dae94287c1a2303a5f8e9a73c374df55eb488ecb429e"
    "CHEM 281" -> "0820ab6003b0dae94287c1a2303a5f8e9a73c374df55eb488ecb429e"
    "CHEM 112" -> "9779b5fa943774d7fdc8df89af690c1b4a20865c449f14333dd55419"
    "CHEM 116" -> "9779b5fa943774d7fdc8df89af690c1b4a20865c449f14333dd55419"
    "a86dcd4240996171ae503222384d7d57e9f7a3fccf06792be1f20295" -> "9779b5fa943774d7fdc8df89af690c1b4a20865c449f14333dd55419"
    "APSC 131" -> "1e38d9b7235deb45b9b2175131bd4d40a08003c5fd20721b6f58eb77"
    "CHEM 120" -> "1e38d9b7235deb45b9b2175131bd4d40a08003c5fd20721b6f58eb77"
    "CHEM 112" -> "df9383e911ce34dae4d5787a64fdd9bef1e27ada00502b0fdfc1524f"
    "316a2523747fb6766af0801a68a4214004fedb70d1970c22dda5b94f" -> "df9383e911ce34dae4d5787a64fdd9bef1e27ada00502b0fdfc1524f"
    "BIOL 102" -> "BIOL 103"
    "CHEM 222" -> "834a67ab3b9f48dd81c37627a63b49c23231b9fca468f1abcc777325"
    "CHEM 223" -> "834a67ab3b9f48dd81c37627a63b49c23231b9fca468f1abcc777325"
    "CHEM 116" -> "316a2523747fb6766af0801a68a4214004fedb70d1970c22dda5b94f"
    "APSC 111" -> "316a2523747fb6766af0801a68a4214004fedb70d1970c22dda5b94f"
    "APSC 112" -> "316a2523747fb6766af0801a68a4214004fedb70d1970c22dda5b94f"
    "APSC 131" -> "316a2523747fb6766af0801a68a4214004fedb70d1970c22dda5b94f"
    "BCHM 313" -> "BCHM 410"
    "BCHM 315" -> "BCHM 410"
    "BCHM 316" -> "BCHM 410"
    "BCHM 317" -> "BCHM 410"
    "103cf374be597a18316c995d77dc233cc9f2f8424fef71aaaad300d8" -> "BCHM 410"
    "CHEM 211" -> "CHEM 223"
    "CHEM 212" -> "CHEM 223"
    "103cf374be597a18316c995d77dc233cc9f2f8424fef71aaaad300d8" -> "BCHM 315"
    "8c8297db09ba9091969d5eaf4a3a71da74bdaea2acb33545d49ec890" -> "BCHM 315"
    "BCHM 315" -> "BCHM 316"
    "0820ab6003b0dae94287c1a2303a5f8e9a73c374df55eb488ecb429e" -> "CHEM 222"
    "CHEM 113" -> "CHEM 114"
    "BCHM 315" -> "BCHM 317" [ style="dashed" ]
    "BCHM 316" -> "BCHM 317" [ style="dashed" ]
    "7b0299e0d1705984bb66efa68397bba64034b5c6148efbfe89bdb24e" -> "CHEM 281"
    "CHEM 112" -> "CHEM 282"
    "CHEM 281" -> "CHEM 282"
    "BIOL 102" -> "BIOL 205"
    "BIOL 103" -> "BIOL 205"
    "APSC 111" -> "APSC 112"
    "APSC 171" -> "APSC 112"
    "APSC 131" -> "a86dcd4240996171ae503222384d7d57e9f7a3fccf06792be1f20295"
    "APSC 132" -> "a86dcd4240996171ae503222384d7d57e9f7a3fccf06792be1f20295"
    "1e38d9b7235deb45b9b2175131bd4d40a08003c5fd20721b6f58eb77" -> "APSC 132"
    "BIOL 205" -> "BCHM 218"
}
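The DOT above can be produced by walking the same and/or trees. This is a rough sketch under stated assumptions, not the PR's actual generator. all_reqs is a hypothetical dict mapping each course code to its "prerequisite" tree, and AND/OR nodes get IDs from a hash of their JSON. The 56-hex-character IDs in the output are consistent with SHA-224, which is why it is used here, but that is a guess. Content hashing also means identical groups share one node, as the (BCHM 218 OR MBIO 218) node above does when it feeds both BCHM 315 and BCHM 410. A top-level AND is flattened so that its items point straight at the course, as in the BCHM 410 output.

```python
import hashlib
import json

COURSE_ATTRS = 'color="forestgreen", fontcolor="white"'
OP_ATTRS = 'color="white", fontcolor="black"'


def op_node_id(group: dict) -> str:
    # Content hash: identical AND/OR groups collapse into one node.
    # SHA-224 is an assumption based only on the 56-hex-character IDs above.
    canonical = json.dumps(group, sort_keys=True)
    return hashlib.sha224(canonical.encode("utf-8")).hexdigest()


def to_dot(course: str, all_reqs: dict) -> str:
    """DOT for `course` plus, transitively, every course it requires.

    all_reqs (hypothetical) maps a course code to its "prerequisite" tree.
    """
    nodes = {}     # node ID -> attribute list
    edges = set()  # (from, to) pairs; a set avoids duplicate edges
    expanded = set()

    def add_course(code):
        nodes[code] = f'[ label="{code}", {COURSE_ATTRS} ]'
        if code in expanded:
            return
        expanded.add(code)
        tree = all_reqs.get(code)
        if tree is None:
            return
        # Flatten a top-level AND: each item points straight at the course.
        items = tree["items"] if tree["type"] == "and" else [tree]
        for item in items:
            edges.add((add_req(item), code))

    def add_req(req):
        """Add a requirement subtree; return the ID of its root node."""
        if isinstance(req, str):
            add_course(req)
            return req
        node_id = op_node_id(req)
        nodes[node_id] = f'[ label="{req["type"].upper()}", {OP_ATTRS} ]'
        for item in req["items"]:
            edges.add((add_req(item), node_id))
        return node_id

    add_course(course)
    lines = ['digraph "Prerequisite Chart" {', "    node [ shape=ellipse, style=filled ]"]
    lines += [f'    "{node}" {attrs}' for node, attrs in nodes.items()]
    lines += [f'    "{src}" -> "{dst}"' for src, dst in sorted(edges)]
    lines.append("}")
    return "\n".join(lines)


# Partial input: BCHM 410 from the JSON above, plus BCHM 315's tree as read off the graph.
all_reqs = {
    "BCHM 410": {"type": "and", "items": [
        "BCHM 313", "BCHM 315", "BCHM 316", "BCHM 317",
        {"type": "or", "items": ["BCHM 218", "MBIO 218"]},
    ]},
    "BCHM 315": {"type": "and", "items": [
        {"type": "or", "items": ["BCHM 218", "MBIO 218"]},
        {"type": "or", "items": [{"type": "and", "items": ["CHEM 222", "CHEM 223"]}, "CHEM 282"]},
    ]},
}
print(to_dot("BCHM 410", all_reqs))
```

Corequisite edges (drawn dashed above) and recommended courses are left out. They would need their own JSON keys and edge styles.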

This is then rendered either on the client with Viz.js (this branch) or on the server with GraphViz through pydot (another branch). The result looks like the following:

[image: the rendered prerequisite graph]

Adds a small link on the course's detail page to visualize the prerequisite graph

[image: the prerequisite graph link on a course detail page]

Decisions to be made before merging

  • Should Viz.js be used on the client or GraphViz on the server?
  • What should the page look like with the graph on it?
  • Where should the link to visualize the prerequisites for the class go?
  • How should parser inaccuracies be handled?

Testing status

  • I have run the parser on all subjects through HIST, and it doesn't hang on any of them. I haven't had the chance to scrape the remaining subjects yet.
  • Because Django handles requests on a non-main thread, I can't use the timeout code I found to keep the parser from hanging. I don't think the parser should ever hang, although it did frequently during development.
  • Parsing is fast; it shouldn't take more than a minute or two.
  • There are a few inaccuracies in the parsed JSON. They usually come from ambiguously worded prerequisite text (e.g. "ABC xxx and ABC yyy or ABC zzz") or from course numbers stated without their subject (e.g. "ABC xxx; yyy; zzz"). I'm not sure how to fix either of these problems.
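For the second problem, course numbers listed without a subject, one heuristic is to carry the most recent subject forward onto bare numbers. This is a sketch that assumes subjects are two to four capital letters and course numbers are three digits. expand_course_codes is a hypothetical helper, and it does nothing about the and/or precedence ambiguity.

```python
import re

# Assumption: a subject is 2-4 capital letters and a course number is 3 digits.
# The lookahead stops an upper-case "AND"/"OR" from being read as a subject.
TOKEN = re.compile(r"\b(?:(?!(?:AND|OR)\s)([A-Z]{2,4})\s+)?(\d{3})\b")


def expand_course_codes(text):
    """Return course codes in order, filling in a missing subject from the previous one."""
    codes, subject = [], None
    for subj, number in TOKEN.findall(text):
        if subj:
            subject = subj
        if subject is not None:  # a bare number before any subject can't be resolved
            codes.append(f"{subject} {number}")
    return codes


print(expand_course_codes("BCHM 313; 315; 316"))            # ['BCHM 313', 'BCHM 315', 'BCHM 316']
print(expand_course_codes("CHEM 112 OR 116 and APSC 131"))  # ['CHEM 112', 'CHEM 116', 'APSC 131']
```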

@pR0Ps
Collaborator

pR0Ps commented Aug 7, 2013

I'm not sure I like storing data as a JSON string on the course object. It seems a little hacked on when using an RDBMS. An improvement might be to make another model to store that information. That would also let us get the prerequisites for any given course directly, instead of having to get the course object, read its parsed_reqs property, and parse the JSON every time it's accessed.
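Here is a minimal sketch of what a separate model could look like. It uses the standard-library sqlite3 module rather than Django models so it runs on its own. The table and column names (req_group, req_item) are hypothetical, and only the "prerequisite" tree is stored; a relation column could distinguish recommendations and exclusions. Each and/or group becomes a row and each course code a child row, so a course's prerequisites come back from one query with no JSON parsing.

```python
import sqlite3

SCHEMA = """
CREATE TABLE req_group (
    id        INTEGER PRIMARY KEY,
    course    TEXT NOT NULL,  -- the course whose requirement tree this group belongs to
    parent_id INTEGER REFERENCES req_group(id) ON DELETE CASCADE,
    kind      TEXT NOT NULL CHECK (kind IN ('and', 'or'))
);
CREATE TABLE req_item (
    group_id  INTEGER NOT NULL REFERENCES req_group(id) ON DELETE CASCADE,
    course    TEXT NOT NULL   -- a required course code (a leaf of the tree)
);
"""


def save_tree(db, course, tree, parent_id=None):
    """Store an and/or tree in parsed_reqs format as rows; return the root group's id."""
    cur = db.execute(
        "INSERT INTO req_group (course, parent_id, kind) VALUES (?, ?, ?)",
        (course, parent_id, tree["type"]),
    )
    group_id = cur.lastrowid
    for item in tree["items"]:
        if isinstance(item, str):
            db.execute("INSERT INTO req_item (group_id, course) VALUES (?, ?)", (group_id, item))
        else:
            save_tree(db, course, item, group_id)
    return group_id


def replace_tree(db, course, tree):
    """Swap a course's requirements in one transaction (old item rows cascade away)."""
    with db:
        db.execute("DELETE FROM req_group WHERE course = ?", (course,))
        save_tree(db, course, tree)


def required_courses(db, course):
    """Every course code that appears anywhere in `course`'s requirement tree."""
    rows = db.execute(
        "SELECT DISTINCT i.course FROM req_group g JOIN req_item i ON i.group_id = g.id "
        "WHERE g.course = ? ORDER BY i.course",
        (course,),
    )
    return [row[0] for row in rows]


db = sqlite3.connect(":memory:")
db.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces ON DELETE CASCADE with this on
db.executescript(SCHEMA)
replace_tree(db, "BCHM 410", {"type": "and", "items": [
    "BCHM 313", "BCHM 315", "BCHM 316", "BCHM 317",
    {"type": "or", "items": ["BCHM 218", "MBIO 218"]},
]})
print(required_courses(db, "BCHM 410"))
```

Because replace_tree rewrites all of a course's rows inside one transaction, an edit never leaves a half-updated tree behind.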

I'm also a little wary of using something as heavy as an LLVM translation of a C application to JS. We can't have people downloading ~2MB of JavaScript just to view a graph. Is there some way to do more of the work in Python and only do the final rendering with some lighter JS? I'm pretty sure the webhost won't allow installation of arbitrary packages, so installing GraphViz probably isn't possible (@ChrisCooper, can you confirm?). Another option would be to use D3.js to render, as it's pretty light (140KB uncompressed).

That being said, I really like the idea here and the generated graph looks really cool (if a little confusing at first glance). I think it could be a really nice addition to the site.

@mystor
Author

mystor commented Aug 7, 2013

It shouldn't be too big a deal to use foreign keys instead of the JSON object.
I have been using MongoDB recently, so a JSON field was the first thing that came to mind.

D3 could work, but I haven't done any work with it before. I'll look into making a simple force-directed layout tonight after work. I worry that it will look even more confusing than the GraphViz one, though, as GraphViz makes an effort to keep things in a hierarchy.
I could try to keep 100-, 200-, etc. level courses in the same general area to maintain that hierarchy.
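One way to keep that hierarchy is to derive a level from the course number and pin each level to its own row. This sketch assumes course codes look like "CHEM 212"; group_by_level and rank_subgraphs are hypothetical helpers. In Graphviz, a { rank=same; ... } subgraph keeps its nodes on one row. In a force-directed D3 layout, the same level number could instead serve as a target y position.

```python
import re
from collections import defaultdict

# Assumption: course codes are capital letters, a space, then three digits ("CHEM 212").
# Hashed AND/OR node IDs don't match, so they are left unconstrained.
COURSE_CODE = re.compile(r"^[A-Z]+ (\d)\d\d$")


def group_by_level(codes):
    """Bucket course codes into 100, 200, 300, ... levels."""
    levels = defaultdict(list)
    for code in codes:
        match = COURSE_CODE.match(code)
        if match:
            levels[int(match.group(1)) * 100].append(code)
    return {level: sorted(group) for level, group in sorted(levels.items())}


def rank_subgraphs(levels):
    """One Graphviz rank=same subgraph per level, to be placed inside the digraph body."""
    lines = []
    for group in levels.values():
        members = " ".join(f'"{code}";' for code in group)
        lines.append(f"    {{ rank=same; {members} }}")
    return "\n".join(lines)


levels = group_by_level(["BCHM 410", "CHEM 212", "BIOL 103", "APSC 111",
                         "103cf374be597a18316c995d77dc233cc9f2f8424fef71aaaad300d8"])
print(levels)  # {100: ['APSC 111', 'BIOL 103'], 200: ['CHEM 212'], 400: ['BCHM 410']}
print(rank_subgraphs(levels))
```

One trade-off: courses at the same level that depend on each other (BIOL 102 and BIOL 103 above) end up side by side instead of stacked.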

I do wonder what the advantage of moving from JSON to a series of tables is. I worry that instead of making things easier, it will simply make a course's prerequisites more of a pain to modify, as multiple rows in different tables will need to be changed.

I agree that Viz.js is a bad solution. Even gzipped it takes 600 KB, which is still a bit absurd. GraphViz would be the best solution IMO, but if WebFaction doesn't allow arbitrary binary installs, I don't suppose we have much of a choice.

I'll post pictures of the D3 solution once I have it up; I hope to have a prototype by this evening. If we decide to go the RDBMS route with the tables, I will work on that another time, as the transition will be a bit more work.

@mystor
Author

mystor commented Aug 7, 2013

I took a quick shot at making a graph layout using D3.js. With a force-directed layout, the same graph you see above currently looks something like this.
Of course, I could tweak the values and get something a bit nicer looking (I hope), but it gives you an idea of what we are looking at.

[image: force-directed D3.js layout of the same prerequisite graph]

It feels a bit noisy, especially for a complicated prerequisite tree like BCHM 410. Also, the course of interest is just placed randomly in the frame, meaning that I had to color it differently so that you could identify it at all.

Personally I prefer the appearance of the GraphViz solution, but I understand that it is likely impractical. On the plus side, you can drag this one around, good old-fashioned D3 style.

I based this code on this example.

EDIT: Found another promising option: dagre. I will need to modify the dot output & generate some CSS, but other than that it should work pretty well (theoretically).
AFAIK it is built on top of D3. I'll throw something together using it.

@mystor
Author

mystor commented Aug 8, 2013

I have made a version built on top of D3 and dagre. This is what it looks like:
[image: the prerequisite graph laid out with D3 and dagre]

The total JavaScript payload right now is about 186 KB uncompressed, 143 KB of which is d3.js.
I think that it looks pretty good, so I'm going to merge the changes I have made into this pull request.

Almost everything seems to work right now. The biggest remaining task is transitioning to an RDBMS backend for the data, rather than a text field containing JSON.

@pR0Ps
Collaborator

pR0Ps commented Aug 13, 2013

Nice, looks awesome! It seems like most of the stylistic properties can be set via CSS too so the look can be customized later if needed. The most important thing is that it's fast and light.

@mystor
Author

mystor commented Aug 14, 2013

Yes, the style can be changed using CSS and small Python code changes if needed.

In terms of things I would like to fix before this is merged: there are likely a few code-quality problems (e.g. I'm not proud of the parser; it's poorly designed IMO), and I am still using the JSON field on the model.

If we decide to use an RDBMS, I guess we would merge the current prerequisite parser into this one so that the models could be merged as well. That shouldn't be too difficult (however, this parser likely captures fewer courses than the current one).

I don't have time to look into that right now, but in the coming weeks I will try to find time to make these changes.

If I make this change, the prerequisite parser will probably be separate from the scraping process. Hopefully this isn't too much of an issue; I could possibly make it run automatically after the scraper.

@twocents

twocents commented Sep 4, 2013

This is amazing work, you guys. I'm one of Chris's friends who's been following and helping out with qcumber with anything that doesn't involve computer knowledge ... because I have none (unless you count CISC 100 lol). I think this visualization project has really great potential. My wheels are already spinning and once this gets off the ground, I'm sure I'll have lots more ideas as to where to go from here (as Chris can attest to, lols).

The main concern I have right now is readability. Visually it looks logical, but tracing the path from a first-year course to the upper-year courses gets confusing at junctions with multiple exits, or where an AND leads to an OR. There should be some kind of highlighting mechanism to set one path apart from the others.

Great work! I'll be lurking and periodically offering opinions and encouragement :)

@mystor
Author

mystor commented Sep 7, 2013

I agree, it can be confusing to trace visually. Once I switch the backend to an RDBMS, I will look into implementing something like that. It shouldn't be too difficult.
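One way to pick out a route is to take the nodes that lie on some path from a chosen first-year course to the target. These are exactly the nodes reachable forward from the first-year course and backward from the target. This sketch assumes edges are (prerequisite, course) pairs, as in the DOT output. path_nodes is a hypothetical helper, and the hashed AND/OR IDs are shortened for readability.

```python
from collections import defaultdict, deque


def reachable(start, adjacency):
    """Breadth-first search: every node reachable from start (including start)."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in adjacency.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def path_nodes(edges, source, target):
    """Nodes on at least one path source -> ... -> target (edges point prereq -> course)."""
    forward, backward = defaultdict(list), defaultdict(list)
    for src, dst in edges:
        forward[src].append(dst)
        backward[dst].append(src)
    return reachable(source, forward) & reachable(target, backward)


# A few edges from the BCHM 410 graph above, with the hashed AND/OR IDs shortened.
edges = [
    ("CHEM 112", "OR-9779"), ("CHEM 116", "OR-9779"), ("OR-9779", "CHEM 211"),
    ("CHEM 211", "CHEM 223"), ("CHEM 212", "CHEM 223"),
    ("CHEM 222", "AND-834a"), ("CHEM 223", "AND-834a"),
    ("AND-834a", "OR-8c82"), ("CHEM 282", "OR-8c82"),
    ("OR-8c82", "BCHM 315"), ("BCHM 315", "BCHM 410"), ("BCHM 313", "BCHM 410"),
]
on_path = path_nodes(edges, "CHEM 112", "BCHM 410")
highlight = [(a, b) for a, b in edges if a in on_path and b in on_path]
print(sorted(on_path))
```

Highlighting only the route can understate an AND junction. CHEM 222 is still required at AND-834a even though it is not on the CHEM 112 route, so AND inputs might deserve a secondary highlight.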

@ChrisCooper
Owner

Another thing that might help with readability (since some of these graphs are by nature quite complex) is to be able to limit it to a number of steps backward. For example, the BCHM 315 example could be optionally limited to MBIO 218, BCHM 218, CHEM 222, CHEM 223, and CHEM 282, since these are the first courses you encounter along that chain.
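Limiting the graph to a number of steps back could be done with a breadth-first walk over reversed edges, passing through AND/OR nodes without counting them as a step. This sketch uses hypothetical names (courses_within, IS_COURSE) and the edges around BCHM 315 from the graph above, with shortened operator IDs. With one step, it returns exactly the five courses listed.

```python
import re
from collections import defaultdict

# Assumption: real courses look like "CHEM 222"; anything else is an AND/OR node.
IS_COURSE = re.compile(r"^[A-Z]+ \d{3}$").match


def courses_within(edges, target, steps):
    """Courses reachable backward from target in at most `steps` course-to-course hops."""
    preds = defaultdict(list)
    for src, dst in edges:
        preds[dst].append(src)

    found, frontier = set(), {target}
    for _ in range(steps):
        next_frontier, seen_ops = set(), set()
        stack = [p for node in frontier for p in preds[node]]
        while stack:
            node = stack.pop()
            if IS_COURSE(node):
                if node != target and node not in found:
                    next_frontier.add(node)
            elif node not in seen_ops:
                seen_ops.add(node)
                stack.extend(preds[node])  # pass through AND/OR without using a step
        found |= next_frontier
        frontier = next_frontier
    return found


edges = [
    ("BCHM 218", "OR-103c"), ("MBIO 218", "OR-103c"), ("OR-103c", "BCHM 315"),
    ("CHEM 222", "AND-834a"), ("CHEM 223", "AND-834a"), ("AND-834a", "OR-8c82"),
    ("CHEM 282", "OR-8c82"), ("OR-8c82", "BCHM 315"),
    ("CHEM 211", "OR-0820"), ("CHEM 212", "OR-0820"), ("CHEM 281", "OR-0820"),
    ("OR-0820", "CHEM 222"), ("CHEM 211", "CHEM 223"), ("CHEM 212", "CHEM 223"),
    ("BIOL 205", "BCHM 218"),
]
print(sorted(courses_within(edges, "BCHM 315", 1)))
# ['BCHM 218', 'CHEM 222', 'CHEM 223', 'CHEM 282', 'MBIO 218']
```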

@mystor
Author

mystor commented Sep 12, 2013

I'm not sure that would be a good idea. The main reason I can think of for viewing these graphs is to see which first-, second-, and third-year courses you need to take in order to take a fourth-year class you are interested in. Limiting how far back you can go would, I feel, defeat that purpose.

Development

Successfully merging this pull request may close these issues.

Prerequisite Charts

4 participants