
March 30, 2024
PSU Course Map—Visualizing Penn State's Curriculum
Try it out here: psucoursemap.pillar.land
I made this project because I found it difficult to plan ahead for courses that required several prerequisites. I needed a tool that would list all the courses I had to take each semester leading up to the course I wanted to take. To build it, I designed a (currently closed-source) scraper called Lion Index, which scrapes the Penn State course bulletin and generates a graph of all courses and their prerequisites. PSU Course Map is a demonstration of Lion Index's capabilities: it turns the vast course catalog into an interactive graph you can search and explore. Type a course and see its relationships to other courses.
Each course is a node. Relationships like prerequisites, corequisites, and concurrent options are edges. The graph lets you see chains, alternative paths, and cross-department links that are hard to spot in a bulletin. It can be used as a planner and a way to study how the curriculum is designed.

MATH 141 Relationships
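To make the graph model concrete, here is a rough sketch of node and edge types for such a graph. These types are purely illustrative; Lion Index is closed-source and its actual data model may look quite different.

/// Illustrative node type: each course in the catalog is one node.
struct Course {
    code: String,  // e.g. "MATH 141"
    title: String,
}

/// The kinds of relationships drawn as edges in the map.
enum EdgeKind {
    Prerequisite,
    Corequisite,
    Concurrent,
    Recommended,
}

/// A directed edge: the course at `to` lists the course at `from`
/// among its requirements of the given kind.
struct Edge {
    from: usize, // index into the course list
    to: usize,
    kind: EdgeKind,
}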
Parsing Data
Writing a scraper is hard because the data doesn't always follow a pattern you can predict. The PSU course catalog has a dedicated page for each program, and those pages aren't all maintained the same way. Designing a scraper that can handle all the different page types is a challenge. To solve this, I created a small trait that each page type can implement to define how to extract a list of items:
/// Implemented by each bulletin page type to describe how a list of items
/// is extracted from that page's HTML.
pub trait Scrappable<T> {
    fn extract_list(html: &str) -> Vec<T>;
}
This trait abstracts away the logic of scraping a page and extracting a list of items. It makes the project more maintainable: I can focus on writing general logic that applies to all pages, and add specific rules only for the pages that need them.
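For example, a page listing undergraduate programs could implement the trait roughly like this. This is a minimal sketch that assumes the scraper crate; the Program struct, the ProgramListPage type, and the CSS selector are hypothetical, and the real Lion Index implementation may differ.

use scraper::{Html, Selector};

/// Hypothetical item type for this sketch.
pub struct Program {
    pub title: String,
    pub link: String,
}

/// Hypothetical page type: the bulletin page that lists undergraduate programs.
pub struct ProgramListPage;

impl Scrappable<Program> for ProgramListPage {
    fn extract_list(html: &str) -> Vec<Program> {
        let document = Html::parse_document(html);
        // The selector is illustrative; the real bulletin markup differs per page type.
        let row = Selector::parse("ul.program-list li a").unwrap();
        document
            .select(&row)
            .map(|a| Program {
                title: a.text().collect::<String>().trim().to_string(),
                link: a.value().attr("href").unwrap_or_default().to_string(),
            })
            .collect()
    }
}

Pages with unusual markup can get their own implementation without touching the shared extraction logic.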
Turning Data into a Requirement Tree
Penn State's bulletin mixes prose and punctuation in inconsistent ways, so careful parsing is needed to extract the relationships between courses. The goal is to transform a phrase like "C or better in MATH 140, and CMPSC 121; or CMPSC 131" into a standard form like MATH 140 & (CMPSC 121 | CMPSC 131) so that it can be easily processed as a tree data structure.
The parsing process begins with normalization and error correction. Course bulletins are often maintained by different departments with varying standards, leading to common typos and inconsistent formatting. The parser handles these variations by applying systematic replacements:
let chunk = chunk.to_ascii_uppercase().replace_many(&[
    (";", " OR"),
    (",", " AND"),
    ("CONCURRENT COURSES", "CONCURRENT"),
    ("RECOMMENDED PREPARATIONS", "RECOMMENDED PREPARATION"),
    ("PRERQUISITE", "PREREQUISITE"),
    ("PREREQUISTE", "PREREQUISITE"),
    ("PREQUISITE", "PREREQUISITE"),
    ("PREREQ ", "PREREQUISITE "),
]).trim_all();
This preprocessing step encodes the convention that semicolons typically indicate OR relationships and commas indicate AND relationships (a majority of pages follow this rule, but it isn't fully consistent and some pages need to be verified manually). It also corrects common misspellings of "prerequisite" that appear across different department pages. The trim_all() method removes excess whitespace that could interfere with token recognition.
The core logic functions like a lexer that processes the normalized text character by character, building tokens in a buffer until complete words or course identifiers are recognized. The lexer maintains state about which type of requirement is currently being processed (prerequisite, corequisite, concurrent, or recommended):
match buf.trim_all().as_str() {
    "AND" => { requirement_text.push('&'); }
    "OR" => { requirement_text.push('|'); }
    "PREREQUISITE" => { requirement_text = switch_mode(Some(Prerequisite), requirement_text); }
    "CONCURRENT" => { requirement_text = switch_mode(Some(Concurrent), requirement_text); }
    "COREQUISITE" => { requirement_text = switch_mode(Some(Corequisite), requirement_text); }
    "RECOMMENDED" => { requirement_text = switch_mode(Some(Recommended), requirement_text); }
    // Anything else (course identifiers, grade text, etc.) is handled by the
    // rest of the lexer and omitted here.
    _ => { /* ... */ }
}
To make tree serialization simpler and more space-efficient, AND and OR are converted to the single-character symbols & and |. When the lexer encounters a requirement-type keyword, it calls switch_mode() to flush the current requirement text into the appropriate bucket and begin collecting text for the new requirement type.
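switch_mode() itself isn't shown above. Here is a minimal sketch of what it could look like, assuming the lexer keeps one text bucket per requirement kind; the Buckets struct, its field names, and the method form are my assumptions (in the snippet above, switch_mode is called as a plain function, likely a closure or method over similar state).

#[derive(Clone, Copy)]
enum RequirementKind {
    Prerequisite,
    Corequisite,
    Concurrent,
    Recommended,
}

/// Hypothetical container for the per-kind requirement text the lexer collects.
struct Buckets {
    current: Option<RequirementKind>,
    prerequisite: String,
    corequisite: String,
    concurrent: String,
    recommended: String,
}

impl Buckets {
    /// Flush the text collected so far into the bucket for the kind we were in,
    /// switch to the new kind, and hand the caller a fresh buffer.
    fn switch_mode(&mut self, next: Option<RequirementKind>, text: String) -> String {
        match self.current {
            Some(RequirementKind::Prerequisite) => self.prerequisite.push_str(&text),
            Some(RequirementKind::Corequisite) => self.corequisite.push_str(&text),
            Some(RequirementKind::Concurrent) => self.concurrent.push_str(&text),
            Some(RequirementKind::Recommended) => self.recommended.push_str(&text),
            None => {}
        }
        self.current = next;
        String::new()
    }
}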
Once all courses have been parsed, the final output is a requirement tree for each course, where each tree represents the logical structure of that course's dependencies. The tree preserves both the hierarchical nature of nested requirements and the alternative paths available to students. It can then be serialized as a compact logical expression, making storage and querying more efficient.

Course Tree of AERSP 301: Aerospace Structures
The serialized logical expression for AERSP 301 is
(EMCH 210 | EMCH 213) & MATH 220 & (MATH 230 | (MATH 231 & MATH 232)) & (MATH 250 | MATH 251)
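One natural way to represent and serialize such a tree is a small recursive enum whose Display implementation emits this kind of compact expression. The types below are a sketch (Lion Index's real types aren't public); note that this version parenthesizes every AND/OR group, so the top level gains one extra pair of parentheses compared to the stored form.

use std::fmt;

/// Sketch of a requirement tree node.
enum Req {
    Course(String),
    And(Vec<Req>),
    Or(Vec<Req>),
}

impl fmt::Display for Req {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Req::Course(code) => write!(f, "{code}"),
            Req::And(children) => write_joined(f, children, " & "),
            Req::Or(children) => write_joined(f, children, " | "),
        }
    }
}

/// Write the children separated by `sep`, wrapped in parentheses.
fn write_joined(f: &mut fmt::Formatter<'_>, children: &[Req], sep: &str) -> fmt::Result {
    write!(f, "(")?;
    for (i, child) in children.iter().enumerate() {
        if i > 0 {
            write!(f, "{sep}")?;
        }
        write!(f, "{child}")?;
    }
    write!(f, ")")
}

Building the AERSP 301 tree out of Course, And, and Or nodes and printing it reproduces the expression above.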
Database Layout
I wanted to keep the database simple by using SQLite, which can be easily distributed and queried without requiring a dedicated server.
CREATE TABLE IF NOT EXISTS UndergraduateProgram (
    id INTEGER PRIMARY KEY,
    title VARCHAR(255) NOT NULL,
    link VARCHAR(255) NOT NULL UNIQUE,
    image VARCHAR(255),
    type_id INT NOT NULL,
    college_id INT, -- can be null
    FOREIGN KEY (type_id) REFERENCES UndergraduateProgramType(id),
    FOREIGN KEY (college_id) REFERENCES College(id)
);

-- check if it exists, if not create it.
CREATE TABLE IF NOT EXISTS UndergraduateProgramType (
    id INTEGER PRIMARY KEY,
    type VARCHAR(255) UNIQUE NOT NULL
);

-- check if it exists, if not create it.
CREATE TABLE IF NOT EXISTS College (
    id INTEGER PRIMARY KEY,
    name VARCHAR(255) UNIQUE NOT NULL
);
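As a quick usage sketch, the resulting file can be queried from Rust with the rusqlite crate. The file name lion_index.db is an assumption made for illustration.

use rusqlite::{Connection, Result};

fn main() -> Result<()> {
    // Open the SQLite file produced by the scraper (file name is an assumption).
    let conn = Connection::open("lion_index.db")?;

    // List every undergraduate program together with its college, if any.
    let mut stmt = conn.prepare(
        "SELECT p.title, c.name
         FROM UndergraduateProgram p
         LEFT JOIN College c ON p.college_id = c.id
         ORDER BY p.title",
    )?;

    let rows = stmt.query_map([], |row| {
        Ok((row.get::<_, String>(0)?, row.get::<_, Option<String>>(1)?))
    })?;

    for row in rows {
        let (title, college) = row?;
        println!("{title} ({})", college.unwrap_or_else(|| "no college".to_string()));
    }
    Ok(())
}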