CS294-248: Topics in Database Theory

Room: Soda 310

Time: Tuesday, Thursday: 11-12:20

Description

Data processing is at the heart of computing today. Data from small to large is stored, transformed, streamed, updated, queried. Five decades of research have produced a wealth of sophisticated data management techniques that allow us today to perform these tasks very efficiently. This course presents the theoretical foundations of such techniques. The topics covered include: static analysis of queries, basic query evaluation algorithms and hypertree decomposition, incremental view maintenance, worst-case-optimal joins, constraints and semantic query optimization, provenance semirings, and datalog with various extensions. These topics use concepts from logic and model theory, algorithms, graph theory, algebra, complexity theory, and probability theory. All the required theoretical background will be covered in the lectures. Students taking this course are expected to have some basic knowledge of relational databases and SQL (or some other declarative query languages, e.g. Spark, or SparQL, or datalog). There will be a few homework assignments consisting of theoretical exercises. Students will be graded based on their homework assignments and on class participation.

Prerequisites

A good grade in CS186 and in one of (CS170 or CS172).

Is this course for you?

This is an advanced course. If you are unsure whether this course is for you, take this very simple self test.

It covers only theory. If you are interested in advances systems topics, I strongly recomment CS286: Graduate DB Systems in Spring 2024. (This is a natural graduate-level follow-on to the undergrad CS186 class.)

Schedule

Lecture on Thursday, 9/24 is canceled because of conflict with the Simons' Bootcamp.

First lecture will be on Tuesday, 8/29.

Dates/VideosUnitContent
8/21-8/25Bootcamp at Simons. It is recommended that you attend 1 or 2 of the mini-courses
8/29, 8/31 1Logic and Queries
9/5, 9/72Conjunctive Queries
9/12, 9/143Incremental View Maintenance (Dan Olteanu)
9/19, 9/214AGM Bound, Worst-Case Optimal Algorithms (Hung Ngo)
9/25-9/27Workshop 1 at Simons
10/3,10/55Information Inequalities and Database Constraints
10/10,10/126Database Constraints (cont'd), Repairs, Incomplete Databases
10/16-10/20Workshop 2 at Simons
10/24,10/267Semirings and K-Relations, WCOJ and Tree Decomposition (white-board only)
Recommended talk by Reinhard Pichler slides, talk.
10/31, 11/28-9FAQ (Hung Ngo), Introduction to Datalog (whiteboard only)
11/7, 11/99Datalog (cont'd) by Zoom only11/9 lecture CANCELED
11/13-11/17Workshop 3 at Simons
11/21, 11/289Datalog, Final review of the class

Zoom link

(Zoom link for the final presentations is further below)

Available here (the zoom link is on the left).

Direct link is here. You need to be authenticated.

But please attend all lectures in person

Homework Assignments

Please submit on bcourses.

HW 1 due on Monday, 9/4

HW 2 due on Monday, 9/11

HW 3 due on Monday, 9/25

HW 4 due on Monday, 10/9

Final Report and presentation

When: Thursday, November 30th, 9:30am

Where: The Simon's Institute, Calvin Lab CR116

Task: pick a theory problem/result and explain it to a wide audience. Give a short presentation (15' including questions).

ReportWrite a short report, say about 2-5 pages. Please submit it (via bcourses) by November 29, i.e. before the presentation

Some suggested topics (but feel free to choose beyond these).

Simons' Program

This course is associated with the Simons' Program on Logic and Algorithms in Database Theory and AI, which has three workshops scheduled during the quarter. Students are expected to attend some of the talks at those workshops.

Final Presentations

Zoom link: here

NameTopicTime
1Douglas, ChrisDifferential Dataflow9:30:00 AM
2Frank, Reggie0-1 Laws in Logic9:45:00 AM
3Haan, AltanParallel Query Complexity10:00:00 AM
4Hou, TylerIVM10:15:00 AM
5Fang, VivianAlternate definitions of acyclicity in hypergraphs (zoom)10:30:00 AM
6Bergamaschi, ThiagoThe Approximate Implication Problem (zoom)10:45:00 AM
BREAK11:00:00 AM
7Johnson, AutumnAlgorithmic Information Theory and Data Compression11:30:00 AM
8Matute, GabrielData Provenance11:45:00 AM
9Pan, SichengQuery Equivalence with SMT12:00:00 PM
10Park, JiwonEF games12:15:00 PM
LUNCH BREAK12:30:00 PM
11Power, ConorFree Termination (Conor's Research on Coordination-Free Query Computation Beyond Monotonicity)1:30:00 PM
12Ren, XuandiAccessing Queries with Self-Joins1:45:00 PM
13Sandoval, RicardoDifferentially Private Querying2:00:00 PM
14Laddad, ShadajEgglog meets Semirings: Contextual Rewrites2:15:00 PM

Grading

The grade in this class is based on the participation during the lectures, the homework assignments, and the final presentation.