CS294-248: Topics in Database Theory
Room: Soda 310
Time: Tuesday, Thursday: 11-12:20
Data processing is at the heart of computing today. Data from small to large is stored, transformed, streamed, updated, queried. Five decades of research have produced a wealth of sophisticated data management techniques that allow us today to perform these tasks very efficiently. This course presents the theoretical foundations of such techniques. The topics covered include: static analysis of queries, basic query evaluation algorithms and hypertree decomposition, incremental view maintenance, worst-case-optimal joins, constraints and semantic query optimization, provenance semirings, and datalog with various extensions. These topics use concepts from logic and model theory, algorithms, graph theory, algebra, complexity theory, and probability theory. All the required theoretical background will be covered in the lectures. Students taking this course are expected to have some basic knowledge of relational databases and SQL (or some other declarative query languages, e.g. Spark, or SparQL, or datalog). There will be a few homework assignments consisting of theoretical exercises. Students will be graded based on their homework assignments and on class participation.
Prerequisites A good grade in CS186 and in one of (CS170 or
Is this course for you?
This is an advanced course. If you are unsure whether this
course is for you, take this very simple self
It covers only theory. If you are interested in advances systems
topics, I strongly recomment CS286: Graduate DB Systems in Spring
2024. (This is a natural graduate-level follow-on to the undergrad
Lecture on Thursday, 9/24 is canceled because of conflict
with the Simons' Bootcamp.
First lecture will be on Tuesday, 8/29.
|8/21-8/25||Bootcamp at Simons. It is
recommended that you attend 1 or 2 of the mini-courses|
|8/29, 8/31 ||1||Logic and Queries|
|9/5, 9/7||2||Conjunctive Queries|
|9/12, 9/14||3||Incremental View Maintenance (Dan Olteanu)|
|9/19, 9/21||4||AGM Bound, Worst-Case Optimal Algorithms (Hung Ngo) |
|9/25-9/27||Workshop 1 at Simons|
|10/3, 10/5||5||Database Constraints|
|10/10, 1012||6||Incomplete, Probabilistic Databases|
|10/16-10/20||Workshop 2 at Simons|
|10/24, 10/26||7||Semirings and K-Relations (Val Tannen)|
|10/31, 11/2||8-9||FAQ, Datalog, Chase|
|11/7, 11/9||9||Datalog, Chase (cont'd)|
|11/13-11/17||Workshop 3 at Simons|
Available here (the zoom link is on the left).
You need to be authenticated.
But please attend all lectures in person
HW 1 due on Monday, 9/4
HW 2 due on Monday, 9/11
HW 3 due on Monday, 9/25
Final Report and presentation
Task: pick a theory problem/result and explain it to a wide
audience. Write a short report. Suggested length: 3-5 page.
Give a short presentation (10'), in class: tentative date, Tuesday
Some suggested topics (but feel free to choose beyond these).
- 0/1 Law for First Order Logic.
- Existential Second Order Logic = NP
- Ehrenfeucht-Fraisse Games
- Pebble Games
- Suppose a query written in FO is monotone. Can we write
this query as a UCQ (Union of Conjunctive Queries)? Reference.
- Order-invariant queries in FO are strictly more expressive
than FO. This is a famous example by Gurevich, see exercise
- Is stratified datalog as expressive as fixpoint logic?
(Fixpoint logic means FO extended with fixpoint of monotone
queries.) This was stated
and Harel (Theorem 5.4): you don't need to read that paper,
because it was found to be wrong! Who proved that it is wrong?
What is the fixpoint query that cannot be written in stratified
- Describe the different, alternative definitions of
acyclicity for hypergraphs: alpha-acyclic, beta-acyclic,
gamma-acyclic, and Berge-acyclic. This is classic, well known
paper. The paper is long, deep, and very well written.
Your task in this project would be to present the main ideas in
a simple, understandable way.
- (MORE SUGGESTIONS COMING SOON)
This course is associated with the Simons' Program on Logic and Algorithms in Database Theory and AI, which has three workshops scheduled during the quarter. Students are expected to attend some of the talks at those workshops.
The grade in this class is based on the participation during the
lectures, the homework assignments, and a final assignment, which may
involve a report and either a poster or a presentation TBD.