CS294-248: Topics in Database Theory
Room: Soda 310
Time: Tuesday, Thursday: 11-12:20
Description
Data processing is at the heart of computing today. Data from small to large is stored, transformed, streamed, updated, queried. Five decades of research have produced a wealth of sophisticated data management techniques that allow us today to perform these tasks very efficiently. This course presents the theoretical foundations of such techniques. The topics covered include: static analysis of queries, basic query evaluation algorithms and hypertree decomposition, incremental view maintenance, worst-case-optimal joins, constraints and semantic query optimization, provenance semirings, and datalog with various extensions. These topics use concepts from logic and model theory, algorithms, graph theory, algebra, complexity theory, and probability theory. All the required theoretical background will be covered in the lectures. Students taking this course are expected to have some basic knowledge of relational databases and SQL (or some other declarative query languages, e.g. Spark, or SparQL, or datalog). There will be a few homework assignments consisting of theoretical exercises. Students will be graded based on their homework assignments and on class participation.
Prerequisites
A good grade in CS186 and in one of (CS170 or
CS172).
Is this course for you?
This is an advanced course. If you are unsure whether this
course is for you, take this very simple self
test.
It covers only theory. If you are interested in advances systems
topics, I strongly recomment CS286: Graduate DB Systems in Spring
2024. (This is a natural graduate-level follow-on to the undergrad
CS186 class.)
Schedule
Lecture on Thursday, 9/24 is canceled because of conflict
with the Simons' Bootcamp.
First lecture will be on Tuesday, 8/29.
Dates/Videos | Unit | Content |
8/21-8/25 | | Bootcamp at Simons. It is
recommended that you attend 1 or 2 of the mini-courses |
8/29, 8/31 | 1 | Logic and Queries |
9/5, 9/7 | 2 | Conjunctive Queries |
9/12, 9/14 | 3 | Incremental View Maintenance (Dan Olteanu) |
9/19, 9/21 | 4 | AGM Bound, Worst-Case Optimal Algorithms (Hung Ngo) |
9/25-9/27 | | Workshop 1 at Simons |
10/3,10/5 | 5 | Information Inequalities and Database Constraints |
10/10,10/12 | 6 | Database
Constraints (cont'd), Repairs, Incomplete Databases |
10/16-10/20 | | Workshop 2 at Simons |
10/24,10/26 | 7 | Semirings and K-Relations, WCOJ and Tree Decomposition (white-board only) Recommended talk by Reinhard Pichler slides, talk. |
10/31, 11/2 | 8-9 | FAQ (Hung Ngo), Introduction to Datalog (whiteboard only) |
11/7, 11/9 | 9 | Datalog (cont'd) by Zoom only11/9 lecture CANCELED |
11/13-11/17 | | Workshop 3 at Simons |
11/21,
11/28 | 9 | Datalog, Final review of the class |
Zoom link
(Zoom link for the final presentations is further below)
Available here (the zoom link is on the left).
Direct link
is here.
You need to be authenticated.
But please attend all lectures in person
Homework Assignments
Please submit
on bcourses.
HW 1 due on Monday, 9/4
HW 2 due on Monday, 9/11
HW 3 due on Monday, 9/25
HW 4 due on Monday, 10/9
Final Report and presentation
When: Thursday, November 30th, 9:30am
Where: The Simon's Institute, Calvin Lab CR116
Task: pick a theory problem/result and explain it to a wide
audience. Give a short presentation (15' including questions).
ReportWrite a short report, say about 2-5 pages. Please
submit it
(via bcourses)
by November 29, i.e. before the presentation
Some suggested topics (but feel free to choose beyond these).
- Is your research related in any way to a theory problem?
You can pick that one and present it.
- 0/1 Law for First Order Logic.
- Existential Second Order Logic = NP
- Ehrenfeucht-Fraisse Games
- Pebble Games
- Suppose a query written in FO is monotone. Can we write
this query as a UCQ (Union of Conjunctive Queries)? Reference.
- Order-invariant queries in FO are strictly more expressive
than FO. This is a famous example by Gurevich, see exercise
17.27 in
the Alice-book.
- Is stratified datalog as expressive as fixpoint logic?
(Fixpoint logic means FO extended with fixpoint of monotone
queries.) This was stated
by Chandra
and Harel (Theorem 5.4): you don't need to read that paper,
because it was found to be wrong! Who proved that it is wrong?
What is the fixpoint query that cannot be written in stratified
datalog?
- Describe the different, alternative definitions of
acyclicity for hypergraphs: alpha-acyclic, beta-acyclic,
gamma-acyclic, and Berge-acyclic. This is classic, well known
topic, from
this landmark
paper. The paper is long, deep, and very well written.
Your task in this project would be to present the main ideas in
a simple, understandable way.
Simons' Program
This course is associated with the Simons' Program on Logic and Algorithms in Database Theory and AI, which has three workshops scheduled during the quarter. Students are expected to attend some of the talks at those workshops.
Final Presentations
Zoom link: here
| Name | Topic | Time |
1 | Douglas, Chris | Differential Dataflow | 9:30:00 AM |
2 | Frank, Reggie | 0-1 Laws in Logic | 9:45:00 AM |
3 | Haan, Altan | Parallel Query Complexity | 10:00:00 AM |
4 | Hou, Tyler | IVM | 10:15:00 AM |
5 | Fang, Vivian | Alternate definitions of acyclicity in hypergraphs (zoom) | 10:30:00 AM |
6 | Bergamaschi, Thiago | The Approximate Implication Problem (zoom) | 10:45:00 AM |
| | BREAK | 11:00:00 AM |
7 | Johnson, Autumn | Algorithmic Information Theory and Data Compression | 11:30:00 AM |
8 | Matute, Gabriel | Data Provenance | 11:45:00 AM |
9 | Pan, Sicheng | Query Equivalence with SMT | 12:00:00 PM |
10 | Park, Jiwon | EF games | 12:15:00 PM |
| | LUNCH BREAK | 12:30:00 PM |
11 | Power, Conor | Free Termination (Conor's Research on Coordination-Free Query Computation Beyond Monotonicity) | 1:30:00 PM |
12 | Ren, Xuandi | Accessing Queries with Self-Joins | 1:45:00 PM |
13 | Sandoval, Ricardo | Differentially Private Querying | 2:00:00 PM |
14 | Laddad, Shadaj | Egglog meets Semirings: Contextual Rewrites | 2:15:00 PM |
Grading
The grade in this class is based on the participation during the
lectures, the homework assignments, and the final presentation.