NoSQL Databases, Fall 2019 – Lectures

We will go through the following presentations, starting on Thursday, September 19, 2019.

  • Lecture 1 (9/19/2019): Why NoSQL, Principles, Overview, Course organization (slides)
    • content: Motivation for NoSQL databases (Big Data, Big Users, Cloud Computing, Horizontal scalability); Value of Relational databases; General principles of NoSQL databases; Types of NoSQL databases (basic characteristics, use cases, representatives); One example: Database technologies behind Facebook
    • covered terms: Big Data (Volume, Velocity, Variety), OLTP/OLAP/RTAP, RDBMS, ACID, Aggregate-oriented data models, Key-value stores, Document databases, Column-family stores, Graph databases
  • Lecture 2 (9/26/2019): Distributed Computing with MapReduce (slides)
    • content: Distributed File Systems, Google File System (GFS), MapReduce programming model; MapReduce Framework; Apache Hadoop ecosystem; Apache Spark
    • covered terms: Distributed File Systems: GFS, chunk server; MapReduce: Map, Combine, Grouping/Shuffling, Reduce; Hadoop Distributed File System (NameNode, DataNode, HeartBeat, BlockReport); Apache YARN, JobTracker, TaskTracker
  • Lecture 3 (10/3/2019): Principles of NoSQL Databases: Data Model, Distribution & Consistency (slides)
    • content: Basic Principles of NoSQL Databases – Aggregate data model, horizontal scaling, relaxing consistency; Models of Data Distribution; Consistency in databases, transactions; Relaxing consistency in distributed databases – theories and techniques; Relaxing durability
    • covered terms: aggregate data model, vertical/horizontal scalability (scaling up/out), sharding, replication (master-slave, peer-to-peer), read/write/replication consistency, CAP Theorem, eventual consistency, BASE, Quorums
  • Lecture 4 (10/10/2019): Distributed Key-value Stores (slides)
    • content: Key challenges and solutions: data sharding, data balancing, replica management, management of nodes; Comparison of Individual Stores: features to consider, connecting to database; Fundamentals; Suitable Use Cases; Basic Example (Riak)
    • covered terms: Amazon Dynamo, consistent hashing, virtual nodes, version stamps (counter, GUID, hash, timestamp), vector stamps (Lamport timestamps, vector clocks, version vectors, matrix clocks), anti-entropy, read repair, gossip protocols, two-phase commit protocol (2PC), multi-version concurrency control (MVCC), levels of isolation, write skew anomaly
  • Lecture 5 (10/17/2019): Key-value Stores II: Embedded, Distributed, and In-memory Stores (slides)
    • content: Embedded stores: LevelDB; Distributed key-value stores: Riak, Infinispan; In-memory caches: Memcached; Serialization: Protocol Buffers, Apache Thrift
    • covered terms: Log-structured Merge-Tree (LSM Tree), SSTable; Riak Links, Indexes, Search; memory cache, data eviction, distributed transaction management (X/Open XA), Lucene (Solr); Memcached; object serialisation (marshalling), Protocol Buffers, Apache Thrift
  • Lecture 6 (10/24/2019): Document Databases (slides)
    • content: Text Data Formats; Document Databases: Usage and Underlying Principles; MongoDB: Data Models, Querying, Updates, Indexes, BSON, Distribution, MapReduce, Journaling, Transactions
    • covered terms: JSON, XML; MongoDB
  • Lecture 7 (10/31/2019): Column-family Stores (slides)
    • content: Column Family Data Model, System Architectures; Cassandra: CQL, Data Partitioning & Replication, Local Data Persistence, Queries
    • covered terms: Google BigTable, Cassandra, HBase, column family, super columns, CQL, memtables, SSTable, lightweight transactions
  • Lecture 8 (11/7/2019): Graph Databases (slides)
    • content: Graph Databases: Mission, Data, Example; Graph Theory: Representations, Data Locality, Graph Partitioning and Traversal; Types of Queries; Transactional Databases; Neo4j: Basics
    • covered terms: Directed/undirected graphs, Adjacency Matrix, Adjacency List, Incidence Matrix, Laplacian Matrix; Breadth-first Search (BFS), BFS Layout, Bandwidth minimization problem, Graph Partitioning (1D, 2D); Sub-graph, Super-graph, Similarity Queries; (Non-)Mining-Based Graph Indexing Techniques; Neo4j
  • Lecture 9 (11/14/2019): Project consultations (starting at 8:20)
  • Lecture 10 (11/21/2019, start at 8:30 am): Jan Plhák, ScyllaDB expert, head of C++ development at Kiwi.com
    • Title: Kiwi.com’s 7-year journey from a single PostgreSQL node to distributed Cassandra cluster and eventually Scylla
  • Lecture 11 (11/28/2019) – canceled
  • Lectures 12-14 (12/5/2019, 12/12/2019, 12/19/2019): Presentations of Projects