- Lecture 1 (2/22/2016): Why NoSQL, Principles, Overview – slides
- content: Motivation for NoSQL databases (Big Data, Big Users, Cloud Computing, Horizontal scalability); Value of Relational databases; General principles of NoSQL databases; Types of NoSQL databases (basic characteristics, use cases, representatives); One example: Database technologies behind Facebook
- covered terms: Big Data (Volume, Velocity, Variety), OLTP/OLAP/RTAP, RDBMS, ACID, Aggregate-oriented data models, Key-value stores, Document databases, Column-family stores, Graph databases
- Lecture 2 (2/29/2016): Distributed Computing with MapReduce – slides
- content: Distributed File Systems, Google File System (GFS), MapReduce programming model; MapReduce Framework; Apache Hadoop ecosystem
- covered terms: Distributed File Systems: GFS, chunk server; MapReduce: Map, Combine, Grouping/Shuffling, Reduce; Hadoop Distributed File System (NameNode, DataNode, HeartBeat, BlockReport); Apache YARN, JobTracker, TaskTracker
- Lecture 3 (3/7/2016): Principles of NoSQL Databases: Data Model, Distribution & Consistency – slides
- content: Basic Principles of NoSQL Databases – Aggregate data model, horizontal scaling, relaxing consistency; Models of Data Distribution; Consistency in databases, transactions; Relaxing consistency in distributed databases – theories and techniques; Relaxing durability
- covered terms: aggregate data model, vertical/horizontal scalability (scaling up/out), sharding, replication (master-slave, peer-to-peer), read/write/replication consistency, CAP Theorem, eventual consistency, BASE, Quorums
- Lecture 4 (3/14/2016): Distributed Key-value Stores – slides
- content: Key challenges and solutions: data sharding, data balancing, replica management, management of nodes; Comparison of Individual Stores: features to consider, connecting to the database; Fundamentals; Suitable Use Cases; Basic Example (Riak)
- covered terms: Amazon Dynamo, consistent hashing, virtual nodes, version stamps (counter, GUID, hash, timestamp), vector stamps (Lamport timestamps, vector clocks, version vectors, matrix clocks), anti-entropy, read repair, gossip protocols, two-phase commit protocol (2PC), multi-version concurrency control (MVCC), levels of isolation, write skew anomaly
- Lecture 5 (3/21/2016): Key-value Stores II: Representatives, Local Storage, Serialization – slides
- content: Riak, Infinispan: Features, Technologies Behind, Examples; LevelDB: efficient local key-value store
- covered terms: Riak Links, Indexes, Search; memory cache, data eviction, distributed transaction management (X/Open XA), Lucene (Solr); Log-structured Merge-Tree (LSM Tree), SSTable; object serialisation (marshalling), Protocol Buffers, Apache Thrift
- Lecture 6 (4/4/2016): Document Databases – slides
- content: Text Data Formats; Document Databases: Usage and Principles Behind, MongoDB: Data Models, Querying, Updates, Indexes, BSON, Distribution, MapReduce, Journaling, Transactions
- covered terms: JSON, XML; MongoDB
- Lecture 7 (4/11/2016): Column-family Stores – slides
- content: Column Family Data Model, System Architectures; Cassandra: CQL, Data Partitioning & Replication, Local Data Persistence, Queries
- covered terms: Google BigTable, Cassandra, HBase, column family, super columns, CQL, memtables, SSTable, lightweight transactions
- Lecture 8 (4/18/2016): Graph Databases – slides