NoSQL Databases – Lectures, Fall 2014

  • Lecture 1 (17/9/2014): Why NoSQL, Principles, Overview (slides)
    • content: Motivation for NoSQL databases (Big Data, Big Users, Cloud Computing, Horizontal scalability); Value of Relational databases; General principles of NoSQL databases; Types of NoSQL databases (basic characteristics, use cases, representatives); One example: Database technologies behind Facebook
    • covered terms: Big Data (Volume, Velocity, Variety), OLTP/OLAP/RTAP, RDBMS, ACID, Aggregate-oriented data models, Key-value stores, Document databases, Column-family stores, Graph databases
  • Lecture 2 (1/10/2014): Principles – Data Distribution & Consistency (slides)
    • content: Basic Principles of NoSQL Databases – flexible data models, horizontal scaling, relaxing consistency; Models of Data Distribution: sharding, replication, combination; Consistency in databases, transactions; Relaxing consistency in distributed databases – theories and techniques; relaxing durability
    • covered terms: vertical/horizontal scalability (scaling up/out), sharding, replication (master-slave, peer-to-peer), read/write/replication consistency, CAP Theorem, eventual consistency, BASE, Quorums
  • Lecture 3 (8/10/2014): MapReduce, Hadoop (slides + practice slides)
    • content: Google File System (GFS), MapReduce; Apache Hadoop: HDFS, MapReduce; Hadoop in Practice
    • covered terms: GFS Master, GFS Chunkserver; MapReduce: Map, Combine, Grouping/Shuffling, Reduce; Hadoop Distributed File System (NameNode, DataNode, HeartBeat, BlockReport); Apache YARN, JobTracker, TaskTracker
  • Lecture 4 (15/10/2014): Key-value Stores (slides)
    • content: Fundamentals; Suitable Use Cases; Basic Example (Riak); Key challenges and solutions: data sharding, data balancing, replica management, management of nodes; Comparison of Individual Stores: features to consider, connecting to database
    • covered terms: Amazon Dynamo, consistent hashing, virtual nodes, version stamps (counter, GUID, hash, timestamp), vector stamps (Lamport timestamps, vector clocks, version vectors, matrix clocks), anti-entropy, read repair, gossip protocols, two-phase commit protocol (2PC), multi-version concurrency control (MVCC), levels of isolation, write skew anomaly
  • Lecture 5 (22/10/2014): Key-value Stores – Practice (slides + data for practice)
    • content: Riak, Infinispan: Features, Technologies Behind, Practical examples
    • covered terms: Hinted handoffs, REST Service, object marshalling, data eviction, distributed transaction management (X/Open XA), Lucene
  • Lecture 6 (29/10/2014): Data Formats, Document Databases (slides)
    • content: Data Formats used in NoSQL Databases; Data Serialization; Document Databases: Usage and Principles Behind; MongoDB: Data Models, Querying, Updates, Indexes, BSON, Distribution, MapReduce, Journaling, Transactions
    • covered terms: JSON, XML; Protocol Buffers, Apache Thrift; MongoDB
  • Lecture 7 (5/11/2014): MongoDB: Practice (slides)
    • content: MongoDB: Installation, Data insert/update, Various types of queries, Index definition, Monitoring; Connection from Python (MongoEngine)
  • Lecture 8 (12/11/2014): Column-family Stores (slides)
    • content: Data Model, System Architectures, Data Partitioning & Replication, Local Data Persistence, Queries; Cassandra: Practical Experience
    • covered terms: Google BigTable, Cassandra, HBase, column family, super columns, CQL, memtables, SSTable
  • Lecture 9 (19/11/2014): Graph Databases: Principles (slides)
    • content: Graph Databases: Mission, Data, Example; Graph Theory: Representations, Data Locality, Graph Partitioning and Traversal; Types of Queries; Transactional Databases; Neo4j: Basics
    • covered terms: Directed/undirected graphs, Adjacency Matrix, Adjacency List, Incidence Matrix, Laplacian Matrix; Breadth-first Search (BFS), BFS Layout, Bandwidth minimization problem, Graph Partitioning (1D, 2D); Sub-graph, Super-graph, Similarity Queries; (Non-)Mining-Based Graph Indexing Techniques; Neo4j
  • Lecture 10 (26/11/2014): Neo4j Graph Database (slides)
    • content: Neo4j: Basic information, Data model, Java API (embedded database), Traversal of the graph, Cypher query language, Other interfaces: Experience with Web UI, Neo4j internals
  • Lecture 11 (3/12/2014): Presentation of Projects
    • content: presentation of group projects
  • Lecture 12 (10/12/2014): Invited Talk: Karel Minařík – My path to NoSQL (slides, PDF 14MB)
    • content: I would like to introduce selected database systems I have worked with during my career; I will analyze both their principles and practical applicability. Namely, I will focus on CouchDB, Redis and Elasticsearch, their common properties and differences. There’ll be time for a discussion after the talk.
  • Lecture 13 (17/12/2014): Presentation of Projects II
    • content: presentation of group projects