NoSQL Databases 2018 – Lectures

Lecture 1: Why NoSQL, Principles, Overview, Course organization – slides
- content: Motivation for NoSQL databases (Big Data, Big Users, Cloud Computing, Horizontal scalability); Value of Relational databases; General principles of NoSQL databases; Types of NoSQL databases (basic characteristics, use cases, representatives); One example: Database technologies behind Facebook
- covered terms: Big Data (Volume, Velocity, Variety), OLTP/OLAP/RTAP, RDBMS, ACID, Aggregate-oriented data models, Key-value stores, Document databases, Column-family stores, Graph databases
Lecture 2: Distributed Computing with MapReduce – slides
- content: Distributed File Systems, Google File System (GFS), MapReduce programming model; MapReduce Framework; Apache Hadoop ecosystem; Apache Spark
- covered terms: Distributed File Systems: GFS, chunk server; MapReduce: Map, Combine, Grouping/Shuffling, Reduce; Hadoop Distributed File System (NameNode, DataNode, HeartBeat, BlockReport); Apache YARN, JobTracker, TaskTracker
Lecture 3: Principles of NoSQL Databases: Data Model, Distribution & Consistency – slides
- content: Basic Principles of NoSQL Databases – Aggregate data model, horizontal scaling, relaxing consistency; Models of Data Distribution; Consistency in databases, transactions; Relaxing consistency in distributed databases – theories and techniques; Relaxing durability
- covered terms: aggregate data model, vertical/horizontal scalability (scaling up/out), sharding, replication (master-slave, peer-to-peer), read/write/replication consistency, CAP Theorem, eventual consistency, BASE, Quorums
Lecture 4: Distributed Key-value Stores – slides
- content: Key challenges and solutions: data sharding, data balancing, replica management, management of nodes; Comparison of Individual Stores: features to consider, connecting to database; Fundamentals; Suitable Use Cases; Basic Example (Riak)
- covered terms: Amazon Dynamo, consistent hashing, virtual nodes, version stamps (counter, GUID, hash, timestamp), vector stamps (Lamport timestamps, vector clocks, version vectors, matrix clocks), anti-entropy, read repair, gossip protocols, two-phase commit protocol (2PC), multi-version concurrency control (MVCC), levels of isolation, write skew anomaly
Lecture 5: Key-value Stores II: Embedded, Distributed, and In-memory Stores – slides
- content: embedded stores: LevelDB; Distributed key-value stores: Riak, Infinispan; in-memory caches: Memcached. Serialization: Protocol Buffers, Apache Thrift
- covered terms: Log-structured Merge-Tree (LSM Tree), SSTable; Riak Links, Indexes, Search; memory cache, data eviction, distributed transaction management (X/Open XA), Lucene (Solr); Memcached; object serialisation (marshalling), Protocol Buffers, Apache Thrift
Lecture 6: Document Databases – slides
- content: Text Data Formats; Document Databases: Usage and Principles Behind, MongoDB: Data Models, Querying, Updates, Indexes, BSON, Distribution, MapReduce, Journaling, Transactions
- covered terms: JSON, XML; MongoDB
Lecture 7 (4/9/2018): Column-family Stores – slides
- content: Column Family Data Model, System Architectures; Cassandra: CQL, Data Partitioning & Replication, Local Data Persistence, Queries
- covered terms: Google BigTable, Cassandra, HBase, column family, super columns, CQL, memtables, SSTable, lightweight transactions
Lecture 8 (4/16/2018): Graph Databases – slides
- content: Graph Databases: Mission, Data, Example; Graph Theory: Representations, Data Locality, Graph Partitioning and Traversal; Types of Queries; Transactional Databases; Neo4j: Basics
- covered terms: Directed/undirected graphs, Adjacency Matrix, Adjacency List, Incidence Matrix, Laplacian Matrix; Breadth-first Search (BFS), BFS Layout, Bandwidth minimization problem, Graph Partitioning (1D, 2D); Sub-graph, Super-graph, Similarity Queries; (Non-)Mining-Based Graph Indexing Techniques; Neo4j
Lectures 9-11 (4/23/2018, 4/30/2018, 5/7/2018): Presentation of Projects
- content: presentation of group projects
Lecture 12 (5/14/2018): A Small Peek into Big Data Analytics
- by Václav Lorenc (Senior Security Analyst), Oracle | NetSuite
- annotation: “Data is the new bacon! Splunk is the new Excel!” Big data and data analytics in general are gaining momentum in the contemporary world. But what is it really about? How difficult is it to start with data analytics? Can you start small with big data problems?
  
  In the talk, you’ll get a very brief overview of practical data analysis: basic knowledge and a few tools will be described, as well as the general motivation, all combined with more or less funny stories from the real world. We’ll focus on both technical and non-technical aspects of data analytics, which are equally important for day-to-day activities.