Systems Fundamentals (SF)
A computer system is a set of hardware and software infrastructure upon which applications are constructed. Computer systems have become a pillar of people’s daily life. As such, learning the knowledge about computer systems, grasping the skills to use and design these systems, and understanding the fundamental rationale and principles in computer systems are essential to equip students with the necessary competency toward a career related to computer science.
In the curriculum of computer science, the study of computer systems typically spans across multiple courses, including, but not limited to, operating systems, parallel and distributed systems, communications networks, computer architecture and organization and software engineering. The System Fundamentals knowledge area, as suggested by its name, focuses on the fundamental concepts in computer systems that are shared by these courses within their respective cores. The goal of this knowledge area is to present an integrative view of these fundamental concepts in a unified albeit simplified fashion, providing a common foundation for the different specialized mechanisms and policies appropriate to the particular domain area. These concepts include an overview of computer systems, basic concepts such as state and state transition, parallelism, resource allocation and scheduling, and so on. Compared to CS2013, the SF knowledge area makes the following major changes to the knowledge units:
- Added two new units: system security and system design;
- Added a new unit of system performance, which includes the topics from the deprecated unit of proximity and the deprecated unit of virtualization and isolation;
- Added a new unit of performance evaluation, which includes the topics from the deprecated unit of evaluation and the deprecated unit of quantitative evaluation;
- Changed the unit of computational paradigms to overview of computer systems, deprecated some topics in the unit, and added topics from the deprecated unit of cross-layer communications;
- Changed the unit of state and state transition to basic concepts, and added topics such as finite state machines;
- Changed some topics in the unit of parallelism, such as simple application-level parallel processing;
- Deprecated the unit of cross-layer communications, and moved parts of its topics to the unit of overview of computer systems;
- Deprecated the units of evaluation and quantitative evaluation, and moved parts of their topics to the unit of performance evaluation;
- Deprecated the units of proximity and virtualization and isolation, and moved parts of their topics to the unit of system performance;
- Renamed the unit of reliability through redundancy to system reliability.
SF. Systems Fundamentals. [18 Core-Tier1 hours, 9 Core-Tier2 hours]
| Core-Tier1 hours | Core-Tier2 hours | Includes Electives | |
| SF/Overview of Computer Systems | 3 | N | |
| SF/Basic Concepts | 4 | N | |
| SF/Parallelism | 3 | N | |
| SF/Resource Allocation and Scheduling | 1 | 2 | N |
| SF/System Performance | 2 | 2 | N |
| SF/Performance Evaluation | 2 | 2 | N |
| SF/System Reliability | 1 | 1 | N |
| SF/System Security | 1 | 1 | Y |
| SF/System Design | 1 | 1 | Y |
SF/Overview of Computer Systems
[3 Core-Tier1 hours]
Topics:
- Basic building blocks and components of a computer (gates, flip-flops, registers, interconnections; datapath + control + memory)
- Hardware as a computational paradigm: Fundamental logic building blocks; Logic expressions, minimization, sum of product forms
- Programming abstractions, interfaces, use of libraries
- Distinction between application and OS services, remote procedure call
- Application-OS interaction
- Basic concept of pipelining, overlapped processing stages
- Basic concept of scaling: going faster vs. handling larger problems
Learning Outcomes:
- Describe the basic building blocks of computers and their role in the historical development of computer architecture.
- Design a simple logic circuit using the fundamental building blocks of logic design to solve a simple problem (e.g., adder).
- Use tools for capture, synthesis, and simulation to evaluate a logic circuit design.
- Describe how computing systems are constructed of layers upon layers, based on separation of concerns, with well-defined interfaces, hiding details of low layers from the higher layers.
- Describe that hardware, OS, VM, application are additional layers of interpretation/processing.
- Describe the mechanisms of how errors are detected, signaled back, and handled through the layers.
- Construct a simple program (e.g., a TCP client/server) using methods of layering, error detection and recovery, and reflection of error status across layers.
- Find bugs in a layered program by using tools for program tracing, single stepping, and debugging.
- Understand the concept of strong vs. weak scaling, i.e., how performance is affected by scale of problem vs. scale of resources to solve the problem. This can be motivated by the simple, real-world examples.
SF/Basic Concepts
[4 Core-Tier1 hours]
Topics:
- Digital vs. Analog/Discrete vs. Continuous Systems
- Simple logic gates, logical expressions, Boolean logic simplification
- Clocks, State, Sequencing
- State and state transition (e.g., starting state, final state, life cycle of states)
- Finite state machines (e.g., NFA, DFA)
- Combinational Logic, Sequential Logic, Registers, Memories
- Computers and Network Protocols as examples of State Machines
Learning Outcomes:
- Describe the differences between digital and analog systems, and between discrete and continuous systems. Can give real-world examples of these systems.
- Describe computations as a system characterized by a known set of configurations with transitions from one unique configuration (state) to another (state).
- Describe the distinction between systems whose output is only a function of their input (stateless) and those with memory/history (stateful).
- Develop state machine descriptions for simple problem statement solutions (e.g., traffic light sequencing, pattern recognizers).
- Describe a computer as a state machine that interprets machine instructions.
- Explain how a program or network protocol can also be expressed as a state machine, and that alternative representations for the same computation can exist.
- Derive time-series behavior of a state machine from its state machine representation (e.g., TCP connection management state machine).
SF/Parallelism
[3 Core-Tier1 hours]
[Cross-reference: PD/Parallelism Fundamentals]
Topics:
- Sequential vs. parallel processing
- Application-level sequential processing: single thread
- Simple application-level parallel processing: request level (web services/client-server/distributed), single thread per server, multiple threads with multiple servers, pipelining
- Parallel programming vs. concurrent programming
- Request parallelism vs. Task parallelism
- Multicore architectures and hardware support for synchronization
Learning Outcomes:
- Write a simple sequential problem and a simple parallel version of the same program.
- Evaluate the performance of simple sequential and parallel versions of a program with different problem sizes, and be able to describe the speed-ups achieved.
- Demonstrate on an execution timeline that parallelism events and operations can take place simultaneously (i.e., at the same time). Explain how work can be performed in less elapsed time if this can be exploited.
- Explain other uses of parallelism, such as for reliability/redundancy of execution.
- Describe the differences between the concepts of instruction parallelism, data parallelism, thread parallelism/multitasking, task/request parallelism.
- Write one simple parallel program (e.g., PageRank) in more than one parallel programming paradigm (e.g., a simple parallel program that manages shared resources through synchronization primitives; and a simple parallel program that performs simultaneous operation on partitioned data through task parallel).
- Use performance tools to measure speed-up achieved by parallel programs in terms of both problem size and number of resources.
SF/Resource Allocation and Scheduling
[1 Core-Tier1 hour and 2 Core-Tier2 hours]
Topics:
- Different types of resources (e.g., processor share, memory, disk, net bandwidth)
- Common scheduling algorithms (e.g., first-come-first-serve scheduling, priority-based scheduling, fair scheduling and preemptive scheduling )
- Advantages and disadvantages of common scheduling algorithms
Learning Outcomes:
- Define how finite computer resources (e.g., processor share, memory, storage and network bandwidth) are managed by their careful allocation to existing entities.
- Describe how common scheduling algorithms work.
- Describe the pros and cons of common scheduling algorithms
- Implement common scheduling algorithms, and evaluate their performances.
SF/System Performance
[2 Core-Tier1 hours and 2 Core-Tier2 hours]
[Cross-reference: AR/Memory Management, OS/Virtual Memory]
Topics:
- Latencies in computer systems
- Speed of light and computers (one foot per nanosecond vs. one GHz clocks)
- Memory vs. disk latencies vs. across the network memory
- Caches and the effects of spatial and temporal locality on performance in processors and systems
- Caches and cache coherency in databases, operating systems, distributed systems, and computer architecture
- Introduction into the processor memory hierarchy and the formula for average memory access time
- Rationale of virtualization and isolation: protection and predictable performance
- Levels of indirection, illustrated by virtual memory for managing physical memory resources
- Methods for implementing virtual memory and virtual machines
Learning Outcomes:
- Explain the importance of locality in determining system performance.
- Calculate average memory access time and describe the tradeoffs in memory hierarchy performance in terms of capacity, miss/hit rate, and access time.
- Explain why it is important to isolate and protect the execution of individual programs and environments that share common underlying resources.
- Describe how the concept of indirection can create the illusion of a dedicated machine and its resources even when physically shared among multiple programs and environments.
- Measure the performance of two application instances running on separate virtual machines, and determine the effect of performance isolation.
SF/Performance Evaluation
[2 Core-Tier1 hours and 2 Core-Tier2 hours]
Topics:
- Performance figures of merit
- Workloads and representative benchmarks, and methods of collecting and analyzing performance figures of merit
- CPI (Cycles per Instruction) equation as tool for understanding tradeoffs in the design of instruction sets, processor pipelines, and memory system organizations.
- Amdahl’s Law: the part of the computation that cannot be sped up limits the effect of the parts that can
- Analytical tools to guide quantitative evaluation
- Order of magnitude analysis (Big O notation)
- Analysis of slow and fast paths of a system
- Events on their effect on performance (e.g., instruction stalls, cache misses, page faults)
- Understanding layered systems, workloads, and platforms, their implications for performance, and the challenges they represent for evaluation
- Microbenchmarking pitfalls
Learning Outcomes:
- Explain how the components of system architecture contribute to improving its performance.
- Explain the circumstances in which a given figure of system performance metric is useful.
- Explain the usage and inadequacies of benchmarks as a measure of system performance.
- Describe Amdahl’s law and discuss its limitations.
- Use limit studies or simple calculations to produce order-of-magnitude estimates for a given performance metric in a given context.
- Use software tools to profile and measure program performance.
- Design and conduct a performance-oriented experiment of a common system (e.g., an OS and Spark).
- Conduct a performance experiment on a layered system to determine the effect of a system parameter on figure of system performance.
SF/System Reliability
[1 Core-Tier1 hours and 1 Core-Tier2 hours]
Topics:
- Distinction between bugs and faults
- Reliability through redundancy: check and retry
- Reliability through redundancy: redundant encoding (error correcting codes, CRC, FEC)
- Reliability through redundancy: duplication/mirroring/replicas
- Other approaches to reliability
Learning Outcomes:
- Explain the distinction between program errors, system errors, and hardware faults (e.g., bad memory) and exceptions (e.g., attempt to divide by zero).
- Articulate the distinction between detecting, handling, and recovering from faults, and the methods for their implementation.
- Describe the role of error correcting codes in providing error checking and correction techniques in memories, storage, and networks.
- Apply simple algorithms for exploiting redundant information for the purposes of data correction.
- Compare different error detection and correction methods for their data overhead, implementation complexity, and relative execution time for encoding, detecting, and correcting errors.
SF/System Security
[1 Core-Tier1 hours and 1 Core-Tier2 hours]
Topics:
- Common system security issues (e.g., virus, denial-of-service attack and eavesdropping)
- Countermeasures
- Cryptography
- Security architecture
- Intrusion detection systems, firewalls
Learning Outcomes:
- Describe some common system security issues and give examples.
- Describe some countermeasures against system security issues.
SF/System Design
[1 Core-Tier1 hours and 1 Core-Tier2 hours]
Topics:
- Common criteria of system design (e.g., liveness, safety, robustness, scalablility and security)
- Designs of representative systems (e.g., Apache web server, Spark and Linux)
Learning Outcomes:
- Describe common criteria of system design.
- Given the functionality requirements of a system and its key design criterias, provide a high-level design of this system.
- Describe the design of some representative systems.