2025¶

Question 1 [25 marks] — Ship Tracking Database¶

(i) ER Model — [10 marks]¶

Provide a suitable Entity Relationship schema for the SHIP_TRACKING database. State any assumptions you make.

Entities and Attributes¶

Entity	Attributes	Primary Key
Ship	shipName, tonnage, hullType	shipName
Port	portName, country, continent, bodyOfWater	portName
Visit	visitDate, arrivalTime, departureTime	(portName, shipName, visitDate)

Relationship: HomePort¶

Each ship has exactly one home port. This is a 1:N relationship from Port to Ship with total participation on the Ship side (every ship must have a home port).

Relationship: Visits¶

A ship visits multiple ports over time, and a port is visited by multiple ships. This is an M:N relationship between Ship and Port, represented as the weak entity Visit (an associative entity with its own attributes).

Visit is an associative entity (linking Ship and Port) with attributes visitDate, arrivalTime, and departureTime.
The composite primary key of Visit is (shipName, portName, visitDate).
shipName is a FK referencing Ship(shipName).
portName is a FK referencing Port(portName).

Hull Type Constraint¶

The hull type is an attribute of Ship with a domain constraint — values must be one of flat-bottom, round-bottom, v-shaped, or multi-hulled. In the relational schema this is enforced via a CHECK constraint.

Body of Water / Continent¶

Each port is located on a body of water (sea, ocean, or lake) and in a country, which belongs to a continent. These are single-valued attributes of Port:

bodyOfWater — the sea, ocean, or lake the port is on.
country — the country the port is located in.
continent — the continent of that country.

No separate entity is needed for countries or bodies of water unless the requirements explicitly demand tracking multiple ports per country/ocean with their own attributes. Since the requirements only need to record the country and continent per port, these are modelled as simple attributes.

Current Location¶

The current location of a ship is recorded at 5-minute intervals. This is a separate relationship (or a separate table in the relational mapping) called LocationRecord that captures the ship's position over time. Each record stores the ship name, timestamp, and the port where the ship is currently docked (or NULL if in transit).

ER Diagram (Textual Representation)¶

┌──────────────────┐
│      Port        │
├──────────────────┤
│ *portName        │
│   country        │
│   continent      │
│   bodyOfWater    │
└───────┬──────────┘
        │ 1
        │ HomePort
        │ (Ship: total)
        │ N
        ▼
┌──────────────────┐
│      Ship        │
├──────────────────┤
│ *shipName        │
│   tonnage        │
│   hullType       │
└──────────────────┘
        │
        │ Visits (M:N, via associative entity Visit)
        │
┌──────────────────┐
│     Visit        │
│  (associative)   │
├──────────────────┤
│ *shipName (FK)   │
│ *portName (FK)   │
│ *visitDate       │
│   arrivalTime    │
│   departureTime  │
└──────────────────┘

Current Location tracking:
┌──────────────────┐
│   LocationRecord │
├──────────────────┤
│ *shipName (FK)   │
│ *recordTime      │
│   currentPort    │
│                  │
│ FK: shipName     │
│   REFERENCES     │
│   Ship(shipName) │
│ FK: currentPort  │
│   REFERENCES     │
│   Port(portName) │
└──────────────────┘

#### Assumptions Made

1. **shipName uniquely identifies each ship** — no two ships share the same name. In practice, a ship ID would be used, but the requirements do not specify one.
2. **portName uniquely identifies each port** — each port has a unique name.
3. **A ship has exactly one home port** — this is a mandatory (total participation) 1:N relationship.
4. **Visit is an associative entity** — each visit records a specific ship arriving at a specific port on a specific date. A ship can visit the same port multiple times on different dates.
5. **Current location is recorded independently of visits** — the 5-minute interval location tracking is a separate entity from the Visit entity, as it captures the ship's state at arbitrary time intervals (including when in transit with no port).
6. **A location record may have NULL currentPort** — the ship may be in transit between ports, so no port is recorded.
7. **Hull type is a constrained attribute** — only the four specified values are valid.
8. **Country and continent are simple attributes** — no separate entities for countries or bodies of water are needed as no additional attributes or relationships are required for them.

### (ii) Relational Mapping — [15 marks]

**Map the ER schema into a relational database schema and specify all primary and foreign keys. State any assumptions you make.**

#### Relational Schema

PORT(portName PK, country, continent, bodyOfWater)

SHIP(shipName PK, tonnage, hullType, homePort FK NOT NULL REFERENCES PORT(portName), CHECK (hullType IN ('flat-bottom', 'round-bottom', 'v-shaped', 'multi-hulled')))

VISIT(shipName FK NOT NULL REFERENCES SHIP(shipName), portName FK NOT NULL REFERENCES PORT(portName), visitDate PK, arrivalTime TIME, departureTime TIME, PK: (shipName, portName, visitDate))

LOCATION_RECORD(shipName FK NOT NULL REFERENCES SHIP(shipName), recordTime PK, currentPort FK NULL REFERENCES PORT(portName))

#### Schema Explanation

| Table              | Primary Key                              | Foreign Keys                                                                                              |
| ------------------ | ---------------------------------------- | --------------------------------------------------------------------------------------------------------- |
| **PORT**           | `portName`                               | None                                                                                                      |
| **SHIP**           | `shipName`                               | `homePort` → PORT(portName) — NOT NULL (total participation)                                              |
| **VISIT**          | `(shipName, portName, visitDate)`        | `shipName` → SHIP(shipName); `portName` → PORT(portName)                                                  |
| **LOCATION_RECORD**| `shipName, recordTime`                   | `shipName` → SHIP(shipName); `currentPort` → PORT(portName) — NULL allowed (ship may be in transit)       |

#### Mapping Decisions

| ER Concept                  | Relational Mapping Strategy                                       | Rationale                                                                         |
| --------------------------- | ----------------------------------------------------------------- | --------------------------------------------------------------------------------- |
| Strong entity (Port)        | Direct conversion — all attributes become columns                 | Port has a full primary key (`portName`)                                          |
| Strong entity (Ship)        | Direct conversion + FK for HomePort                               | HomePort is a 1:N relationship; FK placed on the "many" side (Ship)               |
| Associative entity (Visit)  | Separate table with composite PK from both sides + relationship attributes | M:N relationship between Ship and Port requires a junction table |
| Current location tracking   | Separate table with FK to Ship and nullable FK to Port            | Captures time-series data; currentPort is nullable when ship is in transit        |
| Hull type constraint        | CHECK constraint on hullType                                      | Enforces domain restriction to the four allowed values                            |

#### Assumptions Made

1. **shipName is the unique identifier for ships** — no separate ship ID is needed.
2. **portName is the unique identifier for ports** — no separate port ID is needed.
3. **`recordTime` in LOCATION_RECORD is a timestamp** (DATE TIME) capturing the exact 5-minute interval when the location was recorded.
4. **A ship can have multiple visits to the same port** — the composite key `(shipName, portName, visitDate)` allows this.
5. **`currentPort` in LOCATION_RECORD is nullable** — when a ship is in transit, no port is assigned.
6. **The homePort FK in SHIP is NOT NULL** — every ship must have a home port (total participation in the HomePort relationship).

---

## Question 2 [25 marks] — Airline Flight Database (Relational Algebra)

### (a) [5 marks]

**For each flight, list the flight number, airline, date, departure airport and arrival airport along with the number of available seats, for each leg.**

Note: The question requires counting reservations per leg to compute available seats (MAX_SEATS minus booked count). Counting/aggregation is not part of the basic relational algebra operators covered in the course (SELECT, PROJECT, JOIN, UNION, INTERSECTION, DIFFERENCE). Below is the relational algebra for the join and projection portion; the aggregation step is noted as an extension.

-- Step 1: Join all relevant tables to get MAXSEATS per leg AvailableSeats ← π FLIGHT_NO, AIRLINE, DATE, DEP_AIRPORT, ARR_AIRPORT, LEG_NO, MAX_SEATS ( FLIGHT ⋈{FLIGHT.FLIGHTNO = FLI.FLIGHT_NO} ( FLIGHT_LEG_INSTANCE ⋈{FLI.AIRPLANEID = A.AIRPLANE_ID} ( AIRPLANE ⋈ ( π TYPE, MAX_SEATS(AIRPLANE_TYPE) ) AS AT ) AS A ) AS FLI )

-- Step 2: Count reservations per leg (aggregation — extension to basic RA) BookedSeats ← GROUP BY FLIGHT_NO, LEG_NO, DATE, DEP_AIRPORT, ARR_AIRPORT AGGREGATE COUNT(SEAT_NO) → bookedCount ( π FLIGHT_NO, LEG_NO, DATE, DEP_AIRPORT, ARR_AIRPORT, SEAT_NO(RESERVATION) )

-- Step 3: Join and compute available seats RESULT ← π FLIGHTNO, AIRLINE, DATE, DEP_AIRPORT, ARR_AIRPORT, LEG_NO, (MAX_SEATS - bookedCount) ( AvailableSeats ⋈{AvailableSeats.FLIGHT_NO = BS.FLIGHT_NO ∧ AvailableSeats.LEG_NO = BS.LEG_NO ∧ AvailableSeats.DATE = BS.DATE ∧ AvailableSeats.DEP_AIRPORT = BS.DEP_AIRPORT ∧ AvailableSeats.ARR_AIRPORT = BS.ARR_AIRPORT} BookedSeats AS BS )

---

### (b) [5 marks]

**For all the airplane types available in the database, list the airplane type, name of the company, maximum seats available and the airports (or airport codes) where these planes can land.**

π TYPE, COMPANY, MAXSEATS, AIRPORT_CODE ( AIRPLANE_TYPE ⋈ ALLOWED_LANDING )

Or more explicitly:

RESULT ← π AIRPLANE_TYPE.TYPE, COMPANY, MAX_SEATS, AIRPORT_CODE (AIRPLANE_TYPE ⋈ ALLOWED_LANDING)

This is a simple natural join between AIRPLANE_TYPE and ALLOWED_LANDING, projecting the required attributes. The join produces one row per (airplane type, airport) pair where that type is allowed to land.

---

### (c) [5 marks]

**List the name, seat number and phone numbers of all the customers of all flights or flight legs that departed from Houston Intercontinental Airport (airport code 'iah') and arrived in Los Angeles International Airport (airport code 'lax') on '2016-03-16'.**

TARGET_RESERVATIONS ← σ DEP_AIRPORT='iah' ∧ ARR_AIRPORT='lax' ∧ DATE='2016-03-16' (RESERVATION)

RESULT ← π NAME, SEATNO, PHONE ( CUSTOMER ⋈ TARGET_RESERVATIONS )

Or as a single expression:

RESULT ← π NAME, SEAT_NO, PHONE ( CUSTOMER ⋈ ( σ DEP_AIRPORT='iah' ∧ ARR_AIRPORT='lax' ∧ DATE='2016-03-16' (RESERVATION) ) )

---

### (d) [5 marks]

**List fare information for all the flights run by the airline 'Delta Airlines'.**

DELTA_FLIGHTS ← σ AIRLINE='Delta Airlines'(FLIGHT)

RESULT ← π FLIGHTNO, CLASS, AMOUNT ( DELTA_FLIGHTS ⋈ FARE )

Or more concisely:

RESULT ← π FLIGHT_NO, CLASS, AMOUNT ( σ AIRLINE='Delta Airlines'(FLIGHT) ⋈ FARE )

---

### (e) [5 marks]

**Retrieve the number of available seats on all flights run by Delta Airline, on '2016-04-09'.**

Note: This question requires counting reservations per leg to compute available seats. Aggregation (COUNT, GROUP BY) is not part of the basic relational algebra operators covered in the course (SELECT, PROJECT, JOIN, UNION, INTERSECTION, DIFFERENCE). Below is the relational algebra for the join and projection portion; the aggregation step is noted as an extension.

-- Step 1: Filter Delta flights DELTA_FLIGHTS ← σ AIRLINE='Delta Airlines'(FLIGHT)

-- Step 2: Join with leg instances, filter by date DELTALEGS ← π FLIGHT_NO, LEG_NO, DATE, DEP_AIRPORT, ARR_AIRPORT, AIRPLANE_ID ( DELTA_FLIGHTS ⋈ ( FLIGHT_LEG_INSTANCE AS FLI ) )

-- Step 3: Join with airplane types to get MAXSEATS DELTA_WITH_SEATS ← π FLIGHT_NO, LEG_NO, DATE, DEP_AIRPORT, ARR_AIRPORT, AIRPLANE_ID, MAX_SEATS ( DELTA_LEGS ⋈{AIRPLANEID = A.AIRPLANE_ID} ( AIRPLANE ⋈ ( π TYPE, MAX_SEATS(AIRPLANE_TYPE) ) AS AT ) AS A )

-- Step 4: Count booked seats per leg on the date (aggregation — extension to basic RA) DELTA_BOOKED ← GROUP BY FLIGHT_NO, LEG_NO, DATE, DEP_AIRPORT, ARR_AIRPORT AGGREGATE COUNT(SEAT_NO) → bookedCount ( π FLIGHT_NO, LEG_NO, DATE, DEP_AIRPORT, ARR_AIRPORT, SEAT_NO ( σ DATE='2016-04-09'(RESERVATION) ) )

-- Step 5: Join and compute available seats RESULT ← π FLIGHTNO, LEG_NO, DATE, (MAX_SEATS - bookedCount) ( DELTA_WITH_SEATS ⋈{DELTA_WITH_SEATS.FLIGHT_NO = BS.FLIGHT_NO ∧ DELTA_WITH_SEATS.LEG_NO = BS.LEG_NO ∧ DELTA_WITH_SEATS.DATE = BS.DATE} DELTA_BOOKED AS BS )

---

## Question 3 [25 marks] — University Database (SQL)

### (i) [4 marks]

**Create each of the tables in the above schema. Making sure to appropriately apply constraints, include primary and foreign keys.**

```sql
CREATE TABLE COURSE (
    COURSE_ID   VARCHAR PRIMARY KEY,
    TITLE       VARCHAR NOT NULL,
    CREDITS       INT NOT NULL CHECK (CREDITS > 0)
);

CREATE TABLE INSTRUCTOR (
    INST_ID     VARCHAR PRIMARY KEY,
    NAME        VARCHAR NOT NULL,
    DEPT        VARCHAR NOT NULL
);

CREATE TABLE SECTION (
    SECTION_ID  VARCHAR PRIMARY KEY,
    COURSE_ID   VARCHAR NOT NULL REFERENCES COURSE(COURSE_ID),
    INST_ID     VARCHAR NOT NULL REFERENCES INSTRUCTOR(INST_ID),
    SEMESTER    VARCHAR NOT NULL,
    YEAR        INT NOT NULL
);

CREATE TABLE STUDENT (
    STU_ID      VARCHAR PRIMARY KEY,
    NAME        VARCHAR NOT NULL,
    DOB         DATE NOT NULL,
    MAJOR       VARCHAR NOT NULL,
    CLASS       INT NOT NULL
);

CREATE TABLE TAKES (
    STU_ID      VARCHAR NOT NULL REFERENCES STUDENT(STU_ID),
    SECTION_ID  VARCHAR NOT NULL REFERENCES SECTION(SECTION_ID),
    GRADE       VARCHAR,
    PRIMARY KEY (STU_ID, SECTION_ID)
);

Note: The STU_ID and SECTION_ID columns are both part of the composite primary key. Per entity integrity (covered in the course), no part of a primary key can be NULL, so both are declared NOT NULL. This also satisfies referential integrity by ensuring every takes record references existing students and sections.

(ii) [4 marks]¶

Retrieve the names of all courses along with the name of the instructor taught during the fall of 2008.

SELECT C.TITLE, I.NAME
FROM COURSE C
JOIN SECTION S ON C.COURSE_ID = S.COURSE_ID
JOIN INSTRUCTOR I ON S.INST_ID = I.INST_ID
WHERE S.SEMESTER = 'fall' AND S.YEAR = 2008;

(iii) [4 marks]¶

For each section taught by Professor Anderson, retrieve the course number, semester, year, and number of students who took the section.

SELECT S.COURSE_ID, S.SEMESTER, S.YEAR, COUNT(T.STU_ID) AS NUM_STUDENTS
FROM SECTION S
JOIN INSTRUCTOR I ON S.INST_ID = I.INST_ID
LEFT JOIN TAKES T ON S.SECTION_ID = T.SECTION_ID
WHERE I.NAME = 'Professor Anderson'
GROUP BY S.COURSE_ID, S.SEMESTER, S.YEAR;

(iv) [4 marks]¶

Retrieve the name and transcript of each junior student (Class = 1) majoring in mathematics (MATH). A transcript includes course name, course number, credit hours, semester, year, and grade for each course completed by the student.

SELECT S.NAME, C.TITLE AS COURSE_NAME, C.COURSE_ID, C.CREDITS,
       S2.SEMESTER, S2.YEAR, T.GRADE
FROM STUDENT S
JOIN TAKES T ON S.STU_ID = T.STU_ID
JOIN SECTION S2 ON T.SECTION_ID = S2.SECTION_ID
JOIN COURSE C ON S2.COURSE_ID = C.COURSE_ID
WHERE S.CLASS = 1 AND S.MAJOR = 'MATH';

(v) [4 marks]¶

Retrieve the names and major departments of all straight-A students (students who have a grade of A in all their courses).

SELECT S.NAME, S.MAJOR
FROM STUDENT S
WHERE NOT EXISTS (
    SELECT *
    FROM TAKES T
    WHERE T.STU_ID = S.STU_ID
      AND (T.GRADE <> 'A' OR T.GRADE IS NULL)
);

Note: The condition explicitly handles NULL grades — a student with a NULL grade is not a straight-A student. The T.GRADE IS NULL clause ensures that students with any NULL grade are excluded.

Alternatively, using aggregation:

SELECT S.NAME, S.MAJOR
FROM STUDENT S
JOIN TAKES T ON S.STU_ID = T.STU_ID
GROUP BY S.STU_ID, S.NAME, S.MAJOR
HAVING COUNT(*) = COUNT(CASE WHEN T.GRADE = 'A' THEN 1 END);

Note: In the aggregation version, NULL grades are counted in COUNT(*) but not in the CASE expression, so students with any NULL grade are correctly excluded.

(vi) [5 marks]¶

Retrieve the names and major departments of all students who do not have any grade of A in any of their courses.

SELECT S.NAME, S.MAJOR
FROM STUDENT S
WHERE NOT EXISTS (
    SELECT *
    FROM TAKES T
    WHERE T.STU_ID = S.STU_ID
      AND T.GRADE = 'A'
);

Alternatively, using aggregation:

SELECT S.NAME, S.MAJOR
FROM STUDENT S
JOIN TAKES T ON S.STU_ID = T.STU_ID
GROUP BY S.STU_ID, S.NAME, S.MAJOR
HAVING COUNT(CASE WHEN T.GRADE = 'A' THEN 1 END) = 0;

Note: This query only returns students who have taken at least one course. To include students who have not taken any courses, use a LEFT JOIN:

SELECT S.NAME, S.MAJOR
FROM STUDENT S
LEFT JOIN TAKES T ON S.STU_ID = T.STU_ID
GROUP BY S.STU_ID, S.NAME, S.MAJOR
HAVING COUNT(CASE WHEN T.GRADE = 'A' THEN 1 END) = 0;

Question 4 [25 marks] — Concurrency Control¶

(i) [5 marks]¶

Consider the following two transactions, T1 and T2, and their operations. Analyze whether T2 operations are updated successfully when T1 is rolled back.

T1                              T2
────                              ────
Read(X)                         Read(X)
X = X – 2                       X = X + 3
Write(X)                        Write(X)
ROLLBACK                        COMMIT

Answer:

T2's COMMIT will succeed regardless of T1's ROLLBACK. Here is the step-by-step analysis:

T1 executes Read(X): T1 reads the current value of X (call it V). T1 computes X = V - 2 and writes it with Write(X). This is an uncommitted update.
T2 executes Read(X): The value T2 reads depends on the concurrency control protocol:
With strict 2PL (or MVCC): T1 holds an exclusive lock on X. T2 cannot read T1's uncommitted value. T2 waits until T1 commits or rolls back, then reads the current committed value V. T2 computes X = V + 3.
With no concurrency control (or read-uncommitted isolation): T2 reads T1's uncommitted value V - 2. T2 computes X = (V - 2) + 3 = V + 1.
T2 executes Write(X): T2 writes its computed value to X (still uncommitted).
T1 executes ROLLBACK: T1's Write(X) is undone. X is restored to V.
T2 executes COMMIT: T2's Write(X) is made permanent.

Result:

With strict 2PL: X = V + 3. T2's update is correct and successful.
Without concurrency control: X = V + 1. T2's update succeeds but is semantically incorrect (a dirty read led to a dirty write). This is a lost update anomaly.

Additionally, this scenario could exhibit a write-skew anomaly if T1 and T2 both read overlapping data sets and make disjoint updates based on stale reads — though in this specific case the primary issue is the lost update caused by T2 reading T1's uncommitted value.

Conclusion: T2's COMMIT always succeeds in writing its value. However, proper concurrency control is essential to ensure the result is semantically correct. Strict 2PL prevents T2 from reading T1's uncommitted write, ensuring T2 reads the original value V and produces the correct result V + 3.

(ii) [12 marks]¶

Does strict two-phase locking always guarantee strict schedules? Explain your answer in detail.

Answer:

Yes, strict two-phase locking guarantees strict schedules.

Definitions:

Strict Two-Phase Locking (Strict 2PL): A transaction acquires shared (read) and exclusive (write) locks following the two-phase protocol. The distinguishing property of strict 2PL is that all exclusive locks are held until the transaction commits or rolls back. Shared locks may be released during the shrinking phase.
Strict Schedule: A schedule is strict if no transaction reads or writes a data item that has been written by an uncommitted transaction. Equivalently: if T1 writes X and T2 reads or writes X, then T2's operation on X occurs only after T1 has committed or rolled back.

Proof:

In strict 2PL, when T1 writes a data item X, it acquires an exclusive lock on X. This lock is held until T1 commits or rolls back. Therefore:

No dirty reads: T2 cannot acquire a shared lock on X while T1 holds the exclusive lock. T2 can only read X after T1 has released its exclusive lock (i.e., after T1 commits or rolls back).
No dirty writes: T2 cannot acquire an exclusive lock on X while T1 holds the exclusive lock. T2 can only write X after T1 has released its exclusive lock.

Since both dirty reads and dirty writes are impossible under strict 2PL, no transaction can ever read or write data modified by an uncommitted transaction. This is precisely the definition of a strict schedule.

Recoverability: A strict schedule is also recoverable — no transaction reads uncommitted data, so no transaction ever needs to be rolled back due to reading data from a transaction that later aborts. This is a stronger property than recoverability alone: strict schedules prevent both dirty reads and cascading rollbacks.

Therefore, strict 2PL guarantees strict schedules.

(iii) [3 marks]¶

Provide a definition for deadlock.

Answer:

A deadlock is a situation in which two or more transactions are each waiting for a lock held by another transaction in the set, forming a circular wait condition. As a result, none of the transactions in the cycle can proceed because each is blocked by another.

Formally, a deadlock occurs when there exists a cycle in the wait-for graph — a directed graph where nodes represent transactions and an edge from T1 to T2 means T1 is waiting for a lock held by T2. If the graph contains a cycle, all transactions in that cycle are deadlocked.

(iv) [5 marks]¶

Does cautious waiting avoid deadlock?

Answer:

Yes, cautious waiting avoids deadlock.

Explanation:

Cautious waiting is a deadlock prevention protocol used in timestamp-based concurrency control (such as Thomas' Write Rule or variations of 2PL with timestamps). The key idea is:

When a transaction T1 requests a lock on a data item X that is currently held by transaction T2:
If T1's timestamp is older than T2's timestamp (T1 started before T2), T1 is placed in a wait state (T1 waits for T2 to release the lock).
If T1's timestamp is newer than T2's timestamp (T1 started after T2), T1 is wounded (or rolled back) — T1 is aborted and restarted with a new timestamp.

Why cautious waiting prevents deadlock:

The critical property of cautious waiting is that it breaks the circular wait condition — one of the four necessary conditions for deadlock (along with mutual exclusion, hold and wait, and no preemption).

By always aborting the younger transaction (the one with the newer timestamp) when there is a conflict, the protocol ensures that a transaction can only wait for older transactions. Since timestamps are assigned in order and never change, there can be no cycle in the wait-for graph: a transaction with timestamp T_i can only wait for transactions with timestamps T_j where T_j < T_i. Since there is no way to form a cycle (you can only wait for older transactions, which can only wait for even older ones, eventually reaching the oldest transaction which cannot wait for anyone), deadlock is impossible.

Trade-off: While cautious waiting prevents deadlock, it may result in more aborts and restarts compared to deadlock detection and recovery schemes. However, it eliminates the overhead of maintaining and scanning the wait-for graph for cycles.