# EdTech Classroom Engagement Schema

> They opened the assignment. Did they actually read it?

Canonical URL: <https://datadriven.io/problems/edtech_classroom_engagement_schema>

Domain: Data Modeling · Difficulty: medium · Seniority: L4

## Problem

We run a digital classroom platform used by K-12 schools. Teachers post assignments, students submit them, and everyone messages each other in class channels. We need to track student engagement to surface insights for teachers and to power a school district analytics dashboard. Design the data model.

## Worked solution and explanation

### Why this problem exists in real interviews

This probes whether a candidate distinguishes **course (template)** from **class instance (occurrence)** and whether they can separate **impression from read** in engagement events. Both distinctions quietly control the validity of every downstream metric.

> **Trick to Solving**
>
> Before drawing tables, a strong candidate asks: does 'Algebra I' refer to the curriculum or the specific section Mr. Li teaches this year? And is a 'view' the same as a 'read'? The signal is that both are two entities masquerading as one.
> 
> 1. Split `dim_courses` from `dim_class_instances`
> 2. Bridge student-class with enrollment history
> 3. Model impressions and reads as distinct event types
> 4. Keep submissions on their own fact table

---

### Break down the requirements

#### Step 1: Split curriculum from occurrence

A course is a reusable template; a class instance is a specific teacher teaching a specific section in a specific school year. Both roll up into different reports (curriculum effectiveness vs teacher performance).

#### Step 2: Bridge student-class with history

`bridge_class_enrollment` carries `(student_key, class_instance_key, enrolled_at, dropped_at)`. Roster changes mid-semester are common, and without a history bridge the engagement denominator is wrong the moment a student drops.

#### Step 3: Distinguish impression from read

An impression is 'appeared in feed'; a read is 'opened for 3+ seconds.' Two different event types on `fact_engagement_events` with the same grain, never collapsed into one 'view' column.

#### Step 4: Keep submissions on their own fact

`fact_assignment_submissions` has a different grain (one row per student per assignment) and different measures (submitted_at, score, is_late). Mixing them into engagement events loses the ability to compute completion rates.

---

### The solution

Below is one conceptually sound approach: course and class instance as separate dimensions, enrollment as an SCD Type 2 bridge, and two fact tables at two distinct grains.

> **Why this works**
>
> Two grains, two fact tables, and a history bridge for roster. Engagement rates stay correct even when a student drops mid-term because the denominator is computed from the bridge at the relevant point in time. The trade-off is one extra join on every classroom report.

> **Interviewers watch for**
>
> A strong candidate calls out the course-versus-instance split in the first question. They also insist that impression and read are different events. Weak candidates collapse everything into a single 'view' count and cannot explain how a dropped student affects the engagement baseline.

> **Common pitfall**
>
> Using the current class roster as the denominator for a month-ago engagement rate. Anyone who dropped mid-month silently pumps up the numerator-to-denominator ratio, and the dashboards trend upward for the wrong reason.

---

### The analysis pattern

**Read rate by class instance last week**

```sql
SELECT
    ci.class_instance_key,
    COUNT(DISTINCT CASE WHEN e.event_type = 'read' THEN e.student_key END) * 1.0 /
        NULLIF(COUNT(DISTINCT b.student_key), 0) AS read_rate
FROM dim_class_instances ci
JOIN bridge_class_enrollment b
    ON b.class_instance_key = ci.class_instance_key
    AND b.enrolled_at <= CURRENT_DATE
    AND (b.dropped_at IS NULL OR b.dropped_at > CURRENT_DATE - INTERVAL '7 days')
LEFT JOIN fact_engagement_events e
    ON e.class_instance_key = ci.class_instance_key
    AND e.student_key = b.student_key
    AND e.event_ts >= CURRENT_DATE - INTERVAL '7 days'
GROUP BY ci.class_instance_key
```

---

### Trade-offs and alternatives

**Course plus class instance with enrollment bridge**

Reusable curriculum, accurate denominators, clean teacher attribution. Cost: two dimensions and one bridge to maintain.

**Single flat classes table**

Simpler schema, fewer joins. Cost: a curriculum change rewrites every class row and any dropped student is either invisible or double counted.

---

## Common follow-up questions

- How would you handle a student who transfers between class sections of the same course mid-year? _(Tests whether the bridge close-then-open pattern handles same-course moves.)_
- How do you compute the school district engagement rollup without double-counting students in multiple classes? _(Tests whether the candidate aggregates distinct students at the student level first.)_
- What if impressions flood in at 50k per second per school? _(Tests whether the candidate partitions fact_engagement_events by event_date.)_
- How would you redact PII when a student's guardian requests a data deletion? _(Tests FERPA-style deletion cascades on student_key.)_

## Related

- [All practice problems](https://datadriven.io/problems)
- [Mock interview mode](https://datadriven.io/interview/edtech_classroom_engagement_schema)
- [Data Modeling Interview Questions](https://datadriven.io/data-modeling-interview-questions)
- [Data Engineering Interview Prep Guide](https://datadriven.io/data-engineer-interview-prep)
- [Daily Challenge](https://datadriven.io/daily)

---

Source: DataDriven (https://datadriven.io). 100% free data engineering interview prep. Live code execution against Postgres 16, Python 3.11, and Spark sandboxes. No paywall, no premium tier, no signup gate.