(+91) 7 999 01 02 03

Suraz Big Data Program Case Study

Ayush Dixit
15 Posts


Suraz seriously loves teaching Big data. At the beginning of 2022, he decides to embark on a learning venture and open an Online Learning Centre where he taught his 3 favorite courses: Hadoop, Spark, and Scala.

Suraz’s Learning venture is in need of your assistance to help the Online Learning Centre stay afloat - the Learning venture has captured some very basic data from its few months of operation but has no idea how to use their data to help them run the courses.


Suraz wants to use the data to answer a few simple questions about his students, especially about their learning patterns, how much money and time they’ve spent, and also which courses are their favorite. Having this deeper connection with his students will help him deliver a better and more personalized experience for his loyal students.

He plans on using these insights to help him decide whether he should expand the existing student referral program - additionally, he needs help to generate some basic datasets so his team can easily inspect the data without needing to use SQL.

Suraz has provided you with a sample of his overall student data due to privacy issues - but he hopes that these examples are enough for you to write fully functioning SQL queries to help him answer his questions!


Suraz has shared with you 3 key datasets for this case study:

  • sales
  • courses
  • students

You can inspect the entity-relationship diagram and example data below.


All datasets exist within the suraz_bigdata_program database schema - be sure to include this reference within your SQL scripts as you start exploring the data and answering the case study questions.

The sales table captures all student_id level purchases with a corresponding order_date and course_id information for when and what menu items were ordered.

student_id order_date course_id
ABC 2022-01-01 1
ABC 2022-01-01 2
ABC 2022-01-07 2
ABC 2022-01-10 3
ABC 2022-01-11 3
ABC 2022-01-11 3
BCD 2022-01-01 2
BCD 2022-01-02 2
BCD 2022-01-04 1
BCD 2022-01-11 1
BCD 2022-01-16 3
BCD 2022-02-01 3
CDE 2022-01-01 3
CDE 2022-01-01 3
CDE 2022-01-07 3


Table2: course

The course table maps the course_id to the actual course_name and price of each course item.


course_id course_name price
1 Hadoop 12000
2 Spark 15000
3 Scala 15000


Table 3: student

The final student table captures the joining_date when a student_id joined any course of Suraz’s Big Data program.

student_id joining_date
ABC 2022-01-07
BCD 2022-01-09


Interactive SQL Scripts

Below are SQL scripts to create the required data sets:

CREATE SCHEMA suraz_bigdata_program;

"stdent_id" VARCHAR(5),
"order_date" DATE,
"course_id" INTEGER

("stdent_id", "order_date", "course_id")
('ABC', '2022-01-01', '1'),
('ABC', '2022-01-01', '2'),
('ABC', '2022-01-07', '2'),
('ABC', '2022-01-10', '3'),
('ABC', '2022-01-11', '3'),
('ABC', '2022-01-11', '3'),
('BCD', '2022-01-01', '2'),
('BCD', '2022-01-02', '2'),
('BCD', '2022-01-04', '1'),
('BCD', '2022-01-11', '1'),
('BCD', '2022-01-16', '3'),
('BCD', '2022-02-01', '3'),
('CDE', '2022-01-01', '3'),
('CDE', '2022-01-01', '3'),
('CDE', '2022-01-07', '3');

"course_id" INTEGER,
"course_name" VARCHAR(10),
"price" INTEGER

("course_id", "course_name", "price")
('1', 'Hadoop', '12000'),
('2', 'Spark', '15000'),
('3', 'Scala', '15000');

CREATE TABLE student (
"student_id" VARCHAR(5),
"joining_date" DATE

("student_id", "joining_date")
('ABC', '2022-01-07'),
('BCD', '2022-01-09');


Each of the following case study questions can be answered using a single SQL statement:

  1. What is the total amount each student spent in the BigDataProgram?
  2. How many days has each student visited the BigDataProgram?
  3. What was the first item from the course purchased by each student?
  4. What is the most purchased course from the courses and how many times was it purchased by all students?
  5. Which course was the most popular for each student?
  6. Which course was purchased first by the user after they became students?
  7. Which course was purchased just before the user became a student?
  8. What are the total course and amounts spent for each user before they became a student?
  9. If each RS1000 spent equates to 10 points and Hadoop has a 2x points multiplier - how many points would each student have?
  10. In the first week after a student joins the program (including their joining date) they earn 2x points on all items, not just Hadoop - how many points do students ABC and BCD have at the end of January?


If you’d like to use this case study for one of your portfolio projects or in a personal blog post - please remember to link back to this URL also don’t forget to share some LinkedIn updates using the #SQLChallenge hashtag and remember to tag me!


I really hope you enjoyed this fun little case study - it definitely was fun for me to create! 






Published By : Ayush Dixit
Please write comments if you find anything incorrect, or you want to share more information about the topic discussed above.