SQL 10. Managing Tables & Databases

In previous posts, we focused on querying tables. In this post, we’ll explore data definition in SQL: how to create, modify, and delete database objects like tables, constraints, and databases themselves. Mastering these operations shows that you not only know how to query data, but also how to design and manage the structures that hold it. We’ll cover: Data types Constraints Primary and foreign keys Creating and altering tables and databases We’ll cover views, indexes, and data manipulation (INSERT, UPDATE, DELETE) in separate posts. ...

June 6, 2025 · map[name:Minjun Jeon]

Big Data 6. Slowly Changing Dimensions (SCD)

Slowly Changing Dimensions (SCD) Slowly Changing Dimensions (SCDs) refer to attributes in a data warehouse dimension table that change infrequently—yet unpredictably—over time. When a source system updates a dimension attribute, we need a strategy to decide how that change will be reflected (or not) in the data warehouse. In some cases, storing historical values isn’t necessary; in others, retaining a full history is critical for reporting, auditing, or trend analysis. ...

June 5, 2025 · map[name:Minjun Jeon]

Big Data 3. Data Warehouses, Data Lakes, and Lake Houses

1. Data Warehouses A Data Warehouse (DW) is a centralized repository that aggregates data from multiple sources into a unified, consistent store optimized for analytics and business intelligence (BI) activities, such as reporting, forecasting, and data mining. Unlike transactional databases (OLTP), data warehouses are built for analytical workloads (OLAP): they support complex queries over large volumes of data, often using denormalized schemas, dimensions, and facts for performance. Data warehouses typically use: ...

June 4, 2025 · map[name:Minjun Jeon]

Big Data 4. Denormalized Schema in Data Warehouses

1. Denormalized Schema A denormalized schema is a database design that reduces the number of joins by combining related data into fewer tables, often at the expense of some redundancy. Optimized for read-heavy analytical workloads Not ideal for OLTP systems (which benefit from normalization) In a typical data warehouse: Data is modeled to maximize query performance and simplify analytics. This contrasts with normalized schemas used in transactional databases, where the goal is data integrity and avoiding redundancy. 2. Star and Snowflake Schemas Two common denormalized designs are: ...

June 4, 2025 · map[name:Minjun Jeon]

Big Data 5. Modern Data Stack

As the cost of cloud storage and compute continues to fall and internet speeds rise, data engineers are rethinking how data is stored, processed, and analyzed. This shift has led to what we now call the modern data stack: a flexible, scalable, cloud-native approach that replaces traditional monolithic systems. In this post, we’ll explore key architectural changes that enabled the modern data stack, from the transition from SMP to MPP, to the decoupling of storage and compute, and the rise of columnar databases, ELT workflows, and the tooling ecosystem that glues it all together. ...

June 4, 2025 · map[name:Minjun Jeon]

Big Data 1. Data Maturity Model

1. Intro The Maslow’s hierarchy of needs models the basic needs of humans as a hierarchy of 5 levels from the bottom to higher-level needs at the top. In this model, one cannot proceed to the next level unless the basic needs of the current level is met. Silmiarly, one can think of data hierarchy of needs. The 5 levels are Data Collection, Data Wrangling, Data Integration, BI and Analytics, and Artificail Intellgence. ...

June 3, 2025 · map[name:Minjun Jeon]

SQL 9. Union and Subqueries

1. UNION/UNION ALL In contrast to JOINS, UNION combines rows instead of columns. a) Basic Syntax The syntax is as follows: SELECT column_name1, column_name2 FROM table_name1 UNION SELECT column_name3, column_name4 FROM table_name2 The number of columns from the first table should match the number of columns from the second table. Columns are matched by the order. For example column_name1 and column_name3 don’t need to be identitcal. The merged table will have column names of the first table, i.e. column_name1 and column_name2. ...

June 3, 2025 · map[name:Minjun Jeon]

SQL 8. Joins

JOIN is used to combine information from multiple tables in one query. There are four types of JOINs: INNER JOIN, OUTER JOIN, LEFT JOIN, and RIGHT JOIN. When combining two tables, you need one common column, so called reference column or join column. 1. INNER JOIN INNER JOIN combines two tables only for the rows where the value in the join column occurs in both tables. If there are m matching rows in the first table and n matching rows in the second table for the join column value, the result will contain m × n rows for that value. ...

June 2, 2025 · map[name:Minjun Jeon]

SQL 4. String Functions

In this post, we are going to look at basic string functions in SQL. 1. UPPER, LOWER, LENGTH UPPER changes all lowercase alphabets to uppercase. LOWER changes all uppercase letters to lowercase. Finally, LENGTH gives the number of characters. The example is as follows: SELECT UPPER(first_name) LOWER(last_name) FROM customer WHERE LENGTH(first_name) > 10 The above code outputs the first names in uppercase and last names in lowercase from the table customer where the length of the first name is more than 10. ...

June 1, 2025 · map[name:Minjun Jeon]

SQL 5. Date and Time Functions

In this post, we are going to look at basic date and time functions in SQL. Before we dive deeper into the date and time functions, let’s look at standard date/time types. 0. Data/Time types Types Description Examples date Just date without time ‘2025-06-01’ time (with/without time zone) Just time without date ‘01:02:03.678+02’ timestamp (with/without time zone) Date and time ‘2022-11-28 01:02:03.678+02’ intervals Time interval ‘3 days 01:02:03.678’ 1. EXTRACT EXTRACT is used to extract parts of timestamp/date. The syntax is as follows: ...

June 1, 2025 · map[name:Minjun Jeon]