2. Array/List

Arrays, more commonly known as lists in Python, are one of the most fundamental and widely used data structures. They store items in a sequential, ordered manner, and support a wide range of operations efficiently. This post is designed as a refresher for interviews or technical review, with code examples, time complexity analysis, and clarifications on memory behavior. 1. What is an Array (List)? In Python, the built-in list type is a dynamic array implementation. Arrays: ...
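As a small illustration of the dynamic-array behavior mentioned above, the sketch below appends items to a list and counts how often its allocated size changes. It assumes CPython, where `sys.getsizeof` reflects the list's current allocation; the helper name `count_reallocations` is hypothetical and not taken from the post.

```python
import sys


def count_reallocations(n):
    """Append n items and count how often the list's reported size changes.

    Assumption: CPython, where sys.getsizeof(list) grows only when the
    underlying buffer is reallocated (the list over-allocates on growth).
    """
    items = []
    last_size = sys.getsizeof(items)
    reallocations = 0
    for i in range(n):
        items.append(i)
        size = sys.getsizeof(items)
        if size != last_size:
            # The buffer was resized to make room for more elements.
            reallocations += 1
            last_size = size
    return reallocations


resizes = count_reallocations(1000)
print(f"1000 appends triggered {resizes} reallocations")
```

Because the buffer grows in chunks rather than one slot at a time, the number of reallocations stays far below the number of appends, which is why appending is usually described as amortized O(1).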

June 5, 2025 · Minjun Jeon

Big Data 6. Slowly Changing Dimensions (SCD)

Slowly Changing Dimensions (SCD) Slowly Changing Dimensions (SCDs) refer to attributes in a data warehouse dimension table that change infrequently—yet unpredictably—over time. When a source system updates a dimension attribute, we need a strategy to decide how that change will be reflected (or not) in the data warehouse. In some cases, storing historical values isn’t necessary; in others, retaining a full history is critical for reporting, auditing, or trend analysis. ...

June 5, 2025 · Minjun Jeon

Big Data 3. Data Warehouses, Data Lakes, and Lake Houses

1. Data Warehouses A Data Warehouse (DW) is a centralized repository that aggregates data from multiple sources into a unified, consistent store optimized for analytics and business intelligence (BI) activities, such as reporting, forecasting, and data mining. Unlike transactional databases (OLTP), data warehouses are built for analytical workloads (OLAP): they support complex queries over large volumes of data, often using denormalized schemas, dimensions, and facts for performance. Data warehouses typically use: ...

June 4, 2025 · Minjun Jeon

Big Data 4. Denormalized Schema in Data Warehouses

1. Denormalized Schema A denormalized schema is a database design that reduces the number of joins by combining related data into fewer tables, often at the expense of some redundancy. It is optimized for read-heavy analytical workloads and is not ideal for OLTP systems (which benefit from normalization). In a typical data warehouse, data is modeled to maximize query performance and simplify analytics. This contrasts with normalized schemas used in transactional databases, where the goal is data integrity and avoiding redundancy. 2. Star and Snowflake Schemas Two common denormalized designs are: ...

June 4, 2025 · Minjun Jeon

Big Data 5. Modern Data Stack

As the cost of cloud storage and compute continues to fall and internet speeds rise, data engineers are rethinking how data is stored, processed, and analyzed. This shift has led to what we now call the modern data stack: a flexible, scalable, cloud-native approach that replaces traditional monolithic systems. In this post, we’ll explore key architectural changes that enabled the modern data stack, from the transition from SMP to MPP, to the decoupling of storage and compute, and the rise of columnar databases, ELT workflows, and the tooling ecosystem that glues it all together. ...

June 4, 2025 · Minjun Jeon

Big Data 1. Data Maturity Model

1. Intro Maslow’s hierarchy of needs models human needs as a hierarchy of 5 levels, from basic needs at the bottom to higher-level needs at the top. In this model, one cannot proceed to the next level unless the needs of the current level are met. Similarly, one can think of a data hierarchy of needs. The 5 levels are Data Collection, Data Wrangling, Data Integration, BI and Analytics, and Artificial Intelligence. ...

June 3, 2025 · Minjun Jeon

1. Big O asymptotic analysis

1. What is good code? Readability: How easily humans can understand, modify, and maintain a piece of code. Scalability: How well the code can handle increased load and complexity. Big O, a shorthand for Big O asymptotic analysis, measures the scalability of code. Scalability consists of runtime and memory. There is usually a trade-off between runtime and memory. In this post, we introduce the Big O notation in terms of runtime and look at how to measure the space complexity of an algorithm at the end. ...

June 2, 2025 · Minjun Jeon

SQL 1. Basics

Here, we go through basic SQL commands: 1. SELECT SELECT is used to select and return data. For a single column, the syntax is as below: SELECT column_name FROM table_name1 For multiple columns, you just list the column names with ,. SELECT column_name1, column_name2 FROM table_name1 To select all columns, use *. SELECT * FROM table_name1 Note that the exact format doesn’t matter, e.g. the two examples are the same: ...
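The SELECT forms above can be tried out with Python's built-in `sqlite3` module. This is only a sketch: the table and column names follow the excerpt, while the sample rows are made up, and it assumes "the exact format doesn't matter" refers to whitespace and line breaks.

```python
import sqlite3

# In-memory database with hypothetical sample rows for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table_name1 (column_name1 TEXT, column_name2 INTEGER)")
conn.executemany("INSERT INTO table_name1 VALUES (?, ?)", [("a", 1), ("b", 2)])

# Single column.
single = conn.execute("SELECT column_name1 FROM table_name1").fetchall()

# Multiple columns, separated by commas.
multi = conn.execute(
    "SELECT column_name1, column_name2 FROM table_name1"
).fetchall()

# All columns.
everything = conn.execute("SELECT * FROM table_name1").fetchall()

# Same query as `multi`, spread over several lines; whitespace is ignored.
reformatted = conn.execute(
    """
    SELECT column_name1,
           column_name2
    FROM table_name1
    """
).fetchall()

conn.close()
```

Each result is a list of tuples, one per row, so `multi` and `reformatted` hold the same rows even though the query text is laid out differently.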

May 29, 2025 · Minjun Jeon