Loading lesson...
Text is the language of databases
Making text behave since day one
Topics covered: String Data Type, CONCAT(), LENGTH(), LOWER(), UPPER()
A string is a data type that holds text. Strings are enclosed in single quotes in SQL. The text inside can contain letters, numbers, spaces, punctuation, or special characters. A string could be as short as one character or as long as several megabytes of text. When you see values like 'Alice', 'alice@email.com', or 'San Francisco' in a database, those are strings. The single quotes tell the database "this is text, not a keyword or number." Before manipulating strings, it helps to understand how
Basic Concatenation The most common use case is combining first and last names into a full name: The query takes first_name, adds a space ' ', then adds last_name. The space is a literal string argument. Without it, you would get 'AliceChen' instead of 'Alice Chen'. Building Complex Strings Advanced Usage Multiple Arguments Handling Empty Values Best Practices When to Use CONCAT() At scale, even simple string operations can become performance bottlenecks.
Basic Usage Measuring string length is useful for validation, filtering, and analysis. Filtering by Length Data Quality Checks Finding Empty or NULL LENGTH vs Empty Check Analysis & Best Practices Aggregate Length Analysis This query calculates the average length of comments and finds the longest one. Such analysis helps you understand data distribution, set appropriate column sizes, or identify unusually long entries. When to Use LENGTH()
Basic Normalization All email addresses are now lowercase. This ensures consistency and makes comparisons reliable. Without normalization, 'Alice@Email.com' and 'alice@email.com' would be treated as different values. Case-Insensitive Compare By converting both sides of a comparison to lowercase, you can match strings regardless of case: Data Handling Preserving Original Data Non-Letter Characters Best Practices Case normalization is one of the most common data cleaning steps. Applying it consist
Display Formatting Status values are converted to uppercase for consistent display. This formatting makes labels stand out and ensures uniformity regardless of how the data was entered. Normalizing Identifiers Product codes, order numbers, and other identifiers are often standardized as uppercase: All codes are now uppercase, ensuring consistency for comparisons, joins, and reporting. This normalization prevents mismatches caused by case differences. Advanced Usage Case-Insensitive Match UPPER()