String Functions: Advanced

Yelp hosts over 250 million reviews, and its data team uses advanced SQL string functions to extract structured insights from that ocean of unstructured text. By applying string aggregation and pattern matching across review corpora, they identify recurring complaint phrases, surface the dishes mentioned most often in five-star reviews, and detect sentiment trends that feed directly into their rating algorithm. What appears to users as a simple star score is actually the product of large-scale text analytics running over millions of free-form paragraphs. The advanced string techniques you are about to learn are the foundation for that kind of analysis.

SPLIT_TO_MAP()

Daily Life

Interviews

Parse key-value pairs from raw text

SPLIT_TO_MAP parses a string containing key-value pairs into a map data structure. The function takes three arguments: the string to parse, the delimiter between pairs, and the delimiter between keys and values. Once parsed, you can access individual values by their keys using bracket notation.

The syntax is SPLIT_TO_MAP(string, entry_delimiter, key_value_delimiter). The entry_delimiter separates different key-value pairs (commonly "&" or ","). The key_value_delimiter separates each key from its value (commonly "=" or ":"). The result is a MAP type that supports key-based lookups.

Basic Syntax

SPLIT_TO_MAP works with any consistently delimited key-value string. The most common patterns are URL query strings and configuration entries.

Parsing Query Strings

URL query strings are the most common use case. They contain parameters in key=value format separated by ampersands:

request_id	query_params
1	user_id=42&action=view&page=home
2	user_id=108&action=edit&page=settings
3	user_id=7&action=delete&page=profile

	SELECT
	request_id,
	SPLIT_TO_MAP(query_params, '&', '=') AS params_map,
	SPLIT_TO_MAP(query_params, '&', '=') 'user_id' AS user_id,
	SPLIT_TO_MAP(query_params, '&', '=') 'action' AS action
	FROM api_calls

Result

request_id	params_map	user_id	action
1	{user_id=42, action=view, page=home}	42	view
2	{user_id=108, action=edit, page=settings}	108	edit
3	{user_id=7, action=delete, page=profile}	7	delete

SPLIT_TO_MAP parses the query string into a map. The bracket notation ['user_id'] extracts the value for that specific key. This converts unstructured text into structured, queryable data without complex string manipulation.

Parsing Config Strings

Configuration data often arrives as delimited strings from external systems. SPLIT_TO_MAP handles various delimiter formats:

	SELECT
	config_id,
	SPLIT_TO_MAP(settings, ',', ':') 'timeout' AS timeout_ms,
	SPLIT_TO_MAP(settings, ',', ':') 'retry' AS retry_count
	FROM app_configs

Result

config_id	timeout_ms	retry_count
prod	5000	3
dev	30000	1
staging	10000	2

Here the entry delimiter is a comma and the key-value delimiter is a colon. The function adapts to whatever format your source data uses. You define the parsing rules, and SPLIT_TO_MAP applies them consistently.

Delimiter Format Variants

The same function handles different delimiter conventions. The key is matching the delimiters to your data format. Common patterns include URL-style (&/=), config-style (,/:), and pipe-style (|/=).

TIP

Accessing a non-existent key returns NULL, not an error. Always validate that expected keys exist before building logic on their values. Use COALESCE() to provide default values for missing keys.

Advanced Usage

Beyond simple extraction, SPLIT_TO_MAP results can be used directly in queries for filtering, joining, and aggregation.

Filtering by Map Values

Once parsed, map values can be used in WHERE clauses for filtering:

	SELECT
	request_id,
	query_params
	FROM api_calls
	WHERE SPLIT_TO_MAP(query_params, '&', '=') 'action' = 'delete'

Result

request_id	query_params
3	user_id=7&action=delete&page=profile

This query finds all requests where the action parameter equals "delete". The map extraction happens inline, turning a text search problem into a structured data query.

Best Practices

Choosing the right tool matters. SPLIT_TO_MAP shines with structured key-value data but is not a replacement for a proper JSON parser.

When to Use SPLIT_TO_MAP()

SPLIT_TO_MAP is particularly useful when working with serialized key-value data stored as strings.

URL Query Strings

Parse parameters from web logs and analytics data

Configuration Values

Extract settings from serialized config strings

External API Data

Process key-value pairs from API responses

Embedded Filters

Filter records based on embedded parameter values

Understanding the execution model helps you predict behavior with duplicate keys and malformed input.

How SPLIT_TO_MAP() Executes

SPLIT_TO_MAP scans the string once, splitting on delimiters
Returns empty map if input is empty; returns NULL if input is NULL
Duplicate keys result in the last value winning; key lookup is case-sensitive

This function is heavily used in analytics pipelines that process URL-encoded or parameter-style data.

> Complete this query to parse a URL query string into a map and extract the action parameter.

SELECT
  call_id,
  ___(endpoint,  ___,  ___) 'action' AS action
FROM api_calls

SPLIT_TO_MAP

'&'

'='

SPLIT

','

SPLIT_TO_MAP converts flat key-value strings into queryable maps, turning text parsing into structured data access.

The entry delimiter separates individual key-value pairs, while the key-value delimiter separates each key from its value.

Accessing a non-existent key in the resulting map returns NULL, not an error. Use COALESCE to provide defaults for missing keys.

String Building with CONCAT and LPAD()

Daily Life

Interviews

Format strings with padding and alignment

CONCAT combines multiple strings into a single result. Combined with LPAD for zero-padding and CAST for number formatting, you can build formatted strings with precise control over their appearance. This provides the flexibility to create display strings, report labels, and standardized identifiers directly within SQL queries.

The syntax is CONCAT(value1, value2, value3, ...). The LPAD(string, length, pad_char) function pads a string to a fixed width by prepending pad characters. ROUND(number, decimals) rounds to a specific number of decimal places.

Building Dynamic Strings

CONCAT is the foundation for building formatted output. It assembles literal text and column values in any order.

	SELECT
	product_name,
	price,
	CONCAT('$', ROUND(price, 2)) AS display_price
	FROM products

Result

product_name	price	display_price
Keyboard	89.9	$89.9
Monitor	299	$299
Mouse	24.999	$25

CONCAT combines a dollar sign literal with the rounded price. The ROUND function ensures prices display with consistent precision.

Currency Formatting

ROUND ensures monetary values display with exactly two decimal places. Combine it with CONCAT and a dollar sign prefix:

	SELECT
	product_name,
	price,
	CONCAT('$', ROUND(price, 2)) AS display_price
	FROM products

Result

product_name	price	display_price
Keyboard	89.9	$89.90
Monitor	299	$299.00
Mouse	24.999	$25.00

The ROUND function with 2 decimal places ensures consistent formatting. This pattern works reliably across different numeric precisions in your source data.

Zero-Padded Identifiers

LPAD pads a value with leading zeros to create fixed-width identifiers that sort correctly and display consistently:

	SELECT
	order_id,
	CONCAT(
	'ORD-',
	LPAD(
	CAST(order_id AS VARCHAR),
	6,
	'0'
	)
	) AS padded_id
	FROM orders

Result

order_id	padded_id
1	ORD-000001
42	ORD-000042
1000	ORD-001000

LPAD pads the order ID to 6 characters using zeros. Combined with CONCAT, this creates uniform identifiers like ORD-000042.

Building Complex Strings

CONCAT assembles multiple values in order, creating descriptive labels:

	SELECT
	CONCAT(
	customer_name,
	' placed order #',
	CAST(order_id AS VARCHAR),
	' for $',
	ROUND(total, 2)
	) AS order_summary
	FROM orders

Result

order_summary
Alice placed order #1042 for $156.99
Bob placed order #1043 for $89.50
Carol placed order #1044 for $234.00

Multiple CONCAT arguments combine text literals with column values. CAST converts integers to strings so CONCAT can join them.

Best Practices

Building formatted strings in SQL is powerful, but readability matters. Use clear variable names and keep string operations simple.

When to Use String Building

CONCAT with LPAD and ROUND excels at producing human-readable output with consistent styling.

CurrencyZero-PadFilenamesSummariesReports

Currency

Price Display

Round + CONCAT prefix

Zero-Pad

Padded IDs

LPAD for fixed width

Filenames

Name Patterns

CONCAT + date parts

Summaries

Data Labels

Multi-value CONCAT

Reports

Number Format

ROUND for decimals

Always CAST numeric values to VARCHAR before concatenating. Mixing types in CONCAT requires explicit conversion.

CONCAT with NULL values produces NULL output. If any input is NULL, the result is NULL. Use COALESCE to provide default values for potentially NULL columns.

> Complete this query to format a price as a dollar amount with two decimal places.

SELECT
  product_name,
  ___('$', ROUND(___, 2)) AS display_price
FROM products

CONCAT

price

FORMAT

total

CONCAT combines values in the order provided. For numeric formatting, use ROUND to control decimal places.

Always CAST non-string values to VARCHAR when concatenating. This ensures consistent behavior across different data types.

LPAD pads strings to fixed widths using a specified character, ideal for creating sortable identifiers.

TO_BASE64() / FROM_BASE64()

Daily Life

Interviews

Encode and decode binary data as text

Base64 encoding converts binary data into a text representation using 64 printable ASCII characters. TO_BASE64 encodes binary data to Base64 text. FROM_BASE64 decodes Base64 text back to binary. These functions enable safe storage and transmission of binary data through text-only channels.

The syntax is TO_BASE64(varbinary) for encoding and FROM_BASE64(varchar) for decoding. Base64 uses characters A-Z, a-z, 0-9, plus (+), and slash (/). The encoded output is approximately 33% larger than the original binary because every 3 bytes become 4 characters.

Basic Concepts

Base64 solves a fundamental problem: binary data contains bytes that are not valid text characters. Attempting to store raw binary in a text field corrupts the data. Base64 transforms binary into safe text that survives any text processing.

Original Text	Base64 Encoded
Hello	SGVsbG8=
SQL	U1FM
Data!	RGF0YSE=

The = characters at the end are padding. Base64 works in groups of 4 characters, and padding fills incomplete groups. The encoded result contains only letters, numbers, plus, slash, and equals.

Encoding & Decoding

To encode a text string, first convert it to binary using TO_UTF8, then encode to Base64:

	SELECT
	message,
	TO_BASE64(TO_UTF8(message)) AS encoded
	FROM messages

Result

message	encoded
Secret data	U2VjcmV0IGRhdGE=
API key: xyz	QVBJIGtleTog eHl6
Password123	UGFzc3dvcmQxMjM=

TO_UTF8 converts the string to bytes, then TO_BASE64 converts those bytes to Base64 text. This two-step process is required because TO_BASE64 expects binary input, not string input.

Decoding Base64 to Text

Decoding reverses the process: FROM_BASE64 produces binary, then FROM_UTF8 converts to text:

Example: FROM_UTF8(FROM_BASE64('SGVsbG8gV29ybGQ=')) returns 'Hello World'. The decoding chain must reverse the encoding chain exactly.

TIP

Base64 is encoding, not encryption. Anyone can decode Base64 data. It provides safe transmission but zero confidentiality. Never use Base64 as a security measure.

Best Practices

Many APIs require Base64-encoded payloads or return Base64-encoded responses:

	SELECT
	api_response_id,
	FROM_UTF8(
	FROM_BASE64(response_body)
	) AS decoded_response
	FROM api_calls
	WHERE response_encoded = TRUE

Result

api_response_id	decoded_response
resp_001	{"status": "success", "data": [...]}
resp_002	{"status": "error", "code": 404}

Log tables often store API responses in Base64 to handle binary content safely. This query decodes them for analysis. The same pattern works for request bodies, file contents, or any binary data stored as text.

Base64 Function Uses

TO_BASE64 and FROM_BASE64 serve opposite purposes in data pipelines, often used at system boundaries.

TO_BASE64() Uses

Store binary in text columns
Transmit via text protocols
Prepare data for APIs
Encode file contents

The reverse operation covers the decoding side of the same pipeline:

FROM_BASE64() Uses

Decode API responses
Read stored binary data
Process encoded payloads
Restore original content

Base64 encoding expands data by approximately 33%, trading size for text-safe representation.

How Base64 Executes

TO_BASE64 processes 3 bytes at a time, outputting 4 characters
FROM_BASE64 reverses: 4 characters to 3 bytes; invalid input causes errors
NULL input produces NULL output; 33% size overhead

Base64 encoding is ubiquitous in modern web architectures.

> Complete this query to decode a Base64-encoded API response back to readable text.

SELECT
  call_id,
  ___(___(err_msg)) AS decoded
FROM api_calls

FROM_BASE64

FROM_UTF8

TO_BASE64

TO_UTF8

The decoding chain must reverse the encoding chain exactly: FROM_BASE64 produces bytes, FROM_UTF8 interprets those bytes as text.

Base64 is encoding, not encryption. Anyone can decode Base64 data, so it provides zero confidentiality.

Base64-encoded data is approximately 33% larger than the original because every 3 bytes become 4 Base64 characters.

TO_HEX() / FROM_HEX()

Daily Life

Interviews

Convert between hex and readable formats

Hexadecimal (hex) encoding represents binary data using characters 0-9 and A-F. TO_HEX converts binary data to a hexadecimal string. FROM_HEX converts a hexadecimal string back to binary. Each byte becomes exactly two hex characters, making hex useful for debugging and byte-level inspection.

The syntax is TO_HEX(varbinary) for encoding and FROM_HEX(varchar) for decoding. Hex encoding doubles the size of the data (each byte becomes two characters) but provides a direct byte-by-byte representation that is easy to read and manipulate.

Basic Concepts

Hexadecimal uses 16 symbols: 0-9 represent values 0-9, and A-F represent values 10-15. Two hex digits represent one byte (values 0-255).

Character	ASCII Value	Hex Value
A	65	41
B	66	42
Z	90	5A
a	97	61
0	48	30

The letter "A" has ASCII value 65, which is 41 in hexadecimal (4*16 + 1 = 65). This direct byte-to-hex mapping makes hex ideal for inspecting exactly what bytes a string contains.

Encoding & Decoding

To see the hex representation of text, first convert to bytes with TO_UTF8:

	SELECT
	text_value,
	TO_HEX(TO_UTF8(text_value)) AS hex_bytes
	FROM ab_results

Result

text_value	hex_bytes
Hello	48656C6C6F
SQL	53514C
AB	4142

Each character becomes two hex digits. "Hello" is 5 characters, producing 10 hex digits. Reading the hex, 48 is "H", 65 is "e", and so on. This visibility into the actual bytes helps debug encoding issues.

Decoding Hex to Text

Decoding reverses the hex encoding. FROM_UTF8(FROM_HEX('44617461')) returns 'Data'. The hex string is converted to bytes, then those bytes are interpreted as UTF-8 text.

Best Practices

Hex encoding reveals hidden characters and encoding problems that are invisible in normal display:

	SELECT
	user_input,
	LENGTH(user_input) AS visible_length,
	LENGTH(
	TO_HEX(TO_UTF8(user_input))
	) / 2 AS actual_bytes
	FROM user_data

Result

user_input	visible_length	actual_bytes
Hello	5	5
Hello	6	6
Café	4	5

The word "Cafe" with an accented e has 4 visible characters but 5 bytes because the accented e uses 2 bytes in UTF-8. Hex encoding exposes this difference, helping diagnose why string comparisons fail or lengths seem wrong.

Web colors like #FF5733 are hex representations of RGB values. FF is red (255), 57 is green (87), 33 is blue (51). The same hex encoding principle applies to color representation.

When to Use Hex Functions

Hex encoding is useful when you need to inspect or work with data at the byte level.

Debug Encoding

Inspect byte sequences to diagnose encoding issues

Color Codes

Work with RGB color codes and hash values

Byte Validation

Validate data at the byte level for integrity

Hex output is always exactly twice the length of the input bytes, making size calculations predictable.

TO_HEX processes each byte, outputting two characters. FROM_HEX expects pairs of valid hex characters (0-9, A-F). Hex input is case-insensitive, and the output is exactly double the input size.

> Complete this query to convert text to its hexadecimal byte representation.

SELECT
  kv_value,
  ___(___(kv_value)) AS hex_bytes
FROM kv_store

TO_HEX

TO_UTF8

FROM_HEX

FROM_UTF8

The encoding chain for hex is always text -> TO_UTF8 -> TO_HEX. Each byte becomes exactly two hex characters.

TO_HEX encoding is useful for debugging encoding issues because it reveals the exact bytes stored in a string.

Hex output is always exactly twice the byte length of the input, making size calculations predictable.

TO_UTF8() / FROM_UTF8()

Daily Life

Interviews

Handle character encoding conversions

UTF-8 is the dominant character encoding on the internet. TO_UTF8 converts a string to its UTF-8 byte sequence (varbinary). FROM_UTF8 converts a UTF-8 byte sequence back to a string. These functions bridge between text and binary representations, enabling precise encoding control.

The syntax is TO_UTF8(varchar) to get bytes and FROM_UTF8(varbinary) to get text. UTF-8 uses 1 to 4 bytes per character: ASCII characters use 1 byte, common international characters use 2-3 bytes, and emojis use 4 bytes. This variable-length encoding is efficient and universally supported.

Basic Concepts

Different characters require different numbers of bytes in UTF-8:

UTF-8 Byte Sequences

Hello: 5 characters, 5 bytes (ASCII, 1 byte each)
é (accented e): 1 character, 2 bytes
中 (Chinese): 1 character, 3 bytes
😀 (emoji): 1 character, 4 bytes

ASCII letters use 1 byte each. Accented European characters use 2 bytes. East Asian characters typically use 3 bytes. Emojis use 4 bytes. Understanding this helps predict storage requirements and debug length mismatches.

Encoding & Decoding

TO_UTF8 is required before any byte-level operation like Base64 or hex encoding:

Example: LENGTH(TO_UTF8(text_col)) returns the byte count for a text column, which may differ from character count for non-ASCII text. FROM_UTF8(binary_col) interprets a byte sequence as UTF-8 text.

Chaining Encodings

UTF-8 functions are the bridge between text and binary encodings:

	SELECT
	original_text,
	TO_BASE64(TO_UTF8(original_text)) AS base64,
	FROM_UTF8(
	FROM_BASE64(
	TO_BASE64(TO_UTF8(original_text))
	)
	) AS roundtrip
	FROM content_items

Result

original_text	base64	roundtrip
Hello World	SGVsbG8gV29ybGQ=	Hello World
Test 123	VGVzdCAxMjM=	Test 123

The pattern is always: text -> TO_UTF8 -> bytes -> encoding function. Decoding reverses: encoded -> decoding function -> bytes -> FROM_UTF8 -> text. This chain preserves the original data exactly.

Best Practices

FROM_UTF8 fails on invalid byte sequences. Some databases offer an optional replacement parameter:

FROM_UTF8(binary_data, '') with a second parameter specifies what to substitute for invalid bytes. An empty string removes them. A question mark or similar character makes corruption visible. This is useful when processing data from systems with inconsistent encodings.

TIP

When importing data from external sources, validate encoding early. Converting to UTF-8 and back (round-tripping) reveals corruption. Characters that do not survive the round-trip indicate encoding problems.

UTF-8 Function Uses

UTF-8 functions bridge text and binary representations, essential for encoding chains.

EncodingByte SizeBinaryValidationBridge

Encoding

Byte Conversion

Prepare text for Base64

Byte Size

Storage Check

Measure actual byte count

Binary

Binary Bridge

Convert text to varbinary

Validation

Verify Encoding

Round-trip integrity test

Bridge

Chain Functions

Link text and byte layers

UTF-8 encoding uses variable-width bytes, so international characters require more space than ASCII.

How UTF-8 Functions Execute

TO_UTF8 encodes characters to 1-4 bytes each
FROM_UTF8 decodes UTF-8 bytes; invalid sequences cause errors
NULL input produces NULL output; byte length depends on characters

UTF-8 dominates the modern web and powers international text processing at every major tech company.

These guidelines help you avoid common pitfalls when working with encoding and formatting functions.

✓Do

Always chain TO_UTF8 before TO_BASE64 or TO_HEX for text
Validate encoding integrity with round-trip tests
Use CONCAT and LPAD for consistent number and currency display
Handle missing map keys with COALESCE defaults
Document the encoding chain in complex transformations

✗Don't

Confuse Base64 encoding with encryption (it provides no security)
Assume character count equals byte count for international text
Mix encoding functions without proper conversion steps
Use hardcoded format specifiers that do not match data types
Skip error handling for FROM_UTF8 on untrusted input

Advanced string encoding and transformation techniques require careful practice to master. Put your skills to the test with real-world challenges.

> Complete this query to measure the actual byte size of text data.

SELECT
  kv_value,
  ___(___(kv_value)) AS byte_count
FROM kv_store

LENGTH

TO_UTF8

CHAR_LENGTH

FROM_UTF8

Character count and byte count differ for non-ASCII text. The word Café has 4 characters but 5 bytes because the accented é uses 2 bytes.

LENGTH(TO_UTF8(col)) is the standard way to measure actual storage size, essential for capacity planning with international text.

When importing data from external sources, round-tripping through TO_UTF8 and FROM_UTF8 reveals encoding corruption.

❯❯❯PUTTING IT ALL TOGETHER

> You are a data engineer at Zendesk extracting structured product codes and issue categories from free-text customer complaint fields so that tickets can be automatically routed to the correct support queue.

SPLIT_TO_MAP() parses the URL-encoded metadata appended to each complaint record by the intake API, converting key-value pairs into queryable map fields.

CONCAT() with LPAD() builds zero-padded ticket identifiers and standardized display strings for the routing dashboard using string concatenation.

TO_BASE64() and FROM_BASE64() encode and decode binary attachment payloads stored alongside complaints so they survive transmission through the text-only logging pipeline.

TO_UTF8() converts complaint text to bytes before TO_HEX() inspection, revealing hidden non-printable characters that cause downstream parsing failures in the routing engine.

KEY TAKEAWAYS

SPLIT_TO_MAP parses key-value strings into queryable map structures using customizable delimiters

SPLIT_PART() extracts specific parts from delimited strings using 1-based indexing

CONCAT() with LPAD() builds formatted strings with controlled padding and decimal places

Base64 encoding converts binary to text-safe format with 33% size overhead

Hex encoding shows exact byte values; each byte becomes two hex characters

TO_UTF8 / FROM_UTF8 bridge between text strings and binary byte sequences

Always use TO_UTF8 before TO_BASE64 or TO_HEX when encoding text

Decoding must reverse the encoding chain: FROM_BASE64 then FROM_UTF8

Production-grade text transformations

Encoding, formatting, and production-grade string handling

Category: SQL
Difficulty: advanced
Duration: 20 minutes
Challenges: 0 hands-on challenges

Topics covered: SPLIT_TO_MAP(), String Building with CONCAT and LPAD(), TO_BASE64() / FROM_BASE64(), TO_HEX() / FROM_HEX(), TO_UTF8() / FROM_UTF8()

Lesson Sections

SPLIT_TO_MAP() (concepts: sqlSplitToMap)
Basic Syntax Parsing Query Strings URL query strings are the most common use case. They contain parameters in key=value format separated by ampersands: Parsing Config Strings Delimiter Format Variants The same function handles different delimiter conventions. The key is matching the delimiters to your data format. Common patterns include URL-style (&/=), config-style (,/:), and pipe-style (|/=). Advanced Usage Filtering by Map Values This query finds all requests where the action parameter equal
String Building with CONCAT and LPAD() (concepts: sqlStringBuilding)
Building Dynamic Strings Currency Formatting Zero-Padded Identifiers Building Complex Strings Best Practices When to Use String Building
TO_BASE64() / FROM_BASE64() (concepts: sqlStringBuilding)
Basic Concepts Base64 solves a fundamental problem: binary data contains bytes that are not valid text characters. Attempting to store raw binary in a text field corrupts the data. Base64 transforms binary into safe text that survives any text processing. The = characters at the end are padding. Base64 works in groups of 4 characters, and padding fills incomplete groups. The encoded result contains only letters, numbers, plus, slash, and equals. Encoding & Decoding Decoding Base64 to Text Best P
TO_HEX() / FROM_HEX() (concepts: sqlCast)
Basic Concepts Hexadecimal uses 16 symbols: 0-9 represent values 0-9, and A-F represent values 10-15. Two hex digits represent one byte (values 0-255). The letter "A" has ASCII value 65, which is 41 in hexadecimal (4*16 + 1 = 65). This direct byte-to-hex mapping makes hex ideal for inspecting exactly what bytes a string contains. Encoding & Decoding Each character becomes two hex digits. "Hello" is 5 characters, producing 10 hex digits. Reading the hex, 48 is "H", 65 is "e", and so on. This visi
TO_UTF8() / FROM_UTF8() (concepts: sqlStringBuilding)
Basic Concepts Different characters require different numbers of bytes in UTF-8: ASCII letters use 1 byte each. Accented European characters use 2 bytes. East Asian characters typically use 3 bytes. Emojis use 4 bytes. Understanding this helps predict storage requirements and debug length mismatches. Encoding & Decoding Chaining Encodings UTF-8 functions are the bridge between text and binary encodings: Best Practices UTF-8 Function Uses UTF-8 functions bridge text and binary representations, es