In web development, we often need to parse URLs and extract their components for further processing or storage: for example, to analyze where users come from, to filter requests for certain domains, or to record the structure of each API request. PHP's built-in parse_url() function handles this task efficiently.
parse_url() splits a URL into its components: scheme, host, port, user, pass, path, query, and fragment. It only breaks the URL up; it does not validate it.
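The signature is as follows:
parse_url(string $url, int $component = -1): int|string|array|null|false
With the default $component of -1, it returns an associative array containing only the components that are present in the URL. For example: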
$url = 'https://gitbox.net:8080/path/to/resource.php?user=test&id=123#section1';
$parts = parse_url($url);
print_r($parts);
Output:
Array
(
    [scheme] => https
    [host] => gitbox.net
    [port] => 8080
    [path] => /path/to/resource.php
    [query] => user=test&id=123
    [fragment] => section1
)
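When only one component is needed, the optional $component argument avoids building the whole array. The following is a minimal sketch using the same example URL; it assumes nothing beyond the PHP_URL_* constants that ship with PHP.

```php
<?php
$url = 'https://gitbox.net:8080/path/to/resource.php?user=test&id=123#section1';

// Each PHP_URL_* constant selects a single component.
$host = parse_url($url, PHP_URL_HOST); // string "gitbox.net"
$port = parse_url($url, PHP_URL_PORT); // int 8080 (the port is returned as an integer)
$user = parse_url($url, PHP_URL_USER); // null, because this URL has no user part

var_dump($host, $port, $user);
```

A requested component that is missing from the URL comes back as null, so a null check is enough to tell "absent" from "present".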
parse_url() returns the query string as a single, unparsed string. To break it down into individual parameters, combine it with parse_str():
$query = $parts['query'] ?? '';
parse_str($query, $queryParams);
print_r($queryParams);
Output:
Array
(
    [user] => test
    [id] => 123
)
To make the parsed results easy to store and query, a table like the following can be used (MySQL as an example):
CREATE TABLE url_info (
    id INT AUTO_INCREMENT PRIMARY KEY,
    full_url TEXT NOT NULL,
    scheme VARCHAR(10),
    host VARCHAR(255),
    port INT,
    path TEXT,
    query TEXT,
    fragment VARCHAR(255),
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
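Next, parse a URL in PHP and insert the result:
$pdo = new PDO('mysql:host=localhost;dbname=your_database', 'username', 'password');
// Throw exceptions on database errors instead of failing silently.
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$url = 'https://gitbox.net:8080/path/to/resource.php?user=test&id=123#section1';
$parts = parse_url($url);
// parse_url() returns false for seriously malformed URLs.
if ($parts === false) {
    throw new InvalidArgumentException("Malformed URL: $url");
}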
$stmt = $pdo->prepare("
    INSERT INTO url_info (full_url, scheme, host, port, path, query, fragment)
    VALUES (:full_url, :scheme, :host, :port, :path, :query, :fragment)
");
$stmt->execute([
    ':full_url' => $url,
    ':scheme' => $parts['scheme'] ?? null,
    ':host' => $parts['host'] ?? null,
    ':port' => $parts['port'] ?? null,
    ':path' => $parts['path'] ?? null,
    ':query' => $parts['query'] ?? null,
    ':fragment' => $parts['fragment'] ?? null
]);
Beyond basic storage, the following features can be built on top of this table:
Indexing: add indexes on the host and path columns to speed up lookups (path is a TEXT column, so MySQL requires a prefix length for its index);
Source analysis: extract the utm_* parameters from query for campaign tracking;
Blacklist filtering: check whether host belongs to a blacklist.
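The source-analysis and blacklist ideas above can be sketched in a few lines of PHP (7.4 or later for the arrow function). The helper names extractUtmParams() and isBlacklistedHost(), the sample URL, and the in-memory blacklist array are all assumptions made for illustration; a real system would more likely load the blacklist from a table or a cache.

```php
<?php
// Hypothetical helper: collect the utm_* parameters from a URL's query string.
function extractUtmParams(string $url): array
{
    $query = parse_url($url, PHP_URL_QUERY);
    if (!is_string($query)) {
        return []; // no query string, or the URL could not be parsed
    }
    parse_str($query, $params);

    // Keep only the keys that start with "utm_".
    return array_filter(
        $params,
        fn ($key) => strncmp((string) $key, 'utm_', 4) === 0,
        ARRAY_FILTER_USE_KEY
    );
}

// Hypothetical helper: check the URL's host against an in-memory blacklist.
function isBlacklistedHost(string $url, array $blacklist): bool
{
    $host = parse_url($url, PHP_URL_HOST);
    if (!is_string($host)) {
        return false; // no host to check
    }
    // Host names are case-insensitive, so compare in lower case.
    return in_array(strtolower($host), $blacklist, true);
}

$url = 'https://gitbox.net/landing?utm_source=newsletter&utm_medium=email&id=123';

print_r(extractUtmParams($url));
var_dump(isBlacklistedHost($url, ['spam.example']));
```

For this sample URL, extractUtmParams() keeps utm_source and utm_medium and drops id, and the blacklist check returns false because gitbox.net is not in the list.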
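A few points to keep in mind:
For seriously malformed URLs, parse_url() returns false, so always check the return value before using it;
For internationalized domain names, the intl extension provides idn_to_ascii() to convert a Unicode host to its ASCII (Punycode) form and idn_to_utf8() to convert it back;
PHP has no built-in inverse of parse_url(). To rebuild a URL, concatenate the components manually; http_build_query() covers only the query-string part.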
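As a sketch of the last point, the snippet below parses a URL, changes one query parameter, and reassembles the result. buildUrl() is a hypothetical helper written for this example, not a PHP built-in, and it assumes the array has the shape returned by parse_url().

```php
<?php
// Hypothetical helper: reassemble a URL from a parse_url()-style array.
function buildUrl(array $parts): string
{
    $url = '';
    if (isset($parts['scheme'])) {
        $url .= $parts['scheme'] . ':';
    }
    if (isset($parts['host'])) {
        $url .= '//';
        if (isset($parts['user'])) {
            $url .= $parts['user'];
            if (isset($parts['pass'])) {
                $url .= ':' . $parts['pass'];
            }
            $url .= '@';
        }
        $url .= $parts['host'];
        if (isset($parts['port'])) {
            $url .= ':' . $parts['port'];
        }
    }
    $url .= $parts['path'] ?? '';
    if (isset($parts['query'])) {
        $url .= '?' . $parts['query'];
    }
    if (isset($parts['fragment'])) {
        $url .= '#' . $parts['fragment'];
    }
    return $url;
}

$original = 'https://gitbox.net:8080/path/to/resource.php?user=test&id=123#section1';
$parts = parse_url($original);
if ($parts === false) {
    throw new InvalidArgumentException("Malformed URL: $original");
}

// Change one query parameter, then rebuild the query string.
parse_str($parts['query'] ?? '', $params);
$params['id'] = 456;
$parts['query'] = http_build_query($params);

$rebuilt = buildUrl($parts);
echo $rebuilt, PHP_EOL;
// https://gitbox.net:8080/path/to/resource.php?user=test&id=456#section1
```

Note that http_build_query() URL-encodes values, so a rebuilt query string can differ byte-for-byte from the original even when no parameter was changed.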
With parse_url() and parse_str(), we can quickly extract the key parts of a URL and store them as structured data in a database. This simplifies later processing and provides a solid basis for data analysis and system expansion. Whether you are building a logging system or tracking user behavior, this technique can make the code more efficient and easier to maintain.