Use parse_url to build a URL whitelist filtering system

gitbox 2025-05-21

Web applications routinely handle URLs supplied by users. To guard against malicious links and unsafe redirects, it is often necessary to filter those URLs against a whitelist. PHP's parse_url function splits a URL into its components, which makes it straightforward to inspect and filter each part.

This article shows how to combine PHP's parse_url function with a whitelist to build a simple, practical URL filtering system.

1. What is parse_url?

parse_url is a built-in PHP function. It splits a URL into its components, such as scheme, host, port, path and query, and returns them as an associative array that contains only the components present in the URL. It does not validate the URL, and it returns false for seriously malformed input.

Sample code:

$url = "https://gitbox.net/path/to/resource?query=123";

$parts = parse_url($url);
print_r($parts);

Output:

Array
(
    [scheme] => https
    [host] => gitbox.net
    [path] => /path/to/resource
    [query] => query=123
)
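
When only one component is needed, parse_url also accepts an optional second argument that selects it. The following is a minimal sketch of that form and of the two failure cases a filter has to handle; the relative and malformed URLs are made up for illustration.

```php
<?php
// Ask for a single component instead of the whole array.
$host = parse_url("https://gitbox.net/path/to/resource?query=123", PHP_URL_HOST);
var_dump($host);    // "gitbox.net"

// A relative URL has no host, so the requested component is null.
$missing = parse_url("/path/to/resource", PHP_URL_HOST);
var_dump($missing); // NULL

// Seriously malformed input makes parse_url return false.
$broken = parse_url("http:///gitbox.net");
var_dump($broken);  // false
```

A filter should therefore treat both null and false as "no usable host" and reject the URL.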

2. The core idea of implementing URL whitelist filtering

The goal of a whitelist filtering system is to allow access only when the host of the submitted URL is on the whitelist. Every other URL is rejected.

Main steps:

  1. Use parse_url to parse the URL entered by the user.

  2. Get the host part of the URL.

  3. Check whether the host is in a predefined whitelist array.

  4. Allow or reject the URL based on the result.

3. Code example: URL whitelist filtering in PHP

Here is a complete example showing how to implement URL whitelist filtering with parse_url:

<?php
function isUrlAllowed(string $url, array $whitelist): bool {
    // Parse the URL into its components
    $parts = parse_url($url);

    if (!$parts || !isset($parts['host'])) {
        // The URL is malformed or has no host: deny access
        return false;
    }

    $host = strtolower($parts['host']);

    // Check whether the host is on the whitelist
    foreach ($whitelist as $allowedHost) {
        $allowedHost = strtolower($allowedHost);

        // Subdomains also match: allowing gitbox.net also allows sub.gitbox.net
        if ($host === $allowedHost || (substr($host, -strlen('.'.$allowedHost)) === '.'.$allowedHost)) {
            return true;
        }
    }

    return false;
}

// Define the whitelisted domain names
$whitelist = [
    "gitbox.net",
    "api.gitbox.net",
    "cdn.gitbox.net"
];

// Test URLs
$testUrls = [
    "https://gitbox.net/index.php",   // allowed: exact match
    "http://sub.gitbox.net/page",     // allowed: subdomain of gitbox.net
    "https://malicious.com/attack",   // denied: host not on the whitelist
    "https://api.gitbox.net/data",    // allowed: exact match
    "ftp://cdn.gitbox.net/resource"   // allowed: the scheme is not checked here
];

foreach ($testUrls as $url) {
    if (isUrlAllowed($url, $whitelist)) {
        echo "Access allowed: $url\n";
    } else {
        echo "Access denied: $url\n";
    }
}

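Code description:

  • The function isUrlAllowed determines whether the host of the input URL is on the whitelist.

  • parse_url extracts the host part of the URL. A URL that cannot be parsed or has no host is rejected.

  • The loop compares the host with each whitelist entry. A host matches if it equals the entry or ends with a dot followed by the entry, so every subdomain of gitbox.net is accepted. Because of this, the api.gitbox.net and cdn.gitbox.net entries are redundant while gitbox.net itself is listed.

  • The function returns a boolean value indicating whether access is allowed.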
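
The leading dot in the suffix comparison matters: without it, any host that merely ends with the same characters would pass. The sketch below isolates the comparison in a hypothetical helper named hostMatches (not part of the code above) and runs it against a few made-up lookalike hosts.

```php
<?php
// Hypothetical helper that isolates the host comparison used in isUrlAllowed.
function hostMatches(string $host, string $allowedHost): bool
{
    $host = strtolower($host);
    $suffix = '.' . strtolower($allowedHost);

    // Exact match, or the host ends with ".<allowedHost>".
    return $host === strtolower($allowedHost)
        || substr($host, -strlen($suffix)) === $suffix;
}

// Made-up hosts for illustration.
var_dump(hostMatches('sub.gitbox.net', 'gitbox.net'));      // true
var_dump(hostMatches('evilgitbox.net', 'gitbox.net'));      // false
var_dump(hostMatches('gitbox.net.evil.com', 'gitbox.net')); // false

// Userinfo trick: the part before "@" is the user, not the host.
$trickHost = parse_url('https://gitbox.net@evil.com/', PHP_URL_HOST);
var_dump($trickHost); // "evil.com"
```

Because the check runs on the host returned by parse_url rather than on the raw string, a whitelisted name placed in the userinfo part or in the path does not get the URL through.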

4. Further optimization suggestions

  • Protocol restrictions: Limit the allowed schemes, for example to http and https only. The example above does not check the scheme, so the ftp:// test URL is accepted.

  • Path filtering: After the host passes the whitelist check, also validate the path and query parameters to guard against path traversal.

  • Logging: Record rejected requests to support auditing and the investigation of security incidents.

  • Cache whitelist: If the whitelist is large or queried frequently, store it in a structure with fast lookups or cache the results to improve performance.
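
The suggestions above can be sketched roughly as follows. All function names and the log message format are hypothetical, and the host lookup here uses exact matches only (no subdomain matching) so that it can be a single array lookup. Treat it as a starting point under those assumptions, not a hardened implementation.

```php
<?php
// Sketch only: function names and the log message format are hypothetical.

/** Accept only http and https URLs by default. */
function isSchemeAllowed(string $url, array $allowedSchemes = ['http', 'https']): bool
{
    $scheme = parse_url($url, PHP_URL_SCHEME);
    // null means the scheme is missing, false means the URL could not be parsed.
    if (!is_string($scheme)) {
        return false;
    }
    return in_array(strtolower($scheme), $allowedSchemes, true);
}

/** Report whether the decoded path contains a ".." segment. */
function hasPathTraversal(string $url): bool
{
    $path = parse_url($url, PHP_URL_PATH);
    if (!is_string($path)) {
        return false; // no path to inspect
    }
    $segments = explode('/', str_replace('\\', '/', rawurldecode($path)));
    return in_array('..', $segments, true);
}

/** Build a lookup table once so each host check is a single isset() call. */
function buildHostSet(array $whitelist): array
{
    return array_flip(array_map('strtolower', $whitelist));
}

/** Combine host, scheme and path checks, and log every rejection. */
function checkUrl(string $url, array $hostSet): bool
{
    $host = parse_url($url, PHP_URL_HOST);
    $ok = is_string($host)
        && isset($hostSet[strtolower($host)])
        && isSchemeAllowed($url)
        && !hasPathTraversal($url);

    if (!$ok) {
        // Record rejected URLs for later auditing.
        error_log('Rejected URL: ' . $url);
    }
    return $ok;
}

$hostSet = buildHostSet(['gitbox.net', 'api.gitbox.net', 'cdn.gitbox.net']);
var_dump(checkUrl('https://api.gitbox.net/data', $hostSet));        // true
var_dump(checkUrl('ftp://cdn.gitbox.net/resource', $hostSet));      // false
var_dump(checkUrl('https://gitbox.net/a/%2e%2e/secret', $hostSet)); // false
```

The path is decoded before it is inspected because parse_url returns it as written, so an encoded segment such as %2e%2e would otherwise slip past a plain string comparison.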

5. Summary

Parsing a URL with PHP's parse_url function and then filtering on its host against a whitelist is an effective way to control which URLs an application accepts. The sample code in this article is simple and intuitive, suitable for quickly building a whitelist filter, and easy to extend and optimize as requirements grow.

With a correctly configured whitelist and the additional checks described above, the risk posed by malicious URLs can be reduced considerably.