Path Traversal

A Brief Overview

When a web page requires certain resources from the server, it sends an HTTP request containing a resource identifier, which is sometimes a file name. The server then retrieves the resource using the file system API.

For example, a web page may have an <img> element like this:

<img src="loadImage?filename=001.png">

To load the resource, the browser sends an HTTP request to the server:

GET /loadImage?filename=001.png HTTP/1.1
Host: your.server

The server reads the URL, extracts the filename parameter and constructs a path inside the images directory:

from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import urlparse, parse_qs
import os.path

current_folder = os.path.dirname(os.path.abspath(__file__))

image_folder = os.path.join(current_folder, "images")

class SimpleHander(BaseHTTPRequestHandler):
    def do_GET(self):
        # Parse the request target. urlparse splits it into parts
        # (path, query string, etc.).
        parsed = urlparse(self.path)

        # Only handle requests where the path is "/loadImage"
        if parsed.path == "/loadImage":
            # parse_qs extracts query parameters as a dictionary:
            # {'filename': ['001.png']}
            query = parse_qs(parsed.query)

            # Get the first value of filename parameter or None if not present.
            filename = query.get("filename", [None])[0]

            if filename:
                target_path = os.path.join(image_folder, filename)
                print(f"Resolved path: {os.path.abspath(target_path)}\n")
                try:
                    with open(target_path, "rb") as f:
                        # Read all bytes from the file
                        content = f.read()

                    self.send_response(200)
                    self.send_header("Content-Type", "application/octet-stream")
                    self.end_headers()

                    # Send the file content back to the client
                    self.wfile.write(content)
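                except FileNotFoundError:
                    self.send_response(404)
                    self.end_headers()
                    self.wfile.write(b"File not found")
            else:
                # Without this branch, a request lacking "filename" gets no response.
                self.send_response(400)
                self.end_headers()
                self.wfile.write(b"Missing filename")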
        else:
            self.send_response(404)
            self.end_headers()
            self.wfile.write(b"Unknown path")

if __name__ == "__main__":
    server = HTTPServer(("localhost", 8080), SimpleHander)
    print("Serving on http://localhost:8080")
    server.serve_forever()

Because the server simply joins the user-supplied filename onto the base path without validation, an attacker can supply a traversal string such as ../server.py, which climbs out of the images directory and exposes the server's source code (assuming the script above is saved as server.py next to the images folder). With more ../ segments, the same request can reach sensitive files like /etc/passwd.

curl -v -w "\n---Done---\n" "http://localhost:8080/loadImage?filename=../server.py"
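
To see why the join is dangerous, the path resolution can be reproduced offline. The following is a minimal sketch that assumes a hypothetical layout in which the script lives in /srv/app and the images in /srv/app/images. It uses posixpath so that it behaves the same on any platform, and it ignores symlinks.

```python
import posixpath

# Hypothetical layout (an assumption for illustration only).
IMAGE_FOLDER = "/srv/app/images"

def resolve(filename: str) -> str:
    """Join the filename onto the base like the vulnerable handler does,
    then normalize the result to show which file would be opened."""
    return posixpath.normpath(posixpath.join(IMAGE_FOLDER, filename))

for name in ["001.png", "../server.py", "../../../etc/passwd"]:
    print(f"{name!r:25} -> {resolve(name)}")
```

Each ../ segment removes one directory from the joined path, so enough of them reach the filesystem root regardless of where the base directory sits.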

Other Vulnerabilities

A straightforward fix might be to reject any filename that contains .. or ../ in it. However, this check can still be bypassed, and the server remains exposed, through the following attack vectors:

os.path.join discards the base directory if the second argument is an absolute path, so any file readable by the server process can be requested with /loadImage?filename=/path/to/secret. For example, os.path.join(image_folder, "/etc/passwd") returns /etc/passwd.
If the server runs on Windows, path segments can also be separated by \. Filtering only the literal ../ does not prevent Windows-style traversal such as /loadImage?filename=..\windows\system.ini.
Symlinks can hide a traversal behind a perfectly “safe” filename. If an attacker can create or replace files inside the image folder, a symlink placed there lets them read files outside the intended directory. For example, assume the attacker can get a symlink written into that folder. A plain multipart upload sends the contents of the link's target rather than the link itself, so this sketch assumes a hypothetical upload endpoint that unpacks archives and preserves links:
```
ln -s /etc/passwd profile.png
tar -cf images.tar profile.png
curl -F "file=@images.tar" https://your-server/upload
```
If the server extracts the archive naively, the attacker then visits https://your-server/loadImage?filename=profile.png. Instead of reading an image from the image folder, the server follows the link, opens /etc/passwd, and returns its contents to the attacker.
If the handler raises an exception it does not catch (for example, open() raises IsADirectoryError on Linux when the path is a directory), the exception bubbles up and the handler terminates without sending a response. Repeated malformed requests can trigger such exceptions over and over, tying up the request-processing loop or worker slots and effectively denying service to legitimate clients.
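
The first two bypasses can be checked without running a server. The sketch below uses a hypothetical naive filter (naive_is_safe, which is not part of the server above) together with posixpath and ntpath, the POSIX and Windows flavors of os.path, so the results are the same on any platform. The base directories are made-up examples.

```python
import ntpath      # Windows flavor of os.path
import posixpath   # POSIX flavor of os.path

def naive_is_safe(filename: str) -> bool:
    """Hypothetical naive filter: reject only the literal '../' sequence."""
    return "../" not in filename

# The filter catches the textbook payload...
print(naive_is_safe("../server.py"))                 # False

# ...but an absolute path passes it, and join() then drops the base directory.
absolute = "/etc/passwd"
print(naive_is_safe(absolute))                       # True
print(posixpath.join("/srv/app/images", absolute))   # /etc/passwd

# Backslash-separated traversal also passes the filter. On Windows it still
# climbs out of the (hypothetical) base directory.
windows = r"..\windows\system.ini"
print(naive_is_safe(windows))                        # True
joined = ntpath.join(r"C:\app\images", windows)
print(ntpath.normpath(joined))                       # C:\app\windows\system.ini
```

Both payloads pass the filter because it inspects the text of the filename rather than the path that the operating system will finally open.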

How to Mitigate

The best solution is to avoid passing user-supplied input to filesystem APIs entirely. However, when that is not possible, the following steps can reliably prevent path traversal:

Define a trusted base directory (image_folder).
Build the candidate path using join.
Resolve it to an absolute, normalized path (realpath).
Verify that the resolved path is still inside the base directory.
Optionally: reject symlinks, enforce allowed extensions, etc.

from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import urlparse, parse_qs
import os
import os.path

current_folder = os.path.dirname(os.path.abspath(__file__))

# 1. Canonical base directory for images
image_folder = os.path.join(current_folder, "images")
image_folder = os.path.realpath(image_folder)  # normalize & resolve symlinks

class SimpleHander(BaseHTTPRequestHandler):
    def do_GET(self):
        parsed = urlparse(self.path)

        if parsed.path == "/loadImage":
            query = parse_qs(parsed.query)
            filename = query.get("filename", [None])[0]

            if not filename:
                self.send_response(400)
                self.end_headers()
                self.wfile.write(b"Missing filename")
                return

            # 2. (Optional but recommended) Basic filename sanitization
            #    Reject path separators outright, if you don't need subdirectories.
            if os.path.sep in filename or (os.path.altsep and os.path.altsep in filename):
                self.send_response(400)
                self.end_headers()
                self.wfile.write(b"Invalid filename")
                return

            # 3. Construct candidate path inside image_folder
            candidate_path = os.path.join(image_folder, filename)

            # 4. Resolve to absolute real path (normalizes '..', follows symlinks)
            target_path = os.path.realpath(candidate_path)

            print(f"Resolved path: {target_path}\n")

            # 5. Ensure the resolved path is still inside image_folder
            #    This is the key path traversal guard.
            if not target_path.startswith(image_folder + os.sep):
                self.send_response(403)
                self.end_headers()
                self.wfile.write(b"Access denied")
                return

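            # 6. (Optional) Reject symlinks explicitly.
            #    Check candidate_path: target_path has already been resolved by
            #    realpath, so it is normally no longer a symlink itself.
            #    os.path.islink returns False for a missing file, so the normal
            #    404 handling below still applies.
            if os.path.islink(candidate_path):
                self.send_response(403)
                self.end_headers()
                self.wfile.write(b"Symlinks are not allowed")
                return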

            # 7. Finally, open the file safely
            try:
                with open(target_path, "rb") as f:
                    content = f.read()

                self.send_response(200)
                self.send_header("Content-Type", "application/octet-stream")
                self.end_headers()
                self.wfile.write(content)

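            except FileNotFoundError:
                self.send_response(404)
                self.end_headers()
                self.wfile.write(b"File not found")
            except (IsADirectoryError, PermissionError):
                # The name refers to a directory or an unreadable file.
                # Answer explicitly instead of letting the exception bubble up.
                self.send_response(403)
                self.end_headers()
                self.wfile.write(b"Access denied")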

        else:
            self.send_response(404)
            self.end_headers()
            self.wfile.write(b"Unknown path")

if __name__ == "__main__":
    server = HTTPServer(("localhost", 8080), SimpleHander)
    print("Serving on http://localhost:8080")
    server.serve_forever()
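
The containment check in step 5 can also be factored into a small helper that is easy to test in isolation. The following is a sketch rather than a drop-in replacement: resolve_inside is a hypothetical name, and it swaps the startswith comparison for os.path.commonpath, which compares whole path components.

```python
import os
import os.path
import tempfile
from typing import Optional

def resolve_inside(base_dir: str, filename: str) -> Optional[str]:
    """Return the real path of filename if it lies inside base_dir, else None.

    Hypothetical helper for illustration; base_dir is the trusted directory.
    """
    base = os.path.realpath(base_dir)
    target = os.path.realpath(os.path.join(base, filename))
    try:
        # commonpath compares whole components, so a sibling such as
        # ".../images_backup" does not count as being inside ".../images".
        inside = os.path.commonpath([base, target]) == base
    except ValueError:
        # Raised, for example, when the paths are on different Windows drives.
        return None
    # Also reject the base directory itself: only entries below it are served.
    if not inside or target == base:
        return None
    return target

# Small demonstration in a throwaway directory.
with tempfile.TemporaryDirectory() as root:
    images = os.path.join(root, "images")
    os.mkdir(images)
    with open(os.path.join(images, "001.png"), "wb") as f:
        f.write(b"demo")
    for name in ["001.png", "../server.py", "/etc/passwd", "."]:
        print(f"{name!r:16} -> {resolve_inside(images, name)}")
```

Returning None for every rejected case lets the caller answer with a single error response, whatever the reason for the rejection.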

Where the Information Leaks From

While studying this topic, a natural question arises: how do attackers know the structure of my file system well enough to escape the intended directory?

Here are some possible information sources:

URL structure & convention
- Many apps expose predictable paths: /images/27.jpg, /uploads/27.jpg, /static/img/27.jpg. If 27.jpg is requested through /image?file=27.jpg, attackers will assume there’s a base directory like .../images/ or .../uploads/.
- If the endpoint is /image?file=..., an attacker will try common parent folders used by the app (e.g. ../images/27.jpg, ../uploads/27.jpg).
Client-side code and public resources
- HTML, JavaScript, CSS, or config files (even sitemap.xml, robots.txt) may reveal directory names or filenames. Developers often leak resource paths in the front end.
- Inspecting the site (or its public repositories, such as on GitHub) often reveals exact locations.
Error messages and path disclosure
- Poorly configured servers or frameworks sometimes return error stacks or messages containing file system paths (e.g., /var/www/app/public/images/27.jpg). That immediately reveals the absolute path or at least the directory depth.
- Even 404 pages can reveal whether a path exists or not (different responses for …/images/27.jpg vs …/doesnotexist.jpg).
Directory indexing, sitemap and exposed backups
- If directory listing is enabled, the attacker can see the folder contents directly.
- Backup filenames (e.g., index.php~, config.bak) or .git/ exposure help reconstruct layout.
Probing/trial-and-error (brute-force with feedback)
- The attacker sends requests with increasing numbers of ../ (or nested/overlapped variants) and watches differences:
  - HTTP status (200 vs 404 vs 500),
  - Response body (presence of known text),
  - Response size,
  - Timing differences.
- Example logic: try file=27.jpg, file=../27.jpg, file=../../27.jpg, ... until a different response indicates you moved up a directory.
- Because web apps often return different responses when you walk above webroot or hit a protected file, attackers can infer depth by observing where responses change.
Known platform layout
- Many frameworks have well-known directory structures (e.g., WordPress, Django, Node apps). Knowing the framework often gives the likely relative depth from a public asset to the system root or config files.
Absolute-path acceptance
- Some vulnerable endpoints accept absolute paths directly (e.g., /etc/passwd), removing the need to know relative depth. Attackers try absolute paths as a first quick test.
Side channels
- If the webapp includes or processes files (e.g., includes a template file), attackers might detect successful inclusion by subsequent outputs or behavior changes (e.g., different header, different error).
- Out-of-band techniques (OAST/OOB) can be used to detect file reads indirectly if the server triggers network interactions when a particular file is referenced (rare but used in advanced attacks).
Public source code, repos, or leaks
- If any part of the app or its config is public (GitHub, CI logs), that often contains absolute or relative paths.
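
The probing logic described under trial-and-error can be illustrated without touching a real server. The sketch below is a simulation under stated assumptions: fake_fetch is a made-up stand-in for an HTTP request that returns a (status, body length) pair, and the files and base directory it imitates are hypothetical. It shows how the first change in that response signature reveals directory depth.

```python
import posixpath
from typing import Callable, Optional, Tuple

Signature = Tuple[int, int]  # (HTTP status, response body length)

# Made-up server state for the simulation (assumptions, not real data).
FAKE_BASE = "/srv/app/images"
FAKE_FILES = {
    "/srv/app/images/27.jpg": b"jpeg-bytes",
    "/etc/passwd": b"root:x:0:0:root:/root:/bin/sh\n",
}

def fake_fetch(filename: str) -> Signature:
    """Stand-in for a request to a vulnerable endpoint: join, normalize, look up."""
    path = posixpath.normpath(posixpath.join(FAKE_BASE, filename))
    body = FAKE_FILES.get(path)
    if body is None:
        return (404, len(b"File not found"))
    return (200, len(body))

def first_signature_change(fetch: Callable[[str], Signature],
                           target: str, max_depth: int = 8) -> Optional[int]:
    """Return the smallest number of '../' prefixes whose response differs
    from the unprefixed request, or None if nothing changes up to max_depth."""
    baseline = fetch(target)
    for depth in range(1, max_depth + 1):
        if fetch("../" * depth + target) != baseline:
            return depth
    return None

# The simulated base directory sits three levels below the root.
print(first_signature_change(fake_fetch, "etc/passwd"))  # 3
```

If every rejected or missing path produced the same response, the function would return None, which is why uniform error responses make this kind of inference harder.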