Path Traversal
- Author: Cookie
A Brief Overview
When a web page requires certain resources from the server, it sends an HTTP request containing a resource identifier, which is sometimes a file name. The server then retrieves the resource using the file system API.
For example, a web page may have an <img> element like this:
<img src="loadImage?filename=001.png">
To load the resource, the browser sends an HTTP request to the server:
GET /loadImage?filename=001.png HTTP/1.1
Host: your.server
The server reads the URL, extracts the filename parameter and constructs a path inside the images directory:
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import urlparse, parse_qs
import os.path

current_folder = os.path.dirname(os.path.abspath(__file__))
image_folder = os.path.join(current_folder, "images")


class SimpleHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Split the request target into its parts (path, query string, etc.).
        parsed = urlparse(self.path)
        # Only handle requests where the path is "/loadImage"
        if parsed.path == "/loadImage":
            # parse_qs extracts query parameters as a dictionary:
            # {'filename': ['001.png']}
            query = parse_qs(parsed.query)
            # Get the first value of the filename parameter, or None if not present.
            filename = query.get("filename", [None])[0]
            if filename:
                # VULNERABLE: the user-supplied value is joined without validation.
                target_path = os.path.join(image_folder, filename)
                print(f"Resolved path: {os.path.abspath(target_path)}\n")
                try:
                    with open(target_path, "rb") as f:
                        # Read all bytes from the file
                        content = f.read()
                    self.send_response(200)
                    self.send_header("Content-Type", "application/octet-stream")
                    self.end_headers()
                    # Send the file content back to the client
                    self.wfile.write(content)
                except FileNotFoundError:
                    self.send_response(404)
                    self.end_headers()
                    self.wfile.write(b"File not found")
            else:
                # Without this branch a request with no filename gets no response.
                self.send_response(400)
                self.end_headers()
                self.wfile.write(b"Missing filename")
        else:
            self.send_response(404)
            self.end_headers()
            self.wfile.write(b"Unknown path")


if __name__ == "__main__":
    server = HTTPServer(("localhost", 8080), SimpleHandler)
    print("Serving on http://localhost:8080")
    server.serve_forever()
Because the server simply joins the user-supplied filename onto the base path without validation, an attacker can supply a traversal string such as `../server.py`, which exposes the server's own source code (the script sits one level above the images directory), or add enough `../` segments to reach sensitive files like `/etc/passwd`.
curl -v -w "\n---Done---\n" "http://localhost:8080/loadImage?filename=../server.py"
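To see why such a request escapes the image folder, the following sketch replays the server's path handling outside the HTTP handler. It is only an approximation: the base directory `/srv/app/images` and the helper name `resolve_like_server` are made up for illustration, `posixpath` is used instead of `os.path` so the result is the same on any platform, and `normpath` stands in for the resolution the operating system performs when the file is opened (it ignores symlinks).

```python
import posixpath
from urllib.parse import urlparse, parse_qs

# Hypothetical deployment layout, chosen only for illustration.
IMAGE_FOLDER = "/srv/app/images"


def resolve_like_server(request_path):
    """Mimic the vulnerable handler: parse the query, join, then normalize."""
    query = parse_qs(urlparse(request_path).query)
    filename = query.get("filename", [None])[0]
    return posixpath.normpath(posixpath.join(IMAGE_FOLDER, filename))


benign = resolve_like_server("/loadImage?filename=001.png")
escaped = resolve_like_server("/loadImage?filename=../server.py")
# parse_qs percent-decodes values, so an encoded separator works as well.
encoded = resolve_like_server("/loadImage?filename=..%2Fserver.py")
deep = resolve_like_server("/loadImage?filename=../../../etc/passwd")

print(benign)   # /srv/app/images/001.png
print(escaped)  # /srv/app/server.py
print(encoded)  # /srv/app/server.py
print(deep)     # /etc/passwd
```

Only the first request stays inside the image folder. The other three resolve to paths the endpoint was never meant to serve.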
Other Vulnerabilities
A straightforward fix might be to reject any filename that contains `../` or `..`. However, this check can still be bypassed through the following attack vectors:
- `os.path.join` discards the base directory if the second argument is an absolute path. This means any readable file can be accessed using `/loadImage?filename=/path/to/secret`. For example, `os.path.join(image_folder, "/etc/passwd")` resolves to `/etc/passwd`.
- If the server runs on Windows, path segments are separated by `\`. Filtering only the literal `../` doesn't prevent Windows-style traversal such as `/loadImage?filename=..\\windows\\system.ini`.
- Traversal hidden behind symlinks remains exploitable. If an attacker can create or replace files inside the image folder, a symlink inside that directory lets them read outside the intended directory with a perfectly "safe" filename. For example, assume the attacker creates a link and uploads it:
  ln -s /etc/passwd profile.png
  curl -F "file=@profile.png" https://your-server/upload
  and the server naively stores it as a link. (A plain multipart upload normally sends the contents of the link's target, so in practice this usually requires something like an archive that the server extracts with links preserved.) The attacker then visits `https://your-server/loadImage?filename=profile.png`. Instead of reading `/uploads/images/profile.png`, the server follows the link, reads `/etc/passwd`, and returns that file to the attacker.
- If the server leaves exceptions unhandled, they may bubble up and terminate the handler without sending a response. Repeated malformed requests can trigger these exceptions faster than the server can recover, tying up the request-processing loop or worker slots and effectively denying service to legitimate clients.
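The absolute-path and Windows-separator bypasses can be checked without running a server. The sketch below is illustrative only: `naive_filter` is a hypothetical blacklist that rejects just the literal `../`, the base directories are invented, and `posixpath` / `ntpath` are used so both platforms' rules can be observed from any machine.

```python
import ntpath
import posixpath


def naive_filter(filename):
    """Hypothetical blacklist check: reject only the literal '../' sequence."""
    return "../" not in filename


# Absolute path: passes the filter, and join() drops the base directory.
absolute = "/etc/passwd"
posix_result = posixpath.join("/srv/app/images", absolute)

# Windows-style traversal: passes the filter, and ntpath treats '\' as a separator.
windows = "..\\windows\\system.ini"
nt_result = ntpath.normpath(ntpath.join("C:\\app\\images", windows))

print(naive_filter(absolute), posix_result)  # True /etc/passwd
print(naive_filter(windows), nt_result)      # True C:\app\windows\system.ini
```

Both inputs are accepted by the filter, yet neither resulting path lies inside the image folder. This is why the check has to be made on the resolved path instead of on the raw string.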
How to mitigate
The best solution is to avoid passing user-supplied input to filesystem APIs entirely. When that is not possible, the following steps can reliably prevent path traversal:
- Define a trusted base directory (`image_folder`).
- Build the candidate path using `join`.
- Resolve it to an absolute, normalized path (`realpath`).
- Verify that the resolved path is still inside the base directory.
- Optionally: reject symlinks, enforce allowed extensions, etc.
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import urlparse, parse_qs
import os

current_folder = os.path.dirname(os.path.abspath(__file__))
# 1. Canonical base directory for images
image_folder = os.path.join(current_folder, "images")
image_folder = os.path.realpath(image_folder)  # normalize & resolve symlinks


class SimpleHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        parsed = urlparse(self.path)
        if parsed.path == "/loadImage":
            query = parse_qs(parsed.query)
            filename = query.get("filename", [None])[0]
            if not filename:
                self.send_response(400)
                self.end_headers()
                self.wfile.write(b"Missing filename")
                return
            # 2. (Optional but recommended) Basic filename sanitization
            # Reject path separators outright, if you don't need subdirectories.
            if os.path.sep in filename or (os.path.altsep and os.path.altsep in filename):
                self.send_response(400)
                self.end_headers()
                self.wfile.write(b"Invalid filename")
                return
            # 3. Construct candidate path inside image_folder
            candidate_path = os.path.join(image_folder, filename)
            # 4. Resolve to absolute real path (normalizes '..', follows symlinks)
            target_path = os.path.realpath(candidate_path)
            print(f"Resolved path: {target_path}\n")
            # 5. Ensure the resolved path is still inside image_folder
            # This is the key path traversal guard.
            if not target_path.startswith(image_folder + os.sep):
                self.send_response(403)
                self.end_headers()
                self.wfile.write(b"Access denied")
                return
            # 6. (Optional) Reject symlinks explicitly.
            # Check the unresolved candidate: realpath() has already followed
            # any link, so target_path itself is never a symlink.
            if os.path.islink(candidate_path):
                self.send_response(403)
                self.end_headers()
                self.wfile.write(b"Symlinks are not allowed")
                return
            # 7. Finally, open the file safely
            try:
                with open(target_path, "rb") as f:
                    content = f.read()
                self.send_response(200)
                self.send_header("Content-Type", "application/octet-stream")
                self.end_headers()
                self.wfile.write(content)
            except OSError:
                # Covers missing files as well as directories and permission
                # errors, so no filesystem exception bubbles up unhandled.
                self.send_response(404)
                self.end_headers()
                self.wfile.write(b"File not found")
        else:
            self.send_response(404)
            self.end_headers()
            self.wfile.write(b"Unknown path")


if __name__ == "__main__":
    server = HTTPServer(("localhost", 8080), SimpleHandler)
    print("Serving on http://localhost:8080")
    server.serve_forever()
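The containment check can also be pulled out of the handler and exercised on its own. The sketch below is one possible variant, not the server's own code: `is_inside` is a hypothetical helper name, and it uses `os.path.commonpath` (which compares whole path components) in place of the string-prefix test. The files are created in a temporary directory purely for the demonstration.

```python
import os
import tempfile


def is_inside(base_dir, filename):
    """Return True if base_dir/filename resolves strictly inside base_dir."""
    base = os.path.realpath(base_dir)
    target = os.path.realpath(os.path.join(base, filename))
    try:
        # commonpath works on path components, not raw string prefixes.
        return target != base and os.path.commonpath([base, target]) == base
    except ValueError:
        # Raised on Windows when the two paths are on different drives.
        return False


with tempfile.TemporaryDirectory() as root:
    images = os.path.join(root, "images")
    os.mkdir(images)
    with open(os.path.join(images, "001.png"), "wb") as f:
        f.write(b"fake image")
    with open(os.path.join(root, "secret.txt"), "w") as f:
        f.write("secret")

    checks = {
        "001.png": is_inside(images, "001.png"),
        "../secret.txt": is_inside(images, "../secret.txt"),
        "/etc/passwd": is_inside(images, "/etc/passwd"),
    }
    try:
        # A link stored inside the folder that points outside of it.
        os.symlink(os.path.join(root, "secret.txt"),
                   os.path.join(images, "link.png"))
        checks["link.png"] = is_inside(images, "link.png")
    except (OSError, NotImplementedError):
        pass  # symlink creation may be unavailable (e.g. unprivileged Windows)

print(checks)
```

Only `001.png` should pass. The relative traversal, the absolute path, and the symlink are all rejected, because each resolves to a location outside the base directory.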
Where the Information Got Leaked
While studying this topic, a natural question arises: how do attackers know the structure of my file system well enough to escape the intended directory?
Here are some possible information sources:
- URL structure & convention
  - Many apps expose predictable paths: `/images/27.jpg`, `/uploads/27.jpg`, `/static/img/27.jpg`. If `27.jpg` is requested through `/image?file=27.jpg`, attackers will assume there's a base directory like `.../images/` or `.../uploads/`.
  - If the endpoint is `/image?file=...`, an attacker will try common parent folders used by the app (e.g. `../images/27.jpg`, `../uploads/27.jpg`).
- Client-side code and public resources
  - HTML, JavaScript, CSS, or config files (even `sitemap.xml`, `robots.txt`) may reveal directory names or filenames. Developers often leak resource paths in the front end.
  - Inspecting the site (or its GitHub, public repos) often gives exact locations.
- Error messages and path disclosure
  - Poorly configured servers or frameworks sometimes return error stacks or messages containing file system paths (e.g., `/var/www/app/public/images/27.jpg`). That immediately reveals the absolute path or at least the directory depth.
  - Even 404 pages can reveal whether a path exists or not (different responses for `…/images/27.jpg` vs `…/doesnotexist.jpg`).
- Directory indexing, sitemap and exposed backups
  - If directory listing is enabled, the attacker can see the folder contents directly.
  - Backup filenames (e.g., `index.php~`, `config.bak`) or `.git/` exposure help reconstruct the layout.
- Probing/trial-and-error (brute-force with feedback)
  - The attacker sends requests with increasing numbers of `../` (or nested/overlapped variants) and watches differences:
    - HTTP status (200 vs 404 vs 500),
    - Response body (presence of known text),
    - Response size,
    - Timing differences.
  - Example logic: try `file=27.jpg`, `file=../27.jpg`, `file=../../27.jpg`, ... until a different response indicates you moved up a directory.
  - Because web apps often return different responses when you walk above webroot or hit a protected file, attackers can infer depth by observing where responses change.
- Known platform layout
  - Many frameworks have well-known directory structures (e.g., WordPress, Django, Node apps). Knowing the framework often gives the likely relative depth from a public asset to the system root or config files.
- Absolute-path acceptance
  - Some vulnerable endpoints accept absolute paths directly (e.g., `/etc/passwd`), removing the need to know relative depth. Attackers try absolute paths as a first quick test.
- Side channels
  - If the webapp includes or processes files (e.g., includes a template file), attackers might detect successful inclusion by subsequent outputs or behavior changes (e.g., different header, different error).
  - Out-of-band techniques (OAST/OOB) can be used to detect file reads indirectly if the server triggers network interactions when a particular file is referenced (rare but used in advanced attacks).
- Public source code, repos, or leaks
  - If any part of the app or its config is public (GitHub, CI logs), that often contains absolute or relative paths.
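The probing logic from the list above (add one `../` at a time and watch for a changed response) can be illustrated with a purely local simulation. Everything here is an assumption made for the sketch: `fake_fetch` stands in for a vulnerable endpoint and touches no network, `FAKE_FILES` is an invented in-memory file system, and `config.ini` is a made-up target name.

```python
import posixpath

# Simulated server-side files; a stand-in for a real vulnerable endpoint.
FAKE_FILES = {
    "/var/www/app/public/images/27.jpg": b"image bytes",
    "/var/www/app/config.ini": b"secret settings",
}
BASE = "/var/www/app/public/images"


def fake_fetch(file_param):
    """Pretend to be a vulnerable handler: return (status, body length)."""
    path = posixpath.normpath(posixpath.join(BASE, file_param))
    body = FAKE_FILES.get(path)
    return (200, len(body)) if body is not None else (404, 0)


def probe_depth(fetch, target, max_depth=8):
    """Prepend '../' until the response differs from the 'not found' baseline."""
    baseline = fetch("definitely-missing-file")
    for depth in range(max_depth + 1):
        candidate = "../" * depth + target
        if fetch(candidate) != baseline:
            return depth, candidate
    return None


print(probe_depth(fake_fetch, "config.ini"))  # (2, '../../config.ini')
```

The same observation points to a defence: if the server rejects every path that resolves outside the base directory with an identical response, the status and size differences this loop relies on disappear.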