Crawling – PortSwigger

By Ed Robertson
May 6, 2022

The crawl phase of a scan involves navigating around the application, following links, submitting forms, and logging in where necessary, to catalog the content of the application and the navigation paths within it. This seemingly simple task presents a variety of challenges that Burp’s crawler is able to overcome, to create an accurate map of the application.

Basic approach

By default, Burp’s crawler navigates around a target application using Burp’s browser, clicking links and submitting input where possible. It builds a map of the application’s content and functionality in the form of a directed graph, representing the distinct locations in the application and the links between those locations.
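
A minimal sketch of such a graph in Python (the class and field names here are illustrative inventions, not Burp’s internals):

    from collections import defaultdict
    from dataclasses import dataclass

    @dataclass(frozen=True)
    class Location:
        """A distinct place in the application, identified by content."""
        fingerprint: str   # content-based identity, not the URL
        example_url: str   # one URL that happened to reach it

    class CrawlGraph:
        def __init__(self) -> None:
            self.locations: dict[str, Location] = {}
            # edges map a location fingerprint to the fingerprints it links to
            self.edges: dict[str, set[str]] = defaultdict(set)

        def add_edge(self, src: Location, dst: Location) -> None:
            self.locations.setdefault(src.fingerprint, src)
            self.locations.setdefault(dst.fingerprint, dst)
            self.edges[src.fingerprint].add(dst.fingerprint)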

The crawler makes no assumptions about the URL structure used by the application. Locations are identified (and later re-identified) based on their content, not on the URL that was used to reach them. This allows the crawler to reliably handle modern applications that place ephemeral data, such as CSRF tokens or cache busters, into URLs. Even though the full URL in each link changes on every occasion, the crawler still constructs an accurate map:

[Figure: an application with ephemeral URLs that change on every visit]
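
Identifying a location by content rather than URL can be approximated by normalizing the response and hashing it. A rough sketch, with deliberately simplistic normalization rules (Burp’s real fingerprinting is more sophisticated):

    import hashlib
    import re

    def location_fingerprint(html: str) -> str:
        """Hash a normalized view of a page so ephemeral values do not change its identity."""
        # Strip values that typically vary per request (illustrative patterns only).
        normalized = re.sub(r'name="csrf[^"]*" value="[^"]*"', "", html)
        normalized = re.sub(r"[?&](token|cb)=[\w-]+", "", normalized)
        # Collapse whitespace so formatting noise does not matter.
        normalized = re.sub(r"\s+", " ", normalized)
        return hashlib.sha256(normalized.encode()).hexdigest()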

The same approach allows the crawler to handle applications that use the same URL to reach different locations, depending on the state of the application or the user’s interaction with it:

[Figure: an application that uses the same URL to reach different locations, depending on application state or the user’s interaction with it]

As the crawler navigates and expands its coverage of the target application, it follows the edges of the graph that have not yet been completed. These represent links (or other navigation transitions) that have been observed in the application but not yet visited. The crawler never “jumps” to a pending link and visits it out of context; instead, it either navigates directly from its current location or returns to the start location and navigates from there. This reproduces as closely as possible the actions of a normal user browsing the site:

[Figure: returning to the start location to resume the crawl]
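
In outline, this behavior is a frontier-driven walk in which every pending link is reached in context, either from the current location or by replaying a click-path from the start. A simplified sketch, where fetch, extract_links, and fingerprint are hypothetical stand-ins for the browser-driven navigation and content identification:

    from collections import deque

    def crawl(start_url, fetch, extract_links, fingerprint, max_locations=500):
        """Visit pending links in context rather than jumping to them directly."""
        seen = set()
        frontier = deque([[start_url]])    # each entry is a click-path from the start
        while frontier and len(seen) < max_locations:
            path = frontier.popleft()      # breadth-first: new content is found early
            page = fetch(path)             # navigate the whole path, never jump mid-way
            fp = fingerprint(page)
            if fp in seen:
                continue                   # this location is already mapped
            seen.add(fp)
            for link in extract_links(page):
                frontier.append(path + [link])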

Crawling in a way that makes no assumptions about URL structure copes very well with modern web applications, but can potentially lead to problems by seeing “too much” content. Modern websites often contain a mass of superfluous navigation paths (via footers, hamburger menus, and so on), which means that everything is directly linked to everything else. Burp’s crawler uses various techniques to solve this problem: it fingerprints links to previously visited locations to avoid visiting them redundantly; it crawls in a breadth-first order that prioritizes the discovery of new content; and it applies configurable thresholds that limit the extent of the crawl. These measures also make it possible to handle “infinite” applications, such as calendars, correctly.
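
In miniature, those containment measures might look like the following; the threshold names are invented for illustration, although Burp exposes comparable settings in its crawl configuration:

    from dataclasses import dataclass

    @dataclass
    class CrawlLimits:
        max_link_depth: int = 8            # bounds "infinite" structures such as calendars
        max_unique_locations: int = 1500   # stop once the map reaches this size
        max_crawl_seconds: int = 3600      # overall time budget

    def should_follow(path: list, link_fp: str, seen: set, limits: CrawlLimits) -> bool:
        """Skip links whose fingerprints match known locations or that exceed the limits."""
        if link_fp in seen:
            return False                   # fingerprint matches a visited location
        return len(path) < limits.max_link_depth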

Session management

When Burp’s crawler navigates around a target application using Burp’s browser, it can automatically handle virtually any session management mechanism that modern browsers support. There is no need to record macros or to set up session handling rules telling Burp how to obtain a session or how to verify that the current session is valid.

The crawler employs multiple crawler “agents” to parallelize its work. Each agent represents a distinct user of the application navigating with their own browser. Each agent has its own cookie jar, which is updated whenever the application sets a cookie. When an agent returns to the start location to begin crawling from there again, its cookie jar is cleared, to simulate a completely fresh browser session.
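
One way to picture that per-agent isolation, using only Python’s standard library (a deliberately minimal model, not Burp’s implementation):

    import urllib.request
    from http.cookiejar import CookieJar

    class CrawlerAgent:
        """One simulated user: its own cookie jar, reset on each return to the start."""
        def __init__(self, start_url: str) -> None:
            self.start_url = start_url
            self.jar = CookieJar()
            self.opener = urllib.request.build_opener(
                urllib.request.HTTPCookieProcessor(self.jar))

        def restart(self):
            self.jar.clear()               # simulate a brand-new browser session
            return self.opener.open(self.start_url)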

The requests the crawler makes while browsing are built dynamically based on the previous response, so CSRF tokens in URLs or form fields are handled automatically. This allows the crawler to correctly navigate functionality that uses complex session handling, without any configuration by the user:

[Figure: automatic handling of session tokens during the crawl]
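
Building each request from the previous response might look like the sketch below. The form parsing is simplified, and the code assumes the form posts back to the URL it was served from:

    import urllib.parse
    import urllib.request
    from html.parser import HTMLParser

    class HiddenFieldParser(HTMLParser):
        """Collect hidden form fields (e.g. a fresh CSRF token) from a response."""
        def __init__(self):
            super().__init__()
            self.fields = {}

        def handle_starttag(self, tag, attrs):
            a = dict(attrs)
            if tag == "input" and a.get("type") == "hidden":
                self.fields[a.get("name", "")] = a.get("value", "")

    def submit_form(opener, form_url, user_fields):
        # Re-fetch the form so any per-request token it contains is current.
        parser = HiddenFieldParser()
        parser.feed(opener.open(form_url).read().decode())
        data = {**parser.fields, **user_fields}   # the token travels with the submission
        return opener.open(form_url, data=urllib.parse.urlencode(data).encode())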

Detecting application state changes

Modern web applications are highly dynamic, and it is common for the same application function to return different content on different occasions as a result of actions taken by the user in the meantime. Burp’s crawler is able to detect changes in application state that result from actions it performed during the crawl.

In the example below, navigating the path B → C causes the application to transition from state 1 to state 2. Link D leads to a logically different location in state 1 than in state 2: the path A → D reaches the empty cart, while A → B → C → D reaches the populated cart. Rather than simply concluding that link D is non-deterministic, the crawler is able to identify the state-changing path that link D depends on. This allows it to reliably return to the populated-cart location in the future, to access the other functions that are available from there:

[Figure: detecting application state changes during the crawl]
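
That dependency can be recorded on the edge itself: before following a link whose destination depends on state, the crawler first re-establishes that state. A hypothetical sketch:

    from dataclasses import dataclass, field

    @dataclass
    class StatefulEdge:
        """A link whose destination depends on an earlier state-changing path."""
        link_id: str                                        # e.g. "D"
        required_path: list = field(default_factory=list)   # e.g. ["B", "C"]

    def follow(edge: StatefulEdge, click):
        for step in edge.required_path:   # replay the state change (e.g. fill the cart)
            click(step)
        return click(edge.link_id)        # now D reaches the populated cart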

Logging in to the application

Burp’s crawler begins with an unauthenticated phase in which no credentials are submitted. Once this is complete, Burp will have discovered all of the login and self-registration features in the app.

If the application supports self-registration, Burp will attempt to register a user. You can also configure the crawler with the credentials of one or more pre-existing accounts.

The crawler then proceeds to an authenticated phase. It will visit the login function multiple times and submit:

  • Self-registered account credentials (if applicable).
  • The credentials for each pre-existing account configured.
  • Bogus credentials (these may reach interesting functionality, such as account recovery).

For each set of credentials submitted at the login, Burp will then separately crawl the content discovered behind the login. This allows the crawler to capture the different functionality available to different types of user:

[Figure: crawling with different login credentials to reach the functionality available to different users]
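
In outline, the authenticated phase is a loop over credential sets, each crawled in a fresh session; the helper names below are invented:

    CREDENTIAL_SETS = [
        ("selfreg_user", "selfreg_pass"),  # account the crawler registered itself
        ("alice", "alice_pw"),             # pre-configured existing account
        ("no_such_user", "wrong"),         # bogus credentials: may surface recovery flows
    ]

    def authenticated_phase(new_agent, login, crawl_from):
        for username, password in CREDENTIAL_SETS:
            agent = new_agent()            # fresh browser session per credential set
            landing = login(agent, username, password)
            crawl_from(agent, landing)     # map the content behind this login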

Crawling volatile content

Modern web applications frequently contain volatile content, where the “same” location or function returns responses that differ significantly on different occasions, not necessarily as a result of user action. This behavior can stem from factors such as social media feeds, user comments, inline advertising, or genuinely randomized content (post of the day, A/B testing, and so on).

Burp’s crawler is able to identify many instances of volatile content and correctly re-identify the same location on different visits, despite the differing responses. This allows the crawler to focus its attention on the “core” elements within a set of application responses, which are likely to matter most when discovering the key navigation paths to interesting content and functionality:

[Figure: identifying the core elements of an HTML page versus the volatile content that changes between visits]
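
One simple way to separate the stable core from volatile noise is to fetch a location twice and fingerprint only what the two responses agree on; a crude line-based sketch:

    import hashlib

    def core_fingerprint(response_a: str, response_b: str) -> str:
        """Fingerprint only the content that two visits to a location share."""
        lines_b = set(response_b.splitlines())
        stable = [line for line in response_a.splitlines() if line in lines_b]
        return hashlib.sha256("\n".join(stable).encode()).hexdigest()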

In some cases, visiting a given link on different occasions returns responses that are too different to be treated as the “same” location. In this situation, Burp’s crawler captures both versions of the response as two different locations and draws a non-deterministic edge in the graph. Provided the extent of non-determinism in the application is not too great, Burp can still crawl the associated content and reliably find its way to the content behind the non-deterministic link:

[Figure: crawling when application responses are sometimes non-deterministic]

Crawling with Burp’s browser (browser-powered scanning)

By default, if your machine appears to support it, Burp will use Burp’s browser for all browsing of your target websites and applications. This approach offers several major advantages, allowing Burp Scanner to handle most client-side technologies that modern browsers can use.

One of the main benefits is the ability to effectively crawl JavaScript-heavy content. Some websites build their navigation user interface dynamically with JavaScript. Although that content is not present in the raw HTML, Burp Scanner can use the browser to load the page, execute any scripts needed to build the user interface, and then continue crawling as normal.

Burp’s browser also allows Burp Scanner to handle cases where websites modify requests on the fly using JavaScript event handlers. The crawler can trigger these events and run the appropriate script, modifying requests as needed. For example, a website might use JavaScript to generate a new CSRF token after an onclick event and add it to the next request. Burp Suite can interact with elements that have been made clickable by JavaScript event handlers.
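
The general technique can be illustrated with any browser-automation library. A sketch using Playwright, chosen purely for illustration (Burp ships its own embedded Chromium-based browser and does not use Playwright):

    # pip install playwright && playwright install chromium
    from playwright.sync_api import sync_playwright

    def links_after_js(url: str) -> list:
        """Load a page in a real browser so JavaScript-built navigation becomes visible."""
        with sync_playwright() as p:
            browser = p.chromium.launch()
            page = browser.new_page()
            page.goto(url, wait_until="networkidle")   # let scripts build the UI
            # Collect links that may only exist after the scripts have run.
            hrefs = page.eval_on_selector_all("a[href]", "els => els.map(e => e.href)")
            browser.close()
            return hrefs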

If you prefer, you can manually enable or disable browser-powered scanning in your scan configuration. You can find this option under Explore Options > Miscellaneous > Burp Browser Options.
