Crawling web applications is one of the key phases of automated web application scanning. The objective of crawling is to collect all possible resources from the server in order to automate vulnerability detection on each of these resources. A resource that is overlooked during this discovery phase can mean a failure to detect some vulnerabilities. The introduction of Ajax throws up new challenges for the crawling engine. New ways of handling the crawling process are required as a result of these challenges. The objective of this paper is to use a practical approach to address this issue using rbNarcissus, Watir and Ruby.
Problem domain and new approach
Usually crawling engines are “protocol-driven” and open a socket connection on the target host or IP address and port. Once a connection is in place the crawler sends HTTP requests and tries to interpret responses. All these responses are parsed and resources are collected for future access. The resource parsing process is crucial and the crawler tries to collect possible sets of resources by fetching links, scripts, flash components and other significant data.
2. DOM event handling and dispatching
3. Dynamic DOM content extraction
Download the paper in PDF format here.