Identifying suspicious URLs
This video explores online learning approaches for detecting malicious Web sites (those involved in criminal scams) using lexical and host-based features of the associated URLs. It shows that this application is particularly well suited to online algorithms, both because the training data is larger than can be processed efficiently in batch and because the distribution of features that typify malicious URLs changes continuously.
Using a real-time system for gathering URL features, combined with a real-time source of labeled URLs from a large Web mail provider, the authors demonstrate that recently developed online algorithms can be as accurate as batch techniques, achieving daily classification accuracies of up to 99% over a balanced data set.
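To make the idea of online learning over lexical URL features more concrete, here is a minimal sketch in Python. It is not the authors' system: the perceptron is used only as a simple stand-in for an online algorithm, the bag-of-tokens feature extractor is a hypothetical simplification (host-based features are omitted), and the URLs are made-up examples on reserved example domains.

```python
import re


def lexical_features(url):
    """Bag-of-tokens lexical features (a hypothetical, simplified feature set)."""
    tokens = re.split(r"[^A-Za-z0-9]+", url.lower())
    feats = {"tok=" + t: 1.0 for t in tokens if t}
    feats["bias"] = 1.0
    return feats


class OnlinePerceptron:
    """Mistake-driven linear classifier, updated one example at a time."""

    def __init__(self):
        self.weights = {}

    def score(self, feats):
        return sum(self.weights.get(name, 0.0) * value
                   for name, value in feats.items())

    def predict(self, feats):
        # +1 means "malicious", -1 means "benign".
        return 1 if self.score(feats) > 0 else -1

    def update(self, feats, label):
        """Adjust weights only when the current prediction is wrong."""
        if self.predict(feats) == label:
            return False
        for name, value in feats.items():
            self.weights[name] = self.weights.get(name, 0.0) + label * value
        return True


def train_on_stream(model, stream):
    """Single pass over (url, label) pairs; returns the number of mistakes."""
    mistakes = 0
    for url, label in stream:
        if model.update(lexical_features(url), label):
            mistakes += 1
    return mistakes


# Made-up labeled stream (assumption: +1 = malicious, -1 = benign).
STREAM = [
    ("http://paypal-login.example.net/verify/account", 1),
    ("http://www.example.org/docs/index.html", -1),
    ("http://secure-update.example.net/login/verify", 1),
    ("http://www.example.org/news/today", -1),
]

if __name__ == "__main__":
    model = OnlinePerceptron()
    print("mistakes during the pass:", train_on_stream(model, STREAM))
    for url in ("http://login.example.net/verify", "http://www.example.org/about"):
        print(url, model.predict(lexical_features(url)))
```

Because each labeled URL is processed once and then discarded, memory use depends on the number of distinct features rather than on the number of URLs seen, and the weights keep adapting as new labeled URLs arrive. Those are the two properties the summary above gives as reasons for preferring online over batch training.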
Update 2021-06-29
For privacy reasons, we are not embedding videos from YouTube in our articles. The video can still be accessed directly on YouTube. Keep in mind that the video was recorded in 2010.
Here are some more recent Help Net Security articles that deal with this topic:
1) Google launches Chrome extension for reporting suspicious sites
2) Phishing attacks are a complex problem that requires layered solutions