8 million GitHub profiles scraped, data found leaking online

Technology recruitment site GeekedIn has scraped 8 million GitHub profiles and left the information exposed in an unsecured MongoDB database. The backup of the database was downloaded by at least one third party, and it’s likely being traded online.

GitHub profiles scraped

Troy Hunt, the security researcher who runs the Have I been Pwned? service and whose own information is in the compromised backup file, received the file, and ultimately notified GitHub of the matter.

His analysis of the file ultimately revealed that:

  • It contains 8.2 million unique email addresses, i.e. records about 8.2 million users of GitHub, Bitbucket (another web-based hosting service for projects), and possibly other online services.
  • Most of these records contain users’ names, usernames, email address, geographic location, professional skills, years of professional experience.
  • All of this information is already online on GitHub and those other services, accessible to anybody – GeekedIn just scraped it and created its own database, access to which is offered to companies interested in finding developers – for a fee.

When contacted, GitHub said that they allow third parties scraping of their users’ data, so long as it’s only used for the same purpose for which they gave that information to GitHub.

“Using scraped information for a commercial purpose violates our privacy statement and we do not condone this kind of use,” they told Hunt.

After he finally managed to get in touch with GeekedIn, they acknowledged the incidente and promised to secure the data.

Hunt made some of this data searchable in raw format through his service, but only a little over 1 million users will be able to find it. He only included the data of those who had a publicly available email address on GitHub.

“This incident is not about any sort of security vulnerability on GitHub’s behalf, rather it relates to a trove of data from their site which was inappropriately scraped and then inadvertently exposed due to a vulnerability in another service,” he made sure to note.

More about

Don't miss