Author: Matthew A. Russell
Unless you haven't been using a computer or the Internet at all, you can hardly have missed the fact that the social networking boom has made huge amounts of social data available to knowledgeable searchers. This book shows you how to discover who is talking to whom, what they are talking about, and where they are located in the real world; in short, how to mine useful data from social networks, blogs, and email.
About the author
Matthew Russell, Vice President of Engineering at Digital Reasoning Systems and Principal at Zaffra, is a computer scientist who is passionate about data mining, open source, and web application technologies.
Inside the book
If you pick up this book, you should know something about programming in general and about Python in particular; otherwise, I guarantee, most of it will go over your head.
The book dives straight into setting up a Python development environment, and then into collecting and analyzing Twitter data. To do that, you need an account of your own and enough followers whose data you want to mine. You will learn how to see what people are talking about at a given moment, how to extract relationships from tweets, and how to visualize the graphs you discover.
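To give a flavor of what extracting relationships from tweets involves, here is a minimal sketch (not the book's code) that turns retweets into directed edges. It assumes the tweets have already been fetched as (author, text) pairs and that retweets follow the informal "RT @user" or "via @user" convention; the regular expression and the `retweet_edges` name are illustrative assumptions.

```python
import re

# Assumption: retweets are marked with the informal "RT @user" or "via @user"
# convention in the tweet text. \b requires RT/via to start a word, so text
# such as "ART @x" does not match.
RT_PATTERN = re.compile(r"\b(?:RT|via)\s*@(\w+)")


def retweet_edges(tweets):
    """Return (source, retweeter) pairs from an iterable of (author, text) tuples.

    Hypothetical helper for illustration: each pair is a directed edge from
    the account being retweeted to the account doing the retweeting.
    """
    edges = []
    for author, text in tweets:
        for source in RT_PATTERN.findall(text):
            edges.append((source, author))
    return edges


if __name__ == "__main__":
    sample = [
        ("alice", "RT @bob: great talk on graph mining"),
        ("carol", "Nice post via @alice"),
        ("dave", "Just a regular tweet"),
    ]
    print(retweet_edges(sample))  # [('bob', 'alice'), ('alice', 'carol')]
```

The resulting edge list can then be loaded into a graph library such as NetworkX for visualization.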
Later, by learning how to harness Twitter's API, you will be able to answer questions such as "Given all my followers and all their followers, what is my potential influence if I get retweeted?"
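That influence question boils down to a two-hop walk over the follower graph. The sketch below assumes the follower lists have already been fetched through the API and stored in a plain dict mapping each screen name to the set of its followers; `potential_reach` is a hypothetical name, and counting distinct accounts is only one reasonable way to define "potential influence".

```python
def potential_reach(user, followers):
    """Distinct accounts within two follower hops of `user`.

    `followers` is assumed to map each screen name to the set of accounts
    that follow it (data you would have fetched from the API beforehand).
    """
    direct = set(followers.get(user, ()))
    second_hop = set()
    for follower in direct:
        # Followers of my followers could see the tweet if it is retweeted.
        second_hop |= set(followers.get(follower, ()))
    # Count each account once and leave out the user themself.
    return (direct | second_hop) - {user}


if __name__ == "__main__":
    graph = {
        "me": {"a", "b"},
        "a": {"b", "c"},
        "b": {"d", "me"},
    }
    reach = potential_reach("me", graph)
    print(len(reach), sorted(reach))  # 4 ['a', 'b', 'c', 'd']
```

Using sets means overlapping audiences (here, "b" follows both "me" and "a") are counted only once.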
Next, you'll learn about microformats, which make the data in existing web content explicit and standardized so that it can be collected and interpreted. You will also see how to mine your own mailbox(es) and which tools give you a clear overview of the data and let you perform further analyses.
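For the mailbox part, Python's standard library can already read the common mbox format. The minimal sketch below, an illustration rather than the book's code, counts messages per sender address; it assumes your mail has been exported to a single mbox file, and `top_senders` is a hypothetical helper name.

```python
import mailbox
from collections import Counter
from email.utils import parseaddr


def top_senders(mbox_path, n=10):
    """Return the n most frequent sender addresses in an mbox file.

    Hypothetical helper; create=False makes a missing file raise
    mailbox.NoSuchMailboxError instead of silently creating an empty mbox.
    """
    counts = Counter()
    box = mailbox.mbox(mbox_path, create=False)
    try:
        for message in box:
            # parseaddr splits "Alice <alice@example.com>" into name and address.
            _, address = parseaddr(message.get("From", ""))
            if address:
                # Lowercase so "ALICE@example.com" and "alice@example.com" merge.
                counts[address.lower()] += 1
    finally:
        box.close()
    return counts.most_common(n)
```

Calling `top_senders("my_mail.mbox", n=5)` on a hypothetical export would return a list of (address, count) pairs, busiest sender first.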
Other data sources covered in the book include LinkedIn, Facebook, and blogs. Each differs from the others, so each requires a different approach and yields different results. LinkedIn, for example, doesn't show how people are connected to each other, but you can cluster contacts by job title or location.
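Clustering contacts by job title can start with something as simple as comparing titles word by word. As a rough illustration (an assumption, not necessarily the book's implementation), this greedy approach puts each contact into the first cluster whose representative title is within a Jaccard distance threshold of theirs; the sample contacts, the 0.5 threshold, and the function names are all hypothetical.

```python
def jaccard_distance(title_a, title_b):
    """1 minus the Jaccard similarity of the two titles' lowercase word sets."""
    a, b = set(title_a.lower().split()), set(title_b.lower().split())
    if not a and not b:
        return 0.0
    return 1.0 - len(a & b) / len(a | b)


def cluster_by_title(contacts, threshold=0.5):
    """Greedily group (name, title) pairs whose titles are close enough.

    Each cluster is keyed by the first title that created it; a contact joins
    the first cluster whose representative is within `threshold`.
    """
    clusters = []  # list of (representative_title, [names])
    for name, title in contacts:
        for representative, members in clusters:
            if jaccard_distance(representative, title) <= threshold:
                members.append(name)
                break
        else:
            # No cluster was close enough, so this title starts a new one.
            clusters.append((title, [name]))
    return clusters


if __name__ == "__main__":
    people = [
        ("Ann", "Software Engineer"),
        ("Ben", "Senior Software Engineer"),
        ("Cat", "Product Manager"),
        ("Dan", "software engineer"),
    ]
    for title, names in cluster_by_title(people):
        print(title, names)
    # Software Engineer ['Ann', 'Ben', 'Dan']
    # Product Manager ['Cat']
```

Greedy clustering like this is order-dependent and cheap; it is a starting point rather than a substitute for more careful approaches.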
Facebook is an especially rich trove of interesting data, since users are encouraged to share so many aspects of their lives and use it to chat, keep in touch, and share photos and thoughts, among many other things. To mine the data collected on this social network, you'll have to build an application that does it for you, even though you yourself can see all that information simply by visiting your friends' profiles.
Prepare to be sidetracked often while reading this book: it touches on so many technologies and techniques that you're bound to go looking for more about them on the Web.
This book is a good choice for the budding data analyst. Those who simply want to find out things about their friends and colleagues out of sheer curiosity should skip this tome.