Hi all,
For the last few days I have been thinking about various sources of data. Yes, I am talking about knowledge, mining and blah blah blah. Here, I plan to share my thoughts about personality mining. Obviously, the next thing that comes to my mind is the personalized PageRank algorithm. The idea is simple: we compute the static PageRank of each page on the web based on the tastes of different users. In other words, we take the needs of individuals into account.
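To make the personalized PageRank idea above a bit more concrete, here is a minimal sketch in Python. It is only an illustration under my own assumptions: the tiny three-page web, the page names, the `preference` weights, and the function name `personalized_pagerank` are all made up, and it assumes every linked page also appears as a key in `links`. The only change from plain PageRank is that the random jump follows the user's tastes instead of being uniform over all pages.

```python
def personalized_pagerank(links, preference, damping=0.85, iterations=50):
    """Power-iteration PageRank whose random jump follows one user's tastes.

    links: dict mapping each page to the list of pages it links to
           (assumption: every target is also a key of this dict).
    preference: dict mapping pages to non-negative weights for one user
                (assumption: at least one weight is positive).
    """
    pages = sorted(links)
    total = sum(preference.get(p, 0.0) for p in pages)
    # Normalise the user's tastes into a probability distribution.
    teleport = {p: preference.get(p, 0.0) / total for p in pages}

    rank = dict(teleport)  # start from the user's own preferences
    for _ in range(iterations):
        # Every page first receives its share of the random jump.
        new_rank = {p: (1 - damping) * teleport[p] for p in pages}
        for page in pages:
            outgoing = links[page]
            if outgoing:
                # Split this page's rank evenly among the pages it links to.
                share = damping * rank[page] / len(outgoing)
                for target in outgoing:
                    new_rank[target] += share
            else:
                # Page with no links: hand its rank back according to taste.
                for p in pages:
                    new_rank[p] += damping * rank[page] * teleport[p]
        rank = new_rank
    return rank


# Hypothetical three-page web and two hypothetical users.
links = {
    "news": ["sports", "tech"],
    "sports": ["news"],
    "tech": ["news", "sports"],
}
sports_fan = personalized_pagerank(links, {"sports": 1.0})
tech_fan = personalized_pagerank(links, {"tech": 1.0})
```

The same link graph gives different rankings for the two users: the "tech" page scores higher for the tech fan than for the sports fan, which is exactly the "taste of different users" part.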
But here I am talking about a similar but different topic: how we can understand an entity by analyzing whatever that entity responds with. Let's keep it simple and stay in the textual domain. The sources of such responses could be chat history, emails, or anything else. They also include blogs written by the person, comments and tweets. It could be their status messages on Facebook too. Note that these may not be accessible, but that is not the point. The challenge is that every source is different from the others, so each source of text comes with its own challenges. The other day someone told me that reading chat history is completely different from the usual corpus data we work with, and after downloading all my chat history from Gmail I understood that he was right. Each of these sources is hard to mine. It is pretty tough even to decide whether a particular chat contains any useful information, let alone to analyze it. But with so many customers on the web, it will become important for companies to classify their customers into equivalence classes of personality, so that each class can be treated accordingly to attain maximum gain.
The web is already full of information, but the question is how to gain from it. Every day a new Twitter comes along and redefines the language of social interaction. Today Wikipedia, Twitter and Facebook are major examples of data sources, but who knows, there may be a new Facebook on our desks tomorrow. Are we ready for it?