Hi all,
For the last few days I have been thinking about various sources of data. Yes, I am talking about knowledge, mining and blah blah blah. Here, I want to share my thoughts on personality mining. Obviously, the first thing that comes to mind is the personalized PageRank algorithm. The idea is simple: instead of keeping one static PageRank per webpage, we compute the rank of each page on the web based on the tastes of different users. In other words, we take the needs of individuals into account.
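To make the personalized PageRank idea a bit more concrete, here is a rough sketch in plain Python. It is only an illustration under my own assumptions, not a production implementation: the function name `personalized_pagerank`, the tiny three-page web and the preference weights are all made up for the example. The only change from ordinary PageRank is that the random jump goes to pages in proportion to the user's taste instead of uniformly.

```python
def personalized_pagerank(links, preference, damping=0.85, iterations=50):
    """Power-iteration sketch of personalized PageRank.

    links: dict mapping a page to the list of pages it links to.
    preference: dict mapping a page to a non-negative taste weight
                for one user (hypothetical input, assumed for this sketch).
    """
    # Collect every page that appears as a source or a target.
    pages = sorted(set(links) | {q for targets in links.values() for q in targets})

    total = sum(preference.get(p, 0.0) for p in pages)
    if total <= 0:
        raise ValueError("preference must give positive weight to at least one page")

    # The teleport distribution encodes the user's taste.
    teleport = {p: preference.get(p, 0.0) / total for p in pages}
    rank = dict(teleport)

    for _ in range(iterations):
        # Random jump: go to a page in proportion to the user's taste.
        new_rank = {p: (1.0 - damping) * teleport[p] for p in pages}
        for p in pages:
            targets = links.get(p, [])
            if targets:
                # Follow a link: split this page's rank among its targets.
                share = damping * rank[p] / len(targets)
                for q in targets:
                    new_rank[q] += share
            else:
                # Dangling page: hand its rank back via the teleport distribution.
                for q in pages:
                    new_rank[q] += damping * rank[p] * teleport[q]
        rank = new_rank
    return rank


# A made-up three-page web and two made-up users.
links = {"home": ["news", "sports"], "news": ["home"], "sports": ["home"]}
sports_fan = personalized_pagerank(links, {"sports": 1.0})
news_fan = personalized_pagerank(links, {"news": 1.0})
```

With the same link structure, the two users end up with different rankings: the sports fan's ranking puts "sports" above "news", and the news fan's ranking does the opposite.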
But here I am talking about a similar but different topic: how we can understand an entity by analyzing whatever that entity responds with. Let's keep it simple and stay in the textual domain. The sources of such responses could be chat history, emails or anything else. They also include blogs written by the person, comments and tweets. It could be his or her status messages on Facebook too. Note that these may not be accessible, but that is not the point. The challenge is that every source is different from the others, so each of these sources of text poses its own challenges. The other day someone told me that reading chat history is completely different from the usual corpus data we work with, and after downloading all my chat history from Gmail I understood that he is correct. Each of these sources is complicated to mine. It is pretty tough even to decide whether a particular chat contains any useful information; analyzing it would be beyond our reach. But with so many customers on the web, it will become important for companies to classify their customers into equivalence classes of personality, so that each class can be treated accordingly to attain maximum gain.
By now the web is full of information, but the question is how to gain from it. Every day some new Twitter is redefining the language of social interaction. Today Wikipedia, Twitter and Facebook are the major examples of data sources, but who knows, there may be a new Facebook on our desks tomorrow. Are we ready for it?
The article is interesting from a computer engineer's perspective. Very well written.
These applications also interest me. However, it has struck me lately that we are forgetting the right to identity in the pursuit of personality analysis.
It is sad that we don't respect the fact that people may want to be anonymous and not be modeled. Do we really have the moral right to create a 'kundli' of a person in this form?
Thanks for the appreciation :)
I do realize this moral issue, and I am not arguing against it.
I am talking about classifying a person based on the class of personality he or she belongs to. This still means we need to track his or her history. That is not immoral as long as we are not eavesdropping. If the tracking is done after informing the person, then it cannot be immoral.
Also, placing something publicly on the web means allowing anyone to judge the person who posted it. This is how Google ads work on my blog too.
But one of the key points I tried to make is that the way of social networking is changing day by day. So what would a text analyst do if tomorrow Twitter and Facebook become obsolete and a new player arranges a new way of sharing information? Then everything done on analyzing Twitter, Facebook etc. would be in vain.