Big Data: Analysis of recent NSW bushfires tweets (Oct 2013)
Recently, I have developed a keen interest in big data and data mining, and have been exploring data-mining tools such as R-Studio and CLUTO.
There are oceans of information out there on the net. One of the most interesting things about big data is that it can reveal new trends and patterns that help solve important or emerging problems. While huge volumes of data are available in the cloud, processing and mining that data is challenging. Many approaches, including data-mining and knowledge-based techniques, computational algorithms and mining tools, have been developed to analyse big data.
The purpose of analysing big data is to discover key, actionable information and gain meaningful insights into an event, for example to understand more about the latest bush fires in New South Wales, Australia. The goal of such analysis, sentiment analysis for instance, is to provide real-time information.
What is big data? How big is big data?
Tim Jones [i] defines big data as a scale of data that cannot be managed by conventional means. Internet-scale data has fostered the creation of new architectures and applications that are able to process this new class of data. These architectures are highly scalable and process data efficiently in parallel across a sea of servers.
The popular microblog Twitter sees 218m monthly active users, 3.5m monthly mobile users, 100m daily users, and over 500m tweets per day, an average of about 5,700 tweets a second. At one moment in Japan on August 3, 2013, during a TV airing of Castle in the Sky, people took to Twitter so much that the rate peaked at 143,199 tweets per second [ii].
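As a quick sanity check on these figures, the short Python sketch below derives the per-second average from the daily total and compares the quoted peak against it. It uses only the numbers quoted above; the variable and function names are my own, chosen for illustration.

```python
SECONDS_PER_DAY = 24 * 60 * 60

# Figures quoted in the paragraph above.
tweets_per_day = 500_000_000   # "over 500m tweets per day"
quoted_average_tps = 5_700     # quoted average tweets per second
peak_tps = 143_199             # quoted one-second peak

def average_per_second(per_day):
    """Spread a daily total evenly over the seconds in a day."""
    return per_day / SECONDS_PER_DAY

derived_average_tps = average_per_second(tweets_per_day)
peak_ratio = peak_tps / quoted_average_tps

print(f"derived average: {derived_average_tps:,.0f} tweets per second")
print(f"peak is roughly {peak_ratio:.0f} times the quoted average")
```

The derived average lands close to the quoted 5,700 tweets a second, and the one-second peak is roughly 25 times that average.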
What are people saying about the recent NSW bush fires?
The recent bush fires in NSW, Australia have drawn media and government attention. Many people took to Twitter using various hashtags (#). Out of concern and support for the affected people and for agency services such as NSWRFS, I examined a large number of tweets using various data analysis techniques, including R-Studio. The goal of my investigation is to understand what people are talking about. This could provide useful insights into the causes and effects of the bush fires and what we can do about them.
Figure 1 illustrates a cloudtag generated from over two thousand tweets retrieved from Twitter at 1:10 PM on Saturday, October 26 (AEST) with these hashtags: #nswfires, #nswbushfire, #nswbushfires, #nswrfs, #sydneybushfire, #sydneyfire, #sydneyfires, #sydneybushfires.
These are the hashtags most commonly used in tweets to mark discussion of the recent bush fires in NSW, Australia. Common words and tags, such as ‘today’, and Twitter handles, such as @dailytelegraph, were removed.
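The counting step behind a cloudtag like this can be sketched in a few lines. The example below is a minimal Python illustration, not the R-Studio workflow used for Figure 1: the stop-word list, the regular expression, the function name word_counts and the three sample tweets are all assumptions made up for the example. Only the hashtag list comes from the text above.

```python
import re
from collections import Counter

# Search hashtags listed above; they appear in every tweet, so they are dropped.
HASHTAGS = {
    "#nswfires", "#nswbushfire", "#nswbushfires", "#nswrfs",
    "#sydneybushfire", "#sydneyfire", "#sydneyfires", "#sydneybushfires",
}

# Assumed stop-word list; a real analysis would use a much fuller one.
STOP_WORDS = {"today", "the", "and", "for", "are", "you", "over"}

# A token is either a URL or a word with an optional leading @ or #.
TOKEN = re.compile(r"https?://\S+|[@#]?\w+")

def word_counts(tweets):
    """Count the words left after removing URLs, handles, search tags and stop words."""
    counts = Counter()
    for tweet in tweets:
        for token in TOKEN.findall(tweet.lower()):
            if token.startswith(("http://", "https://", "@")):
                continue  # skip links and Twitter handles
            if token in HASHTAGS:
                continue  # skip the hashtags used to collect the tweets
            word = token.lstrip("#")
            if len(word) < 3 or word in STOP_WORDS:
                continue  # skip very short and very common words
            counts[word] += 1
    return counts

# Made-up sample tweets, for illustration only.
sample = [
    "Thank you firefighters! #nswfires #nswrfs",
    "RT @dailytelegraph: Firefighters battle blaze in Blue Mountains today #nswbushfires",
    "Smoke over Sydney today, stay safe http://example.com/x #sydneyfires",
]

counts = word_counts(sample)
for word, n in counts.most_common(5):
    print(word, n)
```

The resulting counts are what a cloudtag renderer would scale font sizes by: the more often a word appears, the larger it is drawn.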
For now I will leave the interpretation of the cloudtag to prospective readers.
Reference:
[i] Jones, Tim (2013). Process real-time big data with Twitter Storm: An introduction to streaming big data. IBM developerWorks, ibm.com/developerworks/