Getting down and dirty with social media data and NodeXL

Over the last few weeks I have been getting down and dirty with social media data and trialling a new piece of software, NodeXL. Here are a few of my reflections as a digital researcher.

First, it is hard to estimate the time needed, not only because the project I am working on is exploratory but also because I do not yet have the necessary skills in the new software. As a result, and unsurprisingly (you’d have thought I’d know better as an experienced academic), the work is now behind my over-optimistic schedule.

Exploratory research and analytical software make strange bedfellows: you do not know in advance what the data will comprise, how much of it there will be or what you will be able to demonstrate. NodeXL is an open source template, offered by the Social Media Research Foundation, through which you can create semantic network visualisations. There is both a free version and a paid-for pro version. The latter allows you to investigate more social media platforms by pulling the API stream of social media data into the software and then performing various manipulations on the data, which can end up looking like the image below.

The Social Media Research Foundation is a US-based response to the need for accessible, rigorous tools for academics who want to research social media but do not have large enough budgets to pay for the proprietary commercial tools now available.

[Image: NodeXL network visualisation]

Second, the quantity of social media data scraped from Twitter and Instagram on just one specific hashtag is copious, and the data need significant cleaning. Filtering out non-English posts and commercial posts requires reading the raw material, as there are no automatic filters which can do this.
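The reading itself cannot be handed over to a script, but a rough first pass might help decide which posts to look at first. The Python sketch below is only an illustration under my own assumptions: the marker phrases, the threshold and the function names are all hypothetical, and the language check only notices posts written largely in a non-Latin script (a post in French or Spanish would pass straight through). It flags posts for a closer manual look rather than deleting anything.

```python
import re

# Hypothetical marker phrases; a real list would come from reading the data.
COMMERCIAL_MARKERS = ("buy now", "discount", "% off", "free shipping", "shop now")
URL_PATTERN = re.compile(r"https?://\S+")


def non_ascii_share(text):
    """Fraction of alphabetic characters that are outside plain ASCII."""
    letters = [ch for ch in text if ch.isalpha()]
    if not letters:
        return 0.0
    return sum(1 for ch in letters if not ch.isascii()) / len(letters)


def triage(text, script_threshold=0.3):
    """Return the reasons (possibly none) why a post needs a manual check."""
    reasons = []
    lowered = text.lower()
    if any(marker in lowered for marker in COMMERCIAL_MARKERS):
        reasons.append("possible commercial post")
    # URLs are stripped first so that links do not dilute the letter count.
    if non_ascii_share(URL_PATTERN.sub("", text)) > script_threshold:
        reasons.append("possibly not English")
    return reasons
```

A post with an empty list of reasons still needs reading; the flags only suggest where the obvious exclusions are likely to be.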

[Image: raw social media data]

Third, ordering the social media data into manageably sized files in date order per platform is important, as is then creating one giant file per platform scrape. My learning from this: take the data piece by piece and ensure you can trace it back.
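For anyone doing this step outside a spreadsheet, the idea can be sketched in a few lines of Python. This is a minimal sketch and not the workflow I actually used: it assumes each scrape is a CSV file with the same columns, that there is a timestamp column called created_at in ISO format (a hypothetical name, so adjust it to your export), and the function names are my own. Each row is tagged with its originating file and row position before the files are merged, so every record in the giant file can be traced back.

```python
import csv
from datetime import datetime
from pathlib import Path

DATE_COLUMN = "created_at"  # hypothetical column name; adjust to the export


def tag_rows(rows, source_name):
    """Attach the originating file name and row position to each record."""
    tagged = []
    for position, row in enumerate(rows, start=1):
        record = dict(row)
        record["source_file"] = source_name
        record["source_row"] = position
        tagged.append(record)
    return tagged


def merge_in_date_order(batches):
    """Combine tagged batches into one list sorted by timestamp."""
    merged = [record for batch in batches for record in batch]
    merged.sort(key=lambda r: datetime.fromisoformat(r[DATE_COLUMN]))
    return merged


def merge_files(paths, out_path):
    """Read several CSV scrapes and write one date-ordered master file."""
    batches = []
    for path in paths:
        with open(path, newline="", encoding="utf-8") as handle:
            batches.append(tag_rows(csv.DictReader(handle), Path(path).name))
    merged = merge_in_date_order(batches)
    if not merged:
        return 0
    # Assumes every file shares the same columns as the first record.
    with open(out_path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=list(merged[0].keys()))
        writer.writeheader()
        writer.writerows(merged)
    return len(merged)
```

Running merge_files once per platform gives one master file for Twitter and one for Instagram, while the small dated files stay untouched as the record of what was originally collected.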

Fourth, the ability to capture social media images is potentially highly valuable BUT, out of context and without the words and the posters who give the images their grounding, the images remain meaningless.

Fifth, finding an expert who can help or offer advice on how to maximise the value from your data and analytical tools is worthwhile, even if it means acknowledging that you need help. Luckily, in my case I have both Marc Smith, one of the writers of NodeXL, based in the USA, and Wasim Ahmed (@was321), a Sheffield University doctoral student who is an expert at handling NodeXL data and who answered some of my queries, even though his subject area is not Marketing.

Getting down and dirty with these social media hashtag data has provided me with more questions, but also insight, which will be developed into a paper for submission in 2017. Yes, it would be easier to pay someone to ‘do it’ for me, but then I would find it impossible to justify the method and explain the rigour.