Getting down and dirty with social media data and NodeXL

Over the last few weeks I have been getting down and dirty with social media data and trialing new software – NodeXL.  Here are a few of my reflections as a digital researcher.

First, it is hard to estimate the time needed, not only because the project I am working on is exploratory but also because I do not yet have the necessary software skills. As a result, and unsurprisingly (you’d have thought I’d know better as an experienced academic), the work is now behind my over-optimistic schedule.

Exploratory research and analytical software make strange bedfellows: you do not know what the data will comprise, how much of it there will be or what you will be able to demonstrate. NodeXL is an open source template, offered by the Social Media Research Foundation, through which you can create semantic network visualisations. There is both a free version and a paid-for pro version. The latter allows you to investigate more social media platforms by pulling the API stream of social media data into the software and then performing various manipulations on the data, which can end up looking like the image below.

The Social Media Research Foundation is a US-based response to the need for accessible, rigorous tools for academics who research social media but do not have large enough budgets to pay for the proprietary commercial tools now available.

[Image: NodeXL network visualisation]

Second, the quantity of social media data scraped from Twitter and Instagram on just one specific hashtag is copious and the data need significant cleaning. Filtering out non-English posts and commercial posts requires reading the raw material, as there are no automatic filters which can do this.


Third, ordering the social media data into manageable-sized files in date order per platform is important, as is then creating one giant file per platform scrape. My learning from this aspect: take data piece by piece and ensure you can trace it back.
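For anyone who wants to script this step rather than do it by hand, here is a minimal sketch in Python of the idea: combine several per-period files for one platform into a single date-ordered file, tagging every row with the file and row it came from so it can be traced back. This is not how NodeXL itself works. It assumes each scrape has been exported to CSV with identical columns, including a `created_at` timestamp in the format shown; the file names, column names and function names are all hypothetical.

```python
import csv
from datetime import datetime
from pathlib import Path

# Assumption: the exported timestamps look like "2016-05-02 09:30:00".
DATE_FORMAT = "%Y-%m-%d %H:%M:%S"


def load_with_provenance(path):
    """Read one exported CSV and tag every row with where it came from."""
    rows = []
    with open(path, newline="", encoding="utf-8") as handle:
        for row_number, row in enumerate(csv.DictReader(handle), start=1):
            row["source_file"] = Path(path).name  # which export the post came from
            row["source_row"] = row_number        # data row within that export
            rows.append(row)
    return rows


def merge_in_date_order(paths):
    """Combine several per-period files into one list sorted by post date."""
    combined = []
    for path in paths:
        combined.extend(load_with_provenance(path))
    combined.sort(key=lambda row: datetime.strptime(row["created_at"], DATE_FORMAT))
    return combined


def write_combined(rows, out_path):
    """Write the merged rows to one file, keeping the provenance columns."""
    if not rows:
        return
    with open(out_path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=list(rows[0].keys()))
        writer.writeheader()
        writer.writerows(rows)
```

Calling `merge_in_date_order` on a list of weekly Twitter exports and passing the result to `write_combined` would give the one giant file per platform, while the two extra columns keep every post traceable to its original scrape.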

Fourth, the ability to capture social media images is potentially highly valuable BUT out of   context and without the words and posters who give the images their grounding, the images remain  meaningless.

Fifth, finding an expert who can help or offer advice on how to maximise the value from your data and analytical tools is worthwhile, even if it means acknowledging that you need help. Luckily, in my case I have both Marc Smith, one of the writers of NodeXL, based in the USA, and Wasim Ahmed (@was321), a Sheffield University doctoral student who is an expert at handling NodeXL data and answered some of my queries, even though his subject area is not Marketing.

Getting close and dirty with these social media hashtag data has provided me with more questions, but also insight which will be developed into a paper for 2017 submission. Yes, it would be easier to pay someone to ‘do it’ for me, but then I would find justifying the method and explaining the rigour impossible.

Quandaries over real time in digital research

Recently I have been having a few quandaries over real time in digital research, and thinking about temporality generally, not least because I have a research colleague in Australia and thus our interactions are always preceded by sorting out time differences across continents.


Whilst we as digital  researchers, marketers and digital enthusiasts talk about real time as in immediate, synchronous interactions, how many of our interactions are really in real time? Technology and how it interacts may be synchronous but  how it is actually used may not be. Someone can send me a WhatsApp message  but I may look at it five minutes later. An email can be sent, received but not read or responded to until after a cup of coffee is finished or a meeting held. On the other hand a Skype interview is synchronous and real time. Does it matter whether behaviour or responses  are synchronous, near  synchronous or asynchronous? Does  synchronous versus asynchronous data impact on the quantity or content of that collected data? The ability to track real time online shopping behaviour is highly valuable for online retailers and their brands and the real time manipulation of promotional campaigns is a triumph of technology. However, as researchers do we require such acceleration and ‘nowness’?


Authors who have discussed technology’s impact on time and ‘nowness’ include Manuel Castells, whose The Rise of the Network Society outlines the concept of flows rather than time, where global interactions occur simultaneously and society becomes compressed by the speed of technologies transforming patterns of consumption, economic markets and societies. This technological determinism is argued against by Judy Wajcman, amongst others, whose recent text Pressed for Time emphasises how technologies are supposed to be freeing us and argues that citizens should revisit their relationship with time.


As a digital researcher, what is my relationship with time, what is my temporality? Different research projects require data which may be real time or distant time. One of my doctoral students is considering complaint behaviour on social media concerning disappointing luxury experiences and grappling with whether she needs synchronous or asynchronous data. Do you tweet at the point of disappointment, or do you email the brand a few moments later, or do you write a poor review and post it on Tripadvisor days later? And furthermore, what is the impact of ‘nowness’ versus near now versus later on the research data generated?

Shaping ethical research practice in social media research

As social media is becoming integral to so many people’s lives, how we conduct ethical research within social media is becoming a hot topic for social science researchers. Digital technologies including social media are increasingly being used in academic and commercial research, and various interdisciplinary social science groups are grappling with the ethical implications of the opportunities and challenges these technologies present. Indeed, five general guiding ethics principles for social science researchers from the Academy of Social Sciences were outlined in summer 2015, and those of us involved in social media research are now trying to suggest how our type of research should incorporate those principles, which are:

  1. Social science is fundamental to a democratic society and should be inclusive of different interests, values, funders, methods and perspectives.
  2. All social science should respect the privacy, autonomy, diversity, values, and dignity of individuals, groups and communities.
  3. All social science should be conducted with integrity throughout, employing the most appropriate methods for the research purpose.
  4. All social scientists should act with regard to their social responsibilities in conducting and disseminating their research.
  5. All social science should aim to maximise benefit and minimise harm.

Whilst very few would argue with the general principles above, certain aspects such as privacy can create challenges for us as researchers. So this week I’ll be presenting some ideas around  re-focusing the blurred lines between researchers and participants in social media research at an interdisciplinary conference in London as part of the NSMNSS network with the aim of  suggesting good practice in order to move the ethical research debate forward.

Along with my co-author, Nina Reynolds, I will be outlining ideas such as asking for incremental, ongoing consent rather than one-off informed consent and giving constructive suggestions on the recent IPSOS MORI DEMOS report on social media research ethics. I’ll let you know the outcomes of our discussions in the next blog.

Ethical dilemmas in digital research

As part of my university-wide role as Chair of Research Ethics I am increasingly facing questions about ethical dilemmas in digital research, some of which I thought I’d share in my next 2 or 3 blog posts. So to kick off I am starting with issues around participant recruitment platforms and crowd working.

Accessing research participants for any primary research is undoubtedly getting harder and response rates are falling fast! Digital technologies can play a role in alleviating this but care needs to be taken. The first issue which I have been grappling with is participant recruitment and the use of third party platforms through which to recruit participants and/or gather research data, such as online research questionnaires. These are now plentiful across all subject disciplines and are based on three business models.

1) A research institute or university platform designed to facilitate research, which may have open or closed access and usually requires a university email address (

2) A spin-off from the first type which has been developed as a small business, such as the Oxford University Software incubator firm (

3) A purely commercial platform aimed at academic research which claims to be approved by university ethics committees or IRBs (Institutional Review Boards in the USA) (e.g.

These various models become more complicated when you unpick their various payment options. Some are entirely free; some are free to upload your questionnaire but each response costs the researcher money, or the respondent benefits through a reward system, such as points for questionnaire completion which equal discount vouchers. Another type charges a fee to post your research and also charges per completed questionnaire. A further variation is the freemium model: sign up for a basic free version but subscribe for the useful version. Many use PayPal as the payment intermediary which, for some universities, causes concern in the finance department. Furthermore, some UK universities are concerned about the storage of the research data on these platforms, their stability and the security of the data.

Various other quality and ethical issues arise from these platforms. Very few are explicit about which, if any, research organisation’s ethics policy they comply with, such as ESOMAR, MRS or AoIR. The pool of participants who sign up to participate on these platforms is highly self-selecting and is unlikely to be representative of the desired target sample, unless you are looking for students or retiree silver surfers. Additionally, some platforms offer significant cash incentives to people who refer participants on to the site.

At a whole other level is Amazon’s Mechanical Turk operation (Mturk). This ‘job completion’ platform works on the basis that organisations post activities which need completing to this internet marketplace, and people can then browse and complete these tasks for payment or Amazon gift vouchers (depending on which country the workers are located in). Third party organisations have become involved, whereby workers are contracted to the third party to complete multiple different Mturk activities and the third party retains most of the payment for the completion. Mturk is being used by academic researchers for certain types of studies, including structured questionnaires. Academic journal editors have very differing views on the appropriateness of using this platform: some regard it as a legitimate tool in the digital economy, while others see it as a flawed approach with the potential for becoming embroiled in digital sweatshops. Further discussion on crowdworking and the broader ethical implications can be found at

Mturk raises multiple ethical issues in the research context which are worth highlighting.

  1. Can we establish that those completing the task have been informed about the context of the research?
  2. Do those completing the task have free choice in whether or not to participate?
  3. To what extent is it necessary to inform those participating about how the data will be used and also the outcome of the research?
  4. Can the level of anonymity required be guaranteed?
  5. How can researchers using Mturk as a data collection tool guard against fraud?
  6. Will there be fair payment to those completing the tasks, when will they be paid etc?
  7. How distorted or inaccurate is the sampling frame, and is it even known to the researchers?

I don’t profess to have the answers to these research-focused ethical dilemmas regarding conducting research within a digitalised world, but at least we should know what questions to ask of ourselves and what questions to share with our research students.