social media

Big Data and Ethics

In reading 'Big Data, Machine Learning, and the Social Sciences: Fairness, Accountability, and Transparency', something came back to me from some years ago.

Sometime in the year before I last left Trinidad and Tobago, I was asked, strangely enough, to see Member of Parliament Chandresh Sharma in Chaguanas. Since he and the present Prime Minister of the country allegedly said that my family did not have deeds for property related to Otaheite Estate, relations were strained - so I decided to go and speak with him about that.

It ended up being a political strategy meeting for the United National Congress (UNC) (not the People's Partnership, a distinction Trinidadians will understand) in Chaguanas, and I was to sit and wait through some presentations that I really had no patience for. I'm not a political beast. As it happened, the real reason I was invited was apparently the use of what we would now call 'Big Data' on a map for political purposes.

Race (which is still, unfortunately, a predictor in Trinidad and Tobago politics), religion, familial relations and other aspects of people were on that map, tied to GPS coordinates. In the broad strokes, it's not a bad thing - but the way that people were discussing the data's use made my skin crawl. They apparently expected me to participate in making the map 'better', but in a small country like Trinidad and Tobago it seemed very invasive - particularly since they were even getting into shopping habits. While some may have seen just an attempt to win an election, I saw the people who were funding the UNC wanting this map for very different reasons. 

It bothered me. I didn't participate. I gave Sharma a piece of my mind in a diplomatic way - I have become better at that as I have grown older - and walked away.

Did I see positive potential for that data? Yes. Did I trust those people with that data? No. Am I right? Maybe.

As luck would have it, the People's Partnership - a coalition made up mainly of the UNC - won the election. Did the map have anything to do with it? I don't know, and I'm OK with that.

And that's the trouble with Big Data. It's not that it lacks the potential to help with good causes; the problem is determining what a good cause is. It's also about how else the data will be used, and by whom.

So what is Big Data? Big Data is really an umbrella term for the many ways of using all the data that surrounds us. The aspect of Big Data that really concerns most people isn't about SETI, or about meteorological forecasts. It's about people, and Michelle Chen's article hits the key issue squarely:

...While it’s true that Big Data—the amassing of huge amounts of statistical information on social and economic trends and human behavior—can be empowering for some, it’s often wielded as a tool of control. While we’re busy tracking our daily carb intake, every data packet we submit, each image we toss into the cloud, is hoarded and parsed by powerful institutions that manage our everyday lives....

Her article, 'Is Big Data Reinforcing Social Inequalities', also hits a few key points that I personally agree with. Daniel J. Solove wrote much about this in his books, as have others, but in a world where social networks have consumers as their product, the responsibility is handed off to corporations filled with groups that decide how information is used. It doesn't always seem to be a matter of what is ethically right but instead of what one can legally do. But what is ethically right? 

That's the Big Data conundrum. Who gets to choose how all that data is used? In theory, a government of elected officials could be the right choice, but governments make unpopular decisions all the time - and the ethics of politicians are constantly being found to be less than perfect. Some say that the free market should decide, yet the free market has similar issues. The power to decide really lies with the people providing the data in the first place, but in a world where 'tl;dr' is used, most people don't want to be bothered.

There is an issue here - the Big Data elephant in the room. Until more people begin understanding that there are ethical issues with big data, it will continue being a nebulous phrase that makes companies and governments happy but scares the dickens out of others. 

Just because someone can do something doesn't mean that they should. And just because there are serious concerns related to big data doesn't mean it shouldn't be used for the betterment of society. The issue seems to be, "Who do you trust?"

Tags, Time and Content Creators

This entry builds on 'The Trouble With Tagging' and 'Navigation in a Multidimensional World of Data'. If you feel like you're missing something, check out those entries.

In previous entries I mentioned the subjectivity of tags as well as the need to have more than one point to navigate from. While some of what I'm writing about has been done, it is masked by a single text box with a search button next to it - be it on this site or a search engine of your choosing.

Tag subjectivity really depends on the author, the time of the writing (what the tag meant at the time of writing) and the site on which the content is published.

The Author

As I mentioned before, there are two extremes of tagging that content creators are somewhere between. Simply put, these extremes are, 'be seen' and 'be accurate'.

We all know content that has been tagged to 'be seen' that isn't accurate. That's a constant battle between search engines and those who game the system to have their content seen, and the motivation for that is typically advertising. It's aggravating at times to type in a search phrase only to be inundated with a bunch of links best described as 'crap'. A photo I posted on Flickr, which I tagged very tongue in cheek, gets views because of the tags I used - and it's safe to say that someone searching for such things is more interested in content of another type. The same is true of this image. While both images are work safe (and very PG), people who search for certain keywords are likely upset with me because of the tagging. Of course, they won't complain, and I get a few chuckles.

Accuracy, on the other hand, is a bit different. Being a bit of a naturalist, I take photos of wildlife and tag them with their scientific names. A great example of this is this image of a young cane toad. Because I tagged it accurately, the image (with my permission) made its way onto sites related to invasive species in Florida. In fact, the images that I have licensed out have been the ones tagged accurately - translating to 'getting paid'.

Getting a little bit ahead: images whose tagging I have had a little fun with don't really earn. But then, I don't make money off of advertising on Flickr. In fact, Flickr doesn't make money advertising on Flickr.

Content creator subjectivity in tagging can allow for content to get views for the wrong reasons, or it can allow for content to get views for the right reasons. The wrong or right, despite what you may think as a content creator, is not up to the content creator. It's dependent on the audience and it's also dependent on time.


Almost none of my content's views happen when I publish it. My experience is that my style typically gets more reads after a few months. We could attribute this to a lot of things, such as the popularity of the topic and the popularity of the content creator. I've never really been interested in being popular - I've been popular for periods - but I've found that things I've written about have been popular, and sometimes cyclically popular.

Some things are timeless. Music, books, movies - even ideas - some of these are timeless. In the grand scheme of things, they represent a very small percentage of what has been created.

Can you name something created on the Internet and for the Internet that's timeless? There are some things that are, but when it comes to popular content on the Internet, you'll likely not find anything that stands the test of time.

Then we get into what tags mean. For example, prior to February 4th, 2004, the tags 'social media' and 'social network' would not have included Facebook. Prior to July, 2006, Twitter wouldn't have been encapsulated by those tags either. Why? Because they didn't exist prior to those dates. And when it comes to social media and social networking, in a popular sense, how many people even know about or remember Orkut? Relatively few, I imagine.

So what tags mean is dependent on when they were used - and even The Semantic Sphere 1: Computation, Cognition and Information Economy doesn't really speak to that issue. Symbols, words, meanings - they change.

Don't believe me? Look up a random word at

Tags and Searches

There are obviously a lot of issues with tagging content, and while tags typically reflect what's popular now, over time a tag degrades. It's not hopeless, though.

There are two things that can be done with searches - and tagging content in general - to ensure that content stands the test of time. The content creator and the time of publishing, generally speaking, are methods of searching - and maybe we should be treating them as tags within content management systems. Sure, you can search by person, and on some sites you can even search between specific dates, but those are not the standard, and they are not the standard because the systems were never designed that way. Authors and dates are treated differently from tags in databases, typically living in different database tables altogether.
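To make the idea a little more concrete, here is a minimal sketch in Python of what treating the author and the publication date as just more tags might look like. It is not how any particular content management system works; the `Post` class, the `author:`, `year:` and `month:` prefixes, and the sample posts are all made up for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Post:
    # Hypothetical minimal content record, not a real CMS schema.
    title: str
    author: str
    published: date
    tags: frozenset = frozenset()


def facets(post):
    """Return every navigation point for a post: its free-form tags plus
    the author and publication date expressed as tags (assumed prefixes)."""
    derived = {
        f"author:{post.author}",
        f"year:{post.published.year}",
        f"month:{post.published:%Y-%m}",
    }
    return set(post.tags) | derived


def build_index(posts):
    """Map each facet to the set of post titles that carry it."""
    index = defaultdict(set)
    for post in posts:
        for facet in facets(post):
            index[facet].add(post.title)
    return index


def search(index, *wanted):
    """Return the titles that carry all of the requested facets."""
    if not wanted:
        return set()
    return set.intersection(*(index.get(f, set()) for f in wanted))


# Made-up sample data, purely for illustration.
posts = [
    Post("First post", "alice", date(2004, 1, 30), frozenset({"social network"})),
    Post("Second post", "alice", date(2007, 3, 2), frozenset({"social network"})),
    Post("Toad photo", "bob", date(2007, 3, 15), frozenset({"rhinella marina"})),
]
index = build_index(posts)

# The same tag, narrowed by when it was used.
print(search(index, "social network", "year:2004"))
```

Because the author and the date sit in the same index as everything else, a search for 'social network' can be narrowed to the year the tag was applied without a separate query path. Whether this is better than separate tables is a design choice; the sketch only shows that the three kinds of navigation point can share one mechanism.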

A few of you might see where I'm going with this...


Anecdote: eCommerce, Social Media and Customer Service

I'm a fan of R.A. Salvatore for a variety of reasons, but suffice to say that he writes things that I enjoy reading. Some weeks ago, I came across his page on Facebook - and as a human being, he's pretty awesome too. I should mention C.J. Cherryh's 'Wave Without a Shore' site too, because she's another great author selling direct.

Over the years, even before the Internet, I gobbled up his books from anywhere I could get them. I have fond memories of doing some stuff in Alaska with 3rd Battalion, 3rd Marines and having one of his books in my pack - which came in real handy for 24-hour days when everyone was bedded down and twiddling their thumbs. R.A. Salvatore's work has been a constant.

The story should be that I bought some books, he signed them and sent them to me and I received them. That is the story, but something different happened in the middle that is worth mentioning.

They emailed me to see if I had received them.

That's a nice touch in a world of Amazon, where they believe you got them based on what UPS reports to them through layers of a silicon network. A human being, Diane, his wife, reached out to me. When I found that I had idiotically deleted the email with the tracking number, she sent it to me.

So let's take stock. We have:

  • an author who was selling books to consumers before the Internet.
  • initial contact through social media with a consumer (me).
  • the consumer finding out about a solution to his problems - all the darned books lent out over the years and never gotten back, and wanting to read the series from start to end in the order the author intended.
  • the transaction, where the site and the consumer shook hands, traded some numbers, etc.
  • follow-up. 'Did you get it?'

The last part is the one that so many companies forget these days. It shows a level of interest in the consumer that seems to be becoming unfashionable.

It shouldn't become unfashionable.

This is the sort of experience that anyone involved with eCommerce should be trying to emulate. We all like to be treated as though we're human beings and as though we matter. In that respect, it's a little sad that this is something I find blogworthy considering how much I buy online. It demonstrates that this is not the norm in my experience.

Of course, it helps to have a great product.