Big Data and Ethics

In reading 'Big Data, Machine Learning, and the Social Sciences: Fairness, Accountability, and Transparency', something came back to me from some years ago.

Sometime in the year before I last left Trinidad and Tobago, I was, strangely, asked to see Member of Parliament Chandresh Sharma in Chaguanas. Since he and the present Prime Minister of the country had allegedly said that my family did not have deeds for property related to Otaheite Estate, relations were strained - so I decided to go and speak with him about that.

It ended up being a political strategy meeting for the United National Congress (UNC) (not the People's Partnership, a distinction Trinidadians will understand) in Chaguanas, and I was to sit and wait through some presentations that I really had no patience for. I'm not a political beast. As it happened, the real reason I was invited was apparently the use of what we would now call 'Big Data' on a map for political purposes.

Race (which is still, unfortunately, a predictor in Trinidad and Tobago politics), religion, familial relations and other aspects of people were on that map, tied to GPS coordinates. In the broad strokes, it's not a bad thing - but the way that people were discussing the data's use made my skin crawl. They apparently expected me to participate in making the map 'better', but in a small country like Trinidad and Tobago it seemed very invasive - particularly since they were even getting into shopping habits. While some may have seen just an attempt to win an election, I saw the people who were funding the UNC wanting this map for very different reasons. 

It bothered me. I didn't participate. I gave Sharma a piece of my mind in a diplomatic way - I have become better at that as I have grown older - and walked away.

Did I see positive potential for that data? Yes. Did I trust those people with that data? No. Am I right? Maybe.

As luck had it, the People's Partnership - a coalition that mainly was made up of the UNC - won the election. Did the map have anything to do with it? I don't know, and I'm OK with that. 

And that's the trouble with Big Data. It's not that it lacks the potential to help with good causes; the trouble is determining what a good cause is. It's also about how else the data will be used, and by whom.

So what is Big Data? Big Data is really an umbrella term for many ways of using all the data that surrounds us. The aspect of Big Data that really concerns most people isn't about SETI, or about meteorological forecasts. It's about people, and Michelle Chen's article hits the key issue squarely:

...While it’s true that Big Data—the amassing of huge amounts of statistical information on social and economic trends and human behavior—can be empowering for some, it’s often wielded as a tool of control. While we’re busy tracking our daily carb intake, every data packet we submit, each image we toss into the cloud, is hoarded and parsed by powerful institutions that manage our everyday lives....

Her article, 'Is Big Data Reinforcing Social Inequalities', also hits a few key points that I personally agree with. Daniel J. Solove wrote much about this in his books, as have others, but in a world where social networks have consumers as their product, the responsibility is handed off to corporations filled with groups that decide how information is used. It doesn't always seem to be a matter of what is ethically right but instead of what one can legally do. But what is ethically right? 

That's the Big Data conundrum. Who gets to choose how all that data is used? In theory, a government of elected officials could be the right choice, but governments make unpopular decisions all the time - and the ethics of politicians are constantly being found to be less than perfect. Some say that the free market should decide, yet the free market has similar issues. The power to decide really lies with the people providing the data in the first place, but in a world where 'tl;dr' is used, most people don't want to be bothered.

There is an issue here - the Big Data elephant in the room. Until more people begin understanding that there are ethical issues with big data, it will continue being a nebulous phrase that makes companies and governments happy but scares the dickens out of others. 

Just because someone can do something doesn't mean that they should. And just because there are serious concerns related to big data doesn't mean it shouldn't be used for the betterment of society. The issue seems to be, "Who do you trust?"

Ubuntu: ASUS G74SX

I opted for Ubuntu instead of Gentoo because, for some reason, Gentoo was trying to toss me an AMD64 distro for an i7 through the Universal USB Installer (UUI). With an entire DVD to download, I just opted for Ubuntu.

I was wiping the drive of all things Windows. Bye bye.

I was having some issues with installing Ubuntu from USB. They seemed to revolve around drive mounting during the install process, so eventually I tried booting to the Ubuntu install on the USB and checked the mount to the drive I was installing to. It seemed OK, but I decided to change it over to bootable LVM. Then I ran the install from within Ubuntu (instead of from the boot option).

Now everything's working. Just putting that out there in case someone else has issues.

Creating a Software Development Environment

I have some ideas that I want to play with, and as I hinted at here, I'm thinking of using C++ if only because some people have already done some of the heavy lifting with content management system frameworks, etc. I don't know that this will be a worthwhile exercise as far as what I would like to see, but almost all exercise is good exercise.

The first step is creating a solid development environment. Toward that end, I'm repurposing my old ASUS G74S - a beast of a machine when it first came out some years ago that will still outperform many new (and cheap) machines. As I write this, it's backing up the old Windows 7 system. 

Because so much of the web runs on Linux, the system will become Linux based - wiped of all that was licensed from Microsoft. Which distro? I'm thinking Gentoo because it will allow me to customize the build more easily than some other distros, but I haven't quite decided yet. It could end up being just about any distro.

Since Linux comes with just about all the development tools one needs (or they can easily be had), the rest is pretty simple. From there I'll make a build of CppCMS and see whether it will work for what I want to do. I expect it will. Ultimately, I may integrate fuzzylite or a similar library so that I can play with some ideas for dynamic navigation and tagging. Tossing wxWidgets into the mix, at least during development, seems to be a good idea.

For backup, I expect I'll use GitHub - at least for now. The objective is to make it Free Software (GPL) if my ideas work out, but I'd also not like to have people scamper off with the general ideas and stick them in proprietary code during development - if it's actually worthwhile.

Really, I don't know that any of it would be worthwhile, but it's a fun idea for me to flesh out - and it's been a long time since I've had fun doing a project. It's kind of exciting to get started on something I'll enjoy doing.