All I See Is…

In my job, I’m the “numbers guy”. I work for a college, so the primary number everyone wants to know is enrollment. How many students do we have in the college for the semester? How many students are taking English classes? And from that number flow other numbers. And those numbers combine into formulas to provide state and federal funding. This number flows into that number, which is transformed into these numbers, which delineate where funding sources come from in another system that is based on another set of calculations. So, I’m a hardcore statistics guy, eh? Not really.

There’s another side to all of this. Connectivity. Understanding why certain numbers are what they are. Statistics is not going to teach you a lot of that. For instance, enrollment is closely tied to the employment rate within the counties that my college serves. When the economy is good, more people have jobs, and hence do not have a lot of time, or much need, to take college classes. Thus enrollment goes down. When the economy takes a downturn, those who are forced out of jobs see a need to improve or update their skill sets – which means they tend to enroll in college classes to do so. But that’s still only part of the story. It’s the part of the story that most people in my job position tend to stop at. I only wish there were an easy method to quantify and describe the students who come to the school. Because there’s a lot more than just numbers at play here.

Sure, you have probably guessed that I am talking about the individual stories of the students themselves. The single mother, balancing her full-time job with the college classes that she is taking, while being sure that she is there to raise her two very young children. Or the student who has decided to take control of his life after a second DWI and a week-long sojourn in a jail cell. Or the high school sophomore, who has found that high school classes are not challenging enough for him, so he comes to the college as a dual-credit student – earning college credits with the same classes that are satisfying his high school requirements. Or the older student, in her late sixties, looking to learn more about these “dang” computers. Each an individual story. Each with unique needs and wants for their respective lives. Each one, a number that adds to the enrollment in the college.

When I was hired, my interview process had a single question that caught me off guard: “If there is one thing that you could tell people about data, what would you tell them?” In my typical fashion, I answered off-the-cuff, and as brutally honestly as I could. When I left, I felt that I had blown the interview because of it. I found out later that it was the clinching moment that won the interview committee over in my favor.

Data is essentially a grouping of ones and zeroes. That grouping has meaning because of the symbolism we give to it. From that grouping of symbolism, we infer conclusions from what is represented. Looking backwards over data gathered in a similar time-frame with a similar set of symbolic inference from a prescribed period of time, we can analyze a trend of growth, reduction or stagnation – all dependent on how we interpret the rise, fall and plateau of the numbers. But. Those numbers are representative of far more than just enrollment or the number of A grades given out by an instructor in a certain class taught on a certain campus. There are stories behind those numbers. Each student has a different reason for coming here to this college and taking the classes that they take. Every grade presented by an instructor in a class is reflective of how the instructor taught the class, but only when the balance of the student work is compared to the grade given. If we choose to dig only deep enough to find Success rates, Failure rates, and Withdrawal rates – we will certainly find a narrative that can be created for the casual reader of the derived analysis reports. But we miss the larger picture of who our student body is. We begin down the slippery slope of making our students into numbers, and setting our college’s diplomas as a piece of paper that, when stamped, merely states that the student is capable of following instructions. When we decide not to turn over the data rocks to see what kind of student we have, we tread down a very dangerous path. I would tell people that data is more than just numbers, counted and regurgitated statistics. Those numbers are people. And they should be treated with the respect that they deserve, even while we work the sterile process of finding our associated head counts and rates.

But that’s where digging into the connectedness of things comes into play. The fox continues to crawl under your fence, come up on the porch, and eat your cat’s food. Why? Perhaps, if you looked around the neighborhood a little bit, you would notice that there is a wooded area that is being cleared for a strip-mall. Remember, the fox has been your neighbor all along. The fox has fed on rodents and other small animals in the area – keeping their population in check, so that they do not over-strip some of the plants. Those plants, until they were torn down for the strip-mall, blocked out the sounds of the nearby interstate and provided fresh and clean air to the immediate area. Now those plants are gone. With their home environment gone, the smaller animals fled elsewhere. That left the fox with the need to find food elsewhere. Connectivity.

Certainly, people can look at the numbers and make inferences. It’s easy to do that because data is sterile. You don’t readily see it, except for the symbolic presentation given to you on your computer screen or printout. And while I dislike the sliminess of Cypher from The Matrix, I have to agree with him on one thing where data representation is concerned:

I don’t even see the code. All I see is blonde, brunette, redhead.

The Dark Art of Statistical Analysis – an Opinion

Last night, while browsing around the internet – I am on the road in Arkansas, back at my father’s house to move furniture and other heavy items that I wish to keep – I ran across an article on CNN proclaiming Christianity and Islam as the fastest growing religions in the world today. As I read through the article, I realized that there seemed to be a bit of a bias in either the study or the article. The article mostly speaks about the “big five”: Christianity, Islam, Judaism, Hinduism, and Buddhism. There’s a nod to “Unaffiliated” beliefs, but that’s where it stops.

As I read through it, I started to get upset. How can Paganism, admittedly an amalgamation of many beliefs, not be included? ‘How unfair!’ – my mind kept crying out. And I started to write a scathing, very critical rant about this entire issue. And eventually, I grew tired enough to get to sleep, so I left the blog post unfinished. This morning, as I came back to my computer with a fresh cup of coffee in my hand, I sat and mulled my point. Watching the squirrels moving around in the forested area behind the town-home that we are renting, my mindset began to change, and I scrapped that blog post in favor of this one.

Am I upset about the exclusion of the smaller faiths from the article, and possibly even the data study itself? Sure I am. I get just as upset when I hand data over to other departments at my job, only to watch them come to a very different conclusion than I had when pulling and compiling the data. But I also don’t have their agenda in my mind when I am looking at those numbers. In my position, I am supposed to be the Switzerland of the environment – always neutral. I can’t decry what the writer of the article says, nor can I proclaim the data study that was compiled to be completely inaccurate either. If I had access to the data utilized for the study, I might possibly come to the very same conclusions. I would actually need access to the raw data inputs that were utilized, as well as the explanations of what data inputs were discarded from the study, and why.

To be honest, data analysis is an art form. A dying one at that. It is easy to manipulate results to prop up a previously stated conclusion. As a song lyric I vaguely recall once stated: “Give me facts and figures, and I can make them say anything you like” – or something like that. The hard part is taking those facts and figures and letting them tell the story they have. Even when that story completely destroys the entire thesis of what you were trying to prove. But that’s not what happens in today’s society. Not one bit…

Somewhere down the line – my assumption would be that it happened somewhere between the late 1980s and the early 2000s – businesses realized that statistics could be manipulated in ways to get consumers to believe a particular thing. I am sure it’s been done in various ways throughout history – it’s far too easy to do – but I believe the wholesale corruption of statistical analysis took place somewhere in there. And purely for the benefit of capturing the consumer dollar. Do I have proof? Hardly. Only a semi-educated guess. But I have seen many data studies where data has been thrown out for being “outside of the statistical means”. In other words, the outliers have been removed from the equation, so as to smooth the data results. For those that may not remember their bell curve mathematics, outliers are those data points that fall far outside the range where most of the population being studied sits. And in the case of the CNN article, I would posit that non-mainstream faiths and their adherents, current as well as new, are considered to be outliers for this study.
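For anyone who wants to see how much a single discarded point can move a result, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the growth figures are invented, the `trim_outliers` helper is hypothetical, and the cutoff of two standard deviations is just one common rule of thumb. It is not the method of any particular study, and certainly not a reconstruction of the one behind the CNN article.

```python
from statistics import mean, stdev


def trim_outliers(values, z_cutoff=2.0):
    """Drop points more than z_cutoff sample standard deviations from the mean.

    Hypothetical helper: the cutoff of 2.0 is an assumed rule of thumb.
    """
    center = mean(values)
    spread = stdev(values)
    if spread == 0:
        # Every value is identical, so nothing can be an outlier.
        return list(values)
    return [v for v in values if abs(v - center) <= z_cutoff * spread]


# Invented yearly growth percentages for eight imaginary groups.
growth = [2.0, 2.5, 3.0, 2.2, 2.8, 2.4, 2.6, 14.0]

kept = trim_outliers(growth)
removed = [v for v in growth if v not in kept]

print("mean with every point:", mean(growth))
print("mean after trimming:  ", mean(kept))
print("points thrown out:    ", removed)
```

With these made-up figures, the one fast-growing group is the only point that gets dropped, and the average falls from roughly 3.9 to 2.5. The smoothed number is tidier, but the most interesting story in the data set is the one that got removed.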

What does it mean? Nothing much, really. Unless you take this particular study as empirical data, and its conclusions as concrete fact. Then, there’s a problem. But…to be perfectly blunt and honest, no one really cares about that. No one is going to dig into the particulars of a study to see where the source data came from, what data was removed as being “inconsistent” or “skewed”, or even look for an explanation of the “whys” behind those choices. And thus, statistical analysis remains a “dark art” – not one cloaked in evil, just one shrouded in the darkness of a forgotten corner.

ADDITIONAL

As I read through parts of the Pew report, I discovered that the bias is not in the report, but rather in the CNN article itself. Paganism seems to be lumped into the “Folk Religions” area, or possibly the “Other Religions” part of the study. It will certainly be interesting to read, at least for me.