What can or cannot be data?

After two sessions of my Text Mining for History and Literature class this semester at Cornell, I am starting to question the limits of the word “data.” In class we were asked to define data and to decide whether literature is data. Initially, the consensus was overwhelming that yes, literature is data, but things became murkier once the discussion turned to defining what data actually is. Whether some objects or concepts could or could not be data is something I had never really considered before. Many phrases were thrown around: that data must be readable by a computer, that it must be readable algorithmically, or that it must be inherently quantifiable.

The concept of data that emerged in our class confused me at first. I began to think about emotions and thought and whether they are quantifiable aspects of life. They are caused by electrical impulses in our brains that can be measured, but can we get to this same quantifiable entity through the study of an author’s word choice? Does this fit into a typical definition of data?

At this point I think the more prudent question to ask is “what can or cannot be data?” rather than “what is data?” We already know what is data, but if the aim is to create new knowledge and research, it would be better to ask what can become data and be analyzed. Rather than focusing on which aspects of literature, history, human nature, and life fit into the preconceived notion of “data,” might it be better to find where the farthest boundaries of data lie by experimenting and attempting to understand non-traditional datasets as data? Hopefully I will come up with an answer eventually.