The data presentation graphic at right, and many of those sprinkled throughout this article (click them for better resolution), are highlights from a gallery of graphics produced with the R statistical software package. R is an open-source programming language and environment that has gained much popularity among academics who want to apply statistical methods
This gallery contains 10 photos.
May 4, 2013 12:10 PM
March 31, 2013 2:02 PM
If you have some experience, the topic of deleting consequential data automatically brings to mind optional safety measures in case things go wrong. The recent article concerning data deduplication is a good example. What if I didn’t test my DELETE code enough and a snafu occurs? What if I have to retrieve some or all of the deleted rows after the fact? What if I want to keep the deleted rows (or partial rows) available on the side for inspection for a while? Continue…
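One way to picture the "keep the deleted rows on the side" idea is the minimal sketch below. It uses Python's built-in sqlite3 module rather than Oracle, and the `orders` table, the `_deleted` suffix, and the `delete_with_archive` helper are all hypothetical names invented for illustration. The copy and the delete run in one transaction, so a failure in either leaves the table untouched.

```python
import sqlite3


def delete_with_archive(conn, table, where_sql, params=()):
    """Copy rows matching where_sql into <table>_deleted, then delete them.

    Hypothetical helper: table and where_sql are trusted strings here,
    so never build them from user input.
    """
    archive = f"{table}_deleted"
    with conn:  # commits on success, rolls back if any statement raises
        # Empty clone of the source table's columns (WHERE 0 copies no rows).
        conn.execute(
            f"CREATE TABLE IF NOT EXISTS {archive} AS SELECT * FROM {table} WHERE 0"
        )
        conn.execute(
            f"INSERT INTO {archive} SELECT * FROM {table} WHERE {where_sql}", params
        )
        cur = conn.execute(f"DELETE FROM {table} WHERE {where_sql}", params)
    return cur.rowcount


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, status TEXT)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)", [(1, "open"), (2, "void"), (3, "void")]
)

removed = delete_with_archive(conn, "orders", "status = ?", ("void",))
remaining = conn.execute("SELECT id FROM orders ORDER BY id").fetchall()
archived = conn.execute("SELECT id FROM orders_deleted ORDER BY id").fetchall()
```

The archive table can be inspected, copied back, or dropped later once nobody misses the rows.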
March 11, 2013 4:00 PM
The premise is as follows: you’ve got useful data lying in an Excel or Google spreadsheet which you’d like translated into relational table(s). This is likely to involve normalizing the denormalized spreadsheet data on the way into the database, so you will end up with (and want) more rows in your table than there are in the spreadsheet. The SQL tool for this is the UNPIVOT keyword. Continue…
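To make the "more rows than the spreadsheet" point concrete, here is a rough sketch using Python's built-in sqlite3 module. SQLite has no UNPIVOT, so the sketch swaps in the classic UNION ALL emulation: one SELECT per spreadsheet column. The `sheet` and `sales` tables and the `q1`..`q3` columns are made-up stand-ins for a spreadsheet import, and empty cells are simply dropped.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical wide "spreadsheet" table: one row per region, one column per quarter.
conn.execute("CREATE TABLE sheet (region TEXT, q1 INTEGER, q2 INTEGER, q3 INTEGER)")
conn.executemany(
    "INSERT INTO sheet VALUES (?, ?, ?, ?)",
    [("East", 10, 12, 9), ("West", 7, None, 11)],
)

# Emulate an unpivot: each value column becomes its own SELECT, stacked with UNION ALL.
value_columns = ["q1", "q2", "q3"]
unpivot_sql = " UNION ALL ".join(
    f"SELECT region, '{col}' AS quarter, {col} AS sales "
    f"FROM sheet WHERE {col} IS NOT NULL"
    for col in value_columns
)

# Load the narrow, normalized result into its own table.
conn.execute(f"CREATE TABLE sales AS {unpivot_sql}")
rows = conn.execute(
    "SELECT region, quarter, sales FROM sales ORDER BY region, quarter"
).fetchall()
```

Two spreadsheet rows become five table rows here (West has no `q2` value), which is the row growth the normalization is expected to produce.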
Just a roundup of some recent takes on what people are up to in the area of data visualization. Also, some interesting places to look for more and a few notable blogs where you can keep posted about related topics.
This gallery contains 6 photos.
March 3, 2013 2:27 AM
February 20, 2013 1:09 PM

soybean gene mapping
Plenty of swirling news items connected with large-scale data efforts in health research of late. There’s a Wild West, nobody-in-charge feeling to some of it. This recent NY Times piece describes a process in which centralized data collectors who were early to the game and on top of their lobbying and bill-tailoring skills are reaping massive rewards, but those providers who bought their pitch are having a harder time realizing the benefits. Continue…
February 16, 2013 5:19 PM

dedup these hay-bogarting heffers for me!
Often the need arises to locate and remove duplicated rows, or partially duplicated rows, from a table. Queries using the DISTINCT keyword (or its synonym UNIQUE) will retrieve and display rows without duplicates. But actually identifying and then removing the unwanted duplicates within a table represents a different level of work, and is also likely to involve greater resource consumption. Continue…
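The gap between displaying distinct rows and actually removing duplicates can be sketched as follows, using Python's built-in sqlite3 module as a stand-in for a full database. The `cattle` table and its columns are invented for the example, and the keep-the-lowest-rowid rule is only one of several possible choices about which copy survives.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical table with fully duplicated rows.
conn.execute("CREATE TABLE cattle (tag TEXT, breed TEXT)")
conn.executemany(
    "INSERT INTO cattle VALUES (?, ?)",
    [("A1", "Angus"), ("A1", "Angus"), ("B2", "Hereford"), ("A1", "Angus")],
)

# Step 1: identify the duplicates (read-only, like a DISTINCT-style report).
dupes = conn.execute(
    "SELECT tag, breed, COUNT(*) FROM cattle "
    "GROUP BY tag, breed HAVING COUNT(*) > 1"
).fetchall()

# Step 2: remove them, keeping the row with the smallest rowid in each group.
with conn:  # one transaction: commit on success, roll back on error
    cur = conn.execute(
        "DELETE FROM cattle WHERE rowid NOT IN "
        "(SELECT MIN(rowid) FROM cattle GROUP BY tag, breed)"
    )
deleted = cur.rowcount

survivors = conn.execute("SELECT tag, breed FROM cattle ORDER BY tag").fetchall()
```

For partially duplicated rows, the GROUP BY list would name only the columns that define "the same row", which is an assumption the table owner has to supply.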
February 6, 2013 10:23 AM

E.F. ‘Ted’ Codd
Relational database design has strong roots within Set Theory, as can be seen in the seminal work of E.F. Codd more than 40 years ago. Most people recall encountering Venn diagrams during their school days, a pictorial excursion into this realm. Codd codified his relational ideals into a series of well-known rules, which represent an extreme case, not taking account of the often necessary pragmatic or performance reasons for selective denormalization. But his principles are still instructive and serve as a starting point for designing business schemas. Continue…
February 1, 2013 1:03 PM
Netflix has been open and unrepentant about publicizing its strategy for cultivating value from all the event data at its fingertips thanks to its streaming service user base. Their data miners have been taking things far beyond the level of Amazon’s ‘smart’ book recommendations. This new drama, premiering tonight, has been statistically vetted regarding timeslot, programming, script, and even cast with heavy input from their S3/Hadoop cloud (they use AWS for their platform) of user preference and activity data. Continue…
January 27, 2013 5:00 PM

Gartner Hype Cycle graphic cited by Sicular
I’m no market analyst, but I have to admit that among my earliest reactions to noticing all the Big Data (BD) buzz afoot back in 2010 was to wonder: exactly who besides the odd Google, Amazon, Facebook or Twitter with their godzillabytes datapile cultures would need this stuff for real? I’m exaggerating, since I can see plenty of research opportunities within BioTech, communications analysis, financial, and other sectors as possibilities. But the real question is whether the average corporate data heap’s need for BD and NoSQL is actual, or just hype. Continue…
January 23, 2013 11:24 AM

Lake Malawi
Suppose some quixotic HR evangelist fresh off a stint of social aid work in Malawi is pushing the bombastically non-Libertarian idea of normalizing compensation patterns within departments as follows: employees who have high tenure seniority contrasted with low salary ‘seniority’ will receive pay adjustments. This is the sort of question that can be explored with Oracle’s analytic functions. Continue…
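As a rough illustration of the kind of query involved, the sketch below ranks employees within each department twice, once by tenure and once by salary, and flags anyone whose tenure rank beats their salary rank. It runs against Python's built-in sqlite3 module (window functions need SQLite 3.25 or later) rather than Oracle. The `emp` table, its columns, the sample rows, and the flagging rule itself are all assumptions made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical employee table; names and figures are invented.
conn.execute("CREATE TABLE emp (name TEXT, dept TEXT, hire_year INTEGER, salary INTEGER)")
conn.executemany(
    "INSERT INTO emp VALUES (?, ?, ?, ?)",
    [
        ("Ann", "IT", 2001, 50000),
        ("Bob", "IT", 2005, 70000),
        ("Cy", "IT", 2010, 60000),
        ("Dee", "HR", 2003, 40000),
        ("Eli", "HR", 2008, 45000),
    ],
)

# Rank 1 = longest tenure (earliest hire) and rank 1 = highest salary,
# each computed separately inside every department.
query = """
SELECT name, dept, tenure_rank, salary_rank
FROM (
    SELECT name, dept,
           RANK() OVER (PARTITION BY dept ORDER BY hire_year)   AS tenure_rank,
           RANK() OVER (PARTITION BY dept ORDER BY salary DESC) AS salary_rank
    FROM emp
)
WHERE tenure_rank < salary_rank   -- more senior by tenure than by pay
ORDER BY dept, name
"""
candidates = conn.execute(query).fetchall()
```

With this sample data, Ann and Dee come out as the adjustment candidates: each is first in tenure within her department but last in pay.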