R with Oracle

The data presentation graphic at right, like many of those sprinkled throughout this article (click them for higher resolution), is a highlight from a gallery of images produced with the R statistical software package. R is an open-source programming language and environment which has gained much popularity among academics who want to apply statistical methods

Deletion Insurance

If you have some experience, the topic of deleting consequential data automatically brings to mind optional safety measures in case things go wrong. The recent article concerning data deduplication is a good example. What if I didn’t test my DELETE code enough and a snafu occurs? What if I have to retrieve some or all of the deleted rows after the fact? What if I want to keep the deleted rows (or partial rows) available on the side for inspection for a while?     Continue…
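
One cheap form of insurance is to copy the doomed rows into a side table in the same transaction that deletes them. The sketch below is only an illustration: it uses Python's built-in sqlite3 module as a stand-in for Oracle, and the orders and orders_deleted tables and the delete_with_archive helper are made-up names, not anything taken from the full article.

```python
import sqlite3


def make_demo_db():
    """Build a throwaway in-memory database with a hypothetical orders table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT)")
    # Side table with the same columns; deleted rows are parked here.
    conn.execute("CREATE TABLE orders_deleted (id INTEGER, status TEXT)")
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?)",
        [(1, "open"), (2, "stale"), (3, "stale")],
    )
    conn.commit()
    return conn


def delete_with_archive(conn, where_sql, params=()):
    """Archive matching rows, then delete them, as one transaction.

    where_sql is pasted into the statement, so it must come from trusted
    code, never from user input. Values go through bound parameters.
    """
    with conn:  # commits if both statements succeed, rolls back otherwise
        conn.execute(
            f"INSERT INTO orders_deleted SELECT * FROM orders WHERE {where_sql}",
            params,
        )
        cur = conn.execute(f"DELETE FROM orders WHERE {where_sql}", params)
    return cur.rowcount  # number of rows removed from orders
```

Calling delete_with_archive(conn, "status = ?", ("stale",)) on the demo database removes the two stale rows from orders and leaves copies of them in orders_deleted, where they can be inspected or restored.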

Unpivoting Spreadsheets into Oracle

The premise is as follows: You’ve got useful data lying in an Excel or Google spreadsheet which you’d like translated into relational table(s). This is likely going to involve normalizing the denormalized spreadsheet data on the way into the database; you are going to end up with (and want) more rows in your table than the corresponding quantity within the spreadsheet. The SQL tool for this is the UNPIVOT keyword.     Continue…
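
To make the reshaping concrete, here is a rough sketch of what unpivoting does to the data, written in plain Python rather than SQL so it can be run anywhere. The column names (region, q1, q2, quarter, amount) and the unpivot helper are invented for illustration; the Oracle statement in the comment is an assumed equivalent against a hypothetical sales_wide table.

```python
def unpivot(rows, id_cols, value_cols, name_col="quarter", value_col="amount"):
    """Turn one wide row per entity into one narrow row per (entity, column).

    Roughly what this Oracle query would do on a hypothetical table:
        SELECT * FROM sales_wide
        UNPIVOT (amount FOR quarter IN (q1, q2));
    """
    long_rows = []
    for row in rows:
        for col in value_cols:
            if row[col] is None:
                # Oracle's UNPIVOT excludes NULL values by default.
                continue
            out = {key: row[key] for key in id_cols}
            out[name_col] = col        # the old column header becomes data
            out[value_col] = row[col]  # the old cell value gets its own column
            long_rows.append(out)
    return long_rows


# Two spreadsheet-style rows become three table rows (one cell is empty).
wide = [
    {"region": "east", "q1": 10, "q2": None},
    {"region": "west", "q1": 5, "q2": 7},
]
narrow = unpivot(wide, id_cols=["region"], value_cols=["q1", "q2"])
```

The point to notice is the row count: every non-empty spreadsheet cell in the unpivoted columns turns into its own row, which is exactly the "more rows than the spreadsheet" effect described above.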

Beautiful Data Gallery

Just a roundup of some recent takes on what people are up to in the area of data visualization. Also, some interesting places to look for more and a few notable blogs where you can keep posted about related topics.

Muddy Picture in BD/Health Care

soybean gene mapping

Plenty of swirling news items connected with large-scale data efforts in Health research of late. There’s a wild west nobody-in-charge feeling to some of it. This recent NY Times piece describes a process in which centralized data collectors who were early to the game and on top of their lobbying and bill tailoring skills are reaping massive rewards, but those providers who bought their pitch are having a harder time realizing the benefits.     Continue…

De-duplicating Rows

dedup these hay-bogarting heifers for me!

Often the need arises to locate and remove duplicated rows, or partially duplicated rows, from a table. Queries using the DISTINCT keyword (or its synonym UNIQUE) will retrieve and display rows without duplicates. But actually identifying and then removing the unwanted duplicates within a table represents a different level of work, and is also likely to involve greater resource consumption.     Continue…
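
As a small illustration of the difference between hiding duplicates and removing them, the sketch below keeps the earliest copy of each row and deletes the rest. It runs against SQLite through Python's sqlite3 module, not Oracle, and the contacts table and remove_duplicates helper are hypothetical. Oracle also has a ROWID pseudocolumn, so a DELETE of this general shape is a common pattern there too, but treat this as one possible approach rather than the method the full article uses.

```python
import sqlite3


def make_contacts_db():
    """In-memory demo table with deliberate duplicates (hypothetical data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE contacts (name TEXT, email TEXT)")
    conn.executemany(
        "INSERT INTO contacts VALUES (?, ?)",
        [
            ("ann", "ann@example.com"),
            ("ann", "ann@example.com"),
            ("bob", "bob@example.com"),
            ("ann", "ann@example.com"),
        ],
    )
    conn.commit()
    return conn


def remove_duplicates(conn):
    """Delete every row except the lowest-rowid copy of each (name, email)."""
    with conn:  # single transaction: commit on success, roll back on error
        cur = conn.execute(
            """
            DELETE FROM contacts
            WHERE rowid NOT IN (
                SELECT MIN(rowid)      -- the survivor for each group
                FROM contacts
                GROUP BY name, email   -- columns that define a duplicate
            )
            """
        )
    return cur.rowcount  # how many duplicate rows were removed
```

For partial duplicates, the GROUP BY list would name only the columns that have to match; the other columns of the surviving row are whatever the earliest copy happened to hold.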

Set Operators in SQL

E.F. ‘Ted’ Codd

Relational database design has strong roots within Set Theory, as can be seen in the seminal work of E.F. Codd more than 40 years ago. Most people recall encountering Venn diagrams during their school days, a pictorial excursion into this realm. Codd codified his relational ideals into a series of well-known rules, which represent an extreme case, not taking account of the often necessary pragmatic or performance reasons for selective denormalization. But his principles are still instructive and serve as a starting point for designing business schemas.     Continue…

Netflix’s Big Adventure

Netflix has been open and unrepentant about publicizing its strategy for cultivating value from all the event data at its fingertips due to its streaming service user base. Their data miners have been taking things far beyond the level of Amazon’s ‘smart’ book recommendations. This new drama, premiering tonight, has been statistically vetted regarding timeslot, programming, script, and even cast with heavy input from their S3/Hadoop cloud (they use AWS for their platform) of user preference and activity data.     Continue…

Big Data Comeuppance Sightings

Gartner Hype Cycle graphic cited by Sicular

I’m no market analyst, but I have to admit that among my earliest reactions to noticing all the BD buzz afoot back in 2010 was to wonder: exactly who besides the odd Google, Amazon, Facebook or Twitter with their godzillabytes datapile cultures would need this stuff for real? I’m exaggerating, since I can see plenty of research opportunities within BioTech, communications analysis, financial, and other sectors as possibilities. But the real question is whether the average corporate data heap’s need for BD and NoSQL is actual, or just hype.     Continue…

Exploring Analytic Functions : an Iterative Approach

Lake Malawi

Suppose some quixotic HR evangelist fresh off a stint of social aid work in Malawi is pushing the bombastically non-Libertarian idea of normalizing compensation patterns within departments as follows: employees who have high tenure seniority contrasted with low salary ‘seniority’ will receive pay adjustments. This is the sort of question that can be explored with Oracle’s analytic functions.     Continue…
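
A first pass at that question might rank each employee twice within a department, once by hire date and once by salary, and then compare the two ranks. The sketch below is an assumption-laden illustration, not the article's own solution: the employees table, its columns, the underpaid_seniors helper and the min_gap threshold are all invented, and it runs on SQLite (3.25 or later, for window function support) via Python's sqlite3 module. Oracle's analytic functions accept the same RANK() OVER (PARTITION BY ... ORDER BY ...) form.

```python
import sqlite3


def make_hr_db():
    """In-memory demo data; names, dates and salaries are made up."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE employees (name TEXT, dept TEXT, hire_date TEXT, salary INTEGER)"
    )
    conn.executemany(
        "INSERT INTO employees VALUES (?, ?, ?, ?)",
        [
            ("amy", "A", "2001-03-01", 40000),
            ("ben", "A", "2005-06-15", 60000),
            ("cat", "A", "2010-09-30", 50000),
            ("dan", "B", "2003-01-10", 70000),
            ("eve", "B", "2008-11-20", 65000),
        ],
    )
    conn.commit()
    return conn


def underpaid_seniors(conn, min_gap):
    """Employees whose salary rank trails their tenure rank by at least min_gap."""
    return conn.execute(
        """
        SELECT name, dept, tenure_rank, salary_rank
        FROM (
            SELECT name, dept,
                   -- 1 = longest-serving person in the department
                   RANK() OVER (PARTITION BY dept ORDER BY hire_date) AS tenure_rank,
                   -- 1 = highest-paid person in the department
                   RANK() OVER (PARTITION BY dept ORDER BY salary DESC) AS salary_rank
            FROM employees
        )
        WHERE salary_rank - tenure_rank >= ?
        ORDER BY dept, name
        """,
        (min_gap,),
    ).fetchall()
```

With the demo data, amy is the most senior person in department A but the lowest paid of its three employees, so she is the only row returned for a min_gap of 2. From here the iteration can continue, for example by swapping RANK for PERCENT_RANK so that departments of different sizes are comparable.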
