Monday, May 19, 2008

Stupid Excel tricks - extracting standard release designations

I spent part of my morning wrestling with a problem.

I'm working on a project in which I have to analyze some data, including software release data. At my employer MegaCorp, we are effectively working with four different types of releases, which we define as follows:

  • Product releases (e.g. "9").

  • Standard releases (e.g. "9.10").

  • Supplemental releases (e.g. "9.10.1").

  • Builds (e.g. "9.10.1.SP1").
It should be noted that the product, standard, and supplemental release numbers can have either one or two digits (e.g, "1.1.1," "99.99.99"), and that the build numbers may be numeric of alphanumeric. In addition, the database that I'm using stores standard releases, supplemental releases, and builds. So my data may look like this:


So, here's my problem. For purposes of my project, I only care about standard releases. As far as I'm concerned, 9.10.1.SP1,, 9.10.1, and 9.10 are all just 9.10. However, because of the variable length of the elements of the fields, I can't simply take the first three, first four, or first five characters of the text string. And I haven't figured out a way to take everything before the SECOND period.

Initially, I had a problem even forming my question. I couldn't just go out and ask "How do you convert a build to a standard release?" because our numbering system is not used across the entire industry. While we use product.standard[.supplemental[.build]], the Wikipedia article on software versioning documents several other numbering systems, including:


And there are some other strange variants that I won't even go into here. Suffice it to say that there wasn't a standard way to phrase a Google question "How do you convert a four-part software version to a two-part software version?"

After some false starts, enter to the rescue:

The most obvious method might be to parse the data. The menu option Data | Text to Columns starts a wizard which will guide you through the process of splitting up a range of data into columns. The wizard prompts you to describe the character which separates each piece of data within the string....The data is then rewritten on your worksheet, broken up over as many columns as required.

However, the Text to Columns wizard wipes out existing data in the column in question, as well as one or more columns to the right. So I had to work out some steps to make the conversion without wiping out the original data. For this example, assume your Excel spreadsheet has 14 columns of data (A - N) and that the data of interest is in the 13th column (M).

  • Copy the contents of column M into column P.

  • Select the contents of column P.

  • Using Excel's "Text to Columns" item in the Data menu, indicating that the text is delimited with a period (.). If you have four-part build numbers, then columns P through S will now have data.

  • Since columns P and Q may have one or two digit numbers, place the equation "=P1+Q1/100" into cell O1, then fill the equation down column O. Note that releases such as 9.1 will be rendered as "9.01," which allows this release to be distinguished from release "9.10."
Once that's done, then the textual/numeric data


will become the numeric data


which is what I need for my purposes.

There are probably more automated ways to do this, and applications other than Excel may provide more elegant ways to do this, but this quick and dirty method seems to work for me.

Now on to data analysis...

Sphere: Related Content
blog comments powered by Disqus