Thoughts on Programming - by Varun
12/02/2007 @ 01:28:55
Haven't posted in a long time...Found this comic quite entertaining.
So, I want to discuss some time-saving tricks with VBA-Excel. Mainly:
- When to use VBA
- Data From Workbook
- Arrays
- Hashes
- Error Handling
- File Input/Output (I/O)
- Etiquette
When to use VBA
So many times over I have thought, "I want to use VBA for this thing!" I'll explicitly say, that is the exact opposite of what you should be thinking. Try as hard as you can to not use VBA with your Excel sheets. Why? While VBA is very clever, many people do not trust other's code and will Disable VBA on the opening of a workbook and will attempt to check it to ensure the safety of their computer. Especially in business, most individuals do not know coding and it's always easier to trace things just on spreadsheets (for bankers) than to delve into code. KISKIS - Keep it Simple, Keep it Safe. Now, if you had to resort to VBA, below are some tips that will make life easier for you and for other's who read your code.
Getting Data From Workbook
This section is pretty much "borrowed" from Dermot Balson's (my favorite VBA guru) website.
Resizing
You will always have a bunch of stuff on your spreadsheet, and invariably will eventually have to write some code to deal with that data (Example: you have two columns of numbers and you want to find a weighted sum of the first column using the second column as the weights). Sure, you can do that example by hand, but let's assume you have to use that elsewhere and compute it often, so you have to use some code. What is the best way to deal with that data? What first comes to mind is something like:
Public Function WeightAverage(rng1 As Range, rng2 As Range)
Dim i!, wgtAvg#
rng2Sum = Application.WorksheetFunction.Sum(rng2)
For i = 1 to rng1.Count
wgtAvg = rng1.Cells(i,1).Value * rng2.Cells(i,1).Value / rng2Sum _
+ wgtAvg
Next i
WeightAverage = wgtAvg
End Function
But let's say you had 65536 numbers in each column, do you really want to hit the worksheet 2*65536 times just to calculate the weighted average? This ties back to my first point about what you really need VBA for; however, for the sake of this example we should find a solution to hitting the worksheet less than that many times. For this, we have the
Resize element:
Public Function WeightAverage(rng1 As Range, rng2 As Range)
Dim i!, wgtAvg#, r1r!, r1c!, r2r!, r2c!, V1 As Variant, V2 As Variant
r1r = rng1.Rows.Count: r1c = rng1.Columns.Count
r2r = rng2.Rows.Count: r2c = rng2.Columns.Count
V1 = rng1.Resize(r1r, r1c)
V2 = rng2.Resize(r2r, r2c)
rng2Sum = Application.WorksheetFunction.Sum(rng2)
For i = 1 To rng1.Count
wgtAvg = V1(i, 1) * V2(i, 1) / rng2Sum + wgtAvg
Next i
WeightAverage = wgtAvg
End Function
You can actually use the Resizing tool to read AND write from the worksheets. If you have an array of a certain size and you know it should go into a certain sized named range:
Range("NamedRange").Resize(100,20) = Array
which is SO much faster than one's first instinct:
For i = 1 to Array.Rows.Count
For j = 1 to Array.Columns.Count
Range("NamedRange").Cells(i,j).Value = Array(i,j)
Next j
Next i
Don't Mix Excel & VBA
What does that mean? If you are working with data and you need to perform a function, like summing a range, do NOT use
Application.WorksheetFunction. It is very slow compared to an efficient algo you can write using VBA. Rather than using
Application.WorksheetFunction.Min(a,b)
to find the minimum of two numbers, write your own:
Function Min(a as Long, b As Long) As Long
If a<b Then Min = a Else Min = b
End Function
Memory Management
You have a limited amount of RAM and that memory will be used for allocating variables into the memory blocks when you run VBA. There are a two major things you have to keep in mind when you deal with memory management: scope and type. Let me explain, scope is how long an initiated variable lasts. What does this mean? Since VBA uses, to some extent an object-oriented model (it's not truly object-oriented because VBA doesn't allow polymorphism [inheritence of upper-level properties] with classes...), we have objects in VBA. These include all the things you use in VBA: Range, Application, Worksheet...even Integer, Double, Single. So, when you create an instance of a variable,
Dim i As Integer, you have created a copy of an Integer and called it
i. Now this variable has a scope (lifetime). When the top-level routine ends, the variable (and thus, the memory block to which it is allocated) is freed up (in C++ you are used to
malloc, free, realloc, calloc, mallopt, mallinfo, alloca, or valloc) automatically. Enough of the techno-babble; what does this mean for us?
If you have to create a subroutine and inside you need to pull data from a worksheet that is ONLY used in that subroutine,
Dim(ension) your variable inside that subroutine so that when it ends, the block will be freed:
Sub INeedAnArray
Dim i%, A%()
For i = 1 to 1000
ReDim Preserve A(1 to i)
A(i) = i
Next i
End Sub
When the
Sub ends, both
i and
A will be freed up. Below is a list of variable types in VBA and how much space they take up in memory:
Bit - 1 bit (0 or 1)
Byte - 1 byte (0 to 255) = 8 bits
Boolean - 2 bytes ("True" or "False")
Integer - 2 bytes (-32768 to 32767)
Long (int) - 4 bytes (-2,147,483,648 to 2,147,483,648)
Single (fp) - 4 bytes (-3.402823e38 to -1.401298e-45 and 1.401298e-45 to 3.402823e38)
Double - 8 bytes (e308 to 3-324)
Currency - 8 bytes (-922 Trillion to 922 Trillion)
Decimal - 14 bytes
Date - 8 bytes
Object - 4 bytes
String (var) - 10 bytes + length of string (can be from 0 to ~2 billion)
String (fix) - length of string (1 to ~ 65,400)
Variant (num) - 16 bytes (any numeric up to double)
Variant (chr) - 22 bytes + length of string (can be from 0 to ~2 billion)
User-Defined - Number required by elements
Arrays - 20 bytes + 4 bytes for each dimension
Dermot Balson also did a study to find out which variable types were fastest with calculations in VBA. While one would expect the smaller variable types (Integer and Byte) to be fastest, it is actually the native processor-type variable (example: For most computers, 32-bit processor = 4 bytes = Long, Newer computers, 64-bit processor = 8 bytes = Double) that is the fastest because the computer can allocate the variables into the memory blocks without any down/up conversion into different non-native types:

The last thing I want to say is regarding the shortcuts you can use when defining variables. You will see that I do it often because I'm a prick...Actualy, I'm just lazy and it makes my VBA code look 10x cooler:
Dim i As Integer - Dim i%
Dim i As Double - Dim i#
Dim i As Single - Dim i!
Dim i As Long - Dim i&
Dim i As String - Dim i$
Dim i As Currency - Dim i@
Dim i() As Integer - Dim i%()
...
Arrays
Size
Let's get the basics out of the way. Imagine that an array is like a matrix, it has rows and columns and to reference a particular entry, you have to reference the position the entry is in the matrix (i.e. for the 2nd row and 3rd column's entry, you say
Array(2,3)). The Array can start at any numbering system, but it has to be indexed by whole numbers (i.e. you can say,
Dim A(-1 to 5) As Integer will give you 7 entries that must be integers and are indexed from -1, 0, ..., 5). The Array itself needs 20 bytes + 4 bytes for each dimension added + the number of entries and their corresponding memory requirements (i.e.
Dim A(1 to 6, -2 to 4) As Double will be 20 for the array + 4 * 2 for 2 dimensions + [ (6-1+1) + (4--2+1) ] * 8 for all the entries we're storing as Doubles = 132 bytes). As you can see, the size of an array can grow fast, but no need to worry because if you use the variable types that you need, you should be fine.
Nesting
You can store anything inside an array, even another array:
Sub ShowTwoArrays()
Dim A(1 to 1) As Variant
Dim B(1 to 2) As Integer
B(1) = 5: B(2) = 6
A(1) = B
Debug.Print A(1)(1) & vbNewLine & A(1)(2)
End Sub
Storing
As In the previous section, I had some code that put numbers 1 to 1000 into an array (when I was talking about scope).
For and
While loops are particularly useful for going through lists (arrays) and doing something with each entry (like summing, or moving averages). You can also loop through arrays when you need to pull data out by using the
LBound() and
UBound() functions, which give you the lower and upper index numbers, respectively:
Sub StorePrintArray()
Dim i%, c%, A%()
c = 500
ReDim A(1 to c)
For i = 1 to c
A(i) = i
Next i
For i = LBound(A) to UBound(A)
Debug.Print A(i) & ", "
Next i
End Sub
Dim, ReDim, ReDim Preserve
Dim(ension) initiates an instance of a variable type. If none is indicated, Variant (20+ bytes) is used -- not very efficient. You can ReDim(ension) an array to a different size, but that will ERASE THE DATA inside the array unless you use ReDim(ension) Preserve (data). Confused? Look:
Sub GinAndTonic()
Dim A%()
ReDim A(1 to 100)
For i = 1 to 100
A(i) = i
Next i
''--> Don't know if I've mentioned it yet, but ' starts a comment
''--> Now we have an array with 100 elements
''--> Let's say I wanted to add a 101st element...
ReDim A(1 to 101)
A(101) = 101
''--> Guess what? You lost entries 1 to 100
That line that said "
ReDim A(1 to 101)" should've read "
ReDim Preserve A(1 to 101)". That saves everything in A and adds on a 101st entry.
Option Base
You can put
Option Base 1 at the top of your module to start your arrays' indices at 1 instead of the standard 0. This also affects the arrays when you use the
.Resize method.
Hashes
Ahh, my favorite trick. Ok, let's look at the following example which presents a bunch of common problems. Let's say you have an array of 10,000 names (it's a string array,
A$(1 to 10000)). Here's the crux: You want to find out whether a name exists in the array before adding it to the end. What choice do you have but:
'' Assume that we have an array, A defined at a scope higher than the Sub
Sub AddNewName(name$) As Boolean
Dim i!, exists As Boolean
exists = False
For i = LBound(A) to UBound(A)
If (A(i) = name) Then exists = True
Next i
If Not (exists) Then
ReDim Preserve A(LBound(A) to UBound(A) + 1)
A(UBound(A)) = name
End If
End Sub
So what's wrong with this? Well for one, it's highly inefficient. You have to loop through the entire array to find a name each time. Well maybe this isn't so bad...We're just adding one name, right? *wrong* What if you have a list of names in a separate array, 100 names, that you want to check and add? Ahh...Now you must be thinking, "shit...". Well let's take a step back. What if you could
index an array by strings?. Then all you would have to do is check if the index exists, and if not, add a new index? That almost "reverses" the role of what we used the array for before --> Instead of storing names, we store indices and instead of checking through all the entries, we just check for a single index! This reduces our run-time from O(n) or worse to a constant, 1 (for our purposes at least). So how do we do this?
Dim d As Dictionary
Set d = New Dictionary
What? you don't see this as a variable type when you type? Ok, into your VBA pane and go to Tools --> References, and enable Microsoft Scripting Runtime. It uses the scrrun.dll library, which has the Dictionary element. This is basically the hash we are talking about. It is comparable to a hash element in Perl (
my %h).
'' Again, assume we have a higher-level dictionary, d, defined already
Sub AddName(name$)
If Not (d.Exists(name)) Then d.Add Key:=name, Item:=1
End Sub
Error Handling
Why?
When you run into errors, you can either see this retarded screen (which is highly unprofessional, semi-useful though it may be):

or you can see this nice MsgBox:

Not only for professionalism, but it's a cleaner error message and if you choose to press "End" on the default one, your code will stop at the breakpoint, even if you had some environment variables changed (I will talk more about that in Etiquette & Efficiency).
How?
Now the ideal error handling code (almost ideal, Dermot would add in an environment state for CalculationMethod as well...):
Sub WhateverSubYouHaveHere()
'------------------------------------------------------------------
Dim EE As Boolean, SU As Boolean, DA As Boolean
EE = Application.EnableEvents: Application.EnableEvents = False
SU = Application.ScreenUpdating: Application.ScreenUpdating = False
DA = Application.DisplayAlerts: Application.DisplayAlerts = False
AC = Application.Calculation: Application.Calculation = xlCalculationManual
On Error GoTo err_hdl
'------------------------------------------------------------------
...
[Rest of the code in your sub goes here]
...
'------------------------------------------------------------------
Application.EnableEvents = EE
Application.ScreenUpdating = SU
Application.DisplayAlerts = DA
Application.Calculation = CA
Application.StatusBar = "Ready"
Exit Sub
err_hdl:
Application.EnableEvents = True
Application.ScreenUpdating = True
Application.DisplayAlerts = True
Application.Calculation = xlCalculationAutomatic
Application.StatusBar = "Ready"
Call ErrMsg
End Sub
'' When an error occurs, Excel sets Err.Number 0, and this will trigger a message
Sub ErrMsg()
If Err.Number 0 Then MsgBox "Error " & Err.Number & vbNewLine _
& Err.Description & vbNewLine & Err.Source
End Sub
File Input/Output (I/O)
This is for reading a text file and/or writing to a text file. This is very useful with XML text files, or if you want to download some data and open it / parse it with Excel (although I suggest Perl for it), or plethora of other reasons.
How?
Best Way
Etiquette & Efficiency
Why?
Syntax