The array datatype in LotusScript supports arrays containing up to 2^16 (65,536) elements, or 2^15 (32,768) if you don’t use negative index values. This is fine for most purposes, but what if you need a larger indexed collection?
The NotesJSONArray class will work if the data you want to store can be represented in JSON — primitive types and structures created from primitive types. It’s about half the speed of an array, and doesn’t support arrays of objects. I’m not sure how many elements it can hold — I tested it up to a million number values. The coding is also more complex than corresponding array-based code.
In this first of a series of posts about different specific data structures in LotusScript, I show how to create super large arrays via custom classes. I did two implementations, each able in theory to contain more elements than you actually have memory to store — a billion or more. I was able to store around 200 million Integers before my client crashed.
Note: an array variable of fixed size is further limited in size because local variables in a given “scope” can’t exceed 32KB — so it depends on the sizes of the elements. To get around this, use a dynamic array with the Redim statement.
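As a rough sketch of that workaround (the Sub name and the sizes are illustrative, not taken from my test code): a fixed declaration such as Dim values(0 To 19999) As Long needs about 80KB of local storage, which is over the limit described above, whereas the dynamic form declares no bounds and allocates its storage at run time.

```lotusscript
Sub DemoDynamicArray
    ' A dynamic array declares no bounds up front.
    Dim values() As Long

    ' Allocate 20,000 Longs at run time.
    ReDim values(0 To 19999)
    values(19999) = 42

    ' Preserve keeps the existing contents while growing the array.
    ReDim Preserve values(0 To 29999)
    Print values(19999), UBound(values)
End Sub
```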
You can also, without defining a special class, use the List datatype to manage large collections. The index of a List is a string rather than a number, but you can just convert your numeric index to a string and use that as the list key. You also have to do your own tracking of which is the next unused index. For convenience, I made a custom class which does all this for you.
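A minimal sketch of the bare List approach, before wrapping it in a class. The names items and nextIndex are placeholders I made up for the illustration.

```lotusscript
Sub DemoListAsArray
    Dim items List As String
    Dim nextIndex As Long      ' we have to track the next unused index ourselves
    Dim i As Long

    For i = 1 To 3
        ' Convert the numeric index to a string to use it as the list key.
        items(CStr(nextIndex)) = "item " & CStr(i)
        nextIndex = nextIndex + 1
    Next

    ' Look up by numeric index, again converted to a string key.
    If IsElement(items(CStr(1&))) Then Print items(CStr(1&))
    Print "next unused index: " & CStr(nextIndex)
End Sub
```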
So, below I show both an array-based and a list-based implementation. The array-based is 3 to 5 times faster.
Specification
The implementation will consist of a class, LargeArray, whose array elements are Variants (may be any value, including Object values). Its methods and properties are as follows:
- property Ubound as Long – the index of the highest-numbered array element. -1 if array is empty.
- property value(ByVal index As Long) as Variant – get/set the value of an element. Unassigned elements have value EMPTY. When setting an element beyond the current end of the array, the array grows to include that position. This may leave unassigned elements.
- sub GetValue(target as Variant, ByVal index As Long) – used to retrieve the element at position index when the caller doesn’t know whether the element is an object, and hence doesn’t know whether to use Set or Let. The target parameter is passed by reference, so the method assigns the requested value to it.
- sub Append(valu as Variant) – Add to the end of the array. This is the same as assigning property value(ubound+1).
- New — the constructor takes no arguments. An initial size isn’t specified — we will grow the array as needed.
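To make the interface concrete, here is a short usage sketch. It assumes the LargeArray class from the Code section is available in the same script library; the Sub name and the sample values are just for illustration.

```lotusscript
Sub DemoLargeArray
    Dim arr As New LargeArray
    Dim i As Long, v As Variant

    ' Append 100,000 values; the array grows as needed.
    For i = 0 To 99999
        Call arr.Append(i * 2)
    Next
    Print arr.Ubound

    ' Assigning past the end grows the array and leaves
    ' elements 100000 to 149999 unassigned.
    arr.value(150000) = "sparse"
    Print arr.Ubound

    ' GetValue works whether or not the element is an object.
    Call arr.GetValue(v, 150000)
    Print v
    Call arr.GetValue(v, 120000)
    If IsEmpty(v) Then Print "element 120000 is unassigned"
End Sub
```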
In addition, the ListArray class, with a similar interface, uses a List-based implementation to do the same task. This is less code because all we’re doing is storing a list and keeping track of which is the highest-numbered element. It’s slower than LargeArray, but maybe uses less memory since it’s just allocating what it needs one element at a time as opposed to big chunks. Another nice thing about this approach is you can iterate through the internal list directly with a Forall loop (if you don’t mind a frown from the Object Oriented Programming Police for insufficient encapsulation).
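A sketch of that direct iteration, assuming the ListArray class from the Code section and its public Data member. Only elements that were actually assigned exist in the list, so the gap at indexes 2 to 4 is simply skipped. I wouldn't rely on the order in which Forall visits the elements.

```lotusscript
Sub DemoListArrayForall
    Dim la As New ListArray

    Call la.Append("alpha")      ' index 0
    Call la.Append("beta")       ' index 1
    la.value(5) = "zeta"         ' indexes 2 to 4 are never created

    ' ListTag returns the key of the current list element,
    ' which here is the numeric index converted to a string.
    ForAll item In la.Data
        Print ListTag(item) & " = " & item
    End ForAll

    Print "Ubound: " & CStr(la.UBound)
End Sub
```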
I haven’t bothered to create a class based on JSON, because of its inability to store Object values and because even though it’s a built-in class, there’s no performance advantage compared to the array-based implementation.
Implementation comments
I chose to use arrays for this internally (as opposed to a linked list or tree or…) because access to them via an index is fast. The LargeArray datatype internally is a class containing an array where each element is an object containing an array. So if we allow up to 32000 elements in each array, we can store 32000^2, or a billion or so elements.
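The mapping from a Long index to a position in the two-level structure is just integer division and remainder. A small sketch of the arithmetic (BLOCK_SIZE is a name I'm using here for illustration; the class code hard-codes 32000):

```lotusscript
Sub DemoBlockOffset
    Const BLOCK_SIZE = 32000
    Dim ind As Long, block As Integer, offset As Integer

    ind = 100000
    block = ind \ BLOCK_SIZE      ' which inner array: 3
    offset = ind Mod BLOCK_SIZE   ' position within that array: 4000
    Print block, offset
End Sub
```

With both levels capped at 32000 entries, the largest usable index is 32000 * 32000 - 1, a little over a billion.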
It’s easier to use a class that lets you append elements to a collection as needed, instead of having to specify the size up front. So I wanted to make it possible to just assign an element with an arbitrary index, and the array will grow as required. Behind the scenes, this is done with ReDim statements. Resizing an array while preserving its contents is fairly efficient, but we don’t want to have to resize arrays of thousands of elements each time an element is added. So we’ll allow for wasting some space for the sake of performance, and increase the array sizes in increments of 1000 elements.
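The rounding that produces those 1000-element increments can be sketched like this. RoundedUpperBound and chunk are hypothetical names for the illustration; the class code does the same calculation inline.

```lotusscript
Function RoundedUpperBound(ByVal ind As Long) As Long
    ' Last index of the 1000-element chunk that contains ind,
    ' e.g. 0 -> 999, 999 -> 999, 1000 -> 1999, 2500 -> 2999.
    RoundedUpperBound = (ind \ 1000) * 1000 + 999
End Function

Sub DemoGrowth
    Dim chunk() As Variant
    ReDim chunk(0 To RoundedUpperBound(0))          ' 0 To 999

    ' Index 1000 is past the end, so grow by one whole increment
    ' instead of by a single element.
    If UBound(chunk) < 1000 Then
        ReDim Preserve chunk(0 To RoundedUpperBound(1000))   ' 0 To 1999
    End If
    chunk(1000) = "x"
    Print UBound(chunk)
End Sub
```

The next 999 assignments after a resize need no ReDim at all, which is where the speed comes from.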
For the list-based implementation, this is not a concern since the List code takes care of memory management for you.
Note: During testing, I found that accessing a List sequentially by key is much faster than accessing it in random order. Interesting.
Code
%REM
    Class LargeArrayNode
    Internal class used by LargeArray. There's not a lot of error checking here
    because we expect well-conditioned data from the caller.
    Constructor: New LargeArrayNode(initialIndex)
        The internal array is initialized to contain elements indexed 0 to at least initialIndex.
%END REM
Private Class LargeArrayNode
    Private z_data() As Variant

    Sub New(ByVal initialIndex%)
        initialIndex = 999 + (initialIndex \ 1000) * 1000
        ReDim z_data(0 To initialIndex)
    End Sub

    %REM
        Property value (get or set)
        Description: read and write values from the internal array.
            A set operation will grow the array in 1000-element increments
            if it doesn't already contain an element with the specified index.
    %END REM
    Public Property Set value(ByVal ind%) As Variant
        If UBound(z_data) < ind Then
            ReDim Preserve z_data(0 To (ind \ 1000) * 1000 + 999)
        End If
        If IsObject(value) Then Set z_data(ind) = value Else z_data(ind) = value
    End Property

    Public Property Get value(ByVal ind%) As Variant
        If ind <= UBound(z_data) Then
            If IsObject(z_data(ind)) Then Set value = z_data(ind) Else value = z_data(ind)
        End If
    End Property

    %REM
        Sub GetValue
        Description: Assign target argument the value at position index.
            The caller can also reference me.value(index), but this function lets them
            do it without needing to know whether the value is an object
            (which needs a Set statement to assign).
    %END REM
    Sub GetValue(target, ByVal index%)
        Dim empty
        If index <= UBound(z_data) Then
            If IsObject(z_data(index)) Then Set target = z_data(index) Else target = z_data(index)
        Else
            target = empty
        End If
    End Sub
End Class

%REM
    Class LargeArray
    by Andre Guirard
    Description: Like a standard zero-based array, but with ability to hold
        up to a billion elements in theory.
        Note: memory limitations will probably limit you to about 200 million elements.
%END REM
Class LargeArray
    Private z_data() As LargeArrayNode
    ' number of elements in array - indexes are zero-based.
    Public Ubound As Long

    Sub New
        ReDim z_data(0 To 999)
        me.Ubound = -1
    End Sub

    %REM
        Property value get, set
        Description: Assign or read an element of the array
        Arguments:
            ind: zero-based index of element being referenced.
                Need not be contiguous with existing elements.
    %END REM
    Public Property Set value(ByVal ind As Long) As Variant
        Dim block%, blockLim&, offset%, aNode As LargeArrayNode
        If ind < 0 Then Error 9
        block = ind \ 32000
        offset = ind Mod 32000
        If block > UBound(z_data) Then
            blockLim = block + 1000
            If blockLim >= 32000 Then blockLim = 31999
            ReDim Preserve z_data(0 To blockLim)
        End If
        Set aNode = z_data(block)
        If aNode Is Nothing Then
            Set aNode = New LargeArrayNode(offset)
            Set z_data(block) = aNode
        End If
        If IsObject(value) Then Set aNode.value(offset) = value Else aNode.value(offset) = value
        If me.Ubound < ind Then me.Ubound = ind
    End Property

    Public Property Get value(ByVal ind As Long) As Variant
        Dim block%, blockLim&, offset%, aNode As LargeArrayNode
        If ind < 0 Or ind > me.Ubound Then Error 9
        block = ind \ 32000
        offset = ind Mod 32000
        Set aNode = z_data(block)
        If Not aNode Is Nothing Then
            aNode.GetValue me.value, offset
        End If
    End Property

    %REM
        Sub GetValue
        Description: Use this to get a value from the array if you're not sure
            whether the value you're getting is an object. The target argument is
            passed by reference, and will be assigned to the value at the specified index.
            Same as target = obj.value(ind)
    %END REM
    Sub GetValue(target, ByVal ind As Long)
        Dim block%, blockLim&, offset%, aNode As LargeArrayNode
        If ind < 0 Or ind > me.Ubound Then Error 9
        block = ind \ 32000
        offset = ind Mod 32000
        Set aNode = z_data(block)
        If Not aNode Is Nothing Then
            aNode.GetValue target, offset
        End If
    End Sub

    %REM
        Sub Append
        Description: Add a value to the end of the array.
    %END REM
    Sub Append(valu)
        If IsObject(valu) Then Set value(me.Ubound + 1&) = valu Else value(me.Ubound + 1&) = valu
    End Sub
End Class
And the ListArray class:
%REM
    Class ListArray
    Description: Array with Long numeric index, with a List as back-end storage.
    Constructor: New ListArray
%END REM
Class ListArray
    ' elements stored here, can be accessed using Forall
    Public Data List As Variant
    Private z_ubound As Long

    Sub New
        z_ubound = -1&
    End Sub

    %REM
        Property Get Ubound
        Description: Return the largest index in use.
    %END REM
    Public Property Get UBound As Long
        me.UBound = z_ubound
    End Property

    %REM
        Sub Append
        Description: Add to end of array, increasing Ubound by one.
    %END REM
    Sub Append(valu As Variant)
        z_ubound = z_ubound + 1&
        If IsObject(valu) Then Set Data(z_ubound) = valu Else Data(z_ubound) = valu
    End Sub

    Private Sub Assign(a, b)
        ' variable assignment that doesn't care whether value is an object.
        If IsObject(b) Then Set a = b Else a = b
    End Sub

    Public Property Get value(ByVal index As Long) As Variant
        On Error Resume Next ' return EMPTY if no such element.
        Assign me.value, me.Data(index)
    End Property

    Public Property Set value(ByVal index As Long)
        If index < 0 Then Error 9 ' Subscript out of range
        If index > z_ubound Then z_ubound = index
        ' Store at the requested index, not at z_ubound; otherwise assigning to
        ' an existing lower index would overwrite the last element instead.
        If IsObject(value) Then Set Data(index) = value Else Data(index) = value
    End Property

    %REM
        Sub GetValue
        Use Call obj.GetValue(target, i) instead of target = obj.value(i) to assign target
        in cases where you're not sure whether the value is an object.
    %END REM
    Sub GetValue(target, ByVal index&)
        Assign target, value(index)
    End Sub
End Class
Fun fact: when calling NotesStream.Read, you have to be prepared that the returned array can have a lowerbound of -32768 if more than 32767 bytes are requested or remaining in the stream when no &length param is supplied.
Makes for interesting code when chomping through large Streams in LotusScript and you want to maximize blocksize for performance.
Interesting tidbit. Of course you realize I’m going to have to test how much difference using larger blocks actually makes now.
Watch it with long lines of accented Unicode. NotesStream corrupts it (an SPR is open). My current “fix” in the JSON turbo class is to not write overly large pieces of text to a NotesStream object. I use it in there to convert from strings to arrays of Byte, which traverse faster than string operations.