Creating random names for test data

Notes/Domino applications don’t just have code — they also store data, often a lot of it. But it can take years for them to accumulate enough documents for any performance issues to start seriously impacting users.

When designing an application, especially a brand new one, it’s important to performance test it with an unreasonable amount of sample data so any performance issues become evident immediately.

Or unexpected hard limits! Say your code depends on an array, and hasn’t taken into account array size limits. Or there’s an Integer variable somewhere that really should be a Long.

If you don’t learn about these problems until long after deployment, the original developers will have scattered and forgotten what they did on this project, and the app is unusable while you wait for a fix.

Lots and lots of names

[Image: list of autogenerated names]

The best test data looks like real data. It’s not just for looks: having only a few distinct values in your fields can fail to exercise some aspects of the application. For example, a keyword field with a calculated selection list that reads from a categorized view column will start to fail once the lookup returns more than 64KB of data.

So let’s say your test documents need to contain a person’s name, and you want to generate 50,000 documents each with a different name — or maybe nearly all different. How do you automate that?

The random number generator is your friend here. You can always just have a list of first names, a list of last names, and randomly combine them. The number of combinations is the product of the number of choices for each part, so 50 first and 50 last names would give you… 2,500 unique names. Hm. Not quite hitting the 50,000 name target. Adding a random middle initial gets you to 65,000 — barely enough.

Picking 50,000 random names from a pool of 65,000 is likely to result in a lot of duplicates. If the names need to be unique, you have to keep track of which ones you’ve used. A List variable can do this. Using the names as key values, it’s easy to tell whether a name is already in it and keep trying until you get one that isn’t.
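The bookkeeping described above can be sketched in a few lines. This is only an illustration of the idea, not the class presented later: the function name PickUniqueName, the usedNames list, and the tiny name lists are all made up for the example.

```lotusscript
Option Declare

Dim usedNames List As Integer ' keys are the names already handed out

' Hypothetical helper (not part of the class in this post): combine random
' parts until the result is a name that has not been returned before.
Function PickUniqueName(firstNames As Variant, lastNames As Variant) As String
	Dim candidate As String
	Do
		candidate = firstNames(Fix(Rnd * (UBound(firstNames) + 1))) & " " & _
		Chr$(65 + Fix(Rnd * 26)) & ". " & _
		lastNames(Fix(Rnd * (UBound(lastNames) + 1)))
	Loop While IsElement(usedNames(candidate))
	' Caution: this loops forever once every combination has been used,
	' so real code should cap the number of attempts.
	usedNames(candidate) = 1
	PickUniqueName = candidate
End Function

Sub Initialize
	Dim firsts As Variant, lasts As Variant, i%
	Randomize ' seed from the clock so each run differs
	firsts = Split("Anne,Ben,Carol", ",")
	lasts = Split("Ertz,Took", ",")
	For i = 1 To 5
		Print PickUniqueName(firsts, lasts)
	Next
End Sub
```

With 3 first names, 26 initials, and 2 last names there are only 156 combinations, so the retry loop gets slower as the pool fills up. That is the reason to keep the pool much larger than the number of names you need.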

And you might have other strings you need to randomly generate also. It would be nice to have a systematic way to do this and not have to code it every time.

Well, rejoice! Here’s a class that incorporates all that logic in a reusable form.

As a bonus, it contains functions to anonymize real data by replacing names with randomly generated other names in a consistent way (i.e. the first time a name is encountered it’s randomly replaced, then subsequent occurrences of the same name are replaced with the same string).

This class can also be used to generate random IDs of other sorts, with a different list of “parts” to combine together, but generally it’s fine to just use sequential numbers for those, which makes uniqueness easy.

Specification

The RandomNameGenerator class has these members:

  • Parts (write only): A string containing a newline-separated list of comma-separated lists of parts that will be used to construct names. For instance if Parts is set to “a,b<newline>1,2,3”, the class can generate up to six different strings by combining a or b with 1, 2, or 3. The default value can generate around 21 million names of people. Lines can contain a single value, e.g. “-”. “(space)” is interpreted as a single space if it’s on a line by itself. You may repeat values on the same line to adjust the likelihood of a particular value being selected. You can also mix in as many blank values (consecutive commas) as you like to control the likelihood of that part being excluded altogether.
  • Spacer (write only): A string to be inserted between the parts, default empty string.
  • Repeatable (boolean, read/write): applies to the MaskName function; see its description below.
  • PossibleCombinations (double, read only): returns the number of unique names possible under the current configuration. This may be an overestimate if there’s some overlap in your Parts values such that different parts may combine to generate the same string. It does, however, take into account duplicate part values within the same line. It’s recommended that you provide enough parts to combine into at least 10 times the number of unique names you’re likely to want.
  • Substitutions (list as string, read/write): If using MaskName, the list of substitutions that have been returned on previous calls. E.g. if Substitutions(“George”) = “Fred”, a previous call to MaskName(“George”) returned “Fred”.
  • Count (long, read only): the number of unique names generated so far by GetUnique and GetUniqueName (and therefore also by MaskName).
  • Function GetName As String: returns a random name generated from the Parts supplied. The name is not stored, so if you call again you might get a duplicate value.
  • Function GetUnique(limit) As String: tries limit times to generate a random name that hasn’t already been used in this session. If that fails, returns “”, else the name. NOTE: a call to GetName doesn’t mark the name as used for this purpose.
  • Function GetUniqueName As String: Like GetUnique, but will never return “”. It’ll come up with something unique, though it might add a hexadecimal number to the end to make sure of it. If even that fails (after 10,000 suffixed attempts), it raises error ERR_PACKED. If you’ve provided enough Parts to combine into ten times the number of unique values you’ll need, the odds of the suffix being necessary are very low, on the order of a random meteorite obliterating your winning PowerBall ticket.
  • Function MaskName(name) As String: Given a name, return a random made-up name to replace it with, for purposes of anonymizing data. If called again with the same input, it’ll return the same output (so records belonging to the same person can still be grouped even though you don’t know the person’s real name). If the Repeatable property is set True, MaskName will return the same name given the same input even between different sessions, because a hash of the input is used to seed the random number generator before the name is generated. (Since the result still has to be unique within the session, a rare collision with an already-used name can change the outcome.)

Note: If using MaskName with the Repeatable option, the name is less securely anonymized, since you can come back later and try a real name to see how it would appear in the output data. But someone who doesn’t have your Parts value will not be able to reproduce this transformation.
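To make the Parts, Spacer, and PossibleCombinations members more concrete, here is a minimal sketch that configures the generator for made-up product labels instead of people’s names. The part values and the variable names (gen, sample) are invented for this example. The NEWLINE constant uses the same kind of multi-line literal as the library, on the assumption that this keeps the line separator consistent with what the Parts property splits on.

```lotusscript
Use "RandomNameGenerator"
Const NEWLINE = {
}
Sub Initialize
	Dim gen As New RandomNameGenerator, i%, sample$
	' One line per part. "red" is listed twice, so it should be picked
	' about twice as often as "green" or "blue".
	gen.Parts = "red,red,green,blue" & NEWLINE & "widget,gadget" & NEWLINE & "XL,mini"
	gen.Spacer = " " ' inserted between the parts
	For i = 1 To 5
		' GetName does not track what it returned, so repeats are possible.
		sample = sample & NEWLINE & gen.GetName()
	Next
	' Duplicates on a line are counted once: 3 colors * 2 nouns * 2 sizes = 12.
	MsgBox Mid$(sample, Len(NEWLINE) + 1), 0, "Possible: " & gen.PossibleCombinations
End Sub
```

Twelve combinations is far too few for generating unique values; the example only shows how the configuration fits together.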

Source code

%REM
	Library RandomNameGenerator
	
	Create random names and other strings for automating the creation of test data records.
	Also includes a function to transform names to random other names, for anonymizing data.
	
	© 2022 Andre Guirard

	Licensed under the Apache License, Version 2.0 (the "License");
	you may not use this file except in compliance with the License.
	You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

	Unless required by applicable law or agreed to in writing, software
	distributed under the License is distributed on an "AS IS" BASIS,
	WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
	See the License for the specific language governing permissions and
	limitations under the License.
%END REM
Option Public
Option Declare

Public Const ERR_PACKED = 30440
Public Const MSG_PACKED = "Too difficult to find a unique name. Add more parts to allow for more combinations."
'Begin DNT
%REM
	Class RandomNameGenerator
	Description: Generate random multi-word or -syllable names from a set of values supplied by the caller, or from a default set.
	Constructor: New RandomNameGenerator
%END REM
Class RandomNameGenerator
	z_nameparts() As Variant
	z_used List As Integer
	z_inited As Boolean
	z_spacer As String
	z_times As Integer
	z_count As Long
	z_combinations As Double
	
	' if True, use a technique that will "mask" a name the same way every time.
	Public Repeatable As Boolean
	' for the MaskName function to remember the original names it masked.
	Public Substitutions List As String 
	
	%REM
		Property Set Spacer
		Description: Set what character is inserted between parts of the name (default, nothing).
	%END REM
	Property Set Spacer As String
		z_spacer = spacer
	End Property
	
	%REM
		Property Count (read only)
		Description: How many unique names have been generated
	%END REM
	Public Property Get Count As Long
		Count = z_count
	End Property
	
	%REM
		Property Set Parts
		Description: The Parts property is a string containing multiple lines of text, each line being a comma-delimited list of
			parts you want to paste together into a name. For instance if the value is "Anne,Ben<NL>Ertz,Took" then the possible names
			generated will be Anne Ertz, Ben Ertz, Anne Took, and Ben Took (assuming spacer is " ").
	%END REM
	Public Property Set Parts As String
		Dim lines, i%
		
		lines = Split(parts, {
})
		ReDim z_nameparts(0 To UBound(lines))
		For i = 0 To UBound(lines)
			If lines(i) = "(space)" Then lines(i) = " "
			z_nameparts(i) = Split(lines(i), ",")
		Next
		z_inited = True
		z_combinations = 0
	End Property
	
	%REM
		Sub UseDefaults
		Description: The caller didn't supply any name parts, so use the default lists.
	%END REM
	Private Sub UseDefaults
		Const DEFAULTPARTS = _
{Alexis,Andy,Anita,Arnold,August,Autumn,Bella,Ben,Bill,Carmen,Carol,Cheryl,Chloe,Chris,Dan,Dana,Dean,Delores,Denise,Dennis,Dexter,Elizabeth,Emile,Ethan,Evelyn,Ferris,Frank,Fred,Fritz,George,Greg,Gus,Hal,Hank,Helga,Hiro,Holly,Howard,Isaac,Ivana,James,Jennifer,Joan,John,Joseph,Juan,Judy,Julia,Justin,Karl,Keiko,Kelly,Kim,Kirk,Laura,Lex,Lily,Lisa,Lorraine,Manny,Maria,Mario,Mark,Martha,Mary,Michelle,Miriam,Mitch,Naomi,Ned,Nicole,Nita,Olga,Ozgur,Patti,Paul,Peter,Phil,Pippy,Rebecca,Rex,Richard,Samuel,Sanjay,Sarah,Sean,Sean,Sigmund,Sven,Tanita,Tate,Ted,Tip,Tony,Umberto,Vanessa,Vera,Vijay,Wei,Wendy,Xagra,Yasuko,Yentl,Yoshi,Zach,Zelda
(space)
Pre,Fro,Des,El,Re,Dwo,Min,Cis,Xan,Bub,Chu,Ki,Zen,Quet,Asa,Non,Bre,Nim,Fez,Zek,Lop,Ek,Op,Um
boosi,a,ni,we,foo,jumi,resa,gero,ki,lu,tumi,jipy,free,nu,too,velu,pone,kro,hipi,re,fana
,ter,gen,berg,ly,man,son,chek,ski,pul,bur,vitch,zen,mar,tex,lit,kony,ther,plop,ver,ster
,,oni,gon,obu,etsi,akoi,ader,ettu,nivu,flar,jip,ikle,ings,oden,oopsi,ynds,akol,li,len}
		Parts = DEFAULTPARTS
	End Sub
	
	%REM
		Property PossibleCombinations (read only)
		Description: Figure out the number of possible unique names we can get from this system.
	%END REM
	Public Property Get PossibleCombinations As Double
		If Not z_inited then UseDefaults
		If z_combinations = 0. Then
			z_Combinations = 1
			ForAll thing In z_nameparts
				Dim tmp
				tmp = ArrayUnique(thing, 1)
				z_Combinations = z_Combinations * (UBound(tmp)+1)
			End ForAll
		End If
		PossibleCombinations = z_combinations
	End Property
	
	%REM
		Function randomelement
		Description: return a randomly selected element of a string array.
	%END REM
	Private Function randomelement(x) As String
		If UBound(x) = 0 Then randomelement = x(0) Else randomelement = x(Fix(Rnd * (1+UBound(x))))
	End Function
	
	%REM
		Function GetName
		Description: make up a name that's random but not necessarily unique.
	%END REM
	Function GetName As String
		On Error GoTo oops
		If Not z_inited Then UseDefaults
		Dim i%
		GetName = randomelement(z_nameparts(0))
		For i = 1 To UBound(z_nameparts)
			GetName = GetName & z_spacer & Randomelement(z_nameparts(i))
		Next
		Exit Function
oops:
		Error Err, Error & { //} & TypeName(Me) & {.} & GetThreadInfo(1) & {:} & Erl & (Erl-Getthreadinfo(0))
	End Function
	
	%REM
		Function GetUnique
		Description: Retrieve a unique name
		Arguments:
			tries: the number of randomly generated names to try before giving up finding a unique one.
		Returns:
			the name, or "" if it wasn't possible to find one that's unique
	%END REM
	Function GetUnique(ByVal tries As Integer) As String
		Dim tmp$
		Do
			tmp = GetName
			If Not IsElement(z_used(tmp)) Then
				GetUnique = tmp
				z_used(tmp) = 1
				z_count = z_count + 1
				Exit Function ' stop at the first unused name instead of using up every try
			End If
			tries = tries - 1
		Loop While tries > 0
	End Function
	
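	%REM
		Function GetUniqueName
		Description: Retrieve a unique name, appending a hexadecimal suffix if 20 plain tries all collide.
			Raises ERR_PACKED if the suffixed names are exhausted too.
	%END REM
	Function GetUniqueName As String
		On Error GoTo oops
		Dim i%, tmp$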
		GetUniqueName = GetUnique(20)
		If GetUniqueName <> "" Then Exit Function
		For i = 1 To 10000
			tmp = GetName & Hex(i) ' add a number to make it more likely to be unique
			If Not IsElement(z_used(tmp)) Then
				z_used(tmp) = 1
				z_count = z_count + 1
				GetUniqueName = tmp
				Exit Function
			End If
		Next
		Error ERR_PACKED, MSG_PACKED
		Exit Function
oops:
		Error Err, Error & { //} & TypeName(Me) & {.} & GetThreadInfo(1) & {:} & Erl & (Erl-Getthreadinfo(0))
	End Function
	
	%REM
		Function MaskName
		Description: Make up a random name to replace a name we're given, and remember in case we're asked to mask the same name again.
	%END REM
	Function MaskName(ByVal orig$) As String
		On Error GoTo oops
		If IsElement(Substitutions(orig)) Then
			MaskName = Substitutions(orig)
		Else
			If Repeatable Then
				Randomize fletcher32(orig)
			End If
			MaskName = getUniqueName
			Substitutions(orig) = MaskName
		End If
		Exit Function
oops:
		Error Err, Error & { //} & TypeName(Me) & {.} & GetThreadInfo(1) & {:} & Erl & (Erl-Getthreadinfo(0))		
	End Function
	
	%REM
		Function Fletcher32
		Description: Compute a position-dependent checksum or hash code of a string of unicode text.
			Fletcher is a common checksum algorithm, adapted here into LotusScript and treating each
			character as a word.
	%END REM
	Private Function Fletcher32(ByVal strdat$) As Long
		Dim ind%, limit As Long, sum1 As Long, sum2 As Long, pos As Long, tlen%
		limit = Len(strdat)
		sum1 = &hffff
		sum2 = &hffff
		While limit > 0
			If limit > 359 Then tlen = 359 Else tlen = limit
			limit = limit - tlen
			For ind = 1 To tlen
				sum1 = sum1 + Uni(Mid$(strdat, pos+ind, 1))
				sum2 = sum2 + sum1
			Next
			pos = pos + tlen ' advance past the block just processed
			sum1 = (sum1 And &hffff&) + (sum1 \ &h10000)
			sum2 = (sum2 And &hffff&) + (sum2 \ &h10000)
		Wend
		Fletcher32 = CLng("&h" & Right(Hex(sum2), 4) & String(4, {0})) Or sum1
	End Function
End Class

Sample usage

Here’s the code to generate a list of twenty names such as in the above image:

Use "RandomNameGenerator"
Const NEWLINE = {
}
Sub Initialize
	Dim ans$, rg As New RandomNameGenerator, i%
	For i = 1 To 20
		ans = ans & NEWLINE & rg.getUniqueName()
	Next
	MsgBox Mid$(ans, 2), 0, "Possible: " & rg.PossibleCombinations
End Sub
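
The anonymizing function from the specification can be tried out the same way. The following is a minimal sketch, assuming the library is saved under the name RandomNameGenerator as above; the input names and the variable names (masker, realNames, report) are hypothetical.

```lotusscript
Use "RandomNameGenerator"

Sub Initialize
	Dim masker As New RandomNameGenerator
	Dim realNames As Variant, i%, report$
	' Hypothetical input; the third entry repeats the first.
	realNames = Split("George Smith;Ann Lee;George Smith", ";")
	For i = 0 To UBound(realNames)
		report = report & realNames(i) & " -> " & _
		masker.MaskName(CStr(realNames(i))) & Chr$(10)
	Next
	' Both "George Smith" lines show the same replacement, because MaskName
	' remembers its earlier answers in the Substitutions list.
	MsgBox report, 0, "Masked names"
End Sub
```

Setting masker.Repeatable = True before the loop should make the replacements come out the same from one run to the next, at the cost noted in the specification.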
