The making of Australia's Pioneers, Heroes and Fools

ISBN 9781741960488, written by Peter Macinnis, published November 1, 2007 by Pier 9 in Australia

Cover, Australia's Pioneers, Heroes and Fools This book owes a lot to a lot of people. Most of them gave their time selflessly to contribute to Australian scholarship. I used what they did to make a book, but I believe it is worth acknowledging and explaining what I did. This explanation is for people comfortable with macros and jumping text from Word to Excel and back again. If this is not you, don't go past the fourth paragraph!

In the first place, I owe thanks to the people who collected and collated a large number of historical records which were printed in limited editions, a few of which I found in libraries. Then there were the volunteers who transcribed some of the mss, saving me from having to puzzle through unfamiliar handwriting.

Most of all, though, I owe a debt of gratitude to the people at Project Gutenberg Australia. These lovely people have scanned and proofed most of the journals of the explorers, and that was my starting point for a database. I added text from other sources, but my major source was Project Gutenberg Australia.

If I describe what I did with their pieces, that effectively covers what I did with the mss and other sources as well, so read this with a bit of imagination.

First, I downloaded the text files, then I trimmed the Project Gutenberg end matter and pasted the rest into Microsoft Word. I have a standard macro which takes all double carriage returns and converts them to &&##, then it converts all single carriage returns to a space, before converting &&## to a carriage return. That fixed almost all of the paragraphs, but next, the macro replaced any double carriage returns with singles (3 times) and replaced double spaces with singles (5 times).

This left me with fairly neat text, which I was ready to work on. My aim was to select paragraphs to go into a database. I faced a major problem, in that spelling often differed: a crocodile in one account might be am alligator in another, or even an alegator! So I decided to add keyword codes to the entries.

I also needed to attach a date, the name of the explorer and other details, and I knew that with some 5300 entries, mistakes would creep in, unless I let the machine do the hack work. I needed to be able to sort by day of month, month and year, so these were separate fields.

I will let you in on a secret: I have been living with repetitive strain injury since 1981, and I maintain a gruelling writing schedule. I survive because I know how to make the machine do the work. I decided that rather than enter into Excel by pasting, I would do the initial pasting in Word, so I set up a couple of macros to add tab-delimited fields to a paragraph. Later, these fields would distribute into different columns.

I created a paragraph header like this: 21{TAB}July{TAB}1847{TAB}John{TAB}Smith{TAB} -- in this case, the first entry in Smith's journal was 21 July 1847.

I copied this, and then set up a macro that added that on the front or a paragraph, highlighted the paragraph and changed the text to blue (to make it stand out -- my colour vision is deficient, and blue is better than red) and then added a string of tabs at the end. At the right point for the keyword codes, I added a single backslash -- I will come back to this later.

Then I started to go through the journal. If I chose a paragraph, I just ran the macro: if the passage I wanted was less than a paragraph, I selected the portion I wanted, made it into a paragraph, and ran the macro.

If there was more than one selection from a given date, I had a field, just after date, where I manually added a sequential number -- in the sorting that would follow, I wanted to make sure paras could always be returned to their order.

When the date rolled over, I pasted the old header, changed the date, and cut it to the clipboard.

When I had been right through the journal, I sorted the paragraphs. Then I cut the ones at the top, all the ones starting with dates, and pasted them to a new file before closing the old file (I did not want to save THOSE changes!).

Now I was ready to add keywords, but I did not need massive amounts of typing and occasional typos, so I used mnemonic codes. \ta was transport animals, \tac and \tah were transport using camels and transport using horses and so on. So now I could add labels like \dac for dangerous animals crocodiles or \das which was snakes, and so on. Now I went to the leading backslash and added the appropriate codes. There were no spaces anywhere in the codes.

Now I was ready to paste the results into an Excel scratchpad where the dates and sequential numbers were converted to a unique value that I could sort on to get the entries back in order. Then I could add the journal name in the top row and copy that down, and add a unique record accession number before pasting back into Word as unformatted text.

There usually seemed to be a few extra clean-ups needed to remove extraneous spaces, then I was ready to add the data to the database. While I sometimes ported the results into Access, for most purposes, a flatfile database in Excel was all I needed, and I made that the native format.

I repeated this for each volume, and then I was ready to convert the keyword codes, which were in a separate Excel worksheet. The codes were in column A, the meaning was in column B.

Creating a macro to convert 200 codes was a bit hairy. First, I had to sort the codes in Excel, according to length: it was essential to convert \tahf before \tah before \ta -- think about it! Now macros in Word (yes, Word -- bear with me!) are large and sloppy. Trial and error revealed that macros converting more than 50 codes fell over. I also knew that macro conversion is tedious, so I wrote a macro to convert just three codes and broke into it.

Once you have some repetition, parsing the code is easy. I identified the whole sequence, then using the =CONCATENATE command in Excel, I put the entire code for a conversion in column C. There were a few problems like line breaks and quote marks, but I used stray symbols for those. Then I copied all the conversion codes into Word, replaced the stray symbols with the necessary line breaks and quote marks. Now I was ready: I selected and copied the first 50 codes, opened my macro with the editor and pasted over the codes.

I now opened a new file, and carefully copied the key codes column and pasted it into that file, ran a search and destroy to remove ANY spaces and ran the macro. Then I went back to the codes file, moved the second fifty codes into the macro, went to the other file and ran the macro, and so on. At the end, I ran a check for any outstanding back-slashes. These were the invalid codes, and each was then converted manually, once I worked out what it should be.

Next, I took the file of converted codes, and pasted them back into Excel. Now I had the tool I needed, ready to group the events of a given day, or to find the instances of dangerous insects, or whatever. Everything was in Excel, and could be manipulated into other formats as needed.

And that's enough for one night. It ought to be -- it took me two and a half years!

But I came back later and added some more information about macros that you can find here.

And now for something completely different

Something about my methods for handling bibliographies.

bibliography (365K) The figure on the right shows how I set my bibliographies out in a spreadsheet. The important thing to notice is Column G, where all of the information stored in the other columns is pulled together by an =CONCATENATE command.

There are three of these: one for books, one for journals and one for web resources, and they read as follows:

Book resources 9from G2)

=CONCATENATE(B2&", "&C2&". "&D2&": "&E2&", "&F2&".")

Journals and periodicals (from G425)

=CONCATENATE(B425&", '"&C425&"'. "&E425&", "&F425&".")

Web resources (from G433)

=CONCATENATE(B433&", "&C433&". "&D433&", "&F433&", last viewed "&E433&".")

From there, you are on your own. Note that Excel is very unforgiving about the syntax in these strings. Extra spaces, missing brackets, forgotten ampersands will all make it baulk.

Sp why do I bother, when there is brilliant software that does this sort of thing? It wasn't around when I started, and because I always use the same format and often come back to the same references in other works, it suits me to be able to port stuff across, rather than use Endnotes or whatever. That's just me.

One of the good things about this is that I can assign priorities in another column, sort by that or by the "seen" field, and within a given library, sort by call number, making shelf searches easier. A file name for a PDF or a GIF is my short-hand that the file in question is on my computer. There is also a comments field for me to note the references which proved to be absolutely useless.


This file is http://members.ozemail.com.au/~macinnis/writing/pioneers2.htm

It was created on October 31, 2007 and last revised on June 30, 2011.

The home page of this set is here.


Since I started this site, it has drawn visitors.