9  Collections of values

Prerequisites

Before reading this chapter, you are recommended to have read Chapter 5.

While we’ve seen how to perform many computations so far, the amount of time and effort it takes to get them done is often more than if we’d just sat down and worked them out by hand. This is because computers (and Julia in particular) are not very well suited to doing a single complicated calculation, as the instructions we’d have to give it would be as long as the calculation itself. What they’re much better at is anything that boils down to doing the same simple thing repeatedly, which we can express in much shorter terms.

Note

This is an example of Kolmogorov complexity, which defines the complexity of an algorithm as the length of the shortest possible program that could implement it. In general, we’d consider shorter to be better, as it should take us less time to program.

This may sound quite limiting, but in fact it isn’t really, in fact it often serves as more of a benefit for streamlining some or all of the problem we’re trying to solve rather than a detriment where such repetition is infeasible. We’ve already met some of the tools needed to achieve this, namely loops from Chapter 5 and custom functions from Chapter 6. We’ll now look at how we can store this data, into several different types that we’ll term collections, and some tricks to operating on our data that are better optimised than the loops and functions that we may write. In particular, we’ll look at some of the subtypes of AbstractArray, all of which perform this purpose.

Figure 9.1: The subtypes of AbstractArray

We’ll also demonstrate our first example of computing something that would be slow and difficult to do by hand, by looking at Julia sets.

9.1 Tuples

The first type we’ll look at doesn’t actually appear on this diagram due to the way it is implemented, although it behaves similarly to many that do, and this is the Tuple. A Tuple brings several values together under one variable name, each of which can be individually retrieved if needed. We do this by separating the values with commas, and enclosing the whole collection in parentheses:

sthings = ("sassy", 6, Char(128013), typeof("s"))
("sassy", 6, '🐍', String)
tthings = ("tatty", 2, Char(128047), typeof(sthings))
("tatty", 2, '🐯', Tuple{String, Int64, Char, DataType})

We query a particular element of a Tuple by following the variable name (or indeed a Tuple literal) with square brackets enclosing the position of the element we want as a number (the index of the element):

(2, 4, 6, 8)[3]
6
Caution

In many programming languages, tuples or similar structures are indexed starting at 0 (so the equivalent of x[0] would return the first element of x). Julia takes the other approach, and starts indexing from 1 (so x[1] gives the first element), which tends to be more natural for beginners anyway.

Multiple elements can be returned at once, by indexing with some of the types we’ll meet later, such as using the Vector of Int64s [1, 4] to get the first and fourth elements, or the UnitRange 1:4 to get the first four elements.

sthings[[2,4]]
(6, String)
tthings[1:3]
("tatty", 2, '🐯')

The index : returns all elements, while end returns the last one (and end-1 the second last, etc.).

sthings[:]
("sassy", 6, '🐍', String)
tthings[end-2]
2

Also, there are functions that will return elements of a Tuple, such as first and last, which do exactly as you’d expect:

first(sthings)
"sassy"
last(sthings)
String

The function length gives the number of elements in a Tuple:

length(sthings)
4

Additionally, we can search through a Tuple with functions like findall, findmin, findnext. We’ll demonstrate findall, which applys a function to each element of the Tuple, and returns a Vector of indices where the function returns true:

findall(x -> x isa String, tthings)
1-element Vector{Int64}:
 1

If the elements all have the same type, other function may be used. For example, a Tuple consisting solely of Bools can be used with the functions any and all, which act much like || and && respectively:

any((4 isa Integer, "4" isa Integer, :four isa Integer))
true
all((4 isa Integer, "4" isa Integer, :four isa Integer))
false

Many other such functions exist, and all of these functions, as well as the means of indexing, will work on any other ordered collection such as those we describe below.

9.2 Arrays

An Array is a collection of many values of a given type (or subtypes thereof). These can be arranged all in a line (like a Tuple), or indexed in multiple dimensions, such as a two-dimensional table, or a three-dimensional set of coordinates in 3D space. Array is a parametric type, with parameters of the type of the elements, and the number of dimensions. For example, an Array could exist with type Array{Int64, 2}, which would mean that it consists of Int64s arranged in two dimensions.

Two “types” that don’t appear in the diagram above are Vector and Matrix, which are actually just aliases for varieties of Array: Vector{T} is a one-dimensional Array of type T, and Matrix{T} is a two-dimensional Array of type T. Most Arrays that we define will end up being one of these, unless more dimensions are specifically necessary.

9.2.1 Defining and indexing Arrays

To define a Vector (i.e. a one-dimensional Array), we seperate the elements we want with either commas or semi-colons:

numbers = [1, 5, 3, 8, -6]
5-element Vector{Int64}:
  1
  5
  3
  8
 -6
countries = ["India"; "Sweden"; "Brazil"; "Ghana"]
4-element Vector{String}:
 "India"
 "Sweden"
 "Brazil"
 "Ghana"

The type of a Vector can be specified by preceding the square brackets with the type name. This can construct Vectors with zero length, such as:

v = Int64[]
Int64[]

Vectors can be indexed just like Tuples, but using the same syntax, we can also change a particular element, treating it as we would a variable:

numbers[5] = 6
numbers
5-element Vector{Int64}:
 1
 5
 3
 8
 6
Note

The reason that we can do this is that, unlike Tuples, Arrays are mutable, allowing part of the data to be changed without having to rewrite the whole lot. For reasons of efficiency, this is not true for most types in Julia. Mutable types have some other different behaviour from normal immutable types, but we won’t discuss it here.

To define a Matrix, we can write its columns out as we would Vectors, and then concatenate them horizontally with spaces inside a new set of square brackets:

morenumbers = [[1, 2, 3] [-1, -2, -3]]
3×2 Matrix{Int64}:
 1  -1
 2  -2
 3  -3

Alternatively, we can use double semi-colons to mark a new column:

morecountries = ["Peru"; "Colombia";; "Egypt"; "Zambia"]
2×2 Matrix{String}:
 "Peru"      "Egypt"
 "Colombia"  "Zambia"

Finally, we may use a combination of spaces and line breaks to construct a matrix, with spaces separating elements in the same row, and line breaks separating rows from each other:

I = [ 1 0 0 
      0 1 0 
      0 0 1 ]
3×3 Matrix{Int64}:
 1  0  0
 0  1  0
 0  0  1

To index elements of a Matrix, we use two numbers, the first for the row number, the second for the column number, separated by a comma:

morecountries[2,1]
"Colombia"

It is also possible to use single numbers as indices for a Matrix, the order being to go down the columns first (so-called column-major ordering). This can be seen through the LinearIndices type, which gives each the numerical index of each element of the Matrix used in its construction:

LinearIndices(I)
3×3 LinearIndices{2, Tuple{Base.OneTo{Int64}, Base.OneTo{Int64}}}:
 1  4  7
 2  5  8
 3  6  9

Similar principles apply to higher dimensions, in particular the use of semi-colons to denote dimension number to concatenate along, and the means of indexing the elements.

An Array specific relative of length is the size function, which returns a Tuple of the number of elements along each dimension of the Array (column-major ordering meaning that the number of elements in each column comes first):

size(morenumbers)
(3, 2)

9.2.2 Construction of arrays by functions

Another way we can construct an Array is by the use of a function which returns an Array. To get a Vector of type T and length n, we can use the Vector{T} constructor, with arguments undef (which defines an uninitialised Array, with arbitrary values that we’ll need to replace later) and the desired length n:

Vector{String}(undef, 3)
3-element Vector{String}:
 #undef
 #undef
 #undef

We can do the same with a Matrix, specifying two dimensions, or a general Array, specifying as many dimensions as we need:

Matrix{ComplexF64}(undef, 2, 4)
2×4 Matrix{ComplexF64}:
 1.5e-323+6.90729e-310im  2.0e-323+6.90729e-310im  …  2.5e-323+6.90729e-310im
 1.5e-323+6.90729e-310im  2.0e-323+6.90729e-310im     1.0e-323+6.90729e-310im
Array{Bool}(undef, 2, 2, 2)
2×2×2 Array{Bool, 3}:
[:, :, 1] =
 0  0
 0  0

[:, :, 2] =
 0  0
 0  0

We can also initialise Arrays with specified values, with functions such as fill, zeros, ones:

fill("ostrich", 2, 2)
2×2 Matrix{String}:
 "ostrich"  "ostrich"
 "ostrich"  "ostrich"
zeros(Rational{Int64}, 3)
3-element Vector{Rational{Int64}}:
 0//1
 0//1
 0//1

The rand function fills an Array of the given dimensions with (as best it can) uniformly random Float64s between 0 and 1:

rand(4,3)
4×3 Matrix{Float64}:
 0.481067  0.236753  0.70008
 0.124266  0.478081  0.493695
 0.647206  0.130313  0.519266
 0.252547  0.686053  0.521782

It can also randomly choose from a collection:

rand(1:100, 6)
6-element Vector{Int64}:
 25
 72
 26
 70
 39
 13

The collect function allows an Array to be created with the values of another sort of collection. We can see this demonstrated with a UnitRange and a String, which we discuss more later:

collect(1:5)
5-element Vector{Int64}:
 1
 2
 3
 4
 5
collect("mongoose")
8-element Vector{Char}:
 'm': ASCII/Unicode U+006D (category Ll: Letter, lowercase)
 'o': ASCII/Unicode U+006F (category Ll: Letter, lowercase)
 'n': ASCII/Unicode U+006E (category Ll: Letter, lowercase)
 'g': ASCII/Unicode U+0067 (category Ll: Letter, lowercase)
 'o': ASCII/Unicode U+006F (category Ll: Letter, lowercase)
 'o': ASCII/Unicode U+006F (category Ll: Letter, lowercase)
 's': ASCII/Unicode U+0073 (category Ll: Letter, lowercase)
 'e': ASCII/Unicode U+0065 (category Ll: Letter, lowercase)

9.2.3 Adding and removing elements from a Vector

The mutability of Vectors allows another operation that Tuples can’t do, which is to add onto and take away elements from Vectors that we’ve already got. Let’s start with an empty Vector{Int64};

v = Int64[]
Int64[]

The append! function add a given element or elements to the end of a Vector:

append!(v, 6)
1-element Vector{Int64}:
 6
append!(v, [4, 2])
3-element Vector{Int64}:
 6
 4
 2

The prepend! function does the same at the start:

prepend!(v, 3)
4-element Vector{Int64}:
 3
 6
 4
 2

The functions push! and pushfirst! work similarly to append! and prepend! respectively. To insert an element at a specific index, we can use insert!:

insert!(v, 4, -1) # First number is the index, second is the value to insert
5-element Vector{Int64}:
  3
  6
  4
 -1
  2
Convention

Mutating functions, that is functions on mutable inputs which mutate one of the inputs (usually the first), are named with an ! at the end. For example, the functions above mutate the input v, changing the value of the variable and not just calculating an output.

To remove elements, we can use pop!. This gets returns the last element of the Vector, but also gets rid of it:

pop!(v)
2
v
4-element Vector{Int64}:
  3
  6
  4
 -1

Other elements can be removed by popfirst! and popat!:

popat!(v, 3)
4
v
3-element Vector{Int64}:
  3
  6
 -1

The function splice! combines popat! and insert!, allowing us to remove an element and replace it with one or more new ones at the same time:

splice!(v, 2, [2, 1, 0])
6
v
5-element Vector{Int64}:
  3
  2
  1
  0
 -1

For higher dimensional Arrays, such as a Matrix, this is not as simple (for example you can’t just remove the last element). In general, for reasons of cleanness of code and efficiency, Arrays should be initialised as the size that they need to be to begin with, having their values filled in later, but sometimes you won’t be able to do this, in which case the mutating functions are your best option.

9.2.4 Array comprehension

There’s a third way of filling Arrays with even more control over the initial values, which is array comprehension. This acts much like a for loop, indeed it uses much of the same syntax, filling an Array with some function of the elements in a collection. Instead of writing the elements of the Array between the square brackets, we write a rule for generating them:

[i^2 for i  1:10]
10-element Vector{Int64}:
   1
   4
   9
  16
  25
  36
  49
  64
  81
 100
[first(c) for c  countries]
4-element Vector{Char}:
 'I': ASCII/Unicode U+0049 (category Lu: Letter, uppercase)
 'S': ASCII/Unicode U+0053 (category Lu: Letter, uppercase)
 'B': ASCII/Unicode U+0042 (category Lu: Letter, uppercase)
 'G': ASCII/Unicode U+0047 (category Lu: Letter, uppercase)

These give Vectors, as the for loop iterates over one-dimensional collections. We can get a two-dimensional Matrix either by iterating over two one-dimensional collections:

[i*j for i  1:10, j  1:10]
10×10 Matrix{Int64}:
  1   2   3   4   5   6   7   8   9   10
  2   4   6   8  10  12  14  16  18   20
  3   6   9  12  15  18  21  24  27   30
  4   8  12  16  20  24  28  32  36   40
  5  10  15  20  25  30  35  40  45   50
  6  12  18  24  30  36  42  48  54   60
  7  14  21  28  35  42  49  56  63   70
  8  16  24  32  40  48  56  64  72   80
  9  18  27  36  45  54  63  72  81   90
 10  20  30  40  50  60  70  80  90  100

or by iterating over one two-dimensional collection, such as another matrix:

[first(c) for c  morecountries]
2×2 Matrix{Char}:
 'P'  'E'
 'C'  'Z'

Following this with if inside the square brackets allows us to skip elements which don’t meet a condition:

[i for i  1:10 if iseven(i) && i < 7]
3-element Vector{Int64}:
 2
 4
 6

9.3 Other ordered collection types

9.3.1 Ranges

We’ve already met a type of range in Chapter 5, which used the syntax 1:10 to represent the numbers from 1 to 10 (or similar). This notation extends further:

  • a:b means the numbers a, a+1, a+2, continuing to add 1 until we would exceed b by adding 1 more
  • a:d:b means the same, but the gap between successive numbers is now d instead of 1

It is also possible to specify ranges using the range function, which allows four keyword parameters to be set, start, stop, step, and length. Only three of these are necessary (indeed you get an error if you try to specify all four), and they allow for options of evenly spaced ranges, including using Complex numbers:

range(start = 1, step = 1im, length = 5)
1 + 0im:0 + 1im:1 + 4im
Warning

Despite how this is displayed, 1 + 0im:0 + 1im:1 + 4im will not work if you try to type it out, because it tries to call range with the start, stop, and step keywords. For Complex values, where a notion of < doesn’t exist, having a stopping point is difficult to work with, so length should be specified instead of stop.

The exact type of the range returned depends on the values you start with, but any parametric subtypes of UnitRange, StepRange, or StepRangeLen are possible.

typeof(1:6)
UnitRange{Int64}
typeof(1:2:7)
StepRange{Int64, Int64}
typeof(range(start = 1, step = 1im, length = 5))
StepRangeLen{Complex{Int64}, Int64, Complex{Int64}, Int64}

The remaining subtype of AbstractRange is LinRange, which is a performance optimised but error prone alternative to specifying a range with start, stop, and length.

9.3.2 CartesianIndices

A CartesianIndex is a type that acts much like a Tuple, but is specifically meant as an index for an Array.

ci = CartesianIndex(1,2)
CartesianIndex(1, 2)

Indeed, we can use a CartesianIndex as an index:

A = [ 1 3
      2 4 ]

A[ci]
3
A[1,2]
3

Despite being Tuple-like, CartesianIndex is not considered a collection of items (there’s a specific error if you try to index one of them). However, a collection of all possible CartesianIndex indices of an Array does exist as its own type, CartesianIndices, which is a subtype of AbstractArray, and so does have the expected indexing behaviour. We can either give it an Array:

indicesA = CartesianIndices(A)
CartesianIndices((2, 2))
collect(indicesA)
2×2 Matrix{CartesianIndex{2}}:
 CartesianIndex(1, 1)  CartesianIndex(1, 2)
 CartesianIndex(2, 1)  CartesianIndex(2, 2)

or a Tuple of ranges, to form the grid of possible indices:

rangeindices = CartesianIndices((1:2:5, 3:-1:0))
CartesianIndices((1:2:5, 3:-1:0))
collect(rangeindices)
3×4 Matrix{CartesianIndex{2}}:
 CartesianIndex(1, 3)  CartesianIndex(1, 2)  …  CartesianIndex(1, 0)
 CartesianIndex(3, 3)  CartesianIndex(3, 2)     CartesianIndex(3, 0)
 CartesianIndex(5, 3)  CartesianIndex(5, 2)     CartesianIndex(5, 0)

9.3.3 Strings

Strings are again not a subtype of AbstractArray, and for good reason, they perform a wildly different function most of the time (see Chapter 4), and don’t share behaviour like broadcasting that we’ll see later. However, it is worth mentioning them, as they exhibit similar indexing properties as we’ve seen before, and in this way can sometimes behave as if they were a Tuple consisting of Chars:

"daffodil"[3:4]
"ff"
length("daffodil")
8

If we want to use Strings like Tuples or Vectors of Chars, it may be more useful to convert them into that form. Luckily, these are both easily done, using the constructor Tuple or the function collect respectively:

Tuple("daffodil")
('d', 'a', 'f', 'f', 'o', 'd', 'i', 'l')
collect("daffodil")
8-element Vector{Char}:
 'd': ASCII/Unicode U+0064 (category Ll: Letter, lowercase)
 'a': ASCII/Unicode U+0061 (category Ll: Letter, lowercase)
 'f': ASCII/Unicode U+0066 (category Ll: Letter, lowercase)
 'f': ASCII/Unicode U+0066 (category Ll: Letter, lowercase)
 'o': ASCII/Unicode U+006F (category Ll: Letter, lowercase)
 'd': ASCII/Unicode U+0064 (category Ll: Letter, lowercase)
 'i': ASCII/Unicode U+0069 (category Ll: Letter, lowercase)
 'l': ASCII/Unicode U+006C (category Ll: Letter, lowercase)

One reason you may want to do this is that indexing of Strings doesn’t quite work like you might expect (due to the way that they are stored, as we investigated in Chapter 4). Instead of the first character having index 1, and the second having index 2, it’s the first byte that has index 1, and the second that has index 2. As there are more possible Chars than one byte can distinguish, some require multiple bytes to store (due to the way String is implemented, this happens whenever the numeric value of the Char is at least 128), in which case the indices used for the letters are not always simply 1, 2, 3. For example, "αa" is a three byte string, where 'α' takes up bytes 1 and 2, and 'a' takes up byte 3.

When we index a String, we want to use the index of the first byte of the character we care about. Any other index will give a StringIndexError. We can demonstrate this in the case of "αa":

"αa"[1]
'α': Unicode U+03B1 (category Ll: Letter, lowercase)
"αa"[2]
ERROR: StringIndexError: invalid index [2], valid nearby indices [1]=>'α', [3]=>'a'
"αa"[3]
'a': ASCII/Unicode U+0061 (category Ll: Letter, lowercase)

To avoid these complications, you can either turn the String into a Tuple, or you can use an iterator such as eachindex which contains all the valid indices:

eachindex("αa")
Base.EachStringIndex{String}("αa")
collect(eachindex("αa"))
2-element Vector{Int64}:
 1
 3

9.4 Unordered collection types

All of the above types we’ve encountered are ordered, in the sense that it’s meaningful to ask what the first element is, what the second is, etc. However, sometimes we don’t even need this information, and Julia provides types that ignore this for the sake of efficiency.

9.4.1 Set

A set is a collection of values, much like a Vector or a Tuple, but without an order, and without duplicates. We can construct a set by giving our collection of values in one of the other formats we’ve seen above:

numbersset = Set(1:6)
Set{Int64} with 6 elements:
  5
  4
  6
  2
  3
  1
magicset = Set("abracadabra")
Set{Char} with 5 elements:
  'a'
  'c'
  'd'
  'r'
  'b'

Although these have an order of sorts when displayed, this is merely for the purpose of displaying them, and indexing doesn’t work.

numbersset[1]
ERROR: MethodError: no method matching getindex(::Set{Int64}, ::Int64)

This doesn’t make them useless, however. Instead, they serve a different purpose, being optimised for searching and set operations. Searching can be done with the in function, returning true or false depending on whether the queried value is found in the Set:

in(2, numbersset)
true

The in function can also be written inline, or replaced by its alias (written by tab-completing \in):

7 in numbersset
false
'b'  magicset
true

Set operations are also defined, such as union/ (A ∪ B is the Set containing all elements that appear in A or B, or both), intersect/ (A ∩ B is the Set containing all elements that appear in A and in B), and setdiff (setdiff(A,B) is the Set containing all elements of A that don’t appear in B).

numbersset  Set([10, 11, 12])
Set{Int64} with 9 elements:
  5
  4
  6
  2
  11
  10
  12
  3
  1
magicset  Set("abcdef")
Set{Char} with 4 elements:
  'a'
  'c'
  'd'
  'b'
setdiff(magicset, "abcdef")
Set{Char} with 1 element:
  'r'

These functions also work for ordered collections such as Tuples or Vectors, but on larger scales are quicker with Sets.

9.4.2 Dict

A Dict (short for dictionary) is a little like a Tuple or a Vector, but the indices can be whatever we choose. This again means that there isn’t necessarily any order to a Dict, since there isn’t necessarily an obvious order to the indices. There are several ways to initialise a Dict, but we’ll start with an empty one, Dict(). Before we run this, we may wish to choose the types that will be used for the keys (that is, the indices), and the values (the data which we store for each key). These go in curly brackets like a parametric type (indeed, that’s what they are!), for example below, we have chosen Chars as keys, and Int64s as values:

frequencies = Dict{Char, Int64}()
Dict{Char, Int64}()

Alternatively, we can give a list of initial values, as a Vector of Tuples, with each tuple containing a key followed by the desired value:

capitals = Dict([("France", "Paris"), ("Nigeria", "Abuja"), ("Chile", "Santiago")])
Dict{String, String} with 3 entries:
  "Nigeria" => "Abuja"
  "Chile"   => "Santiago"
  "France"  => "Paris"

Finally, we can do the same with a sequence of Pairs. Pair is a type storing exactly two values, written with => between them, and they can be used to construct a Dict by putting them as inputs into the Dict constructor, each following the format key => value:

shades = Dict(
    "red" => ["scarlet", "blood", "ruby"],
    "green" => ["lime", "forest", "british racing"],
    "blue" => ["sky", "ocean", "navy"]
)
Dict{String, Vector{String}} with 3 entries:
  "blue"  => ["sky", "ocean", "navy"]
  "green" => ["lime", "forest", "british racing"]
  "red"   => ["scarlet", "blood", "ruby"]

To add values to a Dict, we can use the indexing syntax that we’ve seen before. If the key doesn’t exist, we’ll get an error when trying to query a value, but setting a value works for any key, and will either add or overwrite that value into the Dict.

lipsum = "Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum."

for c  lipsum
    frequencies[c] = haskey(frequencies, c) ? frequencies[c] + 1 : 1
end

frequencies
Dict{Char, Int64} with 28 entries:
  'n' => 24
  'f' => 3
  'd' => 18
  'E' => 1
  'e' => 37
  'o' => 29
  'D' => 1
  'h' => 1
  's' => 18
  'i' => 42
  'r' => 22
  't' => 32
  ',' => 4
  'q' => 5
  ' ' => 68
  'a' => 29
  'c' => 16
  'p' => 11
  'm' => 17
  '.' => 4
  'U' => 1
  'L' => 1
  'g' => 3
  'v' => 3
  'u' => 28
  ⋮   => ⋮

We can list out the keys and the values, although as before, the order will be meaningless:

keys(capitals)
KeySet for a Dict{String, String} with 3 entries. Keys:
  "Nigeria"
  "Chile"
  "France"
values(capitals)
ValueIterator for a Dict{String, String} with 3 entries. Values:
  "Abuja"
  "Santiago"
  "Paris"

We can also delete keys (and the values that go with them) with pop! or delete!. These have the same effect on the Dict, but produce different outputs, pop! returns the value corresponding to the deleted key, while delete! returns the new Dict:

pop!(shades, "green")
3-element Vector{String}:
 "lime"
 "forest"
 "british racing"
delete!(shades, "blue")
Dict{String, Vector{String}} with 1 entry:
  "red" => ["scarlet", "blood", "ruby"]

Dicts are an efficient way to store values under meaningful names without having to define a large number of variables. For this reason, one of their most common usages is as a lookup table, where a common calculation can be precomputed for the possible values that it could take, and then its answer looked up instead of doing the calculation all over again every time. For example, instead of counting the number of times a particular character appears in the text lipsum, we can look up the count in frequencies:

frequencies['s']
18
count(==('s'), lipsum) # ==('s') returns the anonymous function x -> x == 's'
18

9.5 Interaction with functions

These types are already quite useful in storing data in their own different ways, but where they really come in handy is using them in combination with functions. One example of this is that the standard linear algebra rules for multiplying matrices by scalars, vectors, or other matrices, work seamlessly with the usual operations like *, /, ^, and the Vector and Matrix types:

A = [ 2 3 
      5 1 ]

2A
2×2 Matrix{Int64}:
  4  6
 10  2
x = [-1,4]

A*x
2-element Vector{Int64}:
 10
 -1

More widely applicable are the ., ..., and ... operations (those last two are different!) that allow for functions to act on collections, and collections to be input into functions. These are named broadcasting, splatting, and slurping.

9.5.1 Broadcasting

When introducing the many different types above, we’ve seen some examples of functions which take collections as an input. However, often we don’t want to apply a function to the collection as a whole, instead we want to apply it to each element individually. This is achieved by broadcasting a function.

The broadcast function provides the first way of doing this. Its first argument is the function that we want to broadcast, and the rest of the inputs will serve as the arguments to that function. For example, compare the following:

A * A
2×2 Matrix{Int64}:
 19   9
 15  16
B = broadcast(*, A, A) # B[i,j] = A[i,j] * A[i,j]
2×2 Matrix{Int64}:
  4  9
 25  1

Broadcasting will break if the sizes of the inputs don’t match. However, if the sizes do match, but some dimensions are missing, Julia will smartly account for the missing dimensions and index the inputs accordingly:

C = broadcast(*, A, x) # C[i,j] = A[i,j] * x[i]
2×2 Matrix{Int64}:
 -2  -3
 20   4
D = broadcast(^, A, 2) # D[i,j] = A[i,j] ^ 2
2×2 Matrix{Int64}:
  4  9
 25  1

There are two further ways of broadcasting, with shorter syntax. If a function has a name (i.e. isn’t anonymous), following its name with a dot . has the effect of broadcasting:

sinpi.(0:0.2:2)
11-element Vector{Float64}:
  0.0
  0.5877852522924731
  0.9510565162951536
  0.9510565162951536
  0.587785252292473
  0.0
 -0.587785252292473
 -0.9510565162951535
 -0.9510565162951535
 -0.587785252292473
  0.0

Most infix operators support a similar syntax, with the . coming before the operator this time:

A .* A
2×2 Matrix{Int64}:
  4  9
 25  1

Also, we can start the line with @. (which calls the macro @.), and this broadcasts all of the functions in that line, without us having to specify each individually:

@. sin(2^(-0.5:0.2:0.5) + 0.4)
6-element Vector{Float64}:
 0.8944084355923738
 0.9364087671697013
 0.9718672058520753
 0.99510124105
 0.9981796067983775
 0.9705200191672874
Convention

Note that while array comprehension can achieve the same outcome, broadcasting tends to be more efficient. For example, the @. broadcast above could be equally calculated as the following:

[sin(2^t + 0.4) for t  -0.5:0.2:0.5]
6-element Vector{Float64}:
 0.8944084355923738
 0.9364087671697013
 0.9718672058520753
 0.99510124105
 0.9981796067983775
 0.9705200191672874

but this is noticeably slower for operations on larger collections.

9.5.2 Splatting and slurping

Prerequisites

For this section, you should know how to write a custom function (Chapter 6).

Another way we may want to apply a function to an collection is to have the elements serve as individual arguments to the same function call. We do this by splatting, effectively emptying the elements in the order that they are indexed as arguments.

Using the vector x = [-1, 4] above, we can splat this into the - function to subtract the second value from the first.

-(x...)
-5

This isn’t particularly useful at the moment, although it could be situationally handy. Splatting tends to be more useful for certain functions that take an arbitrary number of inputs, such as +:

+(A...)
11

How do we write a function that has arbitrarily many inputs then? The answer is the visually identical but conceptually distinct syntax of slurping. If the last argument in the function definition is followed by ..., then it will collect together any further inputs into the function into a Tuple with that variable name.

inputcounter(args...) = length(args)
inputcounter (generic function with 1 method)
inputcounter(4, 12, "Friday", :cos)
4

We can see how splatting and slurping can work together, first emptying the values into the function call, and then collecting them back up into a Tuple:

inputcounter(A)
1
inputcounter(A...)
4

9.6 Example: Julia sets

Since we’re working in Julia, it would be remiss not to use it to generate its namesake mathematical objects: Julia sets (technically filled Julia sets). These are fractal sets, composed of the points where iterating some function again and again does not lead you off towards infinity. They are related to the more famous Mandelbrot set, but their name happens to be shared with a programming language instead of a type of biscuit. They will also serve as a useful way to demonstrate some of the Array operations seen above. Before we start programming, we’ll look at the definition, so we know what to draw.

9.6.1 Mathematical definition of the Julia set

First, we need a function to iterate. We will use the same functions as used by the Mandelbrot set, which have one complex parameter \(c\), and are defined by: \[ F_c(z) = z^2 + c \]

The (filled) Julia set with parameter \(c\) is then the set of points in the complex plane where repeated application of \(F_c\) is bounded (i.e. the magnitude of the answers doesn’t go off to infinity): \[ \mathcal{J}(c) = \left\{ z \in \mathbb{C} : \left| (F_c)^n(z) \right| \not\to \infty \text{ as } n \to \infty \right\} \]

Here, the power of \(n\) on the function \(F_c\) means applying that function repeatedly \(n\) times (so \(F_c(F_c(\dots(F_c(z))))\)).

As an example, we’ll consider \(\mathcal{J}(0)\). The function \(F_0\) is simply \(F_0(z) = z^2\), so: \[ (F_0)^n(z) = (((z^2)^2)\dots)^2 = z^{2^n} \]

Therefore, \(\mathcal{J}(0)\) is the set of points where \(z^{2^n}\) is bounded as \(n \to \infty\), which is the unit disc, so \(\mathcal{J}(0) = \{ z \in \mathbb{C} : |z| \leqslant 1 \}\).

To demonstrate how close this is to the Mandelbrot set, here is its definition: \[ \mathcal{M}(c) = \left\{ c \in \mathbb{C} : \left| (F_c)^n(0) \right| \not\to \infty \text{ as } n \to \infty \right\} \]

or equivalently: \[ \mathcal{M}(c) = \left\{ c \in \mathbb{C} : 0 \in \mathcal{J}(c) \right\} \]

9.6.2 Drawing Julia sets

Prerequisites

For this section, you should know how to write a custom function (Chapter 6) and use a package (Chapter 8).

It’s all well and good to have this definition, but as we saw in Chapter 3, translating mathematics to code is not straightforward, and in this case, we run into the same issues of having to represent infinite concepts finitely. In particular, there are two approximations that we have to make:

  • We can’t take a limit as \(n \to \infty\) as we would like. Instead, we’ll set a limit of the number of times we iterate, and assume that if \((F_c)^n(z)\) doesn’t get too large in that many iterations, it never will. If this limit on \(n\) is large enough, this will be a decent approximation

  • We can’t check whether every point in the complex plane is or isn’t in the Julia set (or even a finite area within it). Instead, we’ll choose ranges of \(x\) and \(y\) values, which will give us a lattice of points \(z = x + yi\) that we can check individually

For this purpose, we’ll define two parameters to balance accuracy with speed of computation. n will be the maximum number of applications of our function before we consider it not to have diverged, and h will be the distance between successive x or y-values, and so controls how fine the lattice will be.

n = 1000
h = 0.005
0.005

Why do we define these values outside of our functions, instead of just hard-coding the values where they are needed in the function code? Because it’s much easier to tweak and change them to get the outputs that we’re looking for, all we need to do is redefine a variable, or change the inputs to a function call, rather than rewrite and recompile a function.

The iteration function is as simple as a for loop, using z = z^2 + c to have the effect of applying the function repeatedly. We stop the loop and return a result early if the absolute value of z is ever 2 or more (we can prove that when abs(c) < 2, if abs(z) is ever greater than 2, it will never be less than 2 again and will diverge to infinity). The returned value of the function could just be true or false, but we’re actually returning the number of the iteration if it breaks early, which we’ll use to make a more interesting visualisation.

function iterateF(z, c, n)
    for i  1:n
        z = z^2 + c
        abs(z) < 2 || return i
    end
    return 0
end
iterateF (generic function with 1 method)

Here, we’ve included n as an input, instead of using the variable n that we’ve defined, even though the value of n will end up being given as that input. This is good practice, as it’s easier to change the inputs to a function than redefine a variable and then rerun a function. Also, if this function was in a different package or module, the variable n may not be in scope, so it wouldn’t work anyway.

Convention

As a rule of thumb, the only variables that should be referred to within a function are the inputs, those defined in the function, and any const values that will be in scope.

Naturally, we want a visual output for the Julia set, and there are multiple ways of doing this, although all will require the use of one or more packages. The most elegant solution uses the Colors package, which includes the RGB type to represent colours in the familiar RGB format. The package displays RGB types not with text, but as a small image of that colour, and moreover a Matrix (i.e. a 2-dimensional Array) of RGBs is displayed as an image with pixels of those colours arranged as they are in the Matrix. As with all packages (see Chapter 8), this will first need to be installed using Pkg, then added to our toolkit with using.

using Colors

Now all we need is a function which can output an image. We begin by setting up a Matrix of values for z₀ (the initial values for z), which we’ll call z₀s. The Matrix is constructed by array comprehension, using a range of x and y values between -2 and 2 (as that is where the Julia set lies) graduated by the parameter h as described above. The ordering of the indexing sets needs to be carefully chosen so that the top left corner is -2.0 + 2.0im, the bottom left is -2.0 - 2.0im etc.

z₀s = [x₀ + y₀*im for y₀  2:-h:-2, x₀  -2:h:2]

We then want to run the function iterateF on each of these, which is simple enough to do by broadcasting using the . syntax. Since the other inputs c and n are not Arrays, they will be the same for every time the function is run with a different value of z.

iterations = iterateF.(z₀s, c, n)

This gives us a new Matrix of values, which we want to interpret and turn into colours. The usual way of doing this for Julia sets is to colour the inside of the set in black, and colour the outside of the set as a gradient, with the colour depending on how long it took to get to abs(z) > 2 and for the iteration to stop. Hence, we chose the output to iterateF that we did – the more iterations needed, the higher the output value, but if we run out of iterations, we return 0 to delineate a sharp boundary.

To define the gradient, we know the smallest value in the Matrix iterations will be 0 (which will be the black of the inside of the Julia set), but we need to know the largest value to pin the other end of the gradient to. The max function will do this, but it takes an arbitrary number of individual numeric inputs, not a single Array as we currently have, so we need to splat iterations to empty its values out into the max function. Since it is conceptually similar to the variable n, we’ll overwrite that and call this new value n as well (although there’s no real need to).

n = max(iterations...)

The RGB constructor we’ll use takes three floating-point inputs between 0.0 and 1.0 as proportions of the maximum amount of red, green, and blue present in the colour. By dividing each entry of iterations by n, we’ll get such a number, and raising these to various powers will change the scale on which the intermediate values lie, allowing us to give the gradient a coloured tint. For instance, sending the entry i to RGB((i/n)^0.5, (i/n)^1, (i/n)^0.5) gives a purple hue, as we’ll see in the output below. To apply this to the whole Matrix, we’ll using broadcasting again, this time with the broadcast function and an anonymous function to encode the operation.

broadcast(i -> RGB((i/n)^0.5, (i/n)^1, (i/n)^0.5), iterations)

Putting this all together, we have the function juliaset:

function juliaset(c, n, h)

    z₀s = [x₀ + y₀*im for y₀  2:-h:-2, x₀  -2:h:2]

    iterations = iterateF.(z₀s, c, n)

    n = max(iterations...)
    return broadcast(i -> RGB((i/n)^0.5, (i/n)^1, (i/n)^0.5), iterations)
    
end
juliaset (generic function with 1 method)
Note

Since the function juliaset always returns the same calculated value, the return keyword is redundant here, as functions will always return the last calculated value if no return is specified. It’s included only for clarity.

Before we run this, we’ll need another package, called ImageShow. While Colors allows for a Matrix of RGBs to be displayed as an image, it is limited to smaller matrices than we’re using (our value of h being H = 0.005 results in an 801 by 801 Matrix). If you were to run this, it suggests installing and using the ImageShow package to help with this, which is what we will do.

using ImageShow
Warning

In some environments (such as Pluto), the ImageIO package may be required as well. This can be added simply by installing through Pkg as usual, and then running:

using ImageIO

Now, we can run the function, and see what the Julia set looks like for some different values of c:

juliaset(0.3531 - 0.3165im, n, h)
juliaset(-0.2252 + 0.7049im, n, h)
juliaset(-0.2359 - 0.9173im, n, h)

9.6.3 Drawing the Mandelbrot set

We noted earlier how similar the mathematical definitions for the filled Julia set and the Mandelbrot set are, and indeed the same is true to generate images of them. The only change we need to make is to start with a Matrix of values of c, not z, and change the inputs of iterateF accordingly. We also change the ranges of x and y values to better frame the Mandelbrot set, and tweak the colour function for a different shade:

function mandelbrotset(n, h)

    cs = [x + y*im for y  1.5:-h:-1.5, x  -2.5:h:1.5]

    iterations = iterateF.(0, cs, n)

    n = max(iterations...)
    colours = broadcast(i -> RGB((i/n)^0.3, (i/n)^0.7, (i/n)^1), iterations)
    
end
mandelbrotset (generic function with 1 method)
mandelbrotset(n, h)

You could, of course, go further with this. Here are some ideas that you may wish to try:

  • The Julia set has two-fold symmetry, since \(z\) and \(-z\) have the same result whenever they are input (as the first operation is squaring). Hence, to make the code (approximately) twice as efficient, you need only compute half of the diagram, and rotate it to get the other half.

  • Both the Julia set and the Mandelbrot set are known for their fractal nature, where zooming in to an area of their boundary gives similar complexity. Therefore, you could change the functions to accept different ranges of x and y values so that you could examine parts of the sets in closer detail.

  • You could change the colouring to have a more colourful gradient than we’ve done here, as can be seen in many images online. All that would be required for this would be a more complicated anonymous function (or perhaps you would prefer to write an actual function for this purpose).