6 Creating custom structures

Prerequisites

Before reading this chapter, we recommend that you read Chapter 5.

In Chapter 2, we already met variables, functions, and types at a basic level. We’ll now look at them in a little more detail, in particular focusing on creating our own custom functions and types to morph Julia’s behaviour to our needs. We’ll also endeavour to better understand variables, primarily their scope.

6.1 Custom functions

6.1.1 Defining a new function

Functions allow preset algorithms to be written and referred to by a name, much like variables allow a value to be referred to by name instead of remembering it. Julia has plenty of inbuilt functions, for all sorts of tasks, and the ecosystem of packages (see Chapter 8) only adds to that. However, these functions are written to serve as the ingredients, not the end product, and most of the time they aren’t specialised enough to do what we want to do in one simple call. Instead, we need to build up these ingredients into a new function of our own.

Note

Technically, what we’re writing here are not functions, but methods for functions. Functions are the objects with the names, and are what we access when we call them, but each function can have many methods, which dictate the code that will actually run. Methods don’t have names as such; instead, they’re distinguished by different patterns of inputs. Choosing which method to run is the point of multiple dispatch, which we’ll meet in Chapter 7. For instance, range is a function with four methods:

methods(range)

# 4 methods for generic function "range" from Base:
 [1] range(; start, stop, length, step)
     @ range.jl:147
 [2] range(start; stop, length, step)
     @ range.jl:142
 [3] range(start, stop; length, step)
     @ range.jl:144
 [4] range(start, stop, length::Integer)
     @ range.jl:145

For now, we’ll use the words function and method mostly interchangeably.

First, we need to give the function a name (the same as we would a variable), as well as give variable names to the inputs to the function that we expect. To tell Julia that we want a function, we use the keyword function, followed by the name that we want to give it. Then, we list the inputs (also called arguments) that we want, giving each of them a name to be referred to later. If there is more than one input, then they need to be separated by commas, in the form of a Tuple, just as multiple inputs to a function do when we call it.

function spherevolume(radius)

Convention

In Julia, function names are usually written in lowercase with no spaces. They should also be named to not conflict with existing functions, so as not to overwrite them. We can, however, add onto existing functions, as we’ll see in Chapter 7.

Next comes the code, doing whatever calculations we need it to do on the inputs. This works much the same as code anywhere else, so we won’t really focus on it here. Indeed, for some later examples, you may not recognise all the syntax being used, but you can feel free to ignore it if so. All that’s important is that something fills in this gap.

    volume = 4/3 * π * radius^3

Convention

This code is indented once compared to the level of the function keyword, for readability purposes to clearly show which code belongs to the function.

Finally, we want to return our answer, which will be what we get back when the function is run. Using the return keyword before a value tells the function to return that as the answer (this is particularly useful when combined with conditionals from Chapter 5), but if return is never specified, the function will default to returning the last value it calculated. We then finish off the block with end.

    return volume
end

Note

In this case, the volume is the last thing calculated, so we don’t need return. However, it’s clearer what’s going on if we use it to begin with.

Putting this all together, we have a function:

function spherevolume(radius)
    volume = 4/3 * π * radius^3
    return volume
end

spherevolume (generic function with 1 method)

We can use this just as we would any other function:

spherevolume(1)

4.1887902047863905

spherevolume(0.781592641796772)

1.9999999999999996
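To illustrate the earlier remark that return pairs well with conditionals, here is a minimal sketch. The name safesqrt is hypothetical, chosen only for this illustration. The first return ends the function immediately, so the final line only runs for non-negative inputs.

```julia
# Hypothetical example: `safesqrt` is an illustrative name, not an inbuilt function
function safesqrt(x)
    if x < 0
        # Leaves the function straight away; nothing below this runs
        return nothing
    end
    return sqrt(x)
end

safesqrt(9.0)   # 3.0
safesqrt(-1.0)  # nothing
```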

Alternatively, functions can be written using more algebraic syntax. We do away with the function-end block, instead writing in one line how to calculate the output from the inputs:

cubevolume(length) = length^3

cubevolume (generic function with 1 method)

cubevolume(1)

1

cubevolume(1.2599210498948732)

2.0

Convention

For longer functions, combining this syntax with a begin-end block can give the same effect as using function-end. However, the latter is preferable in most cases; the algebraic-style syntax is intended as convenient shorthand for the normal function-end syntax for quick functions.
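As a rough sketch of what this convention describes, the following combines the algebraic syntax with a begin-end block. The function cylindervolume and its variable names are hypothetical, made up for this illustration.

```julia
# Hypothetical example: an algebraic-style definition spread over several lines
cylindervolume(radius, height) = begin
    basearea = π * radius^2
    basearea * height   # the last value calculated is what gets returned
end

cylindervolume(1, 2)    # approximately 6.283
```

This behaves the same as the equivalent function-end block, which is why the latter is usually the clearer choice for anything this long.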

It’s possible to make functions that return multiple values, simply by following return with a list of comma-separated values. For example, we can mimic the inbuilt function minmax, which given two inputs, returns the smallest followed by the largest:

function minmaxagain(x, y)
    if x > y
        return y, x
    else
        return x, y
    end
end

minmaxagain (generic function with 1 method)

minmaxagain(4,5)

(4, 5)

minmaxagain(5,4)

(4, 5)

If we provide one variable as an output, it will take the value of the Tuple containing all outputs:

x = minmaxagain(5,4)

(4, 5)


However, if we provide two, the first will get the first value, and the second the second:

y, z = minmaxagain(5,4)

(4, 5)
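To see what each variable ends up holding, here is a small sketch using the inbuilt minmax. The variable names are our own choice for illustration.

```julia
# `minmax` is the inbuilt function that `minmaxagain` mimics
smallest, largest = minmax(5, 4)

smallest  # 4
largest   # 5

# The same comma syntax lets us swap two variables in one line
a, b = 1, 2
a, b = b, a   # now a is 2 and b is 1
```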

Using certain symbols as function names (particularly those in the Symbol, Math category in Unicode) allows them to be used automatically as infix operations, as we are used to with the likes of +, -, *, etc.:

±(x, y) = (x + y, x - y)

± (generic function with 1 method)

0.8 ± 0.03

(0.8300000000000001, 0.77)

6.1.2 Prescribing the inputs

Not all functions should accept all inputs; in fact, there are very few that should! What’s more, we may want the function to do different things depending on the inputs, via multiple dispatch. To remedy this, at least partially, Julia provides type declarations, allowing us to prescribe what types are allowed for specific inputs.

To declare the type of a specific argument, we follow the variable name we’ve given it by two colons :: and the type name:

function spherevolume(radius::Float64)

Often, there may be many types which would work as the input. If there is a natural supertype that encompasses all of these, we can use that, even if it is an abstract type:

function spherevolume(radius::Real)

More unusual possibilities can be realised with the Union type, which takes other types as parameters, and acts as a supertype for all of them:

# Can't use this for `Bool`, as `true + one(true)` gives `2`, which isn't a `Bool`
function nextinteger(x::Union{Signed,Unsigned})
    x + one(x)
end

nextinteger (generic function with 1 method)

Here, the use of Union is justified, since we do the same thing for Signed and Unsigned inputs, and we can’t use Integer because we want to exclude Bool. However, if the algorithm of your function changes significantly depending on the exact types it gets as inputs, then multiple methods for the function with each of those cases covered should be written. To see this, and more complicated type declarations, see Chapter 7.
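A quick sketch of how this declaration behaves in practice, repeating the definition so that the example stands alone. The variable rejected is only there to capture whether the call with a Bool fails.

```julia
function nextinteger(x::Union{Signed,Unsigned})
    x + one(x)
end

nextinteger(41)      # 42
nextinteger(0x0f)    # 0x10, still a UInt8

# `Bool` is an `Integer`, but neither `Signed` nor `Unsigned`, so no method matches
rejected = try
    nextinteger(true)
    false
catch err
    err isa MethodError
end

rejected             # true
```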

What we can’t do in this way is restrict to only certain values within a type. For example, we can’t declare that the radius in spherevolume needs to be positive, because multiple dispatch sees only types, not values. This will require a check in the body of the function, for example:

radius < 0 && throw(DomainError(radius, "sphere cannot have a negative radius."))
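Placed in context, such a check might look like the following sketch. The name checkedspherevolume is hypothetical, used so that we don’t overwrite spherevolume from earlier, and the wording of the error message is our own.

```julia
# Hypothetical name, chosen so as not to overwrite `spherevolume`
function checkedspherevolume(radius::Real)
    # Type declarations can't express "not negative", so we test the value ourselves
    radius < 0 && throw(DomainError(radius, "sphere cannot have a negative radius."))
    return 4/3 * π * radius^3
end

checkedspherevolume(1)     # 4.1887902047863905
# checkedspherevolume(-1)  # would throw a DomainError
```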

Another way we may wish to prescribe the inputs is by giving them default values, making specifying them optional. The following function cuts a String s after the nth character, if it is long enough. Here, it’s combined with type specification, but it needn’t be.

function cutstring(s::String, n::Int64 = 1)
    length(s) < n ? s : s[1:n]   
end

cutstring (generic function with 2 methods)

cutstring("hydrogen", 6)

"hydrog"

cutstring("boron", 6)

"boron"

cutstring("gold")

"g"

Note

When we defined cutstring, we can see that it defined a function with 2 methods, instead of 1 method like the other functions we’ve defined. This is because, in the background, it’s defined a second method:

cutstring(s::String) = cutstring(s, 1)

with only one input, corresponding to what happens if you don’t give n a value.
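We can check this method count for ourselves. The sketch below uses a hypothetical function greet with one optional argument, and counts its methods with length(methods(...)).

```julia
# Hypothetical example function with one optional argument
greet(name, greeting = "Hello") = "$greeting, $name!"

greet("Ada")               # "Hello, Ada!"
greet("Ada", "Goodbye")    # "Goodbye, Ada!"

# One method for each number of positional arguments we can call it with
length(methods(greet))     # 2
```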

Functions with lots of inputs can be clunky to call, as you need to remember exactly what order the inputs need to go in. Instead, we can write functions with keyword arguments, which differ from normal positional arguments in that you specify them by name, not by order.

The volume of a cone of radius \(r\) and height \(h\) is given by \(\frac{1}{3} \pi r^2 h\). However, writing a function for this, it’s not clear what order radius and height should go. We could make a choice, but for the sake of example, let’s make them keyword arguments, so that someone using the function can’t get them the wrong way around. Arguments are usually separated by commas, but after a semi-colon, all arguments become keyword arguments:

# No positional arguments, since the semi-colon is before all arguments
conevolume(; radius, height) = 1/3 * π * radius^2 * height

conevolume (generic function with 1 method)

If we try to call the function as normal, we’ll get an error:

conevolume(1, 1)

ERROR: MethodError: no method matching conevolume(::Int64, ::Int64)

Instead, we need to specify which argument is which by naming them:

conevolume(radius = 1, height = 1)

1.0471975511965976

Since they have names, the order is now irrelevant:

conevolume(height = 1, radius = 1)

1.0471975511965976

Just as with positional arguments, keyword arguments can have types declared or default values given. Functions can also have a combination of positional and keyword arguments:

function friedeggs(eggs; people = 2)
    println("This recipe serves $(people) people.")
    println("You will need $(eggs * people) eggs.")
    println("To make fried eggs, simply crack the eggs into a pan and wait.")
end

friedeggs (generic function with 1 method)

friedeggs(5)

This recipe serves 2 people.
You will need 10 eggs.
To make fried eggs, simply crack the eggs into a pan and wait.

friedeggs(2; people = 12)

This recipe serves 12 people.
You will need 24 eggs.
To make fried eggs, simply crack the eggs into a pan and wait.
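For a sketch that also declares types on keyword arguments, consider the following. The function boxvolume and its argument names are hypothetical.

```julia
# Hypothetical example: one positional argument, two typed keyword arguments with defaults
function boxvolume(depth::Real; width::Real = 1, height::Real = 1)
    return depth * width * height
end

boxvolume(2)                          # 2
boxvolume(2; height = 3, width = 4)   # 24
```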

6.1.3 Anonymous functions

We’ve already seen two different syntaxes for defining functions, but in fact, there’s a third:

f = x -> x^2 - 3x + 2

#15 (generic function with 1 method)

This function behaves just as others do:

f(1)

0

f(10)

72

However, it’s a little different. This is no longer a function called f, it’s a variable called f whose value is a function. Using the methods function to list the methods, we get:

methods(f)

# 1 method for anonymous function #15:

(::var"#15#16")(x) in Main at In[31]:1

As we can see here, f is what is called an anonymous function, which is a fitting description as they don’t have a name like normal functions do. Anonymous functions don’t participate in multiple dispatch, and so can only have one method, but can be used more easily as a variable. Their main use is for functions that themselves take functions as arguments, such as minimum, which finds the minimum value that a function takes on a given set of inputs:

minimum(f, 0:0.01:3)

-0.25

Indeed, we don’t even need to give an anonymous function a variable name, and we can enter it as a literal:

minimum(x -> x^2 + 4x - 3, -3:0.01:-1)

-7.0
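A few more inbuilt functions that take a function as their first argument are sketched below, each given an anonymous function as a literal. The variable names are our own.

```julia
# Squares each element; the anonymous function is never given a name
squares = map(x -> x^2, 1:4)             # [1, 4, 9, 16]

# Keeps only the elements for which the anonymous function returns `true`
evens = filter(x -> iseven(x), 1:10)     # [2, 4, 6, 8, 10]

# With more than one input, the inputs are wrapped in parentheses
total = reduce((x, y) -> x + y, 1:4)     # 10
```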

6.1.4 Piping and composing functions

A common occurrence in programming with functions is the need to apply several functions one after the other to the same value. The problem is, this can lead to a mess of parentheses. Let’s say we have a variable called capital:

capital = "Antananarivo"

"Antananarivo"

We want to do two calculations on it. First, we want to count the number of unique letters (ignoring upper and lower case), which we can do as follows:

length(unique!(sort!(collect(lowercase(capital)))))

7

Also, we want to find if the last appearance of the letter 'a' is an even number of letters from the end:

iseven(findfirst('a', reverse(lowercase(capital))))

false

Both of these are a bit of an eyesore. Julia provides two ways of helping with this, each with their own benefits. The first is piping, which uses the |> operator to successively apply functions to the output of the previous step:

capital |> lowercase |> collect |> sort! |> unique! |> length

7

This can also include anonymous functions as arguments, which can be useful when we need to add an input to one of the functions along the way:

capital |> lowercase |> reverse |> (x -> findfirst('a', x)) |> iseven

false

This is much cleaner, as it better separates the many function applications into a readable format. The second option we have is composition, which is done with the ∘ operator (typed by tab-completing \circ):

(length ∘ unique! ∘ sort! ∘ collect ∘ lowercase)(capital)

7

Under the hood, ∘ creates a special type of function called a ComposedFunction, which works just like a normal function, but calls each of its component parts one at a time in the prescribed order. This means we can give it a variable name, and use it as we would a normal function:

contrivedfunction = (iseven ∘ (x -> findfirst('a', x)) ∘ reverse ∘ lowercase)

iseven ∘ var"#21#22"() ∘ reverse ∘ lowercase

typeof(contrivedfunction)

ComposedFunction{ComposedFunction{ComposedFunction{typeof(iseven), var"#21#22"}, typeof(reverse)}, typeof(lowercase)}

contrivedfunction(capital)

false
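Piping and composition can also be mixed, since a composed function is just another function to pipe into. The sketch below uses a hypothetical name countletters, and swaps in the non-mutating unique, which works directly on the characters of a String, in place of the collect, sort!, unique! chain.

```julia
capital = "Antananarivo"

# `unique` iterates over the characters of the string, keeping the first of each
countletters = length ∘ unique ∘ lowercase

countletters(capital)        # 7
capital |> countletters      # 7, a composed function can be one step of a pipe
```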

Note

While we aren’t thinking of them as such here, all functions are actually just a special type of variable. Specifically, that special type is a subtype of Function, named typeof([function-name]), with the property that preceding a Tuple with a function’s name starts multiple dispatch on the methods stored under that name. They are also const values (see later), so cannot be redefined as any other type.

6.2 Custom types

6.2.1 Types of types

Types are all-important in letting Julia know how to deal with the data we give it. As we’ve seen above, they are invaluable in determining what values a method of a function can accept, and Chapter 3 gives an idea of why we might choose a particular format to store our data in (e.g. Float64 for speed, Rational{BigInt} for accuracy). Much as we can add to Julia’s many functions with our own, we can do the same with types. First though, let’s cover the different flavours of types which Julia affords us.

The most basic types are primitive types, where the data is stored primitively, meaning it is just zeroes and ones in memory (of course, everything is just zeroes and ones eventually, but most types have intermediate structure). For instance, Int64 is a primitive type, where any data in this format is 64 consecutive bits of memory that exactly correspond to the value we mean by it.

Most commonly, types are composite types, with various fields, each with a given value. We’ll see what this actually means when we come to define one, but an example for now is ErrorException, which is the type you get when using the function error (we’ll see more of Exception types in Chapter 12).

e = ErrorException("an error")

ErrorException("an error")

# The throw function causes an error object to turn into an error event
throw(e)

ERROR: an error

It has a single field msg of type AbstractString, which stores the error message to be displayed. These can be listed by the function fieldnames applied to a type, or propertynames applied to an instance of that type:

fieldnames(ErrorException)

(:msg,)

propertynames(e)

(:msg,)

To access the fields, we use a . followed by the name of the field:

e.msg

"an error"

This can also tell us what types the fields have, by getting the types field of ErrorException:

ErrorException.types

svec(AbstractString)
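The same information is available through inbuilt functions, which may read more clearly than reaching into the types field. This sketch assumes a version of Julia recent enough to include fieldtypes (1.1 or later).

```julia
e = ErrorException("an error")

fieldtypes(ErrorException)        # (AbstractString,)
fieldtype(ErrorException, :msg)   # AbstractString
getfield(e, :msg)                 # "an error", the same as `e.msg`
```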

Related to the composite type is the parametric type, which behaves much the same, but it also requires one or more other types in curly braces after its name to specify exactly what data it represents. An example of this from Chapter 3 is Rational, which requires a type parameter to tell us what format the numerator and denominator are in:

typeof(4//5)

Rational{Int64}

typeof(BigInt(4)//BigInt(5))

Rational{BigInt}

Parameters are usually other types, but don’t need to be. For example, Array (see Chapter 9) has two parameters T and N, which define the type of the elements T, and the number of dimensions N. The first of these parameters is a type, but the second is an Int64 value. The possibilities for what can be a parameter are:

Any type may be used as a parameter
Any value of a primitive type may be used as a parameter
Any value of a bits type may be used as a parameter. A bits type is one whose data is stored entirely in primitive types, for example Rational{Int64}, as this has two fields num and den both of the primitive type Int64. To check if a value can be used as a parameter, you can use the isbits function:

isbits(4//5)

true

isbits(BigInt(4)//BigInt(5))

false
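To make these kinds of parameter concrete, here is a small sketch. The type Tagged is hypothetical, with a single parameter P and no fields, so that we can try out different parameters.

```julia
matrix = [1 2; 3 4]
typeof(matrix) == Array{Int, 2}   # true: the second parameter is the value 2

# Hypothetical type with a single parameter and no fields
struct Tagged{P} end

Tagged{Float64}()     # a type as the parameter
Tagged{3}()           # a value of a primitive type as the parameter
Tagged{4//5}()        # a value of a bits type as the parameter
```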

However, there may be reasons why we wouldn’t allow every possible parameter. For instance, Rational needs to ensure that its parameter is an Integer, and we’ll enforce this same constraint in an example below when considering inner constructors.

A mutable type can be either composite or parametric, but it has the property that its fields can be edited after it is instantiated. The data corresponding to a variable with a mutable type is stored not as values, but as pointers which serve as addresses to where the values are stored, and therefore these values can be changed without changing any of the data defining the variable. The prototypical example of a mutable type is Array (which is also parametric), as described in Chapter 9.

If a composite type or a parametric type has no fields, and is immutable, then there is no way to distinguish any one instance of that type from any other. This makes it a singleton type, examples of which include function types such as typeof(sin), and Nothing which represents the absence of a value.

typeof(Nothing())

Nothing

fieldnames(Nothing)

()
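We can see this indistinguishability using ===, which tests whether two values are identical. The type Marker below is hypothetical.

```julia
# Hypothetical type with no fields; immutable because we used plain `struct`
struct Marker end

Marker() === Marker()     # true: every instance is the very same value
nothing === Nothing()     # true: `nothing` is the only value of type `Nothing`
```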

Finally, an abstract type cannot take a value, but instead serves as a label collecting together many related types, allowing them to be referred to as a collective (such as for the purposes of multiple dispatch). Again, we return to Chapter 3 and numeric types for an example of this, with Number encompassing all possible numeric types, Real excluding complex numbers, Integer excluding fractional numbers, and so on.

Some inbuilt types don’t fit nicely into these categories, particularly those more fundamental to the inner workings of Julia. For example, String isn’t primitive, but still doesn’t have any fields; it’s mutable, but you can’t change its characters. However, any types that we define will follow the usual rules.

6.2.2 Defining a new type

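To define a new composite type, we use the keyword struct, followed by the name we want to give the type. Then, on the following lines, we list the fields that the type should have, each followed by :: and the type of that field, before finishing the block with end: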

struct Animal
    name::String
    symbol::Char
    legs::Int64
end

Convention

Types are named in upper camel-case, meaning every word is capitalised, with no spaces separating words. For examples, consider types that we have already met, like String, BigInt, and Function.

We can instantiate a variable of this type by using the name of the type like a function, with the arguments being the values we want to give to the fields in the same order that we defined them:

elephant = Animal("Elephant", '🐘', 4)

Animal("Elephant", '🐘', 4)

flamingo = Animal("Flamingo", '🦩', 2)

Animal("Flamingo", '🦩', 2)

We use dot syntax to query the value of one of the fields, following the variable with a ., and then the name of the field we want to know:

elephant.symbol

'🐘': Unicode U+1F418 (category So: Symbol, other)

If these were mutable types, we could use the same syntax followed by an = to change their values. However, Animal is immutable, so we get an error:

flamingo.legs = 1

ERROR: setfield!: immutable struct of type Animal cannot be changed

It is also possible to declare where our new type belongs in the type graph, by giving it an abstract type as a parent node (called the supertype). This can be done by following the type name with <:, and then the abstract type:

struct Prime <: Integer
    p::BigInt
end

p = Prime(BigInt(5))

Prime(5)

p isa Integer

true

If no parent is specified, it defaults to Any, the abstract type at the top of the tree, which is fine for most purposes. Indeed, specifying a supertype will likely oblige you to implement methods telling Julia how to treat your type in functions common to that supertype. For instance, anything that comes under Number should be able to be added, subtracted, multiplied, divided, etc. with other Numbers.

We can also define our own parametric types, by following our type name with curly braces, inside which we can give variable names to our list of parameters.

struct Doublet{T}
    first::T
    second::T
end

Doublet{Int64}(10, 20)

Doublet{Int64}(10, 20)

We don’t need to specify the parameter T if it can be implied:

Doublet(:ten, :twenty)

Doublet{Symbol}(:ten, :twenty)

We can restrict these parameters if needs be. For example, if we needed the parameter T to be numeric, we could have instead started with:

struct Doublet{T <: Number}
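A complete sketch of such a restricted type follows. It uses the hypothetical name NumberPair so as not to clash with Doublet above, and the variable failed only records whether construction was refused.

```julia
# Hypothetical name, so as not to clash with `Doublet`
struct NumberPair{T <: Number}
    first::T
    second::T
end

NumberPair(1.5, 2.5)      # NumberPair{Float64}(1.5, 2.5)

# `Symbol` is not a subtype of `Number`, so no constructor matches
failed = try
    NumberPair(:ten, :twenty)
    false
catch err
    err isa MethodError
end

failed                    # true
```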

Note

The other varieties of types can also be defined:

If you want to create an abstract type to add to the type graph, replace struct with abstract type (note that you can’t specify any fields, since an abstract type can never take a value)
If you want to make your type mutable (see Chapter 9), replace struct with mutable struct
If you (for some reason) want to create your own primitive type, replace struct with primitive type, and add the number of bits you want your type to take up in place of fields. However, be warned that most problems are made more complicated by using primitive types instead of composite types
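Each of these three varieties is sketched below. All of the names (Shape, Circle, Byte8) are hypothetical, invented for this illustration.

```julia
# Hypothetical names throughout
abstract type Shape end            # no fields, and can never be instantiated

mutable struct Circle <: Shape     # fields can be reassigned after creation
    radius::Float64
end

c = Circle(1.0)
c.radius = 2.5                     # allowed, because `Circle` is mutable
c.radius                           # 2.5

primitive type Byte8 8 end         # 8 bits of raw data, in place of fields
```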

6.2.3 Inner constructors

For some composite and parametric types we create, it may suffice to simply specify the fields and their types. However, we may want to have more flexibility, or further restrictions, in defining instances of our new type. To do this, we’ll want to make use of constructors.

A constructor looks very much like a method of a function, and can be thought of as such, behaving the same way when called with a Tuple of inputs, and participating in multiple dispatch. What’s special about constructors, however, is that the function’s name is the type which they construct. For instance, calling a constructor for the type Rational{Int64} might look like Rational{Int64}(4, 5). In fact, we’ve already used constructors unwittingly in Chapter 3, to convert between the various numeric types in Julia, and in Chapter 5, to create exceptions.

The first type of constructor we’ll look at is the inner constructor. These are defined inside the struct block (hence inner), after the fields are listed, and their primary purpose is to impose further restrictions on the fields than simply their types. To create an inner constructor, we use the same syntax as we did when creating a function (either the function-end block or the algebraic f(x) = syntax will work, but not the anonymous function syntax as we don’t want to create an anonymous function).

Convention

The arguments of this function should be the fields, in the same order as listed before, with the same names and type declarations. While this function will be called just like any other, with the variable names strictly local to the function, it’s pointlessly more difficult to read if the field names are changed or reordered.

The body of the function will consist of whatever checks we need to do, with either the fields corrected if possible, or Exceptions thrown when the values are irretrievably wrong. However, since there doesn’t exist a function to create a new instance of our type to return (that’s what we’re writing with the inner constructor), we need to use the special function new. The new function is exclusive to inner constructors, and simply instantiates a variable of the type in question with fields as listed in the arguments. For example, a type with two fields that we wanted to have values 2 and "two" would be created by new(2, "two"). Mostly, this will be the value you want the function to return, so we can finish the function with a call to new, creating and returning our new object back to us.
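Before the larger example, here is a minimal sketch of an inner constructor that checks its arguments and then finishes with new. The type OrderedPair and its error message are hypothetical.

```julia
# Hypothetical type whose fields must satisfy `low <= high`
struct OrderedPair
    low::Int64
    high::Int64

    function OrderedPair(low::Int64, high::Int64)
        low > high && throw(ArgumentError("low must not exceed high."))
        # `new` only exists inside the struct block; it builds the instance we return
        new(low, high)
    end
end

OrderedPair(1, 2)      # OrderedPair(1, 2)
# OrderedPair(2, 1)    # would throw an ArgumentError
```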

For a familiar example, let’s create a type called Password, in which we’ll store a single field of type String to represent the password. This will be a parametric type, with a single parameter N which will be an Integer. Using values as parameters is another task for inner constructors, as we can’t do it with type declarations. We also want to impose some restrictions on the password:

It must contain at least N characters in total (so if N is zero or negative, then any length will be allowed)
It must contain at least two letters and two numbers
It must contain at least one character that isn’t a letter or a number

The tools from Chapter 5, plus some simple errors that should be understandable but we’ll cover in more detail in Chapter 12, suffice to make these checks, so we’ll use them to build our inner constructor:

struct Password{N}
    word::String

    function Password{N}(word::String) where N
        # Checks that N is an integer (`TypeError` needs the location, context, expected type, and value received)
        N isa Integer || throw(TypeError(:Password, "type parameter", Integer, N))

        # Checks length
        length(word) < N && throw(ArgumentError("password must contain at least $N characters."))
        
        # Counts the number of each character type
        letters = numbers = others = 0
        for char ∈ word
            isletter(char) ? (letters += 1) : (isnumeric(char) ? (numbers += 1) : (others += 1))
        end

        # Raises errors if any character types are not represented enough
        letters < 2 && throw(ArgumentError("password must contain at least 2 letters."))
        numbers < 2 && throw(ArgumentError("password must contain at least 2 numbers."))
        others < 1 && throw(ArgumentError("password must contain at least 1 character other than letters and numbers."))

        # If no error, creates the new `Password`
        new{N}(word)
    end
end

Note

The where keyword used here is required when defining constructors of parametric types. In this instance, we’ve only needed to use it to tell Julia to define N locally as the name of a parameter, rather than trying to substitute in the value of some variable called N. More uses of where are demonstrated in Chapter 7.

Let’s try this out:

Password{8}("great")

ERROR: ArgumentError: password must contain at least 8 characters.

Password{8}("fantastic")

ERROR: ArgumentError: password must contain at least 2 numbers.

Password{8}("fanta5t1c")

ERROR: ArgumentError: password must contain at least 1 character other than letters and numbers.

Password{8}("fanta5t1(")

Password{8}("fanta5t1(")

There is more that can be done with inner constructors, including writing functions with different names to change how the type can be constructed, or leaving mutable types with some fields uninitialised, to be added in later, but we’ll leave it there for now.

6.2.4 Outer constructors

The other type of constructor is the outer constructor, so called because it is defined outside of the struct block. Each outer constructor takes a different pattern of inputs, so that multiple dispatch knows which to call, and uses another constructor to return an instance of the desired type. This may go through a sequence of several outer constructors, but ultimately an inner constructor has to be called.

We’ll use this to add onto our Password type. Instead of coming up with our own password, let’s say that giving an Integer instead of a String as an argument to the Password{N} constructor will generate a password of that length for us:

const LETTERS = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"
const NUMBERS = "0123456789"
const SYMBOLS = "!@#\$%^&*"

using Random # for the function `randperm`

function Password{N}(n::Integer) where N
    n < 5 && throw(ArgumentError("cannot have a password of length $n."))
    word = *(
        rand(LETTERS, 2)...,
        rand(NUMBERS, 2)...,
        rand(SYMBOLS, 1)...,
        rand(LETTERS*NUMBERS*SYMBOLS, n-5)...
    )
    # This use of `randperm` shuffles the characters
    # Otherwise, we'd always start with two letters, two numbers, and one symbol
    Password{N}(word[randperm(n)])
end

Note

The using keyword we have used here is to be able to make use of a module, which is an add-on to Julia with extra functionality, in this case Random. More discussion of modules and packages can be found in Chapter 8.

Nowhere in this constructor have we:

Said that N needs to be an Integer
Said that n needs to be at least N

These are already handled by the inner constructor, which is what we call at the end when we give Password{N} a single String argument, so we don’t need to do these checks here. We’ve also built the random password to ensure that it contains the requisite characters, meaning that it will pass those tests. The only check we have done, checking that n is at least 5, is to ensure that the code within the outer constructor doesn’t break, specifically when we try to choose n-5 random items later.

We can now generate random passwords:

Password{8}(10)

Password{8}("#Q5qHf#6uO")

Password{8}(15)

Password{8}("Fkc2huo255@FOuG")

Password{8}(6)

ERROR: ArgumentError: password must contain at least 8 characters.

Outer constructors need not refer directly to inner constructors; instead, we can chain them together. For example, let’s suppose that we don’t specify any arguments to Password{N}, and we want that to generate a password of length N (the minimum allowed by Password{N}). We can do this simply by calling our first outer constructor:

Password{N}() where N = Password{N}(N)

Password{10}()

Password{10}("wIkaM7\$0a\$")

Password{15}()

Password{15}("9@L47iO1K8qLeCC")
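The same chaining works for types without parameters or inner constructors. In the sketch below, the type Pet and its default values are hypothetical. Each outer constructor fills in one more field before handing over to the next, ending at the default constructor that takes every field.

```julia
# Hypothetical type; the default constructor takes all three fields in order
struct Pet
    name::String
    symbol::Char
    legs::Int64
end

# Outer constructor: assumes four legs unless told otherwise
Pet(name::String, symbol::Char) = Pet(name, symbol, 4)

# Chained outer constructor: falls back to a generic symbol as well
Pet(name::String) = Pet(name, '🐾')

Pet("Dog").legs    # 4
```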

Convention

Inner constructors should be used minimally, only where an outer constructor could not perform the same task.

Note

Why are there two different types of constructors anyway? As similar as they might seem, the two types of constructors perform quite different roles.

Inner constructors are intended as intrinsic parts of the type, perhaps enforcing conditions upon the parameters and fields that can’t be done by type declarations as we’ve seen, although they do have some other uses particularly with mutable types. Once the type has been defined, they are set in stone, and cannot be altered or amended.

Outer constructors, meanwhile, are just like methods are to functions. They can be changed, overwritten, and new ones added at will (although doing this to inbuilt types might cause problems!). They are more limited than inner constructors, but their greater flexibility means that anything possible with an outer constructor should be done with an outer constructor.

6.2.5 Adding a display style

When the value of a new type is displayed, the default appearance looks very much like the inner constructor we use to create it, as we can see from the Password outputs earlier on. This is deliberate, automatically allowing for string interpolation, interpretation, and evaluation by Julia if needed. However, it’s not always the most useful way of showing the value, and in certain self-referential cases, can break entirely. Therefore, it’s valuable to be able to customise this look, which we can do by overloading (i.e. writing a new method for) the inbuilt function show.

Before we write a new method, we need to import the existing ones. Otherwise, we would accidentally define a brand new function called show, and Julia would no longer be able to display anything but our new type. show comes from the module Base, so we use:

import Base.show

In Chapter 7, we’ll make further use of import, while in Chapter 8 we’ll understand it a little better.

Now, we can write our own method. The best way to do this is to write a method with two inputs, one of type IO, and one of our new type that we want to display. The reason for this is that the IO argument determines where the output will go, allowing us to display our value wherever it needs to be displayed.

function show(io::IO, x::[type])

When outputting the type, there are two functions that you’ll likely want to use: print and println. Both take the IO argument first, followed by a String that you want printed. Both output this as text to wherever IO tells them to. However, println also adds a newline character '\n' (the equivalent of pressing Enter ⮠ on your keyboard) after this message, while print doesn’t, allowing you to keep adding to the same line. You can use these in combination with other string manipulations (see Chapter 4).

Unlike other functions, we’re not interested in a value being returned by show. Indeed, the last thing your method will likely do is call one of print or println, which have no output (or more accurately, their output is nothing, the value of the singleton type Nothing), so the same will be true of show. This is fine, however, as the output that we need is the printed text, which happens in the middle of the function anyway regardless of what is returned at the end.

For a quick demonstration of this, let’s change the output of our Password type. Of course, Passwords should be secret, so we don’t want to show their value to the world whenever they enquire! Instead, we’ll output "•" in place of each of the characters:

show(io::IO, x::Password) = print(io, "•"^length(x.word))

show (generic function with 380 methods)

Password{10}()

••••••••••

Password{15}()

•••••••••••••••
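The same pattern applies to any type. As a further sketch, the hypothetical Celsius type below (an assumption made for illustration, not something defined elsewhere in this chapter) builds its output from two calls to print, and uses the inbuilt function sprint to capture what show writes as a String, which shows that the IO argument really does decide where the text ends up:

```julia
import Base.show

# Hypothetical type, for illustration only
struct Celsius
    degrees::Float64
end

# Write the value to whichever output stream io refers to
function show(io::IO, t::Celsius)
    print(io, t.degrees)
    print(io, " °C")    # print adds no newline, so this continues the same line
end

# sprint calls show with an in-memory IO, and returns the text that was written
text = sprint(show, Celsius(21.5))    # "21.5 °C"

# The value returned by show itself is nothing, as the last call is to print
returned = show(devnull, Celsius(21.5))
```

Here devnull is an inbuilt output stream that discards everything written to it, so the final line displays nothing at all.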

Note

Much like functions, types are really just another type of variable. Their type is DataType, and preceding a Tuple with their name calls a constructor. Similar to functions, once defined, they cannot be redefined as anything else.

6.3 Scope of variables

Suppose, somewhere deep in the Julia codebase, someone has defined the variable x (this isn’t much of a supposition, it happens countless times!). When Julia initialises, and this code is run, the name x is, at some point, used to refer to some value. Wouldn’t it be really annoying if, because of this, no-one was allowed to use the name x ever again? Or, every time that data is given a name somewhere in the code, its value is stored in perpetuity, waiting to be overwritten? This sounds ridiculous, but if it weren’t for the system of variable scope, it would be a reality.

The scope of a variable defines where and when its name may be used to reference its value. Outside of that scope, the name may refer to a different value (a different variable that happens to have the same name in a different context), or give an error saying that it’s not defined.

The scope always starts where the variable in question is first defined, but pinning down where it ends is trickier. To illustrate this, consider the following examples:

x = 0

for i ∈ 1:2
    x = 1
    i = 1
end

x

We start by setting x = 0, and then run a for loop. First, it runs with i = 1, setting the value of x to 1, and then setting i to be 1. Then, it runs with i = 2, again setting the values of x and i to 1. The loop concludes, and we display the value of x. Unsurprisingly, perhaps, its value is 1. What about i?

ERROR: UndefVarError: `i` not defined

i doesn’t have a value at the end of this section of code, even though the loop finished with i = 1. This is because the variable i is defined inside the for loop, so its scope stops at the keyword end that marks the end of the loop.

Now let’s tweak the code slightly, changing the iterated variable from i to x:

x = 0

for x ∈ 1:2
    x = 1
    i = 1
end

x

This starts again by setting x = 0, but then the loop runs differently. This time, we begin the loop with x = 1, set the values of x and i to 1, and do the same with x = 2. Yet the final value of x is still 0, rather than 1. Why? By defining x as the iterating variable in the for loop, we’ve unwittingly created a new variable called x that belongs only within the loop (a variable that exists only within a certain block of code like this is local to that block). Therefore, the original x is entirely unaffected. Just as before, i is only defined within the for loop, so Julia won’t recognise it outside of that:

ERROR: UndefVarError: `i` not defined
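The same shadowing can be seen in a self-contained form by placing the code inside a function. This is only a sketch, and the name shadow_demo is made up for illustration:

```julia
function shadow_demo()
    x = 0
    for x in 1:2    # this x is a new variable, local to the loop
        x = 10      # reassigns the loop's x, not the one defined above
    end
    return x        # the outer x was never touched
end

shadow_demo()       # returns 0
```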

6.3.1 Code blocks

By default, every code block (that is, one of the pieces of code that begins with a codeword such as if or function, and ends with end) has its own behaviour with respect to the scope of variables defined outside/inside of it. We’ll examine the ones we’ve met so far, as well as a couple that we’ll meet later.

Note

The diagrams in this section are generated in Julia, and the code used is given in Appendix B. To read them, all you need to know is:

Each dot marks where a variable is defined or overwritten
Solid lines show where a variable is in scope
Dotted lines show where a variable is temporarily out of scope, with the variable name having been reused to define a local variable in a new block

begin-end is the simplest code block, and has perhaps the simplest scope behaviour, in that it has no effect on scope. Anything defined inside it can be accessed on the outside, and vice versa.

Figure 6.1: Scope interaction with begin-end

Similarly, if-statements have no special effect on scope. The same is true for their replacements "? :", "&&", and "||", although these aren’t really code blocks in and of themselves.

Figure 6.2: Scope interaction with if-else

However, since if-statements give branching paths, it’s possible to miss the definition of a variable, like y in the example below, so the query of the value of y later will cause an UndefVarError.

Figure 6.3: Undefined variable caused by branching paths
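In code, a missed definition of this kind might look like the following sketch, where the function name branch_demo and its argument are hypothetical:

```julia
function branch_demo(flag)
    if flag
        y = 1       # y is only defined when this branch runs
    end
    return y
end

branch_demo(true)   # returns 1
```

Calling branch_demo(false) skips the definition of y, so the final line of the function throws an UndefVarError.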

while and for blocks can reference and update any variable that was defined before they began. Anything defined for the first time within them, however, is local to the loop, and is lost as soon as the loop ends.

Figure 6.4: Scope interaction with for

A special case exists for the variable or variables that are iterated through many values by the for loop. These are considered new local variables regardless of whether they have been defined before or not.

Figure 6.5: Scope of local variable defined by for

Note here that the period when the original x is temporarily out of scope exactly corresponds to when the other x is in scope (perhaps unsurprisingly).

For a function, any variables defined within the function are local to the function, but simply referencing a variable from outside the function works fine because it remains in scope.

Figure 6.6: Scope interaction with function
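As a rough sketch of the same behaviour in code (the names offset, shifted, and total are chosen purely for illustration), a function can read a variable defined outside it, while anything it defines itself stays inside:

```julia
offset = 10                 # defined outside any function

function shifted(v)
    total = v + offset      # offset is read from outside; total is local to the function
    return total
end

result = shifted(1)         # 11
leaked = @isdefined(total)  # false, as total never existed outside the function
```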

Meanwhile, for a struct, variables defined outside can’t be referenced inside, and no variables defined inside can be referenced outside.

Figure 6.7: Scope interaction with struct

Scope works in a nested fashion, for example a variable defined inside a begin-end that itself is inside a for loop will not be accessible outside the for loop, like y below.

Figure 6.8: Scope in nested blocks

6.3.2 local and global

For more control over the scope in which your variables exist, you may wish to use the keywords local and global. These keywords go before a definition or reassignment of a variable, marking its scope explicitly:

Using local x creates a new local variable with the name x, with scope restricted to the block it is defined in. This is how the iterating variable of a for loop is defined (even though we didn’t type local), and so is why it behaves differently to any other variable in a for loop
Using global x declares that this variable name is a global variable, which means that whenever the name is used, it refers to the same variable (except where an explicitly local variable called x exists). If, in a large file, global x is written anywhere, it applies to the whole file, not just the references further down the page
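A minimal sketch of both keywords is given below. The names counter, bump, and local_demo are assumptions made for the example:

```julia
counter = 0

function bump()
    global counter += 1   # refers to the counter defined outside the function
    return counter
end

function local_demo()
    x = 1
    for k in 1:3
        local x = 10k     # a separate x that exists only inside the loop body
    end
    return x              # still 1, as the loop never touched this x
end

bump()
bump()
```

After the two calls to bump, counter has the value 2, while local_demo() returns 1.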

Convention

Both of these are situationally useful, but particularly the use of global should be avoided most of the time, as it can have unexpected effects on other bits of supposedly unrelated code. Instead, the value can be passed around between functions as an additional argument, or for unchanging values, a constant (const) variable can be used.

There are some differences in behaviour between the REPL and code run from .jl files when it comes to local and global variables. In general, the REPL allows more flexibility, while .jl code will produce warnings about clumsy use of global variables. If code is mostly contained within functions in .jl files (as is advised), no variable will be automatically global, so this isn’t a major worry.

We’ve skimmed over local and global quite quickly, because they aren’t particularly useful in basic use cases. However, this wasn’t always the case, and if you look at some older Julia code, you may see them used far more. This is because the way that scope is defined (particularly in loops) has changed since Julia’s release, and specifying local and global is no longer necessary in all circumstances. This is another place where REPL and .jl behaviours differ though, so you should pay attention to any warnings that you get.

Note

Technically, global variables are not truly global, instead their scope is the module in which they lie. Modules are a way to group related code together under a single name, and can be convenient for sharing code.

If no module is ever declared, code will run in Main, so variables will be functionally global. However, global variables from other modules would not be accessible, and any global variables you declare won’t affect identically named variables in these other modules.

For more discussion of modules, see Chapter 8.
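As a brief preview, the sketch below uses a hypothetical module called Shop to show two variables with the same name living in different modules without interfering with each other:

```julia
module Shop                 # hypothetical module, for illustration only
    rate = 0.2
    current_rate() = rate   # refers to the rate belonging to Shop
end

rate = 0.5                  # a different variable, belonging to the enclosing module (normally Main)

Shop.current_rate()         # returns 0.2
```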

6.3.3 Constants

Constants are a special type of variable, intended to have a single unchanging value which can be accessed from anywhere (i.e. a global scope). For example, you could be running a business that wants to make a 4.3% profit on anything it sells, so you might declare:

const PROFIT_MARGIN = 1.043

1.043

Then, anywhere else in your program, you could refer to PROFIT_MARGIN, instead of having to remember what its value is (provided that the variable name PROFIT_MARGIN isn’t taken by another value).

Convention

const variables are named differently from normal variables, instead using capital letters with words separated by underscores. One good reason to do this is to ensure that the constant keeps its global scope: for instance, if we called our constant p, then it wouldn’t be accessible inside a function that had an input called p.
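To see the naming problem concretely, here is a small sketch with a deliberately badly named constant. The function names with_margin and scale are hypothetical:

```julia
const p = 1.043                 # a poorly named constant

with_margin(cost) = cost * p    # no input is called p, so this uses the constant
scale(p) = 2p                   # here p is the input, which hides the constant

scale(3)                        # returns 6, with the constant playing no part
```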

To demonstrate how constants differ from normal variables, consider the following example:

N = 3

addN(x) = x + N

addN (generic function with 1 method)

addN(2)

N = 4

addN(2)

With the non-constant value N, the function addN looks up the value of N in order to add it each time it is needed. However, if we use constants:

const M = 3

addM(x) = x + M

addM (generic function with 1 method)

addM(2)

const M = 4

WARNING: redefinition of constant M. This may fail, cause incorrect answers, or produce other errors.

addM(2)

Now, the const value M is included in the function verbatim, and since we never redefined the function after that, it’s still expecting M to be the same as it was originally. Notice that we were warned of exactly this issue when we changed the value of the constant.

In fact, most of the time, trying to change the value of a const won’t give a warning at all. Instead, it will result in an error message, with the value not being changed. Earlier, we noted that functions and types are actually just const variables of a form, and you’ll note that you won’t be able to change their values:

import Core.Int64
Int64 = 12

ERROR: cannot assign a value to imported variable Core.Int64 from module Main

There are good reasons you may want to use consts, for values that you want to define programmatically but never change; some examples built into Julia are the mathematical constants π, ℯ, etc. As we’ve seen though, it’s necessary to ensure that these constants are never redefined: they really should be constant!

6.4 Example: Unit conversion

A common problem to come across is the need to convert some quantity between units. There are many online tools that do this, but let’s put some of our new knowledge to the test and create our own crude tool to do the same. There are many ways to approach this, but we’ll be creating a type to represent a unit, as well as a function to convert between them.

We’ll start with creating a Unit type to represent the units we want to convert between.

struct Unit
end

What fields do we need? We need a conversion factor to be able to convert between units, which will be relative to some standard unit, such as the SI units, and this will be some sort of number. Since we don’t know exactly what type it will be, and we don’t particularly mind, we can use the abstract type Real as an umbrella term. Also, we need to know what quantity the unit measures, as we can’t convert between a unit of length and a unit of mass, for example! This could come in various forms, but the simplest will be just to store this as a String. For the purposes of this example, we won’t need any more fields, but for more functionality you may wish to add others.

struct Unit
    factor::Real
    quantity::String
end

We haven’t used an inner constructor here, but we will add some outer constructors to allow for easier construction of new units. First, we’ve mentioned the idea of a base unit, to which all the factors are relative. We would represent this as a Unit with a factor of 1, so let’s add a constructor where if the factor isn’t specified, it’s assumed to be 1, giving the base unit:

Unit(quantity::String) = Unit(1, quantity)

Unit

When we think of the way that units are usually defined to us, it’s generally in terms of another unit that measures the same thing (i.e. 1 kilometre is 1000 metres). We can add this as a constructor too, using the factor of an old Unit to calculate the new factor:

# Creates the new unit corresponding to x lots of u
Unit(x::Real, u::Unit) = Unit(x * u.factor, u.quantity)

Unit

Now that our type is defined, we can create some variables of this type. For example, here are some units of length:

metre = Unit("length")
kilometre = Unit(1000, metre)
centimetre = Unit(1//100, metre)
inch = Unit(2.54, centimetre)
foot = Unit(12, inch)
yard = Unit(3, foot)
mile = Unit(1760, yard)

Unit(1609.3440000000003, "length")

some of mass:

kilogram = Unit("mass")
gram = Unit(1//1000, kilogram)
pound = Unit(453.59237, gram)
ounce = Unit(1//16, pound)
shortton = Unit(2000, pound)
longton = Unit(2240, pound)
metricton = Unit(1000, kilogram)

Unit(1000, "mass")

some of time:

second = Unit("time")
minute = Unit(60, second)
hour = Unit(60, minute)
day = Unit(24, hour)
julianyear = Unit(365.25, day)
gregorianyear = Unit(365.2425, day)
tropicalyear = Unit(365.24219, day)

Unit(3.1556925216e7, "time")

and some of angles:

radian = Unit("angle")
fullcircle = Unit(2π, radian)
degree = Unit(1//360, fullcircle)
arcminute = Unit(1//60, degree)
arcsecond = Unit(1//60, arcminute)

Unit(4.84813681109536e-6, "angle")

As is often done with Units, we may wish to combine them together to make new ones, such as combining metre and second to get metrepersecond measuring "speed". Although our implementation doesn’t allow for this automatically (we’d have to tell it that dividing a unit of "length" by a unit of "time" gives a unit of "speed", etc.), we can do this manually:

metrepersecond = Unit("speed")
# Speed of light in a vacuum
const C = Unit(299792458, metrepersecond)

# One lightyear is the distance travelled by light in a vacuum in one Julian year
# Calculated by distance = speed * time
lightyear = Unit(C.factor * julianyear.factor, metre)

Unit(9.4607304725808e15, "length")

# One astronomical unit (au) is approximately the average distance between the Earth and the Sun
au = Unit(149597870700, metre)

# One parsec is approximately the distance to an object of parallax angle 1 arcsecond (1//3600 degrees)
parsec = Unit(1/arcsecond.factor, au)

Unit(3.085677581491367e16, "length")

Now let’s write a function to convert between units. We need three inputs: the amount to convert, the Unit that this amount is in, and the Unit to convert it into. We will do that with convertunits, which will take the following form:

function convertunits(x::Real, u₁::Unit, u₂::Unit)
    [...]
end

The natural choice of function name here would be convert, but this is a crucial function used by Julia to convert between types, so we don’t want to overwrite that. Theoretically, we could use multiple dispatch to write our own method, but that would be bad practice, as the convert function is specifically meant for converting between types and nothing else.

First, we need to check that the Units entered are compatible, namely that they measure the same quantity. This can be done by an if-statement, or even simpler, short-circuited, with an error displayed if the quantities do not match:

u₁.quantity == u₂.quantity || throw(ArgumentError("units measure different quantities."))

Now we just need to do the conversion. A little thinking (or experimenting) tells us that the correct formula for this is to multiply x by the factor of u₁, and then divide by the factor of u₂:

x * u₁.factor / u₂.factor

This is the value we want, so we return it, finishing the function.

function convertunits(x::Real, u₁::Unit, u₂::Unit)
    u₁.quantity == u₂.quantity || throw(ArgumentError("units measure different quantities."))
    x * u₁.factor / u₂.factor
end

convertunits (generic function with 1 method)
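If the formula isn't obvious, it can help to split the conversion into two steps that pass through the base unit. The sketch below repeats the definition of Unit so that it runs on its own, writes the factors out directly instead of using the outer constructors, and uses illustrative variable names:

```julia
struct Unit
    factor::Real
    quantity::String
end

kilometre = Unit(1000, "length")
mile = Unit(1609.344, "length")

x = 110                             # an amount in kilometres
in_metres = x * kilometre.factor    # step 1: into the base unit, by multiplying
in_miles = in_metres / mile.factor  # step 2: out of the base unit, by dividing

# The two steps together are exactly the one-line formula
in_miles == x * kilometre.factor / mile.factor    # true
```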

Now we can put this function to the test:

convertunits(110, kilometre, mile)

68.35083114610673

convertunits(45, degree, radian)

0.7853981633974483

convertunits(28, day, minute)

40320.0

convertunits(1, parsec, lightyear)

3.2615637771674333

convertunits(1, metricton, longton)

0.9842065276110605

convertunits(12, inch, pound)

ERROR: ArgumentError: units measure different quantities.

As mentioned, there are other ways of approaching this problem, and ways of improving this method further. Some ideas for improvement that you might want to try yourself are:

The quantity field of Unit is used to check if two Units measure the same thing, but perhaps could do with an inner constructor to constrain the values that we’re allowed to put in it (e.g. only "length", "mass", "time", etc)
Since this type only uses multiplicative factors, it won’t work for unit conversion between degrees Celsius and degrees Fahrenheit. You could alter the Unit type to account for such an offset
Using multiple dispatch (see Chapter 7), we can write our own methods for inbuilt Julia functions. For example, a method for show can make the displayed output nicer as demonstrated earlier, a method for * would allow clean syntax like inch = 2.54 * centimetre, and a method for + could seamlessly allow quantities like 2 years and 5 months
Instead of using types, we could use a different structure to represent units. If you’ve read Chapter 9, you may want to consider using a Dict for a similar purpose
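As a starting point for the third idea, here is one possible sketch of methods for * and show. It is only one of many reasonable designs, the display format is an arbitrary choice, and it writes Base.show and Base.:* in full instead of importing them first:

```julia
struct Unit
    factor::Real
    quantity::String
end

# x lots of the unit u, written as x * u
Base.:*(x::Real, u::Unit) = Unit(x * u.factor, u.quantity)

# A compact display, e.g. Unit(length, factor 1)
Base.show(io::IO, u::Unit) = print(io, "Unit(", u.quantity, ", factor ", u.factor, ")")

metre = Unit(1, "length")
centimetre = (1//100) * metre
inch = 2.54 * centimetre
```

A full solution would also need to decide what u * x (with the number second) and the product of two Units should mean, which this sketch leaves open.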
