Introduction

1. What is documentation

2. Why do we document

3. Why you should document

4. How you can document

5. The Good, the Bad, and the Ugly

6. Wrap-up

What is documentation

Documentation is just a method of recording information

Specifically, it records:

1. How a process was performed

What did you do?
What steps did you take?
Why were the steps taken in that order?
What did you use to do it?

2. Comments on how the process went

Troubleshooting processes you used
Errors you encountered
Alternative methods you may have tried

3. Outputs/products of the process

If you ran code, show the output
If you made plots, include them
If you built an app, show where to find it

4. How to perform the same process

Step-by-step instructions
Which steps are mandatory and which are optional
Can someone successfully reproduce the process with no previous experience?
- If not, what experience do they need/is recommended?

In general, we want to hit all four of these points

It’s OK to hit only three, but recognize what you’ve provided

[1,2,4] makes a guide; [1,2,3] makes a report

Hitting two or fewer points produces incomplete documentation

Incomplete documentation is as destructive as no documentation
[1,2] doesn’t inform someone how to do what you did, or what the outcome will look like
[2,3] leaves the reader in the dark about what you actually did
[1,3] puts the burden of reproducibility and error troubleshooting on the reader

Why do we document

An informed and careful reader shouldn’t be confused

Someone with the necessary background in what you’re doing shouldn’t have to question:

What you did
Why you did it
How you did it
What tools you used

It’s OK, but not ideal, if they don’t know why you did it that way

Your steps should always be reproducible

If your steps aren’t reproducible, then your results aren’t real

Nobody can confirm you did them right
Nobody can confirm you did them
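In R, one low-effort way to record "what tools you used" is to save the session information next to your results. This is a minimal sketch using only base R; the file name `session_info.txt` is a hypothetical example, and it is written to `tempdir()` only so the sketch runs anywhere (in practice you would save it alongside your project files).

```r
# Record the R version, platform, and attached packages so a reader
# can see exactly which tools produced the results.
info <- sessionInfo()

# capture.output() turns the printed summary into a character vector
info_lines <- capture.output(print(info))

# Write it next to your results (hypothetical file name; tempdir() used for portability)
out_file <- file.path(tempdir(), "session_info.txt")
writeLines(info_lines, out_file)

# The first line reports the R version string
cat(info_lines[1], "\n")
```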

Why you should document

a) Your final project involves documentation; I’m one of the graders

b) Industry revolves around documentation, especially in any programming-related industry

c) Academia is essentially founded on it

(+) We’re in a reproducibility crisis: stand out, stay employed

d) The you of today won’t have the same brain as the you of 2+ years from now

And that person may need to know what you were thinking

e) Constantly answering the same question is tedious and wastes your time

How you can document

My preferred method of documentation for programming is actually code comments

dijk_func <- function(matrix, init_node){
  
  ############################
  ### initialize variables ###
  ############################
  
  ## set initial distances for all vertices ##
  
  # variable to store distances
  # Inf for all non-source vertices
  dist <- rep(Inf, nrow(matrix))
  
  # 0 for source vertex
  dist[init_node] <- 0
  
  # one logical flag per vertex (row of the matrix)
  # FALSE = not yet visited, TRUE = visited
  visited <- rep(FALSE, nrow(matrix))
  
  #############################################
  ### repeat until all vertices are checked ###
  #############################################
  
  # logical repetition of the algorithm
  # until all reachable nodes have been visited
  repeat{
    
    ##############################################################
    ### choose the unvisited vertex with the shortest distance ###
    ########## from the start to be the current vertex ###########
    ##### always start with the source as the current vertex #####
    ##############################################################
    
    # until any node is checked it is assumed
    # the distance of the shortest path (sp) is infinite
    shortest_path <- Inf
    
    # index of the current vertex; -1 means none has been found yet
    i_v <- -1
    
    # for the k^th vertex, across the length of the distances
    for(k in seq_along(dist)) {
      # through all unvisited nodes
      
      # logic statement
      # if the distance of the k^th vertex is less than sp
      # AND the k^th vertex has NOT been visited
      if(dist[k] < shortest_path && !visited[k]){
        # then sp is equal to the distance of the k^th vertex
        shortest_path <- dist[k]
        # and the index is set to k
        i_v <- k
      }
    }
    
    # if i_v is still equal to -1
    if(i_v == -1){
      # every reachable node has been visited
      # the loop is broken
      # and the distances are returned
      return(dist)
    }
    
    ####################################################################
    ### for each of the current vertex's unvisited neighbor vertices ###
    ############## calculate the distance from the source ##############
    ### update the distance if the new calculated distance is lower ####
    ####################################################################
    
    # for the k^th entry in the current vertex's row of the adj matrix
    for(k in seq_along(matrix[i_v,])) {
      
      # if an edge exists (weight NOT equal to 0)
      # AND the distance of the k^th vertex is greater than
      # the current vertex's distance plus the edge weight
      if(matrix[i_v,k] != 0 && dist[k] > dist[i_v] + matrix[i_v,k]){
        
        # this becomes the new shortest path
        dist[k] <- dist[i_v] + matrix[i_v,k]
      }
    }
    
    ############################################
    ###### mark checked vertex as visited ######
    ### marked vertices aren't checked again ###
    ############################################
    
    # replace the FALSE (no visit) value with TRUE now that it has been visited
    visited[i_v] <- TRUE
  }
}

Markdown platforms are a fantastic method

Quarto
Jupyter
R Markdown

Non-programming documentation has analogous platforms

Obsidian Vault
Notion
Overleaf (LaTeX)

The Good, the Bad, and the Ugly

Good

Detailed comments that assume the reader is clueless

# table() counts how often each node occurs in the list
# so long as our assumption on how to calculate node degree holds
degree_dist_f <- data.frame(table(vector_stack_f)) # each table is placed in a data frame
degree_dist_p <- data.frame(table(vector_stack_p)) # so that results are easily referenced
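The snippet above does not show how `vector_stack_f` is built. A minimal, self-contained sketch, assuming it is the stacked endpoints of an undirected edge list in which each edge appears exactly once, so a node's count equals its degree (the edge list `edges_f` is hypothetical):

```r
# Hypothetical undirected edge list: each row is one edge (from, to)
edges_f <- matrix(c(1, 2,
                    1, 3,
                    2, 3,
                    3, 4),
                  ncol = 2, byrow = TRUE)

# Stack both endpoint columns into one vector.
# Assumption: every edge is listed exactly once, so the number of
# times a node appears equals its degree.
vector_stack_f <- c(edges_f[, 1], edges_f[, 2])

# table() counts how often each node occurs in the stacked vector;
# data.frame() turns it into columns vector_stack_f (node) and Freq (degree)
degree_dist_f <- data.frame(table(vector_stack_f))

print(degree_dist_f)
```

Every edge contributes two endpoints, so the `Freq` column always sums to twice the number of edges, which is a quick sanity check worth documenting.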

Important figures made as readable as possible

Inclusion of common errors and troubleshooting/resolution

When attempting to log in to bookdown.org/connect, if you experience this error sequence:

- Log in via Google authentication

- Get sent back to the original log-in page

- A brief error message is appended to the end of the URL

- Your browser’s Dev Tools show a 401 error code

The resolution that worked in this instance was:

- Create a new Google account

- Enable all available multi-factor authentication

- Connect to bookdown.org with that new account

If that does not work, the original solution involved creating the Google account from an existing Outlook account.

Bad

Assuming the reader understands what your variables and shorthands are

# gen adj mat
for (e in e_f) { 
  u <- e[1]
  v <- e[2]
  F_ij[u + 1, v + 1] <- 1
  F_ij[v + 1, u + 1] <- 1
}
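For contrast, here is the same idea written so the reader does not have to guess. This is a sketch under assumptions, since the original does not say what `e_f` holds: it assumes an undirected network whose edges are stored as pairs of 0-indexed node IDs (which is why the original adds 1). The edge list `edge_list_f` is hypothetical.

```r
# Hypothetical edge list: each element is one undirected edge given as
# a pair of 0-indexed node IDs
edge_list_f <- list(c(0, 1), c(0, 2), c(1, 2), c(2, 3))

# Number of nodes, assuming IDs run from 0 to the largest ID present
n_nodes <- max(unlist(edge_list_f)) + 1

# Start with an n x n matrix of zeros: 0 = no edge between nodes i and j
adj_matrix_f <- matrix(0, nrow = n_nodes, ncol = n_nodes)

for (edge in edge_list_f) {
  # R indexes from 1, so shift the 0-indexed node IDs up by one
  from_node <- edge[1] + 1
  to_node   <- edge[2] + 1

  # The network is undirected, so record the edge in both directions
  adj_matrix_f[from_node, to_node] <- 1
  adj_matrix_f[to_node, from_node] <- 1
}

print(adj_matrix_f)
```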

Putting multiple pages of unnecessary output between sections/sentences

This resulted in an intermediary product of shortest path arrays for each network at 115x115x10:

print(sp_F_array)
print(sp_P_array)

Now we can use these arrays in producing the necessary parameters for our K-Nearest-Neighbors algorithm.
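A more compact alternative is to report an array’s shape and, if needed, a small slice or summary. In this sketch `sp_F_array` is a hypothetical stand-in filled with random integers purely for illustration; it is not the real shortest-path data.

```r
# Hypothetical stand-in for one network's shortest-path array:
# 115 x 115 node pairs, 10 layers (random values, illustration only)
set.seed(42)
sp_F_array <- array(sample(0:6, 115 * 115 * 10, replace = TRUE),
                    dim = c(115, 115, 10))

# Report the shape instead of printing all 132,250 values
print(dim(sp_F_array))

# If the reader needs to see values, show a small corner of one layer
print(sp_F_array[1:3, 1:3, 1])

# Or a one-line numeric summary of the whole array
print(summary(as.vector(sp_F_array)))
```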

The same logic applies to message/error/warning outputs: don’t let those through unless they’re part of what you’re documenting
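In plain R code, base R’s `suppressMessages()` and `suppressWarnings()` keep that noise out of the output while still returning the value you want to show; in Quarto and R Markdown the `message` and `warning` chunk options do the same for a whole chunk. The function `noisy_mean()` below is a hypothetical helper written only to produce a message and a warning.

```r
# Hypothetical helper that emits both a message and a warning
noisy_mean <- function(x) {
  message("computing mean of ", length(x), " values")
  if (anyNA(x)) warning("NA values dropped")
  mean(x, na.rm = TRUE)
}

# Wrapping the call keeps the console/rendered document clean
# while still returning the value you actually want to show
result <- suppressWarnings(suppressMessages(noisy_mean(c(1, 2, NA, 3))))
print(result)
```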

Off-loading short/medium length explanations of functions to separate documents

What does this mean?

Wikipedia-style documentation does not work as a step-by-step guide
Citations are good, but don’t make every piece of background a citation rabbit hole
That works in academic papers; it never works in technical documentation
This format is occasionally useful if you’re writing manual/blog/book-style documentation and the link points within the same document or to something you wrote yourself
Try not to disrupt the reader’s information pipeline by frequently changing the author who is delivering the information

Ugly

Poorly cropped/scaled screenshots
Dependency-required file formats (.docx, .xlsx, .hjsl, .gml, .Rmd, .R, .qmd)
Leaving the rationale for a process up to interpretation
Poor differentiation between code/algorithm and explanation
Code running off the page and becoming inaccessible
Missing seeds for random number generation processes
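A minimal sketch of why seeds matter, using base R’s `set.seed()`; the seed value `2024` is an arbitrary example. Even with a seed, results can differ across R versions (for example, the default sampling algorithm used by `sample()` changed in R 3.6.0), which is another reason to record your R version.

```r
# Without a seed, each run draws different values, so a reader cannot
# reproduce your exact numbers. Setting a seed fixes the sequence.
set.seed(2024)
first_draw <- sample(1:100, 5)

set.seed(2024)
second_draw <- sample(1:100, 5)

# Same seed, same code, same R version: identical draws
print(identical(first_draw, second_draw))
```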

Wrap-Up

Poor documentation is just as bad as no documentation.

It’s always better to start out overly detailed and scale back.

Don’t default to what you’re comfortable with; try different methods and platforms whenever you can.

Everything you’ve seen here today was learned by using Google correctly. You don’t need a class to become good at this (or at programming); you just need to put in the effort.

You may not feel the consequences of being a poor documenter right away, but eventually you will, and they usually come in the form of a capped salary. On occasion, poor documentation can cost you a job.

Documentation: A Brief Overview

Robert Sholl

2024-11-06

