Introduction

1. What is documentation


2. Why do we document


3. Why you should document


4. How you can document


5. The Good, the Bad, and the Ugly


6. Wrap-up



What is documentation

Documentation is just a method of recording information

Specifically it records

1. How a process was performed

  • What did you do

  • What steps did you take

  • Why were the steps taken in that order

  • What did you use to do it


2. Comments on how the process went

  • Troubleshooting processes you used

  • Errors you encountered

  • Alternative methods you may have tried


3. Outputs/products of the process

  • If you ran code show the output

  • If you made plots include them

  • If you built an app show where to find it


4. How to perform the same process

  • Step-by-step instructions

  • Definition of what is mandatory to do and what is optional

  • Can someone successfully reproduce the process with no previous experience?

    • If not, what experience do they need/is recommended?


In general we want to hit all four of these points

It’s OK to only hit three, but recognize what you’ve provided

  • [1,2,4] makes a guide, [1,2,3] makes a report


Hitting two or fewer points produces incomplete documentation

  • Incomplete documentation is as destructive as no documentation

  • [1,2] doesn’t inform someone how to do what you did, or what the outcome will look like

  • [2,3] leaves the reader in the dark about what you actually did

  • [1,3] puts the burden of reproducibility and error troubleshooting on the reader


Why do we document

An informed and careful reader shouldn’t be confused

Someone with all the necessary background of what you’re doing shouldn’t question:

  • What you did

  • Why you did it

  • How you did it

  • What tools you used

It’s OK but not ideal if they don’t know why you did it that way


Your steps should always be reproducible

If your steps aren’t reproducible then your results aren’t real

  • Nobody can confirm you did them right

  • Nobody can confirm you did them
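
One way to make "what tools you used" checkable is to record the software environment next to your results. This is a minimal sketch, assuming an R workflow; the variable names r_version and session are illustrative.

```r
# Record the R version and loaded packages alongside your results,
# so a reader can see exactly which tools produced them.
r_version <- R.version.string   # one-line description of the R build
session <- sessionInfo()        # platform, locale, and attached packages

# Print both at the end of a script or notebook
print(r_version)
print(session)
```

Printing this at the end of a notebook means the environment is captured every time the document is rendered.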


Why you should document

a) Your final project involves documentation; I’m one of the graders


c) Academia is essentially founded on it
  • (+) We’re in a reproducibility crisis, stand out, stay employed


d) The you of today won’t have the same brain as the you of 2+ years from now
  • And that person may need to know what you were thinking


e) Constantly answering the same question is tedious and wastes your time


How you can document

My preferred method of documentation for programming is actually code comments

dijk_func <- function(matrix,init_node){
  
  ############################
  ### initialize variables ###
  ############################
  
  ## set initial distances for all vertices ##
  
  # variable to store distances
  # inf for all non-source vertices
  dist <- rep(Inf,nrow(matrix))
  
  # 0 for source vertex
  dist[init_node] <- 0
  
  # logical vector with one entry per vertex
  # FALSE means not yet visited, TRUE means visited
  visited <- rep(FALSE,nrow(matrix))
  
  ############################################# 
  ### repeat until all vertices are checked ###
  #############################################
  
  # logical repetition of the algorithm
  # until all nodes have been visited
  repeat{
    
    # until any node is checked it is assumed
    # the distance of shortest path (sp) is infinite
    shortest_path = Inf
    
    # index for checking if visits have occurred
    i_v = -1
    
    # for the k^th try, sequence the length of the distances
    for(k in seq_along(dist)) {
      # through all unvisited nodes
      
      # logic statement
      # if the distance at k^th try is less than sp
      # AND has NOT been visited at k^th try
      if(dist[k] < shortest_path && !visited[k]){ 
        # then sp is equal to distance at k^th try
        shortest_path = dist[k]
        # and index is set to k^th try
        i_v = k
      }
    }
    # if i_v is still equal to -1
    if(i_v == -1){
      # all reachable nodes are visited
      # sequence is broken
      # output returned
      return (dist)
    }
    
    ##############################################################
    ### choose the unvisited vertex with the shortest distance ###
    ########## from the start to be the current vertex ###########
    ##### always start with the source as the current vertex #####
    ##############################################################
    
    # for k^th try across the adj matrix
    # in the indexed visits
    for(k in seq_along(matrix[i_v,])) {
      
      ####################################################################
      ### for each of the current vertex's unvisited neighbor vertices ###
      ############## calculate the distance from the source ##############
      ### update the distance if the new calculated distance is lower ####
      ####################################################################
      
      # if the edge weight is NOT equal to 0 (an edge exists)
      # AND the distance at k^th try is greater than the indexed distance
      # plus the weight of the edge at that point
      if(matrix[i_v,k] != 0 && dist[k] > dist[i_v] + matrix[i_v,k]){
        
        # this becomes the new shortest path
        dist[k] = dist[i_v] + matrix[i_v,k]
      }
      
    }
    
    ############################################
    ###### mark checked vertex as visited ######
    ### marked vertices aren't checked again ###
    ############################################
    
    # once every neighbor has been updated,
    # replace the false value for no visit with true
    visited[i_v] <- TRUE
  }
}
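
Comments explain the process; showing a call and its output covers the "outputs/products" point as well. This is a small usage sketch under assumptions: the 4-node graph is hypothetical, and the function body is a condensed copy of dijk_func from above (comments trimmed) so that the block runs on its own.

```r
# Condensed copy of dijk_func from above, comments trimmed for space
dijk_func <- function(matrix, init_node) {
  dist <- rep(Inf, nrow(matrix))
  dist[init_node] <- 0
  visited <- rep(FALSE, nrow(matrix))
  repeat {
    shortest_path <- Inf
    i_v <- -1
    # pick the unvisited vertex with the smallest known distance
    for (k in seq_along(dist)) {
      if (dist[k] < shortest_path && !visited[k]) {
        shortest_path <- dist[k]
        i_v <- k
      }
    }
    # nothing left to visit: return the distances
    if (i_v == -1) {
      return(dist)
    }
    # relax every edge leaving the current vertex
    for (k in seq_along(matrix[i_v, ])) {
      if (matrix[i_v, k] != 0 && dist[k] > dist[i_v] + matrix[i_v, k]) {
        dist[k] <- dist[i_v] + matrix[i_v, k]
      }
    }
    visited[i_v] <- TRUE
  }
}

# Hypothetical weighted, undirected graph; 0 means "no edge"
adj <- matrix(c(0, 1, 4, 0,
                1, 0, 2, 6,
                4, 2, 0, 3,
                0, 6, 3, 0),
              nrow = 4, byrow = TRUE)

# Shortest distances from node 1 to every node
shortest <- dijk_func(adj, init_node = 1)
print(shortest)
# [1] 0 1 3 6
```

Node 4 is reached through nodes 2 and 3 (1 + 2 + 3 = 6) rather than through the direct edge from node 2 (1 + 6 = 7), which is the kind of result a reader can check by hand.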


Markdown platforms are a fantastic method

  • Quarto

  • Jupyter

  • R Markdown


Non-programming documentation has analogous platforms

  • Obsidian Vault

  • Notion

  • Overleaf (LaTeX)


The Good, the Bad, and the Ugly

Good

Detailed comments that assume the reader is clueless

# table() will provide the frequency with which each node occurs in the list
# so long as our assumption on how to calculate node degree holds
degree_dist_f <- data.frame(table(vector_stack_f)) # each table is placed in a data frame
degree_dist_p <- data.frame(table(vector_stack_p)) # so that results are easily referenced

Important figures made as readable as possible

Inclusion of common errors and troubleshooting/resolution

When attempting to log in to bookdown.org/connect, if you experience this error sequence:

- Log-in via Google authentication

- Load back to the original log-in page

- A brief error message is added to the end of the URL

- Dev Tools for your browser shows a 401 error code

The resolution that worked for this instance was:

- Create a new Google account

- Connect all multi-factor authentication available

- Connect to bookdown.org with that new account

If that does not work, the original solution involved making the Google account out of an existing Outlook account.


Bad

Assuming the reader understands what your variables and shorthands are

# gen adj mat
for (e in e_f) { 
  u <- e[1]
  v <- e[2]
  F_ij[u + 1, v + 1] <- 1
  F_ij[v + 1, u + 1] <- 1
}
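
For contrast, here is a sketch of the same loop with the shorthand spelled out. The slide does not say what the variables hold, so this assumes e_f is a list of undirected edges stored as pairs of 0-indexed node IDs and F_ij is the adjacency matrix being filled in. The names edge_list, adjacency_matrix, n_nodes, and the three sample edges are hypothetical.

```r
# Hypothetical edge list: each element is one undirected edge,
# given as a pair of 0-indexed node IDs
edge_list <- list(c(0, 1), c(1, 2), c(0, 3))
n_nodes <- 4

# Adjacency matrix: entry [i, j] is 1 if nodes i and j share an edge
adjacency_matrix <- matrix(0, nrow = n_nodes, ncol = n_nodes)

# Generate the adjacency matrix from the edge list
for (edge in edge_list) {
  # R indexes from 1, so shift the 0-indexed node IDs up by one
  from_node <- edge[1] + 1
  to_node <- edge[2] + 1

  # The graph is undirected, so mark the edge in both directions
  adjacency_matrix[from_node, to_node] <- 1
  adjacency_matrix[to_node, from_node] <- 1
}
```

The logic is unchanged; the only difference is that a reader no longer has to guess what each name means or why 1 is added to each index.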


Putting multiple pages of unnecessary output between sections/sentences

This resulted in an intermediary product of shortest path arrays for each network at 115x115x10:

print(sp_F_array)
print(sp_P_array)

Now we can use these arrays in producing the necessary parameters for our K-Nearest-Neighbors algorithm.

The same logic applies to message/error/warning outputs; don’t let those through unless they’re part of what you’re documenting
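
One alternative is to report the shape of a large object and a small slice of it rather than the whole thing. This is a sketch under assumptions: sp_demo_array is a zero-filled stand-in with the dimensions quoted above, and noisy_step is a hypothetical function that emits a message.

```r
# Stand-in for a 115 x 115 x 10 shortest-path array (values are placeholders)
sp_demo_array <- array(0, dim = c(115, 115, 10))

# Report the shape instead of printing all 132,250 values
array_dims <- dim(sp_demo_array)
print(array_dims)

# Show only a small corner of the first layer so the reader sees the layout
corner <- sp_demo_array[1:3, 1:3, 1]
print(corner)

# Hypothetical step that prints a status message the reader does not need
noisy_step <- function() {
  message("loading data...")
  42
}

# Silence the message while keeping the returned value
quiet_result <- suppressMessages(noisy_step())
```

In R Markdown or Quarto the same effect is usually achieved with chunk options, but wrapping the call works in a plain script too.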


Off-loading short/medium length explanations of functions to separate documents

What does this mean?

  • Wikipedia-style documentation does not work as a step-by-step

  • Citations are good, but don’t make every piece of background a citation rabbit hole

  • That works in academic papers, but it never works in technical documentation

  • This format is occasionally useful if you’re running manual/blog/book style documentation and the link is within the same document or to something you have written yourself

  • Try not to disrupt the reader’s information pipeline by frequently changing the author that’s delivering the information


Ugly

  • Poorly cropped/scaled screenshots

  • Dependency-required file formats (.docx, .xlsx, .hjsl, .gml, .Rmd, .R, .qmd)

  • Leaving the rationale for a process up to interpretation

  • Poor differentiation between code/algorithm and explanation

  • Code running off the page and becoming inaccessible

  • Missing seeds for random number generation processes
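
The last point is cheap to fix. This is a minimal sketch of seeding a random process in R; the seed value 2024 is arbitrary, and first_draw and second_draw are illustrative names.

```r
# Fix the seed so the "random" draws are identical on every run
set.seed(2024)            # any fixed integer works
first_draw <- runif(3)

# Resetting the same seed reproduces the same draws exactly
set.seed(2024)
second_draw <- runif(3)

print(identical(first_draw, second_draw))
# [1] TRUE
```

Stating the seed in the documentation lets a reader regenerate the same samples, splits, or simulations that produced your results.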


Wrap-up

Poor documentation is just as bad as no documentation.

It’s always better to start with being overly detailed and scaling back.

Don’t default to what you’re comfortable with; try different methods and platforms whenever you can.

Everything you’ve seen here today was learned by using Google correctly. You don’t need a class to become good at this (or programming); you just need to put in effort.

You may not feel the consequences of being a poor documenter right away, but eventually you will, usually in the form of a capped salary. On occasion it can lose you a job.