Documentation is just a method of recording information
Specifically, it records:
What did you do?
What steps did you take?
Why were the steps taken in that order?
What did you use to do it?
If you ran code, show the output
If you made plots, include them
If you built an app, show where to find it
Step-by-step instructions
Definition of what is mandatory to do and what is optional
Can someone successfully reproduce the process with no previous experience?
In general we want to hit all four of these points
It’s OK to only hit three, but recognize what you’ve provided
Hitting two or fewer points produces incomplete documentation
Incomplete documentation is as destructive as no documentation
[1,2] doesn’t inform someone how to do what you did, or what the outcome will look like
[2,3] leaves the reader in the dark about what you actually did
[1,3] puts the burden of reproducibility and error troubleshooting on the reader
An informed and careful reader shouldn’t be confused
Someone with all the necessary background of what you’re doing shouldn’t question:
What you did
Why you did it
How you did it
What tools you used
It’s OK but not ideal if they don’t know why you did it that way
Your steps should always be reproducible
If your steps aren’t reproducible then your results aren’t real
Nobody can confirm you did them right
Nobody can confirm you did them
My preferred method of documentation for programming is actually code comments
dijk_func <- function(matrix, init_node){
  ############################
  ### initialize variables ###
  ############################
  ## set initial distances for all vertices ##
  # variable to store distances
  # Inf for all non-source vertices
  dist <- rep(Inf, nrow(matrix))
  # 0 for source vertex
  dist[init_node] <- 0
  # logical vector with one entry per vertex
  # TRUE once a vertex has been visited, FALSE otherwise
  visited <- rep(FALSE, nrow(matrix))
  #############################################
  ### repeat until all vertices are checked ###
  #############################################
  # logical repetition of the algorithm
  # until all reachable nodes have been visited
  repeat{
    # until any node is checked it is assumed
    # the distance of the shortest path (sp) is infinite
    shortest_path <- Inf
    # index of the next vertex to visit (-1 means none found yet)
    i_v <- -1
    # for the k^th try, step through the indices of the distances
    for(k in seq_along(dist)){
      # through all unvisited nodes
      # logic statement
      # if the distance at k^th try is less than sp
      # AND has NOT been visited at k^th try
      if(dist[k] < shortest_path && !visited[k]){
        # then sp is equal to distance at k^th try
        shortest_path <- dist[k]
        # and index is set to k^th try
        i_v <- k
      }
    }
    # given i_v is still equal to -1
    if(i_v == -1){
      # all reachable nodes are visited
      # sequence is broken
      # output returned
      return(dist)
    }
    ##############################################################
    ### choose the unvisited vertex with the shortest distance ###
    ########## from the start to be the current vertex ###########
    ##### always start with the source as the current vertex #####
    ##############################################################
    # for k^th try across the row of the adj matrix
    # that belongs to the current vertex
    for(k in seq_along(matrix[i_v,])){
      ####################################################################
      ### for each of the current vertex's unvisited neighbor vertices ###
      ############## calculate the distance from the source ##############
      ### update the distance if the new calculated distance is lower ####
      ####################################################################
      # if the edge weight is NOT equal to 0 (an edge exists)
      # AND the distance at k^th try is greater than the distance
      # to the current vertex plus the weight of the edge to k
      if(matrix[i_v,k] != 0 && dist[k] > dist[i_v] + matrix[i_v,k]){
        # this becomes the new shortest distance
        dist[k] <- dist[i_v] + matrix[i_v,k]
      }
    }
    ############################################
    ###### mark checked vertex as visited ######
    ### marked vertices aren't checked again ###
    ############################################
    # replace the FALSE value for no visit with TRUE now that it has been visited
    visited[i_v] <- TRUE
  }
}
Markdown platforms are a fantastic method
Quarto
Jupyter
R Markdown
Non-programming documentation has analogous platforms
Obsidian Vault
Notion
Overleaf (LaTeX)
Detailed comments that assume the reader is clueless
# table() counts how often each node occurs in the list
# this equals the node degree so long as our assumption on how to calculate it holds
degree_dist_f <- data.frame(table(vector_stack_f)) # each table is placed in a data frame
degree_dist_p <- data.frame(table(vector_stack_p)) # so that results are easily referenced
Important figures made as readable as possible
Inclusion of common errors and troubleshooting/resolution
If you experience this error sequence when attempting to log in to bookdown.org/connect:
- Log in via Google authentication
- The page loads back to the original login page
- A brief error message is added to the end of the URL
- Dev Tools for your browser shows a 401 error code
The resolution that worked for this instance was:
- Create a new Google account
- Set up all available multi-factor authentication
- Connect to bookdown.org with that new account
If that does not work, the original solution involved making the Google account out of an existing Outlook account.
Assuming the reader understands what your variables and shorthands are
# gen adj mat
for (e in e_f) {
  u <- e[1]
  v <- e[2]
  F_ij[u + 1, v + 1] <- 1
  F_ij[v + 1, u + 1] <- 1
}
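For contrast, here is a minimal sketch of how the same loop might read with descriptive names and comments. It assumes the edge list holds 0-based node pairs, which the `+ 1` shifts in the snippet above suggest; `edge_list_f`, `n_nodes`, and `adj_matrix_f` are hypothetical names chosen for illustration.

```r
# hypothetical edge list: each element is c(u, v) with 0-based node ids
edge_list_f <- list(c(0, 1), c(1, 2), c(0, 2))
# hypothetical number of nodes in the network
n_nodes <- 3
# adjacency matrix that starts with no edges (all zeros)
adj_matrix_f <- matrix(0, nrow = n_nodes, ncol = n_nodes)
for (edge in edge_list_f) {
  # shift node ids by one because R indexes from 1
  from_node <- edge[1] + 1
  to_node <- edge[2] + 1
  # the network is undirected, so the edge is recorded in both directions
  adj_matrix_f[from_node, to_node] <- 1
  adj_matrix_f[to_node, from_node] <- 1
}
```

A reader can now tell what the loop builds and why the indices are shifted without asking the author.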
Putting multiple pages of unnecessary output between sections/sentences
This resulted in an intermediary product of shortest path arrays for each network at 115x115x10:
print(sp_F_array)
print(sp_P_array)
Now we can use these arrays in producing the necessary parameters for our K-Nearest-Neighbors algorithm.
The same logic applies to message/error/warning outputs; don’t let those through unless they’re part of what you’re documenting
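One way to avoid the pages of output is to show only a summary of the object. The following is a minimal sketch, assuming a stand-in array with the 115x115x10 dimensions mentioned above; `sp_example_array` and the other variable names are hypothetical.

```r
# hypothetical stand-in for one of the 115x115x10 shortest path arrays
sp_example_array <- array(0, dim = c(115, 115, 10))
# confirm the dimensions instead of printing every value
array_dims <- dim(sp_example_array)
print(array_dims)
# show a small corner of the first slice as a sanity check
corner_preview <- sp_example_array[1:3, 1:3, 1]
print(corner_preview)
# suppressWarnings() keeps a known, irrelevant warning out of the document
parsed_value <- suppressWarnings(as.numeric("not a number"))
```

The reader still sees proof that the arrays exist and have the expected shape, and the document stays readable.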
Off-loading short/medium length explanations of functions to separate documents
What does this mean?
Wikipedia-style documentation does not work as a step-by-step
Citations are good, but don’t make every piece of background a citation rabbit hole
That works in academic papers; it never works in technical documentation
This format is occasionally useful if you’re running manual/blog/book style documentation and the link is within the same document or to something you have written yourself
Try not to disrupt the reader’s information pipeline by frequently changing the author that’s delivering the information
Poorly cropped/scaled screenshots
Dependency-required file formats (.docx, .xlsx, .hjsl, .gml, .Rmd, .R, .qmd)
Leaving the rationale for a process up to interpretation
Poor differentiation between code/algorithm and explanation
Code running off the page and becoming inaccessible
Missing seeds for random number generation processes
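For the last point, a minimal sketch of what setting a seed buys the reader. The seed value 2024 and the variable names are arbitrary choices for illustration.

```r
# fix the seed so the random draw below can be reproduced
set.seed(2024)
draw_a <- sample(1:100, 5)
# resetting the same seed reproduces exactly the same draw
set.seed(2024)
draw_b <- sample(1:100, 5)
# TRUE: anyone who runs this with the same seed gets the same numbers
same_draw <- identical(draw_a, draw_b)
print(same_draw)
```

Without the `set.seed()` call, each run would produce different values and nobody could confirm the documented results.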
Poor documentation is just as bad as no documentation.
It’s always better to start with being overly detailed and scaling back.
Don’t default to what you’re comfortable with; try different methods and platforms whenever you can.
Everything you’ve seen here today was learned by using Google correctly. You don’t need a class to become good at this (or programming); you just need to put in effort.
You may not feel the outcomes of being a poor documenter right away, but eventually you will, and it usually comes in the form of a capped salary. On occasion it can lose you a job.
2. Comments on how the process went
Troubleshooting processes you used
Errors you encountered
Alternative methods you may have tried