This assignment and the exercises at the bioinformatics bootcamp require the use of the command line interface (CLI) with a Unix shell. On Unix-like operating systems such as macOS and Linux, this is as easy as launching the Terminal application. If you use Windows, you can either use a terminal emulator (e.g. PuTTY) to connect remotely to a Linux machine or install the Windows Subsystem for Linux on Windows 10 (see Windows user guide to accessing the Alabama Super Computer).
Although certainly not required, we recommend that you purchase a copy of Practical Computing for Biologists by Steven Haddock and Casey Dunn. It is a well-written book that introduces a number of Unix shell topics and will serve as an excellent reference for scientific computing in general.
Alternatively, there are numerous excellent Unix learning resources online. A few very good, brief introductions to Unix commands and the shell environment are listed below:
You’ll likely be able to find many more out there (Google is your friend!) in addition to resources on Linux mainstays such as `grep`, `awk`, `sed` and shell scripting, all of which we will be using during the Bootcamp.
Tip: many Unix commands have incredibly complicated options; don’t let that stop you from learning and understanding their simplest use case.
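To make the tip concrete, here is a minimal sketch of the simplest use of `grep`, `awk` and `sed`. The file `sample.txt` and its contents are made up purely for illustration, and the work happens in a temporary directory so none of your own files are touched.

```shell
# Work in a throwaway directory (assumption: mktemp is available on your system).
workdir=$(mktemp -d)
cd "$workdir"

# A tiny, hypothetical sample file with two columns.
printf 'geneA 120\ngeneB 45\ngeneC 300\n' > sample.txt

# grep: keep only the lines that contain a pattern.
grep 'geneB' sample.txt > grep.out

# awk: print only the first whitespace-separated column of every line.
awk '{print $1}' sample.txt > awk.out

# sed: replace the first occurrence of "gene" on each line with "locus".
sed 's/gene/locus/' sample.txt > sed.out

# Show the three results one after another.
cat grep.out awk.out sed.out
```

Each command here uses no options at all, which is often enough to get started; the options can be learned later as you need them.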
If available, read through Chapters 4 and 5 of the Haddock and Dunn textbook. Following that, go through the online tutorial at this link: http://www.ee.surrey.ac.uk/Teaching/Unix/unix0.html. Complete sections 1-6 and 8. Skip over the “quota” command at the start of section 6 as well as sections 8.5 and 8.6 as they are not that relevant at this point.
As a supplemental activity, please read over this page: Accessing a Remote Computer (additionally, Chapter 20 of Haddock and Dunn)
- Clear your command history with `history -c`.
- Create a directory titled with your last name, `<NAME>_Bootcamp_Assignment`, in the Desktop folder of your user account. Navigate into this directory and work on the project from there.
- Redirect the output of `ifconfig` to a file named in the following fashion: `<NAME>_Bootcamp_Assignment.ifconfig` (for example: Smith_Bootcamp_Assignment.ifconfig).
- Do the same for `ps aux`, `df .`, `du -sh *` and `whoami`. For `du -sh *`, you will want to 1) change back to your home directory to get the total sizes of the various folders there, but 2) write the output to the directory that you are working in for your project. Be sure to name each file as above, changing the extension of the file to match the name of the command that generated that output, e.g. Smith_Bootcamp_Assignment.ps, Smith_Bootcamp_Assignment.df.
- Combine the five output files into a single file named `<NAME>_Bootcamp_Assignment.system`.
- Create a directory inside your `<NAME>_Bootcamp_Assignment` directory named `<NAME>_Sysinfo` (for example, Smith_Sysinfo). Move the five smaller files and the combined .system file into this new directory using `mv` and wildcards.
- Save your command history to a file named `<NAME>_partA.history` (for example: Smith_partA.history). Place this file in the same directory as the six files from above.
- Clear your command history again with `history -c`.
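The redirection, concatenation and wildcard techniques used in the steps above can be sketched as follows. This is an illustration under stated assumptions, not the assignment solution: the `Demo_` names are hypothetical, and `uname -s` and `ls` are stand-ins for the commands you are asked to run.

```shell
# Work in a throwaway directory rather than on your Desktop.
workdir=$(mktemp -d)
cd "$workdir"
mkdir Demo_Assignment
cd Demo_Assignment

# ">" sends a command's standard output to a file instead of the screen.
uname -s > Demo_Assignment.uname
df . > Demo_Assignment.df

# Run a command from a different directory but write its output back into
# the project directory: remember the project path, then use a subshell so
# the change of directory does not persist.
project=$(pwd)
(cd / && ls > "$project/Demo_Assignment.ls")

# cat concatenates files. Write to a temporary name first so the combined
# file cannot be matched by the wildcard that is feeding it.
cat Demo_Assignment.* > combined.tmp
mv combined.tmp Demo_Assignment.system

# A single wildcard moves every matching file in one command.
mkdir Demo_Sysinfo
mv Demo_Assignment.* Demo_Sysinfo/
ls Demo_Sysinfo
```

In an interactive shell, the same `>` redirection also works for the `history` command, which is one way to save your command history to a file.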
- Create a directory inside your `<NAME>_Bootcamp_Assignment` directory called `<NAME>_GenbankData` (for example, Smith_GenbankData). Move into this directory to start Part B.
- Copy the file `DinoPro.fasta` from the “student” account at the address the-santos-lab.dynu.net using `scp`. Specifically, the `DinoPro.fasta` file is located in the `homework` directory of the “student” account’s home directory. For a refresher on scp, see: Accessing a Remote Computer
- Use `grep` to extract lines with the pattern `>gi` in the file `DinoPro.fasta` and direct this output to a file titled `AllEntries.output`. Examine the contents of this file with the utility `less`.
- Use `grep` to search for the term `Symbiodinium` in the file `AllEntries.output` and send this output to a file titled `SymEntries.output`. Also examine the contents of this file with `less`.
- Use `grep` to exclude entries with the term `Symbiodinium` in the `AllEntries.output` file and send this output to a file titled `NonSymEntries.output`. Also examine the contents of this file with `less`.
- Use `wc` to obtain the number of lines in `AllEntries.output`, `SymEntries.output` and `NonSymEntries.output`. Note whether the line counts of `SymEntries.output` + `NonSymEntries.output` = `AllEntries.output`. Send the output from the three separate `wc` commands to a file titled `Values_DinoPro_Fasta.output` in the following order: `SymEntries.output`, `NonSymEntries.output`, `AllEntries.output`.
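The pattern of filtering, inverse filtering and counting described above can be sketched on a small example. The FASTA records and file names below are made up for illustration and are not taken from `DinoPro.fasta`; the sketch assumes a standard `grep`, where `-v` inverts the match, and `wc -l`, which counts lines.

```shell
# Work in a throwaway directory.
workdir=$(mktemp -d)
cd "$workdir"

# Three hypothetical FASTA records (header line plus one sequence line each).
cat > demo.fasta <<'EOF'
>gi|0001| hypothetical protein [Symbiodinium sp.]
MKTAYIAK
>gi|0002| hypothetical protein [Alexandrium sp.]
MLSRAVCG
>gi|0003| hypothetical protein [Symbiodinium sp.]
MSTNPKPQ
EOF

# Quote the pattern: an unquoted > would be read by the shell as output
# redirection rather than as part of the search pattern.
grep '>gi' demo.fasta > all.output

# Lines that contain the term, then lines that do not (-v inverts the match).
grep 'Symbiodinium' all.output > sym.output
grep -v 'Symbiodinium' all.output > nonsym.output

# ">" creates the file; ">>" appends to it, so the order of results is kept.
wc -l sym.output > counts.output
wc -l nonsym.output >> counts.output
wc -l all.output >> counts.output
cat counts.output

# Sanity check: the two subsets should add up to the full set.
sym=$(wc -l < sym.output)
nonsym=$(wc -l < nonsym.output)
all=$(wc -l < all.output)
[ $((sym + nonsym)) -eq $((all)) ] && echo "counts agree"
```

Because every header line either contains the search term or does not, the two subset counts should always sum to the total; a mismatch usually points to a typo in one of the patterns.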
- In a text file, describe the `DinoPro.fasta` file, followed by what information was extracted and parsed among the various result files. Add this text file to the `<NAME>_GenbankData` directory (if it’s not already there) and name it `<NAME>_Methodology.txt`.
- Save your command history to a file named `<NAME>_partB.history` (for example, Smith_partB.history). Save this file in the same directory as the files you just worked with, i.e., `<NAME>_GenbankData`.
- Prepare your `<NAME>_Bootcamp_Assignment` folder for submission by using `tar`. While in this directory, execute the following: `tar -czf <NAME>_Bootcamp_Assignment.tar <NAME>_Sysinfo <NAME>_GenbankData` This will create a compressed file of the directories containing your work (for example, Smith_Bootcamp_Assignment.tar).
- Use `scp` to send your `<NAME>_Bootcamp_Assignment.tar` file to the “student” account at the address the-santos-lab.dynu.net and place it in the `completed_assignments` directory. Any concern your last name might be the same as another participant’s? Tack on your initials to your last name in the commands you use above for the homework submission.

For the sessions relating to Data Visualization, we will be working with software installed directly on your own computer. All three programs are free and compatible with all platforms (Mac, Windows, and Linux). In order to participate in the exercises, you will need to download and install each program on your own computer using the instructions below. Please open each one once it is installed to make sure you do not encounter any errors, as errors will delay your ability to follow along.