This assignment and exercises at the bioinformatics bootcamp will require the use of the command line interface (CLI) with a Unix shell. On Unix-like operating systems such as macOS and Linux, this is as easy as launching the Terminal application. If you use a Windows operating system, you can use a terminal emulator (e.g. PuTTY) to connect remotely to a Linux machine, or install the Linux subsystem for Windows 10 (see Windows user guide to accessing the Alabama Super Computer).
Although certainly not required, we recommend that you purchase a copy of Practical Computing for Biologists by Steven Haddock and Casey Dunn. It is a well written book that introduces a number of Unix shell topics and will serve as an excellent reference for scientific computing in general.
Alternatively, there are numerous, excellent Unix learning resources online. A few very good, brief introduction to Unix commands and the shell environment are listed below:
You’ll likely be able to find many more out there (Google is your friend!) in addition to resources on Linux mainstays such as grep, awk, sed and shell scripting, all of which we will be using during the Bootcamp.
Tip: many Unix commands have incredibly complicated options, don’t let that stop you from learning and understanding their simplest use case.
If available, read through Chapters 4 and 5 of the Haddock and Dunn textbook. Following that, go through the online tutorial at this link: http://www.ee.surrey.ac.uk/Teaching/Unix/unix0.html. Complete sections 1-6 and 8. Skip over the “quota” command at the start of section 6 as well as sections 8.5 and 8.6 as they are not that relevant at this point.
As a supplemental activity, please read over this page: Accessing a Remote Computer (additionally, Chapter 20 of Haddock and Dunn)
history -c Create a directory titled with your first and last name: <NAME>_Bootcamp_Assignment in the Desktop folder of your user account. Navigate into this directory and work on the project from there.ifconfig to a file named in the following fashion: <NAME>_Bootcamp_Assignment.ifconfig (for example:Smith_Bootcamp_Assignment.ifconfig).ps aux, df ., du -sh * and whoami. For du -sh *, you will want to: 1) change back to your home directory to get the total sizes of the various folders in your filesystem but 2) write the output to the directory that you are working in for your project. Be sure to name each file as above, changing the extension of the file to match the name of the command that generated that output e.g. Smith_Bootcamp_Assignment.ps, Smith_Bootcamp_Assignment.df.<NAME>_Bootcamp_Assignment.system<NAME>_Bootcamp_Assignment directory named NAME_Sysinfo (for example, Smith_Sysinfo). Move the five smaller files and the combined .system file into this new directory using mv and wildcards.<NAME>_partA.history (for example: Smith_partA.history). Place this file in the same directory as the six files from above.history -c<NAME>_Bootcamp_Assignment called <NAME>_GenbankData (for example, Smith_GenbankData). Move into this directory to start Part B.DinoPro.fasta from the “student” account at the address the-santos-lab.dynu.net using scp. Specifically, the DinoPro.fasta file is located in the homework directory of the “student” account’s home directory. For a refresher on scp, see: Accessing a Remote Computergrep to extract lines with the pattern >gi in the file DinoPro.fasta and direct this output to a file titled AllEntries.output. Examine the contents of this file with the utility less.grep to search for the term Symbiodinium in the file AllEntries.output and send this output to a file titled SymEntries.output. Also examine the contents of this file with less.grep to exclude entries with the term Symbiodinium in the AllEntries.output file and send this output to a file titled NonSymEntries.output. Also examine the contents of this file with less.wc to obtain the number of lines in AllEntries.output, SymEntries.output and NonSymEntries.output. Note if the line numbers in SymEntries.output + NonSymEntries.output = AllEntries.output. Send the output from the three separate wc commands to a file titled Values_DinoPro_Fasta.output in the following order: SymEntries.output, NonSymEntries.output, AllEntries.outputDinoPro.fasta file, followed by what information was extracted and parsed among the various result files. Add this text file to the <NAME>_GenbankData directory (if it’s not already there) and name it <NAME>_Methodology.txt.<NAME>_partB.history (for example, Smith_partB.history). Save this file in the same directory as the files you just worked with i.e., <NAME>_GenbankData.<NAME>_Bootcamp_Assignment folder for submission by using tar. While in this directory, execute the following: tar -czf <NAME>_Bootcamp_Assignment.tar <NAME>_SysInfo <NAME>_GenbankData This will create a compressed file of the directories containing your work (for example, Smith_Bootcamp_Assignment.tar).scp to send your <NAME>_Bootcamp_Assignment.tar file to the “student” account at the address the-santos-lab.dynu.net and place it in the completed_assignments directory. Any concern your last name might be the same as another participant? Tack on your initials to your last name in the command you use above during the homework submission.For the sessions relating to Data Visualization, we will be working with software installed directly on your own computer. The three software are all free to you and compatible with all platforms (Mac, Windows, and Linux). In order to participate in the exercises, you will need to download and install each software to your own computer using the instructions below. Please open each once installed to make sure you do not encounter any errors as this will delay your ability to follow along.