Week 07 Laboratory Exercises
Objectives
- learning how to access files
- learning how to work with binary data
- learning how to use lseek
Preparation
Before the lab you should re-read the relevant lecture slides and their accompanying examples.
Getting Started
Create a new directory for this lab called lab07, change to this directory, and fetch the provided code for this week by running these commands:
    mkdir lab07
    cd lab07
    1521 fetch lab07
Or, if you're not working on CSE, you can download the provided code as a zip file or a tar file.
Exercise — in pairs:
Create a File of Integers
Write a C program, create_integers_file, which takes 3 arguments:
- a filename,
- the beginning of a range of integers, and
- the end of a range of integers;
and which creates a file of this name containing the specified integers. For example:
    ./create_integers_file fortytwo.txt 40 42
    cat fortytwo.txt
    40
    41
    42
    ./create_integers_file a.txt 1 5
    cat a.txt
    1
    2
    3
    4
    5
    ./create_integers_file 1000.txt 1 1000
    wc 1000.txt
    1000 1000 3893 1000.txt
Your program should print a suitable error message if given the wrong number of arguments, or if the file cannot be created.
When you think your program is working, you can use autotest to run some simple automated tests:
    1521 autotest create_integers_file
When you are finished working on this exercise, you and your lab partner must both submit your work by running give:
    give cs1521 lab07_create_integers_file create_integers_file.c
Note, even though this is a pair exercise, you both must run give from your own account before Wednesday 01 January 00:00 to obtain the marks for this lab exercise.
Exercise — in pairs:
Print the Bytes of A File
Write a C program, print_bytes, which takes one argument, a filename, and which should read the specified file and print one line for each byte of the file. The line should show the byte in decimal and hexadecimal. If the byte is an ASCII printable character, the character itself should also be printed, in single quotes.
Assume ASCII printable characters are those for which the ctype.h function isprint(3) returns a non-zero value.
Follow the format in this example exactly:
    echo "Hello Andrew!" >hello.txt
    ./print_bytes hello.txt
    byte 0: 72 0x48 'H'
    byte 1: 101 0x65 'e'
    byte 2: 108 0x6c 'l'
    byte 3: 108 0x6c 'l'
    byte 4: 111 0x6f 'o'
    byte 5: 32 0x20 ' '
    byte 6: 65 0x41 'A'
    byte 7: 110 0x6e 'n'
    byte 8: 100 0x64 'd'
    byte 9: 114 0x72 'r'
    byte 10: 101 0x65 'e'
    byte 11: 119 0x77 'w'
    byte 12: 33 0x21 '!'
    byte 13: 10 0x0a
When you think your program is working, you can use autotest to run some simple automated tests:
    1521 autotest print_bytes
When you are finished working on this exercise, you and your lab partner must both submit your work by running give:
    give cs1521 lab07_print_bytes print_bytes.c
Note, even though this is a pair exercise, you both must run give from your own account before Wednesday 01 January 00:00 to obtain the marks for this lab exercise.
Exercise — in pairs:
Create a Binary File
Write a C program, create_binary_file, which takes at least one argument: a filename, followed by integers in the range 0…255 inclusive specifying byte values. It should create a file of the specified name, containing the specified bytes. For example:
    ./create_binary_file hello.txt 72 101 108 108 111 33 10
    cat hello.txt
    Hello!
    ./create_binary_file count.binary 1 2 3 251 252 253 254 255
    ./print_bytes count.binary
    byte 0: 1 0x01
    byte 1: 2 0x02
    byte 2: 3 0x03
    byte 3: 251 0xfb
    byte 4: 252 0xfc
    byte 5: 253 0xfd
    byte 6: 254 0xfe
    byte 7: 255 0xff
    ./create_binary_file 4_bytes.binary 222 173 190 239
    ./print_bytes 4_bytes.binary
    byte 0: 222 0xde
    byte 1: 173 0xad
    byte 2: 190 0xbe
    byte 3: 239 0xef
Your program should print a suitable error message if given the wrong number of arguments, or if the file cannot be created.
When you think your program is working, you can use autotest to run some simple automated tests:
    1521 autotest create_binary_file
When you are finished working on this exercise, you and your lab partner must both submit your work by running give:
    give cs1521 lab07_create_binary_file create_binary_file.c
Note, even though this is a pair exercise, you both must run give from your own account before Wednesday 01 January 00:00 to obtain the marks for this lab exercise.
Exercise — in pairs:
Extract ASCII from a Binary File
We are distributing programs as binaries, and would like to know whether the C compiler is leaving any confidential information in the binaries as ASCII strings.
Only 95 of the 256 possible byte values correspond to printable ASCII characters, so several printable byte values in a row should occur infrequently in non-ASCII data. For four independent, uniformly random byte values, the chance that all four are printable is (95/256)^4 ≈ 0.019, or about 2%.
Write a C program, hidden_strings, which takes one argument, a filename; it should read that file, and print all sequences of length 4 or longer of consecutive byte values corresponding to printable ASCII characters. In other words, your program should read through the bytes of the file, and if it finds 4 bytes in a row containing printable characters, it should print those bytes, and any following bytes containing ASCII printable characters.
Print each sequence on a separate line.
Assume ASCII printable characters are those for which the ctype.h function isprint(3) returns a non-zero value.
Do not read the entire file into an array.
Use the create_binary_file program from the previous exercise to create simple test data. For example:
    dcc hidden_strings.c -o hidden_strings
    ./create_binary_file test_file 72 101 108 108 111 255 255 65 110 100 114 101 119
    ./hidden_strings test_file
    Hello
    Andrew
When you think your program is working, try extracting strings from a compiled binary. For example:
    cat secret.c
    #define secret_hash_define 1
    // secret comment
    int secret_global_variable;
    int main(void) {
        int secret_local_variable;
        char *s = "secret string";
    }
    int secret_function_name() {
    }
    gcc secret.c -o binary1
    gcc secret.c -g -o binary2
    gcc secret.c -s -o binary3
    ./hidden_strings binary1
    /lib64/ld-linux-x86-64.so.2
    libc.so.6
    __cxa_finalize
    __libc_start_main
    GLIBC_2.2.5
    ...
    ./hidden_strings binary1|grep secret
    secret string
    secret.c
    secret_function_name
    secret_global_variable
    ./hidden_strings binary2|grep secret
    secret string
    secret.c
    secret.c
    secret_global_variable
    secret_function_name
    secret_local_variable
    secret.c
    secret_function_name
    secret_global_variable
    ./hidden_strings binary3|grep secret
    secret string
The above example shows that, by default, gcc(1) leaves function names, global variable names and the source filename in the binary.
If you specify the -g command line option, local variable names are also left in the binary. This is part of the information left for debuggers such as gdb(1) (which dcc uses). This information allows debuggers to print the current value of variables.
If you specify the -s command line option, all names are stripped from the binary, but the string literal remains.
When you think your program is working, you can use autotest to run some simple automated tests:
    1521 autotest hidden_strings
When you are finished working on this exercise, you and your lab partner must both submit your work by running give:
    give cs1521 lab07_hidden_strings hidden_strings.c
Note, even though this is a pair exercise, you both must run give from your own account before Wednesday 01 January 00:00 to obtain the marks for this lab exercise.
Challenge Exercise — individual:
Print the Last Line of Huge Files
Write a C program, last_line, which takes one argument, a filename, and which should print the last line of that file. For example:
    dcc last_line.c -o last_line
    echo -e 'hello\ngood bye' >hello_goodbye.txt
    cat hello_goodbye.txt
    hello
    good bye
    ./last_line hello_goodbye.txt
    good bye
Your program should not assume the last byte of the file is a newline character.
    echo -n -e 'hello\ngoodbye' >no_last_newline.txt
    cat no_last_newline.txt
    hello
    goodbye
    ./last_line no_last_newline.txt
    goodbye
Your program should handle extremely large files. It should not read the entire file. As this is a challenge exercise, marks will not be awarded for programs which read the entire file.
For example, it should be able to print the last line of a one-terabyte file:
    echo -e 'Hello\nGood Bye'|dd status=none seek=1T bs=1 of=/tmp/gigantic_file$$
    ls -l /tmp/gigantic_file$$
    -rw-r--r-- 1 z5555555 z5555555 1099511627791 Oct 26 17:27 gigantic_file12345
    ./last_line /tmp/gigantic_file$$
    Good Bye
The gigantic file created above is a sparse file, consisting almost entirely of zero bytes: it uses little actual disk space, but, to be safe, remove it when you finish the exercise.
The commands above create the sparse file in /tmp to avoid it accidentally being backed up or otherwise copied.
Sparse files can create problems if they are accidentally copied by a program which doesn't handle them specially — and most programs don't.
BTW the $$ in the above command is replaced by the shell process id. This is because /tmp is shared so we'd like to use a filename that is (more or less) unique.
When you think your program is working, you can use autotest to run some simple automated tests:
    1521 autotest last_line
When you are finished working on this exercise, you must submit your work by running give:
    give cs1521 lab07_last_line last_line.c
You must run give before Wednesday 01 January 00:00 to obtain the marks for this lab exercise. Note that this is an individual exercise; the work you submit with give must be entirely your own.
Submission
When you have finished each exercise, make sure you submit your work by running give.
You can run give multiple times. Only your last submission will be marked.
Don't submit any exercises you haven't attempted.
If you are working at home, you may find it more convenient to upload your work via give's web interface.
Remember you have until Wednesday 01 January 00:00 to submit your work.
You cannot obtain marks by e-mailing your code to tutors or lecturers.
You can check the files you have submitted here.
Automarking will be run by the lecturer several days after the submission deadline, using test cases different to those autotest runs for you. (Hint: do your own testing as well as running autotest.)
After automarking is run by the lecturer you can view your results here. The resulting mark will also be available via give's web interface.
Lab Marks
When all components of a lab are automarked, you should be able to view the marks via give's web interface or by running this command on a CSE machine:
    1521 classrun -sturec