1. Which of the following is of the type float?
7.4
2. Match the following variables with their type:
var1 = 54 → integer
var2 = [1,2,7,1,24] → list
var3 = 2.98 → float
var4 = True → boolean
All literals have a type:
type(5.0)
float
Variables are used to store values and to assign them a name.
a = 3.14
a + 2
5.140000000000001
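The trailing ...0001 is not a bug in Python: most decimal fractions, such as 3.14, cannot be stored exactly as binary floating-point numbers. A minimal sketch of two common ways to deal with this, rounding for display and comparing with `math.isclose` from the standard library instead of `==`:

```python
import math

a = 3.14
total = a + 2
print(total)                      # 5.140000000000001
print(round(total, 2))            # 5.14 (rounded for display)
print(total == 5.14)              # False: the two floats differ in the last bits
print(math.isclose(total, 5.14))  # True: equal within a small relative tolerance
```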
Lists: a collection of values.
x = [1,5,3,7,8]
y = ['a','b','c']
type(y)
z = [1, 2, 3, 'a', 'b']
4. What happens if you do [1,2,5,11] + [87,2,43,3]?
[1,2,5,11,87,2,43,3]
The lists will be concatenated.
5. How do you find out if the variable x is present in the list mylist?
Two answers are correct:
x in mylist
for l in mylist:
    if l == x:
        print('Found a match')
6. How do you find out if 5 is larger than 3 and the integer 4 is the same as the float 4? Fill in all the missing code.
5 > 3 and 4 == 4.0
a = 2
b = 5.46
c = [1,2,3,4]
d = [5,6,7,8]
e = 7
c * b
TypeError: can't multiply sequence by non-int of type 'float'
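Multiplying a list is only defined for integers: `c * a` repeats the list `a` times. A small sketch reusing the variables above; converting the float with `int()` (which truncates 5.46 to 5) is one possible workaround, assuming truncation is what you actually want:

```python
a = 2
b = 5.46
c = [1, 2, 3, 4]

print(c * a)            # [1, 2, 3, 4, 1, 2, 3, 4]: the list repeated twice

try:
    c * b               # a float is not a valid repeat count
except TypeError as err:
    print('TypeError:', err)

print(len(c * int(b)))  # 20: int(5.46) is 5, so 4 elements repeated 5 times
```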
a = [1,2,3,4,5,6,7,8]
b = 5
c = 10
b in a
b < c or c == 1
b not in a
False
7. How do you select the second element in the variable mylist = [4,3,8,10]?
mylist[1]
8. Pair the following variables with whether they are mutable or immutable:
var1 = 'my pretty string' → immutable
var2 = [1,2,3,4,5] → mutable
var3 = "hello world" → immutable
var4 = ['a', 'b', 'c', 'd'] → mutable
9. Which of the following types are iterable?
Lists and strings
Lists (and strings) are ORDERED collections of elements where every element can be accessed through an index.
a[0]: first item in list a
REMEMBER! Indexing starts at 0 in Python.
a = [1,2,3,4,5]
b = ['a','b','c']
c = 'a random string'
c[2]
c[1:4]
' ra'
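Slices follow the rule "start is included, stop is not", and negative indices count from the end. A short sketch using the same variables:

```python
a = [1, 2, 3, 4, 5]
b = ['a', 'b', 'c']
c = 'a random string'

print(c[2])     # r
print(c[1:4])   # the value ' ra': index 4 (the stop) is not included
print(c[:8])    # a random: omitting the start means "from the beginning"
print(c[-1])    # g: negative indices count from the end
print(a[-2:])   # [4, 5]: the last two elements
print(b[0:2])   # ['a', 'b']
```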
Lists are mutable objects, meaning you can use an index to change an element of the list, while strings are immutable and cannot be changed in place.
An iterable is anything you can loop over, for example lists and strings.
a = [1,2,3,4,5] # mutable
b = ['a','b','c'] # mutable
c = 'a random string' # immutable
c[0] = 'A'
#a[0] = 42
c
TypeError: 'str' object does not support item assignment
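Because a string cannot be changed in place, the usual approach is to build a new string and assign it to the same name. A short sketch contrasting this with a list, which can be changed through an index:

```python
a = [1, 2, 3, 4, 5]
a[0] = 42              # allowed: lists are mutable
print(a)               # [42, 2, 3, 4, 5]

c = 'a random string'
c = 'A' + c[1:]        # build a new string from a slice and rebind the name c
print(c)               # A random string
```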
Tuples
myTuple = (1,2,3,4,'a','b','c')
myTuple[0] = 42
#print(myTuple)
print(len(myTuple))
#for i in myTuple:
# print(i)
TypeError: 'tuple' object does not support item assignment
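Tuples behave like lists for reading: you can index, slice, measure and loop over them; only assignment fails. A minimal sketch, including one way to get a modified copy (the trailing comma in `(42,)` is what makes it a one-element tuple):

```python
myTuple = (1, 2, 3, 4, 'a', 'b', 'c')

print(len(myTuple))    # 7
print(myTuple[4])      # a: indexing works as for lists
for i in myTuple:      # tuples are iterable
    print(i)

# To "change" a tuple, create a new one, for example by concatenation
newTuple = (42,) + myTuple[1:]
print(newTuple)        # (42, 2, 3, 4, 'a', 'b', 'c')
```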
10. How do you print ‘Yes’ if x is bigger than y?
if x > y:
    print('Yes')
a = 3
b = [1,2,3,4]
if a in b:
    print(str(a)+' is found in the list b')
else:
    print(str(a)+' is not in the list')
3 is found in the list b
How do you open a file handle to read a file called ‘somerandomfile.txt’?
fh = open('somerandomfile.txt')
The file in the previous question contains several lines. How do you print each line?
for line in fh:
    print(line)
or
for row in fh:
    print(row)
fh = open('../files/somerandomfile.txt','r', encoding = 'utf-8')
for line in fh:
    print(line.strip())
fh.close()
just a strange file with some nonsense lines
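An alternative many Python programmers prefer is the `with` statement, which closes the file automatically when the block ends, even if an error occurs. To keep the sketch self-contained, it first writes a small file to a temporary directory; that file and its single line are stand-ins for the course file, not the real `../files/somerandomfile.txt`:

```python
import os
import tempfile

# Create a stand-in file so the example runs without the course files
path = os.path.join(tempfile.mkdtemp(), 'somerandomfile.txt')
with open(path, 'w', encoding='utf-8') as out:
    out.write('just a strange file with some nonsense lines\n')

# The file is closed automatically when the with block ends
lines = []
with open(path, 'r', encoding='utf-8') as fh:
    for line in fh:
        lines.append(line.strip())
        print(line.strip())
```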
numbers = [5,6,7,8]
i = 0
while i < len(numbers):
    print(numbers[i])
    i += 1
5 6 7 8
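The same loop is more commonly written with `for`, which avoids managing the counter `i` by hand; the built-in `enumerate()` supplies the index when you still need it:

```python
numbers = [5, 6, 7, 8]

for number in numbers:
    print(number)

# enumerate() yields (index, value) pairs
pairs = []
for i, number in enumerate(numbers):
    pairs.append((i, number))
print(pairs)   # [(0, 5), (1, 6), (2, 7), (3, 8)]
```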
Problem:
You have a VCF file with a large number of samples. You are interested in only one of the samples (sample1) and one region (chr5, 1,000,000-1,005,000). You want to know whether this sample has any variants in this region, and if so, which variants.
Pseudocode is a description of what you want to do without using proper syntax.
- Open file and loop over lines (ignore lines starting with #)
fh = open('/mnt/c/Users/Nina/Documents/courses/Python_Beginner_Course/genotypes.vcf', 'r', encoding = 'utf-8')
for line in fh:
    if not line.startswith('#'):
        print(line.strip())
        break
fh.close()
# Next, find chromosome 5
1 10492 . C T 550.31 LOW_VQSLOD AN=26;AC=2 GT:AD:DP:GQ:PGT:PID:PL ./.:0,0:0:.:.:.:. ./.:0,0:0:.:.:.:. ./.:0,0:0:.:.:.:. ./.:0,0:0:.:.:.:. ./.:0,0:0:.:.:.:. 0/1:12,7:19:99:0|1:10403_ACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAAC_A:196,0,340 ./.:0,0:0:.:.:.:. ./.:0,0:0:.:.:.:. ./.:0,0:0:.:.:.:. ./.:0,0:0:.:.:.:. 0/1:18,4:22:48:.:.:48,0,504 ./.:0,0:0:.:.:.:. ./.:0,0:0:.:.:.:.
- Identify lines where the chromosome is 5 and the position is between 1,000,000 and 1,005,000
fh = open('/mnt/c/Users/Nina/Documents/courses/Python_Beginner_Course/genotypes.vcf', 'r', encoding = 'utf-8')
for line in fh:
    if not line.startswith('#'):
        cols = line.strip().split('\t')
        if cols[0] == '5':
            print(cols)
            break
fh.close()
# Next, find the correct region
['5', '12041', '.', 'A', 'T', '18075.2', 'PASS', 'AN=26;AC=2', 'GT:AD:DP:GQ:PL', './.:0,0:0:.:.', './.:0,0:0:.:.', './.:0,0:0:.:.', './.:0,0:0:.:.', './.:0,0:0:.:.', './.:0,0:0:.:.', './.:0,0:0:.:.', './.:0,0:0:.:.', './.:0,0:0:.:.', './.:0,0:0:.:.', '0/1:15,6:21:99:142,0,391', './.:0,0:0:.:.', '0/1:16,17:33:99:442,0,422']
fh = open('/mnt/c/Users/Nina/Documents/courses/Python_Beginner_Course/genotypes.vcf', 'r', encoding = 'utf-8')
for line in fh:
    if not line.startswith('#'):
        cols = line.strip().split('\t')
        if cols[0] == '5' and \
                int(cols[1]) >= 1000000 and int(cols[1]) <= 1005000:
            print(cols)
            break
fh.close()
# Next, find the genotypes for sample1
['5', '1000080', '.', 'A', 'T', '2557.1', 'PASS', 'AN=26;AC=2', 'GT:AD:DP:GQ:PL', '0/1:15,18:33:99:489,0,357', './.:0,0:0:.:.', './.:0,0:0:.:.', './.:0,0:0:.:.', './.:0,0:0:.:.', './.:0,0:0:.:.', './.:0,0:0:.:.', './.:0,0:0:.:.', '0/1:21,19:40:99:481,0,542', './.:0,0:0:.:.', './.:0,0:0:.:.', './.:0,0:0:.:.', './.:0,0:0:.:.']
- Isolate the column that contains the genotype for sample1
fh = open('/mnt/c/Users/Nina/Documents/courses/Python_Beginner_Course/genotypes.vcf', 'r', encoding = 'utf-8')
for line in fh:
    if not line.startswith('#'):
        cols = line.strip().split('\t')
        if cols[0] == '5' and \
                int(cols[1]) >= 1000000 and int(cols[1]) <= 1005000:
            geno = cols[9]
            print(geno)
            break
fh.close()
# Next, extract the genotypes only
0/1:15,18:33:99:489,0,357
- Extract the genotypes only from the column
fh = open('/mnt/c/Users/Nina/Documents/courses/Python_Beginner_Course/genotypes.vcf', 'r', encoding = 'utf-8')
for line in fh:
    if not line.startswith('#'):
        cols = line.strip().split('\t')
        if cols[0] == '5' and \
                int(cols[1]) >= 1000000 and int(cols[1]) <= 1005000:
            geno = cols[9].split(':')[0]
            print(geno)
            break
fh.close()
# Next, find in which positions sample1 has alternate alleles
0/1
- Check if the genotype contains any alternate alleles
fh = open('/mnt/c/Users/Nina/Documents/courses/Python_Beginner_Course/genotypes.vcf', 'r', encoding = 'utf-8')
for line in fh:
    if not line.startswith('#'):
        cols = line.strip().split('\t')
        if cols[0] == '5' and \
                int(cols[1]) >= 1000000 and int(cols[1]) <= 1005000:
            geno = cols[9].split(':')[0]
            if geno in ['0/1', '1/1']:
                print(geno)
fh.close()
#Next, print nicely
0/1 0/1 0/1 0/1 0/1 0/1 0/1 0/1 0/1 0/1 0/1 0/1 0/1 0/1 0/1 0/1 0/1 0/1 0/1 0/1 0/1 0/1
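The check `geno in ['0/1', '1/1']` matches unphased calls with a single alternate allele only. In VCF, phased genotypes use `|` (for example `0|1`), and sites with several alternate alleles can give calls such as `1/2`, which this check would skip. A possible more general test, assuming the GT field uses `/` or `|` as separator and `.` for a missing allele; `has_alt_allele` is a hypothetical helper name:

```python
def has_alt_allele(gt):
    """Return True if a VCF GT string contains at least one non-reference allele.

    Assumes alleles are separated by '/' (unphased) or '|' (phased)
    and that missing alleles are written as '.'.
    """
    alleles = gt.replace('|', '/').split('/')
    return any(allele not in ('0', '.') for allele in alleles)

for gt in ['0/0', '0/1', '1/1', '0|1', '1/2', './.']:
    print(gt, has_alt_allele(gt))
```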
- Print all variants within the specified region where this sample carries an alternate allele
fh = open('/mnt/c/Users/Nina/Documents/courses/Python_Beginner_Course/genotypes.vcf', 'r', encoding = 'utf-8')
res = []
for line in fh:
    if not line.startswith('#'):
        cols = line.strip().split('\t')
        if cols[0] == '5' and \
                int(cols[1]) >= 1000000 and int(cols[1]) <= 1005000:
            geno = cols[9].split(':')[0]
            if geno in ['0/1', '1/1']:
                var = cols[0]+':'+cols[1]+'_'+cols[3]+'-'+cols[4]
                # print(var+' has genotype: '+geno)
                res.append(var)
fh.close()
print(res)
['5:1000080_A-T', '5:1000156_G-A', '5:1001097_C-A', '5:1001193_C-T', '5:1001245_T-C', '5:1001339_C-T', '5:1001344_G-C', '5:1001683_G-T', '5:1001755_G-A', '5:1002374_G-A', '5:1002382_G-C', '5:1002620_T-C', '5:1002722_G-A', '5:1002819_C-A', '5:1003043_G-T', '5:1003099_C-T', '5:1003135_G-A', '5:1004648_A-G', '5:1004650_A-C', '5:1004665_A-G', '5:1004702_G-T', '5:1004879_T-C']
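Once the loop works, it can be wrapped in a function so that the chromosome, region and sample column become parameters. A sketch under these assumptions: `find_variants` is a hypothetical name, the sample of interest is in column 9 (0-based) as in the code above, and the VCF lines below are made-up test data, not taken from genotypes.vcf:

```python
def find_variants(lines, chrom, start, end, sample_col=9):
    """Return 'chrom:pos_ref-alt' strings for variants with start <= pos <= end
    where the sample in column sample_col has genotype 0/1 or 1/1."""
    res = []
    for line in lines:
        if line.startswith('#'):              # skip meta-information and header lines
            continue
        cols = line.strip().split('\t')
        pos = int(cols[1])
        if cols[0] == chrom and start <= pos <= end:
            geno = cols[sample_col].split(':')[0]   # GT is the first FORMAT field
            if geno in ('0/1', '1/1'):
                res.append(cols[0] + ':' + cols[1] + '_' + cols[3] + '-' + cols[4])
    return res

# Made-up example lines with a single sample column
example = [
    '##fileformat=VCFv4.2\n',
    '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tsample1\n',
    '5\t999999\t.\tG\tA\t50\tPASS\t.\tGT\t0/1\n',
    '5\t1000080\t.\tA\tT\t2557.1\tPASS\t.\tGT:AD\t0/1:15,18\n',
    '5\t1000100\t.\tC\tG\t40\tPASS\t.\tGT\t0/0\n',
]
print(find_variants(example, '5', 1000000, 1005000))   # ['5:1000080_A-T']
```

Because an open file handle is also an iterable of lines, the same function can be called as `find_variants(fh, '5', 1000000, 1005000)` inside a `with open(...) as fh:` block.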
→ Exercises Day 2
3 options:
The level of complexity increases with each exercise
New to programming: Do the Green exercise and possibly the Yellow exercise + ChatGPT
More experienced: Do the Yellow exercise and/or the Red exercise + ChatGPT
What is the difference between a function and a method?
A method always belongs to an object of a specific class; a function does not have to. For example:
print('a string') and print(42) both work, even though one argument is a string and the other is an integer.
'a string '.strip() works, but [1,2,3,4].strip() does not: strip() is a method that only works on strings.
Why does it matter to me?
For now, you mostly need to be aware of the difference and know the two syntaxes:
A function: functionName()
A method: <object>.methodName()
len([1,2,3])
len('a string')
'a string '.strip()
[1,2,3].strip()
AttributeError: 'list' object has no attribute 'strip'
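A built-in pair that shows the difference in practice is the function `sorted()` and the list method `.sort()`: the function takes the list as an argument and returns a new list, while the method works on the list object it is called on and changes it in place:

```python
numbers = [3, 1, 2]

new_list = sorted(numbers)   # function: returns a new, sorted list
print(new_list)              # [1, 2, 3]
print(numbers)               # [3, 1, 2]: the original is unchanged

result = numbers.sort()      # method: sorts the list itself
print(numbers)               # [1, 2, 3]
print(result)                # None: .sort() does not return the list
```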
abs(-5)
5
sum([1,2,35,23,88,4])
153
sum([1,2,3,4],10)
20
b = round(3.234556, 2)
a = 'my string'
print(b)
3.23
' spaciou sWith5678.com'.strip('mo.c')
' spaciou sWith5678'
' spaciou sWith5678.com\n'.lstrip()
'spaciou sWith5678.com\n'
' spaciou sWith5678.com\n'.rstrip()
' spaciou sWith5678.com'
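Note that `strip('mo.c')` does not remove the substring 'mo.c': it removes any of the characters m, o, . and c from both ends until it meets a different character, which can remove more than intended. To remove an exact ending, `str.removesuffix()` is available in Python 3.9 and later. A short sketch:

```python
word = 'moon.com'
print(word.strip('mo.c'))         # n: every m, o, . and c at both ends is removed
print(word.removesuffix('.com'))  # moon (Python 3.9+)
print(' spaciou sWith5678.com'.removesuffix('.com'))  # the value ' spaciou sWith5678'
```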
a = ' split a string into a list '
a.split(maxsplit=3)
['split', 'a', 'string', 'into a list ']
'|'.join('a string already')
' '.join(['a', 'b', 'c', 'd'])
#' '.join([1,2,3])
'a b c d'
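`join()` is called on the separator string and accepts an iterable of strings only, which is why the commented-out `' '.join([1,2,3])` raises a TypeError. Passing a plain string, as in the first line of the cell, joins its individual characters. A sketch of both points, converting the numbers with `str()` first:

```python
numbers = [1, 2, 3]
print(' '.join(str(n) for n in numbers))     # 1 2 3
print('|'.join('abc'))                       # a|b|c: a string is joined character by character
print(','.join(['a', 'b', 'c']).split(','))  # ['a', 'b', 'c']: split() undoes join()
```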
'long string'.startswith('ng', 2)
#'long string'.endswith('nt')
True
'LongRandomString'.lower()
'LongRandomString'.upper()
'LONGRANDOMSTRING'
a = [1,2,3,4,5,5,5,5]
a.append(6)
a.pop(2)
a.reverse()
a.remove(5)
b = (1,2,3,4)
c = [1,2,3,4]
c.append(5)
c
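The list methods above change the list in place and, apart from `pop()`, return `None`, so their effect only shows when you look at the list afterwards. A sketch that records each step; the tuple `b` has no such methods because tuples are immutable:

```python
a = [1, 2, 3, 4, 5, 5, 5, 5]
a.append(6)          # [1, 2, 3, 4, 5, 5, 5, 5, 6]
removed = a.pop(2)   # removes the element at index 2 (the value 3) and returns it
a.reverse()          # [6, 5, 5, 5, 5, 4, 2, 1]
a.remove(5)          # removes only the first 5: [6, 5, 5, 5, 4, 2, 1]
print(a, removed)

c = [1, 2, 3, 4]
c.append(5)
print(c)             # [1, 2, 3, 4, 5]

b = (1, 2, 3, 4)
print(hasattr(b, 'append'))   # False: tuples cannot be extended in place
```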
→ Exercises Day 2
3 options:
The level of complexity increases with each exercise
New to programming: Do the Green exercise and possibly the Yellow exercise + ChatGPT
More experienced: Do the Yellow exercise and/or the Red exercise + ChatGPT
Download the 250.imdb file from the course website.
The format of this file is:
# Votes | Rating | Year | Runtime | URL | Genres | Title
fh = open('../downloads/250.imdb', 'r', encoding = 'utf-8')
best = [0,''] # here we save the rating and which movie
for line in fh:
    if not line.startswith('#'):
        cols = line.strip().split('|')
        rating = float(cols[1].strip())
        if rating > best[0]: # if the rating is higher than previous highest, update best
            best = [rating,cols[6]]
fh.close()
print(best)
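The same logic can be written as a function that accepts any iterable of lines, which makes it easy to test without the file. A sketch assuming the `|`-separated layout described above; `best_movie` is a hypothetical name, and the three data lines are invented placeholders, not entries from 250.imdb. It also strips the title, which in the code above may keep surrounding spaces:

```python
def best_movie(lines):
    """Return (rating, title) for the highest-rated movie.

    Assumes the columns: Votes | Rating | Year | Runtime | URL | Genres | Title
    and that comment lines start with '#'.
    """
    best = (0.0, '')
    for line in lines:
        if line.startswith('#'):
            continue
        cols = line.strip().split('|')
        rating = float(cols[1].strip())
        if rating > best[0]:   # strictly higher, so the first movie of a tie is kept
            best = (rating, cols[6].strip())
    return best

# Invented example lines in the assumed format
example = [
    '# Votes | Rating | Year | Runtime | URL | Genres | Title\n',
    '1000 | 8.1 | 1994 | 120 | url1 | Drama | Movie A\n',
    '2000 | 9.0 | 1972 | 175 | url2 | Crime | Movie B\n',
    '1500 | 9.0 | 2008 | 152 | url3 | Action | Movie C\n',
]
print(best_movie(example))   # (9.0, 'Movie B')
```

With the real file, the call would be `best_movie(fh)` inside `with open('../downloads/250.imdb', 'r', encoding='utf-8') as fh:`.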