## Aims of this homework

• learn the basics of dealing with text data in R
• compute some text metrics
• build a linguistic representation of your own texts

In this homework, you will use your own text data for some initial linguistic analysis.

Select at least three assignments that you have handed in as coursework during your BSc so far. To import them into R, either copy and paste the raw text of each assignment into a string variable, or save the raw text as .txt files and read these files in.
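
Both import routes can be sketched in base R as follows. This is only an illustration on a made-up sentence: the folder name `my_assignments`, the object names, and the helper `read_txt` are assumptions, and the demo file is written to a temporary directory purely so that the sketch runs on its own.

```r
# Option 1: paste the raw text directly into a character variable.
essay_a <- "This is the first sentence of a made-up essay. Here is a second one."

# Option 2: read every .txt file in a folder into a named character vector.
# Hypothetical helper: collapse all lines of one file into a single string.
read_txt <- function(path) {
  paste(readLines(path, warn = FALSE, encoding = "UTF-8"), collapse = "\n")
}

# Hypothetical folder; a demo file is created here so the example is runnable.
txt_dir <- file.path(tempdir(), "my_assignments")
dir.create(txt_dir, showWarnings = FALSE)
writeLines(essay_a, file.path(txt_dir, "essay_a.txt"))

files <- list.files(txt_dir, pattern = "\\.txt$", full.names = TRUE)
my_texts <- vapply(files, read_txt, character(1))

# Use the file names (without extension) as document names.
names(my_texts) <- tools::file_path_sans_ext(basename(files))
```

A named character vector is convenient here because the names can later serve as document names.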

Create a corpus of these texts using the quanteda package.
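
As a minimal sketch of this step, the snippet below builds a corpus from a named character vector with `corpus()`. The three toy "essays" and the object names are invented for illustration; your own texts would take their place.

```r
library(quanteda)

# Made-up toy texts standing in for your own assignments.
toy_texts <- c(
  essay_a = "Cats chase mice. Mice fear cats!",
  essay_b = "Dogs chase cats. Cats climb trees.",
  essay_c = "Birds sing in trees."
)

# The names of the vector become the document names of the corpus.
toy_corpus <- corpus(toy_texts)

ndoc(toy_corpus)      # number of documents
docnames(toy_corpus)  # document names
summary(toy_corpus)   # per-document overview
```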

#your code here

Calculate the average number of characters per word and the average number of words per sentence.
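
One possible way to obtain both averages is sketched below on a made-up toy corpus. It assumes that punctuation should not count as a word, which is a design choice rather than a requirement of the task; the object names are illustrative.

```r
library(quanteda)

# Made-up toy corpus standing in for your own texts.
toy_corpus <- corpus(c(
  essay_a = "Cats chase mice. Mice fear cats!",
  essay_b = "Dogs chase cats. Cats climb trees.",
  essay_c = "Birds sing in trees."
))

# Word tokens, with punctuation dropped (assumption: punctuation is not a word).
toks <- tokens(toy_corpus, remove_punct = TRUE)

n_words <- ntoken(toks)  # words per document
n_chars <- vapply(as.list(toks), function(x) sum(nchar(x)), numeric(1))

# Sentence tokens: each "token" is one sentence, so ntoken() counts sentences.
n_sents <- ntoken(tokens(toy_corpus, what = "sentence"))

chars_per_word     <- n_chars / n_words
words_per_sentence <- n_words / n_sents

chars_per_word
words_per_sentence
```

Keeping punctuation in the token count would lower the characters-per-word average and raise the words-per-sentence average, so it is worth stating which convention you used.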

#your code here

What is the type-token ratio (TTR) of each of your texts? Compute it for every text individually.
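
The TTR is the number of distinct word types in a text divided by its total number of tokens. A minimal sketch on a made-up toy corpus is given below; lowercasing and removing punctuation before counting are assumptions made here, not part of the definition.

```r
library(quanteda)

# Made-up toy corpus standing in for your own texts.
toy_corpus <- corpus(c(
  essay_a = "Cats chase mice. Mice fear cats!",
  essay_b = "Dogs chase cats. Cats climb trees.",
  essay_c = "Birds sing in trees."
))

# Assumption: punctuation is dropped and case is ignored when counting types.
toks <- tokens_tolower(tokens(toy_corpus, remove_punct = TRUE))

# Types (distinct tokens) divided by tokens, per document.
ttr <- ntype(toks) / ntoken(toks)
ttr
```

Because the TTR tends to fall as texts get longer, comparing texts of very different lengths should be done with care.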

#your code here

Build a term frequency count representation, and retrieve the top features (hint: topfeatures) for each text.
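
A term frequency representation in quanteda is a document-feature matrix (dfm). The sketch below, on a made-up toy corpus, builds one and calls `topfeatures()` on each document separately by subsetting the dfm row by row. Removing punctuation and the object names are choices made for this illustration only.

```r
library(quanteda)

# Made-up toy corpus standing in for your own texts.
toy_corpus <- corpus(c(
  essay_a = "Cats chase mice. Mice fear cats!",
  essay_b = "Dogs chase cats. Cats climb trees.",
  essay_c = "Birds sing in trees."
))

# Document-feature matrix of raw counts (dfm() lowercases by default).
toy_dfm <- dfm(tokens(toy_corpus, remove_punct = TRUE))

# Top features for each document: subset one row at a time.
top_by_doc <- lapply(seq_len(ndoc(toy_dfm)), function(i) {
  topfeatures(toy_dfm[i, ], n = 2)
})
names(top_by_doc) <- docnames(toy_dfm)

top_by_doc
```

Calling `topfeatures()` on the whole dfm instead would return the most frequent features across all documents combined.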

#your code here

Now build a TF-IDF weighted representation of your corpus. Perform this transformation in five different ways: (1) based on the raw texts, (2) removing stopwords, (3) removing punctuation, (4) stemming the words, and (5) combining (2)-(4).
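
The five variants differ only in how the tokens are prepared before `dfm_tfidf()` is applied. The following is a sketch on a made-up toy corpus; it assumes English stopwords and the default weighting scheme of `dfm_tfidf()`, and the object names are illustrative.

```r
library(quanteda)

# Made-up toy corpus standing in for your own texts.
toy_corpus <- corpus(c(
  essay_a = "Cats chase mice. Mice fear cats!",
  essay_b = "Dogs chase cats. Cats climb trees.",
  essay_c = "Birds sing in trees."
))

toks_raw <- tokens(toy_corpus)

variants <- list(
  raw          = toks_raw,                                   # (1) raw texts
  no_stopwords = tokens_remove(toks_raw, stopwords("en")),   # (2) stopwords removed
  no_punct     = tokens(toy_corpus, remove_punct = TRUE),    # (3) punctuation removed
  stemmed      = tokens_wordstem(toks_raw),                  # (4) stemmed
  all          = tokens_wordstem(                            # (5) steps (2) to (4) combined
    tokens_remove(tokens(toy_corpus, remove_punct = TRUE), stopwords("en"))
  )
)

# Build a dfm for each token set and apply TF-IDF weighting.
tfidf <- lapply(variants, function(tk) dfm_tfidf(dfm(tk)))

# Compare how many features each variant keeps.
sapply(tfidf, nfeat)
```

With the default inverse document frequency, a feature that occurs in every document receives a weight of zero, which is one reason the variants can rank features differently.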

#your code here
