\(\mathbf{A}_i\) and \(\mathbf{B}_i\) are the i-th pair of vectors from the two sets.
\(\mathbf{A}_i \cdot \mathbf{B}_i\) is the dot product of the vectors.
\(||\mathbf{A}_i||\) and \(||\mathbf{B}_i||\) are the magnitudes of the vectors.
\(N\) is the total number of vector pairs.
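As a minimal sketch of the definitions above, the mean of the pairwise cosine similarities can be computed by hand in base R. The toy vectors and the helper name `cosine_sim` are assumptions made for illustration and are not part of the original analysis.

```r
# Hypothetical helper: cosine similarity between two numeric vectors.
cosine_sim <- function(a, b) {
  sum(a * b) / (sqrt(sum(a^2)) * sqrt(sum(b^2)))
}

# Toy vector pairs (made-up values): row i of A is paired with row i of B.
A <- rbind(c(1, 0), c(1, 1), c(0, 2))
B <- rbind(c(1, 0), c(1, 0), c(3, 0))

# Cosine similarity for each of the N pairs, then the mean over all pairs.
N <- nrow(A)
sims <- vapply(seq_len(N), function(i) cosine_sim(A[i, ], B[i, ]), numeric(1))
mean_cos <- mean(sims)
```

Here the first pair is identical (similarity 1), the third is orthogonal (similarity 0), and the mean summarises all three pairs in a single number.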
Seed words
Select four dimensions based on secondary scholarship + close reading:
Exoticism
Irrationality
Despotism
Eroticism
Expand each seed list using gensim with Google News word vectors + manual classification.
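The expansion step was run with gensim (Python). As a rough stand-in in base R, the sketch below ranks the nearest neighbours of a seed word by cosine similarity. The embedding values, the word list, and the helper `nearest_words` are all hypothetical; real candidates would come from the Google News vectors and would still need manual classification.

```r
# Toy embedding matrix (made-up values); rows are words, columns are dimensions.
emb <- rbind(
  exotic  = c(0.9, 0.1, 0.0),
  strange = c(0.8, 0.2, 0.1),
  foreign = c(0.7, 0.3, 0.0),
  table   = c(0.0, 0.1, 0.9)
)

# Hypothetical helper: rank all other words by cosine similarity to a seed word.
nearest_words <- function(emb, seed, n = 2) {
  unit <- emb / sqrt(rowSums(emb^2))        # L2-normalise each row
  sims <- as.vector(unit %*% unit[seed, ])  # cosine similarity to the seed
  names(sims) <- rownames(emb)
  sims <- sims[names(sims) != seed]         # drop the seed itself
  head(sort(sims, decreasing = TRUE), n)
}

# Candidate words to review by hand before adding them to a dimension.
candidates <- nearest_words(emb, "exotic", n = 2)
```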
For each set of words:
Calculate cosine similarity between the word “Arab” and each individual word in each dimension.
S1 <- exotic_strange_words
A1 <- c("arab")
P_all <- data.frame(matrix(ncol = 0, nrow = length(S1)))
for (i in seq_along(wordvecs.dat)) {
  embedding <- as.matrix(wordvecs.dat[[i]])
  x <- mac(embedding, S1, A1) # this function calculates MAC
  P <- as.data.frame(x$P)
  colnames(P) <- decade_names[[i]]
  P_all <- cbind(P_all, P)
}
For each set of words, average the similarities across words within each decade:
P_all <- P_all %>% rownames_to_column("word")
P_all_long <- P_all %>% gather(year, value, -c("word"))
P_all_long$year <- as.numeric(P_all_long$year)
# this section is essentially doing what `mac_es()` would do if called in a loop
P_all_mean <- P_all_long %>%
  group_by(year) %>%
  summarise(mean_sim = mean(value, na.rm = TRUE), # note: averaging across non-NA values
            word = "average")
Results
Interpretation
“I have already lost, Kingdom after Kingdom, province after province, the more beautiful half of the universe, and soon I will know of no place in which I can find a refuge for my dreams; but it is Egypt that I most regret having driven out of my imagination, now that I have sadly placed it in my memory.” (Nerval, cited in Said, 100)
“Unable to recognize ‘its’ Orient in the new Third World, Orientalism now faced a challenging and politically armed Orient.” (Said 1978, 104)