class: center, middle, inverse, title-slide .title[ # News and Market Sentiment Analytics ] .subtitle[ ## Lecture 7: RAGs ] .author[ ### Christian Vedel,
Department of Economics
Email:
christian-vs@sam.sdu.dk
] .date[ ### Updated 2025-12-12 ] --- <style type="text/css"> .pull-left { float: left; width: 44%; } .pull-right { float: right; width: 44%; } .pull-right ~ p { clear: both; } .pull-left-wide { float: left; width: 66%; } .pull-right-wide { float: right; width: 66%; } .pull-right-wide ~ p { clear: both; } .pull-left-narrow { float: left; width: 30%; } .pull-right-narrow { float: right; width: 30%; } .small123 { font-size: 0.80em; } .large123 { font-size: 2em; } .red { color: red } </style> # Last time .pull-left[ **Two tedious common tasks:** - Record Linking - Topic Modeling ] .pull-right-narrow[  ] --- class: middle # Today's lecture .pull-left[ - RAGs: Back to the original purpose - Course evaluation ] .pull-right-narrow[  ] --- # Motivaiton for the course .pull-left[ - *It's all about information* - Text is data. Data is text. - Applications: Business, Finance, Research, Software Development, Psychology, Linguistics, Business Intelligence, Policy Evaluation - In a world of LLMs you want to understand what is happening behind the fancy interface - The techniques live inside crazy sci-fi technologies, that you can learn. E.g. [literal mind reading](https://www.nature.com/articles/s41593-023-01304-9) ] .pull-right[  *Counter-example to what we will be doing* *[[xkcd.com/1838]](xkcd.com/1838)* ] --- # Efficient Market Hypothesis (EMH) .pull-left[ *If you can buy cheaply something which is more valuable than its price, you can profit (arbitrage). But ...* 1. We all have access to the same information 2. Everyone is interested in maximum profits 3. Everyone buys the undervalued assets 4. Prices increases until the advantages is gone ] .pull-right[  ### Implications: - If any 'whole' in the market occurs it is closed within milliseconds - We cannot invest better than anyone else - Investing in what a chicken poops on is better than listening to an 'expert' ] --- # Information Extraction in Financial Markets .pull-left[ - The NLP promise is to be able to extract information - News about firms changes the expected returns on assets - [NVIDIA News](https://youtu.be/U0dHDr0WFmQ?si=EKAQgBiTekQhxl0u) - If the expected returns on an asset increases or decreases there is an opportunity for arbitrage if prices have not changed - NLP tools potentially allow us to be ahead - EMH dissallows open source implementations: You need to improve on the methods you get here - **Timing is key** ] .pull-right[  ] --- class: middle # RAGs - The modern approach to sentiment analysis - Potent context awareness - 'simple' networks of LLMs --- class: middle # RAG basics - **Query**: Question you want answered. E.g. "Is the market bullish or bearish?" - **Augmentation**: We want to augment this query with relevant information - **Retrieval**: We retrieve relevant documents for the query - **Generation**: We use this information to generate an answer. E.g. "bull" - **R A G** --- class: middle # RAG applications - Sentiment analysis - Systematic document review - Legal discovery / case review - LLM magic --- class: middle # Retrieval - **Library:** Need to define and source a library. E.g. all press releases, etc. - For any given query, we need to find a relevant match document in the library - Approaches: Lessons from [Record Linking](https://raw.githack.com/christianvedels/News_and_Market_Sentiment_Analytics/refs/heads/main/Lecture%206%20-%20Record%20linking/Slides.html) + Searching embedding space --- class: middle # Retrieval - **Query**: "How is NVIDIA doing?" - **Library**: Press releases. Among which "NVIDIA releases devastating news" - **Embedding approach**: + Embed the entire library + Embed the query + Find minimum cosine distance - **Other approaches**: + Run entity recognition from e.g. spaCy - **Trick**: Embed everything in the library beforehand and store it in a vector library --- class: middle # Augementation - You simply prompt the LLM with the added documents as context. - You can consider prompt tuning as in [arxiv.org/abs/2310.04027](https://arxiv.org/abs/2310.04027) --- class: middle .pull-left-wide[ # Generation - It is hard to get LLMs to behave: + *"Please for the love of everything you hold dear, just say 'yes' or 'no'. Otherwise I will forever hold you in contempt and fire you in shame. If you write ONLY 'yes' or 'no', you are my favorite employee and I will give you a huge bonus"* - (*Or restrict output space - available in some models.*) ] --- class: middle # A very simple RAG ### ['SDU News RAG'](https://github.com/christianvedels/News_and_Market_Sentiment_Analytics/tree/main/Lecture%207%20-%20RAGs/Code) #### Basic structure: - Query: Questions about the market - Library: News headlines - Retrieval: Base on BERT embeddings - Generation: + Expert (analysis) `\(\rightarrow\)` Censor `\(\rightarrow\)` Expert (conclusion) `\(\rightarrow\)` Zero-shot mapping to 'bear', 'bull', 'neural' .footnote[ Confused about classes? Check [Code/Simple_class_example.py](https://github.com/christianvedels/News_and_Market_Sentiment_Analytics/blob/main/Lecture%207%20-%20RAGs/Code/simple_class_example.py) ] --- class: middle # Course evaluation .pull-left[ - **Keep, Add, Drop (KAD)** - [Link to online form](https://forms.gle/RRnTbEsdBzKYrtrWA) ] .pull-right[  *A 'KAD'* ] --- class: middle # Exam workshop .pull-left[ - Friday December 19th 10:15-12:00 + **Also available:** Tuesday December 16th 10:30-12:00 - Zoom link will be available on itslearning ### Format of the workshop - 5-10 min. presentation on what you are working on - Room for asking questions from your peers - **This will not impact your grade in any way** - My feedback will be limited but I will try to facilitate useful peer feedback. ] .pull-right[ ### What you have to do 1. Have a look at the exam 2. Decide on a project you want to work on 3. Send me an email before Friday [christian-vs@sam.sdu.dk] + "Hi Christian, I will be working on [x]. I would like to give a 5-10 min. presentation on what I have been working on, on Monday". - If noone is interested, we will of course just cancel. + No shame if it is not useful to you. Then I believe, that you know how to spend your exam time most wisely. ]