News
Summary of the deadlines for this month
Written on 06.12.24 by Annika Hass
Dear students,
Thanks for today. We'll get in touch with you soon to arrange a meeting.
As discussed today, here again are the important deadlines before Christmas:
One-pager by Wednesday, December 18th, sent to all of us.
Project pitch on December 20th: around 10 min each, to leave time for the discussion.
Have a nice weekend,
Annika

Some links re things we discussed today
Written on 06.12.24 by Ingmar Weber
To celebrate St. Nicholas: Here is a bit about German traditions related to Dec 6:
And here is a particularly “interesting” St. Nicholas legend from the French city of Nancy: https://www.nancy-tourisme.fr/en/discover-nancy/the-saint-nicholas-in-nancy/the-legend/ https://frenchmoments.eu/la-legende-de-saint-nicolas/
The chatbot Tay was mentioned. Here is a bit of background on that: https://en.wikipedia.org/wiki/Tay_(chatbot) https://www.bbc.com/news/technology-35902104
And here is the “LLM alone better than doctors” study: https://jamanetwork.com/journals/jamanetworkopen/fullarticle/2825395 https://qz.com/chatgpt-beat-doctors-at-diagnosing-diseases-1851701953
Have a nice weekend, Ingmar

Project Idea Submissions
Written on 03.12.24 (last change on 03.12.24) by Till Koebe
Dear all, a kind reminder to submit your project ideas as soon as possible, ideally by Wednesday, Dec 4th, and at the latest by Friday, Dec 6th. Please send an email with your team members' names, the project idea and a brief project description (incl. research question and the kind of data you intend to use for that) to till.koebe@uni-saarland.de.
While we encourage you to develop your own project idea, you can also find a list of projects that we propose in the materials section of the CMS and choose one of them to work on. If you opt for a pre-defined project, please share your top three projects with us, so we can again allocate project ideas based on your stated preferences.
Please get in touch with us if you have any questions.

Some dataset pointers and project comments
Written on 29.11.24 by Ingmar Weber
There are some high-level pointers re datasets here: https://docs.google.com/presentation/d/1CXfA9eWV_GtAcVDwNPcu2hhrQ1FwzWfno_DHGNzMz-s/edit
As mentioned, searching at https://datasetsearch.research.google.com/ for existing datasets is a good idea. If you want to build your own, then options include:
(i) Use websites with usable APIs. E.g. Wikimedia has APIs that tell you how often a Wikipedia page is viewed or edited, and so on.
(ii) Use existing tools to download data. E.g. Arctic Shift (https://arctic-shift.photon-reddit.com/download-tool) has a great tool for downloading Reddit data, and pyTrends (https://pypi.org/project/pytrends/) is a great tool for downloading Google Trends data.
(iii) Scrape your own data. Before doing this, make sure to search Google/GitHub for existing scraping code for your target website. If none exists, write your own.
(iv) Use data from one of the projects we're proposing. E.g. we'll propose projects involving review data from Google Maps (which we have collected) or satellite imagery (which we have also collected).
(v) Collect your own. This can be done through "data donations" (Google this) and services such as https://takeout.google.com/. You can also run experiments and measure things. E.g. you could try automating certain things using LLMs (and tools such as https://github.com/gregpr07/browser-use) and then do a study to measure how many typical tasks of a student can already be automated.
We'll share project ideas early this coming week.
If you want to propose your own project, we strongly encourage you to email us beforehand so that we can give some feedback re feasibility, ethical concerns, and so on. If we find your proposed project inappropriate for the seminar, then we might request that you pick one of our projects.
Have a nice weekend! Ingmar

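As an illustration of option (i) in the dataset pointers above, here is a minimal Python sketch for reading daily page views of a Wikipedia article. It assumes the per-article route of the Wikimedia REST pageviews API and its JSON layout (an `items` list with `timestamp` and `views` fields); please check the current API documentation before relying on it. The function names and the User-Agent string are hypothetical and only serve the example.

```python
import json
import urllib.parse
import urllib.request

# Assumed base route of the Wikimedia REST pageviews API (per-article metrics).
API_ROOT = "https://wikimedia.org/api/rest_v1/metrics/pageviews/per-article"


def build_pageviews_url(article, start, end, project="en.wikipedia",
                        access="all-access", agent="user", granularity="daily"):
    """Build the request URL; start and end are dates written as YYYYMMDD."""
    # Wikipedia titles use underscores instead of spaces; quote everything else.
    title = urllib.parse.quote(article.replace(" ", "_"), safe="")
    return f"{API_ROOT}/{project}/{access}/{agent}/{title}/{granularity}/{start}/{end}"


def parse_pageviews(payload):
    """Turn the decoded JSON response into a {YYYYMMDD: views} dictionary."""
    # Timestamps are assumed to look like YYYYMMDDHH; keep only the date part.
    return {item["timestamp"][:8]: item["views"] for item in payload["items"]}


def fetch_pageviews(article, start, end, user_agent):
    """Download and parse the page views (needs network access)."""
    request = urllib.request.Request(
        build_pageviews_url(article, start, end),
        # A descriptive User-Agent with contact details is good practice.
        headers={"User-Agent": user_agent},
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return parse_pageviews(json.load(response))
```

A call could look like `fetch_pageviews("Saint Nicholas", "20241201", "20241206", "data-and-society-project (your.email@example.org)")`, where the contact address is a placeholder. Keeping URL construction and parsing separate from the download makes both easy to test without network access.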
Data and Society, Presentations Starting this Friday
Written on 20.11.24 by Annika Hass
Dear all,
We would like to remind you of the criteria for the paper presentations starting this Friday. You can find them in the Materials section under Grading.
On Friday, we will listen to the following paper presentations:
Parking occupancy estimation on planetscope satellite images, Chaitanya (https://ieeexplore.ieee.org/abstract/document/9323104)
Ideational diffusion and the great witch hunt in Central Europe, Prakhar Narian (https://link.springer.com/article/10.1007/s11186-024-09576-1)
Persistent Pre-Training Poisoning of LLMs, Prakhar (https://arxiv.org/abs/2410.13722)
Please ensure you are well-prepared for the discussion and have reviewed the key figures in advance.
If desired, we could record the presentations and provide more detailed feedback on presentation style.
Looking forward to hearing the talks and discussing the papers with you on Friday!
Best regards,
Your Data and Society Team

Written on 20.11.24 by Annika Hass
Dear all, Elisa has sent me the slides of her talk, which you can find under Materials. They are for your private use. Please do not share them. Annika

Links I showed on Friday re "most published research findings are false"
Written on 18.11.24 by Ingmar Weber
PimEyes: scary facial recognition service, https://pimeyes.com/en
"Why Most Published Research Findings Are False", the paper that kicked off the "replication crisis" (https://en.wikipedia.org/wiki/Replication_crisis), https://journals.plos.org/plosmedicine/article?id=10.1371/journal.pmed.0020124
"Chocolate promotes weight loss", the problem with trying out lots of things and only reporting the one that works, https://gizmodo.com/i-fooled-millions-into-thinking-chocolate-helps-weight-1707251800, https://www.cbsnews.com/news/how-the-chocolate-diet-hoax-fooled-millions/
Bonferroni correction, one of the ways to deal with this "multiple hypothesis" setting: https://en.wikipedia.org/wiki/Bonferroni_correction
Same data, different analysts, different conclusions: two studies show that the _same_ data and the _same_ research question can lead to different results: https://journals.sagepub.com/doi/full/10.1177/2515245917747646, https://www.sciencedirect.com/science/article/pii/S0749597821000200
Reproducibility crisis in machine learning, partly caused by "leakage", where some information from the training data leaks into the test data: https://reproducible.cs.princeton.edu/
One example of a meta-analysis of whether [insert some food] is good or bad for you. Coffee in this case: https://www.bmj.com/content/359/bmj.j5024

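To make the "multiple hypothesis" problem and the Bonferroni correction from the links above concrete, here is a small Python sketch. It assumes independent tests when computing the chance of at least one false positive, and the function names and p-values are made up for illustration.

```python
def chance_of_false_positive(num_tests, alpha=0.05):
    """Probability of at least one false positive among independent tests
    when every null hypothesis is true and each test uses level alpha."""
    return 1 - (1 - alpha) ** num_tests


def bonferroni_reject(p_values, alpha=0.05):
    """Bonferroni correction: reject a hypothesis only if its p-value is
    at most alpha divided by the number of tests performed."""
    num_tests = len(p_values)
    if num_tests == 0:
        return []
    threshold = alpha / num_tests
    return [p <= threshold for p in p_values]


# Illustrative p-values from four tests run on the same data set.
p_values = [0.001, 0.02, 0.04, 0.30]
print(bonferroni_reject(p_values))  # threshold is 0.05 / 4 = 0.0125
print(round(chance_of_false_positive(20), 4))
```

Without a correction, three of the four illustrative p-values fall below 0.05, but only the first one survives the Bonferroni threshold. The second function shows why this matters: with 20 independent tests at the 0.05 level, the chance of at least one spurious "finding" is roughly 64%.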
Paper assignments
Written on 15.11.24 (last change on 25.11.24) by Till Koebe
Dear all, please find below the assignment of papers for the upcoming four sessions. A few points to remember:
1. 3 papers per session: 15 min presentation, 15 min discussion each. Stick to the time limit, we will cut you off.
2. Please make sure you truly understand the key figure of each of the papers presented beforehand, whatever it takes.
3. When doing your presentation, put yourself in the role of the listener and tune your presentation in a way that maximises the value added for them.
A final note: One paper has not been assigned yet. The presentation date for that will be Nov 29. I will let you know in the upcoming session which paper has been assigned.
And finally, thanks to many of you for sticking around for the lecture series afterwards today. I think for a speaker it is always a good feeling to see every seat being taken.
Best, Till

No seminar on Fri, Oct 18 - We start on Fri, Oct 25
Written on 15.10.24 by Ingmar Weber
The first seminar will be on Friday, October 25, 10am (c.t.) - noon in building E1.7, 3rd floor, room 3.23. See you then! Your Data and Society Team.

Data and Society
From finding a mate to booking a holiday, our lives are increasingly mediated by online platforms. Digital traces left by these interactions provide opportunities to study societal phenomena while creating challenges around the responsible use of data. In this seminar, students will learn how computational methods and machine learning can be applied to study society through such data.
The first part of the seminar will familiarize students with existing work in computational social science, with each week focused on a topic such as “Digital Democracy” or “Gender Gaps” and methods to quantify it. The second part of the seminar will be about projects in which students are asked to quantify a societal phenomenon of their choice using computational methods. Here, students can either propose their own topics or choose from topics defined by the lecturers.
The overall course performance will be based on (i) overall course participation, (ii) assigned paper presentations, (iii) literature review and “project pitch” (prior to in-depth work) and (iv) the written project report.
Apart from learning about interdisciplinary research and applications of machine learning, students will also learn research skills such as how to read and discuss papers, how to plan a project, how to present their work, how to write a scientific paper, and how to work in teams.
Students can take this course as a seminar.
Requirements: MSc students only. The project-based element of the seminar will require some Python programming and data analysis experience. An interest beyond the foundations of CS and a concern for societal problems are a must.