News
Project Idea Submissions
Written on 03.12.24 (last change on 03.12.24) by Till Koebe
Dear all, a kind reminder to submit your project ideas as soon as possible, ideally by Wednesday, Dec 4th, and by Friday, Dec 6th, at the latest. Please send an email with your team members' names, the project idea and a brief project description (incl. research question and the kind of data you intend to use for it) to till.koebe@uni-saarland.de. While we encourage you to develop your own project idea, the materials section of the CMS contains a list of projects that we propose and that you can choose to work on. If you opt for a pre-defined project, please share your top three projects with us, so we can again allocate project ideas based on your stated preferences. Please get in touch with us if you have any questions.
Some dataset pointers and project comments
Written on 29.11.24 by Ingmar Weber
There are some high-level pointers re datasets here: https://docs.google.com/presentation/d/1CXfA9eWV_GtAcVDwNPcu2hhrQ1FwzWfno_DHGNzMz-s/edit
As mentioned, searching at https://datasetsearch.research.google.com/ for existing datasets is a good idea. If you want to build your own, options include:
(i) Using websites with usable APIs. E.g. Wikimedia has APIs that tell you how often a Wikipedia page is viewed/edited and so on.
(ii) Using existing tools to download data. E.g. Arctic Shift (https://arctic-shift.photon-reddit.com/download-tool) has a great tool for downloading Reddit data. And pytrends (https://pypi.org/project/pytrends/) is a great tool for downloading Google Trends data.
(iii) Scraping your own data. Before doing this, make sure to run a search on Google/GitHub for existing scraping code for your target website. If none exists, write your own.
(iv) Using data from one of the projects we're proposing. E.g. we'll propose projects involving review data from Google Maps (which we have collected) or satellite imagery (which we have also collected).
(v) Collecting your own. This can be done through "data donations" (Google this) and services such as https://takeout.google.com/. You can also run experiments and measure things. E.g. you could try automating certain things using LLMs (and tools such as https://github.com/gregpr07/browser-use) and then do a study to measure how many typical tasks of a student can already be automated.
We'll share project ideas early this coming week.
If you want to propose your own project, we strongly encourage you to email us beforehand so that we can give some feedback re feasibility, ethical concerns, and so on. If we find your proposed project inappropriate for the seminar, we might request that you pick one of our projects. Have a nice weekend!
Ingmar
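As an illustration of option (i), here is a minimal sketch of querying the Wikimedia Pageviews REST API for daily view counts of a single article. The function names, the User-Agent string and the example article are assumptions made for this sketch and are not part of the seminar materials. Please check Wikimedia's current API documentation and usage policy before relying on it.

```python
import json
import urllib.parse
import urllib.request

# Public Wikimedia REST endpoint for per-article pageview counts.
BASE = "https://wikimedia.org/api/rest_v1/metrics/pageviews/per-article"


def pageviews_url(article, start, end, project="en.wikipedia", granularity="daily"):
    """Build the request URL; start and end are dates in YYYYMMDD format."""
    # Wikipedia titles use underscores instead of spaces; escape everything else.
    title = urllib.parse.quote(article.replace(" ", "_"), safe="")
    return f"{BASE}/{project}/all-access/all-agents/{title}/{granularity}/{start}/{end}"


def views_by_day(payload):
    """Turn the JSON response into a {YYYYMMDD: views} dictionary."""
    return {item["timestamp"][:8]: item["views"] for item in payload.get("items", [])}


def fetch_pageviews(article, start, end):
    """Download and parse the view counts (needs network access)."""
    request = urllib.request.Request(
        pageviews_url(article, start, end),
        # Hypothetical identifier; replace it with your own contact details.
        headers={"User-Agent": "data-and-society-example/0.1 (student project)"},
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return views_by_day(json.load(response))


# Example call (not executed here because it needs network access):
# fetch_pageviews("Replication crisis", "20241101", "20241107")
```

The URL building and the response parsing are kept separate from the download so that they can be checked without a network connection.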
Data and Society, Presentations Starting this Friday
Written on 20.11.24 by Annika Hass
Dear all,
We would like to remind you of the criteria for the paper presentations starting this Friday. You can find them in the section Materials under Grading.
On Friday, we will listen to the following paper presentations:
Parking occupancy estimation on planetscope satellite images, Chaitanya (https://ieeexplore.ieee.org/abstract/document/9323104)
Ideational diffusion and the great witch hunt in Central Europe, Prakhar Narian (https://link.springer.com/article/10.1007/s11186-024-09576-1)
Persistent Pre-Training Poisoning of LLMs, Prakhar (https://arxiv.org/abs/2410.13722)
Please ensure you are well-prepared for the discussion and have reviewed the key figures in advance.
If desired, we could record the presentations and provide more detailed feedback on presentation style.
Looking forward to hearing the talks and discussing the papers with you on Friday!
Best regards,
Your Data and Society Team
Written on 20.11.24 by Annika Hass
Dear all, Elisa has sent me the slides of her talk, which you can find under Materials. They are for your private use. Please do not share them.
Annika
Links I showed on Friday re "most published research findings are false"
Written on 18.11.24 by Ingmar Weber
Pimeyes: scary facial recognition service, https://pimeyes.com/en
"Why Most Published Research Findings Are False", the paper that kicked off the "replication crisis" (https://en.wikipedia.org/wiki/Replication_crisis), https://journals.plos.org/plosmedicine/article?id=10.1371/journal.pmed.0020124
"Chocolate promotes weight loss", the problem with trying out lots of things and only reporting the one that works, https://gizmodo.com/i-fooled-millions-into-thinking-chocolate-helps-weight-1707251800, https://www.cbsnews.com/news/how-the-chocolate-diet-hoax-fooled-millions/
Bonferroni correction, one of the ways to deal with this "multiple hypothesis" setting: https://en.wikipedia.org/wiki/Bonferroni_correction
Same data, different analysts, different conclusions: two studies show that the _same_ data and the _same_ research question can lead to different results: https://journals.sagepub.com/doi/full/10.1177/2515245917747646, https://www.sciencedirect.com/science/article/pii/S0749597821000200
Reproducibility crisis in machine learning, partly caused by "leakage", where some information from the training data leaks into the test data: https://reproducible.cs.princeton.edu/
One example of a meta-analysis of whether [insert some food] is good or bad for you. Coffee in this case: https://www.bmj.com/content/359/bmj.j5024
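To make the "multiple hypothesis" problem and the Bonferroni correction from the links above concrete, here is a small sketch. The number of tests (20) and the p-values are made up for illustration, and the false-positive formula assumes the tests are independent.

```python
def familywise_error_rate(alpha, m):
    """Chance of at least one false positive across m independent tests
    when every null hypothesis is true and each test uses level alpha."""
    return 1 - (1 - alpha) ** m


def bonferroni(p_values, alpha=0.05):
    """Bonferroni correction: compare each p-value against alpha / m."""
    m = len(p_values)
    threshold = alpha / m
    return threshold, [p <= threshold for p in p_values]


# Trying 20 unrelated outcomes at alpha = 0.05 gives a chance of roughly 0.64
# of finding at least one "significant" result by luck alone.
print(round(familywise_error_rate(0.05, 20), 2))

# Hypothetical p-values from four tests: only the first survives the correction.
threshold, significant = bonferroni([0.001, 0.02, 0.04, 0.3])
print(threshold, significant)
```

Bonferroni is simple but conservative: with many tests the per-test threshold becomes very strict, so true effects are easier to miss.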
Paper assignments
Written on 15.11.24 (last change on 25.11.24) by Till Koebe
Dear all, please find below the assignment of papers for the upcoming four sessions. A few points to remember:
1. 3 papers per session: 15 min presentation, 15 min discussion each. Stick to the time limit, we will cut you off.
2. Please truly understand the key figure of each of the papers presented beforehand, whatever it takes.
3. When doing your presentation, switch into the role of the listener and tune your presentation in a way that maximises the value added for them.
A final note: One paper has not been assigned yet. The presentation date for that will be Nov 29. I will update you in the upcoming session on which paper has been assigned. And finally, thanks to many of you for sticking around for the lecture series afterwards today. I think for a speaker it is always a good feeling to see every seat being taken.
Best, Till
No seminar on Fri, Oct 18 - We start on Fri, Oct 25
Written on 15.10.24 by Ingmar Weber
The first seminar will be on Friday, October 25, 10am (c.t.) - noon in building E1.7, 3rd floor, room 3.23. See you then! Your Data and Society Team.