News

Project Idea Submissions

Written on 03.12.24 (last change on 03.12.24) by Till Koebe

Dear all,

A kind reminder to submit your project ideas as soon as possible, ideally by Wednesday, Dec 4th, and by Friday, Dec 6th, at the latest. Please send an email with your team members' names, the project idea, and a brief project description (including the research question and the kind of data you intend to use) to till.koebe@uni-saarland.de.

While we encourage you to develop your own project idea, the materials section of the CMS contains a list of projects that we propose and that you can choose to work on. If you opt for a pre-defined project, please share your top three choices with us so that we can again allocate projects based on your stated preferences.

Please get in touch with us in case you have any questions.

Best,
Till

 

Some dataset pointers and project comments

Written on 29.11.24 by Ingmar Weber

There are some high-level pointers re datasets here: https://docs.google.com/presentation/d/1CXfA9eWV_GtAcVDwNPcu2hhrQ1FwzWfno_DHGNzMz-s/edit

As mentioned, searching at https://datasetsearch.research.google.com/ for existing datasets is a good idea. If you want to build your own then options include:

(i) Using websites with usable APIs. E.g. Wikimedia has APIs that tell you how often a Wikipedia page is viewed/edited and so on.

(ii) Using existing tools to download data. E.g. Arctic Shift (https://arctic-shift.photon-reddit.com/download-tool) has a great tool for downloading Reddit data, and pytrends (https://pypi.org/project/pytrends/) is a great tool for downloading Google Trends data.

(iii) Scraping your own data. Before doing this, make sure to run a search on Google/GitHub for existing scraping code for your target website. If none exists, write your own.

(iv) Using data from one of the projects we're proposing. E.g. we'll propose projects involving review data from Google Maps (which we have collected) or satellite imagery (which we have also collected).

(v) Collecting your own data. This can be done through "data donations" (Google this) and services such as https://takeout.google.com/. You can also run experiments and measure things. E.g. you could try automating certain things using LLMs (and tools such as https://github.com/gregpr07/browser-use) and then do a study to measure how many typical tasks of a student can already be automated.
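
As a small illustration of option (i), here is a minimal sketch of how one might build a request URL for the Wikimedia pageviews REST API and sum the daily views in a response. The helper names (`pageviews_url`, `total_views`) and the sample payload are made up for illustration, and the endpoint layout should be checked against the current Wikimedia API documentation before you rely on it.

```python
from urllib.parse import quote

# Base path of the per-article pageviews endpoint (verify against the
# current Wikimedia API documentation before use).
BASE = "https://wikimedia.org/api/rest_v1/metrics/pageviews/per-article"


def pageviews_url(article, start, end, project="en.wikipedia.org"):
    """Build the URL for daily views of one article; dates are YYYYMMDD strings."""
    # Wikipedia titles use underscores instead of spaces.
    title = quote(article.replace(" ", "_"), safe="")
    return f"{BASE}/{project}/all-access/user/{title}/daily/{start}/{end}"


def total_views(payload):
    """Sum the 'views' field over all items of a decoded JSON response."""
    return sum(item["views"] for item in payload.get("items", []))


# Hypothetical response fragment, only to show the expected shape.
sample = {
    "items": [
        {"timestamp": "2024110100", "views": 120},
        {"timestamp": "2024110200", "views": 95},
    ]
}

url = pageviews_url("Replication crisis", "20241101", "20241102")
print(url)
print(total_views(sample))
```

To fetch real data, you would request the URL (for example with `urllib.request`) and decode the JSON body. Please send a descriptive User-Agent header and keep the request rate low.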

We'll share project ideas early this coming week. If you want to propose your own project, we strongly encourage you to email us beforehand so that we can give some feedback re feasibility, ethical concerns, and so on. If we find your proposed project inappropriate for the seminar, then we might request that you pick one of our projects.

Have a nice weekend!

Ingmar 

Data and Society, Presentations Starting this Friday

Written on 20.11.24 by Annika Hass

Dear all,

 

We would like to remind you of the criteria for the paper presentations starting this Friday. You can find them in the section Materials under Grading.

 

On Friday, we will listen to the following paper presentations:

 

Parking occupancy estimation on planetscope satellite images, Chaitanya

(https://ieeexplore.ieee.org/abstract/document/9323104)

 

Ideational diffusion and the great witch hunt in Central Europe, Prakhar Narian

(https://link.springer.com/article/10.1007/s11186-024-09576-1)

 

Persistent Pre-Training Poisoning of LLMs, Prakhar

(https://arxiv.org/abs/2410.13722)

 

Please ensure you are well-prepared for the discussion and have reviewed the key figures in advance.

 

If desired, we could record the presentations and provide more detailed feedback on presentation style.

 

Looking forward to hearing the talks and discussing the papers with you on Friday!

 

Best regards,

 

Your Data and Society Team

Written on 20.11.24 by Annika Hass

Dear all,

Elisa has sent me the slides of her talk, which you can find under Materials.

They are for your private use. Please do not share them.
Best regards,

Annika

Links I showed on Friday re "most published research findings are false"

Written on 18.11.24 by Ingmar Weber

Pimeyes: scary facial recognition service, https://pimeyes.com/en

"Why Most Published Research Findings Are False", the paper that kicked off the "replication crisis" (https://en.wikipedia.org/wiki/Replication_crisis), https://journals.plos.org/plosmedicine/article?id=10.1371/journal.pmed.0020124

"Chocolate promotes weight loss", the problem with trying out lots of things and only reporting the one that works, https://gizmodo.com/i-fooled-millions-into-thinking-chocolate-helps-weight-1707251800, https://www.cbsnews.com/news/how-the-chocolate-diet-hoax-fooled-millions/

Bonferroni Correction, one of the ways to deal with this "multiple hypothesis" setting: https://en.wikipedia.org/wiki/Bonferroni_correction
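
To make the "multiple hypothesis" problem concrete, here is a minimal sketch. It assumes independent tests and uses made-up p-values. It shows how quickly the chance of at least one false positive grows with the number of tests, and how the Bonferroni correction tightens the threshold to alpha / m.

```python
def familywise_error_rate(alpha, m):
    """Chance of at least one false positive across m independent tests at level alpha."""
    return 1 - (1 - alpha) ** m


def bonferroni_reject(p_values, alpha=0.05):
    """Reject a hypothesis only if its p-value is at most alpha / m."""
    m = len(p_values)
    if m == 0:
        return []
    threshold = alpha / m
    return [p <= threshold for p in p_values]


# Made-up p-values for five outcomes measured in the same study.
p_values = [0.003, 0.04, 0.20, 0.011, 0.65]

# With 5 tests at alpha = 0.05, the chance of some false positive is about 23%.
print(round(familywise_error_rate(0.05, 5), 3))

# Only p-values at or below 0.05 / 5 = 0.01 survive the correction.
print(bonferroni_reject(p_values))
```

Note that 0.04 and 0.011 would count as "significant" at the usual 0.05 level but do not survive the correction. Bonferroni is conservative, and other corrections exist.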

Same data, different analysts, different conclusions: two studies show that the _same_ data and the _same_ research question can lead to different results: https://journals.sagepub.com/doi/full/10.1177/2515245917747646, https://www.sciencedirect.com/science/article/pii/S0749597821000200

Reproducibility crisis in machine learning, partly caused by "leakage" where some information from the training data leaks into the test data: https://reproducible.cs.princeton.edu/
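
One common form of leakage is the same entity (a user, a patient, a location) appearing in both the training and the test data. The sketch below uses hypothetical review records and made-up helper names. It splits by group so that each user ends up on one side only, and it checks a naive positional split for overlap. It covers only this one kind of leakage, not the others discussed at the link above.

```python
import random


def group_split(records, group_key, test_fraction=0.25, seed=0):
    """Split records so that every group lands entirely in train or in test."""
    groups = sorted({r[group_key] for r in records})
    rng = random.Random(seed)
    rng.shuffle(groups)
    n_test = max(1, int(len(groups) * test_fraction))
    test_groups = set(groups[:n_test])
    train = [r for r in records if r[group_key] not in test_groups]
    test = [r for r in records if r[group_key] in test_groups]
    return train, test


def shared_groups(train, test, group_key):
    """Return the groups that appear on both sides of a split."""
    return {r[group_key] for r in train} & {r[group_key] for r in test}


# Hypothetical data: two reviews each from four users.
records = [{"user": u, "review_id": i}
           for i, u in enumerate(["a", "a", "b", "b", "c", "c", "d", "d"])]

# A naive positional split puts user "c" on both sides.
naive_train, naive_test = records[:5], records[5:]
print(shared_groups(naive_train, naive_test, "user"))

# The group-aware split keeps each user on one side only.
train, test = group_split(records, "user")
print(shared_groups(train, test, "user"))
```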

One example of a meta-analysis of whether [insert some food] is good or bad for you, coffee in this case: https://www.bmj.com/content/359/bmj.j5024

 

 

 

Paper assignments

Written on 15.11.24 (last change on 25.11.24) by Till Koebe

Dear all, please find below the assignment of papers for the upcoming four sessions. A few points to remember:

1. 3 Papers per session: 15 min presentation, 15 min discussion each. Stick to the time limit, we will cut you off.
2. Please truly understand the key figure of each of the papers presented beforehand, whatever it takes.
3. When preparing your presentation, put yourself in the role of the listener and tune it to maximise the value added for them.

A final note: One paper has not been assigned yet. The presentation date for that will be Nov 29. I will let you know in the upcoming session which paper has been assigned.

And finally, thanks to many of you for sticking around for the lecture series afterwards today. I think for a speaker it is always a good feeling to see every seat being taken.

Best,
Till

 

| # | Paper Title | Student Name | Presentation Date |
|---|---|---|---|
| 1 | High-resolution satellite images reveal the prevalent positive indirect impact of urbanization on urban tree canopy coverage in South America | | |
| 2 | Modelling and evaluation of land use changes through satellite images in a multifunctional catchment: Social, economic and environmental implications | | |
| 3 | The Social Impact of Generative AI: An Analysis on ChatGPT | Abaad | Dec 6 |
| 4 | Social media influence on students' knowledge sharing and learning: An empirical study | Shiraz | Dec 13 |
| 5 | Urban Flood Mapping With Bitemporal Multispectral Imagery Via a Self-Supervised Learning Framework | Rishant | Dec 13 |
| 6 | Parking occupancy estimation on planetscope satellite images | Chaitanya | Nov 22 |
| 7 | The Evolution of the Manosphere Across the Web | Bharat | Nov 29 |
| 8 | From individual to group privacy in big data analytics | Parnian | Dec 6 |
| 9 | Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead | Najia | Dec 6 |
| 10 | Ideational diffusion and the great witch hunt in Central Europe | Prakhar Narian | Nov 22 |
| 11 | Conceptual structure and the growth of scientific knowledge | | |
| 12 | Fact-checker warning labels are effective even for those who distrust fact-checkers | David | Nov 29 |
| 13 | A 27-country test of communicating the scientific consensus on climate change | Khalid | Dec 13 |
| 14 | Persistent Pre-Training Poisoning of LLMs | Prakhar | Nov 29 |

No seminar on Fri, Oct 18 - We start on Fri, Oct 25

Written on 15.10.24 by Ingmar Weber

The first seminar will be on Friday, October 25, 10am (c.t.) - noon in building E1.7, 3rd floor, room 3.23.

See you then!

Your Data and Society Team.
