Do you just copy and paste into an AI tool, or are you doing something smarter?
As promised, for anyone interested in how the "thing" works under the hood, here's a breakdown of the architecture. It's a full data pipeline now.
The Overall Flow
The system works in four main stages:
1. **Scrape & Parse:** Collects the raw data from the forum.
2. **Store & Structure:** Puts the data into a relational database.
3. **Enrich & Embed:** Adds layers of AI-generated meaning to the data.
4. **Retrieve & Generate:** Uses the enriched data to answer questions intelligently.
---
1. The Scraper Engine
This isn't a basic scraper. It's a hybrid system designed to be robust.
---
- **Authentication:** It uses a headless browser (`Playwright`) to handle the full login flow, including the 2FA step. Once logged in, it extracts the session cookies.
- **Scraping:** For the actual high-volume scraping, it uses the authenticated session with the `requests` library for speed. This is much faster than running a full browser for every page.
- **Parsing:** It uses `BeautifulSoup` to parse the HTML. Crucially, it has two different parser modes: one for the detailed structure of thread pages and another for the more compact structure of search result pages. It automatically detects the URL type and uses the correct parser.
- **Data Extraction:** It's not just grabbing post text. It pulls out everything relationally: `post_id`, `user_id`, `thread_id`, post number, timestamps, and it also parses out all the **quotes** and **reactions** for each post.
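As a sketch, the thread-vs-search parser dispatch might look like this. The `/threads/` and `/search/` path patterns and the function name are assumptions (typical XenForo-style routes), not the author's actual code:

```python
from urllib.parse import urlparse

def pick_parser_mode(url: str) -> str:
    """Pick a parser mode from the URL shape.

    Sketch only: '/threads/' and '/search/' are assumed
    path patterns, not the forum's confirmed routes.
    """
    path = urlparse(url).path
    if "/threads/" in path:
        return "thread"   # detailed per-post structure
    if "/search/" in path:
        return "search"   # compact result snippets
    return "unknown"
```

The scraper would call something like this once per fetched URL and hand the HTML to the matching `BeautifulSoup` routine.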
2. The Database (The "Long-Term Memory")
All the scraped data goes into a local **SQLite database**. This is the system's memory. Instead of a pile of messy JSON files, the data is organized into tables that are all linked together:
---
- `users`: Stores user info.
- `threads`: Stores thread titles and IDs.
- `posts`: The core table with all post content.
- `quotes`: A relational table linking which post quotes which other post.
- `reactions`: A table linking users and posts through reactions (e.g., 'Like').
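A minimal sketch of that layout in stdlib `sqlite3` (the column names are guesses from the description above, not the actual DDL):

```python
import sqlite3

# Hypothetical schema matching the five tables described above.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users     (user_id INTEGER PRIMARY KEY, username TEXT);
CREATE TABLE threads   (thread_id INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE posts     (post_id INTEGER PRIMARY KEY,
                        thread_id INTEGER REFERENCES threads(thread_id),
                        user_id INTEGER REFERENCES users(user_id),
                        post_number INTEGER,
                        created_at TEXT,
                        content TEXT);
CREATE TABLE quotes    (quoting_post_id INTEGER REFERENCES posts(post_id),
                        quoted_post_id INTEGER REFERENCES posts(post_id));
CREATE TABLE reactions (post_id INTEGER REFERENCES posts(post_id),
                        user_id INTEGER REFERENCES users(user_id),
                        reaction TEXT);
""")

# The relational layout makes questions like "who quoted post 42?" a simple join:
conn.execute("INSERT INTO users VALUES (1, 'Judge Jules'), (2, 'OtherUser')")
conn.execute("INSERT INTO posts (post_id, user_id) VALUES (42, 1), (43, 2)")
conn.execute("INSERT INTO quotes VALUES (43, 42)")
quoters = conn.execute("""
    SELECT u.username
    FROM quotes q
    JOIN posts p ON p.post_id = q.quoting_post_id
    JOIN users u ON u.user_id = p.user_id
    WHERE q.quoted_post_id = 42
""").fetchall()
```

This is the payoff over "a pile of messy JSON files": quote and reaction relationships become one-line joins.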
3. The Enrichment Pipeline (The "Intelligence" Layer)
This is where the raw data gets its "intelligence." This is a multi-step, asynchronous process using Google's Gemini models.
a) AI Enrichment (`gemini-2.5-flash-lite`):
Every single post in the database is sent to Gemini Flash to generate:
- A concise 1-2 sentence **summary**.
- A list of 2-5 relevant **topic tags** (e.g., "settler expansion," "media bias").
- A **sentiment score** (positive, neutral, negative).
This process took about 10 hours for the ~9k posts, respecting API rate limits.
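A sketch of what the per-post round-trip could look like. The prompt wording and JSON schema here are my assumptions, and the actual Gemini API call is omitted; only the validation step is shown:

```python
import json

# Hypothetical prompt template; the actual wording isn't shown in the post.
ENRICH_PROMPT = (
    "Summarise the forum post below in 1-2 sentences, give 2-5 topic tags, "
    "and a sentiment (positive/neutral/negative). Reply as JSON with keys "
    '"summary", "tags", "sentiment".\n\nPOST:\n{post_text}'
)

def parse_enrichment(reply_text: str) -> dict:
    """Validate the model's JSON reply before writing it back to the database."""
    data = json.loads(reply_text)
    if data["sentiment"] not in {"positive", "neutral", "negative"}:
        raise ValueError("unexpected sentiment value")
    if not 2 <= len(data["tags"]) <= 5:
        raise ValueError("expected 2-5 tags")
    return data

# Example of a well-formed reply being checked and accepted:
sample_reply = ('{"summary": "The poster criticises media coverage of the conflict.", '
                '"tags": ["media bias", "settler expansion"], "sentiment": "negative"}')
enriched = parse_enrichment(sample_reply)
```

Validating before writing matters at this scale: one malformed reply out of ~9k shouldn't silently corrupt a row.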
b) Vector Embeddings (`gemini-embedding-001`):
This is the core of the RAG system. Every post's raw text is converted into a 768-dimensional vector (basically a list of 768 numbers). Think of it as a "semantic fingerprint" or a coordinate on a map of meaning. Posts that discuss similar concepts will have vectors that are mathematically close to each other. This is what allows us to search by *idea*, not just by keyword.
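"Mathematically close" here typically means cosine similarity between the two vectors; a minimal pure-Python version:

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two embedding vectors: near 1.0 means
    the texts point the same way in 'meaning space', near 0 means unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)
```

At query time the question is embedded with the same model and compared against the stored post vectors using exactly this kind of score.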
---
4. The Agent (The "Brain")
This is the `agent.py` script that you can interact with. It uses a method called **Retrieval-Augmented Generation (RAG)**.
Here's how it works when you ask it a question like, "What does Judge Jules think about settler expansion?":
- **Fuzzy Matching:** First, it uses fuzzy string matching to correct typos in the username ("Judge Julze" → "Judge Jules").
- **Hybrid Search:** It then performs a **hybrid search**. It does a fast SQL query to get all posts by "Judge Jules," and *then* it calculates the semantic similarity between your question's vector and the vector of each of that user's posts.
- **Context Retrieval:** It grabs the Top 50 most semantically relevant posts from that user.
- **Prompt Engineering:** It builds a new, complex prompt for a powerful AI model (`gemini-2.5-pro`). The prompt looks something like this:
```
You are an expert forum analyst. Answer the user's question based ONLY on the following context posts from the forum.

--- CONTEXT ---
Post 1 (by Judge Jules): "..."
Post 2 (by Judge Jules): "..."
Post 3 (by Judge Jules): "..."
--- END CONTEXT ---

User's Question: "What does Judge Jules think about settler expansion?"
```
- **Generation:** The AI then generates an answer, but its knowledge is restricted to *only* the context posts we provided. This forces it to be factual to our specific forum and prevents it from making things up.
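The first steps above can be sketched with stdlib tools. `difflib` is a stand-in here, since the post doesn't say which fuzzy matcher or similarity code `agent.py` actually uses:

```python
import difflib
import math

def correct_username(raw, known_users):
    """Fuzzy-correct a typo'd username, e.g. 'Judge Julze' -> 'Judge Jules'.
    difflib is an assumed stand-in for the agent's real matcher."""
    matches = difflib.get_close_matches(raw, known_users, n=1, cutoff=0.6)
    return matches[0] if matches else None

def top_k_posts(question_vec, user_posts, k=50):
    """Rank one user's posts (already pre-filtered by SQL) by cosine
    similarity to the question embedding and keep the top k."""
    def cos(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        return dot / (math.sqrt(sum(x * x for x in a)) *
                      math.sqrt(sum(y * y for y in b)))
    return sorted(user_posts, key=lambda p: cos(question_vec, p[1]), reverse=True)[:k]
```

The SQL-first filter is the "hybrid" part: restricting to one user's posts before scoring means only a few hundred similarity computations per query instead of ~9k.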
TL;DR: It's a full-stack data analysis platform. It scrapes, structures, and enriches forum data, then uses a sophisticated RAG architecture to provide context-aware answers based on a semantic understanding of our own discussions.
Actually going to write a quick one myself now; interested to see what comes out. It won't be as robust, just a scraper, and then I'll manually send the output off for analysis and see what matches, as I'm sure it'll depend on the prompts given.
There isn't enough bandwidth to cover the utter tragedies across the planet atm.
Pretty much all of these are man made!
The info coming out of Sudan is horrific, tens of thousands of children starved to death. Murder, rape and starvation continues and millions of people are at risk.
I'm thinking of reverting to my previous mindset of, if it doesn't happen in my daily life, it's not my reality.
Selfish, but as you say, it's just too much
In your RAG, what are you storing? All the posts or a subset? Curious.
lol what a fucking muppet.
Mild punishment for October 7th imo.
May anyone who championed this be cursed for eternity.
Anyone who gaslit, called ppl who challenged a racist be similarly cursed.
Silence is complicity and too many cunts are silent.
OK, now I'm ready to ban this WUM cunt
Maybe death to all Israelis and their supporters would be mild punishment for the genocide they are inflicting on Palestine.
FROM THE RIVER TO THE SEA
FREE PALESTINE.
Ok that's too far. There are many Israelis who are just as dismayed at what's happening. Same as Nazi-occupied territories back in the 30s and 40s.
Blitz is an out and out cunt though. That much is true.
This.
I assume he was reversing the statement back to him, to show the absurdity of it, rather than actually meaning that.
Fair enough. However the cunt would never have picked up on it either.