Merge pull request #281 from bellingcat/add_inst_api_script

Add InstagrAPI server script to be used with the Instagram API Extractor.
2025-03-28 13:58:37 +00:00 · 2025-03-28 13:58:37 +00:00 · 96efdcbba1
commit 96efdcbba1
--- a/docs/source/how_to/run_instagrapi_server.md
+++ b/docs/source/how_to/run_instagrapi_server.md
@ -0,0 +1,169 @@
+# InstagrAPI Server
+
+The Instagram API Extractor requires access to a running instance of the InstagrAPI server.
+We provide a lightweight script that exposes the endpoints the Instagram API Extractor module needs; you can run it locally or via Docker.
+
+
+
+⚠️ Warning: Remember that it's best not to use your own personal account for archiving. [Here's why](../installation/authentication.md#recommendations-for-authentication).
+
+## Quick Start: Using Docker
+
+We've provided a convenient shell script (`run_instagrapi_server.sh`) that simplifies the process of setting up and running the Instagrapi server in Docker. This script handles building the Docker image, setting up credentials, and starting the container.
+
+### 🔧 Running the script:
+
+Run this script either from the repository root or from within the `scripts/instagrapi_server` directory:
+
+```bash
+./scripts/instagrapi_server/run_instagrapi_server.sh
+```
+
+This script will:
+- Prompt for your Instagram username and password.
+- Create the necessary `.env` file.
+- Build the Docker image.
+- Start the Docker container and authenticate with Instagram, creating a session automatically.
+
+### ⏱ To run the server again later:
+```bash
+docker start ig-instasrv
+```
+
+### 🐛 Debugging:
+View logs:
+```bash
+docker logs ig-instasrv
+```
+
+
+### Overview: How the Setup Works
+
+1. You enter your Instagram credentials in a local `.env` file
+2. The first time the server starts, it logs in with those credentials and saves a session file in `secrets/`
+3. After that, you can run the server again, locally or inside Docker, without needing to log in again
+
+---
+
+## Optional: Manual / Local Setup
+
+If you'd prefer to run the server manually (without Docker), you can follow these steps:
+
+
+1. **Navigate to the server folder (and stay there for the rest of this guide)**:
+   ```bash
+   cd scripts/instagrapi_server
+   ```
+
+2. **Create a `secrets/` folder** (if it doesn't already exist in `scripts/instagrapi_server`):
+   ```bash
+   mkdir -p secrets
+   ```
+
+3. **Create a `.env` file** inside `secrets/` with your Instagram credentials:
+   ```dotenv
+   INSTAGRAM_USERNAME="your_username"
+   INSTAGRAM_PASSWORD="your_password"
+   ```
+
+4. **Install dependencies** using the pyproject.toml file:
+  
+   ```bash
+   poetry install --no-root
+   ```
+
+5. **Run the server locally**:
+   ```bash
+   poetry run uvicorn src.instaserver:app --port 8000
+   ```
+
+6. **Watch for the message**:
+   ```
+   Login successful, session saved.
+   ```
+
+✅ Your session is now saved to `secrets/instagrapi_session.json`.
+
+### To run it again locally:
+```bash
+poetry run uvicorn src.instaserver:app --port 8000
+```
+
+---
+
+## Adding the API Endpoint to Auto Archiver
+
+The server should now be running with that session and be accessible at http://127.0.0.1:8000.
+
+You can set this in the Auto Archiver `orchestration.yaml` file like this:
+```yaml
+instagram_api_extractor:
+  api_endpoint: http://127.0.0.1:8000
+```
+
+
+---
+
+## Running the Server Again
+
+Once the session file is created, you should be able to run the server without logging in again.
+
+### To run it locally (from scripts/instagrapi_server):
+```bash
+poetry run uvicorn src.instaserver:app --port 8000
+```
+
+---
+
+## Running via Docker (after setup is complete, either locally or via the script)
+
+Once the `instagrapi_session.json` and `.env` files are set up, you can pass them to Docker and it should authenticate successfully.
+
+### 🔨 Build the Docker image manually:
+```bash
+docker build -t instagrapi-server .
+```
+
+### ▶️ Run the container:
+```bash
+docker run -d \
+  --env-file secrets/.env \
+  -v "$(pwd)/secrets:/app/secrets" \
+  -p 8000:8000 \
+  --name ig-instasrv \
+  instagrapi-server
+```
+
+This mounts the `secrets/` directory into the container and passes the environment variables from the `.env` file to Docker.
+
+
+
+---
+
+## Optional Cleanup
+
+- **Stop the Docker container**:
+  ```bash
+  docker stop ig-instasrv
+  ```
+
+- **Remove the container**:
+  ```bash
+  docker rm ig-instasrv
+  ```
+
+- **Remove the Docker image**:
+  ```bash
+  docker rmi instagrapi-server
+  ```
+
+### ⏱ To run again later (only if you have not removed the container):
+```bash
+docker start ig-instasrv
+```
+
+---
+
+##  Notes
+
+- Never share your `.env` or `instagrapi_session.json` — these contain sensitive login data. 
+- If you want to reset your session, simply delete the `secrets/instagrapi_session.json` file and re-run the local server.
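
As a quick way to check that the server described in the guide above is answering, here is a minimal Python sketch using only the standard library. The base URL and the `/v2/user/by/username` path come from this PR; the helper names `build_url` and `fetch_json` are hypothetical and are not part of Auto Archiver or instagrapi.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Default address from the guide; change it if you mapped a different port.
BASE_URL = "http://127.0.0.1:8000"


def build_url(path: str, **params: str) -> str:
    """Join the base URL, an endpoint path, and URL-encoded query parameters."""
    query = urlencode(params)
    return f"{BASE_URL}{path}?{query}" if query else f"{BASE_URL}{path}"


def fetch_json(url: str, timeout: float = 10.0):
    """GET a URL and decode its JSON body.

    urlopen raises urllib.error.HTTPError for non-2xx responses, which is
    how the server's 404 replies would surface here.
    """
    with urlopen(url, timeout=timeout) as resp:
        return json.load(resp)
```

With the server running, `fetch_json(build_url("/v2/user/by/username", username="some_account"))` should return a dictionary with a `user` key, since that is how the endpoint in `instaserver.py` wraps its result. `some_account` is a placeholder username.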
--- a/scripts/instagrapi_server/.gitignore
+++ b/scripts/instagrapi_server/.gitignore
@ -0,0 +1,2 @@
+secrets*
+*instagrapi_session.json
--- a/scripts/instagrapi_server/Dockerfile
+++ b/scripts/instagrapi_server/Dockerfile
@ -0,0 +1,19 @@
+FROM python:3.12-slim
+WORKDIR /app
+
+# Install Poetry
+RUN pip install --upgrade pip
+RUN pip install poetry
+
+# Copy all source code
+COPY . .
+
+# Prevent Poetry from creating a virtual environment
+RUN poetry config virtualenvs.create false
+
+# Install dependencies
+RUN poetry install --no-root
+
+
+# Use uvicorn to run the FastAPI app
+CMD ["poetry", "run", "uvicorn", "src.instaserver:app", "--host", "0.0.0.0", "--port", "8000"]
--- a/scripts/instagrapi_server/pyproject.toml
+++ b/scripts/instagrapi_server/pyproject.toml
@ -0,0 +1,18 @@
+[project]
+name = "instaserver"
+version = "0.1.0"
+description = "A FastAPI InstagrAPI server"
+package-mode = false
+requires-python = ">=3.10"
+dependencies = [
+    "fastapi (>=0.115.12,<0.116.0)",
+    "instagrapi (>=2.1.3,<3.0.0)",
+    "uvicorn (>=0.34.0,<0.35.0)",
+    "pillow (>=11.1.0,<12.0.0)",
+    "python-dotenv (>=1.1.0,<2.0.0)"
+]
+
+
+[build-system]
+requires = ["poetry-core>=2.0.0,<3.0.0"]
+build-backend = "poetry.core.masonry.api"
--- a/scripts/instagrapi_server/run_instagrapi_server.sh
+++ b/scripts/instagrapi_server/run_instagrapi_server.sh
@ -0,0 +1,48 @@
+#!/usr/bin/env bash
+#
+# run_instagrapi_server.sh
+# Usage:
+#   From repo root:   ./scripts/instagrapi_server/run_instagrapi_server.sh
+#   Or from script dir: ./run_instagrapi_server.sh
+#
+
+set -e
+
+# Step 1: cd to the script's directory (contains Dockerfile and secrets/)
+cd "$(dirname "$0")" || exit 1
+
+# Create secrets/ if it doesn't exist
+if [[ ! -d "secrets" ]]; then
+  echo "Creating secrets/ directory..."
+  mkdir secrets
+fi
+
+echo "Enter your Instagram credentials to store in secrets/.env"
+read -rp "Instagram Username: " IGUSER
+read -rsp "Instagram Password: " IGPASS
+echo ""
+
+cat <<EOF > secrets/.env
+INSTAGRAM_USERNAME=$IGUSER
+INSTAGRAM_PASSWORD=$IGPASS
+EOF
+echo "Created secrets/.env with your credentials."
+
+# Build Docker image
+IMAGE_NAME="instagrapi-server"
+echo "Building Docker image '$IMAGE_NAME'..."
+docker build -t "$IMAGE_NAME" .
+
+# Run container
+CONTAINER_NAME="ig-instasrv"
+echo "Running container '$CONTAINER_NAME'..."
+docker run -d \
+  --env-file secrets/.env \
+  -v "$(pwd)/secrets:/app/secrets" \
+  -p 8000:8000 \
+  --name "$CONTAINER_NAME" \
+  "$IMAGE_NAME"
+
+echo "Done! Instagrapi server is running on port 8000."
+echo "Use 'docker logs $CONTAINER_NAME' to view logs."
+echo "Use 'docker stop $CONTAINER_NAME' and 'docker rm $CONTAINER_NAME' to stop/remove the container."
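
The script reports "Done!" as soon as `docker run -d` returns, which may be before the server inside the container has finished logging in. A caller that wants to wait could poll the port first. The sketch below is one way to do that in Python; `wait_for_port` is a hypothetical helper, and an open port only shows that something is listening, not that the Instagram login succeeded (check `docker logs ig-instasrv` for that).

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 30.0, interval: float = 0.5) -> bool:
    """Return True once a TCP connection to host:port succeeds, False on timeout."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect means some process is accepting connections.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Connection refused or timed out: retry until the deadline passes.
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

For example, `wait_for_port("127.0.0.1", 8000)` could be called right after the script finishes and before sending the first request.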
--- a/scripts/instagrapi_server/src/instaserver.py
+++ b/scripts/instagrapi_server/src/instaserver.py
@ -0,0 +1,157 @@
+"""https://subzeroid.github.io/instagrapi/
+
+Run using the following command:
+ uvicorn src.instaserver:app --host 0.0.0.0 --port 8000 --reload
+"""
+
+import logging
+import os
+import sys
+from dotenv import load_dotenv
+
+from fastapi import FastAPI, HTTPException
+from instagrapi import Client
+from instagrapi.exceptions import LoginRequired, BadCredentials
+
+load_dotenv(dotenv_path="secrets/.env")
+logging.basicConfig(level=logging.INFO, format="%(asctime)s [%(levelname)s] %(message)s")
+
+INSTAGRAM_USERNAME = os.getenv("INSTAGRAM_USERNAME")
+INSTAGRAM_PASSWORD = os.getenv("INSTAGRAM_PASSWORD")
+SESSION_FILE = "secrets/instagrapi_session.json"
+
+app = FastAPI()
+cl = Client()
+
+
+@app.on_event("startup")
+def startup_event():
+    """Login automatically when server starts"""
+    try:
+        login_instagram()
+    except RuntimeError as e:
+        logging.error(f"API failed to start: {e}")
+        sys.exit(1)
+
+
+def login_instagram():
+    """Ensures Instagrapi is logged in and session is persistent"""
+    if not INSTAGRAM_USERNAME or not INSTAGRAM_PASSWORD:
+        raise RuntimeError("Instagram credentials are missing.")
+
+    if os.path.exists(SESSION_FILE):
+        try:
+            cl.load_settings(SESSION_FILE)
+            cl.get_timeline_feed()
+            logging.info("Using saved session.")
+            return
+        except LoginRequired:
+            logging.info("Session expired. Logging in again...")
+
+    try:
+        cl.login(INSTAGRAM_USERNAME, INSTAGRAM_PASSWORD)
+        cl.dump_settings(SESSION_FILE)
+        logging.info("Login successful, session saved.")
+    except BadCredentials as bc:
+        raise RuntimeError("Incorrect Instagram username or password.") from bc
+    except Exception as e:
+        raise RuntimeError(f"Login failed: {e}") from e
+
+
+@app.get("/v1/media/by/id")
+def get_media_by_id(id: str):
+    """Fetch post details by media ID"""
+    logging.info(f"Fetching media by ID: {id}")
+    try:
+        media = cl.media_info(id)
+        return media.model_dump()
+    except Exception as e:
+        logging.warning(f"Media not found for ID {id}: {e}")
+        raise HTTPException(status_code=404, detail="Post not found") from e
+
+
+@app.get("/v1/media/by/code")
+def get_media_by_code(code: str):
+    """Fetch post details by shortcode"""
+    logging.info(f"Fetching media by shortcode: {code}")
+    try:
+        media_id = cl.media_pk_from_code(code)
+        media = cl.media_info(media_id)
+        return media.model_dump()
+    except Exception as e:
+        logging.warning(f"Media not found for code {code}: {e}")
+        raise HTTPException(status_code=404, detail="Post not found") from e
+
+
+@app.get("/v2/user/tag/medias")
+def get_user_tagged_medias(user_id: str, page_id: str | None = None):
+    logging.info(f"Fetching tagged medias for user_id={user_id} page_id={page_id}")
+    try:
+        # Placeholder for now
+        items, next_page_id = [], None
+        return {"response": {"items": items}, "next_page_id": next_page_id}
+    except Exception as e:
+        logging.warning(f"Tagged media not found for {user_id}: {e}")
+        raise HTTPException(status_code=404, detail="Tagged media not found") from e
+
+
+@app.get("/v1/user/highlights")
+def get_user_highlights(user_id: str):
+    logging.info(f"Fetching highlights list for user_id={user_id}")
+    try:
+        highlights = cl.user_highlights(user_id)
+        return [h.model_dump() for h in highlights]
+    except Exception as e:
+        logging.warning(f"Highlights not found for {user_id}: {e}")
+        raise HTTPException(status_code=404, detail="No highlights found") from e
+
+
+@app.get("/v2/highlight/by/id")
+def get_highlight_by_id(id: str):
+    logging.info(f"Fetching highlight details for id={id}")
+    try:
+        highlight = cl.highlight_info(id)
+        return {"response": {"reels": {f"highlight:{id}": highlight.model_dump()}}}
+    except Exception as e:
+        logging.warning(f"Highlight not found for id {id}: {e}")
+        raise HTTPException(status_code=404, detail="Highlight not found") from e
+
+
+@app.get("/v1/user/stories/by/username")
+def get_stories(username: str):
+    logging.info(f"Fetching stories for username={username}")
+    try:
+        user_id = cl.user_id_from_username(username)
+        stories = cl.user_stories(user_id)
+        return [story.model_dump() for story in stories]
+    except Exception as e:
+        logging.warning(f"Stories not found for {username}: {e}")
+        raise HTTPException(status_code=404, detail="Stories not found") from e
+
+
+@app.get("/v2/user/by/username")
+def get_user_by_username(username: str):
+    logging.info(f"Fetching user profile for username={username}")
+    try:
+        user = cl.user_info_by_username(username)
+        return {"user": user.model_dump()}
+    except Exception as e:
+        logging.warning(f"User not found: {username}: {e}")
+        raise HTTPException(status_code=404, detail="User not found") from e
+
+
+@app.get("/v1/user/medias/chunk")
+def get_user_medias(user_id: str, end_cursor: str | None = None):
+    logging.info(f"Fetching paginated medias for user_id={user_id}, end_cursor={end_cursor}")
+    try:
+        posts, next_cursor = cl.user_medias_paginated(user_id, end_cursor=end_cursor)
+        return [[post.model_dump() for post in posts], next_cursor]
+    except Exception as e:
+        logging.warning(f"No posts found for user_id={user_id}: {e}")
+        raise HTTPException(status_code=404, detail="No posts found") from e
+
+
+if __name__ == "__main__":
+    import uvicorn
+
+    uvicorn.run(app, host="0.0.0.0", port=8000)
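
The `/v1/user/medias/chunk` endpoint above returns a two-element list, `[posts, next_cursor]`, so a client has to feed the cursor back in to get the next page. The following is a minimal sketch of that loop, not the extractor's actual code. `iter_user_medias` and the `fetch` callable are hypothetical, and treating a falsy cursor (`None` or an empty string) as "no more pages" is an assumption about what instagrapi returns at the end of a feed.

```python
from typing import Callable, Iterator, Optional

# A fetcher takes (user_id, end_cursor) and returns the decoded JSON body,
# which the server builds as [[post, ...], next_cursor].
Fetcher = Callable[[str, Optional[str]], list]


def iter_user_medias(fetch: Fetcher, user_id: str, max_pages: int = 10) -> Iterator[dict]:
    """Yield posts page by page until the cursor runs out or max_pages is reached."""
    cursor: Optional[str] = None
    for _ in range(max_pages):
        posts, cursor = fetch(user_id, cursor)
        yield from posts
        if not cursor:  # assumed: a falsy cursor means there is no further page
            break
```

Passing the fetcher in as a parameter keeps the loop independent of any HTTP library, and the `max_pages` cap bounds the number of requests sent for a very large profile.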
--- a/src/auto_archiver/modules/instagram_api_extractor/manifest.py
+++ b/src/auto_archiver/modules/instagram_api_extractor/manifest.py
@ -31,9 +31,11 @@
        },
    },
    "description": """
-Archives various types of Instagram content using the Instagrapi API.
+Archives Instagram content using a deployment of the [Instagrapi API](https://subzeroid.github.io/instagrapi/).

-Requires setting up an Instagrapi API deployment and providing an access token and API endpoint.
+Requires either a token from a hosted [(paid) service](https://api.instagrapi.com/docs), set in the configuration file, or your own server.
+We provide a basic script for running your own server, either locally or using Docker.
+For more information, read the [how-to guide](https://auto-archiver.readthedocs.io/en/latest/how_to/run_instagrapi_server.html).

 ### Features
 - Connects to an Instagrapi API deployment to fetch Instagram profiles, posts, stories, highlights, reels, and tagged content.