Merge pull request #281 from bellingcat/add_inst_api_script

Add InstagrAPI server script to be used with the Instagram API Extractor.
2025-03-28 13:58:37 +00:00 · 2025-03-28 13:58:37 +00:00 · 96efdcbba1
commit 96efdcbba1
--- a/docs/source/how_to/run_instagrapi_server.md
+++ b/docs/source/how_to/run_instagrapi_server.md
@@ -0,0 +1,169 @@
 # InstagrAPI Server
 The Instagram API Extractor requires access to a running instance of the InstagrAPI server.
 We provide a lightweight script that implements the endpoints required by the Instagram API Extractor module. You can run it locally or via Docker.
 ⚠️ Warning: Remember that it's best not to use your own personal account for archiving. [Here's why](../installation/authentication.md#recommendations-for-authentication).
 ## Quick Start: Using Docker
 We've provided a convenient shell script (`run_instagrapi_server.sh`) that simplifies the process of setting up and running the Instagrapi server in Docker. This script handles building the Docker image, setting up credentials, and starting the container.
 ### 🔧 Running the script:
 Run this script either from the repository root or from within the `scripts/instagrapi_server` directory:
 ```bash
 ./scripts/instagrapi_server/run_instagrapi_server.sh
 ```
 This script will:
 - Prompt for your Instagram username and password.
 - Create the necessary `.env` file.
 - Build the Docker image.
 - Start the Docker container and authenticate with Instagram, creating a session automatically.
 ### ⏱ To run the server again later:
 ```bash
 docker start ig-instasrv
 ```
 ### 🐛 Debugging:
 View logs:
 ```bash
 docker logs ig-instasrv
 ```
 ### Overview: How the Setup Works
 1. You enter your Instagram credentials in a local `.env` file
 2. You run the server **once locally** to generate a session file
 3. After that, you can choose to run the server again locally or inside Docker without needing to log in again
 ---
 ## Optional: Manual / Local Setup
 If you'd prefer to run the server manually (without Docker), you can follow these steps:
 1. **Navigate to the server folder (and stay there for the rest of this guide)**:
   ```bash
   cd scripts/instagrapi_server
   ```
 2. **Create a `secrets/` folder** (if it doesn't already exist in `scripts/instagrapi_server`):
   ```bash
   mkdir -p secrets
   ```
 3. **Create a `.env` file** inside `secrets/` with your Instagram credentials:
   ```dotenv
   INSTAGRAM_USERNAME="your_username"
   INSTAGRAM_PASSWORD="your_password"
   ```
 4. **Install dependencies** using the pyproject.toml file:
   ```bash
   poetry install --no-root
   ```
 5. **Run the server locally**:
   ```bash
   poetry run uvicorn src.instaserver:app --port 8000
   ```
 6. **Watch for the message**:
   ```
   Login successful, session saved.
   ```
 ✅ Your session is now saved to `secrets/instagrapi_session.json`.
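Step 3 expects both variables to be present in `secrets/.env`, and the server refuses to start when either is missing. A quick pre-flight check could look like the sketch below. It is not part of the repository: `read_env_file` and `missing_credentials` are hypothetical helper names, and the parser is a simplified stand-in for `python-dotenv` that only handles plain `KEY=value` lines with optional surrounding quotes.

```python
from pathlib import Path

REQUIRED_KEYS = ("INSTAGRAM_USERNAME", "INSTAGRAM_PASSWORD")


def read_env_file(path):
    """Parse simple KEY=value lines (simplified stand-in for python-dotenv)."""
    values = {}
    for raw_line in Path(path).read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        # Skip blank lines, comments, and anything without an '=' sign.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        value = value.strip()
        # Drop one pair of matching surrounding quotes, if present.
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]
        values[key.strip()] = value
    return values


def missing_credentials(path="secrets/.env"):
    """Return the required keys that are absent or empty in the file."""
    env = read_env_file(path)
    return [key for key in REQUIRED_KEYS if not env.get(key)]
```

Run from `scripts/instagrapi_server`, `missing_credentials()` returns an empty list when both values are set.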
 ### To run it again locally:
 ```bash
 poetry run uvicorn src.instaserver:app --port 8000
 ```
 ---
 ## Adding the API Endpoint to Auto Archiver
 The server should now be running and accessible at http://127.0.0.1:8000.
 You can set this in the Auto Archiver `orchestration.yaml` file like this:
 ```yaml
 instagram_api_extractor:
  api_endpoint: http://127.0.0.1:8000
 ```
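To check the endpoint from Python before wiring it into Auto Archiver, a small client along these lines may help. It is a sketch under assumptions: `build_url` and `fetch_json` are hypothetical helper names that do not exist in the repository, and the route paths are the ones defined in `src/instaserver.py`.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

API_ENDPOINT = "http://127.0.0.1:8000"  # same value as api_endpoint in the YAML above


def build_url(path, api_endpoint=API_ENDPOINT, **params):
    """Join the endpoint, a route path, and query parameters into one URL."""
    # Drop parameters left as None so optional arguments are simply omitted.
    query = urlencode({k: v for k, v in params.items() if v is not None})
    url = api_endpoint.rstrip("/") + "/" + path.lstrip("/")
    return f"{url}?{query}" if query else url


def fetch_json(path, timeout=30, **params):
    """GET a route on the running server and decode the JSON body."""
    with urlopen(build_url(path, **params), timeout=timeout) as response:
        return json.load(response)
```

With the server running, `fetch_json("/v1/media/by/code", code="SHORTCODE")` (where `SHORTCODE` is a placeholder for a real post shortcode) should return the post details as a dictionary. A 404 from the server surfaces as `urllib.error.HTTPError`.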
 ---
 ## 2. Running the Server Again
 Once the session file is created, you should be able to run the server without logging in again.
 ### To run it locally (from scripts/instagrapi_server):
 ```bash
 poetry run uvicorn src.instaserver:app --port 8000
 ```
 ---
 ## 3. Running via Docker (After Setup is Complete, either locally or via the script)
 Once the `instagrapi_session.json` and `.env` files are set up, you can pass them to Docker and it should authenticate successfully.
 ### 🔨 Build the Docker image manually:
 ```bash
 docker build -t instagrapi-server .
 ```
 ### ▶️ Run the container:
 ```bash
 docker run -d \
  --env-file secrets/.env \
  -v "$(pwd)/secrets:/app/secrets" \
  -p 8000:8000 \
  --name ig-instasrv \
  instagrapi-server
 ```
 This mounts the `secrets/` directory into the container at `/app/secrets` and passes in the environment variables from the `.env` file.
 ---
 ## 4. Optional Cleanup
 - **Stop the Docker container**:
  ```bash
  docker stop ig-instasrv
  ```
 - **Remove the container**:
  ```bash
  docker rm ig-instasrv
  ```
 - **Remove the Docker image**:
  ```bash
  docker rmi instagrapi-server
  ```
 ### ⏱ To run again later:
 ```bash
 docker start ig-instasrv
 ```
 ---
 ##  Notes
 - Never share your `.env` or `instagrapi_session.json` — these contain sensitive login data. 
 - If you want to reset your session, simply delete the `secrets/instagrapi_session.json` file and re-run the local server.
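Resetting the session amounts to deleting one file. As a minimal sketch, assuming the working directory is `scripts/instagrapi_server` (`reset_session` is a hypothetical helper, not part of the repository):

```python
from pathlib import Path

SESSION_FILE = Path("secrets/instagrapi_session.json")  # path used by the server


def reset_session(session_file=SESSION_FILE):
    """Delete the saved session; return True if a file was removed."""
    path = Path(session_file)
    if path.is_file():
        path.unlink()
        return True
    return False
```

On the next start, the server logs in with the credentials from `secrets/.env` and writes a fresh session file.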
--- a/scripts/instagrapi_server/.gitignore
+++ b/scripts/instagrapi_server/.gitignore
@@ -0,0 +1,2 @@
 secrets*
 *instagrapi_session.json
--- a/scripts/instagrapi_server/Dockerfile
+++ b/scripts/instagrapi_server/Dockerfile
@@ -0,0 +1,19 @@
 FROM python:3.12-slim
 WORKDIR /app
 # Install Poetry
 RUN pip install --upgrade pip
 RUN pip install poetry
 # Copy all source code
 COPY . .
 # Prevent Poetry from creating a virtual environment
 RUN poetry config virtualenvs.create false
 # Install dependencies
 RUN poetry install --no-root
 # Use uvicorn to run the FastAPI app
 CMD ["poetry", "run", "uvicorn", "src.instaserver:app", "--host", "0.0.0.0", "--port", "8000"]
--- a/scripts/instagrapi_server/pyproject.toml
+++ b/scripts/instagrapi_server/pyproject.toml
@@ -0,0 +1,18 @@
 [project]
 name = "instaserver"
 version = "0.1.0"
 description = "A FastAPI InstagrAPI server"
 package-mode = false
 requires-python = ">=3.10"
 dependencies = [
    "fastapi (>=0.115.12,<0.116.0)",
    "instagrapi (>=2.1.3,<3.0.0)",
    "uvicorn (>=0.34.0,<0.35.0)",
    "pillow (>=11.1.0,<12.0.0)",
    "python-dotenv (>=1.1.0,<2.0.0)"
 ]
 [build-system]
 requires = ["poetry-core>=2.0.0,<3.0.0"]
 build-backend = "poetry.core.masonry.api"
--- a/scripts/instagrapi_server/run_instagrapi_server.sh
+++ b/scripts/instagrapi_server/run_instagrapi_server.sh
@@ -0,0 +1,48 @@
 #!/usr/bin/env bash
 #
 # run_instagrapi_server.sh
 # Usage:
 #   From repo root:   ./scripts/instagrapi_server/run_instagrapi_server.sh
 #   Or from script dir: ./run_instagrapi_server.sh
 #
 set -e
 # Step 1: cd to the script's directory (contains Dockerfile and secrets/)
 cd "$(dirname "$0")" || exit 1
 # Create secrets/ if it doesn't exist
 if [[ ! -d "secrets" ]]; then
  echo "Creating secrets/ directory..."
  mkdir secrets
 fi
 echo "Enter your Instagram credentials to store in secrets/.env"
 read -rp "Instagram Username: " IGUSER
 read -rsp "Instagram Password: " IGPASS
 echo ""
 cat <<EOF > secrets/.env
 INSTAGRAM_USERNAME=$IGUSER
 INSTAGRAM_PASSWORD=$IGPASS
 EOF
 echo "Created secrets/.env with your credentials."
 # Build Docker image
 IMAGE_NAME="instagrapi-server"
 echo "Building Docker image '$IMAGE_NAME'..."
 docker build -t "$IMAGE_NAME" .
 # Run container
 CONTAINER_NAME="ig-instasrv"
 echo "Running container '$CONTAINER_NAME'..."
 docker run -d \
  --env-file secrets/.env \
  -v "$(pwd)/secrets:/app/secrets" \
  -p 8000:8000 \
  --name "$CONTAINER_NAME" \
  "$IMAGE_NAME"
 echo "Done! Instagrapi server is running on port 8000."
 echo "Use 'docker logs $CONTAINER_NAME' to view logs."
 echo "Use 'docker stop $CONTAINER_NAME' and 'docker rm $CONTAINER_NAME' to stop/remove the container."
--- a/scripts/instagrapi_server/src/instaserver.py
+++ b/scripts/instagrapi_server/src/instaserver.py
@@ -0,0 +1,157 @@
 """https://subzeroid.github.io/instagrapi/
 Run using the following command:
 uvicorn src.instaserver:app --host 0.0.0.0 --port 8000 --reload
 """
 import logging
 import os
 import sys
 from dotenv import load_dotenv
 from fastapi import FastAPI, HTTPException
 from instagrapi import Client
 from instagrapi.exceptions import LoginRequired, BadCredentials
 load_dotenv(dotenv_path="secrets/.env")
 logging.basicConfig(level=logging.INFO, format="%(asctime)s [%(levelname)s] %(message)s")
 INSTAGRAM_USERNAME = os.getenv("INSTAGRAM_USERNAME")
 INSTAGRAM_PASSWORD = os.getenv("INSTAGRAM_PASSWORD")
 SESSION_FILE = "secrets/instagrapi_session.json"
 app = FastAPI()
 cl = Client()
 @app.on_event("startup")
 def startup_event():
    """Login automatically when server starts"""
    try:
        login_instagram()
    except RuntimeError as e:
        logging.error(f"API failed to start: {e}")
        sys.exit(1)
 def login_instagram():
    """Ensures Instagrapi is logged in and session is persistent"""
    if not INSTAGRAM_USERNAME or not INSTAGRAM_PASSWORD:
        raise RuntimeError("Instagram credentials are missing.")
    if os.path.exists(SESSION_FILE):
        try:
            cl.load_settings(SESSION_FILE)
            cl.get_timeline_feed()
            logging.info("Using saved session.")
            return
        except LoginRequired:
            logging.info("Session expired. Logging in again...")
    try:
        cl.login(INSTAGRAM_USERNAME, INSTAGRAM_PASSWORD)
        cl.dump_settings(SESSION_FILE)
        logging.info("Login successful, session saved.")
    except BadCredentials as bc:
        raise RuntimeError("Incorrect Instagram username or password.") from bc
    except Exception as e:
        raise RuntimeError(f"Login failed: {e}") from e
 @app.get("/v1/media/by/id")
 def get_media_by_id(id: str):
    """Fetch post details by media ID"""
    logging.info(f"Fetching media by ID: {id}")
    try:
        media = cl.media_info(id)
        return media.model_dump()
    except Exception as e:
        logging.warning(f"Media not found for ID {id}: {e}")
        raise HTTPException(status_code=404, detail="Post not found") from e
 @app.get("/v1/media/by/code")
 def get_media_by_code(code: str):
    """Fetch post details by shortcode"""
    logging.info(f"Fetching media by shortcode: {code}")
    try:
        media_id = cl.media_pk_from_code(code)
        media = cl.media_info(media_id)
        return media.model_dump()
    except Exception as e:
        logging.warning(f"Media not found for code {code}: {e}")
        raise HTTPException(status_code=404, detail="Post not found") from e
 @app.get("/v2/user/tag/medias")
 def get_user_tagged_medias(user_id: str, page_id: str | None = None):
    logging.info(f"Fetching tagged medias for user_id={user_id} page_id={page_id}")
    try:
        # Placeholder for now
        items, next_page_id = [], None
        return {"response": {"items": items}, "next_page_id": next_page_id}
    except Exception as e:
        logging.warning(f"Tagged media not found for {user_id}: {e}")
        raise HTTPException(status_code=404, detail="Tagged media not found") from e
 @app.get("/v1/user/highlights")
 def get_user_highlights(user_id: str):
    logging.info(f"Fetching highlights list for user_id={user_id}")
    try:
        highlights = cl.user_highlights(user_id)
        return [h.model_dump() for h in highlights]
    except Exception as e:
        logging.warning(f"Highlights not found for {user_id}: {e}")
        raise HTTPException(status_code=404, detail="No highlights found") from e
 @app.get("/v2/highlight/by/id")
 def get_highlight_by_id(id: str):
    logging.info(f"Fetching highlight details for id={id}")
    try:
        highlight = cl.highlight_info(id)
        return {"response": {"reels": {f"highlight:{id}": highlight.model_dump()}}}
    except Exception as e:
        logging.warning(f"Highlight not found for id {id}: {e}")
        raise HTTPException(status_code=404, detail="Highlight not found") from e
 @app.get("/v1/user/stories/by/username")
 def get_stories(username: str):
    logging.info(f"Fetching stories for username={username}")
    try:
        user_id = cl.user_id_from_username(username)
        stories = cl.user_stories(user_id)
        return [story.model_dump() for story in stories]
    except Exception as e:
        logging.warning(f"Stories not found for {username}: {e}")
        raise HTTPException(status_code=404, detail="Stories not found") from e
 @app.get("/v2/user/by/username")
 def get_user_by_username(username: str):
    logging.info(f"Fetching user profile for username={username}")
    try:
        user = cl.user_info_by_username(username)
        return {"user": user.model_dump()}
    except Exception as e:
        logging.warning(f"User not found: {username}: {e}")
        raise HTTPException(status_code=404, detail="User not found") from e
 @app.get("/v1/user/medias/chunk")
 def get_user_medias(user_id: str, end_cursor: str | None = None):
    logging.info(f"Fetching paginated medias for user_id={user_id}, end_cursor={end_cursor}")
    try:
        posts, next_cursor = cl.user_medias_paginated(user_id, end_cursor=end_cursor)
        return [[post.model_dump() for post in posts], next_cursor]
    except Exception as e:
        logging.warning(f"No posts found for user_id={user_id}: {e}")
        raise HTTPException(status_code=404, detail="No posts found") from e
 if __name__ == "__main__":
    import uvicorn
    uvicorn.run(app, host="0.0.0.0", port=8000)
--- a/src/auto_archiver/modules/instagram_api_extractor/manifest.py
+++ b/src/auto_archiver/modules/instagram_api_extractor/manifest.py
@@ -31,9 +31,11 @@
        },
    },
    "description": """
-Archives various types of Instagram content using the Instagrapi API.
+Archives Instagram content using a deployment of the [Instagrapi API](https://subzeroid.github.io/instagrapi/).
-Requires setting up an Instagrapi API deployment and providing an access token and API endpoint.
+Requires a token from a hosted [(paid) service](https://api.instagrapi.com/docs), which you set in the configuration file.
 Alternatively, you can run your own server. We provide a basic script for this, which can be run locally or using Docker.
 For more information, read the [how-to guide](https://auto-archiver.readthedocs.io/en/latest/how_to/run_instagrapi_server.html) on this.
 ### Features
 - Connects to an Instagrapi API deployment to fetch Instagram profiles, posts, stories, highlights, reels, and tagged content.