AWS Lambda Functions as iPhone Apps

1/29/2025

I wrote this post on paper and published it here by making a picture of it with a personal app on my iPhone. Well, part of it, and for a particular understanding of 'iPhone app'.

[inserts GIF blog]

Paper Website, a project by Ben Stokes, inspired me. If you're curious, Ben writes about creating the service on his blog Tiny Projects. He created the first iteration of Paper Website in 2021, using standard OCR techniques to read the handwriting and then GPT-3 to clean up the typos.

Transcribing text to Markdown with an LLM

As I read his post in 2025, I realized that multi-modal LLMs could now do both steps at once: from picture directly to Markdown. I tried it—it worked!

I had to do some prompt engineering to get Claude 3.5 Sonnet to parse my preferred style of indicating headers, dates, and links. (And if you're eagle-eyed, you'll see it's still far from perfect.) I gave up before I could get it to consistently recognize when I was trying to reorder content with circles and arrows.

But, no reason to let the perfect be the enemy of the good.

Next I wanted to be able to take a picture on my phone and have it publish the entry directly to this website. This is a static site hosted by DigitalOcean. It's automatically redeployed whenever there's a commit to main on the GitHub repo hosting its source code. The app needs this flow:

  1. take a picture,
  2. send that picture to a multi-modal LLM like GPT-4o or Claude 3.5 Sonnet, and then
  3. edit the appropriate HTML files with the response and commit the changes to the repo.

Apple makes it inconvenient to have your own home-cooked apps on your iPhone. You can avoid the $99 annual developer fee by registering your iPhone as a test device, but any apps you install stop working after seven days.

I had a terrible idea. Could I create a webpage with this functionality and bookmark the public URL to my phone's homescreen, so it seemed like an app? The page would need to have access to an API key for the LLM service, and write access to the GitHub repo for this website, so it'd need to have a backend.

AWS Lambda functions can be given public URLs.

AWS Lambda as iPhone app

I built a frontend that captures an image with the camera and sends it to a Lambda in a POST request, expecting a URL back. When it gets the URL, it redirects the user to that page. Then I set up one Lambda function to serve the frontend, and another to take the frontend's requests and return the URL of the published post. (I could have combined these two, but I was designing the architecture as I was building it: call it technical debt.) The backend is:

import json


def lambda_handler(event, context):
    image = json.loads(event["body"])["image"]
    markdown = get_markdown(image) # Call Anthropic's API
    url = add_post(markdown) # Use GitHub's API
    return {"statusCode": 200, "body": json.dumps(url)}

where get_markdown makes a call to Anthropic with the image received:


import anthropic


def get_markdown(image):
    # `image` is the base64-encoded JPEG sent by the frontend
    client = anthropic.Anthropic()
    message = client.messages.create(
        max_tokens=1024,  # leave room for a full page of transcribed text
        messages=[
            {
                "role": "user",
                "content": [
                    {
                        "type": "image",
                        "source": {
                            "type": "base64",
                            "media_type": "image/jpeg",
                            "data": image,
                        },
                    },
                    {"type": "text", "text": PROMPT},
                ],
            }
        ],
        model="claude-3-5-sonnet-latest",
    )
    return message.content[0].text

and add_post uses the GitHub Python library -- because the Lambda Python environment does not include git -- to make edits directly to the main branch of the repo hosting this site's files:


import os

from github import Auth, Github


def add_post(markdown) -> str:
    slug = get_url(markdown)
    title = get_title(markdown)

    github_auth = Auth.Token(os.environ["GITHUB_ACCESS_TOKEN"])
    github_client = Github(auth=github_auth)
    repo = github_client.get_repo("KevinJBoyer/kevinjb.com")

    # Render the markdown to HTML and commit it as a new file
    post_html = render_post(markdown)
    repo.create_file(f"post/{slug}", f"Publish {title}", post_html)

    # Update the posts index
    posts_file = repo.get_contents("/posts.html")
    posts_html = str(posts_file.decoded_content, "utf-8")
    updated_posts_html = render_posts(posts_html, markdown)
    sha = posts_file.sha
    repo.update_file(
        "posts.html", f"Publish to posts: {title}", updated_posts_html, sha
    )

    return f"https://www.kevinjb.com/post/{slug}"
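
The helpers get_url and get_title aren't shown in this post. As a rough sketch only, assuming the transcribed Markdown opens with a "# " heading and that the slug is derived from that title (both are assumptions, not necessarily how the real helpers work), they could look like this:

```python
import re


def get_title(markdown: str) -> str:
    """Return the text of the first top-level heading (hypothetical helper)."""
    for line in markdown.splitlines():
        if line.startswith("# "):
            return line[2:].strip()
    # Assumed fallback when the LLM output has no heading
    return "Untitled"


def get_url(markdown: str) -> str:
    """Turn the title into a URL-safe slug (hypothetical helper)."""
    title = get_title(markdown).lower()
    # Collapse every run of non-alphanumeric characters into one hyphen
    return re.sub(r"[^a-z0-9]+", "-", title).strip("-")
```

Deriving both values from the Markdown itself keeps the LLM call as the only source of truth for a post, at the cost of two posts with the same title colliding on the same path.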

One bookmarked Lambda URL later (the one hosting the frontend), and voilà, I have an iPhone app. When I tap the icon, my phone's camera opens; when I snap a pic, it gets published to this site. I even get a pleasingly abstract icon, courtesy of there being no favicon.ico and the first letter of the URL being an "o".
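
The frontend Lambda isn't shown above. A minimal sketch of what such a function might look like follows: it returns one HTML page whose file input opens the camera, POSTs the base64 image as {"image": ...}, and redirects to the URL that comes back. BACKEND_URL, the page markup, and the name frontend_handler are all placeholders for illustration, not the real code.

```python
# Hypothetical URL of the backend Lambda; the real one is not given in the post.
BACKEND_URL = "https://example.lambda-url.us-east-1.on.aws/"

PAGE = """<!doctype html>
<html>
  <body>
    <input id="photo" type="file" accept="image/*" capture="environment">
    <script>
      document.getElementById("photo").addEventListener("change", (e) => {
        const reader = new FileReader();
        reader.onload = async () => {
          // Drop the "data:...;base64," prefix; the backend wants raw base64.
          const image = reader.result.split(",")[1];
          const resp = await fetch("__BACKEND_URL__", {
            method: "POST",
            body: JSON.stringify({ image }),
          });
          // The backend returns the published post's URL as a JSON string.
          window.location.href = await resp.json();
        };
        reader.readAsDataURL(e.target.files[0]);
      });
    </script>
  </body>
</html>"""


def frontend_handler(event, context):
    """Serve the single-page frontend from a Lambda function URL."""
    return {
        "statusCode": 200,
        "headers": {"Content-Type": "text/html; charset=utf-8"},
        "body": PAGE.replace("__BACKEND_URL__", BACKEND_URL),
    }
```

Because the page and the backend would live on different function URLs in this sketch, the backend's URL would likely need CORS enabled before the browser lets the page read the response.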

Security

Security by obscurity: better hope no one can figure out the URL of the Lambda function (e.g., by watching a GIF of me opening the app), or anyone can publish arbitrary content to this site. Whoops.

Maybe the first time the user (me) opens the frontend, I ask the user for a secret. That secret then gets stored in the user's cookies and included with POST requests to the Lambda. Requests without the secret are rejected.
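
On the Lambda side, that check could be as small as the sketch below. It assumes the frontend sends the secret in a request header; the header name x-app-secret, the APP_SECRET environment variable, and guarded_handler are all made-up names for illustration.

```python
import hmac
import json
import os


def is_authorized(event) -> bool:
    """Check the request's secret against the one configured on the Lambda."""
    expected = os.environ.get("APP_SECRET", "")  # assumed env var name
    # Function URL events deliver header names in lowercase
    supplied = (event.get("headers") or {}).get("x-app-secret", "")
    # Constant-time comparison; an unset secret rejects every request
    return bool(expected) and hmac.compare_digest(
        supplied.encode(), expected.encode()
    )


def guarded_handler(event, context):
    if not is_authorized(event):
        return {"statusCode": 403, "body": json.dumps("forbidden")}
    # ... the transcription and publishing steps would run here ...
    return {"statusCode": 200, "body": json.dumps("ok")}
```

Rejecting requests before the Anthropic or GitHub calls run means a stranger who finds the URL can neither publish anything nor run up API costs.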

Why not go further: instead of asking for a secret to authenticate with the Lambda, I can ask the user directly for the Anthropic API key and a GitHub access token. The frontend can handle 100% of the business logic. There's no need for a Lambda at all, saving a good bit of technical complexity and, more importantly, cents per year in AWS costs. (As a bonus, I could control the app icon too, by providing a suitable favicon.) Is this a progressive web app?

Product-market fit

Before I did any refactoring, I realized I'd validated technical feasibility early but not the value hypothesis. I tried actually using what I'd built and discovered that, while I like writing on paper for journaling, I prefer using a computer for anything I'll have to edit. Whoops.

The friends I made along the way

Posts like these can give the impression that building is a straight road. You have a destination in mind, you know what the hard parts of the journey are going to be, you complete the journey step by step. Anyone who's built anything knows this is not the case. It makes sense to omit the drudgery from write-ups like this one: it's not as interesting to read about how I struggled for a few hours to get the Python dependencies to run in the Lambda runtime environment.

But I did struggle for hours to get that part working.

I worry these omissions can discourage people when they run into similar roadblocks: they think, “this should be the simple part, therefore since I'm struggling, I'm not ready to do this yet.” Nothing is simple. Banging your head against the keyboard until you get past the latest roadblock is, really, what most of software engineering is. While building this prototype, most of my time was spent on “simple” problems.

And in the end I didn't need a Lambda at all, or even the entire app. That's the story of most of my personal projects. I tell myself “this will be useful” to rationalize projects like this one, but the fun is in the building.
