Category Archives: Boot13

Automattic sold your site data for years

If you installed and activated the popular Jetpack plugin on a self-hosted WordPress web site after 2013, and didn’t bother to read the fine print when accepting Jetpack’s Terms of Service, Automattic (the company that makes Jetpack) surreptitiously gathered your site’s data and sold it to social media and data analytics companies.

Jetpack is a free plugin that adds a useful collection of features to WordPress, including social media buttons and sharing, Markdown support, security, backups, anti-spam, stats, and so on. Some of these features have been very useful for the sites I’ve managed over the years.

How was Automattic able to do this?

There’s a somewhat hidden setting that controls whether Jetpack siphons data from your site and sends it to the Automattic mothership. Navigate to the Jetpack Dashboard, scroll to the bottom of the page, and click ‘Modules’. The setting you’re looking for (prior to Jetpack 13.3) is ‘Enhanced Distribution’. It should be named ‘Donate your content to Automattic and allow them to sell it and keep all the proceeds’.

Even if all the more obvious Jetpack features are disabled, if ‘Enhanced Distribution’ is enabled, Jetpack is sending your data to Automattic.

Making matters even worse, Jetpack updates have a nasty habit of re-enabling previously-disabled features or reverting to default settings. Whether this affected ‘Enhanced Distribution’ or not is unclear.

The Firehose

Automattic sold your site data as part of a product called Firehose, which potentially contained all of the original content from your site. Here’s the first paragraph from the Firehose product page:

WordPress publishers and visitors produce thousands of new posts and comments every hour. These content streams are available in three real-time formats from redundant servers. These streams are intended for partners like search engines, artificial intelligence (AI) products and market intelligence providers who would like to ingest a real-time stream of new content from a wide spectrum of publishers.

What does Automattic say about this?

A recent post on the wordpress.org support forum asked about Jetpack Backup & AI. Here’s how Automattic responded:

They will retire Firehose, but…

We have sold our Firehose to social and data analytics companies, and we have also used some distribution partners (like Socialgist) to sell the Firehose to these types of end users.

The release notes for Jetpack 13.3 (2024-April-03) show this: “Enhanced Distribution: begin deprecation process as the Firehose is winding down.” The only obvious difference is that ‘Enhanced Distribution’ is no longer listed on Jetpack’s Modules page. Hopefully that means the option is now also disabled for all sites, not just further hidden.

They never sold to AI companies and don’t plan to

Neither we or our distribution partners sell the Firehose to any companies that are training LLMs or to any generative AI companies.

Enhanced distribution is a feature that was released in 2013 with the purpose of driving traffic by giving blogs additional readership in the WordPress.com Reader. Content from those sites were gathered with approval by accepting the terms of service. Our partners were social and data analytics companies.

Automattic also published an article titled ‘Protecting User Choice’, a response to concerns about selling data to AI companies.

Okay, but…

If you were about to point out that posting anything on a public-facing web site makes it available for anyone to use: okay, sure, but Automattic SOLD the data they gathered. I never expected to make any money from this site, but that doesn’t mean I’m happy about anyone else making money from it.

Recommendations

Stop using Jetpack. Automattic has done, is doing, and will in all likelihood continue to do some shady things. I regret ignoring the advice I received years ago to stop using Jetpack, and can only hope that any damage caused to clients due to my recommendation and use of Jetpack is minimal.

If you can’t avoid using Jetpack, please disable the ‘Enhanced Distribution’ module. Unfortunately, if you’re using version 13.3, it’s not clear how this can be accomplished.

Most of the features provided by Jetpack can be found in other free plugins. Switching to alternatives for the functions you actually need has the additional advantage of eliminating the overhead of what is now quite a bulky Jetpack.

Here are a few alternatives to Jetpack for specific functionality:

And there are many more possibilities. Jetpack certainly was a handy and simple way to add a lot of useful functionality to WordPress. But Automattic has demonstrated that they are willing to sneakily sell your site data, and I just can’t trust them anymore.

ChatGPT: experiments in writing

As I’m sure you’ve noticed by now, I’m using ChatGPT to generate some posts defining computing terms. I’m labeling them all clearly as generated text, but I’m also checking them carefully before posting.

Checking ChatGPT’s output is extremely important, because one can be lulled into believing that the program knows what it’s talking about. The text it generates always sounds authoritative, even when it’s horribly wrong. And it’s wrong more often than you might imagine.

Note: I’m not going to call ChatGPT an Artificial Intelligence (AI), because it’s nothing of the kind. It’s just a program that is able to generate believable-sounding text, regardless of its accuracy, based on analysis of a crapload of content written by humans. True AI will include this particular skill, just as it will include the ability to play chess, to generate art, to drive cars, to make quick decisions based on unpredictable circumstances, and many thousands more specific skills that humans possess.

A recent post at Futurism does a nice job of providing real-world examples of a mainstream news site (CNET) using ChatGPT to generate articles that sound fine, but include serious errors. In this case, the publisher did not initially identify the generated articles as being from ChatGPT, and while I can understand their reluctance, I think that was a bad call.

The risks of depending on ChatGPT are enumerated in this post from KnowBe4:

  • A big downside to ChatGPT is that its sources are unknown, meaning the creators themselves may not even know where the answers it produces come from. This often leads to incorrect information being given.
  • The concerning thing about this flaw is that the inaccuracies are tough to notice, especially because it sounds confident but can really be utter nonsense.
  • You cannot use ChatGPT to find out if text has been written by ChatGPT. There is no antidote.
  • It cannot detect scams. ChatGPT is a natural language processing tool that can answer questions and generate text based on the input it receives, but it does not have the ability to detect scams or other fraudulent activities.

Meanwhile, when I asked ChatGPT “Can ChatGPT’s writing be relied upon as accurate?” it responded as follows:

ChatGPT is a machine learning model and its accuracy is based on the data it was trained on. It can provide accurate information and responses to a wide range of questions and prompts, but it may not always be completely accurate or up-to-date. It’s always a good idea to verify any information provided by ChatGPT with additional sources.

Update 2024Jul01: We should think of ChatGPT as a bullshit machine. To find out why, read the detailed analysis over at Springer.com (Michael Townsen Hicks, James Humphries & Joe Slater). Excerpt:

Calling their mistakes ‘hallucinations’ isn’t harmless: it lends itself to the confusion that the machines are in some way misperceiving but are nonetheless trying to convey something that they believe or have perceived. This, as we’ve argued, is the wrong metaphor. The machines are not trying to communicate something they believe or perceive. Their inaccuracy is not due to misperception or hallucination. As we have pointed out, they are not trying to convey information at all. They are bullshitting.

Recent Infosec highlights

It sometimes feels like news in the world of information security (infosec) is a never-ending tsunami. With the almost-daily reports of breaches, malware, phishing, vulnerabilities, exploits, zero-days, ransomware, and the Internet of Things (IoT), it can be difficult to identify stories that are likely to be of interest to typical computer users.

Stories about infosec issues that are primarily academic may be interesting, but they’re unlikely to affect most users. Sometimes the impact of a security issue is exaggerated. Occasionally the threat is later found to be nonexistent or the result of faulty reporting.

In the past, I collected infosec stories and wrote about the most interesting and relevant ones in a single month-end roundup. This helped to manage the load, but it introduced an arbitrary and unrealistic schedule.

Starting today, I will occasionally post a few selected infosec stories in a single ‘highlights’ article. Without further ado…

Don’t be a victim of your own curiosity

Researchers in Germany discovered that most people click phishing links in emails, even when they don’t know the sender, and even when they know they shouldn’t do it. Why? Curiosity, apparently. It doesn’t just kill cats any more.

Promising new anti-phishing technology

On a related note, there’s a new reason to be optimistic in the fight against phishing. A proof-of-concept, prototype DNS greylisting service called ‘Foghorn’ would prevent access to unknown domains for 24 hours, or until the domain is identified as legitimate and whitelisted. Hopefully Foghorn will prove effective, and become available for regular users in the near future.
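
The greylisting idea is simple enough to sketch in a few lines of Python. This is not Foghorn's actual implementation: the class name, method names, and in-memory storage are all hypothetical, and only the 24-hour hold and the whitelist come from the description above.

```python
from datetime import datetime, timedelta

# Hold period taken from the description above; everything else is an assumption.
HOLD_PERIOD = timedelta(hours=24)


class DomainGreylist:
    """Toy DNS greylist: unknown domains stay blocked until they age out or are whitelisted."""

    def __init__(self):
        self.first_seen = {}    # domain -> time of the first lookup we saw
        self.whitelist = set()  # domains confirmed as legitimate

    def approve(self, domain):
        """Mark a domain as legitimate so lookups are answered immediately."""
        self.whitelist.add(domain.lower())

    def is_allowed(self, domain, now):
        """Return True if a lookup for `domain` at time `now` should be answered."""
        domain = domain.lower()
        if domain in self.whitelist:
            return True
        # Remember when an unknown domain was first requested.
        first = self.first_seen.setdefault(domain, now)
        return now - first >= HOLD_PERIOD
```

A phishing domain registered an hour before the email goes out would fail `is_allowed()` for its first day, which is typically when such links do their damage.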

Scope of 2012 breaches of Last.fm and Dropbox finally revealed

Popular Internet radio service Last.fm suffered a breach way back in 2012, but the details were not revealed until very recently. According to a report from LeakedSource, as many as 43 million user passwords were leaked, and the passwords were stored using very weak security. If you had a Last.fm account in 2012, you were probably instructed to change your password. If you didn’t do it then, you should do it now.

Massively popular file sharing service Dropbox was also breached in 2012, but again, the complete details of the breach are only coming to light now: passwords for as many as 60 million Dropbox user accounts were stolen. The validity of this information has been verified by SANS and Troy Hunt.

The usual advice applies:

  • If you have accounts for these services, change your passwords now, if you haven’t already.
  • Avoid using the same password for more than one service or site.
  • Use complex passwords.
  • Use password management software so you don’t have to remember all those unique passwords.
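
If you're curious what a 'complex' password looks like in practice, here is a minimal sketch using Python's standard `secrets` module. The function name, the default length of 20, and the rule of at least one character from each class are my own assumptions, not a standard; a password manager will do the same job for you.

```python
import secrets
import string

# Letters, digits, and punctuation; adjust if a site rejects certain symbols.
ALPHABET = string.ascii_letters + string.digits + string.punctuation


def generate_password(length=20):
    """Return a random password containing a lowercase letter, an uppercase letter, a digit, and a symbol."""
    if length < 4:
        raise ValueError("length must be at least 4")
    while True:
        candidate = "".join(secrets.choice(ALPHABET) for _ in range(length))
        # Retry until every character class is represented.
        if (any(c.islower() for c in candidate)
                and any(c.isupper() for c in candidate)
                and any(c.isdigit() for c in candidate)
                and any(c in string.punctuation for c in candidate)):
            return candidate
```

Calling `generate_password()` once per site gives each account its own unrelated password, which is the point of the second item in the list above.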

New: browse boot13.com securely

You may have noticed that web sites everywhere are moving toward secure browsing. There are a couple of reasons for this. First, Ed Snowden confirmed our fears, revealing that the NSA and partner organizations are snooping on everything we do. Second, Google is pushing for encryption everywhere by penalizing sites that don’t offer secure browsing.

Boot13 may now be browsed securely, by pointing your web browser to https://boot13.com.

A big shout out and thank-you to Let’s Encrypt, an organization that provides free security certificates and related tools to anyone who operates a site or service that can use them. The certificate we’re using on Boot13 was provided by Let’s Encrypt.

Secunia’s Online Software Inspector is no more

The formerly excellent free OSI service provided by Secunia has been discontinued. I used the OSI service because it was an easy way to check for vulnerable software on any Windows computer.

Recently, OSI stopped working, and Secunia chose to retire the service rather than fix it. There’s probably more to their decision, but they’re not saying, at least not publicly. The OSI web site says only “We have discontinued the Secunia Online Software Inspector (OSI).” and recommends alternatives.

The primary alternative to OSI offered by Secunia is the “Personal Software Inspector”. As with OSI, PSI was developed in Java and requires Java to run. Unlike OSI, however, PSI runs as an application outside the context of your web browser. This has at least one advantage, in that there’s now one less reason to leave Java enabled in your web browser.

Unlike OSI, which was a strictly on-demand service, PSI by default sets itself up to start with Windows, checking for vulnerable software and updating it automatically. I’m not a fan of automatic updates: I want to be in control of what gets updated and when. Fortunately, PSI can be configured to only notify you of software that can be updated. You can also configure it NOT to start with Windows, but there are some additional steps you’ll need to take if you want to use PSI strictly on-demand.

PSI installs two services: Secunia PSI Agent and Secunia Update Agent. These services are configured to start automatically with Windows. If you want to run PSI on-demand only, you’ll need to change the Startup Type for both of these services from Automatic to Manual. When you run PSI, it will start both of these services. When you close PSI, it will stop the Secunia PSI Agent service, but leave the Secunia Update Agent running (it appears as sua.exe in the Windows process list). You’ll have to stop it manually.
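
The same changes can be scripted with the Windows `sc` command instead of the Services console. The sketch below only builds and prints the commands unless you ask it to run them. One loud assumption: `sc` expects a service's key name, which may differ from the display names given above, so treat the names in `SERVICES` as placeholders (`sc getkeyname "Secunia PSI Agent"` should report the real one). The function names are hypothetical.

```python
import subprocess


def on_demand_commands(service_names):
    """Build `sc config` commands that set each service's Startup Type to Manual."""
    # `sc` wants "start=" and its value ("demand" means Manual) as separate arguments.
    return [["sc", "config", name, "start=", "demand"] for name in service_names]


def stop_command(service_name):
    """Build the `sc stop` command for a service left running after PSI closes."""
    return ["sc", "stop", service_name]


def run_all(commands, dry_run=True):
    """Print each command, and execute it only when dry_run is False."""
    for cmd in commands:
        print(cmd)
        if not dry_run:
            # Changing services requires an elevated (administrator) prompt.
            subprocess.run(cmd, check=True)


# Placeholder names: these are the display names from the text, not verified key names.
SERVICES = ["Secunia PSI Agent", "Secunia Update Agent"]
```

Running `run_all(on_demand_commands(SERVICES))` shows what would be executed; pass `dry_run=False` from an administrator prompt to apply it, and use `stop_command()` for the update agent that PSI leaves running.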

Once PSI is running, it presents a list of installed software, along with status and options for each. We recommend changing the display to ‘Detailed View’ – click ‘Settings’ at the bottom of the PSI screen and enable that setting. While you’re there, you can also disable ‘Start on boot’ and select ‘Update handling: Notify’. For each application listed, the Status column shows the most obvious options, including ‘Download’ and ‘Update’. Right-clicking the entry for an application will show a context menu that allows you to see additional details about available updates, or choose to ignore updates for that application.

Warning: PSI seems to start scanning your computer before it presents any part of its user interface. That means you have to act quickly the first time you run it, if you want to configure it for on-demand scans only. Hopefully now that OSI users are migrating to PSI, Secunia will listen to their requests and make PSI more friendly to people who prefer the on-demand approach.

Additional information on setting up and using Secunia’s PSI can be found on this site’s ‘Scan for vulnerable software’ page.

What the heck is boot13?

Why boot13?  It’s the first program I ever ran on a microcomputer.  The computer was an Apple II+, and the full command was BRUNBOOT13:

BRUNBOOT13

I was trying to run a game for the first time: The Dragon’s Eye.  It wouldn’t boot from the 5 ¼” floppy disk I had.  So I called Wally, the guy who provided the computer.

Wally realized that the game disk used a slightly older format, with 13 sectors per track, instead of the newer 16 sector format.  The solution was to boot from the Apple II+ System Disk, then enter the command above from the command line.

On the Apple II+, parsing of command lines was a bit strange, in that commands built into the operating system were reliably parsed even when not separated from arguments.  In this case, the built-in command was BRUN, which loads a binary program from disk and runs it.  The program was BOOT13, which, when run, allowed booting from 13 sector disks.
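
To illustrate the idea (this is a modern sketch, not how the Apple II+ actually did it), here is a small Python function that recognizes a built-in command at the start of a line whether or not a space follows it. The command list is an illustrative subset I chose, and the function name is hypothetical.

```python
# Illustrative subset of built-in disk commands; not the complete list.
BUILTIN_COMMANDS = ["BRUN", "BLOAD", "BSAVE", "RUN", "LOAD", "SAVE", "CATALOG"]


def split_command(line):
    """Split a command line into (built-in command, argument) without requiring a space."""
    text = line.strip().upper()
    # Check longer names first, as a precaution against one command being a prefix of another.
    for command in sorted(BUILTIN_COMMANDS, key=len, reverse=True):
        if text.startswith(command):
            return command, text[len(command):].strip()
    # No built-in command matched; return the line untouched.
    return None, text
```

With this, `split_command("BRUNBOOT13")` and `split_command("BRUN BOOT13")` both yield the command `BRUN` and the argument `BOOT13`.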

It worked.  The Dragon’s Eye turned out to be one of my favourite games, and I ended up figuring out how to modify it, first removing the copy protection, converting it to a 16 sector disk format, then changing the game’s Applesoft BASIC code.  I added a few features, most notably a system for recording and displaying high scores.

I still have a heavily-customized, home-built Apple II+ and that hacked version of the game, but these days when I want to play it, I use an Apple II+ emulator like AppleWin.

So: first program run, first command entered, so that I could run the first game on my first microcomputer. BOOT13.