Merged
43 commits
c427a12
Upgrade to JUnit 5
ato May 18, 2025
21c81e5
Remove dependency on Apache Commons HttpClient 3.1
ato May 19, 2025
52641f6
Merge pull request #106 from iipc/junit5
ato May 19, 2025
52f8abf
Merge pull request #107 from iipc/remove-httpclient-3.1
ato May 20, 2025
8364832
Remove deprecated class org.archive.io.ArchiveFileConstants
ato May 21, 2025
a851937
Remove deprecated class org.archive.io.warc.WARCConstants
ato May 21, 2025
9ebbfa9
Remove deprecated methods
ato May 21, 2025
21a9100
Remove deprecated class org.archive.io.arc.ARCConstants
ato May 21, 2025
b44924c
Document HttpClient 3 removal in CHANGES.md
ato May 21, 2025
ba22f96
Upgrade dependencies for 2.0.0
ato May 21, 2025
c3299bb
Bump maven-compiler-plugin to 3.14.0
ato May 21, 2025
53f7009
Remove deprecated URL canonicalizer classes
ato May 21, 2025
f578a14
Merge pull request #109 from iipc/remote-deprecated
ato May 21, 2025
cc85f05
Add RecordingInputStream.asOutputStream()
ato May 20, 2025
ef054d1
Merge pull request #108 from iipc/RecordingInputStream-getOutputStream
ato May 21, 2025
76fb20f
Fix javadoc errors
ato May 21, 2025
5c42251
[maven-release-plugin] prepare release webarchive-commons-2.0.0
ato May 21, 2025
aafab50
[maven-release-plugin] prepare for next development iteration
ato May 21, 2025
0e65973
Update plugin versions
ato May 21, 2025
cf21eb2
Limit permissions on CI action
ato May 21, 2025
e3f0682
CI: Remove dependency graph step
ato May 21, 2025
840ae37
Re-add and undeprecate Reporter.shortReportLineTo(PrintWriter)
ato May 21, 2025
511a9da
Update CHANGES.md for 2.0.1
ato May 21, 2025
37dee96
[maven-release-plugin] prepare release webarchive-commons-2.0.1
ato May 21, 2025
6688337
[maven-release-plugin] prepare for next development iteration
ato May 21, 2025
c28cb73
feat: handle unicode, handle unsorted input edge cases, namespace pub…
adam-miller Jul 15, 2025
d8d850a
chore: update to latest public suffixes effective_tld_names.dat
adam-miller Jul 15, 2025
c57f059
Merge pull request #110 from adam-miller/fix_public_suffixes_tld_parsing
ato Jul 15, 2025
71fe7e1
Update CHANGES.md for 2.0.2
ato Jul 15, 2025
1765320
Update from OSSRH to Central portal
ato Jul 15, 2025
e7fdd30
Bump junit-jupiter from 5.12.2 to 5.13.3
ato Jul 15, 2025
40f11d8
[maven-release-plugin] prepare release webarchive-commons-2.0.2
ato Jul 15, 2025
7c84862
[maven-release-plugin] prepare for next development iteration
ato Jul 15, 2025
7c1cb7f
Upgrade from commons-lang 2.6 to commons-lang3 3.18.0
ato Jul 15, 2025
ed94261
Add .idea to .gitignore
ato Jul 15, 2025
056d323
Merge pull request #111 from iipc/commons-lang3
ato Jul 21, 2025
9a2b1d8
Bump commons-io from 2.19.0 to 2.20.0
ato Jul 21, 2025
eeb10f8
Update CHANGES.md for 3.0.0
ato Jul 21, 2025
b30ff6f
[maven-release-plugin] prepare release webarchive-commons-3.0.0
ato Jul 21, 2025
83ffa44
[maven-release-plugin] prepare for next development iteration
ato Jul 21, 2025
d69c8aa
Merge remote-tracking branch 'iipc/master' into upgrade-webarchive-co…
sebastian-nagel Aug 26, 2025
e9b12d6
Require a recent version of the Maven surefire plugin to support JUnit 5
sebastian-nagel Aug 26, 2025
ab99765
Upgrade dependency jsoup 1.18.3 -> 1.21.2
sebastian-nagel Aug 28, 2025
10 changes: 4 additions & 6 deletions .github/workflows/maven.yml
@@ -1,5 +1,8 @@
name: Java CI with Maven

permissions:
contents: read

on:
push:
branches: [ "master" ]
@@ -31,9 +34,4 @@ jobs:
restore-keys: |
${{ runner.os }}-maven-
- name: Build with Maven
run: mvn -B package --file pom.xml

# Optional: Uploads the full dependency graph to GitHub to improve the quality of Dependabot alerts this repository can receive
- name: Update dependency graph
if: ${{ github.event_name == 'push' }}
uses: advanced-security/[email protected]
run: mvn -B package --file pom.xml
1 change: 1 addition & 0 deletions .gitignore
@@ -1,3 +1,4 @@
.idea
*.pydevproject
.project
.metadata
117 changes: 117 additions & 0 deletions CHANGES.md
@@ -1,3 +1,120 @@
Unreleased
----------

3.0.0
-----

### Changes

`FileUtils.pagedLines()` and `FileUtils.expandRange()` now return the Apache Commons Lang 3 version of `LongRange`.
Users of these methods may need to make the following changes:

| Old | New |
|-------------------------------------------------|---------------------------------------------|
| `import org.apache.commons.lang.math.LongRange` | `import org.apache.commons.lang3.LongRange` |
| `new LongRange(min, max)` | `LongRange.of(min, max)` |
| `longRange.getMaximumLong()` | `longRange.getMaximum()` |
| `longRange.getMinimumLong()` | `longRange.getMinimum()` |

### Dependency upgrades

- **commons-io**: 2.19.0 → 2.20.0
- **commons-lang**: 2.6 → 3.18.0

2.0.2
-----

### Fixes

* Fixes for `org.archive.net.PublicSuffixes` [#110](https://github.com/iipc/webarchive-commons/pull/110)
  * Updated to the latest version of the public suffix list.
  * Fixed parsing failures with newer list versions.
  * Moved `effective_tld_names.dat` to `org/archive/effective_tld_names.dat` to prevent conflict with `crawler-commons`.
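
Classpath resources are looked up by path, so two jars that ship a file at the same path can shadow one another, which is presumably the conflict the move above avoids. The following is a minimal sketch, using only the JDK, of how one might check how many copies of a resource path are visible. The class `ResourceCollisionSketch` and its method are hypothetical and not part of webarchive-commons.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URL;
import java.util.Collections;
import java.util.List;

// Hypothetical helper for illustration; not part of webarchive-commons.
class ResourceCollisionSketch {

    // Returns the location of every copy of resourcePath visible to this class's loader.
    static List<URL> copiesOf(String resourcePath) {
        ClassLoader loader = ResourceCollisionSketch.class.getClassLoader();
        try {
            return Collections.list(loader.getResources(resourcePath));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String[] paths = {
            "effective_tld_names.dat",             // old, un-namespaced location
            "org/archive/effective_tld_names.dat"  // new, namespaced location
        };
        for (String path : paths) {
            List<URL> copies = copiesOf(path);
            System.out.println(path + ": " + copies.size() + " on classpath");
            copies.forEach(url -> System.out.println("  " + url));
        }
    }
}
```

More than one URL for the same path means whichever jar comes first on the classpath wins, so a namespaced path makes the lookup unambiguous.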

2.0.1
-----

### Changes

* Re-added `Reporter.shortReportLineTo(PrintWriter)` as it turned out to be important to Heritrix.


2.0.0
-----

### New features

- Added `RecordingInputStream.asOutputStream()` for direct writing of recorded data without an input stream. [#108](https://github.com/iipc/webarchive-commons/pull/108)

### Removals

#### Removed Apache HttpClient 3.1

`HTTPSeekableLineReaderFactory` and `ZipNumBlockLoader` now default to HttpClient 4.3.

| Removed | Replacement |
|-----------------------------------------------------------|--------------------------------------|
| `org.apache.commons.httpclient.URIException` | `org.archive.url.URIException` |
| `org.apache.commons.httpclient.Header` | `org.archive.format.http.HttpHeader` |
| `org.archive.httpclient.HttpRecorderGetMethod` | |
| `org.archive.httpclient.HttpRecorderMethod` | |
| `org.archive.httpclient.HttpRecorderPostMethod` | |
| `org.archive.httpclient.SingleHttpConnectionManager` | |
| `org.archive.httpclient.ThreadLocalHttpConnectionManager` | |

#### Removed deprecated versions of renamed classes

| Removed | Replacement |
|-----------------------------------------------|--------------------------------------------------|
| `org.archive.io.ArchiveFileConstants` | `org.archive.format.ArchiveFileConstants` |
| `org.archive.io.GzipHeader` | `org.archive.util.zip.GzipHeader` |
| `org.archive.io.GZIPMembersInputStream` | `org.archive.util.zip.GZIPMembersInputStream` |
| `org.archive.io.NoGzipMagicException` | `org.archive.util.zip.NoGzipMagicException` |
| `org.archive.io.arc.ARCConstants` | `org.archive.format.arc.ARCConstants` |
| `org.archive.io.warc.WARCConstants` | `org.archive.format.warc.WARCConstants` |
| `org.archive.url.DefaultIACanonicalizerRules` | `org.archive.url.AggressiveIACanonicalizerRules` |
| `org.archive.url.DefaultIAURLCanonicalizer` | `org.archive.url.AggressiveIAURLCanonicalizer` |
| `org.archive.url.GoogleURLCanonicalizer` | `org.archive.url.BasicURLCanonicalizer` |

#### Removed deprecated methods

| Removed | Replacement |
|-----------------------------------------------|-------------------------------------------|
| `ANVLRecord(int)` | `ANVLRecord()` |
| `DevUtils.betterPrintStack(RuntimeException)` | `Throwable.printStackTrace()` |
| `Recorder.getReplayCharSequence()` | `Recorder.getContentReplayCharSequence()` |
| `Reporter.shortReportLineTo(PrintWriter)` | `Reporter.reportTo(PrintWriter)` |
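
Of the replacements in this table, the stack-trace one is plain JDK and can be sketched without the library. The example below is an illustration only: `StackTraceSketch` and `render` are hypothetical names, and it does not claim to reproduce whatever formatting `DevUtils.betterPrintStack` applied.

```java
import java.io.PrintWriter;
import java.io.StringWriter;

// Hypothetical illustration; not part of webarchive-commons.
class StackTraceSketch {

    // Captures the output of Throwable.printStackTrace(PrintWriter) as a String.
    static String render(Throwable t) {
        StringWriter buffer = new StringWriter();
        try (PrintWriter out = new PrintWriter(buffer)) {
            t.printStackTrace(out);
        }
        return buffer.toString();
    }

    public static void main(String[] args) {
        RuntimeException e =
                new RuntimeException("boom", new IllegalStateException("root cause"));
        e.printStackTrace();          // direct replacement: writes to System.err
        System.out.print(render(e));  // same text, captured for logging elsewhere
    }
}
```

The standard trace already includes any `Caused by:` chain, so callers that only needed a readable dump can call `printStackTrace()` directly.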

#### Removed usages of constant interfaces

Static imports should be used instead.

* `ArchiveFileConstants` is no longer implemented by:
  * `ArchiveReader`
  * `ArchiveReaderFactory`
  * `WARCWriter`
  * `WriterPool`
  * `WriterPoolMember`
* `ARCConstants` is no longer implemented by:
  * `ARCReader`
  * `ARCReaderFactory`
  * `ARCRecord`
  * `ARCRecordMetaData`
  * `ARCUtils`
  * `ARCWriter`
* `WARCConstants` is no longer implemented by:
  * `WARCReader`
  * `WARCReaderFactory`
  * `WARCRecord`
  * `WARCWriter`
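
For code that relied on inheriting constants through one of the classes above, the change can be sketched as follows. `DemoConstants` and `StaticImportSketch` are hypothetical stand-ins, not library types. Because a single-file example lives in the unnamed package, it cannot static-import its own interface, so the constant is qualified explicitly and a JDK constant is used to show the `import static` syntax. In a packaged project the equivalent would be a static import of the relevant constants interface, for example from `org.archive.format.warc.WARCConstants`.

```java
import static java.nio.charset.StandardCharsets.UTF_8; // static import syntax, shown with a JDK constant

// Hypothetical illustration; not part of webarchive-commons.
// Before the change, a class like this might have been declared as
// `class StaticImportSketch implements DemoConstants` and used HEADER_PREFIX unqualified.
class StaticImportSketch {

    // Builds a header line from a constant that is referenced explicitly, not inherited.
    static byte[] headerLine(String version) {
        return (DemoConstants.HEADER_PREFIX + version + "\r\n").getBytes(UTF_8);
    }

    public static void main(String[] args) {
        System.out.print(new String(headerLine("1.0"), UTF_8));
    }
}

// Hypothetical constants interface standing in for ArchiveFileConstants, ARCConstants or WARCConstants.
interface DemoConstants {
    String HEADER_PREFIX = "DEMO/";
}
```

Keeping the constants out of the type hierarchy means they no longer leak into the public API of every class that happened to implement the interface.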

### Dependency upgrades

- **commons-io**: 2.18.0 → 2.19.0
- **guava**: 33.3.1-jre → 33.4.8-jre
- **json**: 20240303 → 20250517
- **junit**: 4.13.2 → 5.12.2

1.3.0
-----

36 changes: 21 additions & 15 deletions pom.xml
@@ -3,7 +3,7 @@

<groupId>org.commoncrawl</groupId>
<artifactId>ia-web-commons</artifactId>
<version>1.3.1-SNAPSHOT</version>
<version>3.0.1-SNAPSHOT</version>
<packaging>jar</packaging>

<name>ia-web-commons</name>
@@ -41,7 +41,7 @@
<connection>scm:git:git@github.com:iipc/webarchive-commons.git</connection>
<developerConnection>scm:git:git@github.com:iipc/webarchive-commons.git</developerConnection>
<url>https://github.com/iipc/webarchive-commons</url>
<tag>HEAD</tag>
<tag>webarchive-commons-2.0.0</tag>
</scm>

<properties>
@@ -52,15 +52,16 @@

<dependencies>
<dependency>
<groupId>junit</groupId>
<artifactId>junit</artifactId>
<version>4.13.2</version>
<groupId>org.junit.jupiter</groupId>
<artifactId>junit-jupiter</artifactId>
<version>5.13.3</version>
<scope>test</scope>
</dependency>

<dependency>
<groupId>com.google.guava</groupId>
<artifactId>guava</artifactId>
<version>33.3.1-jre</version>
<version>33.4.8-jre</version>
</dependency>

<dependency>
@@ -82,9 +83,9 @@
</dependency>

<dependency>
<groupId>commons-httpclient</groupId>
<artifactId>commons-httpclient</artifactId>
<version>3.1</version>
<groupId>commons-codec</groupId>
<artifactId>commons-codec</artifactId>
<version>1.18.0</version>
</dependency>

<dependency>
@@ -113,21 +114,21 @@
</dependency>

<dependency>
<groupId>commons-lang</groupId>
<artifactId>commons-lang</artifactId>
<version>2.6</version>
<groupId>org.apache.commons</groupId>
<artifactId>commons-lang3</artifactId>
<version>3.18.0</version>
</dependency>

<dependency>
<groupId>org.jsoup</groupId>
<artifactId>jsoup</artifactId>
<version>1.18.3</version>
<version>1.21.2</version>
</dependency>

<dependency>
<groupId>commons-io</groupId>
<artifactId>commons-io</artifactId>
<version>2.11.0</version>
<version>2.20.0</version>
</dependency>

<dependency>
@@ -148,7 +149,7 @@
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-compiler-plugin</artifactId>
<version>2.3.2</version>
<version>3.14.0</version>
<configuration>
<source>8</source>
<target>8</target>
@@ -193,6 +194,11 @@
</execution>
</executions>
</plugin>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-surefire-plugin</artifactId>
<version>3.2.5</version>
</plugin>
</plugins>

<resources>
@@ -5,7 +5,7 @@
import java.io.PrintStream;
import java.util.List;

import org.apache.commons.lang.StringUtils;
import org.apache.commons.lang3.StringUtils;
import org.archive.format.json.JSONView;
import org.archive.resource.Resource;
import org.archive.util.StreamCopy;
2 changes: 1 addition & 1 deletion src/main/java/org/archive/format/cdx/FieldSplitLine.java
@@ -3,7 +3,7 @@
import java.util.ArrayList;
import java.util.List;

import org.apache.commons.lang.StringUtils;
import org.apache.commons.lang3.StringUtils;

/**
* Base class for text lines that are split by a delimiter Some examples will be
@@ -1,6 +1,6 @@
package org.archive.format.gzip.zipnum;

import org.apache.commons.lang.math.NumberUtils;
import org.apache.commons.lang3.math.NumberUtils;
import org.archive.util.iterator.CloseableIterator;

public class TimestampBestPickDedupIterator extends TimestampDedupIterator {
@@ -31,7 +31,7 @@ public class ZipNumBlockLoader {
protected int signDurationSecs = DEFAULT_SIG_DURATION_SECS;

protected boolean useNio = false;
protected String httpLib = HttpLibs.APACHE_31.name();
protected String httpLib = HttpLibs.APACHE_43.name();

protected boolean bufferFully = true;
protected boolean noKeepAlive = true;
@@ -8,7 +8,7 @@
import java.util.logging.Level;
import java.util.logging.Logger;

import org.apache.commons.lang.StringUtils;
import org.apache.commons.lang3.StringUtils;

public class CrossProductOfLists<T> {
private static final Logger LOG =
2 changes: 1 addition & 1 deletion src/main/java/org/archive/format/json/JSONView.java
@@ -5,7 +5,7 @@
import java.util.logging.Level;
import java.util.logging.Logger;

import org.apache.commons.lang.StringUtils;
import org.apache.commons.lang3.StringUtils;
import com.github.openjson.JSONObject;

/**
@@ -6,7 +6,7 @@
import java.util.logging.Level;
import java.util.logging.Logger;

import org.apache.commons.lang.StringUtils;
import org.apache.commons.lang3.StringUtils;
import org.apache.pig.backend.executionengine.ExecException;
import org.apache.pig.data.Tuple;
import org.apache.pig.data.TupleFactory;