|
1 |
| -# linktest |
2 |
| -Pure CLI implementation of httpreserve |
| 1 | +<div> |
| 2 | +<p align="center"> |
| 3 | +<img id="logo" src="https://github.com/httpreserve/httpreserve/raw/master/src/images/httpreserve-logo.png" alt="httpreserve"/> |
| 4 | +</p> |
| 5 | +</div> |
| 6 | + |
| 7 | +# linkstat |
| 8 | + |
| 9 | +CLI implementation of httpreserve that can test links and retrieve Internet |
| 10 | +Archive replacements. The tool can output the result of individual links, or |
| 11 | +take a CSV list to output collected information in JSON, BoltDB, or CSV format. |
| 12 | + |
| 13 | +## Usage |
| 14 | +```bash |
| 15 | +Usage: linkstat [Optional -link] [Optional -label] |
| 16 | + [Optional -list] [Optional -json] |
| 17 | + [Optional -bolt] |
| 18 | + [Optional -csv] |
| 19 | + [Optional -version -v] |
| 20 | + |
| 21 | +Output: [Json] |
| 22 | +Output: [CSV] |
| 23 | +Output: [BoltDB] |
| 24 | +Output: [Version] 'exponentialDK-httpreserve/0.0.9 ...' |
| 25 | + |
| 26 | +Usage of ./linkstat: |
| 27 | + -bolt |
| 28 | + Output to static BoltDB. |
| 29 | + -csv |
| 30 | + Output to CSV. |
| 31 | + -json |
| 32 | + Output to JSON. |
| 33 | + -label string |
| 34 | + Annotate single URL check response with label. |
| 35 | + -link string |
| 36 | + Seek the status of a single URL: JSON |
| 37 | + -list string |
| 38 | + Provide a list of URLs to test against in CSV format. |
| 39 | + -v Return httpreserve version. |
| 40 | + -version |
| 41 | + Return httpreserve version. |
| 42 | +``` |
| 43 | + |
| 44 | +## Examples |
| 45 | + |
| 46 | +#### Example combining [tikalinkextract][httpreserve-1] |
| 47 | + |
| 48 | +Inspired by work at Harvard Innovation Labs, this example was originally a |
| 49 | +test of what httpreserve-workbench could do at the time. This CLI version is a |
| 50 | +simplification of that work but should still produce decent results. See the |
| 51 | +HTTPreserve [Million Dollar Webpage Project][httpreserve-2]. |
| 52 | + |
| 53 | +[httpreserve-1]: https://github.com/httpreserve/tikalinkextract |
| 54 | +[httpreserve-2]: https://github.com/httpreserve/million-dollar-webpage |
| 55 | + |
| 56 | +#### CSV input |
| 57 | + |
| 58 | +An input CSV `example.csv` might look as follows: |
| 59 | +```csv |
| 60 | +"BBC News", "http://www.bbc.co.uk/news" |
| 61 | +"BBC Home", "http://www.bbc.co.uk/" |
| 62 | +"BBC Radio", "http://www.bbc.co.uk/radio" |
| 63 | +"Google", "http://www.google.com" |
| 64 | +"exponentialdecay.co.uk", "http://www.exponentialdecay.co.uk" |
| 65 | +"Internet Archive", "http://www.archive.org" |
| 66 | +"perma.cc", "http://perma.cc" |
| 67 | +"wikipedia.org", "http://wikipedia.org" |
| 68 | +"The Million Dollar Homepage", "http://www.getpixel.net" |
| 69 | +``` |
| 70 | + |
| 71 | +To output a CSV collecting all of the linkstat results, you can run a command |
| 72 | +as follows: |
| 73 | +```bash |
| 74 | +$ ./linkstat -csv -list example.csv > output.csv |
| 75 | +``` |
| 76 | + |
| 77 | +And the output looks as follows: |
| 78 | +``` |
| 79 | +"id","filename","link","response code","response text","title","content-type","archived","internet archive response code","internet archive response text","wayback earliest date","internet archive earliest","wayback latest date","internet archive latest","internet archive save link","protocol error","protocol error","analysis version number","analysis version text","stats creation time" |
| 80 | +"1651a00b16a12ba06fc6c6b049c7cf7c","BBC News","https://www.bbc.co.uk/news","200","OK","home - bbc news","text/html;charset=utf-8","true","302","Found","09 October 1997","http://web.archive.org/web/19971009011901/http://www.bbc.co.uk/news/","19 March 2019","http://web.archive.org/web/20190319173721/https://www.bbc.co.uk/news","http://web.archive.org/save/https://www.bbc.co.uk/news","","","0.0.9","exponentialDK-httpreserve/0.0.9","1.574649021s" |
| 81 | +"57ab6349a47b53b982a939fb1da54fef","BBC Radio","https://www.bbc.co.uk/sounds","200","OK","bbc sounds - music. radio. podcasts","text/html; charset=utf-8","true","302","Found","19 March 2008","http://web.archive.org/web/20080319074038/http://www.bbc.co.uk/sounds","18 March 2019","http://web.archive.org/web/20190318211158/https://www.bbc.co.uk/sounds","http://web.archive.org/save/https://www.bbc.co.uk/sounds","","","0.0.9","exponentialDK-httpreserve/0.0.9","1.660729358s" |
| 82 | +"c85da5e372ffe2200e46527b74537ba3","BBC Home","https://www.bbc.co.uk/","200","OK","bbc - home","text/html; charset=utf-8","true","302","Found","21 December 1996","http://web.archive.org/web/19961221203254/http://www0.bbc.co.uk/","19 March 2019","http://web.archive.org/web/20190319141018/https://www.bbc.co.uk/","http://web.archive.org/save/https://www.bbc.co.uk/","","","0.0.9","exponentialDK-httpreserve/0.0.9","1.95442772s" |
| 83 | +"b3bd672c1014e07e87ef4a357a161528","exponentialdecay.co.uk","http://www.exponentialdecay.co.uk","206","Partial Content","ross spencer, digital preservation, archives, python developer, golang developer, uk, nz","text/html","true","302","Found","17 September 2008","http://web.archive.org/web/20080917054811/http://www.exponentialdecay.co.uk/","13 November 2018","http://web.archive.org/web/20181113021338/http://exponentialdecay.co.uk/","http://web.archive.org/save/http://www.exponentialdecay.co.uk","","","0.0.9","exponentialDK-httpreserve/0.0.9","425.368183ms" |
| 84 | +``` |
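
The CSV output above can be post-processed with a few lines of script. The following is a minimal sketch in Python, not part of linkstat itself: the function names `summarize` and `needs_attention` are hypothetical, and it assumes the column names shown in the header row above. Because that header contains `protocol error` twice, the sketch looks columns up by position instead of using `csv.DictReader`, which would keep only one of the duplicated columns.

```python
import csv
import io


def summarize(csv_text):
    """Return (label, link, response code, archived) for each result row.

    Assumes the header names from the example output above; these are
    taken from the README, not from a documented schema.
    """
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    # Map the column names we need to their positions in the header row.
    wanted = ("filename", "link", "response code", "archived")
    col = {name: header.index(name) for name in wanted}
    rows = []
    for row in reader:
        if not row:
            continue  # skip blank lines
        rows.append((
            row[col["filename"]],
            row[col["link"]],
            row[col["response code"]],     # kept as text, e.g. "200"
            row[col["archived"]] == "true",
        ))
    return rows


def needs_attention(rows):
    """Rows whose link did not return 200 or has no archived copy."""
    return [r for r in rows if r[2] != "200" or not r[3]]
```

For example, `needs_attention(summarize(open("output.csv", newline="").read()))` would list the `exponentialdecay.co.uk` row from the output above, since its response code is `206` rather than `200`.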
| 85 | + |
| 86 | +#### An individual link |
| 87 | + |
| 88 | +The command: `./linkstat -link https://github.com/ -label "GitHub"` will |
| 89 | +output: |
| 90 | +```json |
| 91 | +{ |
| 92 | + "FileName": "GitHub", |
| 93 | + "AnalysisVersionNumber": "0.0.9", |
| 94 | + "AnalysisVersionText": "exponentialDK-httpreserve/0.0.9", |
| 95 | + "SimpleRequestVersion": "httpreserve-simplerequest/0.0.4", |
| 96 | + "Link": "https://github.com/", |
| 97 | + "Title": "the world’s leading software development platform · github", |
| 98 | + "ContentType": "text/html; charset=utf-8", |
| 99 | + "ResponseCode": 200, |
| 100 | + "ResponseText": "OK", |
| 101 | + "ScreenShot": "snapshots are not currently enabled", |
| 102 | + "InternetArchiveLinkLatest": "http://web.archive.org/web/20190319223453/https://github.com/", |
| 103 | + "InternetArchiveLinkEarliest": "http://web.archive.org/web/20080514210148/http://github.com/", |
| 104 | + "InternetArchiveSaveLink": "http://web.archive.org/save/https://github.com/", |
| 105 | + "InternetArchiveResponseCode": 302, |
| 106 | + "InternetArchiveResponseText": "Found", |
| 107 | + "Archived": true, |
| 108 | + "Error": false, |
| 109 | + "ErrorMessage": "", |
| 110 | + "StatsCreationTime": "4.295493892s" |
| 111 | +} |
| 112 | +``` |
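
The JSON above is straightforward to consume from a script. The following Python sketch is illustrative only: `LinkStat`, `parse_linkstat`, and `suggest_url` are hypothetical names, the field names are taken from the example output above, and the fallback policy (live link, then latest Internet Archive snapshot, then the save link) is an assumption, not something linkstat does itself.

```python
import json
from dataclasses import dataclass


@dataclass
class LinkStat:
    label: str
    link: str
    response_code: int
    archived: bool
    latest_snapshot: str
    save_link: str


def parse_linkstat(json_text):
    """Pick out a few fields from the JSON shown in the example above."""
    data = json.loads(json_text)
    return LinkStat(
        label=data["FileName"],
        link=data["Link"],
        response_code=data["ResponseCode"],
        archived=data["Archived"],
        latest_snapshot=data.get("InternetArchiveLinkLatest", ""),
        save_link=data.get("InternetArchiveSaveLink", ""),
    )


def suggest_url(stat):
    """Hypothetical policy for choosing which URL to cite.

    Prefer the live link when it returned 200, otherwise the latest
    archived snapshot, otherwise the link that requests a new snapshot.
    """
    if stat.response_code == 200:
        return stat.link
    if stat.archived and stat.latest_snapshot:
        return stat.latest_snapshot
    return stat.save_link
```

With the GitHub result above, `suggest_url` returns the live link because the response code is 200. For a broken link that has been archived, it would return the `InternetArchiveLinkLatest` value instead.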
| 113 | + |
| 114 | +## Archiving Weblinks |
| 115 | + |
| 116 | +* [Find and Connect Project:][linkstat-1] Nicola Laurent on the impact of |
| 117 | +broken links. |
| 118 | +* [Binary Trees? Automatically Identifying the links between born digital records:][linkstat-2] |
| 119 | +I write about hyperlinks as a public record in their own right when submitted as part |
| 120 | +of a documentary heritage. |
| 121 | +* [HiberActive Pilot][linkstat-3] A scholarly publishing tool that extracts |
| 122 | +URLs and returns both the original URL and a perma-link. |
| 123 | +* [IIPC Awesome List][linkstat-4] A list of web-archiving links that invites |
| 124 | +contributions from the community to keep it up-to-date. |
| 125 | + |
| 126 | +[linkstat-1]: http://www.findandconnectwrblog.info/2016/11/broken-links-broken-trust/ |
| 127 | +[linkstat-2]: https://www.youtube.com/watch?v=Ked9GRmKlRw |
| 128 | +[linkstat-3]: https://www.era.lib.ed.ac.uk/handle/1842/23366 |
| 129 | +[linkstat-4]: https://github.com/iipc/awesome-web-archiving |
| 130 | + |
| 131 | +## License |
| 132 | + |
| 133 | +GNU General Public License Version 3. [Full Text](LICENSE) |