Running Behind

Just a quick update to mention that things are running a little behind getting API updates uploaded. The PATCH/PUT API routes are now available but I got sidetracked over the break with family and have not been able to complete the POST API route. I hope to get to it soon. Will post more once I have added the new route.

BTW: Yes, I did see the latest from Openclipart, and their advancements in getting the site back up and running. I have many questions but doubt I will ever get answers. They do not seem to be very forthcoming about anything and just seem to expect blind trust. It would be nice, and would go a long way in my opinion, if they released the data on the clipart (tags, dates, etc.), since that is copyrighted by them and not covered under the CC0. But I am not a lawyer, so if someone knows more than me please inform me.

More to come!

Stage Two Complete

I am happy to announce that nine days after stage one was completed, all of the files in stage two have now been imported into the site. Here is a little more about what is included in each stage. Stage one included all the files from which I was able to extract metadata embedded in the SVG itself. That usually included things like a title, date, description, tags and a creator. Not every stage one file had every metadata field, but most had more than one. When the title was not available in a stage one file, the title was created from the filename where possible.
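For anyone curious what pulling metadata out of an SVG can look like, here is a minimal sketch. It is not the script used for the import. It assumes the Inkscape-style layout where the fields live in an RDF block using the Dublin Core and Creative Commons namespaces; older files may use a different `cc` namespace, and the function name `extract_metadata` is made up for this example.

```python
import xml.etree.ElementTree as ET

# Namespaces commonly written by Inkscape (an assumption about the files).
NS = {
    "svg": "http://www.w3.org/2000/svg",
    "dc": "http://purl.org/dc/elements/1.1/",
    "cc": "http://creativecommons.org/ns#",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
}


def _text(node):
    """Return the stripped text of an element, or '' if it is missing."""
    return node.text.strip() if node is not None and node.text else ""


def extract_metadata(svg_text):
    """Return only the metadata fields that are present and non-empty."""
    root = ET.fromstring(svg_text)
    work = root.find("svg:metadata/rdf:RDF/cc:Work", NS)
    if work is None:
        return {}
    tags = [_text(li) for li in work.findall("dc:subject/rdf:Bag/rdf:li", NS)]
    fields = {
        "title": _text(work.find("dc:title", NS)),
        "date": _text(work.find("dc:date", NS)),
        "description": _text(work.find("dc:description", NS)),
        "creator": _text(work.find("dc:creator/cc:Agent/dc:title", NS)),
        "tags": [t for t in tags if t],
    }
    # Drop empty values so partial metadata is easy to spot.
    return {key: value for key, value in fields.items() if value}
```

A file with no metadata block returns an empty dict, which would make it a stage two candidate.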

Stage two included all files that had no metadata in the SVG file. In this case the filename occasionally contained the creator or a date, and a title could usually be created from the filename. It is not exact since the filename scheme is inconsistent over time. Dates were harder: since there were no dates in the metadata, the dates are often estimated based on the last file that had a date. When no date could be estimated from the last date available by ID, a default date of Jan 1, 1970 should have been assigned. Some creator names were included in the filenames, and when I noticed a name, or was able to pick it out, it was removed from the title and added as the creator. I am sure I missed many.
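The title and date guessing described above can be sketched roughly like this. It is an illustration, not the actual import code: it assumes filenames follow the `id-<ID>---<name>.svg` pattern seen in the backup, and the helper names `title_from_filename` and `estimate_dates` are invented for the example.

```python
import re
from datetime import datetime

DEFAULT_DATE = datetime(1970, 1, 1)  # fallback when nothing can be estimated
FILENAME_RE = re.compile(r"^id-(\d+)---(.+)\.svg$", re.IGNORECASE)


def title_from_filename(filename):
    """Return (ocal_id, title) guessed from an 'id-<ID>---<name>.svg' filename."""
    match = FILENAME_RE.match(filename)
    if match is None:
        return None, None
    ocal_id = int(match.group(1))
    # Treat hyphens, underscores and spaces as word breaks.
    words = re.split(r"[-_\s]+", match.group(2))
    title = " ".join(word.capitalize() for word in words if word)
    return ocal_id, title


def estimate_dates(records):
    """Fill missing dates from the closest earlier ID that has a date."""
    last_known = None
    filled = []
    for record in sorted(records, key=lambda r: r["id"]):
        date = record.get("date")
        if date is not None:
            last_known = date
        else:
            date = last_known if last_known is not None else DEFAULT_DATE
        filled.append({**record, "date": date})
    return filled


print(title_from_filename("id-300280---75-stroke-and-fill.svg"))
# (300280, '75 Stroke And Fill')
```

Walking the records in ID order is what makes the estimate reasonable, since IDs were handed out roughly in upload order. Creator names buried in the filename are not handled here; those were picked out by eye.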

Stage three files are all the files from stage one and stage two that have incomplete metadata and little to no clue in the filename about what they are. These files need to be done by hand. I might see if I can cross-reference them with what is available in the Wayback Machine, but these will take longer to process. Stage three will also include files that have already been added that I simply missed, or that are having problems processing as SVGs or images. I will probably leave many of these for a bit since I think I need a break from looking at spreadsheets of files.

Next on the site will be some cleanup that will make the site easier to maintain. Here are some things on the list for the next little while:

  • Ability to submit changes to existing listings from the listing pages themselves when you are logged in.
  • Improvements to how the quick search bar at the top and the search results page work.
  • Getting back to the JSON API to include the ability to POST new clipart to the site via the API and accept edits to existing items.
  • Some automated processes to flag when images are not processing correctly.
  • Friendlier login, registration and profile pages. This includes nicer edit pages for creators to edit listings.

If any of the original Openclipart creators would like to join it is possible for you to have the Openclipart submissions linked to your account here. This allows you to do several things:

  • Edit the titles, description, dates and tags of the already included Openclipart items here on the site.
  • Eventually export a CSV file of just your submissions.
  • Get some statistics of downloads.
  • Other suggestions? Leave a comment.

To get started you need to create an account and drop me a note with your original Openclipart user name and we will sort out what items are yours. This is still a work in progress and I am still working out some of the edit screens.

I think it is important that the original creators get credit for their work here. While that is not required under the CC0 I think the creators behind the artwork should know that their work is appreciated.

I have also updated the CSV exports of the clipart listings. The CSV file export tool will be getting a little bit of an update in the next little bit since it currently only exports the Openclipart sourced files. Now that new files can be uploaded here I need to make sure those other listings get included.

Stage One Complete

I am happy to announce that stage one of restoring Openclipart files is complete. That does not mean all the work is done, but all of the files that were scanned and had meta data embedded in them are now uploaded on the website. There are a few exceptions for files that turned out to be bad for some reason or other. The files are still available in the archive on Google Drive and at some point I hope to actually take a look at them and see if they can be fixed/restored.

Since this is a bit of a milestone I also decided to release a backup CSV of all the data from the SVG files. All the CSV backup data files will be available on Google Drive. Here is what you will find:

Title,Description,Date,Creator,Tags,OCALid,Filename
75 Stroke And Fill,,2019-10-26 20:18,,Fill|Stroke,300280,id-300280---75-stroke-and-fill.svg
38 Stroke And Fill,,2019-10-26 20:18,,Fill|Stroke,300287,id-300287---38-stroke-and-fill.svg
AUDIO I2S DAC GY PCM5102,,2019-10-26 20:18,,AUDIOI|SDACGYPCM51,299722,id-299722---audio_i2s_dac_gy_pcm5102.svg
7bbn Oyv Bo W,,2019-10-26 20:18,,,300706,id-300706---7bbnoyvbow.svg
TFT LCD Screen,,2019-10-26 20:18,,145TFTLCDSCREEN,299721,id-299721---1.45_tft_lcd_screen.svg
2.1 Stroke And Fill,,2019-10-26 20:18,,Fill|Stroke,300281,id-300281---2.1-stroke-and-fill.svg
Pointing Hand,,2019-10-26 20:18,Unknown,Body|Hand|Pointing,298683,id-298683---1500150517_v2.svg

If you open the CSV file in a spreadsheet you should be able to sort and filter the data. The filenames all correspond with the folders and filenames in the Google Drive backup of the original files.
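If you would rather work with the backup in code than in a spreadsheet, here is a small sketch using Python's standard `csv` module. The two sample rows are copied from the listing above. Splitting tags on `|` is inferred from that sample, and the helper names `load_listings` and `with_tag` are just for this example.

```python
import csv
import io

SAMPLE = """Title,Description,Date,Creator,Tags,OCALid,Filename
75 Stroke And Fill,,2019-10-26 20:18,,Fill|Stroke,300280,id-300280---75-stroke-and-fill.svg
Pointing Hand,,2019-10-26 20:18,Unknown,Body|Hand|Pointing,298683,id-298683---1500150517_v2.svg
"""


def load_listings(handle):
    """Read CSV rows, splitting Tags into a list and parsing OCALid as an int."""
    rows = []
    for row in csv.DictReader(handle):
        row["Tags"] = [tag for tag in row["Tags"].split("|") if tag]
        row["OCALid"] = int(row["OCALid"])
        rows.append(row)
    return rows


def with_tag(rows, tag):
    """Return the rows that carry the given tag, ignoring case."""
    wanted = tag.lower()
    return [row for row in rows if wanted in (t.lower() for t in row["Tags"])]


listings = sorted(load_listings(io.StringIO(SAMPLE)), key=lambda row: row["OCALid"])
print([row["Title"] for row in with_tag(listings, "hand")])  # ['Pointing Hand']
```

For the real file, pass `open(path, newline="", encoding="utf-8")` instead of the `io.StringIO` sample.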

Some things to consider: just because the metadata in the SVG files was there does not mean it was accurate or complete. Many files would have metadata for one field but not have complete data about the file. This is why you see some very strange titles and dates that make no sense (obviously none of these files were published in October 2019). There are also several files that had similar metadata embedded in them, but it seemed to be copied from one file to the next. From what I can tell many of the newer remixes have that issue. My guess is creators copied an original SVG file, edited it and saved it, but did not edit the metadata from the original file.

If you are interested in the CSV backup file, feel free to download it. It is also possible for me to mass update those listings from a similarly formatted CSV file. If you are a creator and know what the data should be, or just know what some of it is, feel free to create a copy, edit the items you know, and send me a link via the Contact form where I can download it. I can then process the CSV file and update the listings. Just make sure the new CSV file has either the OCALid field or the Filename field included.
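To give an idea of how a mass update from an edited CSV could work, here is a rough sketch. It is not the site's actual import code. It assumes rows are matched on OCALid first and Filename second, and that blank cells in the edited file mean "leave the current value alone". The function name `apply_updates` and the creator name in the example are made up.

```python
import csv
import io

EDITABLE = ("Title", "Description", "Date", "Creator", "Tags")


def apply_updates(listings, updates):
    """Apply edited rows to listings in place; return how many rows changed."""
    by_id = {row["OCALid"]: row for row in listings if row.get("OCALid")}
    by_file = {row["Filename"]: row for row in listings if row.get("Filename")}
    updated = 0
    for update in updates:
        # Match on OCALid first, then fall back to Filename.
        target = by_id.get(update.get("OCALid")) or by_file.get(update.get("Filename"))
        if target is None:
            continue  # no matching listing, skip the row
        row_changed = False
        for field in EDITABLE:
            value = (update.get(field) or "").strip()
            if value and value != target.get(field):
                target[field] = value
                row_changed = True
        if row_changed:
            updated += 1
    return updated


current = list(csv.DictReader(io.StringIO(
    "Title,Description,Date,Creator,Tags,OCALid,Filename\n"
    "7bbn Oyv Bo W,,2019-10-26 20:18,,,300706,id-300706---7bbnoyvbow.svg\n"
)))
edits = list(csv.DictReader(io.StringIO(
    "OCALid,Creator\n"
    "300706,Example Creator\n"
)))
print(apply_updates(current, edits), current[0]["Creator"])  # 1 Example Creator
```

This is why the edited file only needs the key column plus whichever fields you actually changed.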

It would be nice if the other Openclipart restoration projects would release their data as well, but I won’t hold my breath on that happening.

Hoping to have uploads and editing on the site available by this coming weekend. Testing them now and have a few bugs left to fix.

In other news Openclipart actually tweeted for the first time since August 12th. I was shocked.

The conversation from that tweet is more interesting. In particular, what does this mean?

Personally I don’t think Openclipart can be trusted any longer but that is a topic for another post.

Openclipart SVG Restoration Update

The Openclipart SVG collection completed processing last night, after six straight days of running. To be clear about what I mean by processing: I already had all the files, but I was checking that they were valid SVG files with no errors, and then minifying them for use on the website. That means that the SVG files you can download here are not the original files. They have been minified to save on space. All the original files, including whatever metadata they contain, are available on Google Drive.
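As a rough idea of the kind of check involved, here is a minimal sketch that only tests whether a file is well-formed XML with an `svg` root element. It is a stand-in written with Python's standard library, not the tool that was actually used, and it does no minifying. The function name `check_svg` is made up for the example.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"


def check_svg(svg_bytes):
    """Return (ok, reason) for a basic well-formedness and root-element check."""
    try:
        root = ET.fromstring(svg_bytes)
    except ET.ParseError as exc:
        return False, f"XML error: {exc}"
    # Accept both namespaced and bare <svg> roots.
    if root.tag not in (f"{{{SVG_NS}}}svg", "svg"):
        return False, f"unexpected root element: {root.tag}"
    return True, "ok"


print(check_svg(b'<svg xmlns="http://www.w3.org/2000/svg"><rect/></svg>'))
# (True, 'ok')
```

Files that fail a check like this are the sort that would end up in a "bad files" pile for a later look.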

Here are the stats on the files for those that might want to know.

Good Files

Original Files: 157,692

Size: 82.5GB

49,856 had meta data of some kind. Title, Description, Author, Date, or Tags. Some had all metadata attributes, some had only one of those attributes.

107,836 had no metadata in the SVG file. However it is probably possible to create a title from the filename from 87,230 of those files.

That is a total of 137,086 files for which it is probably possible to recover fairly accurate titles.
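For anyone who wants to check the arithmetic, the numbers above fit together like this. The only assumption is that every file with metadata counts toward the recoverable total, which is how the 137,086 figure was reached.

```python
total = 157_692
with_metadata = 49_856
without_metadata = 107_836
title_from_filename = 87_230  # subset of the files without metadata

# The two groups account for every good file.
assert with_metadata + without_metadata == total

recoverable = with_metadata + title_from_filename
remaining = total - recoverable

print(recoverable)                           # 137086
print(remaining)                             # 20606
print(round(100 * recoverable / total, 1))   # 86.9
```

So roughly 87% of the good files should end up with a usable title, leaving about 20,600 that will need more manual work.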

Bad Files

Bad Files: 718

Size: 6.03GB

These files failed for any number of reasons. Some failed XML checks, some files were just bad. They might still be recoverable but I will not look at them again for a while.

Website SVG Files

Files: 157,692

Size: 80.1GB

By minifying the SVG files the website is able to save some space on storing and hosting the SVG files. Minification was done using SVG Sanitizer, a fantastic project BTW. Running the files through SVG Sanitizer is also how many of the bad files were identified; those have now been moved to lower priority.

While 2.4GB of space saving might not seem like a great deal, every little bit helps.

Now that all the files are done processing I will be continuing to add them to the site so they can be searched using the Search API. I also have been testing a POSTing API to add new files to the site, but it is still in the early stages and not ready for the live site yet.

Welcome to FreeSVGClipart!

I would like to welcome you to FreeSVGClipart! It probably seems odd to show up at a clipart site and not see a ton of clipart on the home page. There is a reason for that, but first let me introduce myself. My name is Lee and I have long been a fan and part of the Openclipart community. By day I spend most of my time coding for clients and the rest of the time it seems I spend driving my kids around. In between those times I like to work on other projects and that has now turned into FreeSVGClipart!

When the Openclipart site went down it was a great loss to the clipart community and to the Internet as a whole. Personally I had several work and personal projects on the go that were utilizing the Openclipart API, and those projects suddenly ground to a halt. As the downtime of Openclipart dragged on and the message on the Openclipart site gave little to no information, many people, including myself, looked to Twitter for answers. Unfortunately all that many of us got were odd replies asking to show love. There just seemed to be no answers and no one that could answer the question of what really happened.

Slowly some new projects have popped up that offer some hope to the clipart community, most notably FreeSVG and ClipartZero. They are great projects, and they seem to be starting to get a following. It is nice to see the community starting to recover from the loss of Openclipart, but I realized that what those projects are offering is not what I need.

I want to lay out what is coming here at FreeSVGClipart and why I am spending my time and energy on this project.

First, this site is about restoring as much of the Openclipart collection as possible. The complete collection of the original files from Openclipart will be made available in its entirety so anyone will be able to use the collection independent of any website. The original thought was to use GitHub for this, but the sheer size of the collection makes that impossible. At this point the collection will be uploaded to Google Drive and the link to the shared files will be available on the site so anyone can have access to the files. As I type this the uploads have been running for a few hours now. The collection is available here for anyone who wants it.

Second, any new files that are added to this site will also be made available in the complete collection for anyone to access.

The point is that these files are not the property of any one person or corporate entity and the community can do with them as they please.

Third, the API will be built out to enable as much use as possible. So far I have been able to build out a search API that defaults to searching titles. Searching by tags, creator and original Openclipart IDs is also available to registered users. Once the majority of the Openclipart collection has been added to the site, the plan is to include PATCH and POST options to update and add new clipart to the collection.

To come are roughly 156,000+ clipart files. They are being uploaded based on how much information there is about the files. Many of the Openclipart SVG files have metadata included with them. Those files are first, second will be the files that we can gather some data about from the filenames, and last will be the files that we have very little information about. It will be an interesting journey and I look forward to working on this.