Reading metadata from PDF 1.6 documents does not currently work in the released versions of
OpenText Web Site Management Server, because the rather old third-party component (pdfinfo.exe)
used by the AssetManager does not support newer PDF versions.
There is a fairly easy way to work around this limitation. All you need is the freely available
iTextSharp library and this small piece of code, which builds a console application called
PDFInfo.exe that behaves like the old version but can handle the newest PDF versions:
using System;
using System.Collections.Generic;
using iTextSharp.text.pdf;

namespace PDFInfo
{
    class PDFInfo
    {
        static void Main(string[] args)
        {
            // The path to the PDF file is expected as the first argument.
            if (args.Length < 1)
            {
                Console.Error.WriteLine("Usage: PDFInfo.exe <file.pdf>");
                return;
            }

            PdfReader reader = new PdfReader(args[0]);
            Dictionary<string, string> infos = reader.Info;

            // Print every entry of the document info dictionary as "key: value".
            foreach (string key in infos.Keys)
            {
                string value = infos[key];
                switch (key)
                {
                    // PDF dates carry a "D:" prefix; strip it here.
                    // Check whether the date needs any further conversion.
                    case "CreationDate":
                    case "ModDate":
                        value = value.Replace("D:", "");
                        value = value.Trim();
                        break;
                }
                Console.WriteLine("{0}: {1}", key.ToLower(), value);
            }

            Console.WriteLine("Pages: {0}", reader.NumberOfPages);
            Console.WriteLine("Encrypted: {0}", reader.IsEncrypted() ? "yes" : "no");
            reader.Close();
        }
    }
}
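
The listing only strips the "D:" prefix from CreationDate and ModDate. If the date has to be transformed further, the following is a minimal sketch of one way to do it with plain .NET. The class name PdfDateSketch and the method TryParsePdfDate are hypothetical, the time zone suffix is simply ignored, and which output format WSM actually expects is an assumption you would have to verify.

```csharp
using System;
using System.Globalization;

static class PdfDateSketch
{
    // Hypothetical helper: parses the leading "YYYYMMDDHHmmSS" part of a PDF
    // date string such as "D:20120314153000+01'00'". Only the year is
    // mandatory; missing fields default to month 01, day 01, time 00:00:00.
    // The time zone suffix is ignored in this sketch.
    public static bool TryParsePdfDate(string raw, out DateTime result)
    {
        result = DateTime.MinValue;
        if (string.IsNullOrEmpty(raw)) return false;

        string s = raw.Trim();
        if (s.StartsWith("D:", StringComparison.Ordinal)) s = s.Substring(2);

        // Count the leading ASCII digits, up to the 14 that make up the timestamp.
        int n = 0;
        while (n < s.Length && n < 14 && s[n] >= '0' && s[n] <= '9') n++;
        if (n < 4 || n % 2 != 0) return false;

        // Pad the missing fields with their defaults.
        const string defaults = "00000101000000";
        string digits = s.Substring(0, n) + defaults.Substring(n);

        return DateTime.TryParseExact(digits, "yyyyMMddHHmmss",
            CultureInfo.InvariantCulture, DateTimeStyles.None, out result);
    }

    static void Main()
    {
        DateTime parsed;
        if (TryParsePdfDate("D:20120314153000+01'00'", out parsed))
        {
            // Prints "2012-03-14 15:30:00"
            Console.WriteLine(parsed.ToString("yyyy-MM-dd HH:mm:ss",
                CultureInfo.InvariantCulture));
        }
    }
}
```

Inside the switch of the sample, the parsed value could then be written back to value in whatever format turns out to be required.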
A full-featured sample can be downloaded here. Please be aware that this
sample uses iTextSharp, which is licensed under the Affero General Public License.
To actually use this small console program in your Web Site Management Server, build the console
app and copy the .exe file into your %RDCMS%\MediaCatalog folder. Please make a backup copy of
the old file first.
Please also note that WSM only lets the process run for 5 seconds. If the process needs more
time, e.g. with large documents, it is killed and no metadata is read. There is no
configuration option to change this threshold.
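
To find out beforehand whether a given document stays within this limit, the tool can be timed from a small test harness. The following is only a sketch under assumptions: the class TimeoutCheck and the method FinishesWithin are hypothetical names, PDFInfo.exe is assumed to be in the current directory or on the PATH, and the sketch does not reproduce how WSM itself starts or kills the process.

```csharp
using System;
using System.Diagnostics;

static class TimeoutCheck
{
    // Hypothetical helper: starts a command and reports whether it exited
    // within timeoutMs milliseconds. The process is killed if it did not.
    public static bool FinishesWithin(string fileName, string arguments, int timeoutMs)
    {
        var psi = new ProcessStartInfo(fileName, arguments)
        {
            UseShellExecute = false,
            RedirectStandardOutput = true,
            CreateNoWindow = true
        };

        using (Process p = Process.Start(psi))
        {
            // Drain stdout asynchronously so a full pipe cannot block the child.
            p.OutputDataReceived += (sender, e) => { };
            p.BeginOutputReadLine();

            if (p.WaitForExit(timeoutMs)) return true;

            try { p.Kill(); }
            catch (InvalidOperationException) { /* already exited */ }
            return false;
        }
    }

    static void Main(string[] args)
    {
        if (args.Length < 1)
        {
            Console.Error.WriteLine("Usage: TimeoutCheck <file.pdf>");
            return;
        }

        // 5000 ms mirrors the 5 second limit described above.
        bool ok = FinishesWithin("PDFInfo.exe", "\"" + args[0] + "\"", 5000);
        Console.WriteLine(ok ? "Finished within 5 seconds" : "Exceeded 5 seconds");
    }
}
```

Running this against your largest PDF files on the server itself gives a rough idea of which documents are at risk of losing their metadata.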
The sample is provided without any warranty. Use it at your own risk.