
An issue with Full Text Search against PDF blobs... - ms-sqlserver

#1 (08-09-2007, 06:26 PM)

My client is trying to use SQL Server Full Text Search to search PDFs stored
in a varbinary(max) column. Full text search works fine for the Microsoft
Office documents stored in the same table, but it does NOT return any
results from the PDF documents.

Here is a high-level view of what the client did:

1. Installed Adobe PDF IFilter 6.0.
2. Ran the stored procedure sp_fulltext_service (as documented).
3. Restarted the server.
4. Verified that the filter was installed properly by querying the system
   view sys.fulltext_document_types.
5. Created a full-text index on the table that holds the documents.
6. Started a full population of the index.
7. Ran a sample query with a string he knows is in a PDF file:
   select * from documents where freetext(document, 'Review')
   It returned no results.
8. Ran the same sample query with a string he knows is in some Word files:
   select * from documents where freetext(document, 'SQL')
   It returned several rows, as expected.
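
For anyone following along, the steps above might look roughly like this in T-SQL. This is only a sketch under assumptions: the post does not say which sp_fulltext_service options were run, so the two calls shown are the ones commonly documented for third-party IFilters on SQL Server 2005, and the catalog name, the key index name and the `extension` type column are hypothetical.

```sql
-- Step 2 (assumed options): allow full-text search to load filters
-- registered with the operating system, and skip signature verification
-- for third-party components.
EXEC sp_fulltext_service 'load_os_resources', 1;
EXEC sp_fulltext_service 'verify_signature', 0;

-- Step 4: confirm that a filter for .pdf is visible to SQL Server.
SELECT document_type, path, manufacturer
FROM sys.fulltext_document_types
WHERE document_type = '.pdf';

-- Steps 5-6: for a varbinary(max) column, the TYPE COLUMN holds the file
-- extension of each row (for example '.pdf') and is used to pick the filter.
CREATE FULLTEXT CATALOG DocumentsCatalog;            -- hypothetical name

CREATE FULLTEXT INDEX ON dbo.documents
    (document TYPE COLUMN extension LANGUAGE 1033)   -- "extension" is hypothetical
    KEY INDEX PK_documents                           -- hypothetical unique key index
    ON DocumentsCatalog
    WITH CHANGE_TRACKING OFF, NO POPULATION;

ALTER FULLTEXT INDEX ON dbo.documents START FULL POPULATION;

-- Step 7: run this only after the population has finished,
-- otherwise an empty result proves nothing.
SELECT * FROM dbo.documents WHERE FREETEXT(document, 'Review');
```

If the type column holds 'pdf' rather than '.pdf', or is empty for the PDF rows, that alone could explain why only the Office documents are found, so it may be worth checking first.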

Does anybody know what might be happening here?

Thank you!

Camilo Leon

#2 (08-10-2007, 12:04 PM)

Camilo,

Are you using 64-bit Windows and SQL Server? If so, the last time I looked,
the Adobe PDF IFilter was available only as a 32-bit build.

RLF
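
One way to answer the 64-bit question from inside SQL Server is sketched below. It uses only built-in functions; the column aliases are just illustrative. On a 64-bit instance the edition string normally ends in "(64-bit)", and the version banner names the platform.

```sql
-- Report the edition and build so the platform (32-bit or 64-bit) is visible.
SELECT SERVERPROPERTY('Edition')        AS edition,
       SERVERPROPERTY('ProductVersion') AS product_version,
       @@VERSION                        AS version_banner;
```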
#3 (08-10-2007, 01:52 PM)

Russell,

No, we are using a 32-bit machine in this case.

I did make FTS work with PDF files by creating a new table, importing the
data, creating a new catalog and populating it again.

Thanks!

Camilo
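
The workaround described above (new table, re-import, new catalog, repopulate) could be sketched as follows. The post gives no schema, so every object name and column here is hypothetical, and the `extension` column is assumed to hold values such as '.pdf' or '.doc'.

```sql
-- New table with a single-column, non-nullable unique key for the full-text index.
CREATE TABLE dbo.documents_new (
    id        int IDENTITY(1,1) NOT NULL
              CONSTRAINT PK_documents_new PRIMARY KEY,
    extension varchar(8)     NOT NULL,   -- e.g. '.pdf', '.doc'
    document  varbinary(max) NOT NULL
);

-- Re-import the existing rows.
INSERT INTO dbo.documents_new (extension, document)
SELECT extension, document
FROM dbo.documents;

-- New catalog and index. With no WITH clause, change tracking defaults to
-- AUTO and a full population starts as soon as the index is created.
CREATE FULLTEXT CATALOG DocumentsCatalogNew;

CREATE FULLTEXT INDEX ON dbo.documents_new
    (document TYPE COLUMN extension)
    KEY INDEX PK_documents_new
    ON DocumentsCatalogNew;
```

A rebuild like this also forces every row to be filtered again, which may be what actually fixed it if the original population ran before the PDF filter was registered.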

#4 (08-13-2007, 11:00 PM)

Check the gatherer logs to see what the status of your population was.
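
As a rough sketch of where to look: the population counters can be read with OBJECTPROPERTYEX, and the second query builds the crawl log file name prefix. The table name is taken from the queries earlier in the thread. The log naming pattern (SQLFT, then the zero-padded database ID and catalog ID, in the instance's LOG folder) reflects my understanding of the SQL Server 2005 convention, so treat it as an assumption.

```sql
-- Population status and failure count for the indexed table.
SELECT OBJECTPROPERTYEX(OBJECT_ID('dbo.documents'), 'TableFulltextPopulateStatus') AS populate_status,
       OBJECTPROPERTYEX(OBJECT_ID('dbo.documents'), 'TableFulltextDocsProcessed')  AS docs_processed,
       OBJECTPROPERTYEX(OBJECT_ID('dbo.documents'), 'TableFulltextFailCount')      AS rows_failed;

-- Likely crawl (gatherer) log file name prefix for each catalog in this database.
SELECT name,
       'SQLFT'
       + RIGHT('00000' + CAST(DB_ID() AS varchar(10)), 5)
       + RIGHT('00000' + CAST(fulltext_catalog_id AS varchar(10)), 5)
       + '.LOG' AS crawl_log_prefix
FROM sys.fulltext_catalogs;
```

A non-zero failure count together with filter errors in the log would point at the IFilter rather than at the query.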

--
relevantNoise - dedicated to mining blogs for business intelligence.

Looking for a SQL Server replication book?
http://www.nwsu.com/0974973602.html

Looking for a FAQ on Indexing Services/SQL FTS
http://www.indexserverfaq.com
#5 (09-17-2009, 12:07 AM) oyen.kai

Quote (originally posted by usenet):
> Are you using 64-bit Windows and SQL Server? If so, last time I looked the
> Adobe PDF IFilter was only 32-bit.

There IS a 64-bit version of PDF IFilter. And that's what I installed.

I'm experiencing the same problem: no results from PDFs are returned.
#6 (11-07-2009, 02:01 AM) bizexpert

I am trying to get the full-text search thesaurus to work in SQL
Server 2005 Express edition. I live in the USA, so I assume
tsENU.xml is the appropriate file to modify. I used Notepad to
modify the tsENU.xml file and saved it as Unicode:

All times are GMT -4.