19 Content Management
This chapter provides an overview of Oracle's content management features.
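This chapter contains the following topics:
- 19.1 Introduction to Content Management
- 19.2 Overview of XML in Oracle
- 19.3 Overview of LOBs
- 19.4 Overview of Oracle Text
- 19.5 Overview of Oracle Ultra Search
- 19.6 Overview of Oracle interMedia
- 19.7 Overview of Oracle Spatial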
19.1 Introduction to Content Management
Oracle Database includes datatypes for all kinds of rich Internet content: relational data, object-relational data, XML, text, audio, video, images, and spatial data. These datatypes appear as native types in the database, and all of them can be queried using SQL. A single SQL statement can include data belonging to any or all of these datatypes.
As applications evolve to encompass increasingly rich semantics, they need to deal with the following kinds of data:
- Simple structured data
- Complex structured data
- Semi-structured data
- Unstructured data
Traditionally, the relational model has been very successful at dealing with simple structured data, the kind that fits into simple tables. Oracle added object-relational features so that applications can deal with complex structured data: collections, references, user-defined types, and so on. Queuing technologies, such as Oracle Streams Advanced Queuing, deal with messages and other semi-structured data. This chapter discusses Oracle's technologies for supporting unstructured data.
Unstructured data cannot be decomposed into standard components. Data about an employee can be 'structured' into a name (probably a character string), an identification (likely a number), a salary, and so on. A photo is different: the data really consists of a long stream of 0s and 1s. These 0s and 1s switch pixels on or off so that you see the photo on a display, but the stream cannot be broken down into any finer structure in terms of database storage.
Unstructured data such as text, graphic images, still video clips, full motion video, and sound waveforms tends to be large. A typical employee record may be a few hundred bytes, but even small amounts of multimedia data can be thousands of times larger. Some multimedia data may reside in operating system files, and it is desirable to access it from the database.
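The following small Python sketch makes the contrast concrete. Python is chosen only for illustration. The `Employee` fields and the 640 x 480 image size are assumptions, not Oracle definitions. The employee record splits into named, typed components. The photo is one opaque stream of bytes, and in this case it is tens of thousands of times larger.

```python
from dataclasses import astuple, dataclass

@dataclass
class Employee:
    # Structured data: each field is a separate, typed component that can be queried.
    name: str
    employee_id: int
    salary: float

emp = Employee("Smith", 7369, 800.0)
print(emp.salary)  # one component can be read on its own

# Rough size of the record when written out as text.
record_size = len(repr(astuple(emp)).encode("utf-8"))

# Unstructured data: a hypothetical 640 x 480 uncompressed 24-bit image.
# It is a single opaque run of bytes; there is no "salary" inside it to ask for.
photo = bytes(640 * 480 * 3)

ratio = len(photo) // record_size
print(record_size, len(photo), ratio)
```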
19.2 Overview of XML in Oracle
Extensible Markup Language (XML) is a tag-based markup language that lets developers create their own tags to describe data exchanged between applications and systems. XML is widely adopted as the common language of information exchange between companies. It is human-readable; that is, it is plain text. Because it is plain text, XML documents and XML-based messages can be sent easily using common protocols, such as HTTP or FTP.
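This minimal sketch uses Python's standard `xml.etree.ElementTree` module. It builds a document with developer-chosen tags, serializes it to the plain text that would travel over HTTP or FTP, and parses it back. The tag names (`invoice`, `customer`, `item`) are hypothetical.

```python
import xml.etree.ElementTree as ET

# Build a document using tags the developer invents for this exchange.
invoice = ET.Element("invoice", number="1001")
ET.SubElement(invoice, "customer").text = "Acme Corp"
item = ET.SubElement(invoice, "item", sku="A-7")
item.text = "Widget"

# Serialize to plain text: this string is what a message over HTTP or FTP would carry.
payload = ET.tostring(invoice, encoding="unicode")
print(payload)

# The receiving system parses the same text back into a tree.
received = ET.fromstring(payload)
customer = received.findtext("customer")
```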
Oracle XML DB treats XML as a native datatype in the database; it is not a separate server. The XML data model encompasses both unstructured content and structured data. Applications can use standard SQL and XML operators to generate complex XML documents from SQL queries and to store XML documents.
Oracle XML DB provides capabilities for both content-oriented and data-oriented access. For developers who see XML as documents (news stories, articles, and so on), Oracle XML DB provides an XML repository accessible from standard protocols and SQL.
For others, the structured-data aspect of XML (invoices, addresses, and so on) is more important. For these users, Oracle XML DB provides a native XMLType and support for XML Schema, XPath, XSLT, DOM, and so on. Data-oriented access is typically more query-intensive.
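Data-oriented access usually means querying by path and value rather than reading a document from start to finish. The sketch below is not Oracle XML DB or `XMLType`. It uses Python's ElementTree, which supports only a subset of XPath. It selects invoices by the value of a child element and then aggregates a field. The document content is invented for the example.

```python
import xml.etree.ElementTree as ET

doc = ET.fromstring("""
<invoices>
  <invoice id="1"><total>120.50</total><city>Boston</city></invoice>
  <invoice id="2"><total>75.00</total><city>Denver</city></invoice>
  <invoice id="3"><total>310.25</total><city>Boston</city></invoice>
</invoices>
""")

# XPath-style predicate: invoice elements whose <city> child has the text "Boston".
boston = doc.findall("./invoice[city='Boston']")
boston_ids = [inv.get("id") for inv in boston]

# Aggregate over structured values, as a query-intensive, data-oriented client would.
boston_total = sum(float(inv.findtext("total")) for inv in boston)
print(boston_ids, boston_total)
```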
The Oracle XML Developer's Kits (XDKs) contain the basic building blocks for reading, manipulating, transforming, and viewing XML documents, whether they reside on a file system or are stored in a database. They are available for Java, C, and C++. Unlike many shareware and trial XML components, the production Oracle XDKs are fully supported and come with a commercial redistribution license. Oracle XDKs consist of the following components:
19.3 Overview of LOBs
The large object (LOB) datatypes BLOB, CLOB, NCLOB, and BFILE enable you to store and manipulate large blocks of unstructured data (such as text, graphic images, video clips, and sound waveforms) in binary or character format. They provide efficient, random, piece-wise access to the data.
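Random, piece-wise access means a reader can fetch any byte range of a large value without reading the data that comes before it. The sketch below models a LOB as an in-memory `io.BytesIO` buffer. `read_piece` and `read_in_chunks` are hypothetical helpers, not an Oracle API. They imitate the amount/offset style of reads with 1-based offsets, as used by Oracle's DBMS_LOB package.

```python
import io

def read_piece(lob: io.BytesIO, amount: int, offset: int) -> bytes:
    """Read up to `amount` bytes starting at 1-based `offset` (random access)."""
    if offset < 1 or amount < 1:
        raise ValueError("offset and amount must be >= 1")
    lob.seek(offset - 1)          # jump straight to the requested position
    return lob.read(amount)

def read_in_chunks(lob: io.BytesIO, chunk_size: int):
    """Stream the whole value piece by piece instead of loading it at once."""
    offset = 1
    while True:
        piece = read_piece(lob, chunk_size, offset)
        if not piece:             # past the end of the value
            break
        yield piece
        offset += len(piece)

# A stand-in for a large binary value such as a video clip: 1,024 bytes.
blob = io.BytesIO(bytes(range(256)) * 4)
```

Reading from the middle, for example `read_piece(blob, 2, 257)`, returns the two bytes at positions 257 and 258 without touching the first 256.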
With the growth of the Internet and content-rich applications, it has become imperative that databases support a datatype that fulfills the following requirements:
See also: Overview of LOB Datatypes
19.4 Overview of Oracle Text
Oracle Text indexes any document or textual content to add fast, accurate information retrieval to Internet content management applications, e-Business catalogs, news services, job postings, and so on. It can index content stored in file systems, in databases, or on the Web.
Oracle Text allows text searches to be combined with regular database searches in a single SQL statement. It can find documents based on their textual content, metadata, or attributes. The Oracle Text SQL API makes it simple and intuitive to create and maintain Text indexes and to run Text searches.
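In Oracle Text, such a query is typically written with the CONTAINS operator next to ordinary predicates. An example is `WHERE CONTAINS(description, 'camera') > 0 AND price < 500`, where the table and column names are hypothetical. The Python sketch below only models the idea. A toy inverted index answers the text criterion, and the row data answers the structured one.

```python
import re
from collections import defaultdict

# Hypothetical rows with a structured column (price) and a text column (description).
products = [
    {"id": 1, "price": 450, "description": "Compact digital camera with zoom lens"},
    {"id": 2, "price": 900, "description": "Professional camera body"},
    {"id": 3, "price": 120, "description": "Camera tripod and carrying case"},
    {"id": 4, "price": 300, "description": "Wireless headphones"},
]

def tokenize(text: str) -> list[str]:
    return re.findall(r"[a-z0-9]+", text.lower())

# Inverted index: token -> ids of rows whose description contains it.
index: dict[str, set[int]] = defaultdict(set)
for row in products:
    for token in tokenize(row["description"]):
        index[token].add(row["id"])

def search(word: str, max_price: int) -> list[int]:
    """Combine a text criterion (from the index) with a structured one (from the rows)."""
    text_hits = index.get(word.lower(), set())
    return sorted(r["id"] for r in products if r["id"] in text_hits and r["price"] < max_price)

print(search("camera", 500))
```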
Oracle Text is completely integrated with the Oracle database, which makes it inherently fast and scalable. The Text index is in the database, and Text queries run in the Oracle process. The Oracle optimizer can choose the best execution plan for any query, giving the best performance for ad hoc queries that involve both Text and structured criteria. Additional advantages include the following:
- Oracle Text leverages scalability features, such as replication.
- Oracle Text can automatically create XML sections for you.
19.4.1 Oracle Text Index Types
There are three Text index types to cover all text search needs:
Oracle Text also provides substring and prefix indexes. Substring indexing improves performance for left-truncated wildcard queries (such as %base) and double-truncated wildcard queries (such as %bas%). Prefix indexing improves performance for right-truncated wildcard queries (such as data%).
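The Python sketch below shows why these structures help. It is a conceptual model, not how Oracle Text stores its indexes. A sorted term list answers a right-truncated pattern such as `data%` with one range scan. A sorted list of every suffix of every term turns left-truncated (`%data`) and double-truncated (`%ata%`) patterns into prefix searches as well. The vocabulary is hypothetical.

```python
import bisect

# Hypothetical vocabulary extracted from indexed documents.
TERMS = sorted({"base", "baseline", "data", "database", "datatype", "metadata", "update"})

def right_truncated(prefix: str) -> list[str]:
    """'prefix%': in a sorted term list (prefix index) the matches form one contiguous range.
    Assumes terms never contain the sentinel character U+FFFF."""
    start = bisect.bisect_left(TERMS, prefix)
    end = bisect.bisect_left(TERMS, prefix + "\uffff")
    return TERMS[start:end]

# Substring index: every suffix of every term, sorted, each pointing back to its term.
SUFFIXES = sorted((term[i:], term) for term in TERMS for i in range(len(term)))

def _suffixes_starting_with(fragment: str):
    start = bisect.bisect_left(SUFFIXES, (fragment,))
    for suffix, term in SUFFIXES[start:]:
        if not suffix.startswith(fragment):
            break                 # sorted order: no later suffix can match
        yield suffix, term

def left_truncated(fragment: str) -> list[str]:
    """'%fragment': terms that end with the fragment."""
    return sorted({t for s, t in _suffixes_starting_with(fragment) if s == fragment})

def double_truncated(fragment: str) -> list[str]:
    """'%fragment%': terms that contain the fragment anywhere."""
    return sorted({t for _, t in _suffixes_starting_with(fragment)})
```

The substring index trades extra storage (one entry per suffix) for avoiding a scan of the whole vocabulary.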
19.4.2 Oracle Text Document Services
Oracle Text provides a number of utilities for viewing text, no matter how that text is stored.
19.4.3 Oracle Text Query Package
The CTX_QUERY PL/SQL package can be used to generate query feedback, count hits, and create stored query expressions.
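A rough Python model of two of these operations follows. The helper names echo `CTX_QUERY.COUNT_HITS` and `CTX_QUERY.STORE_SQE`. The functions themselves, the `term or term` query syntax, and the document set are assumptions made for illustration. In this sketch a bare name stands for a stored expression, and query feedback is not modeled.

```python
import re

# Hypothetical document collection standing in for an indexed table.
DOCS = {
    1: "Stocks rallied as bond yields fell",
    2: "Local team wins the championship",
    3: "Quarterly earnings beat forecasts",
}

stored_queries: dict[str, str] = {}

def words(text: str) -> set[str]:
    return set(re.findall(r"[a-z]+", text.lower()))

def store_sqe(name: str, query: str) -> None:
    """Save a named query expression so later queries can reuse it."""
    stored_queries[name.lower()] = query.lower()

def matches(doc_words: set[str], query: str) -> bool:
    """A query is 'term or term ...'; a term may be the name of a stored expression."""
    for term in (t.strip() for t in query.lower().split(" or ")):
        if term in stored_queries:
            if matches(doc_words, stored_queries[term]):
                return True
        elif term in doc_words:   # exact word match, no stemming
            return True
    return False

def count_hits(query: str) -> int:
    """Count the documents that satisfy the query."""
    return sum(1 for text in DOCS.values() if matches(words(text), query))

store_sqe("finance", "stocks or bond or earnings")
print(count_hits("finance"), count_hits("championship or finance"))
```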
19.4.4 Oracle Text Advanced Features
With Oracle Text, you can find, classify, and cluster documents based on their text, metadata, or attributes.
Document classification performs an action based on document content. Actions can include assigning category IDs to a document for future lookup or sending a document to a user. The result is a set, or stream, of categorized documents. For example, assume that there is an incoming stream of news articles. You can define a rule to represent the category of Finance. The rule is essentially one or more queries that select documents about the subject of finance. The rule might have the form 'stocks or bonds or earnings.' When a document arrives that satisfies the rules for this category, the application takes an action, such as tagging the document as Finance or e-mailing one or more users.
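The Finance example can be sketched in Python as follows. Each rule is modeled as a set of OR-ed terms. The two actions from the text, tagging the document and notifying users, are simulated with list appends. Oracle Text has its own facilities for this, such as `CTXRULE` indexes. The `Rule` and `Article` classes here are hypothetical and do not reflect that API.

```python
import re
from dataclasses import dataclass, field

@dataclass
class Rule:
    category: str
    any_of: set[str]   # 'stocks or bonds or earnings' modeled as a set of OR-ed terms

@dataclass
class Article:
    title: str
    body: str
    categories: list[str] = field(default_factory=list)

RULES = [
    Rule("Finance", {"stocks", "bonds", "earnings"}),
    Rule("Sports", {"match", "tournament", "championship"}),
]

notifications: list[str] = []   # stands in for e-mailing subscribed users

def classify(article: Article) -> Article:
    tokens = set(re.findall(r"[a-z]+", (article.title + " " + article.body).lower()))
    for rule in RULES:
        if tokens & rule.any_of:                                    # rule satisfied
            article.categories.append(rule.category)                # action: tag
            notifications.append(f"{rule.category}: {article.title}")  # action: notify
    return article

# An incoming stream of news articles.
stream = [
    Article("Markets close higher", "Stocks and bonds gained after strong earnings."),
    Article("Cup final", "The championship match went to penalties."),
    Article("Weather", "Rain expected tomorrow."),
]
classified = [classify(a) for a in stream]
```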
Clustering is the unsupervised division of patterns into groups. The interface lets users select the appropriate clustering algorithm. Each cluster contains a subset of the documents in the collection. A document within a cluster is believed to be more similar to the other documents in that cluster than to documents outside it. Clusters can be used to build features such as presenting similar documents in the collection.
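The chapter does not say which clustering algorithms the interface offers. The sketch below therefore uses plain k-means, one common choice, over word-count vectors with cosine similarity. It is deterministic because the first k documents seed the clusters, so the grouping is reproducible. The documents are invented.

```python
import math
import re
from collections import Counter

def vectorize(text: str) -> Counter:
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine(a, b) -> float:
    dot = sum(a[t] * b.get(t, 0) for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def centroid(vectors) -> dict:
    total = Counter()
    for v in vectors:
        total.update(v)
    return {t: c / len(vectors) for t, c in total.items()}

def kmeans(docs: list[str], k: int, iterations: int = 10) -> list[int]:
    vecs = [vectorize(d) for d in docs]
    centroids = [dict(v) for v in vecs[:k]]   # deterministic seeding: first k documents
    assignment = [0] * len(docs)
    for _ in range(iterations):
        # Assign each document to the most similar centroid.
        assignment = [max(range(k), key=lambda c: cosine(v, centroids[c])) for v in vecs]
        # Recompute centroids; keep the previous one if a cluster ends up empty.
        for c in range(k):
            members = [v for v, a in zip(vecs, assignment) if a == c]
            if members:
                centroids[c] = centroid(members)
    return assignment

docs = [
    "stock market shares trading",
    "football match goal score",
    "shares stock price market",
    "goal football team match",
]
groups = kmeans(docs, k=2)
```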
19.5 Overview of Oracle Ultra Search
Oracle Ultra Search is built on the Oracle database server and Oracle Text technology. It provides uniform search-and-locate capabilities over multiple repositories: Oracle databases, other ODBC-compliant databases, IMAP mail servers, HTML documents served by a Web server, files on disk, and more.
Ultra Search uses a 'crawler' to index documents. The documents stay in their own repositories, and the crawled information is used to build an index that stays within your firewall in a designated Oracle database. Ultra Search also provides APIs for building content management solutions.
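Here is a minimal Python sketch of the crawl-and-index idea. The crawler reads each repository and records which terms occur at which location. It stores only those locations, so the documents themselves never leave their repositories. The repository names and URIs are hypothetical, and the real Ultra Search crawler is considerably more involved.

```python
import re
from collections import defaultdict

# Hypothetical repositories: the documents stay here; the crawler only reads them.
REPOSITORIES = {
    "web": {"http://intranet/hr.html": "Holiday policy and leave requests"},
    "mail": {"imap://inbox/42": "Leave request approved for Friday"},
    "files": {"file:///docs/budget.txt": "Budget planning for next quarter"},
}

def crawl(repositories: dict) -> dict[str, set[str]]:
    """Build an index of term -> document locations; no document text is copied."""
    index: dict[str, set[str]] = defaultdict(set)
    for source in repositories.values():
        for uri, text in source.items():
            for term in set(re.findall(r"[a-z]+", text.lower())):
                index[term].add(uri)
    return index

INDEX = crawl(REPOSITORIES)

def search(term: str) -> list[str]:
    """Return locations; the caller fetches content from the original repository."""
    return sorted(INDEX.get(term.lower(), set()))

print(search("leave"))
```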
Ultra Search offers the following:
19.6 Overview of Oracle interMedia
Oracle interMedia ("interMedia") is a feature that enables Oracle Database to store, manage, and retrieve images, audio, and video data in an integrated fashion with other enterprise information. interMedia extends Oracle Database reliability, availability, and data management to media content in traditional, Internet, electronic commerce, and media-rich applications.
interMedia manages media content by providing the following:
interMedia provides media content services to Oracle JDeveloper 10g, Oracle Content Management SDK, Oracle Application Server Portal, Oracle applications, and Oracle partners.
19.7 Overview of Oracle Spatial
Oracle Spatial is designed to make spatial data management easier and more natural for users of location-enabled applications and geographic information system (GIS) applications. When spatial data is stored in an Oracle database, it can be easily manipulated, retrieved, and related to all other data stored in the database.
A common example of spatial data can be seen in a road map. A road map is a two-dimensional object that contains points, lines, and polygons that can represent cities, roads, and political boundaries such as states or provinces. A road map is a visualization of geographic information. The locations of cities, roads, and political boundaries that exist on the surface of the Earth are projected onto a two-dimensional display or piece of paper, preserving the relative positions and relative distances of the rendered objects.
The data that indicates the Earth location (such as longitude and latitude) of these rendered objects is the spatial data. When the map is rendered, this spatial data is used to project the locations of the objects onto a two-dimensional piece of paper. A GIS is often used to store, retrieve, and render this Earth-relative spatial data.
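The projection step can be sketched with the simple equirectangular projection. It is chosen here only for illustration; Oracle Spatial supports many coordinate systems, and this is not its implementation. Longitude and latitude become x and y in kilometres. A single scale factor then fits the result onto paper without changing the proportions between projected distances. This projection approximates true ground distances well only over small areas. The coordinates are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0

def project(lon_deg: float, lat_deg: float, ref_lat_deg: float) -> tuple[float, float]:
    """Equirectangular projection of (longitude, latitude) to (x, y) in kilometres."""
    x = EARTH_RADIUS_KM * math.radians(lon_deg) * math.cos(math.radians(ref_lat_deg))
    y = EARTH_RADIUS_KM * math.radians(lat_deg)
    return x, y

def to_paper(points_km: list[tuple[float, float]], width_cm: float) -> list[tuple[float, float]]:
    """Shift and uniformly scale projected points so they span width_cm on paper."""
    min_x = min(x for x, _ in points_km)
    max_x = max(x for x, _ in points_km)
    min_y = min(y for _, y in points_km)
    scale = width_cm / (max_x - min_x)   # one factor for both axes keeps proportions
    return [((x - min_x) * scale, (y - min_y) * scale) for x, y in points_km]

# Hypothetical objects given as (longitude, latitude) in degrees.
city_a, city_b, city_c = (10.0, 50.0), (11.0, 50.0), (10.0, 51.0)
ref_lat = 50.5
projected = [project(lon, lat, ref_lat) for lon, lat in (city_a, city_b, city_c)]
paper = to_paper(projected, width_cm=20.0)
print(paper)
```

One degree of latitude comes out longer on paper than one degree of longitude at this latitude, which matches the shrinking of longitude lines toward the poles.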
Types of spatial data (other than GIS data) that can be stored using Spatial include data from computer-aided design (CAD) and computer-aided manufacturing (CAM) systems. Instead of operating on objects on a geographic scale, CAD/CAM systems work on a smaller scale, such as an automobile engine or printed circuit boards.
The differences among these systems are in the size and precision of the data, not the data's complexity. The systems might all involve the same number of data points. On a geographic scale, the location of a bridge can vary by a few tenths of an inch without causing any noticeable problems to the road builders, whereas if the diameter of an engine's pistons is off by a few tenths of an inch, the engine will not run.
In addition, the complexity of data is independent of the absolute scale of the area being represented. For example, a printed circuit board is likely to have many thousands of objects etched on its surface, containing in its small area information that may be more complex than the details shown on a road builder's blueprints.
These applications all store, retrieve, update, or query some collection of features that have both nonspatial and spatial attributes. Examples of nonspatial attributes are name, soil_type, landuse_classification, and part_number. The spatial attribute is a coordinate geometry, or vector-based representation of the shape of the feature.
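In Python, a feature with both kinds of attributes, and a query that mixes them, might look like the sketch below. The `Parcel` class, its coordinates, and the helper functions are illustrative assumptions, not the Oracle Spatial schema or operators. The helpers are a shoelace-formula area and a ray-casting point-in-polygon test.

```python
from dataclasses import dataclass

@dataclass
class Parcel:
    # Nonspatial attributes (names follow the examples in the text).
    name: str
    soil_type: str
    landuse_classification: str
    # Spatial attribute: polygon vertices as (x, y) coordinates, in order.
    geometry: list[tuple[float, float]]

def area(poly: list[tuple[float, float]]) -> float:
    """Shoelace formula for the area of a simple polygon."""
    n = len(poly)
    s = sum(poly[i][0] * poly[(i + 1) % n][1] - poly[(i + 1) % n][0] * poly[i][1]
            for i in range(n))
    return abs(s) / 2.0

def contains(poly: list[tuple[float, float]], x: float, y: float) -> bool:
    """Ray-casting test: count polygon edges crossed by a ray going right from (x, y)."""
    inside = False
    n = len(poly)
    for i in range(n):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % n]
        if (y1 > y) != (y2 > y):                      # edge straddles the ray's height
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

parcels = [
    Parcel("North field", "loam", "agricultural", [(0, 0), (100, 0), (100, 50), (0, 50)]),
    Parcel("Mill site", "clay", "industrial", [(100, 0), (150, 0), (150, 50), (100, 50)]),
]

# Mixed query: agricultural parcels (nonspatial) that contain a given point (spatial).
hits = [p.name for p in parcels
        if p.landuse_classification == "agricultural" and contains(p.geometry, 40, 25)]
print(hits)
```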
Oracle Spatial provides a SQL schema and functions that facilitate the storage, retrieval, update, and query of collections of spatial features in an Oracle database. Spatial consists of the following:
- A network data model for representing capabilities or objects that are modeled as nodes and links in a network.